The computational mass spectrometry group, headed by Professors Vineet Bafna and Pavel Pevzner, focuses on developing algorithms to process mass spectrometry data.
In our lab we have developed a number of tools for computational proteomics. Each one has its own
purpose and setting. These tools are free for download, or are also integrated into a
MixDB December 6, 2011
Most computational approaches assume that each MS/MS
spectrum comes from one peptide, yet there are numerous situations
where one MS/MS spectrum can contain fragment ions corresponding to
two or more peptides. Examples include mixture spectra from co-eluting
peptides in complex samples, spectra generated from data-independent
acquisition methods and spectra from peptides with complex PTMs and
cross-linked peptides. We propose a new database search tool
that is able to identify mixture MS/MS spectra from more than one
peptide. We show that peptides can be reliably identified with up to
95% accuracy from mixture spectra while considering only a small
fraction of all possible peptide pairs (speedup of four orders of
Resurrection of a clinical antibody
January 5, 2011
Using the tool, GenoMS, we were able to sequence a new mouse hybridoma
antibody directed against a member of the TNF-superfamily, lymphotoxin alpha
(LT-α). Details of the protein sequencing effort can be found
GenoMS June 1, 2010
We developed the tool, GenoMS, for sequencing small samples of proteins.
Protein sequence templates (i.e. proteins or genomic sequences that are
similar to the target protein) are identified using the database search tool
InsPecT. The templates are then used to recruit, align, and de novo sequence
regions of the target protein that have diverged from the database or are
missing. We used GenoMS to reconstruct the full sequence of an antibody by
using spectra acquired from multiple digests using different proteases.
Antibodies are a prime example of proteins that confound standard database
identification techniques. The mature antibody genes result from large-scale
genome rearrangements with flexible fusion boundaries and somatic
hypermutation. Using GenoMS we automatically reconstruct the complete
sequences of two immunoglobulin chains with accuracy greater than 98% using
a diverged protein database. Using the genome as the template, we achieve
accuracy exceeding 97%. More details can be found
Nonribosomal Peptides Dereplication and Sequencing August 7, 2009
Nonribosomal peptides (NRPs) are of great pharmacological importance, but there is currently no technology for high-throughput NRP 'dereplication' and sequencing. We used multistage mass spectrometry followed by spectral alignment algorithms for sequencing of cyclic NRPs. We also developed an algorithm for comparative NRP dereplication that establishes similarities between newly isolated and previously identified similar but nonidentical NRPs, substantially reducing dereplication efforts. The homepage for this project can be found here
Arabidopsis Proteogenomics January 8, 2009
Our study of the Arabidopsis proteome through tandem mass spectrometry revealed over 18,000 novel peptides not in the TAIR7 genome annotation release. Using Inspect, we identified over 144,000 peptides from three sequence databases: the six-frame translation of the Arabidopsis genome, an exon graph based on ab initio gene predictions, and the TAIR7 proteome. From the novel peptides we predicted over 700 new gene models and over 600 corrections to current gene models. The peptides and predicted models can be accessed
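As an illustration of what a six-frame translation involves, the sketch below translates a DNA string in all three reading frames on both strands using the standard genetic code. It is a minimal, self-contained example and not the pipeline used in the study; the function names (`reverse_complement`, `translate`, `six_frame_translation`) and the frame labels (`+1` to `-3`) are hypothetical choices made for this example.

```python
from itertools import product

# Standard genetic code, listed in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

In a proteogenomic search, each translated frame would typically be split at stop codons and the resulting open segments searched as candidate protein sequences; that step is omitted here.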
Multistage mass spectrometry May 29, 2008
Multistage mass spectrometry (collecting multiple MS^3 spectra from each MS^2 spectrum) and accurate precursor masses
(but inaccurate fragment masses) have been demonstrated to lead to significant gains in peptide identification via database
search but have had a limited impact in de novo peptide sequencing. Our Multi-stage Spectral Networks package addresses both
of these in a rigorous probabilistic framework for analyzing spectra of overlapping peptides, resulting in both accurate de
novo peptide sequencing from multistage mass spectra (despite the inferior quality of MS^3 spectra) and improved interpretation
of spectral networks. Additional details and the open-source package are available here
Phosphate Localization Score April 4, 2008
Phosphate Localization Score is an algorithm which determines the confidence of the placement
of a phosphate on a given residue. This method is similar to the AScore, and is described in
Albuquerque et al., Mol Cell Proteomics 2008. The program is integrated with the Inspect package,
download available here. A tutorial for using the
program is in the Inspect documentation, here
MS-Dictionary November 30, 2007
MS-Dictionary is software that generates all plausible de novo interpretations of a tandem mass spectrum (the spectral dictionary) and quickly matches them against a protein database. It enables proteogenomic searches in six-frame translations of genomic sequences that may be prohibitively time-consuming for existing database search approaches.
MS-GeneratingFunction November 28, 2007
MS-GF is software for computing the generating function of a tandem mass spectrum. The generating functions and their derivatives represent new features of tandem mass spectra that improve peptide identifications. Further, they enable one to rigorously compute error rates of peptide identifications and to improve the sensitivity-specificity trade-off of existing MS/MS search tools.
MS-Clustering November 26, 2007
MS-Clustering is a new program aimed at improving the analysis of large MS/MS datasets by removing many of their redundant or low-quality spectra. MS-Clustering is capable of reducing the number of spectra submitted for analysis from a large dataset of 10+ million spectra by 90% while increasing the number of peptide/protein identifications by up to 10%.
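To give a feel for how redundant spectra can be grouped, the sketch below bins peaks by m/z, compares spectra by cosine similarity, and greedily assigns each spectrum to the first cluster whose representative is similar enough. This is a generic illustration and not the MS-Clustering algorithm itself; the function names, the 1.0 m/z bin width, and the 0.8 similarity threshold are assumptions chosen for the example, and precursor-mass checks and quality filtering are left out.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return dict(binned)


def cosine(a, b):
    """Cosine similarity between two sparse binned spectra (dicts)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_cluster(spectra, threshold=0.8):
    """Assign each spectrum to the first cluster whose representative
    (the cluster's first member) has cosine similarity >= threshold."""
    clusters = []  # list of (representative vector, member indices)
    for index, peaks in enumerate(spectra):
        vector = bin_spectrum(peaks)
        for representative, members in clusters:
            if cosine(representative, vector) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((vector, [index]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    spectra = [
        [(100.1, 10.0), (200.2, 5.0)],
        [(100.3, 9.0), (200.4, 6.0)],   # near-duplicate of the first
        [(150.0, 10.0), (300.0, 5.0)],  # unrelated spectrum
    ]
    print(greedy_cluster(spectra))
```

Submitting one representative (or a merged consensus spectrum) per cluster to the search engine is what reduces the number of spectra that need to be analyzed.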
Spectral Networks October 24, 2007
Spectral networks are a novel approach to the identification of MS/MS spectra
that detects and combines spectra from overlapping peptides or
modified variants of the same peptide. This approach allows for the
blind identification of unexpected post-translational modifications and
highly modified peptides. The spectral networks software package is now
available in open-source and Windows-binary versions.
PepNovo October 4, 2007
A new version of PepNovo has been released. It contains optional quality filtering and models for several MS instrument types.
Web server July 25, 2007
The web server
hosting all of our software is up and running. Users may sign up
for an account and search spectra. The server posts jobs to a large compute grid.
Phosphorylation search July 10, 2007
Inspect has been trained to score phosphorylated MS/MS spectra. The new scoring function
was trained on data from LTQ instruments and performs well.
HHMI supports the Bioinformatics [Under]graduate Research Consortium in Comparative Proteogenomics at UCSD. Proteogenomics is a new research area that uses whole-genome MS/MS datasets to better characterize genomic and proteomic annotations on a global scale. The consortium gives undergraduate and new graduate students an opportunity to gain hands-on research experience with real, unsolved bioinformatics problems in this emerging field. More information