ProteoSAFe
Although computational mass spectrometry (MS) has greatly boosted proteomics research, software for MS searches is generally underutilized. Scientists using MS software face three obstacles: fiddly software chaining, time-consuming execution, and unfriendly user interfaces. Together, these obstacles make using such software a laborious, tedious, and error-prone experience. To ease the burden, some researchers adopt integrated software platforms that automate the most tedious parts of the process. However, MS searches quickly grow complex, and the obstacles grow faster than most platforms can accommodate; as a result, users often end up customizing these platforms, frequently to no avail. To address this problem, our ProteoSAFe proteomics platform is Scalable in utilizing distributed computing, Accessible via reconfigurable, easy-to-learn user interfaces, and Flexible in tool chaining.
Bacterial Proteogenomic Annotation
While bacterial genome annotations have improved significantly in recent years, techniques for bacterial proteome annotation (including post-translational chemical modifications, signal peptides, proteolytic events, etc.) are still in their infancy. The number of sequenced bacterial genomes is rising sharply, far outpacing our ability to validate the predicted genes or annotate bacterial proteomes. In this project, we use tandem mass spectrometry (MS/MS) to annotate the proteomes encoded by bacterial genomes and to provide a comprehensive map of their post-translational modifications. We also detect genes that were missed by the original annotation and suggest corrections to improve existing gene annotations.
Comparative Shotgun Protein Sequencing
De novo sequencing of monoclonal antibodies is an important step in the drug discovery process when the cDNA or original cell line is not available, or when unexpected post-translational modifications must be characterized to verify the integrity of the antibody. Despite being time-consuming, the fifty-year-old technique of Edman degradation has remained the primary tool for de novo protein sequencing. Here we demonstrate that Shotgun Protein Sequencing (SPS), a recently developed approach employing tandem mass spectrometry, is a fast and accurate protein analysis technique with the potential to dramatically reduce reliance on Edman degradation in studies of unknown proteins. We illustrate the application of SPS to sequencing monoclonal antibodies and introduce Comparative Shotgun Protein Sequencing (CSPS), which assembles multiple protein contigs into complete antibodies using related antibodies as templates. We estimate that CSPS reduces protein sequencing effort by one to two orders of magnitude compared to conventional Edman degradation approaches. Furthermore, rather than being hindered by post-translational modifications, this approach allows one to discover unexpected modifications automatically.
Cycloquest
Hundreds of ribosomally synthesized cyclopeptides have been isolated from all domains of life, the vast majority in the last 15 years. Studies of cyclic peptides have highlighted their exceptional potential both as stable drug scaffolds and as biomedicines in their own right. Despite this, computational techniques for cyclopeptide identification are still in their infancy, and many such peptides remain uncharacterized. Tandem mass spectrometry has carved out a growing role in cyclopeptide identification, taking over from traditional techniques such as nuclear magnetic resonance (NMR) spectroscopy. MS/MS studies require only picogram quantities of peptide (compared to milligrams for NMR studies) and are applicable to complex samples, removing the need for time-consuming chromatographic purification. While database search tools such as Sequest and Mascot have become standard for the MS/MS identification of linear peptides, they are not applicable to cyclopeptides because of the parent mass shift resulting from cyclization and the different fragmentation patterns of cyclic peptides. In this paper, we describe a novel database search methodology that aids in identifying cyclopeptides by mass spectrometry, and we evaluate its utility in identifying two peptide rings from Helianthus annuus, a bacterial cannibalism factor from Bacillus subtilis, and a θ-defensin from rhesus macaque. The method is available online at cyclo.ucsd.edu.
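The parent mass shift and the different fragmentation pattern of cyclic peptides can be made concrete with a small sketch. Assuming a head-to-tail cyclic peptide, cyclization forms one extra peptide bond and releases water, so the neutral parent mass is simply the sum of residue masses, about 18.011 Da lighter than the linear peptide with the same residues. Breaking the ring at two bonds yields contiguous stretches that may wrap around the written sequence, giving n(n-1) proper fragments for n residues, many of which a linear search tool would never generate. The function names below are hypothetical, ion-type offsets are ignored, and this is not Cycloquest's scoring method.

```python
from itertools import accumulate

# Monoisotopic residue masses in Da (a subset of the 20 standard amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}


def cyclic_fragment_masses(peptide: str) -> list[float]:
    """Residue-mass sums of all proper contiguous stretches of a cyclic peptide.

    Breaking the ring at two peptide bonds releases a stretch that may wrap
    around the start of the written sequence, so the ring is unrolled twice.
    Ion-type offsets (protons, water) are deliberately ignored.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    prefix = [0.0] + list(accumulate(masses + masses))
    fragments = []
    for length in range(1, n):      # proper fragments only
        for start in range(n):      # one fragment per ring position
            fragments.append(prefix[start + length] - prefix[start])
    return sorted(fragments)


def cyclic_parent_mass(peptide: str) -> float:
    """Neutral mass of a head-to-tail cyclic peptide: no terminal H2O."""
    return sum(RESIDUE_MASS[aa] for aa in peptide)
```

For "GASP", the fragment list contains the wrap-around stretch "PG" (about 154.074 Da), which a linear theoretical spectrum would lack.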
GenoMS
GenoMS is a tool for sequencing a small set of proteins using an imperfect database and tandem mass spectra. The database need not contain full protein sequences; it can instead contain exons or partial sequences, or even a small region of the genome. The tandem mass spectra should come from overlapping peptides produced by digestion with multiple proteases.
Inspect
Inspect is a general-purpose database search algorithm with an emphasis on efficiently and confidently identifying modified peptides. It includes special scoring models for phosphorylation that increase identification accuracy. In addition, Inspect implements the MS-Alignment algorithm for discovering unanticipated modifications in blind mode.
Meta-SPS
Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins, such as antibodies or proteins from organisms with unsequenced genomes, remains a challenging open problem. Conventional algorithms that sequence each MS/MS spectrum individually are limited by incomplete peptide fragmentation or low signal-to-noise ratios, and tend to produce short de novo sequences with low sequencing accuracy. Our Shotgun Protein Sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But while SPS enables much more accurate reconstruction of de novo sequences, longer than those recoverable from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, which limits its effectiveness for poorly annotated proteins. Using low- and high-resolution CID and high-resolution HCD MS/MS spectra, we address this limitation with Meta-SPS, an algorithm that overlaps and further assembles SPS contigs into Meta-SPS de novo contig sequences up to 100 amino acids long at over 97% accuracy, without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS on distinct MS/MS datasets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.
MixDB
In high-throughput proteomics, computational methods and novel experimental strategies often depend on each other for their development and application. However, most computational approaches still assume that each MS/MS spectrum comes from a single peptide, although in numerous situations one MS/MS spectrum contains fragment ions from two or more peptides. Examples include mixture spectra from co-eluting peptides in complex samples, spectra generated by data-independent acquisition methods, and spectra from peptides with complex PTMs. We propose a new database search tool, MixDB, that identifies mixture MS/MS spectra from more than one peptide. We show that peptides can be reliably identified from mixture spectra with up to 95% accuracy while considering only a small fraction of all possible peptide pairs (a speedup of four orders of magnitude). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision in identifying single-peptide spectra, while identifying 20% more mixture spectra at significantly higher precision.
MS-Align+
MS-Align+ is a software tool for top-down protein identification, based on spectral alignment, that supports fast searches for unexpected post-translational modifications. In addition, MS-Align+ reports the statistical significance of top-down protein identifications.
MS-Clustering
Tandem mass spectrometry (MS/MS) experiments often generate redundant datasets containing multiple spectra of the same peptides. Clustering MS/MS spectra exploits this redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only the representative spectra significantly speeds up MS/MS database searches. The new version of MSCluster also supports the creation of spectral archives. For more details, see the downloadable zip file.
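A minimal sketch of spectrum clustering, assuming each spectrum is given as a precursor m/z plus a list of (m/z, intensity) peaks: peaks are summed into 1 Da bins, and each spectrum joins the first existing cluster whose representative has a similar precursor and a binned cosine similarity above a threshold, otherwise it starts a new cluster. The greedy single pass, bin width, thresholds, and function names are illustrative assumptions; MSCluster's actual algorithm and its choice of representative are more elaborate (here the first member simply serves as the representative).

```python
import math


def bin_peaks(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def greedy_cluster(spectra, threshold=0.8, precursor_tol=0.5):
    """Return (representative_index, member_indices) pairs.

    `spectra` is a list of (precursor_mz, [(mz, intensity), ...]) tuples.
    """
    binned = [bin_peaks(peaks) for _, peaks in spectra]
    clusters = []
    for i, (precursor, _) in enumerate(spectra):
        for rep, members in clusters:
            if (abs(spectra[rep][0] - precursor) <= precursor_tol
                    and cosine(binned[rep], binned[i]) >= threshold):
                members.append(i)
                break
        else:  # no compatible cluster found: this spectrum founds a new one
            clusters.append((i, [i]))
    return clusters
```

Downstream, only `spectra[rep]` for each cluster would be searched, which is where the speed-up comes from.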
MS-Deconv
MS-Deconv is a software tool for top-down spectral deconvolution. It uses a combinatorial algorithm that first generates a large set of candidate isotopomer envelopes for a spectrum, then represents the spectrum as a graph, and finally selects the highest-scoring subset of envelopes as a heaviest path in the graph. In contrast to other approaches, the algorithm scores sets of envelopes rather than individual envelopes.
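The heaviest-path step can be illustrated under a strong simplification. Assume each candidate envelope occupies an m/z interval and carries its own additive score (the `Envelope` fields are hypothetical), and that two envelopes are compatible when their intervals do not overlap. Ordering envelopes by their right end makes compatibility a DAG, and the best compatible subset is a heaviest path, found by dynamic programming (weighted interval scheduling). This sketch scores envelopes individually, which is exactly what MS-Deconv's set-level scoring goes beyond, so it illustrates only the graph step.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Envelope:
    start_mz: float  # lightest isotopic peak assigned to the envelope
    end_mz: float    # heaviest isotopic peak assigned to the envelope
    charge: int
    score: float     # hypothetical per-envelope score


def heaviest_path(envelopes):
    """Choose non-overlapping envelopes with maximum total score.

    Returns (total_score, chosen_envelopes_sorted_by_end_mz).
    """
    envs = sorted(envelopes, key=lambda e: e.end_mz)
    ends = [e.end_mz for e in envs]
    best = [0.0] * (len(envs) + 1)      # best[i]: heaviest path using envs[:i]
    back = [(0, False)] * (len(envs) + 1)
    for i, env in enumerate(envs, start=1):
        # envs[:j] all end strictly before env starts (predecessors in the DAG)
        j = bisect_left(ends, env.start_mz, 0, i - 1)
        if best[j] + env.score > best[i - 1]:
            best[i], back[i] = best[j] + env.score, (j, True)
        else:
            best[i], back[i] = best[i - 1], (i - 1, False)
    chosen, i = [], len(envs)
    while i > 0:                        # walk the path backwards
        j, took = back[i]
        if took:
            chosen.append(envs[i - 1])
        i = j
    return best[-1], chosen[::-1]
```

With envelopes spanning 500-503 (score 10), 502-506 (8), 504-508 (7), and 509-512 (3), the path skips the 502-506 envelope, which overlaps both neighbours, and returns a total of 20.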
MS-GappedDictionary
MS-GappedDictionary is a tool that generates the set of all plausible gapped peptides (i.e., de novo reconstructions with mass gaps) from MS/MS spectra. The generated set, called a Pocket Dictionary, consists of 25-100 gapped peptides. A Pocket Dictionary can be used to quickly filter very large databases (e.g., the six-frame translation of the human genome). All database matches reported by MS-GappedDictionary are re-scored by MS-GF so that only statistically significant PSMs are retained. The current version (ver. 102010) supports only searches for unmodified peptides; searches for modified or mutated peptides are under development and will be released soon.
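How a gapped peptide filters a database can be sketched as follows, assuming nominal (integer) residue masses and exact mass matching. A gapped peptide is a list of mass gaps; a database substring matches if its total mass equals the sum of the gaps and every cumulative gap boundary coincides with one of its prefix masses. The function names are hypothetical, the sketch ignores mass tolerance, modifications, and the MS-GF rescoring step, and its brute-force scan stands in for whatever indexing MS-GappedDictionary actually uses.

```python
from itertools import accumulate

# Nominal (integer) residue masses; I/L and K/Q collide at this resolution.
NOMINAL = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def matches_gapped(peptide, gaps):
    """True if every cumulative gap boundary is a prefix mass of `peptide`
    and the gaps sum to the peptide's total mass."""
    prefix = set(accumulate(NOMINAL[aa] for aa in peptide))
    boundaries = list(accumulate(gaps))
    return boundaries[-1] == sum(NOMINAL[aa] for aa in peptide) and all(
        b in prefix for b in boundaries)


def filter_database(protein, gapped_peptides):
    """Report (offset, substring, gaps) for substrings consistent with a gapped peptide."""
    hits = []
    for gaps in gapped_peptides:
        target = sum(gaps)
        for start in range(len(protein)):
            mass = 0
            for end in range(start, len(protein)):
                mass += NOMINAL[protein[end]]
                if mass > target:
                    break
                if mass == target:  # right total mass: check internal breakpoints
                    candidate = protein[start:end + 1]
                    if matches_gapped(candidate, gaps):
                        hits.append((start, candidate, tuple(gaps)))
                    break
    return hits
```

In "KKGASPKK", both GASP and SPK have a nominal mass of 312, but only GASP has a prefix mass of 128 (G+A), so the gapped peptide [128, 87, 97] keeps GASP and rejects SPK.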
MS-GeneratingFunction
MS-GF is software for computing the generating function of a tandem mass spectrum. Generating functions and their derivatives represent new features of tandem mass spectra that improve peptide identification. Further, they enable one to rigorously compute the error rates of peptide identifications and to achieve a better sensitivity-specificity trade-off for existing MS/MS search tools.
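A toy version of the generating-function computation, under stated assumptions: nominal integer residue masses, each residue equally likely (probability 1/20), and a peptide's score equal to the sum of a per-mass score `prefix_score` (a hypothetical stand-in for MS-GF's actual scoring) over its prefix masses. Dynamic programming over masses accumulates, for every score, the total probability of all peptides of the given parent mass that reach it; the spectral probability of a match is the tail of that distribution at the observed score. Function names are hypothetical.

```python
from collections import defaultdict

# Nominal masses of the 20 standard residues, sorted (I/L and K/Q share a mass).
RESIDUE_MASSES = [57, 71, 87, 97, 99, 101, 103, 113, 113, 114,
                  115, 128, 128, 129, 131, 137, 147, 156, 163, 186]


def score_distribution(parent_mass, prefix_score):
    """Map score -> total probability of peptides with mass `parent_mass`
    whose score (sum of prefix_score over their prefix masses) equals it."""
    p = 1.0 / len(RESIDUE_MASSES)
    table = [defaultdict(float) for _ in range(parent_mass + 1)]
    table[0][0] = 1.0  # the empty prefix has mass 0 and score 0
    for m in range(1, parent_mass + 1):
        gain = prefix_score.get(m, 0)
        for a in RESIDUE_MASSES:
            if a > m:
                break  # list is sorted, so no heavier residue fits either
            for s, prob in table[m - a].items():
                table[m][s + gain] += prob * p
    return dict(table[parent_mass])


def spectral_probability(parent_mass, prefix_score, threshold):
    """Probability that a random peptide of this mass scores >= threshold."""
    dist = score_distribution(parent_mass, prefix_score)
    return sum(prob for s, prob in dist.items() if s >= threshold)
```

For a parent mass of 114, only GG and N qualify, so with a score of 5 at prefix mass 57 the distribution is {5: 0.0025, 0: 0.05}, and the spectral probability at threshold 5 is 0.0025.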
MS-GF+
MS-GF+ is a successor of MS-GFDB. It supports the HUPO PSI standard input (mzML) and output (mzIdentML). Compared to early versions of MS-GFDB, MS-GF+ is much faster and works better for high-precision MS/MS spectra.
MS-GFDB
MS-GFDB is an MS/MS database search tool that uses MS-GF p-values as its scoring function. MS-GFDB outperforms existing database search engines in the analysis of CID and ETD spectra, and performs equally well on non-tryptic peptides (e.g., Lys-N peptides). MS-GFDB scoring parameters can be easily derived from a set of annotated MS/MS spectra.
MS Top Down
Recent advances in mass spectrometry instrumentation, such as FT-ICR and Orbitrap, have made it possible to generate high-resolution spectra of entire proteins. While these methods offer new opportunities for "top-down" studies of proteins, computational tools for analyzing top-down data are still scarce. MS-TopDown is a new algorithm for sequencing such data. It implements a version of the Spectral Alignment algorithm specially suited to identifying protein forms in top-down mass spectra (i.e., identifying modifications, mutations, insertions, and deletions). MS-TopDown can efficiently discover protein forms even in the presence of numerous modifications, and it can also recover positional isomers from spectra of mixtures of isobaric protein forms.
M-SPLIT
Most computational methods assume that each MS/MS spectrum comes from a single peptide, yet in numerous situations one MS/MS spectrum contains fragment ions from two or more peptides. Examples include co-eluting peptides in complex samples, spectra generated by data-independent acquisition methods, and spectra from peptides with complex PTMs (e.g., SUMOylation). M-SPLIT is a spectral library search tool for identifying mixture spectra of up to two peptides.
In our approach, a mixture spectrum M is modelled as a linear combination of two spectra, A + aB, where A and B are single-peptide spectra from two different peptides and a is the abundance of B relative to A. M-SPLIT selects the A + aB combination with the highest cosine similarity to M. Thanks to branch-and-bound filtration, it identifies the correct matches while considering only a minuscule fraction of all possible matches, and it simultaneously quantifies the relative abundances of co-eluting peptides reliably. Performance varies with a, but M-SPLIT selects the correct peptides in 89-99% of cases for a between 0.1 and 1.0.
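The A + aB model can be sketched with an exhaustive search in place of M-SPLIT's branch-and-bound filtration: for every ordered pair of library spectra and every a on a grid from 0.1 to 1.0, compute the cosine between M and A + aB and keep the best. The sketch assumes all spectra are already binned into aligned intensity vectors of equal length; `best_mixture` and the grid are hypothetical, and the brute-force loop over all pairs is exactly the cost that branch-and-bound avoids.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def best_mixture(mixture, library, alphas=None):
    """Return (cosine, name_A, name_B, a) maximizing cos(M, A + a*B)."""
    if alphas is None:
        alphas = [round(0.1 * k, 1) for k in range(1, 11)]  # 0.1 .. 1.0
    best = (-1.0, None, None, None)
    # Ordered pairs: A is the more abundant peptide, B is scaled by a <= 1.
    for (name_a, spec_a), (name_b, spec_b) in permutations(library.items(), 2):
        for a in alphas:
            combo = [x + a * y for x, y in zip(spec_a, spec_b)]
            score = cosine(mixture, combo)
            if score > best[0]:
                best = (score, name_a, name_b, a)
    return best
```

With a library of three toy spectra and M built as PEP1 + 0.5 * PEP2, the search recovers both peptides and the ratio 0.5; the returned a doubles as the relative-abundance estimate mentioned above.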
Multistage Sequencing of Cyclic Peptides
Some of the most effective antibiotics (e.g., vancomycin and daptomycin) are cyclic peptides produced by non-ribosomal biosynthetic pathways. While hundreds of biomedically important cyclic peptides have been sequenced, computational techniques for sequencing cyclic peptides are still in their infancy. Previous methods for sequencing peptide antibiotics and other cyclic peptides rely on nuclear magnetic resonance (NMR) spectroscopy and require large amounts (milligrams) of purified material, which cannot be obtained for most compounds. Recently developed MS-based methods offer hope for accurate sequencing of cyclic peptides from picograms of material. Our multistage cyclic peptide sequencing tool offers advantages over single-stage sequencing. The method has been tested on known and new cyclic peptides from Bacillus brevis, Dianthus superbus, and Streptomyces griseus, as well as on a new family of cyclic peptides produced by marine bacteria. The method is available online at cyclo.ucsd.edu.
PepNovo
PepNovo is a software tool for de novo sequencing of peptides from mass spectra. PepNovo uses a probabilistic network to model peptide fragmentation events in a mass spectrometer. In addition, it uses a likelihood ratio hypothesis test to determine whether the peaks observed in a spectrum are more likely to have been produced under the fragmentation model than under a probabilistic model that treats the appearance of peaks as random events.
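A simplified version of the likelihood ratio test, assuming predicted fragment ions are independent (PepNovo's probabilistic network explicitly models dependencies between fragments, so this is a deliberate simplification). Each predicted fragment has a probability `p_model` of producing a peak under the fragmentation model; under the random model, a peak lands within tolerance of any given m/z with probability `p_random`, estimated here from peak density. The score is the log of the likelihood ratio, so positive values favor the fragmentation model. All names are hypothetical.

```python
import math


def random_peak_probability(n_peaks, tolerance, mz_range):
    """Chance that a random peak falls within +/- tolerance of a given m/z."""
    return min(1.0, n_peaks * 2.0 * tolerance / mz_range)


def log_likelihood_ratio(observed, p_model, p_random):
    """log P(observations | fragmentation model) - log P(observations | random).

    observed[i] is True if predicted fragment i has a matching peak;
    p_model[i] is the model's probability of seeing that peak.
    """
    score = 0.0
    for seen, pm in zip(observed, p_model):
        if seen:
            score += math.log(pm / p_random)
        else:
            score += math.log((1.0 - pm) / (1.0 - p_random))
    return score
```

With 100 peaks over a 2000 Da range and a 0.5 Da tolerance, `p_random` is 0.05; finding a fragment the model expects with probability 0.8 then contributes log(16) to the score, while missing it contributes log(0.2 / 0.95).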
Shotgun Protein Sequencing
Analysis of MS/MS spectra from multiple overlapping peptides opens the possibility of assembling MS/MS spectra into entire proteins, similar to the assembly of DNA into genomes. This software recovers all or part of a protein's sequence through clustering, pairwise alignment, assembly, and de novo interpretation of the input MS/MS spectra.
Spectral Networks
Spectral networks are based on the idea of performing an MS/MS database search without comparing a spectrum against a database. Spectral networks capitalize on spectral pairs (pairs of spectra from related peptides, such as a peptide and its modified variant), which allow for the identification of prefix and suffix ladders and greatly reduce noise.
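Spectral pairs rest on a simple observation: if two peptides differ by a modification of mass Δ, fragments that do not contain the modified residue appear at the same m/z in both spectra, while fragments that contain it shift by Δ. A minimal sketch of pair detection, assuming singly charged fragment peaks given as m/z lists and a hypothetical function name, counts the peaks of spectrum A that match spectrum B either unshifted or shifted by Δ (the precursor mass difference). Real spectral alignment additionally determines where along the sequence the shift occurs.

```python
def shared_peak_fraction(peaks_a, peaks_b, delta, tol=0.5):
    """Fraction of A's peaks matched in B at offset 0 or at offset +delta."""
    if not peaks_a:
        return 0.0
    matched = sum(
        1 for mz in peaks_a
        if any(abs(q - mz) <= tol or abs(q - (mz + delta)) <= tol for q in peaks_b)
    )
    return matched / len(peaks_a)
```

For A = [100, 200, 300] and B = [100, 216, 316], allowing a +16 Da shift explains every peak of A, whereas an unshifted comparison explains only one of three; a pair whose shared fraction rises sharply once Δ is allowed is a candidate modified/unmodified pair.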
Splice Graph Proteomics Tools (Beta)
The advent of inexpensive RNA-Seq and other deep sequencing technologies for RNA promises to radically improve genome annotation by providing information on transcribed regions and splicing events in a variety of cellular conditions. Using MS-based proteogenomics, many of these events can be confirmed directly at the protein level. However, integrating large amounts of redundant RNA-Seq data with mass spectrometry data is a challenging problem. Our tool addresses it by constructing a compact database that contains all the useful information expressed in the RNA-Seq reads.
Tremolo
Tremolo is a spectral library search tool that leverages the Spectral Library Generating Function (SLGF) concept to identify spectrum-spectrum matches (SSMs). The SLGF models the variability of replicate spectra as compared to reference library spectra. Given a similarity function (in our case cosine), SLGF yields an expected score distribution for each reference library spectrum. Tremolo is able to assign p-values to SSMs and it has been shown to increase the sensitivity of spectral library searches.
UniNovo
UniNovo is a universal de novo peptide sequencing tool that works well for various types of spectra and spectral pairs (e.g., CID, ETD, HCD, CID/ETD). The accuracy of de novo reconstructions generated by UniNovo is better than or comparable to that of PepNovo+ or PEAKS. Moreover, UniNovo estimates the error rate of each reported reconstruction.
A powerful tool for PTM discovery (Jan 2008, Journal of Proteome Research, Vol. 7, Issue 1)
From spectral networks to shotgun sequencing (June 2007, Nature Methods, Vol. 4 No. 6)
Identifying peptides without a database (May 2007, Journal of Proteome Research)
UCSD Computer Scientist Wins Young Investigator Award, Research on Snake Venom Proteins Highlighted (Nov 2006, UCSD)