Software Tools & Datasets
MSPLIT: Tool web page, ProteoSAFe workflow
MixDB: Tool web page, ProteoSAFe workflow
Key Publications
Despite substantial advances in the sensitivity of protein identification, only approximately 15% of the peptides detectable in MS1 scans are targeted for MS/MS analysis in complex samples. The semi-stochastic nature of commonly used Data Dependent Acquisition (DDA) workflows also means that a different subset of peptides is sampled for MS/MS each time a sample is analyzed, so that only 30% to 60% of identified peptides overlap between technical replicates. This is particularly problematic as the focus in proteomics moves from cataloging lists of proteins to quantitatively measuring the dynamics of the proteome across many MS runs. Data Independent Acquisition (DIA) workflows aim to meet these challenges by generating multiplexed MS/MS spectra from the co-fragmentation of all peptide precursors in a predetermined m/z range, thus producing a nearly complete MS/MS “record” of all peptides in a given sample. In contrast to the nearly ubiquitous assumption in computational methods that each MS/MS spectrum comes from only one peptide, this Technology Research and Development component focuses on the development of algorithms for peptide identification from multiplexed spectra.
Despite the growing importance and the enormous potential of multiplexed spectra, there is still a shortage of computational methods to analyze them. CCMS aims to address these challenges in identifying multiplexed spectra using two complementary approaches:
- MSPLIT spectral library search. Capitalizing on the growing availability of large libraries of single-peptide spectra (spectral libraries), MSPLIT is able to identify up to 98% of all mixture spectra from equally abundant peptides and automatically adjusts to varying abundance ratios of up to 10:1. Using theoretical bounds on spectral similarity, MSPLIT avoids the need to compare each experimental spectrum against all possible combinations of candidate peptides (achieving speedups of over five orders of magnitude) and demonstrates that mixture spectra can be identified in a matter of seconds against proteome-scale spectral libraries.
- MixDB database search. MixDB is a database search tool designed to identify mixture tandem mass spectra, that is, spectra generated from more than one peptide. MixDB can reliably identify peptides with up to 95% accuracy from mixture spectra while considering only 0.01% of all possible peptide pairs (a speedup of four orders of magnitude). Comparison with contemporary database search methods indicates that MixDB has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision.
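To make the idea of matching a mixture spectrum against pairs of library spectra more concrete, here is a minimal Python sketch. It is not the MSPLIT implementation: the binned-vector representation, the function names (`cosine`, `best_mix`, `identify_pair`), the toy library, and the grid search over the abundance ratio `alpha` are all assumptions made for illustration. In particular, the simple "keep the top-k single-spectrum matches" filter stands in for the theoretical similarity bounds mentioned above; it only illustrates why avoiding the full enumeration of pairs matters.

```python
import math
from itertools import combinations


def normalize(v):
    """Scale a binned spectrum to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(u, v):
    """Cosine similarity between two binned spectra of equal length."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def best_mix(mixture, major, minor, steps=20):
    """Grid-search alpha in (0, 1] so that major + alpha * minor best matches
    the mixture. Returns (score, alpha)."""
    best = (-1.0, 0.0)
    for i in range(1, steps + 1):
        alpha = i / steps
        combo = [x + alpha * y for x, y in zip(major, minor)]
        score = cosine(mixture, combo)
        if score > best[0]:
            best = (score, alpha)
    return best


def identify_pair(mixture, library, top_k=5):
    """Hypothetical pruning heuristic (not MSPLIT's bound): rank library
    spectra by their individual similarity to the mixture, keep the top_k,
    and score only pairs drawn from those candidates."""
    ranked = sorted(library, key=lambda name: cosine(mixture, library[name]),
                    reverse=True)[:top_k]
    best = None
    for p, q in combinations(ranked, 2):
        # Try both orderings, since either peptide may be the more abundant one.
        for major, minor in ((p, q), (q, p)):
            score, alpha = best_mix(mixture,
                                    normalize(library[major]),
                                    normalize(library[minor]))
            if best is None or score > best[0]:
                best = (score, major, minor, alpha)
    return best


# Toy library of single-peptide spectra over six intensity bins (made-up data).
library = {
    "PEPA": [1, 0, 0, 1, 0, 0],
    "PEPB": [0, 1, 0, 0, 1, 0],
    "PEPC": [0, 0, 1, 0, 0, 1],
    "PEPD": [1, 1, 0, 0, 0, 0],
}
# Synthetic mixture: PEPA plus PEPB at half the abundance.
mixture = [a + 0.5 * b for a, b in zip(library["PEPA"], library["PEPB"])]

result = identify_pair(mixture, library, top_k=3)
print(result)  # expected to pick PEPA as major, PEPB as minor, alpha = 0.5
```

With `top_k=3`, only three pairs are scored instead of all six possible pairs in the toy library; on a proteome-scale library the gap between the pruned and the exhaustive search grows quadratically, which is the motivation for the bounds-based filtering described above.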