MixDB

Contact: Jian Wang [jiw006 (at) ucsd.edu]

Summary

In high-throughput proteomics, the development and application of computational methods and novel experimental strategies often rely on each other. However, most computational approaches still assume that each MS/MS spectrum comes from a single peptide, even though there are numerous situations where one MS/MS spectrum contains fragment ions from two or more peptides. Examples include mixture spectra from co-eluting peptides in complex samples, spectra generated by data-independent acquisition methods, and spectra from peptides with complex PTMs. We propose a new database search tool (MixDB) that can identify mixture MS/MS spectra from more than one peptide. We show that peptides can be reliably identified from mixture spectra with up to 95% accuracy while considering only a small fraction of all possible peptide pairs (a speedup of four orders of magnitude). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while identifying 20% more mixture spectra at significantly higher precision.

Documentation

  • spectrumMatchClassify

    Run MixDB as follows:
    java -Xmx1200M -jar MixDB.jar [fasta file] [query file] [precursor mass tolerance] [outputfile]
    

    This searches the sequence database and finds the pair of peptides that best matches the query spectrum. The precursor mass tolerance is given in Da. A relatively large tolerance, such as 3 Da, is usually recommended so that mixture spectra can be identified even when the query comes from high-accuracy MS data.

    After the search, MixDB uses an SVM to determine whether a match is significant.

    • SVM classification is done using the svm-light package. Please download the binaries at http://svmlight.joachims.org/. Then, put the appropriate binaries ("svm_learn" and "svm_classify") into the "svm_light_linux" or "svm_light_windows" folder, depending on your system.

    • Use the mixdbSVMClassify.pl script to perform the classification. Run the script as follows:
      ./mixdbSVMClassify.pl [search result file] [outputFile]
      Note: you might need to change the first line in the mixdbSVMClassify.pl script to specify the correct path for the svm_light binary.
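
The two steps above (database search, then SVM classification) can be sketched as a small Python helper that only assembles the command lines. This is a minimal sketch, not part of MixDB: the function names and all file names (database.fasta, query_spectra, search_result.txt, classified.txt) are hypothetical placeholders, and it assumes MixDB.jar and mixdbSVMClassify.pl sit in the current directory.

```python
import shlex


def mixdb_search_command(fasta, query, tolerance_da, output,
                         jar="MixDB.jar", heap="1200M"):
    """Assemble the documented search command as an argument list.

    Argument order follows the usage line:
    [fasta file] [query file] [precursor mass tolerance] [outputfile]
    """
    return ["java", f"-Xmx{heap}", "-jar", jar,
            fasta, query, str(tolerance_da), output]


def svm_classify_command(search_result, output,
                         script="./mixdbSVMClassify.pl"):
    """Assemble the documented classification command as an argument list."""
    return [script, search_result, output]


if __name__ == "__main__":
    # Hypothetical file names; 3.0 Da is the tolerance suggested in the text.
    search = mixdb_search_command("database.fasta", "query_spectra",
                                  3.0, "search_result.txt")
    classify = svm_classify_command("search_result.txt", "classified.txt")
    # Print the commands for inspection. To execute them, pass each list to
    # subprocess.run(cmd, check=True) in this order.
    print(shlex.join(search))
    print(shlex.join(classify))
```

Keeping each command as a list avoids shell quoting problems when file names contain spaces.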

  • Output

    Output is tab-delimited. The columns are described below, where M denotes the query spectrum and A and B denote the pair of peptides that best match M. For mixture matches, some result columns contain two values separated by a "!".
    Column	Content
    1-6	Query spectrum scan number 
    7	Query spectrum precursor mass
    8	Precursor of A
    9	Precursor of B
    10	Peptide A (number after . is charge of peptide)
    11	Peptide B
    12	Protein name for peptide A
    13	Protein name for peptide B
    14	Raw score between M and A+B
    15	Raw score between M and A only
    16	Raw score between M and B only
    17	Raw score between M and A divided by length of A
    18	Raw score between M and B divided by length of B
    19	Total explained intensity by A and B in M
    20	Fraction of b ions of A present in M
    21	Fraction of y ions of A present in M
    22	Fraction of b ions of B present in M
    23	Fraction of y ions of B present in M
    24	Longest consecutive series of b ions for A
    25	Longest consecutive series of y ions for A
    26	Longest consecutive series of b ions for B
    27	Longest consecutive series of y ions for B
    28	Average fragment mass error for A
    29	Average fragment mass error for B
    30	svm-score for matches; a higher score means at least one of A and B is a significant match to M
    31	svm-score for mixture matches; a higher score means both A and B are significant matches to M
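
The column layout above can be read with a few lines of Python. The following is a minimal sketch under stated assumptions, not an official parser: the function and key names are hypothetical, the values are kept as strings because the exact numeric format of each column is not specified here, and the demo row is synthetic rather than real MixDB output.

```python
def split_values(field):
    """Split a field that may hold two values separated by '!'."""
    return field.split("!")


def split_peptide(peptide):
    """Separate a peptide string from its charge (the number after the last '.')."""
    sequence, sep, charge = peptide.rpartition(".")
    if not sep:
        return peptide, None  # no charge suffix found
    return sequence, charge


def parse_mixdb_row(line):
    """Map one tab-delimited result line to named fields (1-based columns)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 31:
        raise ValueError(f"expected at least 31 columns, got {len(fields)}")

    def col(n):
        return fields[n - 1]

    return {
        "query_id": fields[0:6],      # columns 1-6
        "precursor_mass": col(7),
        "peptide_a": col(10),
        "peptide_b": col(11),
        "protein_a": col(12),
        "protein_b": col(13),
        "score_ab": col(14),
        "svm_match": col(30),
        "svm_mixture": col(31),
    }


if __name__ == "__main__":
    # Synthetic row: every field is just its own column number, except
    # column 10, which gets an illustrative peptide with charge 2.
    demo = [str(i) for i in range(1, 32)]
    demo[9] = "PEPTIDEK.2"
    row = parse_mixdb_row("\t".join(demo))
    print(row["peptide_a"], split_peptide(row["peptide_a"]))
    print(row["svm_match"], row["svm_mixture"])
```

Fields that carry two "!"-separated values can be passed through split_values before any numeric conversion.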
    

Downloads

MixDB v1.0

Publications

Peptide identification by database search of mixture tandem mass spectra.
Wang, J., Bourne, P. E., Bandeira, N.
Mol. Cell. Proteomics, 2011