M-SPLIT

Contacts

Jian Wang [jiw006 (at) ucsd.edu]

Summary

Most computational methods assume that each MS/MS spectrum comes from a single peptide, yet there are numerous situations where one MS/MS spectrum contains fragment ions from two or more peptides. Examples include co-eluting peptides in complex samples, spectra generated by data-independent acquisition methods, and spectra from peptides with complex PTMs (e.g. SUMOylation). M-SPLIT is a spectral-library search tool for identifying mixture spectra of up to two peptides.

In our approach, a mixture spectrum M is modelled as a linear combination of two spectra, A + aB, where A and B are single-peptide spectra from two different peptides and a indicates their relative abundance. The program selects the A + aB combination with the highest cosine similarity to M. Using branch-and-bound filtration, M-SPLIT identifies the correct matches while considering only a minuscule fraction of all possible matches, and at the same time reliably quantifies the relative abundances of the co-eluting peptides. Performance varies with a, but the correct peptides are selected in 89-99% of cases for a ranging from 0.1 to 1.0.
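
The scoring model described above can be illustrated with a small sketch. It is not M-SPLIT's implementation: it assumes spectra have already been binned into equal-length intensity vectors, and it replaces the branch-and-bound search with a plain grid search over a for one fixed pair (A, B). The function names (cosine, mix, best_alpha) and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mix(a_vec, b_vec, alpha):
    """Build the model spectrum A + alpha * B."""
    return [x + alpha * y for x, y in zip(a_vec, b_vec)]


def best_alpha(m_vec, a_vec, b_vec, steps=100):
    """Grid-search alpha in (0, 1] for the highest cosine(M, A + alpha * B).

    Assumption: a simple exhaustive grid, used here only for illustration.
    """
    best_score, best_a = -1.0, 0.0
    for i in range(1, steps + 1):
        alpha = i / steps
        score = cosine(m_vec, mix(a_vec, b_vec, alpha))
        if score > best_score:
            best_score, best_a = score, alpha
    return best_score, best_a


# Toy binned spectra (made-up values), mixed with a known alpha of 0.5.
A = [10.0, 0.0, 5.0, 0.0]
B = [0.0, 8.0, 0.0, 4.0]
M = mix(A, B, 0.5)

score, alpha = best_alpha(M, A, B)
print(f"best cosine = {score:.4f}, estimated alpha = {alpha:.2f}")
```

In this toy case the grid search recovers the mixing coefficient used to build M. A real search would have to evaluate many candidate pairs from the library, which is the cost that the branch-and-bound filtration is meant to avoid.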

Documentation

  • Spectral Library
    Download the spectral library from NIST (or any other source). M-SPLIT supports spectral libraries in .msp and .sptxt format.
    Use SpectraST to generate a decoy spectral library. This allows the use of the target/decoy strategy to choose a scoring threshold and estimate FDR after searching with M-SPLIT. One of the outputs from SpectraST will be a .sptxt format library file. Note that M-SPLIT does not require the decoy library, but without the decoy library it cannot estimate FDR.
  • M-SPLIT
    After the spectral library and decoy library are constructed, run M-SPLIT as follows:
    Usage: java -Xmx800M -jar MSPLIT_v1.0.jar
    	LibraryFile
    	QueryFile
    	PrecursorMassTolerance
    	[OutputFile]

    Example command line:

    java -Xmx800M -jar MSPLIT_v1.0.jar NIST_e_coli.msp in.mgf 2.0 out.txt

    This searches the library for the pair of spectra that best matches the query spectrum.

    Precursor mass tolerance is given in Da. A relatively large tolerance, such as 2.0 Da, should usually be used so that mixture spectra can be identified even when the query is high-accuracy MS data.

    If the output file parameter is omitted, the output is written to standard output.

  • spectrumMatchClassify
    After the search, M-SPLIT uses an SVM to determine whether a match is significant. SVM classification is now done using the svm-light package, which can be downloaded from http://svmlight.joachims.org/. Once downloaded, place the appropriate binaries (svm_learn and svm_classify) in the “svm_light_linux” or “svm_light_windows” folder under the M-SPLIT installation.
    Use the spectrumMatchClassify.pl script to perform the classification and to estimate FDR using the target/decoy method. Run the script as follows:
    Usage: spectrumMatchClassify.pl
    	SearchResultFile
    	FilteredOutputFile
    	FDR

    FDR is given as a fraction, e.g. use 0.01 to enforce 1% FDR.

  • Output
    Output is in tab-delimited format. We denote M as the query spectrum (putative mixture spectrum) and A and B as the pair of spectra best matched to M. For mixture matches, some result columns have two values, separated by a “!”. The columns have the following meanings:
    Column	Content
    1	Query spectrum file
    2	Scan number of query spectrum
    3	Peptide annotation
    4	Protein
    5	Charge of peptide
    6	cosine(M, A+B)
    7	cosine(M, A) [!cosine(M, B)]
    8	cosine(A,B)
    9	alpha (estimated by optimal cosine)
    10	alpha (estimated by residual method)
    11	# of peaks that account for 85% of total intensity
    12	dot-bias(M, A+B) (see paper)
    13	dot-bias(M, A) [!dot-bias(M,B)]
    14	Projected-cosine(M,A+B)
    15	Projected-cosine(M,A) [!projected-cosine(M,B)]
    16	mean cosine (of all candidates considered during the search)
    17	mean delta cosine (i.e. cosine(M,A+B)-cosine(M,A))
    18	Precursor m/z
    19	Precursor m/z of A [!precursor m/z of B]
    20	Svm1-score (this score tells whether result is a match)
    21	Svm2-score (this score tells whether result is a mixture match)
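
As an illustration of the column layout above, the following sketch reads one result line and splits the “!”-separated values. It is a minimal example under assumptions: the dictionary keys are hypothetical names chosen here, the sample row is invented purely to exercise the parser, and whether the real output contains a header line is not stated above, so the code handles a single data line only.

```python
def split_pair(value):
    """Split a column that may hold two values separated by '!'."""
    return value.split("!")


def parse_row(line):
    """Parse one tab-delimited M-SPLIT result line into a dict.

    Column numbers follow the table above (1-based), so column 7
    is cols[6] here. Key names are illustrative, not part of M-SPLIT.
    """
    cols = line.rstrip("\n").split("\t")
    single = split_pair(cols[6])
    return {
        "query_file": cols[0],                       # column 1
        "scan": cols[1],                             # column 2
        "peptide": cols[2],                          # column 3
        "cosine_mixture": float(cols[5]),            # column 6
        "cosine_single": [float(x) for x in single], # column 7
        # Assumption: two values in column 7 indicate a mixture match.
        "is_mixture": len(single) == 2,
    }


# Invented sample row with 21 columns; '-' marks columns not parsed here.
sample = "\t".join(["in.mgf", "17", "-", "-", "-", "0.91", "0.80!0.42"] + ["-"] * 14)
row = parse_row(sample)
print(row["is_mixture"], row["cosine_mixture"], row["cosine_single"])
```

The same split_pair helper could be applied to the other columns listed with a bracketed second value (13, 15 and 19).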

Downloads

M-SPLIT

Publications

Peptide identification from mixture tandem mass spectra. Wang J, Pérez-Santiago J, Katz JE, Mallick P, Bandeira N. Mol Cell Proteomics. 2010 Jul;9(7):1476-85. doi: 10.1074/mcp.M000136-MCP201. Epub 2010 Mar 27.