Spectral Archives

This page contains the latest version of the supplemental material for the paper

"Spectral Archives: A Novel Approach To Analyzing Tandem Mass Spectra",

by Ari M. Frank, Matthew E. Monroe, Anuj R. Shah, Jeremy J. Carver, Nuno F. Bandeira, Ronald J. Moore, Gordon R. Anderson, Richard D. Smith and Pavel A. Pevzner.

Software: MSCluster.zip (source code, manual, and Windows executable)

Spectra: short_peptides.zip (a collection of almost 60,000 spectra of peptides of length 4-6)

Web-server: We have implemented a web-server for querying spectral archives. It currently resides on the development server of the web-site of UCSD's Center for Computational Mass Spectrometry (CCMS). In the next release of the web-site, the spectral archives will be made available to all users.

There are currently 3 archives that can be queried:

- Shewanellas - Clusters generated from data of 3 Shewanella species from the PNNL dataset.
- A. thaliana - Clusters generated from the A. thaliana spectra in the PNNL dataset.
- PNNLARCH - The whole PNNL archive (originally 1.18 billion spectra), reduced to ~280 million clusters.
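As a rough back-of-the-envelope check on the PNNLARCH figures above, the reduction implies an average of about four spectra per cluster. This is only a mean computed from the two rounded totals; it says nothing about how cluster sizes are distributed.

```latex
\bar{n} \approx \frac{1.18 \times 10^{9}\ \text{spectra}}{2.8 \times 10^{8}\ \text{clusters}} \approx 4.2\ \text{spectra per cluster}
```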

Currently, all 3 archives contain only unidentified clusters. Once the web-server is officially released, we plan to add curated archives of confidently identified spectra, which will support library-like searches.

Querying the Spectral Archive:

- Follow this link to the CCMS website. You can create a user account and log in, or run your jobs as a "guest".

- Select "Spectral Archives" from the "Tool" drop-down menu.

- Push the "select input files" button. This will open a new window with two tabs.

- Go to the "upload files" tab, press the "browse" button, and select the spectra files to upload (only MGF and mzXML files are supported; zip files containing these formats can also be used).

- Press the "Select Input Files" tab. Open the library containing the files (e.g., guest) and mark the files you want to search against the archives. Press the top button on the right (under the blue "i" symbol); this transfers the file names to the right side of the window. Once this is done for all input files, hit the "Finish Selection" button at the bottom right of the window.

- Add a description for the dataset and set the other parameters, such as the p-value threshold for a reported match (default 0.01), the maximum number of results (default 10), and the email address to which a notification is sent when the run is complete.

- Finally, to submit the job, hit the "Search" button.
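Because the server accepts only MGF or mzXML files (optionally zipped), query spectra held in some other form have to be written out before uploading. The following is a minimal Python sketch, assuming the spectra are already in memory as a precursor m/z, a charge and a list of (m/z, intensity) peaks. The `Spectrum` class and the `to_mgf` and `write_upload_zip` helpers are hypothetical names for illustration, not part of MSCluster or the CCMS site, and only the common MGF keys `TITLE`, `PEPMASS` and `CHARGE` are written.

```python
import zipfile
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    """Hypothetical in-memory spectrum record (field names are illustrative)."""
    title: str
    precursor_mz: float
    charge: int
    peaks: list[tuple[float, float]] = field(default_factory=list)  # (m/z, intensity)


def to_mgf(spectra):
    """Serialize spectra as MGF text: one BEGIN IONS ... END IONS block per spectrum."""
    lines = []
    for s in spectra:
        lines.append("BEGIN IONS")
        lines.append(f"TITLE={s.title}")
        lines.append(f"PEPMASS={s.precursor_mz:.4f}")  # precursor m/z
        lines.append(f"CHARGE={s.charge}+")
        for mz, intensity in s.peaks:
            lines.append(f"{mz:.4f} {intensity:.1f}")
        lines.append("END IONS")
        lines.append("")  # blank line between blocks
    return "\n".join(lines)


def write_upload_zip(path, mgf_files):
    """Bundle a {filename: mgf_text} mapping into one zip archive for upload."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in mgf_files.items():
            zf.writestr(name, text)


if __name__ == "__main__":
    # Toy spectrum for illustration only; not real data.
    demo = Spectrum("query_scan_1", 445.12, 2, [(147.1128, 1200.0), (244.1656, 830.5)])
    write_upload_zip("queries.zip", {"queries.mgf": to_mgf([demo])})
```

The resulting `queries.zip` can then be selected with the "browse" button in the "upload files" tab. Zipping is optional; a plain `.mgf` file is accepted as well.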

Note that the running time of the search depends on the archive being searched and the number of query spectra. While typical runs against Shewanellas and A. thaliana may take only minutes, a search against PNNLARCH can take hours (or even a day for large inputs).

Spectral Archive Output: Once the run is complete, the search results can be viewed on the web-server (by following the emailed link or from the job submission page). The results on the website show only the top match for each query spectrum and can be sorted by a variety of fields (scan number, similarity, p-value, organism, etc.). The complete set of results (multiple matches per spectrum) can be downloaded in text format by hitting the "download" button above the results table. Note that while the web results show only the most abundant organism in each cluster, the downloaded results contain the complete breakdown of the members of each cluster.
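Since the downloaded file holds several matches per query spectrum, it can be useful to reduce it offline to a one-match-per-spectrum view similar to the web table. The sketch below makes two assumptions. First, the download is tab-separated with a header row containing columns named `scan`, `similarity`, `p_value` and `organism`; these names are hypothetical, so adjust them to the actual file. Second, "top match" here means the lowest p-value, because the ranking the web-server uses is not specified on this page.

```python
import csv
import io


def top_match_per_query(tsv_text, max_p_value=0.01):
    """Keep the lowest-p-value match for each query scan.

    ASSUMPTION: tab-separated text with a header row that includes the
    hypothetical columns 'scan', 'p_value' and 'organism'.
    """
    best = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        p = float(row["p_value"])
        if p > max_p_value:
            continue  # drop matches that fail the p-value threshold
        scan = int(row["scan"])
        if scan not in best or p < float(best[scan]["p_value"]):
            best[scan] = row
    # One row per scan, ordered by scan number (like sorting the web table by scan).
    return [best[s] for s in sorted(best)]


if __name__ == "__main__":
    # Toy input for illustration only; not real search results.
    sample = (
        "scan\tsimilarity\tp_value\torganism\n"
        "12\t0.81\t0.0004\tOrgA\n"
        "12\t0.62\t0.0090\tOrgB\n"
        "40\t0.55\t0.0300\tOrgC\n"
    )
    for row in top_match_per_query(sample):
        print(row["scan"], row["organism"], row["p_value"])  # prints: 12 OrgA 0.0004
```

Scan 40 is dropped because its only match exceeds the default 0.01 threshold. Raising `max_p_value` would keep it.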