This page contains the latest version of the supplemental material for the paper "Spectral Archives: A Novel Approach To Analyzing Tandem Mass Spectra", by Ari M. Frank, Matthew E. Monroe, Anuj R. Shah, Jeremy J. Carver, Nuno F. Bandeira, Ronald J. Moore, Gordon R. Anderson, Richard D. Smith and Pavel A. Pevzner.
Software: MSCluster.zip (source code, manual, and Windows executable)
Spectra: short_peptides.zip (a collection of almost 60,000 spectra of peptides of length 4-6)
Web-server: We have implemented a web-server for querying spectral archives. It currently resides on the development server of the web-site of UCSD's Center for Computational Mass Spectrometry (CCMS). In the next release of the web-site, the spectral archives will be made available to all users.
There are currently 3 archives that can be queried:
- Shewanellas - Clusters generated from data of 3 Shewanella species from the PNNL dataset.
- A. thaliana - Clusters generated from the spectra of the A. thaliana data in the PNNL dataset.
- PNNLARCH - The whole PNNL archive (originally 1.18 billion spectra), reduced to ~280 million clusters.
Currently all 3 archives contain only unidentified clusters. Once the web-server is officially released, we plan to add curated archives of confidently identified spectra, which will offer library-like searches.
Querying the Spectral Archive:
- Follow this link to the CCMS website. You can create a user account and log in, or run your jobs as a "guest".
- Select "Spectral Archives" from the "Tool" drop-down menu.
- Push the "select input files" button. This will open a new window with two tabs.
- Go to the "upload files" tab, press the "browse" button and select the spectra files to upload (only MGF and mzXML files are supported; zip files containing these formats can also be used).
- Press the "Select Input Files" tab. Open the library with the files (e.g., guest), and mark the files you want to search against the archives. Press the top button on the right (under the blue "i" symbol); this will transfer the file name to the right side of the window. After this is done for all input files, hit the "Finish Selection" button on the bottom right side of the window.
- Add a description for the dataset, and set other parameters such as the minimum p-value for a reported match (default 0.01), the maximum number of results (default 10), and the email address to which a notification can be sent when the run is complete.
- Finally, to submit the job, hit the "Search" button.
Note that the running time for the search depends on the archive being searched and the number of query spectra. While typical runs against Shewanellas and A. thaliana might require only minutes to complete, a search against PNNLARCH might take hours (or even a day for large inputs).
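Since the upload step accepts only MGF and mzXML files (optionally inside a zip file), it can be convenient to bundle the query spectra into a single zip file before uploading. The following is a minimal Python sketch of that preparation step. It is not part of MSCluster or the web-server; the function name `bundle_spectra` and the directory layout are assumptions made for illustration.

```python
import zipfile
from pathlib import Path

# File extensions the page lists as supported for upload.
SUPPORTED_EXTENSIONS = {".mgf", ".mzxml"}


def bundle_spectra(input_dir, zip_path):
    """Zip every MGF/mzXML file found directly in input_dir.

    Hypothetical helper: returns the names of the files that were added,
    so unsupported files can be spotted before uploading.
    """
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(input_dir).iterdir()):
            # Compare extensions case-insensitively (e.g. ".mzXML").
            if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
                archive.write(path, arcname=path.name)
                added.append(path.name)
    return added
```

For example, `bundle_spectra("my_spectra", "query.zip")` would collect the supported files from a (hypothetical) `my_spectra` directory into `query.zip`, which can then be selected in the "upload files" tab.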
Spectral Archive Output: Once the run is complete, the search results can be viewed on the web-server (by following the emailed link or from the job submission page). The results on the website show only the top match for each query spectrum and can be sorted according to a variety of fields (scan number, similarity, p-value, organism, etc.). The complete set of results (multiple matches per spectrum) can be downloaded in text format by hitting the "download" button above the results table. Note that while the web results present only the most abundant organism in each cluster, the downloaded results contain the complete breakdown of the members of each cluster.
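The downloaded text file can be post-processed offline, for instance to keep only the best few matches per query spectrum. The sketch below is a hedged illustration in Python: the page does not specify the layout of the downloaded file, so the tab-delimited format and the column names `query` and `p_value` are assumptions, as is the reading of the p-value parameter as an upper cutoff. Adjust them to the actual header of the downloaded file.

```python
import csv
import io
from collections import defaultdict


def filter_matches(tsv_text, max_p_value=0.01, max_results=10,
                   query_col="query", p_col="p_value"):
    """Group matches by query spectrum and keep the best ones.

    Assumes a tab-delimited table with a header row; the column names
    are hypothetical and passed in as parameters.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    by_query = defaultdict(list)
    for row in reader:
        # Keep only matches at or below the p-value cutoff.
        if float(row[p_col]) <= max_p_value:
            by_query[row[query_col]].append(row)
    # Sort each query's matches by ascending p-value and truncate.
    return {
        query: sorted(rows, key=lambda r: float(r[p_col]))[:max_results]
        for query, rows in by_query.items()
    }
```

The default values mirror the defaults quoted for the search form (0.01 and 10); queries with no match passing the cutoff are simply absent from the returned dictionary.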