Contacts
Adrian Guthals [aguthals (at) cs.ucsd.edu]
Summary
Full-length de novo sequencing of unknown proteins remains a challenging open problem. Traditional methods that sequence spectra individually are limited by short peptide length, incomplete peptide fragmentation, and ambiguous de novo interpretations. We address these issues by determining consensus sequences for assembled tandem mass (MS/MS) spectra from overlapping peptides (e.g., by using multiple enzymatic digests). We have combined electron-transfer dissociation (ETD) with collision-induced dissociation (CID) and higher-energy collision-induced dissociation (HCD) fragmentation methods to boost interpretation of long, highly charged peptides and take advantage of corroborating b/y/c/z ions in CID/HCD/ETD. Using these strategies, we show that triplet CID/HCD/ETD MS/MS spectra from overlapping peptides yield de novo sequences of average length 70 AA and as long as 200 AA at up to 99% sequencing accuracy.
Documentation
Quick Start Instructions
See the “Docs” directory in the download package for full documentation on executing the binaries and interpreting results. A heavily summarized version is given here, as well as default parameter files that should be used for running MetaSPS as described in the papers.
Execution: MetaSPS automatically generates a number of statically named sub-directories and files in whatever directory it is run from, so it is best to run it from a clean directory by calling:
<installation dir>/bin/main_specnets <params file> [OPTIONS]
Call “<installation dir>/bin/main_specnets --help” for a list of available options. Apart from the “-lf” and “-ll” options, which control log output, the default option values will run MetaSPS from start to finish. Input parameter files can further control execution, but the attached “MetaSPS_*” files already contain all the necessary default parameter values, so you should only need to specify the path to the installation directory, the input MS/MS spectra, the database of contaminant and (optionally) homologous proteins, and the peak/parent mass tolerances. Please use the appropriate default parameter file for the type of input spectra: “MetaSPS_IT_CID” is for CID MS/MS spectra where MS/MS fragments were collected in the ion trap (low-res), “MetaSPS_FT_CID” is for CID MS/MS spectra where MS/MS fragments were collected in the Orbitrap (high-res), and “MetaSPS_FT_HCD” is for HCD MS/MS spectra. “MetaSPS_FT_CID_HCD_ETD” is for processing paired or triplet CID/HCD/ETD scans as described in the JPR paper (see section “CID/HCD/ETD”).
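The "run from a clean directory" advice can be wrapped in a small launcher. The following is a minimal Python sketch, not part of the MetaSPS distribution: the function names `build_command` and `run_in_clean_dir` are hypothetical, and it assumes that passing the parameter file as an absolute path is acceptable to main_specnets (paths written inside the parameter file itself are not touched).

```python
import subprocess
from pathlib import Path


def build_command(install_dir, params_file, options=()):
    """Build the argv list: <installation dir>/bin/main_specnets <params file> [OPTIONS]."""
    exe = Path(install_dir) / "bin" / "main_specnets"
    # Resolve the params file now, because the process runs from another directory.
    return [str(exe), str(Path(params_file).resolve()), *options]


def run_in_clean_dir(install_dir, params_file, project_dir, options=()):
    """Run MetaSPS from a new or empty project directory (hypothetical helper)."""
    project = Path(project_dir)
    # Refuse to reuse a non-empty directory, since MetaSPS writes
    # statically named sub-directories and files into its working directory.
    if project.exists() and any(project.iterdir()):
        raise FileExistsError(f"{project} is not empty")
    project.mkdir(parents=True, exist_ok=True)
    cmd = build_command(install_dir, params_file, options)
    return subprocess.run(cmd, cwd=project, check=True)
```

A call such as `run_in_clean_dir("/opt/sps", "MetaSPS_FT_HCD.params", "run01")` would then start the pipeline inside `run01`; the installation path and directory name here are placeholders.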
Output: See the installation package for how to read the suite of output HTML reports. You can also convert the set of meta-contig de novo sequences to MGF format (as PRM spectra) by issuing the following command from the project directory with the supplied parameters file. The file “meta_contigs.mgf” will be written to the same directory.
<installation dir>/bin/main_execmodule ExecMergeConvert convertContigs.params
Execution on an SGE grid: If the number of input MS/MS spectra is much higher than 20,000, SPS may take days to compute the all-to-all alignment of spectra on a single thread. See the release documentation for how to run this step in parallel if you have SGE installed on your system. If SGE is not available, you can instead add the parameter PARTIAL_OVERLAPS=0 to the parameters file. This may sacrifice de novo sequencing coverage/length, but it reduces running time by orders of magnitude and makes it more feasible to run the alignment step on a single thread.
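To judge whether a dataset is near the 20,000-spectrum mark, it can help to count spectra before starting a run. The sketch below is only an illustration: it counts BEGIN IONS lines in MGF input and reports the number of distinct spectrum pairs, n(n-1)/2, as a rough upper bound on the all-to-all workload. How many of those pairs SPS actually aligns is not stated here, so treat the pair count as an assumption-laden estimate; the function names are hypothetical.

```python
def count_mgf_spectra(lines):
    """Count spectra in MGF text by counting BEGIN IONS lines."""
    return sum(1 for line in lines if line.strip() == "BEGIN IONS")


def pairwise_alignments(n):
    """Number of distinct spectrum pairs in an all-to-all comparison of n spectra."""
    return n * (n - 1) // 2


def exceeds_single_thread_guideline(n, limit=20000):
    """True if n is above the 20,000-spectrum guideline quoted above."""
    return n > limit


# pairwise_alignments(20000) evaluates to 199990000
```

To use it on a file, pass an open file handle, for example `count_mgf_spectra(open("spectra.mgf"))`, where the file name is a placeholder.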
CID/HCD/ETD
If you have paired CID/ETD, HCD/ETD, or triplet CID/HCD/ETD spectra, the first step is to use the MetaSPS_FT_CID_HCD_ETD.params file. Inside that file, there is a parameter “NUM_CONSECUTIVE”, which must be set to 2 for paired CID/ETD or HCD/ETD, and to 3 for triplet CID/HCD/ETD. The most critical aspect of processing this type of data is that the input spectra have properly assigned activation fields, so that MetaSPS knows which spectra are CID, HCD, or ETD. If you have mzXML-formatted spectra that were converted directly from .RAW (or some other vendor-specific format), those fields should already be set. But if you inspect the mzXML files and do not see activationMethod set for ALL MS/MS scans (activationMethod="CID", activationMethod="HCD", or activationMethod="ETD"), then you can add the fields yourself or use MGF format. If you use MGF format, then each spectrum must have a parameter named ACTIVATION set to “CID”, “HCD”, or “ETD” somewhere after each BEGIN IONS:
BEGIN IONS
PEPMASS=617.80536
CHARGE=2+
MSLEVEL=2
ACTIVATION=ETD
TITLE=Scan Number: 2
166.530609 171.586746
171.898941 194.881088
...
END IONS
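Because a missing activation field is easy to overlook in a large MGF file, a quick pre-flight check can be useful. The following Python sketch is not a MetaSPS tool; it assumes one KEY=VALUE header per line between BEGIN IONS and END IONS, and the function name `check_mgf_activation` is hypothetical. It reports every spectrum whose ACTIVATION is absent or not one of CID, HCD, ETD.

```python
VALID_ACTIVATIONS = {"CID", "HCD", "ETD"}


def check_mgf_activation(lines):
    """Return (spectrum_number, problem) tuples for spectra without a valid ACTIVATION.

    Spectra are numbered from 1 in file order.
    """
    problems = []
    number = 0
    inside = False
    activation = None
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            inside = True
            activation = None
        elif line == "END IONS" and inside:
            number += 1
            if activation is None:
                problems.append((number, "missing ACTIVATION"))
            elif activation not in VALID_ACTIVATIONS:
                problems.append((number, "unexpected ACTIVATION=" + activation))
            inside = False
        elif inside and line.startswith("ACTIVATION="):
            # Keep everything after the first "=" as the value.
            activation = line.split("=", 1)[1].strip()
    return problems
```

An empty result means every spectrum carries one of the three expected values. The check is deliberately strict about upper case, since the values are given above as “CID”, “HCD”, and “ETD”.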
Deployment Instructions
Unzip the Spectral Networks package to a directory <installation directory> (any directory of your choice). This directory should then contain the following directories:
- sps/bin – Contains the binary executables
- sps/cgi – Contains CGI scripts used by the program
- sps/Doc – Documentation
- sps/example – Test project
Web Server
SPS is a command-line tool that outputs reports in HTML format. The report pages are viewed in a web browser, so the generated HTML report files should be served by a web server such as Apache.
To enable interactivity in protein sequencing reports (see results documentation), several CGI scripts are needed and should be included in the web server’s configuration file:
- <installation directory>/sps/cgi/specplot.cgi
- <installation directory>/sps/cgi/contplot.cgi
- <installation directory>/sps/cgi/spsReports.cgi
Configuration
The following changes should be made in the installed scripts:
- Edit <installation directory>/cgi/spsReports.cgi at line 12. The line should be:
- $ENV{'LD_LIBRARY_PATH'} = "<installation directory>/sps/bin/libs";
- Edit <installation directory>/cgi/spsReports.cgi at line 8. The line should be:
- $SPS_DIR = "<installation directory>/sps/";
- Edit <installation directory>/cgi/specplot.cgi at line 32. The line should be:
- $ENV{'LD_LIBRARY_PATH'} = "<installation directory>/sps/bin/libs";
- Edit <installation directory>/cgi/specplot.cgi at line 27. The line should be:
- $TMP = "<TMP_DIRECTORY>";
where TMP_DIRECTORY is a directory in the file system where the server process has write permissions
- Edit <installation directory>/cgi/specplot.cgi at line 28. The line should be:
- $SPS_DIR = "<installation directory>/sps/";
- Edit <installation directory>/cgi/contplot.cgi at line 32. The line should be:
- $ENV{'LD_LIBRARY_PATH'} = "<installation directory>/sps/bin/libs";
- Edit <installation directory>/cgi/contplot.cgi at line 27. The line should be:
- $TMP = "<TMP_DIRECTORY>";
where TMP_DIRECTORY is a directory in the file system where the server process has write permissions
- Edit <installation directory>/cgi/contplot.cgi at line 28. The line should be:
- $SPS_DIR = "<installation directory>/sps/";
For Windows installation, <installation directory>/sps/bin must be added to the PATH environment variable.
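After editing the three CGI scripts by hand, it may be worth confirming that each one really contains the expected assignments. The Python sketch below is an illustration only: the helper names are hypothetical, and it compares lines with all whitespace removed, on the assumption that spacing around "=" does not matter in these Perl assignments.

```python
def expected_lines(install_dir, tmp_dir):
    """Return the assignments each CGI script should contain, per the list above."""
    install_dir = install_dir.rstrip("/")
    ld = "$ENV{'LD_LIBRARY_PATH'} = \"%s/sps/bin/libs\";" % install_dir
    sps = "$SPS_DIR = \"%s/sps/\";" % install_dir
    tmp = "$TMP = \"%s\";" % tmp_dir
    return {
        "spsReports.cgi": [ld, sps],
        "specplot.cgi": [ld, tmp, sps],
        "contplot.cgi": [ld, tmp, sps],
    }


def _squash(text):
    """Remove all whitespace so that spacing differences are ignored."""
    return "".join(text.split())


def missing_lines(script_text, wanted):
    """Return the wanted assignments that do not appear as a line of script_text."""
    present = {_squash(line) for line in script_text.splitlines()}
    return [w for w in wanted if _squash(w) not in present]
```

For each script, read its contents and call `missing_lines(contents, expected_lines(install_dir, tmp_dir)[script_name])`; an empty list suggests the edits are in place.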
Testing the Installation
SPS installation may be tested by downloading the SPS test package.
- Download the package
- Unzip it to <installation directory>
- Open a shell
- cd to <installation directory>/sps/example
- Edit the sps.params file.
- EXE_DIR should point to <installation directory>/sps/bin (should be an absolute path).
- REPORT_DIR defines the output directory for report files. It should be inside the web server path so that the report pages can be served by the web server (e.g. Apache).
- GRID_SGE_EXE_DIR should point to where the SGE binaries are located (qstat, qsub, etc.).
- GRID_EXE_DIR should point to where the SPS binaries (the same ones pointed to by EXE_DIR) are seen on SGE.
- REPORT_SERVER should point to the server’s CGI directory. Example:
REPORT_SERVER=http://myserver.com/cgi-bin/
- REPORT_DIR_SERVER should point to the project directory on the server.
- Run go.sh on Linux systems or go.bat on Windows systems.
- From a web browser, open “<URL path in webserver>/index.html”, which is located inside the specified report directory (the exact URL depends on your web server's path configuration). The initial report page should be displayed.
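The sps.params settings listed above can be sanity-checked before running go.sh or go.bat. This is a minimal Python sketch under stated assumptions: it assumes the file holds one KEY=VALUE entry per line (as in the REPORT_SERVER example), ignores any line without "=", and leaves out the two GRID_* keys on the assumption that they only matter when SGE is used. The function names are hypothetical.

```python
import os

# Keys from the list above that are checked here; GRID_* keys are omitted
# on the assumption that they are only needed for SGE runs.
REQUIRED_KEYS = ["EXE_DIR", "REPORT_DIR", "REPORT_SERVER", "REPORT_DIR_SERVER"]


def parse_params(text):
    """Parse KEY=VALUE lines into a dict, skipping lines without '='."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            params[key.strip()] = value.strip()
    return params


def check_params(params):
    """Return a list of human-readable problems found in the parsed parameters."""
    problems = []
    for key in REQUIRED_KEYS:
        if not params.get(key):
            problems.append(key + " is not set")
    exe_dir = params.get("EXE_DIR", "")
    if exe_dir and not os.path.isabs(exe_dir):
        problems.append("EXE_DIR should be an absolute path")
    return problems
```

Typical use would be `check_params(parse_params(open("sps.params").read()))`, run from the sps/example directory.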
Sample Report
Downloads
Installation Packages
- Linux 32-bit (72.6 MB)
- Linux 64-bit (73.3 MB)
- Windows 32/64-bit (97.8 MB)