GenoMS: Sequencing whole proteins using a template database

Preprocessing of MS/MS spectra

Overview of steps:

Cluster MS/MS spectra (optional)
Convert MS/MS spectra to PRM spectra (optional)
Create parameter files for InsPecT (optional)

Clustering of MS/MS spectra (Optional)

MS/MS spectral datasets often contain redundant spectra from the same
peptide species. Individually, these spectra may contain many noise peaks
or may miss peaks for expected ion types. In order to decrease the noise
and increase the signal for expected ions, redundant spectra can be clustered
into a consensus spectrum. Clustering not only reduces the number of
spectra to be analyzed, but also increases their quality and their probability
of identification.

MSCluster is a tool which can simultaneously filter the spectra for quality
and cluster them into a reduced collection of consensus spectra. If you choose
to use clustering, the resulting consensus spectra can be processed as regular
MS/MS spectra in the rest of the GenoMS analysis. More details can be found
in the MSCluster documentation.
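
To make the idea of a consensus spectrum concrete, here is a toy sketch in Python. It is not MSCluster's algorithm; it assumes a simple scheme in which peaks are grouped into fixed-width m/z bins and a bin is kept only if it is populated in at least a given fraction of the redundant spectra. The function name, the bin width and the fraction threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def consensus_spectrum(spectra, bin_width=0.5, min_fraction=0.5):
    """Merge redundant spectra into one consensus peak list (toy scheme).

    spectra: list of spectra, each a list of (mz, intensity) tuples.
    A bin is kept if it holds a peak in at least min_fraction of the spectra.
    """
    bins = defaultdict(list)  # bin index -> one (mz, intensity) per spectrum
    for spectrum in spectra:
        strongest = {}
        for mz, intensity in spectrum:
            idx = round(mz / bin_width)
            # Keep only the strongest peak per bin within a single spectrum.
            if idx not in strongest or intensity > strongest[idx][1]:
                strongest[idx] = (mz, intensity)
        for idx, peak in strongest.items():
            bins[idx].append(peak)

    needed = min_fraction * len(spectra)
    consensus = []
    for idx in sorted(bins):
        peaks = bins[idx]
        if len(peaks) >= needed:
            # Average m/z and intensity over the supporting spectra.
            mean_mz = sum(p[0] for p in peaks) / len(peaks)
            mean_intensity = sum(p[1] for p in peaks) / len(peaks)
            consensus.append((mean_mz, mean_intensity))
    return consensus
```

Peaks that appear in only one of several redundant spectra (likely noise) are dropped, while peaks supported by most spectra survive with averaged positions and intensities.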

Converting MS/MS spectra to PRM spectra (Optional)

MS/MS spectra contain ions produced by both suffix and prefix fragments
of the peptide species. In contrast, Prefix Residue Mass (PRM) spectra
only contain masses of prefix fragments of the peptide. For the MS/MS
spectra to be analyzed by GenoMS, they must first be converted to PRM
spectra. PepNovo+ can be used to generate PRM spectra from the MS/MS
spectra. Each PRM also has a corresponding score, which is the normalized
log likelihood of the PRM being correct. More details can be found in the
PepNovo+ documentation.
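
The difference between the two spectrum types can be illustrated for a peptide whose sequence is already known. The Python sketch below is only an illustration of the definitions, not of how PepNovo+ infers PRMs from real data. It assumes standard monoisotopic residue masses and singly charged ions; the function names are hypothetical.

```python
# Monoisotopic residue masses in Daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton
WATER = 18.010565  # mass of H2O


def prefix_residue_masses(peptide):
    """Masses of every proper prefix of the peptide (the ideal PRM spectrum)."""
    masses = []
    total = 0.0
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        masses.append(total)
    return masses


def b_and_y_ions(peptide):
    """Singly charged b (prefix) and y (suffix) ion m/z values."""
    residues = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(residues)):
        b_ions.append(sum(residues[:i]) + PROTON)
        y_ions.append(sum(residues[i:]) + WATER + PROTON)
    return b_ions, y_ions
```

An MS/MS spectrum of the peptide would ideally contain both the b and the y series (plus noise), whereas the PRM spectrum keeps only the prefix masses, each of which equals the corresponding b ion m/z minus one proton mass.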

GenoMS can generate the PRM spectra on the fly, but for repeated runs
it is more efficient to generate them once in advance. The GenoMS package
provides a tool, GenerateAllPRMs.jar, that runs PepNovo+ automatically
and generates the PRM spectra. If the parameters differ between spectrum
files (e.g. LTQ versus Orbitrap), it is best to run GenerateAllPRMs.jar
on each set of files separately.

    Usage: java -jar GenerateAllPRMs.jar

                -r DIR The directory containing the MS/MS spectrum files
                -w DIR The directory to write the PRM spectrum files to
                -x FILE The full path to the PepNovo executable
                -m DIR Path to the PepNovo model directory
                [-y NUM Specifies the parent mass tolerance (in Daltons).  Default is 3.0 Da]
                [-x NUM Specifies the fragment mass tolerance (in Daltons).  Default is 0.5 Da]
                [-t FILE A file containing a list of spectra to use.  The
                    default is to run PepNovo+ on all spectrum files in the
                    input directory.]
                [-c NUM The mass of the cysteine protecting group (default 0)]
                [-p Tryptic digest was used on the samples.  Default is to guess from the file names.]
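
Running the tool once per instrument type can be scripted. The Python sketch below only assembles the command line from the required options listed above (-r, -w, -x for the PepNovo executable, -m) plus the optional -y parent mass tolerance. All directory names, paths and tolerance values in the usage note are hypothetical placeholders.

```python
def generate_prm_command(spectra_dir, out_dir, pepnovo_exe, model_dir,
                         parent_tol=None, jar="GenerateAllPRMs.jar"):
    """Build the argument list for one GenerateAllPRMs.jar run."""
    cmd = [
        "java", "-jar", jar,
        "-r", spectra_dir,   # directory with the MS/MS spectrum files
        "-w", out_dir,       # directory for the PRM spectrum files
        "-x", pepnovo_exe,   # full path to the PepNovo executable
        "-m", model_dir,     # PepNovo model directory
    ]
    if parent_tol is not None:
        cmd += ["-y", str(parent_tol)]  # parent mass tolerance in Daltons
    return cmd
```

For example, with one directory per instrument, each command could be passed to subprocess.run(cmd, check=True) in turn: generate_prm_command("spectra/ltq", "prms/ltq", "/opt/pepnovo/PepNovo_bin", "/opt/pepnovo/Models", parent_tol=3.0) for the LTQ files, and a second call with its own directories and tolerance for the Orbitrap files.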

Creating InsPecT parameter files (Optional)

Every spectrum will need to be searched against the database by InsPecT. This
means that every spectrum file must have a file that specifies the search
parameters to InsPecT. These files can be generated on the fly by GenoMS,
and it is recommended to do so if the spectra were all generated by the same
instrument and either all use the same protease or have the protease clearly
noted in the spectrum file name.

In most cases, however, you will have heterogeneous spectrum files requiring
different search parameters. You can generate these parameter files separately
for each experimental condition using the included tool CreateInspectInputFiles.jar.

    Usage: java -jar CreateInspectInputFiles.jar
        [REQUIRED]:
        -r [DIR] Directory containing spectra to search
        -w [DIR] Directory to write input files to

        [OPTIONAL]:
        -m [NUM] Parent mass tolerance in Daltons (default: 3.0 Da)
        -f [NUM] Fragment mass tolerance in Daltons (default: 0.5 Da)
        -c [NUM] Specifies mass of protecting group on Cysteines (default is 57.0 Da)
        -p [PROTEASE] Specify the protease (Trypsin, Chymotrypsin, or None).
                      Default is to guess from the file names (names containing 'chymo' or 'try')
        -v Run in verbose mode
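
The Python sketch below shows one way to drive the tool once per experimental condition, and what guessing the protease from a file name might look like. The exact matching rule used inside the tool is not documented beyond the usage text, so the guessing function is an assumption: it checks 'chymo' before 'try' because the word "chymotrypsin" also contains "try". Function names and example file names are hypothetical.

```python
def guess_protease(filename):
    """Guess the protease from a spectrum file name (assumed rule)."""
    name = filename.lower()
    if "chymo" in name:   # must be tested first: "chymotrypsin" contains "try"
        return "Chymotrypsin"
    if "try" in name:
        return "Trypsin"
    return None           # no hint in the name


def inspect_input_command(spectra_dir, out_dir, protease=None,
                          parent_tol=None, fragment_tol=None,
                          jar="CreateInspectInputFiles.jar"):
    """Build the argument list for one CreateInspectInputFiles.jar run."""
    cmd = ["java", "-jar", jar, "-r", spectra_dir, "-w", out_dir]
    if parent_tol is not None:
        cmd += ["-m", str(parent_tol)]    # parent mass tolerance (Da)
    if fragment_tol is not None:
        cmd += ["-f", str(fragment_tol)]  # fragment mass tolerance (Da)
    if protease is not None:
        cmd += ["-p", protease]           # Trypsin, Chymotrypsin, or None
    return cmd
```

Passing the protease explicitly with -p avoids relying on file names at all, which is the safer choice when the names do not follow a consistent convention.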