GenoMS: Sequencing whole proteins using a template database


Examples

Included with this release is a small collection of example data.
These are contained in the ./Examples/ directory. Below are examples of
common usage of GenoMS on the example data.
Before running these examples you must have installed PepNovo and InsPecT
(see Installation). The examples below are shown for use on Mac or Unix/Linux.
On Windows, use '\' instead of '/' in paths. In all cases, it is important
to use the full path to the input data.
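
Since full paths matter, it can help to resolve a directory to its absolute
path once and reuse the result. The following is a minimal sketch, assuming a
POSIX shell; abs_dir is a hypothetical helper name and is not part of GenoMS.

```shell
# abs_dir: hypothetical helper (not part of GenoMS).
# Prints the absolute path of an existing directory.
# Prints nothing and returns non-zero if the directory does not exist.
abs_dir() {
    (cd "$1" 2>/dev/null && pwd)
}
```

For instance, from the GenoMS directory:

    SPECTRA=$(abs_dir ./Examples/Spectra)

The variable can then be passed wherever the examples expect a full path.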

The Examples directory contains several subdirectories containing the sample data.

  1. DB - Contains a database consisting of 2 files (TestSequence.1
    and TestSequence.2), and a genomic sequence TestGenomeSeq.fasta.
    The file extensions of the TestSequence database tell GenoMS the
    order for the sequences (the sequences in TestSequence.1 precede
    the sequences in TestSequence.2 in the final protein reconstruction).
    The sequences within each file are also mutually exclusive, so only 1
    sequence can be used as a template from TestSequence.1.
  2. Spectra - Contains the 2 spectrum files for the examples.
    The protease used is clearly noted in the spectrum file names.
  3. InspectFiles - currently empty
  4. PRMs - currently empty
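
The ordering rule for the DB files can be illustrated with a short shell
sketch. It lists the files of a database prefix sorted by their numeric
extension, which is the order described above. This only illustrates the
naming convention; list_db_parts is a hypothetical helper, and treating the
extensions as integers is an assumption, not a description of how GenoMS
itself reads the files.

```shell
# list_db_parts: hypothetical helper (not part of GenoMS).
# Prints the files named <prefix>.<number> in increasing numeric order.
# Files whose extension is not purely numeric (e.g. .fasta) are skipped.
list_db_parts() {
    prefix=$1
    for f in "$prefix".*; do
        [ -e "$f" ] || continue          # glob matched nothing
        ext=${f##*.}                     # text after the last dot
        case $ext in
            ''|*[!0-9]*) continue ;;     # not a plain number
        esac
        printf '%s\t%s\n' "$ext" "$f"    # sort key, then the path
    done | sort -n | cut -f2-
}
```

Calling it with the prefix ~/GenoMS/Examples/DB/TestSequence should list
TestSequence.1 before TestSequence.2.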

Example 1: Using the genome to create templates

For the first example we will use a nucleotide sequence as the input
database. Let's generate the PRM spectra and Inspect files before running GenoMS.

First we will generate the PRM spectra and put them in the directory
PRMs using the script GenerateAllPRMs.jar:

    java -jar GenerateAllPRMs.jar -r ~/GenoMS/Examples/Spectra -w ~/GenoMS/Examples/PRMs
        -x path-to-pepnovo-executable -m path-to-pepnovo-model-dir -c 57
        

Be patient: generating the PRM spectra can take several minutes. This
command creates the PRM spectrum files in the PRMs directory. Next we will
generate the Inspect input files using CreateInspectInputFiles.jar:

    java -jar CreateInspectInputFiles.jar -r ~/GenoMS/Examples/Spectra
        -w ~/GenoMS/Examples/InspectFiles -c 57
        
Next we create the configuration file for GenoMS.

    java -jar CreateConfigFile.jar -s ~/GenoMS/Examples/Spectra -x path-to-inspect-directory
        -o ~/GenoMS/Examples/Example1Config.in -g ~/GenoMS/Examples/DB/TestGenomeSeq.fasta -m C+57
        -p ~/GenoMS/Examples/PRMs -i ~/GenoMS/Examples/InspectFiles
        

This script creates the configuration file ./Examples/Example1Config.in.
You can add or remove options in the configuration file before
running GenoMS. Just make sure the required options described in Usage remain.
The next step is to run GenoMS. From the GenoMS directory, run

    java -jar GenoMS.jar -i ~/GenoMS/Examples/Example1Config.in -o ~/GenoMS/Examples/Example1Output.txt -x -p
        

GenoMS may take about an hour to run, depending on the size of the
input (for the example it should take only about 5 minutes). It will
generate 3 result files: Example1Output.txt, Example1Output.csv, and
Example1Output.Details.txt. The details of each file are explained
in Analysis.
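
After a run, a quick check can confirm that all three result files were
written. The sketch below assumes that the .csv and .Details.txt files share
the base name of the path given to -o, as in the example; report_results is a
hypothetical helper and is not part of GenoMS.

```shell
# report_results: hypothetical helper (not part of GenoMS).
# Given the path passed to -o (ending in .txt), reports whether the three
# expected result files exist. Returns non-zero if any is missing.
report_results() {
    base=${1%.txt}
    status=0
    for f in "$base.txt" "$base.csv" "$base.Details.txt"; do
        if [ -e "$f" ]; then
            echo "found    $f"
        else
            echo "missing  $f"
            status=1
        fi
    done
    return "$status"
}
```

For the run above:

    report_results ~/GenoMS/Examples/Example1Output.txt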

Example 2: Generating PRMs and Inspect parameter files on the fly with GenoMS

For some experiments, the PRM spectra can be generated on the fly by
GenoMS (by calling PepNovo). As mentioned in the Preprocessing
document, this is useful if all of the spectra were generated on the same
instrument and the same protease was used (or the protease is clearly
noted in the spectrum file name).

Since we do not need to create PRM spectra or Inspect parameter files,
the first step is to create a config file. From the GenoMS directory, run

    java -jar CreateConfigFile.jar -s ~/GenoMS/Examples/Spectra -x path-to-inspect-directory
        -o ~/GenoMS/Examples/Example2Config.in -n ~/GenoMS/Examples/DB/TestSequence -m C+57
        -f path-to-pepnovo-model-directory -r path-to-pepnovo-executable
        

The next step is to run GenoMS. From the GenoMS directory, run

    java -jar GenoMS.jar -i ~/GenoMS/Examples/Example2Config.in -o ~/GenoMS/Examples/Example2Output.txt -x -p
        

GenoMS generates the same three result files as in Example 1, here named
Example2Output.txt, Example2Output.csv, and Example2Output.Details.txt. It
also generates PRM spectrum files and Inspect input files in the
Spectra directory. In the DB directory, you will find a prepared database
TrieSequence_combined.fasta (and the InsPecT-specific file formats .trie and
.index) and the corresponding constraint file. For future runs, it is handy
to reuse these files to save time.

I hope these examples have been helpful. Good luck!