GenoMS produces 3 output files: the results file, the alignment (csv) file, and the details file.
The results file contains the final sequence reconstruction(s). Regions
of the sequence that could be ordered but not merged are separated by '...'.
For each predicted amino acid or mass gap, the number of spectra supporting
it and the number of spectra overlapping it are shown in the
subsequent lines (Sitewise Support and Sitewise Overlap). Our notion of support
is very conservative: a spectrum supports an amino acid or mass gap if it contains both
adjacent PRMs (prefix residue masses).

    # ----FINAL SEQUENCE INFO----
    # Final Seq: DIVMSQSPSSLAVSVGEKVT ... NYLAWYQQKPGQSPK[226.045]
    # Sitewise Support: 2 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 ... 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    # Sitewise Overlap: 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 ... 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    # FinalLen: 82

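The per-site counts can be paired with the residues they describe. The following is a minimal sketch, not part of GenoMS: it assumes each '#' field sits on its own line, that '...' separates regions in all three fields, and that a mass gap is written in square brackets. The function name and the abbreviated sample block are made up for illustration.

```python
import re

# Abbreviated, made-up block in the layout described above (not real output).
SAMPLE = """\
# ----FINAL SEQUENCE INFO----
# Final Seq: DIV ... NK[226.045]
# Sitewise Support: 2 3 4 ... 1 1 1
# Sitewise Overlap: 4 4 4 ... 1 1 1
"""

# One token is either a single residue letter or one bracketed mass gap.
TOKEN = re.compile(r"[A-Z]|\[[0-9.]+\]")


def parse_final_sequence(text):
    """Return one list per region of (token, support, overlap) tuples."""
    fields = {}
    for line in text.splitlines():
        line = line.strip().lstrip("#").strip()
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()

    seq_regions = [TOKEN.findall(r) for r in fields["Final Seq"].split("...")]
    support = [[int(x) for x in r.split()]
               for r in fields["Sitewise Support"].split("...")]
    overlap = [[int(x) for x in r.split()]
               for r in fields["Sitewise Overlap"].split("...")]

    regions = []
    for tokens, sup, ov in zip(seq_regions, support, overlap):
        if not (len(tokens) == len(sup) == len(ov)):
            raise ValueError("sequence and sitewise counts differ in length")
        regions.append(list(zip(tokens, sup, ov)))
    return regions


if __name__ == "__main__":
    for region in parse_final_sequence(SAMPLE):
        print(region)
```

Sites where support is much lower than overlap are the ones where few spectra contain both adjacent PRMs, so they are natural candidates for manual inspection.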
The alignment file shows the alignment of the final reconstructed sequence
to the database templates used. The best way to view this file is to use a
spreadsheet.
The first lines of the result file contain the list of input files used
and the parameters. These lines all begin with a '#'.
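A script that reads this file can peel off the leading '#' lines before looking at the extension details. This is a small sketch under the assumption that the header is exactly the unbroken run of '#' lines at the top of the file; `split_header` and the sample header lines are hypothetical, not taken from real GenoMS output.

```python
from itertools import takewhile


def split_header(lines):
    """Split lines into (leading '#' lines, everything after them)."""
    lines = list(lines)
    header = list(takewhile(lambda ln: ln.startswith("#"), lines))
    return header, lines[len(header):]


if __name__ == "__main__":
    # Hypothetical content, for illustration only.
    sample = [
        "# input file: example.prm",
        "# parameter: example=1",
        "(LEFT)Round: 0 for seed: SSQSLLYSTN",
        "# a later '#' line stays in the body",
    ]
    header, body = split_header(sample)
    print(len(header), "header lines,", len(body), "body lines")
```

Only the leading run is treated as header because '#' lines also appear later in the file, for example in the final summary.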
Next, the result file contains all of the details of the attempted
extensions. These appear in the following form:
(LEFT)Round: 0 for seed: SSQSLLYSTNQKNFLAWYQQKPGQSPKLLIYWASTRDSGVPDRFTGSGSGTDFTLTISSVKAEDLAVYYCQQYYSYPRTFGGGTKL
This line tells us that GenoMS is attempting to extend this seed sequence to
the left (towards the N-terminus of the protein). The extension may proceed for
several rounds; this is the first round (round 0). Next we might see:

    [0]:PRMSpectrumFile1.prm:43
    Score: 102.17699999999999, AlignedPeaks: 302143 430193 517241 630327 743411 906476 993508 1094555 1208602
    (A space-delimited list of PRMs scaled by 1000 Da)
    (A space-delimited list of PRM log-likelihoods scaled)
    [1]:/PRMSpectrumFile1.prm:153
    Score: 74.02300000000001, AlignedPeaks: 170090 257122 344153 472217 559243 672328 785416 948478 1035502 1136559
    (A space-delimited list of PRMs scaled by 1000 Da)
    (A space-delimited list of PRM log-likelihoods scaled)
    [2]:PRMSpectrumFile5.prm:195
    Score: 72.055, AlignedPeaks: 128076 215109 302140 430198 517231 630315 743405 906469 993495
    (A space-delimited list of PRMs scaled by 1000 Da)
    (A space-delimited list of PRM log-likelihoods scaled)

These lines indicate that 3 spectra were found to overlap the left end of
the seed sequence. They are listed in order of the strength of the alignment
to the seed sequence, strongest first. Each spectrum is reported on 4 lines.
The first line indicates the PRM file and scan number of the spectrum. The
second line indicates the score of the alignment and the specific peaks of
the PRM spectrum which were aligned. The third and fourth lines show the full
PRM spectrum: the PRMs and the PRM scores.
If no spectra overlap the seed sequence, these lines do not appear.
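The four-line records can be read back with a short parser. The sketch below is not GenoMS code: it assumes the four-line layout described above, and it reads "scaled by 1000" as "divide by 1000 to get Da". The class and function names are made up for illustration.

```python
import re
from dataclasses import dataclass

# "[rank]:file:scan", e.g. "[0]:PRMSpectrumFile1.prm:43"
HEADER = re.compile(r"^\[(\d+)\]:(.+):(\d+)$")
# "Score: <float>, AlignedPeaks: <space-delimited integers>"
SCORE = re.compile(r"^Score:\s*([-\d.eE+]+),\s*AlignedPeaks:\s*(.*)$")


@dataclass
class AlignedSpectrum:
    rank: int
    prm_file: str
    scan: int
    score: float
    aligned_peaks_da: list


def parse_aligned_spectra(lines):
    """Collect one AlignedSpectrum per four-line record."""
    lines = [ln.strip() for ln in lines]
    records = []
    i = 0
    while i < len(lines):
        head = HEADER.match(lines[i])
        score = SCORE.match(lines[i + 1]) if head and i + 1 < len(lines) else None
        if head and score:
            peaks = [int(p) / 1000.0 for p in score.group(2).split()]
            records.append(AlignedSpectrum(
                rank=int(head.group(1)),
                prm_file=head.group(2),
                scan=int(head.group(3)),
                score=float(score.group(1)),
                aligned_peaks_da=peaks,
            ))
            i += 4  # skip the full-PRM and PRM-score lines of this record
        else:
            i += 1
    return records


if __name__ == "__main__":
    sample = [
        "[0]:PRMSpectrumFile1.prm:43",
        "Score: 102.17699999999999, AlignedPeaks: 302143 430193 517241",
        "(A space-delimited list of PRMs scaled by 1000 Da)",
        "(A space-delimited list of PRM log-likelihoods scaled)",
        "[1]:/PRMSpectrumFile1.prm:153",
        "Score: 74.02300000000001, AlignedPeaks: 170090 257122",
        "(A space-delimited list of PRMs scaled by 1000 Da)",
        "(A space-delimited list of PRM log-likelihoods scaled)",
    ]
    for rec in parse_aligned_spectra(sample):
        print(rec.rank, rec.prm_file, rec.scan, rec.score, rec.aligned_peaks_da)
```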
Once GenoMS has identified spectra which overlap the seed sequence, it
attempts to align them to a hidden Markov model (HMM). The next few lines of
the result file show the final HMM after the spectra have been aligned to it.

    SpectrumHMM Model:
    * M[0] - Mass:-128077.5 Overlap:2/3 LogOddsScore:3.885748146743283, SumScore:30.0, InsertsAfter: 1
    * M[1] - Mass:0.0 Overlap:4/5 LogOddsScore:6.904684018837878, SumScore:61.202, InsertsAfter: 2
    * M[2] - Mass:87032.4 Overlap:4/5 LogOddsScore:6.904684018837878, SumScore:49.9, InsertsAfter: 1
    * M[3] - Mass:174063.7142857143 Overlap:5/5 LogOddsScore:6.941968467069715, SumScore:42.647999999999996, InsertsAfter: 2
    * M[4] - Mass:302121.14285714284 Overlap:5/5 LogOddsScore:6.941968467069715, SumScore:55.234, InsertsAfter: 5
    * M[5] - Mass:389156.2857142856 Overlap:5/5 LogOddsScore:6.941968467069715, SumScore:51.249, InsertsAfter: 3
    * M[6] - Mass:502241.14285714284 Overlap:5/5 LogOddsScore:6.941968467069715, SumScore:49.699, InsertsAfter: 7
    * M[7] - Mass:615328.0 Overlap:5/5 LogOddsScore:8.152710893403263, SumScore:57.512, InsertsAfter: 15
    * M[8] - Mass:778391.5714285715 Overlap:5/5 LogOddsScore:9.363453319736813, SumScore:71.84700000000001, InsertsAfter: 16
    * M[9] - Mass:865419.5714285714 Overlap:5/5 LogOddsScore:6.941968467069715, SumScore:47.128, InsertsAfter: 5
    * M[10] - Mass:966470.8 Overlap:4/4 LogOddsScore:7.24861417052274, SumScore:49.357, InsertsAfter: 7
    * M[11] - Mass:1080517.3333333333 Overlap:3/3 LogOddsScore:5.133775021308668, SumScore:42.479, InsertsAfter: 2

The HMM contains several states, each corresponding to a mass in the
extension. The states are listed in the result file in order of increasing
mass. Each state, written in the form M[i] for state i, has an associated
mass (scaled by 1000 Da), the fraction of spectra which support it (Overlap),
2 scores (LogOddsScore and SumScore), and the number of noise peaks in the
aligned spectra which appear between this state and the next (InsertsAfter).
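The state lines are regular enough to be pulled apart with a single pattern. This is a minimal sketch rather than anything shipped with GenoMS: the field names in the returned dictionaries are chosen here for readability, and masses are divided by 1000 on the assumption that this recovers Da.

```python
import re

# Matches one state, e.g.
# "* M[2] - Mass:87032.4 Overlap:4/5 LogOddsScore:6.90, SumScore:49.9, InsertsAfter: 1"
STATE = re.compile(
    r"M\[(?P<index>\d+)\]\s*-\s*Mass:(?P<mass>-?[\d.]+)\s+"
    r"Overlap:(?P<hits>\d+)/(?P<total>\d+)\s+"
    r"LogOddsScore:(?P<log_odds>-?[\d.]+),\s*"
    r"SumScore:(?P<sum_score>-?[\d.]+),\s*"
    r"InsertsAfter:\s*(?P<inserts>\d+)"
)


def parse_hmm_states(text):
    """Return one dict per M[i] state found in the text, in file order."""
    states = []
    for m in STATE.finditer(text):
        states.append({
            "index": int(m.group("index")),
            "mass_da": float(m.group("mass")) / 1000.0,
            "overlap": (int(m.group("hits")), int(m.group("total"))),
            "log_odds": float(m.group("log_odds")),
            "sum_score": float(m.group("sum_score")),
            "inserts_after": int(m.group("inserts")),
        })
    return states


if __name__ == "__main__":
    sample = (
        "SpectrumHMM Model:\n"
        "* M[0] - Mass:-128077.5 Overlap:2/3 LogOddsScore:3.885748146743283, "
        "SumScore:30.0, InsertsAfter: 1\n"
        "* M[1] - Mass:0.0 Overlap:4/5 LogOddsScore:6.904684018837878, "
        "SumScore:61.202, InsertsAfter: 2\n"
    )
    for state in parse_hmm_states(sample):
        print(state)
```

Because the pattern is applied with `finditer` over the whole text, it does not depend on each state being on its own line.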
Once GenoMS learns this model, it can interpret the extension.

    ConsensusPRM: -128077 0 87032 174063 302121 389156 502241 615328 778391 865419 966470 1080517
    Consensus: -128077 0 87032 174063 302121 389156 502241 615328 778391 865419 966470 1080517 De Novo: KSSQSIIYSTN, Merged: KSSQSIIYSTNQKNFIAWYQQKPGQSPKIIIYWASTRDSGVPDRFTGSGSGTDFTITISSVKAEDIAVYYCQQYYSYPRTFGGGTKI
    (A space-delimited list of PRMs scaled by 1000 Da)
    (A space-delimited list of PRM log-likelihoods scaled)

The final 4 lines of the extension show the consensus spectrum found from the
alignment of all of the overlapping spectra to the seed sequence. The consensus
PRM spectrum is interpreted using a de novo algorithm (shown here as KSSQSIIYSTN).
The de novo interpreted sequence is then merged with the original seed
sequence to produce a new seed sequence, which here has been extended by one
residue (K). Note that leucine and isoleucine have the same mass, and in this
example every L of the original seed appears as I in the merged sequence.
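To see why the consensus PRMs read as KSSQSIIYSTN, one can compare the gaps between consecutive PRMs with single-residue masses. The sketch below is a deliberately simplified illustration and is not the de novo algorithm GenoMS uses: it assumes PRMs are in units of 1/1000 Da, considers only single unmodified residues, and uses an arbitrary 0.02 Da tolerance. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_candidates(prms_scaled, tol_da=0.02):
    """For each gap between consecutive PRMs, list residues within tol_da."""
    masses = [p / 1000.0 for p in prms_scaled]
    candidates = []
    for low, high in zip(masses, masses[1:]):
        gap = high - low
        matches = sorted(r for r, m in RESIDUE_MASS.items()
                         if abs(gap - m) <= tol_da)
        candidates.append("".join(matches))
    return candidates


if __name__ == "__main__":
    consensus = [-128077, 0, 87032, 174063, 302121, 389156, 502241,
                 615328, 778391, 865419, 966470, 1080517]
    for gap_index, cands in enumerate(residue_candidates(consensus)):
        print(gap_index, cands or "(no single residue)")
```

With this tolerance the first gap (128.077 Da) is compatible with both K and Q, and the 113.08 Da gaps are compatible with both I and L, which is consistent with the de novo string using I throughout.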
The result file contains this information for every seed in both the left and
right extension directions. Some extensions may proceed for multiple rounds.
After all of the extensions, the results file contains the summary of the
final reconstruction.

    # Templates 0:MyTemplateSequence seeds:
    # Template: DIVMSQSPSSLAVSVGEKVTMSCKSSQSLLYSTNQKNFLAWYQQKPGQSPKLLIYWASTRDSGVPDRFTGSGSGTDFTLTISSVKAEDLAVYYCQQYYSYPRTFGGGTKLEIK
    # Seed[0][0]: DIVMSQSPSSLAVSVGEKVTM -> DIVMSQSPSSLAVSVGEKVTM
    [0]: DIVMSQSPSSLAVSVGEK
    [1]: DIVMSQSPSSLAVSVGEK
    [2]: SLAVSVGEK
    [3]: VSVGEKVTM
    [4]: LAVSVGEKVTM
    [5]: DIVMSQSPSSL
    [6]: SQSPSSLAVSVGEK
    Peptides: SQSPSSLAVSVGEK,DIVMSQSPSSL,LAVSVGEKVTM,VSVGEKVTM,DIVMSQSPSSLAVSVGEK,SLAVSVGEK
    Total Peptides: 6

The summary shows the results by template. Above, the first selected template
(MyTemplateSequence) has a long sequence. However, only a small portion
(called a seed sequence) of the template was identified as unmutated in the
protein in the sample. The seed shown above was identified from 7 MS/MS spectra with 6 distinct peptide sequences.
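The counts in the summary can be checked directly from the listed peptides. The helper below is a hypothetical sketch: it assumes each `[i]:` line gives the peptide identified from one spectrum, and that a peptide belongs to the seed if it is an exact substring of it.

```python
def summarize_seed(seed, spectrum_peptides):
    """Return (spectrum count, distinct peptide count, peptides not in seed)."""
    distinct = sorted(set(spectrum_peptides))
    outside = [p for p in distinct if p not in seed]
    return len(spectrum_peptides), len(distinct), outside


if __name__ == "__main__":
    seed = "DIVMSQSPSSLAVSVGEKVTM"
    peptides = [
        "DIVMSQSPSSLAVSVGEK", "DIVMSQSPSSLAVSVGEK", "SLAVSVGEK",
        "VSVGEKVTM", "LAVSVGEKVTM", "DIVMSQSPSSL", "SQSPSSLAVSVGEK",
    ]
    n_spectra, n_distinct, outside = summarize_seed(seed, peptides)
    print(n_spectra, "spectra,", n_distinct, "distinct peptides,",
          len(outside), "outside the seed")
```

For the example above this gives 7 spectra and 6 distinct peptides, all of which lie inside the seed, matching the `Total Peptides: 6` line.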