GenoMS

Contact: Natalie Castellana [ncastell(at)cs.ucsd.edu]

Database search algorithms are the primary workhorses for the identification of tandem mass spectra. However, these methods can only identify spectra whose peptides are present in the database, which prevents the identification of peptides from mutated or alternatively spliced sequences. A variety of methods have been developed to search a spectrum against a sequence while allowing for variations. Some tools determine the sequence of the homologous protein in a related species but do not report the peptide in the target organism. Other tools consider variations, including modifications and mutations, when reconstructing the target sequence. However, these tools do not work if the template (homologous peptide) is missing from the database, and they do not attempt to reconstruct the entire target protein sequence. De novo identification of peptide sequences is another possibility, because it does not require a protein database. However, the lack of a database reduces accuracy.

We present a novel proteogenomic approach, GenoMS, that draws on the strengths of both database and de novo peptide identification methods. Protein sequence templates (i.e. proteins or genomic sequences that are similar to the target protein) are identified using the database search tool InsPecT. The templates are then used to recruit, align, and de novo sequence regions of the target protein that have diverged from the database or are missing from it.

We used GenoMS to reconstruct the full sequence of an antibody from spectra acquired from multiple digests using different proteases. Antibodies are a prime example of proteins that confound standard database identification techniques: the mature antibody genes result from large-scale genome rearrangements with flexible fusion boundaries and somatic hypermutation.
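To illustrate why an exact database search misses a mutated peptide, here is a minimal Python sketch. It is not the GenoMS or InsPecT algorithm: it compares precursor masses only, whereas real tools score fragment spectra. The function names, the example peptides, and the 0.01 Da tolerance are assumptions made for illustration; the residue masses are standard monoisotopic values.

```python
# Toy illustration only: matches on precursor mass, not on fragment spectra.
# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the residue sum


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(MONO[residue] for residue in seq) + WATER


def exact_matches(observed_mass, database, tol=0.01):
    """Database peptides whose mass equals the observed mass within tol."""
    return [p for p in database if abs(peptide_mass(p) - observed_mass) <= tol]


def single_substitution_matches(observed_mass, database, tol=0.01):
    """(template, position, new_residue) triples that explain the observed
    mass with exactly one amino acid substitution in a database peptide."""
    hits = []
    for template in database:
        base = peptide_mass(template)
        for pos, old in enumerate(template):
            for new, new_mass in MONO.items():
                if new == old:
                    continue
                candidate = base - MONO[old] + new_mass
                if abs(candidate - observed_mass) <= tol:
                    hits.append((template, pos, new))
    return hits


if __name__ == "__main__":
    database = ["SAMPLER"]              # hypothetical template peptide
    observed = peptide_mass("SSMPLER")  # hypothetical target with A -> S
    print(exact_matches(observed, database))
    print(single_substitution_matches(observed, database))
```

In this toy setting the exact search finds nothing for the mutated peptide, while the substitution-tolerant search can still relate it to its template. Mass alone cannot distinguish isobaric residues such as I and L, which is one reason real tools rely on fragment-level evidence.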

Documentation

Documentation is included in the software package and is also available online.

Publications

Template Proteogenomics: sequencing whole proteins using an imperfect database.
NE Castellana, V Pham, D Arnott, JR Lill, V Bafna. (2010).
Mol. Cell. Proteomics, 9(6):1260-70.

Resurrection of a clinical antibody: template proteogenomic de novo proteomic sequencing and reverse engineering of an anti-lymphotoxin alpha antibody.
NE Castellana, K McCutcheon, V Pham, K Harden, A Nguyen, J Young, C Adams, K Schroeder, D Arnott, V Bafna, JL Grogan, JR Lill. (2010).
Proteomics, epub ahead of print.

Latest Releases

ProteoSAFe: 1.2.4
GenoMS: 2012.01.10
Inspect, MS-Alignment: 2012.01.09
Meta-SPS: 2013.04.30
MixDB: 2011.12.06
MS-Clustering: 2011.03.27
MS-Dictionary: 2007.11.30
MS-GappedDictionary: 2011.06.15
MS-GeneratingFunction: 2010.10.14
MS-GFDB: 2012.06.07
MS-GF+: 2013.04.10
M-SPLIT: 2011.06.05
PepNovo: 2012.04.23
Spectral Networks: Sept 2007
UniNovo: 2013.03.10

Media Coverage


Nonribosomal Peptide Dereplication and Sequencing (Scientific American, Genetic Engineering News, Natural Products Industry Insider and Genome Web Daily News)

A powerful tool for PTM discovery (Jan 2008, Journal of Proteome Research, Vol. 7, Issue 1)

From spectral networks to shotgun sequencing (June 2007, Nature Methods, Vol. 4 No. 6)

Identifying peptides without a database (May 2007, Journal of Proteome Research)

UCSD Computer Scientist Wins Young Investigator Award, Research on Snake Venom Proteins Highlighted (Nov 2006, UCSD)