PepNovo

Contact: Ari Frank

De novo sequencing of low-precision MS/MS data

The standard approach to high-throughput sequencing of MS/MS data is database search (with tools such as Sequest, Mascot, and InsPecT). However, there are cases where a traditional database search cannot be used. For instance, to supply the relevant candidate peptides for a database search, the target organism's genome must be sequenced. Although the number of sequenced genomes is constantly growing, many organisms are still unsequenced and do not even have a sequenced close homologue. In addition, even when a genome is sequenced, all alternative splice variants of its genes must also be known in order to identify every peptide in a sample.
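
The dependence on a sequenced genome can be made concrete with a small sketch. A database search only scores peptides it can derive from known protein sequences, typically by digesting them in silico and keeping peptides whose mass matches the measured precursor. The Python below assumes simple trypsin rules (cleave after K or R unless followed by P, no missed cleavages, no modifications), and the helper names `tryptic_digest`, `peptide_mass`, and `candidate_peptides` are hypothetical; this is not how Sequest, Mascot, InsPecT, or PepNovo are implemented.

```python
# Sketch of how a database search obtains its candidate peptides.
# Hypothetical helper names; not the API of Sequest, Mascot, InsPecT or PepNovo.

# Monoisotopic residue masses (Da) of the 20 standard amino acids (unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O added to the summed residue masses of a full peptide


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (assumed rules, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start : i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def candidate_peptides(proteins: list[str], precursor_mass: float, tol: float = 0.5) -> list[str]:
    """Database peptides whose neutral mass lies within tol Da of the precursor mass."""
    return [
        pep
        for protein in proteins
        for pep in tryptic_digest(protein)
        if abs(peptide_mass(pep) - precursor_mass) <= tol
    ]


print(candidate_peptides(["AGKPLRDEK"], precursor_mass=390.175, tol=0.01))
```

However the scoring is done, a peptide that never appears in `proteins` (from an unsequenced organism or an unlisted splice variant) can never be returned, which is the gap de novo sequencing addresses.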

De novo sequencing is useful in the situations described above because it does not require any knowledge of the organism's genome; instead, it infers the peptide sequence using only the information present in the mass spectrum itself. In addition, de novo sequencing can serve as an independent verification stage for database search results.
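
To illustrate what "using only the information present in the mass spectrum" can mean, the sketch below shows the generic spectrum-graph idea behind many de novo methods: two peaks whose mass difference equals an amino acid residue mass are linked by that residue, and a peptide is read off a path of such links. It is a simplified illustration under stated assumptions (peaks already converted to singly charged b-ion-like masses, unmodified residues, a hypothetical helper `residue_edges`), not PepNovo's implementation, which additionally scores candidate paths with its probabilistic fragmentation model.

```python
from itertools import combinations

# Monoisotopic residue masses (Da) of the 20 standard amino acids (unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_edges(peaks: list[float], tol: float = 0.5) -> list[tuple[float, float, list[str]]]:
    """Hypothetical helper: link every pair of peaks whose mass difference matches a residue.

    Returns (lower_mass, higher_mass, matching_residues) tuples. With a wide,
    low-precision tolerance a single difference may match several residues.
    """
    edges = []
    for low, high in combinations(sorted(peaks), 2):
        diff = high - low
        matches = sorted(aa for aa, mass in RESIDUE_MASS.items() if abs(diff - mass) <= tol)
        if matches:
            edges.append((low, high, matches))
    return edges


# b1, b2, b3 ions of the hypothetical peptide "GAS..." (residue masses plus a proton).
ladder = [58.02874, 129.06585, 216.09788]
for low, high, residues in residue_edges(ladder, tol=0.02):
    print(f"{low:.3f} -> {high:.3f}: {'/'.join(residues)}")
```

On this ladder the sketch links 58.029 to 129.066 with A and 129.066 to 216.098 with S, reading the tag "AS". I and L always match the same difference, and with a 0.5 Da tolerance K and Q (128.095 vs. 128.059 Da) cannot be told apart either, one reason low-precision data calls for a careful scoring model.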

We developed PepNovo as a high-throughput de novo peptide sequencing tool for tandem mass spectrometry data. PepNovo typically runs in less than 0.2 seconds per spectrum. It uses a probabilistic network to model the peptide fragmentation events in a mass spectrometer. In addition, it uses a likelihood ratio hypothesis test to determine whether the peaks observed in a mass spectrum are more likely to have been produced under the fragmentation model than under a probabilistic model that treats the appearance of peaks as random events. In benchmark experiments, PepNovo outperformed several leading de novo algorithms, such as SHERENGA, Peaks, and Lutefisk.
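
The likelihood ratio test can be sketched as a log-odds score: for each expected fragment ion, compare the probability of what was seen (peak present or absent) under the fragmentation model with its probability under a model where peaks appear at random. The Python below is a simplified illustration, not PepNovo's scoring function. In particular, it treats ion types as independent, whereas PepNovo's probabilistic network models dependencies between fragmentation events, and all probabilities and the names `log_likelihood_ratio`, `P_MODEL`, and `P_RANDOM` are hypothetical.

```python
import math


def log_likelihood_ratio(observed: dict[str, bool], p_model: dict[str, float], p_random: float) -> float:
    """Sum of log[P(evidence | fragmentation model) / P(evidence | random peaks)].

    observed: ion type -> whether a matching peak was found within tolerance.
    p_model:  ion type -> hypothetical probability the ion is seen under the model.
    p_random: hypothetical probability that a peak falls at a given mass by chance.
    Ion types are treated as independent here (a simplification).
    """
    score = 0.0
    for ion, seen in observed.items():
        p = p_model[ion]
        if seen:
            score += math.log(p / p_random)
        else:
            score += math.log((1.0 - p) / (1.0 - p_random))
    return score


P_MODEL = {"b": 0.6, "y": 0.7, "b-H2O": 0.2}  # hypothetical ion probabilities
P_RANDOM = 0.1  # hypothetical random-peak probability

evidence = {"b": True, "y": True, "b-H2O": False}
score = log_likelihood_ratio(evidence, P_MODEL, P_RANDOM)
print(f"log-likelihood ratio: {score:.2f}")  # positive: the fragmentation model explains the peaks better
```

A positive score means the observed peaks are better explained by fragmentation than by chance; a hypothesis test would compare the score against a chosen threshold rather than against zero.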

Download

New PepNovo+: de novo sequencing, quality filtering, and MS-Blast query generation

Publications

Predicting Intensity Ranks of Peptide Fragment Ions
Frank, A.M.
J. Proteome Res. 8:2226-2240, 2009.

Ranking-Based Scoring Models for Peptide-Spectrum Matches
Frank, A.M.
J. Proteome Res. 8:2241-2252, 2009.

De Novo Peptide Sequencing and Identification with Precision Mass Spectrometry
Frank, A.M., Savitski, M.M., Nielsen, M.L., Zubarev, R.A., and Pevzner, P.A.
J. Proteome Res. 6:114-123, 2007.

PepNovo: De Novo Peptide Sequencing via Probabilistic Network Modeling
Frank, A. and Pevzner, P.
Anal. Chem. 77:964-973, 2005.

Peptide Sequence Tags for Fast Database Search in Mass Spectrometry
Frank, A., Tanner, S., Bafna, V., and Pevzner, P.
J. Proteome Res. 4(4):1287-1295, 2005.

Peptide Sequence Tags for Fast Database Search in Mass Spectrometry
Frank, A., Tanner, S., and Pevzner, P.
Research in Computational Molecular Biology (RECOMB), 2005.