Arabidopsis Proteogenomics

Contact: Natalie Castellana [ncastell (at) ucsd.edu]

Summary

Gene annotation underpins genome science. Protein-coding sequence is most often inferred from the genome on the basis of transcript evidence and computational predictions. Although generally correct, gene models suffer from errors in reading frame, exon-border definition, and exon identification. To ascertain the error rate of Arabidopsis thaliana gene models, we isolated proteins from a sample of Arabidopsis tissues and determined the amino acid sequences of 144,079 distinct peptides by tandem mass spectrometry. The peptides were identified by using Inspect to search against three different representations of the genome: a six-frame translation, an exon splice graph, and the currently annotated proteome. Using the gene-finding program AUGUSTUS together with novel peptides that occurred in clusters, we discovered 778 new protein-coding genes and refined the annotation of an additional 695 gene models.
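A six-frame translation reads the genome in all three reading frames on both strands, so that peptides can be matched to any potential coding region without relying on existing annotation. The sketch below illustrates the idea with the standard genetic code (NCBI translation table 1); it is not Inspect's implementation, and the function names (`six_frame_translation`, `stop_free_segments`) and the default minimum segment length of 8 residues are hypothetical choices made for illustration.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (N stays N)."""
    return dna.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon; codons not in the table (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate frames +1..+3 on the forward strand and -1..-3 on the reverse strand."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def stop_free_segments(protein: str, min_length: int = 8) -> list[str]:
    """Split a translated frame at stops ('*'); keep segments long enough to search."""
    return [seg for seg in protein.split("*") if len(seg) >= min_length]


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTGA").items():
        print(frame, protein)
```

Splitting each frame at stop codons is one common way to turn a six-frame translation into a searchable protein database; the segment-length cutoff simply discards fragments too short to yield useful tryptic peptides.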


Download

  • The newly predicted gene models, together with the support each model received from ESTs, peptides, and the current annotation, can be downloaded here: AUGUSTUS_Corrected_Genes.gff, AUGUSTUS_Novel_Genes.gff.

  • Tracks of the peptides (both novel and those confirming current models) can be uploaded to TAIR GBrowse for visualization.

  • GFF-formatted files of all novel and TAIR peptides can be downloaded here.

  • Building and searching exon splice graphs is functionality provided uniquely by Inspect. The documentation can be found here.

  • The complete list of non-novel peptides, their mappings to TAIR9 genes, and supporting spectra can be accessed here.
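The gene-model and peptide files above are distributed in GFF, a tab-separated format with nine columns (seqid, source, type, start, end, score, strand, phase, attributes). The attribute keys actually used in AUGUSTUS_Corrected_Genes.gff and AUGUSTUS_Novel_Genes.gff are not documented on this page, so the minimal reader below, a sketch under that assumption, only splits the columns and parses attributes generically, accepting both GFF3-style `key=value` and GFF2/GTF-style `key "value"` pairs. The names `GffRecord`, `parse_attributes`, and `read_gff` are hypothetical.

```python
import io
from dataclasses import dataclass, field
from typing import Iterator, TextIO
from urllib.parse import unquote


@dataclass
class GffRecord:
    seqid: str
    source: str
    feature: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    score: str
    strand: str
    phase: str
    attributes: dict[str, str] = field(default_factory=dict)


def parse_attributes(column: str) -> dict[str, str]:
    """Parse GFF3 'key=value;...' or GFF2/GTF 'key "value"; ...' attributes."""
    attrs = {}
    for item in column.strip().strip(";").split(";"):
        item = item.strip()
        if not item:
            continue
        if "=" in item:
            key, value = item.split("=", 1)
            value = unquote(value)  # GFF3 percent-encodes reserved characters
        else:
            key, _, value = item.partition(" ")
            value = value.strip().strip('"')
        attrs[key.strip()] = value.strip()
    return attrs


def read_gff(handle: TextIO) -> Iterator[GffRecord]:
    """Yield one record per feature line, skipping comments and malformed lines."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip rather than guess at a malformed line
        seqid, source, feature, start, end, score, strand, phase, attrs = cols
        yield GffRecord(seqid, source, feature, int(start), int(end),
                        score, strand, phase, parse_attributes(attrs))


if __name__ == "__main__":
    # Hypothetical example line; not taken from the distributed files.
    demo = "##gff-version 3\nChr1\tAUGUSTUS\tCDS\t100\t300\t.\t+\t0\tID=cds1;Parent=g1\n"
    for record in read_gff(io.StringIO(demo)):
        print(record.feature, record.start, record.end, record.attributes)
```

Records can then be grouped by a parent attribute to reassemble exons into gene models, once the key the files actually use has been checked.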

Correction: 126,055 peptides are reported as confirming TAIR proteins; however, 403 of these peptides are in fact derived from common contaminants such as trypsin or keratin. The number of TAIR peptides is therefore 125,652 (126,055 − 403); all other results are unaffected.
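One way to flag contaminant-derived peptides like those in the correction is to check whether each identified peptide occurs within a known contaminant protein sequence (for example trypsin or keratin). The sketch below shows that substring check; it is an illustration only, not necessarily how the correction was computed, and the function names and toy sequences are hypothetical. Leucine and isoleucine are isobaric, so both are normalized to L before matching.

```python
def normalize(seq: str) -> str:
    """Uppercase and map I to L, since leucine and isoleucine have identical mass."""
    return seq.upper().replace("I", "L")


def split_contaminants(
    peptides: list[str], contaminants: list[str]
) -> tuple[list[str], list[str]]:
    """Return (kept, removed): removed peptides occur inside some contaminant protein."""
    norm_contaminants = [normalize(protein) for protein in contaminants]
    kept, removed = [], []
    for peptide in peptides:
        hit = any(normalize(peptide) in protein for protein in norm_contaminants)
        (removed if hit else kept).append(peptide)
    return kept, removed


if __name__ == "__main__":
    # Toy sequences for illustration only.
    kept, removed = split_contaminants(["KLLVAAS", "PEPTIDER", "ASTR"], ["GGKLIVAASTR"])
    print("kept:", kept)
    print("removed:", removed)
```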

Latest Releases

  • ProteoSAFe: 1.2.4
  • GenoMS: 2012.01.10
  • Inspect, MS-Alignment: 2012.01.09
  • Meta-SPS: 2013.04.30
  • MixDB: 2011.12.06
  • MS-Clustering: 2011.03.27
  • MS-Dictionary: 2007.11.30
  • MS-GappedDictionary: 2011.06.15
  • MS-GeneratingFunction: 2010.10.14
  • MS-GFDB: 2012.06.07
  • MS-GF+: 2013.04.10
  • M-SPLIT: 2011.06.05
  • PepNovo: 2012.04.23
  • Spectral Networks: Sept 2007
  • UniNovo: 2013.03.10


Media Coverage


Nonribosomal Peptide Dereplication and Sequencing (Scientific American, Genetic Engineering News, Natural Products Industry Insider and Genome Web Daily News)

A powerful tool for PTM discovery (Jan 2008, Journal of Proteome Research, Vol. 7, Issue 1)

From spectral networks to shotgun sequencing (June 2007, Nature Methods, Vol. 4, No. 6)

Identifying peptides without a database (May 2007, Journal of Proteome Research)

UCSD Computer Scientist Wins Young Investigator Award, Research on Snake Venom Proteins Highlighted (Nov 2006, UCSD)