Arabidopsis Proteogenomics

Contact: Natalie Castellana [ncastell (at) ucsd.edu]

Summary

Gene annotation underpins genome science. Most often, protein-coding sequence is inferred from the genome based on transcript evidence and computational predictions. While generally correct, gene models suffer from errors in reading frame, exon border definition, and exon identification. To ascertain the error rate of Arabidopsis thaliana gene models, we isolated proteins from a sample of Arabidopsis tissues and determined the amino acid sequences of 144,079 distinct peptides by tandem mass spectrometry. The peptides were identified using Inspect to search against three different representations of the genome: a six-frame translation, an exon splice graph, and the currently annotated proteome. Using the gene-finding program AUGUSTUS and the novel peptides that occurred in clusters, we discovered 778 new protein-coding genes and refined the annotation of an additional 695 gene models.
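
To make the six-frame translation concrete, the following is a minimal Python sketch of the general idea: translate both strands of a DNA sequence in all three reading frames, then look for a peptide in each frame. It is not Inspect's implementation. The function names and the toy input sequence are assumptions made for illustration, and it matches peptide strings directly rather than scoring tandem mass spectra.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Map frame labels (+1..+3, -1..-3) to translated sequences."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def find_peptide(peptide, dna):
    """Return the labels of every frame whose translation contains the peptide."""
    return [
        name
        for name, protein in six_frame_translation(dna).items()
        if peptide in protein
    ]


# Toy sequence (hypothetical, not taken from the Arabidopsis genome).
toy_dna = "ATGGCCAAGTAA"
frames = six_frame_translation(toy_dna)
hits = find_peptide("MAK", toy_dna)
```

A peptide found in a frame that no annotated gene uses is the kind of evidence that can point to a missing or mis-framed gene model.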


Download

  • The newly predicted gene models and the support each model received from ESTs, peptides, and current annotation can be downloaded here: AUGUSTUS_Corrected_Genes.gff and AUGUSTUS_Novel_Genes.gff.

  • Tracks of the peptides (both novel and those confirming current models) can be uploaded to TAIR GBrowse for visualization.

  • GFF-formatted files of all novel and TAIR peptides can be downloaded here.

  • Building and searching the exon splice graph is functionality unique to Inspect. The documentation can be found here.
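
For readers who want to inspect the GFF downloads programmatically, here is a minimal Python sketch. It assumes the files follow the usual nine-column, tab-separated GFF layout. The sample lines, feature types, and attribute strings below are made up for illustration and are not taken from the actual files. The attribute column is left unparsed because its syntax differs between GFF versions.

```python
from collections import Counter, namedtuple

# The nine standard GFF columns.
GffFeature = namedtuple(
    "GffFeature",
    "seqid source type start end score strand phase attributes",
)


def parse_gff(lines):
    """Yield one GffFeature per data line, skipping comments and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # ignore malformed lines rather than guessing
        seqid, source, ftype, start, end, score, strand, phase, attrs = fields
        yield GffFeature(
            seqid, source, ftype, int(start), int(end), score, strand, phase, attrs
        )


def count_by_type(features):
    """Count features by the value in the type column."""
    return Counter(feature.type for feature in features)


# Hypothetical sample lines, for illustration only.
sample = [
    "# toy example\n",
    "Chr1\tAUGUSTUS\tgene\t1000\t2500\t.\t+\t.\tg1\n",
    "Chr1\tAUGUSTUS\tCDS\t1000\t1400\t.\t+\t0\tg1.t1\n",
    "Chr1\tAUGUSTUS\tCDS\t1900\t2500\t.\t+\t2\tg1.t1\n",
]
features = list(parse_gff(sample))
counts = count_by_type(features)
```

To run the same functions on a downloaded file, pass an open file handle (for example, one opened on AUGUSTUS_Novel_Genes.gff) to parse_gff in place of the sample list.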

Correction: 126,055 peptides are reported as confirming TAIR proteins; however, 403 of these peptides are in fact derived from common contaminants such as trypsin or keratin. The number of TAIR peptides is therefore 125,652; all other results are unaffected.
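
The corrected count follows directly from the two reported numbers. The Python sketch below checks the subtraction and shows one simple way contaminant peptides could be separated out, by testing whether each peptide occurs in a known contaminant protein sequence. This is an assumed approach for illustration, not necessarily how the 403 peptides were identified, and the sequences and names in it are made-up toy strings, not real trypsin or keratin.

```python
REPORTED_TAIR_PEPTIDES = 126_055
CONTAMINANT_PEPTIDES = 403
corrected_tair_peptides = REPORTED_TAIR_PEPTIDES - CONTAMINANT_PEPTIDES


def split_contaminants(peptides, contaminant_sequences):
    """Split peptides into (kept, removed) by substring match to contaminants."""
    kept, removed = [], []
    for peptide in peptides:
        if any(peptide in seq for seq in contaminant_sequences.values()):
            removed.append(peptide)
        else:
            kept.append(peptide)
    return kept, removed


# Toy data (hypothetical sequences, for illustration only).
toy_contaminants = {
    "toy_trypsin": "AAAKLLLGGG",
    "toy_keratin": "TTGGSSPP",
}
toy_peptides = ["KLLLG", "MAKTR", "GGSS"]
kept, removed = split_contaminants(toy_peptides, toy_contaminants)
```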

Latest Releases

  • Inspect, MS-Alignment: 2009.11.18

  • MS-GeneratingFunction: 2008.09.04

  • PepNovo: 2009.10.29

  • MS-Clustering: 2008.06.09

  • MS-Dictionary: 2007.11.30

  • Spectral Networks: Sept 2007

Media Coverage


Nonribosomal Peptide Dereplication and Sequencing (Scientific American, Genetic Engineering News, Natural Products Industry Insider and Genome Web Daily News)

A powerful tool for PTM discovery (Jan 2008, Journal of Proteome Research, Vol. 7, Issue 1)

From spectral networks to shotgun sequencing (June 2007, Nature Methods, Vol. 4 No. 6)

Identifying peptides without a database (May 2007, Journal of Proteome Research)

UCSD Computer Scientist Wins Young Investigator Award, Research on Snake Venom Proteins Highlighted (Nov 2006, UCSD)