Enosi

Contacts

Sunghee Woo [suwoo (at) ucsd.edu]
Seong Won Cha [s3cha (at) ucsd.edu]

Summary

Our proteogenomics software has two distinct toolkits. The first creates specialized databases that capture variant gene events, and is available in the SpliceDB tool. Tandem mass spectra can be searched against these customized databases using any search tool, but we recommend MS-GF+. The second, the Enosi pipeline, analyzes the identified peptides. Enosi automates the interpretation of spectrum-peptide matches so that they are easier to understand. Because a peptide sequence alone is not very informative for novel peptides, Enosi classifies peptides as known or novel and finds the genomic locus of each novel peptide. It then compares these novel locations with known gene sets and classifies them into events, which makes each novel peptide easier to interpret. It also includes its own method for filtering out unreliable events.

  • Input: Output of MS-GF+ or another search tool
  • Output: A list of events. Each event contains the following information: genomic location, related known gene, supporting RNA information, UCSC genome browser link, number of spectra, number of novel and known peptides supporting the event, etc.
  • Event List: Alternative Splice, Novel Splice, Fusion Gene, Insertion, Deletion, Mutation, Translated UTR, Gene boundary, Exon boundary, Novel exon, Frame shift, Reverse strand, Novel gene
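
The known/novel classification and the per-event output fields described above can be illustrated with a small Python sketch. This is not Enosi's code: the `Event` class, its field names, and the `classify_peptides` helper are hypothetical, and the substring match against a list of known protein sequences is an assumed simplification of how a peptide might be judged "known". The `ucsc_link` method uses the standard UCSC Genome Browser position URL, with the assembly name supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Set, Tuple


@dataclass
class Event:
    """Hypothetical record mirroring the output fields listed above."""
    event_type: str                      # one of the event names, e.g. "Novel exon"
    chrom: str                           # genomic location: chromosome name
    start: int                           # genomic location: start coordinate
    end: int                             # genomic location: end coordinate
    related_gene: Optional[str] = None   # related known gene, if any
    spectrum_count: int = 0              # number of supporting spectra
    novel_peptides: Set[str] = field(default_factory=set)
    known_peptides: Set[str] = field(default_factory=set)

    def ucsc_link(self, assembly: str) -> str:
        # UCSC Genome Browser position URL; the assembly (e.g. "hg19")
        # is chosen by the caller and is an assumption of this sketch.
        return ("https://genome.ucsc.edu/cgi-bin/hgTracks"
                f"?db={assembly}&position={self.chrom}:{self.start}-{self.end}")


def classify_peptides(
    peptides: Sequence[str], known_proteins: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Split peptides into known (found inside a known protein) and novel.

    Assumed simplification: a peptide counts as known if it is an exact
    substring of any sequence in known_proteins.
    """
    known, novel = [], []
    for pep in peptides:
        if any(pep in prot for prot in known_proteins):
            known.append(pep)
        else:
            novel.append(pep)
    return known, novel


# Toy data, invented purely for illustration.
known_proteins = ["MKTAYIAKQRQISFVK", "GGSLPEPTIDEKAA"]
identified = ["AYIAKQR", "PEPTIDEK", "NVELSEQK"]

known, novel = classify_peptides(identified, known_proteins)
event = Event(
    event_type="Novel exon",
    chrom="chr1",
    start=1000,
    end=1026,
    spectrum_count=3,
    novel_peptides=set(novel),
    known_peptides=set(known),
)
print(known, novel)
print(event.ucsc_link("hg19"))
```

A real pipeline would also need to map each novel peptide to its genomic locus and compare that locus with known gene annotations before assigning an event type; those steps are omitted here because the page does not describe how they are done.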

Documentation

Version 1.0

Manual – See section 7

Version 0.1

Documentation – The documentation for Enosi V0.1 can be found at the link.

SpliceDB

The database creation tool can be found here.

Direct Link to ProteoSAFe

Please click here.

Downloads

Version 1.0

Executable Jar, Python files – Version 1.0 is coming soon. Part of Enosi is included in the same bundle as SpliceDB. Please see section 7 of the manual.

Version 0.1

Enosi V0.1 was used to annotate model organisms, including Arabidopsis and maize. The projects used EST and other transcript data (but not RNA-seq) to create custom databases, and used InsPecT to search spectra directly against splice graphs. While we are working on a cleaner version of the code, a user manual and executable jar files for Enosi V0.1 can be found here.

Publications

An Automated Proteogenomic Method Uses Mass Spectrometry to Reveal Novel Genes in Zea mays. Castellana NE, Shen Z, He Y, Walley JW, Cassidy CJ, Briggs SP, Bafna V. Mol Cell Proteomics. 2014 Jan;13(1):157-67. doi: 10.1074/mcp.M113.031260. Epub 2013 Oct 18.
Proteogenomic database construction driven from large scale RNA-seq data. Woo S, Cha SW, Merrihew G, He Y, Castellana N, Guest C, MacCoss M, Bafna V. J Proteome Res. 2014 Jan 3;13(1):21-8. doi: 10.1021/pr400294c. Epub 2013 Jul 17.
Template proteogenomics: sequencing whole proteins using an imperfect database. Castellana NE, Pham V, Arnott D, Lill JR, Bafna V. Mol Cell Proteomics. 2010 Jun;9(6):1260-70. doi: 10.1074/mcp.M900504-MCP200. Epub 2010 Feb 17.