CCMS Proteogenomics

Software Tools & Datasets

SpliceDB: Tool web page, ProteoSAFe workflow (beta)
ENOSI: Tool web page, ProteoSAFe workflow (beta)

Key Publications

An Automated Proteogenomic Method Uses Mass Spectrometry to Reveal Novel Genes in Zea mays. Castellana NE, Shen Z, He Y, Walley JW, Cassidy CJ, Briggs SP, Bafna V. Mol Cell Proteomics. 2014 Jan;13(1):157-67. doi: 10.1074/mcp.M113.031260. Epub 2013 Oct 18.

Resurrection of a clinical antibody: template proteogenomic de novo proteomic sequencing and reverse engineering of an anti-lymphotoxin-α antibody. Castellana NE, McCutcheon K, Pham VC, Harden K, Nguyen A, Young J, Adams C, Schroeder K, Arnott D, Bafna V, Grogan JL, Lill JR. Proteomics. 2011 Feb;11(3):395-405. doi: 10.1002/pmic.201000487. Epub 2011 Jan 5.

N-terminal protein processing: a comparative proteogenomic analysis. Bonissone S, Gupta N, Romine M, Bradshaw RA, Pevzner PA. Mol Cell Proteomics. 2013 Jan;12(1):14-28. doi: 10.1074/mcp.M112.019075. Epub 2012 Sep 23.

Proteogenomic database construction driven from large scale RNA-seq data. Woo S, Cha SW, Merrihew G, He Y, Castellana N, Guest C, MacCoss M, Bafna V. J Proteome Res. 2014 Jan 3;13(1):21-8. doi: 10.1021/pr400294c. Epub 2013 Jul 17.

Template proteogenomics: sequencing whole proteins using an imperfect database. Castellana NE, Pham V, Arnott D, Lill JR, Bafna V. Mol Cell Proteomics. 2010 Jun;9(6):1260-70. doi: 10.1074/mcp.M900504-MCP200. Epub 2010 Feb 17.

Unexpected diversity of signal peptides in prokaryotes. Payne SH, Bonissone S, Wu S, Brown RN, Ivankov DN, Frishman D, Pasa-Tolić L, Smith RD, Pevzner PA. MBio. 2012 Nov 20;3(6). pii: e00339-12. doi: 10.1128/mBio.00339-12.

View All CCMS Publications

Discovery of Aberrant Cancer Genes and Revealing Antibody Repertoires

The central dogma of biology is that DNA contains the code for making proteins. Traditionally, genomic and computational (gene-finding) methods have been used to predict genes and their encoded proteins, while mass spectrometry has been used to validate and quantify the proteins expressed in samples. However, the human genome is, at best, an incomplete template for protein synthesis. Mutations change the amino-acid code of proteins. Exons splice together in different ways to make new protein products. Recombination, splicing, and non-templated insertions are used to make antibody proteins. Large structural variations delete, insert, translocate, and duplicate genomic fragments, changing gene structure and copy number.

CCMS is pioneering the use of proteogenomic techniques that identify expressed protein sequences by searching tandem mass spectra of expressed peptides against customized databases built from genomic information, which serves as a partial template. Proteogenomic annotation methods developed at CCMS are now routinely used to annotate model organisms. However, next-generation sequencing has greatly expanded the known landscape of proteome variation, both within a population and in diseases like cancer.
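To make the idea of a customized database concrete, here is a minimal Python sketch of one common way to derive candidate peptides from genomic sequence: translate the DNA in all six reading frames, then digest each translation in silico with trypsin-like rules. This is an illustration under stated assumptions, not the SpliceDB or ENOSI implementation. The function names, the `min_length` parameter, and the cleavage rule (cut after K or R unless followed by P, no missed cleavages) are all hypothetical choices made for this example.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels ('+1'..'+3', '-1'..'-3') to translated sequences."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def tryptic_peptides(protein, min_length=6):
    """Split at stop codons, then cleave after K/R unless the next residue is P."""
    peptides = []
    for orf in protein.split("*"):
        current = ""
        for i, aa in enumerate(orf):
            current += aa
            next_aa = orf[i + 1] if i + 1 < len(orf) else ""
            if aa in "KR" and next_aa != "P":
                if len(current) >= min_length:
                    peptides.append(current)
                current = ""
        if len(current) >= min_length:
            peptides.append(current)
    return peptides


def build_peptide_database(dna, min_length=6):
    """Map each candidate peptide to the set of frames that encode it."""
    database = {}
    for frame, protein in six_frame_translations(dna).items():
        for peptide in tryptic_peptides(protein, min_length):
            database.setdefault(peptide, set()).add(frame)
    return database


if __name__ == "__main__":
    # Toy sequence chosen for illustration only; it is not real genomic data.
    toy_dna = "ATGGCTAGCAAAGGTGAACTGTTCACCCGT"
    for peptide, frames in sorted(build_peptide_database(toy_dna).items()):
        print(peptide, sorted(frames))
```

A real pipeline would also need to handle splice junctions, sequence variants, and the much larger search space that six-frame translation creates, none of which this sketch attempts.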

Recent research has shown tremendous plasticity in cancer genomes. Sequencing projects such as The Cancer Genome Atlas (TCGA) have suggested large structural abnormalities in tumor genomes, with a consequent impact on the expressed transcriptome and proteome. Validating these aberrant (or still unannotated) genes remains challenging because of erroneous and variable data, large (petabyte-scale) data sets, and, importantly, a lack of direct protein-level confirmation. Our current focus is on mass spectrometric validation of these gene aberrations.

Revealing Antibody Repertoires

CCMS, in collaboration with Genentech researchers, has pioneered new approaches to antibody sequencing via Spectral Networks and Template Proteogenomics. As a result, automated sequencing of purified monoclonal antibodies (MAbs) has become routine and has been commercialized by Digital Proteomics, which licensed our antibody sequencing technology from UCSD. However, the CCMS approach to antibody sequencing has not yet capitalized on recent breakthroughs in antibody analysis via next-generation sequencing (NGS). While NGS has been used to analyze antibody repertoires, until recently there were no attempts to integrate NGS and mass spectrometry (MS) approaches for antibody analysis. We plan to integrate NGS and MS for antibody sequencing, thus enabling the identification of antibodies from complex samples. Developing these bioinformatics methods is currently the bottleneck for an emerging immunosequencing technology that promises to shift the antibody industry's focus toward polyclonal antibodies.