Service

Enabling High-Throughput Big Data Proteomics

Decoding Proteomics Big Data

We face a “Big data, many tools, no solution” conundrum in the MS community. New analysis tools are continually being developed and made available, including by us at CCMS. However, the vast majority of these tools require an extensive series of manual steps to translate data into meaningful results. As a result, many smaller labs are forced to forgo innovative tools in favor of stable software platforms with limited search capabilities. The CCMS ProteoSAFe platform meets this challenge with a computational framework built for easy integration of new tools (Flexibility) that can be easily accessed (Accessibility) and executed on laptops or large compute clusters (Scalability).

In addition to software solutions, there is also a pressing need for structured access to, and reuse of, raw data. Today, a researcher analyzing the human kidney proteome at Harvard does not benefit from spectra released by an MIT researcher who might also be analyzing the human (let alone mouse) kidney proteome. Even the simple question of whether a spectrum (identified or not) has been seen before, and under what circumstances, cannot be easily answered today. This widespread introversion is troublesome. Consider genomics, where GenBank and other databases link every gene sequence to all publications that make use of the sequence. Similarly, we could potentially annotate each spectrum with knowledge from all laboratories that generated it. However, there are few solutions available to capitalize on the value of “Big Data” in proteomics. The CCMS MassIVE repository is designed to meet this challenge by providing a petabyte-scale platform for mass spectrometry data sharing. The MassIVE repository already enables the sharing of multiple terabytes of mass spectrometry data, including all the public data that could be recovered from the now-defunct Tranche repository.

Resources

How can CCMS help your research?

Big Compute

ProteoSAFe: Proteomics Scalable, Accessible and Flexible Environment

ProteoSAFe addresses three key challenges in computational mass spectrometry: Flexibility in the integration of tools, an Accessible user interface, and Scalability to distributed compute clusters.

Big Data

MassIVE - Mass Spectrometry Interactive Virtual Environment

The MassIVE data repository was designed to facilitate the storage, reanalysis, and collaborative annotation of all public mass spectrometry (MS) data in the world.