NeuroPedia: Neuropeptide database and spectra library

Downloads
Browse NeuroPedia spectral libraries
Search your data using NeuroPedia

Contact: Yoona Kim [yok002 (at) ucsd.edu], Nuno Bandeira [bandeira (at) ucsd.edu]

Summary

Neuropeptides are essential for cell-cell communication in neurological and endocrine physiological processes in health and disease. While many neuropeptides have been identified in previous studies, the resulting data has not been structured to facilitate further analysis by tandem mass spectrometry (MS/MS), the main technology for high throughput neuropeptide identification. Many neuropeptides are difficult to identify when searching MS/MS spectra against large protein databases because of their atypical lengths (e.g., shorter or longer than common tryptic peptides) and their lack of the tryptic residues that facilitate peptide ionization and fragmentation. NeuroPedia is a neuropeptide encyclopedia of peptide sequences (including genomic and taxonomic information) and spectral libraries of identified MS/MS spectra of homologous neuropeptides from multiple species. Searching neuropeptide MS/MS data against known NeuroPedia sequences will improve the sensitivity of database search tools. Moreover, the availability of neuropeptide spectral libraries will enable the use of spectral library search tools, which are known to further improve the sensitivity of peptide identification. The spectral libraries will also reinforce confidence in peptide identifications by enabling visual comparisons between new and previously identified neuropeptide MS/MS spectra.

[Reference] NeuroPedia: Neuropeptide database and spectra library.
Y. Kim, S. Bark, V. Hook, and N. Bandeira
(In submission)


Downloads

Neuropeptide sequence databases

Sequence, taxonomic, and genomic information for 847 neuropeptides: Download in Excel format

Match Types of pairs of sequences, total 340,725 pairs:    

Types Number of pairs
Identical 531 Download
Overlapping 5,020 Download
Homolog 9,185 Download

FASTA files per species:    

Species Number of sequences
Human 270 Download
Rat 195 Download
Mouse 188 Download
Bovine 154 Download
Rhesus macaque 20 Download
Chimpanzee 17 Download
California sea hare 2 Download
Leech 1 Download
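
After downloading a per-species FASTA file, it can be useful to check that the number of records matches the count listed in the table above. The following is a minimal Python sketch and not part of NeuroPedia: the function name `count_fasta_records` and the example headers are hypothetical, and the only assumption is the standard FASTA convention that every record begins with a header line starting with '>'.

```python
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records: one record per line that starts with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


# Hypothetical two-record example (placeholder headers, short sequences).
example = """>example_peptide_1
YGGFM
>example_peptide_2
YGGFL
""".splitlines()

print(count_fasta_records(example))  # prints 2
```

For a downloaded file, the same function can be applied to an open file handle, e.g. `with open("human.fasta") as fh: count_fasta_records(fh)`, where the file name is an assumption; for the human file the result would be expected to match the 270 sequences listed above.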

Neuropeptide spectra

All 3,401 identified neuropeptide spectra:

Species | Instrument | Enzyme | Number of spectra from NIST | Number of spectra from In-house | Total Number of spectra | Browse spectra | Spectral library | Decoy library
Human | IT | trypsin | 385 (a) | 221 | 606 | Browse | Download | Download
Human | IT | v8 | 0 | 91 | 91 | Browse | Download | Download
Human | IT | none | 0 | 1,630 | 1,630 | Browse | Download | Download
Human | QTOF | trypsin | 41 (b) | 454 | 495 | Browse | Download | Download
Human | QTOF | v8 | 0 | 202 | 202 | Browse | Download | Download
Human | QTOF | none | 0 | 160 | 160 | Browse | Download | Download
Mouse | IT | trypsin | 67 (c) | 0 | 67 | Browse | Download | Download
Rat | IT | trypsin | 4 | 0 | 4 | Browse | Download | Download
Bovine | IT | none | 0 | 145 | 145 | Browse | Download | Download
Leech | QTOF | none | 0 | 1 (e) | 1 | Browse | Download | Download

(a), (b), (c) NIST library spectra (NIST_human_IT_2010-01-14, NIST_human_QTOF_2010-03-02, NIST_mouse_IT_2009-12-10, and NIST_rat_IT_v3.0_2009-05-21)
(d) Gupta et al. (2010) Mass Spectrometry-Based Neuropeptidomics of Secretory Vesicles from Human Adrenal Medullary Pheochromocytoma Reveals Novel Peptide Products of Prohormone Processing. J. Proteome Res., 9, 5065-5075.
(e) Bruand et al. (2011) Automated Querying and Identification of Novel Peptides using MALDI Mass Spectrometric Imaging. J. Proteome Res., Article ASAP

Neuropeptide spectral libraries

527 neuropeptide spectra for unique peptide/charge-state pairs.
The spectral libraries, with and without decoy libraries, can be downloaded in High Quality (HQ) and Low Quality (LQ) versions.

Species | Instrument | Enzyme | Number of spectra from NIST | Number of spectra from In-house | Total Number of spectra for unique peptide/charge-state pairs | Total Number of spectra in HQ spectral library | Browse HQ spectra | HQ Spectral library | HQ Spectral library + Decoy library | Total Number of spectra in LQ spectral library | Browse LQ spectra | LQ Spectral library | LQ Spectral library + Decoy library
Human | IT | trypsin | 296 | 68 | 364 | 303 | Browse | Download | Download | 61 | Browse | Download | Download
Human | IT | v8 | 0 | 53 | 53 | 24 | Browse | Download | Download | 29 | Browse | Download | Download
Human | IT | none | 0 | 121 | 121 | 54 | Browse | Download | Download | 67 | Browse | Download | Download
Human | QTOF | trypsin | 37 | 109 | 146 | 50 | Browse | Download | Download | 96 | Browse | Download | Download
Human | QTOF | v8 | 0 | 69 | 69 | 14 | Browse | Download | Download | 55 | Browse | Download | Download
Human | QTOF | none | 0 | 44 | 44 | 13 | Browse | Download | Download | 31 | Browse | Download | Download
Mouse | IT | trypsin | 60 | 0 | 60 | 42 | Browse | Download | Download | 18 | Browse | Download | Download
Rat | IT | trypsin | 4 | 0 | 4 | 2 | Browse | Download | Download | 2 | Browse | Download | Download
Bovine | IT | none | 0 | 33 | 33 | 24 | Browse | Download | Download | 9 | Browse | Download | Download
Leech | QTOF | none | 0 | 1 | 1 | 1 | Browse | Download | Download | 0 | Browse | Download | Download
All | IT | All | 360 | 275 | 635 | 449 | Browse | Download | Download | 186 | Browse | Download | Download
All | QTOF | All | 37 | 223 | 260 | 78 | Browse | Download | Download | 182 | Browse | Download | Download
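
The counts in the table above are related by simple arithmetic: in each row the NIST and in-house counts add up to the total, the HQ and LQ library sizes add up to the same total, and the two 'All' rows are per-instrument sums. The Python sketch below checks this using numbers copied from the table; the helper names `row_is_consistent` and `instrument_totals` are hypothetical and only serve as an illustration.

```python
# Rows copied from the table above:
# (species, instrument, enzyme, NIST, in-house, total, HQ, LQ)
ROWS = [
    ("Human", "IT", "trypsin", 296, 68, 364, 303, 61),
    ("Human", "IT", "v8", 0, 53, 53, 24, 29),
    ("Human", "IT", "none", 0, 121, 121, 54, 67),
    ("Human", "QTOF", "trypsin", 37, 109, 146, 50, 96),
    ("Human", "QTOF", "v8", 0, 69, 69, 14, 55),
    ("Human", "QTOF", "none", 0, 44, 44, 13, 31),
    ("Mouse", "IT", "trypsin", 60, 0, 60, 42, 18),
    ("Rat", "IT", "trypsin", 4, 0, 4, 2, 2),
    ("Bovine", "IT", "none", 0, 33, 33, 24, 9),
    ("Leech", "QTOF", "none", 0, 1, 1, 1, 0),
]

# The two 'All' rows: (NIST, in-house, total, HQ, LQ) per instrument.
ALL_ROWS = {"IT": (360, 275, 635, 449, 186), "QTOF": (37, 223, 260, 78, 182)}


def row_is_consistent(row):
    """NIST + in-house and HQ + LQ should both equal the row total."""
    _, _, _, nist, inhouse, total, hq, lq = row
    return nist + inhouse == total and hq + lq == total


def instrument_totals(rows, instrument):
    """Column-wise sums of the numeric fields for one instrument type."""
    selected = [r[3:] for r in rows if r[1] == instrument]
    return tuple(sum(col) for col in zip(*selected))


print(all(row_is_consistent(r) for r in ROWS))
for name, expected in ALL_ROWS.items():
    print(name, instrument_totals(ROWS, name) == expected)
```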

Browsing NeuroPedia Spectral library

  1. Click a 'Browse' link in the 'Browse spectra' column in the above 'Neuropeptide spectra' table.
  2. In the browsing page, click 'group by peptide' in the 'Status' section.
  3. You can now see images for each spectrum in the library by clicking the image icons in the leftmost column.
  4. There are two ways to filter the spectral library:
    a. By 'Filename', 'Scan', 'Peptide', 'Protein', 'Charge', 'Score', 'ΔScore', 'FDR', 'PepFDR' and 'FPR' at the top of the table:
      - Sort by search result values (letter or number) using the small up and down triangles next to the column titles.
      - Filter by specific values using the text boxes below the column headers.
    b. Use 'checked only' to show only the spectra selected in the first column of the table.
  5. Click the 'Filter' button once all your filtering conditions are filled in or selected.

Searching your data using NeuroPedia

M-SPLIT search  +  FDR:

  1. Download the NeuroPedia spectral library + decoy library from the 'HQ Spectral library + Decoy library' or 'LQ Spectral library + Decoy library' column in the above 'Neuropeptide spectral libraries' table.
  2. Download the M-SPLIT package from here: [http://proteomics.ucsd.edu/Software/MSPLIT/#Downloads]
  3. Run M-SPLIT as follows:
    java -Xmx800M -jar MSPLIT_v1.0.jar  < NeuroPedia library file >  < Your spectrum file >  < precursor mass tolerance >  < output file >
    - This searches the library and finds the library spectra that best match each query spectrum.
    - < Your spectrum file > can be in mgf or mzXML file format.
    - < precursor mass tolerance > is in Da units. A relatively large tolerance such as 2 Da is usually advisable, even if the data was acquired with high accuracy MS survey scans, to allow for 13C errors in the determination of monoisotopic masses.
    - < output file > is in text file format. If the output file parameter is omitted, the output is written to standard output.
  4. After the search, M-SPLIT uses an SVM to determine whether a match is significant:
    - SVM classification is done using the svm-light package.
    - The binaries can be obtained at http://svmlight.joachims.org/
    - Place the binary files (svm_learn and svm_classify) in the svm_light_linux or svm_light_window subdirectory of the M-SPLIT directory, depending on which operating system is being used.
    - Use the spectrumMatchClassify.pl script to score spectrum/spectra matches and to estimate FDR by the target/decoy method. Run the script as follows:
    ./spectrumMatchClassify.pl  < search result file >  < filteredOutputFile >  < FDR >
    - < FDR > is a number between zero and one; e.g., to enforce 1% FDR use 0.01.
  5. For more detailed information about M-SPLIT please see: [http://proteomics.ucsd.edu/Software/MSPLIT/]
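
To illustrate what FDR estimation by the target/decoy method means in general, the following Python sketch filters a ranked list of matches at a chosen FDR. It is a generic illustration and not the implementation used by spectrumMatchClassify.pl: the `Match` fields, the function name `filter_by_fdr`, the assumption that a higher score is better, and the estimate FDR = (number of decoy matches) / (number of target matches) are all assumptions made for this example.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    query: str      # hypothetical query spectrum identifier
    score: float    # assumed: a higher score means a better match
    is_decoy: bool  # True if the best match came from the decoy library


def filter_by_fdr(matches: List[Match], max_fdr: float) -> List[Match]:
    """Keep target matches from the longest top-ranked prefix within max_fdr.

    For each prefix of the score-ranked list, FDR is estimated as
    (#decoy matches) / (#target matches) within that prefix.
    Score ties are not treated specially in this sketch.
    """
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    targets = decoys = 0
    keep = 0  # length of the longest acceptable prefix found so far
    for i, m in enumerate(ranked, start=1):
        if m.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            keep = i
    return [m for m in ranked[:keep] if not m.is_decoy]


# Made-up scores purely for illustration.
matches = [
    Match("q1", 0.95, False),
    Match("q2", 0.90, False),
    Match("q3", 0.80, True),
    Match("q4", 0.70, False),
    Match("q5", 0.60, False),
]
print([m.query for m in filter_by_fdr(matches, 0.01)])  # prints ['q1', 'q2']
```

With the 1% threshold only the two matches ranked above the first decoy are kept; a looser threshold such as 0.25 would accept all four target matches in this toy example, since one decoy among four targets gives an estimated FDR of exactly 0.25.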

InsPecT Search:

  1. Download the sequence FASTA files from the above 'FASTA files per species' table in the 'Neuropeptide sequence databases' section.
  2. Go to the CCMS website at http://proteomics.ucsd.edu/ProteoSAFe
  3. Log on to the website. If you don't have an account, registration is free and straightforward using the link at the top right.
  4. In 'Tool Selection':
    a. Select 'InsPecT' as the search tool.
    b. Upload spectrum and database files by clicking 'Select Input Files'.
      - In the 'Upload Files' tab, upload one or more of your spectrum files and the FASTA files downloaded from NeuroPedia.
      - In the 'Select Input Files' tab, select the spectrum files in the left panel and click the small 'Select Spectrum Files' button to add them to the 'Selected files' panel on the right.
      - Likewise, select the FASTA files in the left panel and click the small 'Select Sequence Files' button to add them to the right panel.
      - Click 'Finish Selection' once you have selected all files for the search.
    c. Briefly describe your search.
    d. Adjust your search parameters in 'Instrument', 'Cysteine protecting group', 'Protease', 'Parent mass tolerance', and 'Ion tolerance'.
  5. In 'Allowed Post-Translational Modifications', choose the post-translational modifications to consider.
  6. In 'More options', check 'Include common contaminants' and set 'Spectrum-Level FDR' to your desired False Discovery Rate.
  7. Click the 'Search' button at the bottom of the page.
  8. The page updates automatically until the search completes. You will also receive an email notification at your registered email address once the search finishes or fails for some unexpected reason.