Nat Methods. 2021 Jul;18(7):768-770.
doi: 10.1038/s41592-021-01184-6. Epub 2021 Jun 28.

Universal Spectrum Identifier for mass spectra

Eric W Deutsch et al. Nat Methods. 2021 Jul.

Abstract

Mass spectra provide the ultimate evidence to support the findings of mass spectrometry proteomics studies in publications, and it is therefore crucial to be able to trace the conclusions back to the spectra. The Universal Spectrum Identifier (USI) provides a standardized mechanism for encoding a virtual path to any mass spectrum contained in datasets deposited to public proteomics repositories. USI enables greater transparency of spectral evidence, with more than 1 billion USI identifications from over 3 billion spectra already available through ProteomeXchange repositories.

Figures

Extended Data Figure 1.
Example use cases for Universal Spectrum Identifiers (USIs), providing a set of 13 example USIs along with a brief comment on each. The same 13 USIs can be viewed via the “Box 1 example USIs” select list at http://proteomecentral.proteomexchange.org/usi. Example 4c in Box 1 provides the USI for the demonstrated correct PSM of an ordinary UniProtKB protein, Q9UQ35, from Mylonas et al. Figure 2B (example 4d is the corresponding synthetic peptide spectrum). Example 4a in Box 1 provides the USI for the same spectrum as example 4c, but annotated with the previously and incorrectly reported HLA (Human Leukocyte Antigen) peptide, as described in Mylonas et al. Figure 2A. The non-matching synthetic peptide spectrum for the incorrect sequence is given in Box 1 as example 4b. The Human Proteome Project (HPP) has set a high bar for data quality and evidence in support of its goal to provide high-stringency detections for all human proteins. The latest version of its MS data interpretation guidelines (version 3.0) requires that key detection claims of proteins not previously seen via MS be accompanied by USIs referencing the key spectra for each claim, so that the peptide-spectrum matches can be transparently inspected and verified by the community. For example, the BioPlex dataset was important for detecting novel proteins that had not been previously observed, but it was crucial to consider the provenance of every single identification in order to exclude all files from experiments where the protein was intentionally overexpressed (as per the standard protocol for analysis of protein-protein interactions). Example 3a in Box 1 provides a PSM derived from a prey protein pulled down as a binding partner to bait protein C5orf38. Example 3b provides a PSM of the same peptide, but derived from a recombinant protein used as a bait. This PSM provides a synthetic peptide reference spectrum with a much higher signal-to-noise ratio, as required by HPP guidelines.
Illustrating this application of USIs at a community-wide scale, MassIVE further provides an extensive list of USIs for 1,296,916 MassIVE-KB entries in support of HPP Protein Existence (PE) classifications for 16,393 proteins (available at http://massive.ucsd.edu/hpp), including USIs for matching spectra of synthetic peptides (when available in public datasets); an abridged version of this table is also provided as Supplementary Table 1.
Figure 1:
(A) Graphical overview of the general format of the USI, including the mzspec prefix, the collection component, the MS run component, the indexType, the indexNumber, and the optional interpretation. (B) A USI example using the spectrum scan number, once with and once without the optional spectrum interpretation. (C) Visual representation of this spectrum (mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2) in ProteomeCentral, accompanied by the ion table indicating the m/z values of the identified b- and y-ions. The spectrum is displayed with the Lorikeet spectrum viewer.
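The component layout described in panel (A) can be illustrated with a short parsing sketch. This is not the reference implementation: the names `USI` and `parse_usi` are hypothetical, and the sketch assumes that the collection and MS run components contain no colons, so that the first five colon-separated fields map directly onto the components listed above. Everything after the fifth colon is kept intact as the optional interpretation, since an interpretation string may itself contain colons.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class USI:
    """Hypothetical container for the components named in Figure 1A."""
    collection: str
    ms_run: str
    index_type: str
    index_number: str  # kept as text; no assumption that it is an integer
    interpretation: Optional[str] = None


def parse_usi(text: str) -> USI:
    """Split a USI string into its components.

    Assumption (not from the article): collection and MS run names
    contain no colons, so a left-to-right split is sufficient.
    """
    # maxsplit=5 keeps any colons inside the interpretation untouched.
    parts = text.strip().split(":", 5)
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 colon-separated fields: {text!r}")
    prefix, collection, ms_run, index_type, index_number = parts[:5]
    if prefix != "mzspec":
        raise ValueError(f"USI must start with 'mzspec', got {prefix!r}")
    if not all((collection, ms_run, index_type, index_number)):
        raise ValueError(f"USI has an empty required component: {text!r}")
    # An absent or empty sixth field means no interpretation was given.
    interpretation = parts[5] if len(parts) == 6 and parts[5] else None
    return USI(collection, ms_run, index_type, index_number, interpretation)


if __name__ == "__main__":
    example = (
        "mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09"
        ":scan:17555:VLHPLEGAVVIIFK/2"
    )
    print(parse_usi(example))
```

Applied to the USI shown in panel (C), the sketch yields the collection PXD000561, the MS run Adult_Frontalcortex_bRP_Elite_85_f09, the index type scan, the index number 17555, and the interpretation VLHPLEGAVVIIFK/2. Dropping the final field gives the same spectrum reference without an interpretation, as in panel (B).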
Figure 2.
Graphical depiction of the USI application ecosystem. Members of the community can uniquely identify spectra from journal articles and other sources using USIs. A USI can potentially be resolved at any of several repositories that store datasets. Alternatively, spectra can be obtained and viewed using independent applications, such as ProteomeCentral, which store no spectra themselves but can fetch spectral data from repositories using USIs. Hundreds of millions of peptide-spectrum matches (PSMs) and spectra without matches are accessible via USIs at the various repositories. PSMs or spectra can even be uniquely identified with QR (Quick Response) codes.

References

    1. Deutsch EW et al. The ProteomeXchange consortium in 2020: enabling ‘big data’ approaches in proteomics. Nucleic Acids Res. 48, D1145–D1152 (2020).
    2. Wilkinson MD et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).
    3. Ezkurdia I, Vázquez J, Valencia A & Tress M. Analyzing the first drafts of the human proteome. J. Proteome Res. 13, 3854–3855 (2014).
    4. Mylonas R et al. Estimating the Contribution of Proteasomal Spliced Peptides to the HLA-I Ligandome. Mol. Cell. Proteomics 17, 2347–2357 (2018).
    5. Wohlgemuth G et al. SPLASH, a hashed identifier for mass spectra. Nat. Biotechnol. 34, 1099–1101 (2016).