J Proteome Res. 2019 Oct 4;18(10):3671-3680.
doi: 10.1021/acs.jproteome.9b00339. Epub 2019 Sep 18.

Constructing Human Proteoform Families Using Intact-Mass and Top-Down Proteomics with a Multi-Protease Global Post-Translational Modification Discovery Database

Yunxiang Dai et al. J Proteome Res.

Abstract

Complex human biomolecular processes are made possible by the diversity of human proteoforms. Constructing proteoform families, groups of proteoforms derived from the same gene, is one way to represent this diversity. Comprehensive, high-confidence identification of human proteoforms remains a central challenge in mass spectrometry-based proteomics. We have previously reported a strategy for proteoform identification using intact-mass measurements, and we have since improved that strategy by mass calibration based on search results, the use of a global post-translational modification discovery database, and the integration of top-down proteomics results with intact-mass analysis. In the present study, we combine these strategies for enhanced proteoform identification in total cell lysate from the Jurkat human T lymphocyte cell line. We collected, processed, and integrated three types of proteomics data (NeuCode-labeled intact-mass, label-free top-down, and multi-protease bottom-up) to maximize the number of confident proteoform identifications. The integrated analysis revealed 5950 unique experimentally observed proteoforms, which were assembled into 848 proteoform families. Twenty percent of the observed proteoforms were confidently identified at a 3.9% false discovery rate, representing 1207 unique proteoforms derived from 484 genes.

Keywords: Jurkat; MetaMorpheus; NeuCode; Proteoform Suite; global PTM discovery; human proteoform; intact-mass; multi-protease; proteoform family; top-down.

Figures

Figure 1.
Schematic of sample preparation for intact-mass, top-down, and bottom-up proteomics. In this study, “intact-mass proteomics” refers to MS1-only analysis with no precursor fragmentation, while “top-down proteomics” refers to tandem MS analysis with precursor fragmentation and MS2 analysis.
Figure 2.
Schematic of data processing and analysis for proteoform identification and family construction using intact-mass, top-down, and bottom-up proteomics data.
Figure 3.
Number of identified NeuCode intact-mass experimental proteoforms from Proteoform Suite analyses using three different protein databases. Identified proteoforms were grouped by PTM count (upper panel). The theoretical proteoform catalog size for each analysis is indicated (lower panel). The overall identification FDR for these three analyses was maintained at ~5%.
Figure 4.
Two examples of proteoform families constructed using NeuCode intact-mass data. Three separate Proteoform Suite analyses were performed using UniProt, trypsin-only G-PTM-D, and multi-protease G-PTM-D databases. Gene names (pink squares) connect to all theoretical proteoforms (green nodes) in the family. Theoretical proteoforms are labeled “unmodified” or with PTM information and any terminal amino acid losses. Intact-mass experimental proteoforms (blue nodes) are labeled with their masses and PTMs, as deduced by Proteoform Suite. Experimental proteoforms are arranged counterclockwise in ascending order of mass. The size of each node corresponds to the integrated intensity of that proteoform’s spectral peaks. The edges are labeled with the mass difference of the two connected proteoforms (Da). The accepted mass differences are the result of selecting low-FDR ET and EE pairs during the Proteoform Suite analyses. Turquoise annotations are from the UniProt analysis, while red annotations are new findings or PTM corrections gleaned from analyses using G-PTM-D databases.
Figure 5.
Stepwise results of the Proteoform Suite integration of intact-mass, top-down, and bottom-up data. Overall, 1,207 unique proteoforms were identified, representing 484 genes. In the bottom box, only the 496 unique intact-mass proteoforms are depicted, so as to eliminate the common identifications between intact-mass and top-down. See the Supporting Information, Table S-12 for more detailed results of this analysis.
Figure 6.
Array of the 438 unambiguously identified (i.e., assigned to a single gene) proteoform families that were constructed by the integrated intact-mass/top-down analysis (left) and four example families (right). In addition to the symbols utilized in Figure 4, here we add purple nodes to represent top-down experimental proteoforms, and the blue nodes with red annotations denote new intact-mass identifications arising from the inclusion of top-down data in the Proteoform Suite analysis. Previous versions of families A and B were presented in Figure 4. The versions presented here show new developments in the families upon integrating top-down data. Families C and D are newly identified proteoform families. Note: the families in this figure were modified slightly from the automated output of Proteoform Suite (i.e., some nodes and edges were removed), as described in the Supplementary Experimental Methods.
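
The family graphs described in the captions for Figures 4 and 6 connect proteoforms whose mass difference matches an accepted value, so a family can be read as a connected component of that graph. The following is a minimal sketch of that idea only, not the Proteoform Suite implementation: the function names, the 0.02 Da tolerance, the three-entry shift table, and the example masses are all assumptions made for illustration, and the sketch ignores theoretical proteoforms, lysine counts, and FDR-based selection of accepted mass differences.

```python
from collections import defaultdict

# Monoisotopic mass shifts (Da) for three common PTMs.
# The choice of these three entries is an assumption for illustration.
KNOWN_SHIFTS = {
    "acetylation": 42.010565,
    "phosphorylation": 79.966331,
    "methylation": 14.015650,
}


def match_shift(delta, tolerance=0.02):
    """Return the PTM name whose mass shift matches |delta|, or None."""
    for name, shift in KNOWN_SHIFTS.items():
        if abs(abs(delta) - shift) <= tolerance:
            return name
    return None


def build_families(masses, tolerance=0.02):
    """Group masses into families: connected components of the graph whose
    edges join two masses that differ by a known PTM shift."""
    parent = list(range(len(masses)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = []
    for i in range(len(masses)):
        for j in range(i + 1, len(masses)):
            label = match_shift(masses[j] - masses[i], tolerance)
            if label is not None:
                edges.append((i, j, label))
                parent[find(i)] = find(j)  # merge the two components

    groups = defaultdict(list)
    for i, mass in enumerate(masses):
        groups[find(i)].append(mass)
    families = sorted(sorted(group) for group in groups.values())
    return families, edges


# Hypothetical experimental masses (Da), not taken from the study.
example_masses = [10000.000, 10042.011, 10121.977, 15000.000, 15014.016]
families, edges = build_families(example_masses)
for family in families:
    print(family)
```

With these made-up masses, the first three values form one family (linked by an acetylation-sized and then a phosphorylation-sized difference) and the last two form a second family (linked by a methylation-sized difference). Union-find is used here simply because it keeps the grouping step short; any connected-components routine would serve.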
