Anal Chem. 2018 Jan 16;90(2):1325-1333.
doi: 10.1021/acs.analchem.7b04221. Epub 2017 Dec 22.

Expanding Proteoform Identifications in Top-Down Proteomic Analyses by Constructing Proteoform Families

Leah V Schaffer et al. Anal Chem.

Abstract

In top-down proteomics, intact proteins are analyzed by tandem mass spectrometry, and proteoforms (defined forms of a protein with a specific amino acid sequence and localized post-translational modifications) are identified from precursor mass and fragmentation data. Many proteoforms detected in the precursor scan (MS1) are never selected for fragmentation by the instrument and therefore remain unidentified in typical top-down proteomic workflows. Our laboratory has developed the open-source software program Proteoform Suite to analyze MS1-only intact proteoform data. Here, we adapted it to identify proteoform masses in the precursor MS1 spectra of top-down data, supplementing the top-down identifications obtained from MS2 fragmentation data. Proteoform Suite calibrates masses using high-scoring top-down identifications and then identifies additional proteoforms from the calibrated, accurate intact masses. Proteoform families, the sets of proteoforms derived from a given gene, are constructed and visualized from the proteoforms identified by both top-down and intact-mass analyses. Using this strategy, we constructed proteoform families and identified 1861 proteoforms in yeast lysate, an increase of approximately 40% over the 1291 proteoforms identified by traditional top-down analysis alone.
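The calibrate-then-match step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Proteoform Suite's implementation: it assumes calibration is a single global ppm offset estimated as the median mass error of high-scoring top-down identifications, and that an MS1 mass is identified when its calibrated value falls within a hypothetical 10 ppm tolerance of a theoretical proteoform mass. All function names, masses, and the tolerance are hypothetical.

```python
from statistics import median


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error of an observed mass relative to a theoretical mass, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def estimate_offset_ppm(td_hits: list[tuple[float, float]]) -> float:
    """Median ppm error over (observed, theoretical) mass pairs.

    Assumption: td_hits come from high-scoring top-down identifications,
    whose theoretical masses are trusted.
    """
    return median(ppm_error(obs, theo) for obs, theo in td_hits)


def calibrate(mass: float, offset_ppm: float) -> float:
    """Remove a global ppm offset: if mass = true * (1 + offset), return true."""
    return mass / (1 + offset_ppm * 1e-6)


def match_intact_masses(observed, theoretical, tol_ppm=10.0):
    """Pair each calibrated observed mass with every theoretical proteoform
    whose mass lies within tol_ppm (hypothetical default of 10 ppm)."""
    return [
        (obs, name)
        for obs in observed
        for name, theo in theoretical.items()
        if abs(ppm_error(obs, theo)) <= tol_ppm
    ]


if __name__ == "__main__":
    # Hypothetical top-down hits, each observed about 5 ppm high.
    td_hits = [(10000.05, 10000.0), (15000.075, 15000.0), (20000.1, 20000.0)]
    offset = estimate_offset_ppm(td_hits)
    # Hypothetical MS1-only masses, calibrated before matching.
    ms1_masses = [calibrate(m, offset) for m in (12000.06, 13000.0)]
    theoretical = {"P1": 12000.0, "P2": 12100.0}
    print(round(offset, 3), match_intact_masses(ms1_masses, theoretical))
```

A single global offset is the simplest possible correction; a real calibration may model mass error as a function of other measurement variables, which this sketch does not attempt.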


Figures

Figure 1
Overview of the strategy for constructing proteoform families in Proteoform Suite with top-down proteomic data. The Proteoform Suite workflow consists of four steps. Top-down proteomic data are first analyzed by TDPortal to produce a list of top-down proteoform identifications (step 1), and MS1 spectra from the same MS files are deconvoluted to produce a list of observed intact-mass experimental proteoforms (step 2). Next, a protein database is used to create a list of theoretical proteoforms (step 3). Finally, the masses of these three types of proteoforms are compared (step 4). Proteoform Suite outputs a list of identified proteoforms and visualizations of proteoform families (sets of proteoforms derived from the same gene).
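Step 3 (building theoretical proteoforms from a protein database) can be illustrated with a small Python sketch. It computes monoisotopic masses from standard residue masses and, as a purely hypothetical variant set, enumerates each sequence with and without its initiator methionine, each with and without N-terminal acetylation. Proteoform Suite generates its theoretical proteoforms from a UniProt database; the variant rules, function names, and naming scheme below are illustrative assumptions, not its actual procedure.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added once per chain (free N- and C-termini)
ACETYL = 42.010565  # C2H2O added by N-terminal acetylation


def sequence_mass(seq: str) -> float:
    """Monoisotopic neutral mass of an unmodified polypeptide chain."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def theoretical_proteoforms(accession: str, seq: str) -> dict[str, float]:
    """Enumerate a hypothetical set of theoretical proteoforms for one entry:
    full-length and (if present) initiator-Met-removed chains, each with and
    without N-terminal acetylation."""
    chains = {accession: seq}
    if seq.startswith("M") and len(seq) > 1:
        chains[accession + "_-Met"] = seq[1:]
    proteoforms = {}
    for name, chain in chains.items():
        mass = sequence_mass(chain)
        proteoforms[name] = mass
        proteoforms[name + "_acetyl"] = mass + ACETYL
    return proteoforms


if __name__ == "__main__":
    for name, mass in theoretical_proteoforms("EXAMPLE", "MSGK").items():
        print(f"{name}\t{mass:.4f}")
```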
Figure 2
Proteoform pairs are formed between experimental proteoform masses and theoretical proteoform masses. Experimental proteoforms comprise top-down measurements (purple circles; identifications from TDPortal) and intact-mass measurements (blue circles; observed by deconvolution but not identified by TDPortal). Theoretical proteoforms are generated from a UniProt database (green circles). Experimental proteoform masses are compared to theoretical proteoform masses (ET pairs; lines between green and blue or purple circles) and to one another (EE pairs; lines between blue or purple circles). The lines representing EE and ET pairs are labeled in orange with the mass difference between the two connected proteoforms. Proteoform pairs whose mass difference corresponds to a known set of modifications are accepted and joined to form proteoform families.
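The pairing logic in Figure 2 can be sketched in Python. In this minimal sketch, an ET pair is accepted when the experimental-minus-theoretical mass difference matches an entry in a small table of known mass shifts (including zero, for an unmodified match), and an EE pair is accepted when the difference between two experimental masses matches a nonzero shift. The shift table, the 0.02 Da tolerance, and the function names are hypothetical choices for illustration, not Proteoform Suite's actual modification list or acceptance criteria.

```python
from itertools import combinations

# Hypothetical table of known monoisotopic mass shifts (Da).
KNOWN_SHIFTS = {
    "unmodified": 0.0,
    "methyl": 14.01565,
    "oxidation": 15.994915,
    "acetyl": 42.010565,
    "phospho": 79.966331,
}


def annotate(delta: float, tol: float):
    """Return the first known shift within tol of delta, or None."""
    for name, shift in KNOWN_SHIFTS.items():
        if abs(delta - shift) <= tol:
            return name
    return None


def et_pairs(experimental, theoretical, tol=0.02):
    """Accepted experimental-theoretical pairs: (exp_id, theo_id, shift)."""
    pairs = []
    for e_id, e_mass in experimental.items():
        for t_id, t_mass in theoretical.items():
            label = annotate(e_mass - t_mass, tol)
            if label is not None:
                pairs.append((e_id, t_id, label))
    return pairs


def ee_pairs(experimental, tol=0.02):
    """Accepted experimental-experimental pairs: (lighter_id, heavier_id, shift).
    A zero mass difference is skipped: it would pair a proteoform with itself."""
    pairs = []
    for (a, ma), (b, mb) in combinations(experimental.items(), 2):
        light, heavy = (a, b) if ma <= mb else (b, a)
        label = annotate(abs(mb - ma), tol)
        if label not in (None, "unmodified"):
            pairs.append((light, heavy, label))
    return pairs


if __name__ == "__main__":
    theoretical = {"T1": 10000.0}
    experimental = {"E1": 10000.005, "E2": 10042.0155, "E3": 10500.0}
    print(et_pairs(experimental, theoretical))
    print(ee_pairs(experimental))
```

The accepted pairs play the role of the edges in Figure 2; each carries the name of the matched shift, analogous to the orange mass-difference labels.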
Figure 3
Proteoform and protein identification results. The top graphic shows that Proteoform Suite increased the number of proteoform identifications by 40% (570 new identifications) using intact-mass determinations from a top-down (MS2) data set. The bottom graphic shows that the number of unique protein identifications (each corresponding to a particular gene) increased by 18% (68 newly identified proteoform families).
Figure 4
(A) Visualization of the 1022 proteoform families, composed of 3903 proteoforms, obtained by integrating top-down and intact-mass experimental proteoforms. Visualizing proteoform families allows all identified proteoforms from a given gene to be viewed in a single graphic, illustrating the combinations of PTMs and/or cleavage products present in the family. In this figure, each proteoform family is arranged with its gene at the bottom; moving counterclockwise, any theoretical proteoforms are arranged by decreasing mass, and continuing counterclockwise, any experimental proteoforms are arranged by increasing mass. (B) The proteoform family for yeast gene RPL11A was identified by top-down analysis in TDPortal but not by Proteoform Suite, because of mass error in the precursor proteoform masses; as a result, the top-down proteoforms did not form accepted ET pairs. (C) The proteoform family for yeast gene STF2 was missed by top-down analysis but identified by intact-mass analysis. The lines between theoretical and experimental proteoforms show which proteoforms were identified through ET pairs (present in the theoretical database) and which were identified through EE comparisons (a mass shift from a previously identified experimental proteoform). This demonstrates how intact-mass analysis of proteoforms observed in MS1 spectra can identify additional proteoforms missed by top-down analysis. (D) The proteoform family for yeast gene LSM5 shows how a proteoform identified by top-down analysis can be leveraged to identify additional experimental proteoforms by their intact masses alone. Here, top-down analysis identified the acetylated form of LSM5, and comparison with other experimental proteoform masses revealed an acetylated form from which the C-terminal leucine residue had been cleaved.
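Grouping accepted pairs into families, and ordering a family as described in panel A, can be sketched in Python. This sketch assumes a family is a connected component of the graph whose nodes are proteoforms and whose edges are accepted ET and EE pairs; it omits the gene node and only produces the linear order in which members would be placed around it. The function names and masses are hypothetical, chosen so that E1 is an acetylated form of T1 and E2 differs from E1 by one leucine residue (113.08406 Da), mirroring the LSM5 family in panel D.

```python
from collections import defaultdict


def proteoform_families(nodes, edges):
    """Return families as connected components of the accepted-pair graph.

    nodes: iterable of proteoform IDs; edges: iterable of (id_a, id_b) pairs.
    Families are listed in order of their first node; members are sorted by ID.
    """
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, families = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        families.append(sorted(component))
    return families


def layout_order(family, masses, theoretical_ids):
    """Order a family as in panel A: theoretical proteoforms by decreasing mass,
    then experimental proteoforms by increasing mass."""
    theo = sorted((n for n in family if n in theoretical_ids),
                  key=lambda n: masses[n], reverse=True)
    exp = sorted((n for n in family if n not in theoretical_ids),
                 key=lambda n: masses[n])
    return theo + exp


if __name__ == "__main__":
    # Hypothetical family: T1 (theoretical), E1 = T1 + acetyl, E2 = E1 - Leu.
    masses = {"T1": 9000.0, "E1": 9042.010565, "E2": 8928.926505, "E3": 15000.0}
    edges = [("T1", "E1"), ("E1", "E2")]
    for fam in proteoform_families(masses, edges):
        print(layout_order(fam, masses, theoretical_ids={"T1"}))
```

Because families are connected components, a single accepted ET pair is enough to link every EE-connected experimental proteoform to a gene, which is how intact-mass proteoforms such as the leucine-cleaved LSM5 form inherit an identification.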
