Anal Chem. 2018 Jan 16;90(2):1325-1333.

doi: 10.1021/acs.analchem.7b04221. Epub 2017 Dec 22.

Expanding Proteoform Identifications in Top-Down Proteomic Analyses by Constructing Proteoform Families

Leah V Schaffer¹, Michael R Shortreed¹, Anthony J Cesnik¹, Brian L Frey¹, Stefan K Solntsev¹, Mark Scalf¹, Lloyd M Smith¹,²

Affiliations

¹ Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States.
² Genome Center of Wisconsin, University of Wisconsin, 425G Henry Mall, Room 3420, Madison, Wisconsin 53706, United States.

PMID: 29227670
PMCID: PMC5807004
DOI: 10.1021/acs.analchem.7b04221

Leah V Schaffer et al. Anal Chem. 2018.

Abstract

In top-down proteomics, intact proteins are analyzed by tandem mass spectrometry and proteoforms, which are defined forms of a protein with specific sequences of amino acids and localized post-translational modifications, are identified using precursor mass and fragmentation data. Many proteoforms that are detected in the precursor scan (MS1) are not selected for fragmentation by the instrument and therefore remain unidentified in typical top-down proteomic workflows. Our laboratory has developed the open source software program Proteoform Suite to analyze MS1-only intact proteoform data. Here, we have adapted it to provide identifications of proteoform masses in precursor MS1 spectra of top-down data, supplementing the top-down identifications obtained using the MS2 fragmentation data. Proteoform Suite performs mass calibration using high-scoring top-down identifications and identifies additional proteoforms using calibrated, accurate intact masses. Proteoform families, the set of proteoforms from a given gene, are constructed and visualized from proteoforms identified by both top-down and intact-mass analyses. Using this strategy, we constructed proteoform families and identified 1861 proteoforms in yeast lysate, yielding an approximately 40% increase over the original 1291 proteoform identifications observed using traditional top-down analysis alone.
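The abstract states that masses are calibrated against high-scoring top-down identifications but does not describe the procedure. As a rough illustration only, the sketch below uses a simple stand-in: it estimates one global offset (the median ppm error of the reference identifications) and removes it from every observed mass. The function names and the example masses are hypothetical and are not taken from Proteoform Suite.

```python
from statistics import median


def ppm_error(observed, theoretical):
    """Signed mass error of an observation in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def calibrate(masses, reference_pairs):
    """Remove a global ppm offset from observed masses.

    reference_pairs: (observed_mass, theoretical_mass) tuples taken from
    confident identifications (hypothetical input format).
    Returns the corrected masses and the estimated offset in ppm.
    """
    offset_ppm = median(ppm_error(o, t) for o, t in reference_pairs)
    corrected = [m / (1 + offset_ppm * 1e-6) for m in masses]
    return corrected, offset_ppm


# Made-up reference identifications, each measured about 5 ppm too heavy.
references = [(10000.05, 10000.0), (15000.075, 15000.0), (20000.1, 20000.0)]
corrected, offset = calibrate([12000.06], references)
print(round(offset, 3), round(corrected[0], 4))
```

A single global offset is the simplest possible model; a real calibration may well depend on retention time, m/z, or the individual file.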

Figures

**Figure 1**
Overview of the strategy for constructing proteoform families in Proteoform Suite with top-down proteomic data. Proteoform Suite contains a workflow consisting of four steps. Top-down proteomic data is first analyzed by TDPortal to produce a list of top-down proteoform identifications (step 1), and MS1 spectra from the same MS files are deconvoluted to produce a list of observed intact-mass experimental proteoforms (step 2). Then, a protein database is used to create a list of theoretical proteoforms (step 3). Finally, the masses of these three types of proteoforms are compared (step 4). Proteoform Suite outputs a list of identified proteoforms and visualized proteoform families (sets of proteoforms derived from the same gene).

**Figure 2**
Proteoform pairs are formed between experimental proteoform masses and theoretical proteoform masses. Experimental proteoforms are composed of top-down measurements (purple circles, identifications from TDPortal) and intact-mass measurements (blue circles, observed by deconvolution, not identified by TDPortal). Theoretical nodes are generated from a UniProt database (green circles). Experimental proteoform masses are compared to theoretical proteoform masses (ET pairs, lines between green and blue or purple circles), as well as to one another (EE pairs, lines between blue or purple circles). The lines representing the EE and ET pairs are labeled in orange with the mass difference between the two proteoforms connected. Proteoform pairs that correspond to a known set of modifications are accepted and joined to form proteoform families.
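The pairing logic in this caption can be sketched in a few lines. The example below is a minimal illustration, not Proteoform Suite's implementation: it accepts an ET or EE pair when the mass difference matches one of a small set of known modification shifts within a tolerance, then joins accepted pairs into families with a union-find structure. The tolerance, the short modification list, and all node names and masses are assumptions chosen for the example.

```python
from itertools import combinations

# Monoisotopic mass shifts in Da; an illustrative subset, not the software's list.
KNOWN_SHIFTS = {
    "unmodified": 0.0,
    "methylation": 14.015650,
    "acetylation": 42.010565,
    "phosphorylation": 79.966331,
}
TOLERANCE_DA = 0.02  # hypothetical acceptance window


def match_shift(delta):
    """Name of the known shift matching |delta|, or None if there is none."""
    for name, shift in KNOWN_SHIFTS.items():
        if abs(abs(delta) - shift) <= TOLERANCE_DA:
            return name
    return None


def build_families(theoretical, experimental):
    """Both arguments map a unique node name to a mass in Da."""
    accepted = []
    # ET pairs: every experimental mass against every theoretical mass.
    for t, t_mass in theoretical.items():
        for e, e_mass in experimental.items():
            label = match_shift(e_mass - t_mass)
            if label is not None:
                accepted.append((t, e, label))
    # EE pairs: experimental masses against one another.
    for (a, a_mass), (b, b_mass) in combinations(experimental.items(), 2):
        label = match_shift(b_mass - a_mass)
        if label is not None and label != "unmodified":
            accepted.append((a, b, label))

    # Union-find: nodes connected by accepted pairs share a family.
    parent = {n: n for n in list(theoretical) + list(experimental)}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, _ in accepted:
        parent[find(a)] = find(b)
    groups = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return [g for g in groups.values() if len(g) > 1], accepted


theoretical = {"T1": 10000.000}
experimental = {"E1": 10042.011, "E2": 10056.027, "E3": 23456.700}
families, pairs = build_families(theoretical, experimental)
print(families)
print(pairs)
```

In this made-up data, E1 is linked to T1 by an acetylation-sized shift (an ET pair), and E2 is linked to E1 by a methylation-sized shift (an EE pair), so all three nodes fall into one family while E3 stays unassigned.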

**Figure 3**
Proteoform and protein identification results. The top graphic displays how Proteoform Suite increased the number of proteoform identifications by 40% (570 new identifications) using intact-mass determinations from a top-down (MS2) data set. The bottom graphic displays how the number of unique protein IDs (each corresponding to a particular gene) increased by 18% (68 newly identified proteoform families).
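For reference, the proteoform counts reported in the abstract (1291 by top-down analysis alone, 1861 after adding intact-mass identifications) reproduce the 570 new identifications cited here. The short calculation below shows the arithmetic; the variable names are illustrative.

```python
top_down_ids = 1291   # proteoforms identified by top-down analysis alone
combined_ids = 1861   # proteoforms identified after adding intact-mass analysis

new_ids = combined_ids - top_down_ids
percent_increase = 100 * new_ids / top_down_ids
print(new_ids, round(percent_increase, 1))
```

The exact ratio works out to roughly 44%, which the source rounds to approximately 40%.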

**Figure 4**
(A) Visualization of the 1022 proteoform families, composed of 3903 proteoforms from the integration of top-down and intact-mass experimental proteoforms. The visualization of proteoform families allows all identified proteoforms from a given gene to be viewed in a single graphic, illustrating the combinations of PTMs and/or cleavage products present in the family. In this figure, proteoform families are arranged with a gene at the bottom; moving counterclockwise, any theoretical proteoforms are arranged by decreasing mass. Continuing counterclockwise, any experimental proteoforms are arranged by increasing mass. (B) The proteoform family for yeast gene *RPL11A* was identified by top-down analysis in TDPortal but not in Proteoform Suite due to mass error of the precursor proteoform masses. Therefore, the top-down proteoforms were not formed into accepted ET pairs. (C) The proteoform family for yeast gene *STF2* was missed by top-down analysis but was identified by intact-mass analysis. The lines between visualized theoretical and experimental proteoforms allow the user to know which proteoforms were identified by ET pairs (were present in the theoretical database) and which proteoforms were identified by the EE comparison (a mass shift from previously identified experimental proteoforms). This demonstrates how intact-mass analysis of proteoforms observed in MS1 spectra can identify additional proteoforms missed by top-down analysis. (D) The proteoform family for yeast gene *LSM5* shows how a proteoform identified by top-down analysis can be leveraged to identify additional experimental proteoforms by their intact masses alone. In this case, the top-down analysis identified the acetylated form of *LSM5* and comparison to other experimental proteoform masses revealed an acetylated form with a cleaved C-terminal leucine residue.
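The *LSM5* case in panel D relies on an EE mass difference that matches the loss of a single amino acid residue. As a hedged sketch of that idea, the function below looks up which residues have a monoisotopic mass matching a given difference. The masses used in the example are made up, and the tolerance and function name are assumptions rather than anything taken from Proteoform Suite.

```python
# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_loss_candidates(delta, tolerance=0.02):
    """Residues whose mass matches |delta| within a hypothetical tolerance."""
    return [aa for aa, mass in RESIDUE_MASSES.items()
            if abs(abs(delta) - mass) <= tolerance]


# Hypothetical masses: an identified proteoform and a lighter unidentified one.
identified_mass = 10042.011
unidentified_mass = 9928.927
print(residue_loss_candidates(unidentified_mass - identified_mass))
```

Leucine and isoleucine are isobaric, so the mass difference alone cannot tell them apart; the known protein sequence is what indicates which terminal residue was lost.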

Cited by

Mass Spectrometry-Based Proteomic Technology and Its Application to Study Skeletal Muscle Cell Biology.
Dowling P, Swandulla D, Ohlendieck K. Cells. 2023 Nov 1;12(21):2560. doi: 10.3390/cells12212560. PMID: 37947638. Review.
Proteoform Analysis and Construction of Proteoform Families in Proteoform Suite.
Schaffer LV, Shortreed MR, Smith LM. Methods Mol Biol. 2022;2500:67-81. doi: 10.1007/978-1-0716-2325-1_7. PMID: 35657588.
Evaluation of linear models and missing value imputation for the analysis of peptide-centric proteomics.
Berg P, McConnell EW, Hicks LM, Popescu SC, Popescu GV. BMC Bioinformatics. 2019 Mar 14;20(Suppl 2):102. doi: 10.1186/s12859-019-2619-6. PMID: 30871482.
Proteoform Suite: Software for Constructing, Quantifying, and Visualizing Proteoform Families.
Cesnik AJ, Shortreed MR, Schaffer LV, Knoener RA, Frey BL, Scalf M, Solntsev SK, Dai Y, Gasch AP, Smith LM. J Proteome Res. 2018 Jan 5;17(1):568-578. doi: 10.1021/acs.jproteome.7b00685. Epub 2017 Dec 15. PMID: 29195273.
Identification and Quantification of Murine Mitochondrial Proteoforms Using an Integrated Top-Down and Intact-Mass Strategy.
Schaffer LV, Rensvold JW, Shortreed MR, Cesnik AJ, Jochem A, Scalf M, Frey BL, Pagliarini DJ, Smith LM. J Proteome Res. 2018 Oct 5;17(10):3526-3536. doi: 10.1021/acs.jproteome.8b00469. Epub 2018 Sep 18. PMID: 30180576.
