Nat Methods. 2025 Oct;22(10):2127-2137.
doi: 10.1038/s41592-025-02846-5. Epub 2025 Sep 29.

Uncovering hidden protein modifications with native top-down mass spectrometry


Jack L Bennett et al. Nat Methods. 2025 Oct.

Abstract

Protein modifications drive dynamic cellular processes by modulating biomolecular interactions, yet capturing these modifications within their native structural context remains a significant challenge. Native top-down mass spectrometry promises to preserve the critical link between modifications and interactions. However, current methods often fail to detect uncharacterized or low-abundance modifications, limiting insights into proteoform diversity. To address this gap, we introduce precise and accurate Identification Of Native proteoforms (precisION), an interactive end-to-end software package that leverages a robust, data-driven fragment-level open search to detect, localize and quantify 'hidden' modifications within intact protein complexes. Applying precisION to four therapeutically relevant targets, namely PDE6, ACE2, osteopontin (SPP1) and a GABA transporter (GAT1), we discover undocumented phosphorylation, glycosylation and lipidation, and resolve previously uninterpretable density in an electron cryo-microscopy map of GAT1. As an open-source software package, precisION offers an intuitive means for interpreting complex protein fragmentation data. This tool will empower the community to unlock the potential of native top-down mass spectrometry, advancing integrative structural biology, molecular pathology and drug development.

Conflict of interest statement

Competing interests: J.L.B., T.J.E., C.A.L. and C.V.R. are listed as inventors on a pending European patent application EP24190466 entitled ‘Improved mass spectrometry methods’, assigned to the University of Oxford, describing approaches for analyzing top-down mass spectra. C.V.R. is a cofounder of and scientific advisor at OMass Therapeutics. The other authors declare no competing interests.

Figures

Fig. 1. Data-driven interpretation of native top-down mass spectra in precisION.
a, Overview of an nTDMS measurement, illustrated here with endogenous bovine PDE6, a heterotetrameric complex composed of three distinct subunits. Intact protein complexes, exhibiting peak widths of 10–1,000-Da full width at half maximum in deconvolved spectra, are selected and activated in the gas phase, yielding fragment ions measured with both: (1) sufficient resolving power to distinguish between similar protein modifications, and (2) high mass accuracy (errors typically less than 20 mDa) to confidently identify these modifications. precisION interprets the resulting spectra, assigning (modified) sequence ions which can be reassembled to uncover the diversity of proteoforms assembled within the antecedent complexes. Here, all three subunits of bovine PDE6 were identified from isolated bovine rod outer segment membranes after the intact protein complex was activated using infrared photons. Co- and post-translational modifications including N-terminal methionine excision, N-terminal acetylation and cysteine lipidation were confidently identified from the fragment mass spectrum alone. Regions corresponding to the observed fragments are highlighted on the AlphaFold 3 (ref. ) structure of the assembled complex. b, Schematic overview of the workflows employed by precisION for spectral deconvolution, protein isoform identification, fragment assignment and modification discovery. Isotopic envelopes are first picked and filtered (deconvolution), before the identity of the fragmented protein complex is determined using de novo sequencing and/or an open database search (isoform identification). Unmodified terminal fragments are then assigned through a semi-supervised hierarchical scheme (fragment assignment) before a fragment-level open search is used to discover and assign sets of internal fragments and co-/post-translationally modified terminal fragments (modification discovery). Mod mass, modification mass; Prob., probability.
Fig. 2. precisION localizes N-glycosylation on the dimeric human ACE2.
a, AlphaFold 3 structure of the ACE2 N432KO ectodomain dimer produced in HEK293 GNTI−/− cells. Man5GlcNAc2 glycans are displayed in blue. b, Mass spectrum of ACE2 N432KO (300 mM ammonium acetate, pH 7.0). An ensemble of ions assigned to the ACE2 dimer with 10–11 N-glycans and other unknown modifications (27+) was selected using the quadrupole and activated using ion–neutral collisions (HCD 130–200 V). The annotated MS2 spectrum generated at an HCD acceleration voltage of 150 V is displayed below. Inset are isotopic envelopes corresponding to sequence ions arising from distinctly modified forms of ACE2. c, Multinotch fragment-level open search used to identify the C terminus of the ACE2 construct. Variable proteolytic processing was observed. MS1- and MS2-based quantifications of the different truncated forms are shown inset. d, Fragment map illustrating the position of the fragmentation sites along the backbone of ACE2. Individual horizontal lines correspond to sequence ions assigned from the combined HCD dataset. N-glycosylation sites detected using bottom-up proteomics are displayed above the map. In some cases, we observed unmodified fragments enclosing glycosylation sites; this is likely due to the complete loss of the glycan upon activation. The right panel displays the positions of glycosylated and nonglycosylated sequence ions along the N terminus of ACE2. N-glycan sequons (N-X-S/T) are highlighted with dashed lines. e, Fragment-level open search results for the ACE2 dimer at different HCD acceleration voltages. Multiple significant offsets were observed. NGS, Asn-Gly-Ser; NIT, Asn-Ile-Thr; NLT, Asn-Leu-Thr.
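The N-X-S/T sequons highlighted in panel d can be located in a sequence with a short pattern search. The following is a minimal sketch, not part of precisION: the function name and the toy sequence are hypothetical, and the exclusion of proline at the X position is the conventional definition of the sequon rather than something stated in the caption.

```python
import re

def find_sequons(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based Asn position, tripeptide) for every N-X-S/T motif.

    Assumption: X may be any residue except proline (conventional sequon rule).
    """
    # A lookahead is used so that overlapping motifs (e.g. "NNSS") are all reported.
    pattern = re.compile(r"(?=(N[^P][ST]))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]

# Hypothetical toy sequence for illustration only; this is not the ACE2 sequence.
toy_sequence = "MKNGSAANITPPNLTNPSQ"
print(find_sequons(toy_sequence))  # [(3, 'NGS'), (8, 'NIT'), (13, 'NLT')]
```

The NPS tripeptide at position 16 of the toy sequence is skipped because of the proline rule.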
Fig. 3. Human SPP1 is variably truncated and phosphorylated beneath a layer of glycosylation.
a, Schematic illustrating the maturation and function of human SPP1. During and after secretion, SPP1 is extensively modified to generate a diverse range of proteoforms that interact with receptors on surrounding cells. b, Mass spectrum of SPP1 (1 M ammonium acetate, pH 7.0). A broad range of ions (m/z 3,400 ± 600) were selected using the quadrupole and activated using ion–neutral collisions (sceHCD 105–135 V). The annotated MS2 spectrum generated at an HCD acceleration voltage of 120 V is displayed below. Two truncated forms of the protein are annotated in different shades of green. Inset are the isotopic envelopes detected for y191+ with 0, 1 or 2 phosphate groups. Intensities are scaled in each case to ensure each envelope can be clearly observed. c, Multinotch fragment-level open search used to identify the termini of SPP1. Data from three HCD acceleration voltages were combined before conducting the search. The upper plot displays the b-type ion search used to characterize the N terminus, while the lower plot displays the y-type ion search used to characterize the C terminus. A linear representation of the protein is displayed above the plot. PTMs with a score of 4 on iPTMnet are marked at their respective positions, along with the detected cleavage sites. d, Fragment-level open search results examining the mass offsets corresponding to one or two phosphate groups. Data from three HCD acceleration voltages were combined before conducting the search. e, Scatter plot illustrating the mean number of phosphates between SPP1 residues and the nearest terminus (purple squares for the N terminus and pink circles for the C terminus) for full-length SPP1, as measured by nTDMS. Data are presented as mean ± s.d. from n = 8 independent MS2 spectra (acquired with different isolation windows and HCD acceleration voltages) of the same purified protein preparation. Previously observed phosphosites from iPTMnet are displayed above the plot. The observed sequence ions do not span the central region of the protein.
Fig. 4. The human GABA transporter GAT1 is covalently lipidated to fill an interaction site within the plasma membrane.
a, Schematic illustrating the action of GAT1 at synapses. GABA is released into the synaptic cleft where it activates ionotropic and metabotropic GABA receptors. After its release, GABA is transported into surrounding glial cells and the presynaptic neuron by GAT1. The neurotransmitter is then packaged back into vesicles by the vesicular GABA transporter (VGAT). b, Mass spectrum of GAT1 (400 mM ammonium acetate, 2 × CMC DDM/CHS, pH 7.0). Post-translationally modified ions (m/z 3,983 ± 2.5) were selected using the quadrupole and activated with infrared photons (stepped laser power IRMPD (slpIRMPD) 6.0–8.4 W, 10 ms). The annotated MS2 spectrum generated at a laser output power of 7.2 W is displayed below. Inset is the y1546+ sequence ion envelope. Differentially lipidated forms of this ion were detected and modeled. c, AlphaFold database structure of human GAT1 with representative glycan structures from GlycoSHIELD. d, Fragment-level open search results for the C terminus of GAT1. e, Fragment map illustrating the position of the unmodified (gray) and lipidated (blues) sequence ions along the C terminus of GAT1. Individual horizontal lines correspond to sequence ions assigned from the combined slpIRMPD dataset. Potential cysteine residues where lipid modifications may occur are highlighted above the plot. f, Bar chart illustrating the relative abundances of different fatty acids in HEK293 GNTI−/− cells (black) or conjugated to Cys493 (blues). Data are presented as mean ± s.d.; for protein-bound lipids, n = 6 independent isotopic envelope fits, and for free fatty acids, n = 3 independent lipid extractions. g, EM density map (PTMs and lipid density contoured at 3σ) of the modeled palmitoyl cysteine and PtdEtn lipid on human GAT1 (EMD-33674, PDB 7Y7Y). The palmitate group was modeled covalently on Cys493 and a PtdEtn lipid was modeled adjacent to this moiety. FA, fatty acid.
Extended Data Fig. 1. Compositional analysis of endogenous PDE6 with precisION reveals insights into complex maturation.
a, Annotated native top-down mass spectrum of endogenous bovine PDE6 isolated from rod outer segment disc membranes (200 mM ammonium acetate, pH 7.0). The heterotetrameric complex (30+; m/z 7,250 ± 5; see Fig. 1a) was selected using the quadrupole and activated with infrared photons (IRMPD, 8.4 W, 5 ms). Inset are the isotopic envelope of the geranylgeranylated y101+ sequence ion, the top three spectrum matches ranked by native fragmentation propensity score (nFPS), and an MS3 spectrum supporting the geranylgeranylated y101+ assignment. b, Sequence coverage maps for all three bovine PDE6 subunits highlighting observed terminal fragments. Regions without terminal coverage are noted. c, Fragment-level open search results for the C terminus of PDE6β. A peak consistent with geranylgeranylation can be observed just above the noise. d, MS2 sequencing depth per residue for PDE6α, PDE6β, and PDE6γ subunits, indicating how often each residue was observed within the set of sequence ions. e, Bar plots displaying total terminal fragment ion intensity for each cleavage site along the protein sequence, with projected scatter plots above indicating the charge states of the observed ions. b-type ions are displayed in blue while y-type ions are indicated in red.
Extended Data Fig. 2. Denaturing and native top-down mass spectrometry measurements offer distinct data for proteoform identification.
a, Schematic overview of a typical denaturing top-down mass spectrometry measurement. Protein ions are generated via electrospray ionization from acidified mixtures of aqueous and organic solvents, often following online separation (for example, liquid chromatography). The resulting ions are typically monomeric, highly charged, and often small and homogeneous, facilitating analysis by high-resolution mass spectrometry. Fragmenting protein ions from denaturing solutions generally yields product ion spectra with high signal-to-noise ratios. The masses of the precursor and fragments can be used to identify the precursor ion by searching a library of theoretical proteoforms, such as through a traditional open search with a ±1–500 Da precursor tolerance. b, Schematic overview of a native top-down mass spectrometry (nTDMS) measurement. Intact protein complexes are ionized from electrolyte solutions at physiological pH, typically using a static nanoelectrospray ion source. The resulting ions can comprise multiple subunits (it is often not possible to dissociate individual subunits in the gas phase), each potentially bearing diverse modifications, affording a heterogeneous set of unresolvable molecular species. Product ion spectra are typically of lower quality when compared to denaturing analyses, with reduced sequence coverage. Additionally, in targeted nTDMS studies, proteoforms with unexpected or uncommon post-translational modifications (PTMs) are frequently of particular interest; these PTMs are not represented in standard proteoform libraries. PrSM, proteoform–spectrum match.
Extended Data Fig. 3. Real and artefactual isotopic envelopes detected by deconvolution algorithms cannot be readily discriminated using single scoring measures.
Summary of isotopic envelopes classified as true or false by precisION’s supervised voting classifier. The envelopes were identified from a fragment spectrum of the ACE2 dimer activated using beam-type collision-induced dissociation (HCD 130 V). a, UpSet plot illustrating the intersections between the isotopic envelopes detected by precisION’s six deconvolution algorithms. The Thrash algorithm is executed with various scoring thresholds due to its iterative nature; stricter score requirements do not yield a subset of the envelopes identified with more lenient requirements. On the left side, horizontal bars represent the total number of envelopes each algorithm detected, with purple indicating true envelopes and blue indicating false ones. In the main plot, columns of filled circles show which algorithms detected the same sets of envelopes. Each column represents a unique combination of algorithms, and the bar above each column shows the number of envelopes consistently detected by that combination. Sets with fewer than 15 members were excluded from the plot. b, Frequency histograms showing the distribution of envelope fit scores for both true and false envelopes. Notably, many artefactual envelopes exhibit higher fit scores than true envelopes. Isotopic envelopes are often filtered by fit score in established workflows. c, Scatter plot depicting the distribution of intensity and interference scores for true and false envelopes. There is substantial overlap between the two classifications, illustrating the difficulty in distinguishing between real and artefactual envelopes.
Extended Data Fig. 4. precisION’s fragment-level open search can detect a diverse range of fragment ion modifications without constraint.
Schematic overview of the fragment-level open search. In a first-pass search, observed ions are assigned to unmodified sequence ions. After this search, the residual unassigned ions are assumed to consist mainly of modified sequence ions. To identify these modifications, a mass offset is applied to each of the theoretical sequence ions, then the number of matches between the observed ions and the offset theoretical ions is counted. By scanning across a continuous range of mass offsets using a sliding window approach, the algorithm can identify offsets that produce a significantly high number of matches. Peaks in this scan indicate sets of sequence ions with a common modification. Here, an example mass offset of +79.965 Da is shown to produce more matches than the background. Searching for this mass offset in UniMod suggests that it corresponds to a set of phosphorylated sequence ions.
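The offset scan described in this caption can be illustrated with a minimal sketch. This is not the precisION implementation: the function names, the toy masses, the scan range, the step size and the 0.02 Da tolerance are all assumptions chosen for illustration, and the significance test applied to the resulting peaks is not reproduced here.

```python
from bisect import bisect_left, bisect_right

def count_matches(sorted_observed, theoretical, offset, tol):
    """Count theoretical ions that, shifted by `offset`, fall within `tol` Da of an observed mass."""
    hits = 0
    for mass in theoretical:
        target = mass + offset
        lo = bisect_left(sorted_observed, target - tol)
        hi = bisect_right(sorted_observed, target + tol)
        if hi > lo:  # at least one observed mass lies inside the tolerance window
            hits += 1
    return hits

def offset_scan(observed, theoretical, start, stop, step, tol=0.02):
    """Slide a mass offset from `start` to `stop` and record the match count at each step."""
    sorted_observed = sorted(observed)
    n_steps = int(round((stop - start) / step)) + 1
    scan = []
    for i in range(n_steps):
        offset = start + i * step
        scan.append((offset, count_matches(sorted_observed, theoretical, offset, tol)))
    return scan

# Hypothetical toy data: theoretical sequence-ion masses (Da) and residual unassigned masses.
theoretical = [1000.0, 1500.0, 2100.0, 2750.0]
residual = [1079.965, 1579.965, 2179.965, 1234.5, 2600.0]

scan = offset_scan(residual, theoretical, start=0.0, stop=100.0, step=0.005)
best_offset, best_count = max(scan, key=lambda pair: pair[1])
print(f"best offset ~ {best_offset:.3f} Da with {best_count} matches")
```

In the toy data, three residual masses share a common shift relative to the theoretical ions, so the scan shows a peak of three matches near that shift and zero matches elsewhere.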
Extended Data Fig. 5. Internal fragments generated by the collisional activation of native protein complexes share common terminal fragmentation sites.
a, Counts of internal fragments sharing specific N- or C-terminal fragmentation sites observed upon collisional activation of the dimeric ACE2 complex. Counts for each set of internal fragments sharing an N- (upper) or C-terminal (lower) fragmentation site are presented for the ‘true’ target set of theoretical ions (blue), as well as for an example set of decoy ions generated by adding the mass of acetate (42.0105 Da) to each theoretical ion (grey). Across all examined collision energies, there are sets of internal fragments with a shared fragmentation site that were larger in population than anticipated when assuming random matching. Sets of internal fragments with a shared fragmentation site that were deemed statistically significant (expectation value < 0.01) were selected for assignment (purple marks). b, Distribution of the assigned internal fragments along the length of ACE2. Horizontal lines represent individual sequence ions, with ions generated at HCD 150 V shown in grey and at HCD 200 V shown in black. Internal fragments are formed from localized areas, and are primarily generated by cleavage at high-propensity fragmentation sites (for example, D|P fragments). c, Number of statistically significant (expectation value < 0.01) sets of internal fragments with a common terminal fragmentation site identified at increasing HCD acceleration voltages. d, Number of internal fragments assigned at increasing HCD acceleration voltages.
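The counting step in panel a, grouping internal fragments by a shared N- or C-terminal fragmentation site, can be sketched as follows. The function name and the toy fragment list are hypothetical, fragments are represented simply as (start, end) residue indices, and the expectation-value calculation used to judge significance is not reproduced.

```python
from collections import Counter

def shared_site_counts(fragments):
    """Count internal fragments per shared N-terminal (start) and C-terminal (end) site.

    `fragments` is an iterable of (start, end) residue indices.
    """
    fragments = list(fragments)  # allow generators, since the data are traversed twice
    n_sites = Counter(start for start, _ in fragments)
    c_sites = Counter(end for _, end in fragments)
    return n_sites, c_sites

# Hypothetical toy assignments for illustration only.
toy_fragments = [(120, 135), (120, 142), (120, 150), (88, 150), (97, 150), (201, 215)]
n_sites, c_sites = shared_site_counts(toy_fragments)
print(n_sites.most_common(1))  # [(120, 3)]
print(c_sites.most_common(1))  # [(150, 3)]
```

Running the same count on a decoy set and comparing the two distributions is one way to see which shared sites exceed what random matching would produce.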
Extended Data Fig. 6. precisION’s shared termini filter enables robust internal fragment assignment.
Fragment false discovery rate (FDR) for internal fragments generated upon collisional activation of the ACE2 dimer (HCD 130–200 V). Without applying the shared termini filter, FDRs exceed 40%, whether using precisION or TDValidator (which additionally incorporates an envelope fitting score filter). By applying precisION’s shared termini filter with an E-value threshold of 0.01, the FDR is reduced by up to ~20-fold, even before further examination of poor envelope fits.
Extended Data Fig. 7. N-glycans can be confidently identified and localized on intact, denatured proteins using top-down mass spectrometry with collisional activation.
a, Mass spectrum of Gallus gallus avidin acquired under denaturing and reducing conditions (50% MeCN, 1% formic acid in H2O (v/v), 5 mM TCEP). An envelope of highly charged proteoforms was selected (19+; m/z 830 ± 30) with the quadrupole and activated using ion–neutral collisions (HCD 10–15 V) to yield abundant sequence ions. The resulting MS2 spectrum generated at an HCD acceleration voltage of 12.5 V is displayed below. Inset are isotopic envelopes corresponding to sequence ions that have retained a HexNAc moiety throughout fragmentation. b, Deconvoluted mass spectrum illustrating the distribution of avidin proteoforms existing in the sample. All assignments were made on the basis of findings from top-down MS measurements. c, Multinotch fragment-level open search used to identify avidin termini. Signal peptide cleavage can be observed along with variable C-terminal truncation. d, Fragment-level open search used to identify N-terminal modifications on avidin. b-type ions retaining HexNAc or large glycans can be observed. These ions were used to inform compositional assignments at the MS1 level. e, Fragment map illustrating the position of the fragmentation sites along the backbone of avidin. Individual horizontal lines correspond to sequence ions assigned from the combined HCD dataset. Sequence ions retaining glycan structures are displayed in red. N-glycan sequons (N-X-S/T) are highlighted with dashed lines.
Extended Data Fig. 8. Deglycosylation of SPP1 confirms low protein phosphorylation.
Native mass spectrum of human SPP1 after treatment with a cocktail of glycosidases to completely remove all O-glycans. Two truncated proteoforms could be observed above the noise: SPP1[17-314] and SPP1[17-246]. Proteoforms with up to six phosphate groups could be observed for the full-length protein. However, the major stoichiometric form had no phosphate modifications. The average number of phosphates per protein molecule across the full protein population was calculated to be 2.0.
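An average number of phosphates per molecule of this kind is typically an intensity-weighted mean over the observed phosphorylation states. The sketch below shows that arithmetic under stated assumptions: the function name is hypothetical, and the intensities are invented toy values that do not reproduce the measured SPP1 distribution or the reported value.

```python
def mean_phosphates(intensities: dict[int, float]) -> float:
    """Intensity-weighted mean number of phosphate groups per protein molecule.

    `intensities` maps phosphate count -> summed signal intensity for that proteoform.
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return sum(count * signal for count, signal in intensities.items()) / total

# Hypothetical relative intensities for 0 to 6 phosphates (toy values, not measured data).
toy_intensities = {0: 30.0, 1: 20.0, 2: 15.0, 3: 15.0, 4: 10.0, 5: 6.0, 6: 4.0}
print(round(mean_phosphates(toy_intensities), 2))  # 1.89
```

As in the caption, the unphosphorylated form can be the most abundant single species while the population mean is still well above zero.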
Extended Data Fig. 9. Free fatty acid profile of the cell line used for GAT1 expression.
Normalized abundance of free fatty acids in the HEK293 GNTI−/− cell line used for GAT1 expression. A representative negative mode electrospray ionization mass spectrum used for quantification is displayed as an inset. Fatty acids were observed as carboxylates and are notated as Cx:y where x is the chain length and y is the number of double bonds. Data are presented as the mean ± s.d. from n = 3 independent lipid extractions. Individual data points greater than 1% are displayed.
Extended Data Fig. 10. GAT1 lipidation fills a potential lipid binding site by mimicking cholesterol.
a, Structure of human GAT1 (PDB ID 7Y7Y) with a palmitate moiety attached to Cys493 (grey). b, Structure of porcine SERT (PDB ID 8DE4) with a bound cholesterol hemisuccinate lipid (grey). c, Sequence alignment of human GAT1 and porcine and human SERT. The lipidated cysteine in GAT1 (Cys493) is not present in either SERT, preventing lipid conjugation at this site.
