J Proteome Res. 2014 Apr 4;13(4):2028-44.
doi: 10.1021/pr401191w. Epub 2014 Mar 10.

Annotating N termini for the human proteome project: N termini and Nα-acetylation status differentiate stable cleaved protein species from degradation remnants in the human erythrocyte proteome

Philipp F Lange et al. J Proteome Res.

Abstract

A goal of the Chromosome-centric Human Proteome Project is to identify all human protein species. With 3844 proteins annotated as "missing", this is challenging. Moreover, proteolytic processing generates new protein species with characteristic neo-N-termini that frequently have altered half-lives, functions, interactions, and locations. Enucleated and largely devoid of internal membranes and organelles, erythrocytes are simple cells, yet they are proteomically challenging because their high hemoglobin content and the wide dynamic range of their protein concentrations impede protein identification. Using the N-terminomics procedure TAILS, we identified 1369 natural and neo-N-termini and 1234 proteins in human erythrocytes. Multiple semitryptic N-terminal peptides exhibited better mass spectrometric identification properties than the corresponding tryptic peptides of the intact proteins, enabling the identification of 281 novel erythrocyte proteins and six missing proteins observed for the first time in the human proteome. Using an improved bioinformatics workflow, we developed a new classification system and the Terminus Cluster Score. With these, we described a new stabilizing N-end rule for processed protein termini, which discriminates novel protein species from degradation remnants, and identified protein domain hot spots susceptible to cleavage. Strikingly, 68% of the N-termini were within genome-encoded protein sequences, revealing alternative translation initiation sites, pervasive endoproteolytic processing, and stabilization of protein fragments in vivo. The mass spectrometry proteomics data have been deposited to ProteomeXchange with the data set identifier PXD000434.

Figures

Figure 1
Identification of proteins and their termini from erythrocytes. (A) Schematic workflow. Erythrocytes (RBC) were separated from leukocytes (WBC) by repeated Ficoll density gradient centrifugations, lysed, and separated into membrane, soluble, and soluble hemoglobin-depleted protein fractions. Proteins were denatured, and the primary amines of free protein N-termini and Lys side chains were blocked by dimethylation (light gray triangle) before digestion with trypsin or GluC. Note that, unlike shotgun proteomics workflows, TAILS requires labeling at the protein level before trypsin or GluC digestion to isolate and identify the N termini present in the sample. For protein identification, an aliquot of the tryptic digest was removed (preTAILS). N-terminal peptides, including both in vitro dimethylated (gray triangle) and naturally blocked (black tilted square) N-termini, were enriched using TAILS. Peptides from preTAILS and TAILS were fractionated offline by SCX chromatography, analyzed by LC–MS/MS, and identified using three different database search engines, followed by statistical validation and protein identification using the Trans-Proteomic Pipeline. (B) Enrichment of erythrocytes and depletion of white blood cells and platelets by repeated gradient (grad.) centrifugations. PLT, platelets; cell counts depicted for: cell pellet, packed cells after serum removal; grad. 1, after the first Ficoll gradient; grad. 2, after the second Ficoll gradient; final, erythrocyte preparations used to prepare proteome samples. (C) Proteins and (D) N termini identified in membrane (87 MS runs), soluble (147 MS runs), and hemoglobin-depleted (42 MS runs) protein fractions. (E) Proteins and (F) N termini identified in the proteome analysis of tryptic digests (preTAILS) and in enriched N-terminal peptides from tryptic (Trypsin TAILS) or GluC (GluC TAILS) digests. Spectra of (G) trypsin- and (H) GluC-digested samples matched to peptide sequences by semispecific database searches with Mascot, X!Tandem, and MS-GF+.
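The negative-selection logic of TAILS in panel A can be illustrated with a small sketch. It assumes simplified cleavage rules: once Lys side chains are dimethylated at the protein level, trypsin is treated as cutting only after Arg ("ArgC-like"), GluC as cutting only after Glu, and neither cuts before Pro. Under these assumptions, the hypothetical helper `tails_n_terminal_peptide` returns the single peptide per protein species whose α-amine was blocked before digestion; every other peptide carries a new, free N-terminus and would be removed. The enzyme keys and the example sequence are illustrative only, not part of the study.

```python
from typing import Dict, List, Set

# ASSUMED, simplified cleavage rules (not a full protease model):
# with Lys dimethylated, trypsin is treated as cutting C-terminal to Arg only;
# GluC is treated as cutting C-terminal to Glu only; no cut before Pro.
CLEAVAGE_RESIDUES: Dict[str, Set[str]] = {
    "trypsin_dimethyl": set("R"),
    "gluc": set("E"),
}


def digest(seq: str, enzyme: str) -> List[str]:
    """Split a protein sequence into peptides using the simplified rule above."""
    residues = CLEAVAGE_RESIDUES[enzyme]
    peptides, start = [], 0
    for i, aa in enumerate(seq[:-1]):
        # Cut between i and i + 1 unless the next residue is Pro.
        if aa in residues and seq[i + 1] != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    peptides.append(seq[start:])
    return peptides


def tails_n_terminal_peptide(seq: str, n_terminus: int, enzyme: str) -> str:
    """N-terminal peptide retained for a species starting at 1-based position n_terminus.

    Its alpha-amine was blocked before digestion (dimethylated in vitro or
    naturally acetylated), so it is not captured by amine-reactive removal;
    all downstream peptides have free N-termini and are depleted.
    """
    species = seq[n_terminus - 1:]
    return digest(species, enzyme)[0]
```

For a species beginning at position 2 of the made-up sequence `MAKRPLEGSRAVEK`, this sketch gives `AKRPLEGSR` for the trypsin digest and `AKRPLE` for the GluC digest, which shows why the two enzymes can yield complementary N-terminal peptides for the same terminus.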
Figure 2
Erythrocyte proteome. (A) Comparison of the erythrocyte proteome with the proteomes of two nucleated cell lines, U2OS and HeLa. Gene Ontology enrichment (pval <0.05) of molecular function terms associated with (B) proteins unique to erythrocytes and (C) proteins common to erythrocytes and nucleated cells. (D) Overlap between our data set and all known erythrocyte proteins. (E) Method by which each of the 281 proteins uniquely identified in this study was identified. (F) Chromosome distribution of the genes encoding the identified proteins, displayed as a percentage of the total number of proteins encoded on a given chromosome. Dark blue: at least one N terminus identified; light blue: only the protein identified; red: observed "missing" proteins. Labels: absolute number of proteins with N terminus identification/protein-only identification/observed "missing" proteins. (G) Copy numbers of selected proteins in erythrocytes.
Figure 3
Modification and functional classification of all identified protein termini. (A) Distribution of protein termini identified with a naturally free amino group (free N term), N-terminal acetylation (acetylated), or pyro-Glu (pyroE) formation. (B) Protein termini with annotated functions, termini identified as likely alternative translation initiation sites (alt. start), and termini with unannotated function. (C) Functional classification of annotated protein termini. Alt NME, alternative start with initiating Met excised; alt. Met, alternative start at initiating Met; NME, N-terminal Met excision; Met, start at initiating Met; SP, signal peptide removed; Pro, propeptide removed. (D) N-terminal modification of unannotated protein termini. (E) N-terminal modification of protein termini with processed (left) and intact (right) initiating Met.
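The categories in panel C can be approximated from the sequence position of a terminus. The sketch below is a hypothetical, purely rule-based classifier, not the study's annotation pipeline. It assumes that (i) an internal terminus counts as an alternative translation start only if it is acetylated and starts at, or directly after, an internal Met, and (ii) Met excision follows the general methionine aminopeptidase preference for a small penultimate residue (G, A, S, C, P, T, V). Signal peptide and propeptide calls are taken from an optional external annotation; the function and parameter names are illustrative.

```python
from typing import Optional

# General knowledge used as a rule: methionine aminopeptidases typically remove
# the initiating Met when the next residue has a small side chain.
SMALL_PENULTIMATE = set("GASCPTV")


def classify_terminus(seq: str, position: int, acetylated: bool,
                      annotated: Optional[str] = None) -> str:
    """Hypothetical panel C-style class for an N terminus.

    seq: genome-encoded protein sequence; position: 1-based start of the terminus.
    annotated: optional database annotation ("SP" or "Pro") that takes precedence.
    """
    if annotated in {"SP", "Pro"}:
        return annotated              # signal peptide or propeptide removed
    if position == 1:
        return "Met"                  # starts at the initiating Met
    if position == 2 and seq[0] == "M":
        return "NME"                  # initiating Met excised
    # ASSUMPTION: alternative starts require acetylation plus an internal Met.
    if position > 2 and acetylated:
        if seq[position - 1] == "M":
            return "alt. Met"         # alternative start, Met retained
        if seq[position - 2] == "M" and seq[position - 1] in SMALL_PENULTIMATE:
            return "alt. NME"         # alternative start, Met excised
    return "unannotated"
```

With the made-up sequence `MSEQKMATLVKMGR`, an acetylated terminus at position 7 (`A` directly after the internal `M` at position 6) is called "alt. NME", while the same terminus without acetylation stays "unannotated" under these assumptions.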
Figure 4
Internal protein N-terminal residues and their modification determine protein stability. (A) Occurrence and identity of internal protein N-terminal amino acids (i.e., of termini starting at positions >2 in their protein sequence) compared with natural amino acid abundance, by acetylation state. Blue, free N-terminus; red, acetylated N-terminus; gray, natural abundance. * pval <0.05; ** pval <0.01. (B) Fold change of N-terminal acetylation for each amino acid among all internal N-termini relative to the average post-translational acetylation, with vertical bars labeled primary and secondary. * pval <0.05; ** pval <0.01. (C) Proposed terminus stability classification categories based on the N-end rule and N-terminal acetylation status. (D) Gene Ontology term enrichment (pval <0.05) of proteins with N termini falling into the stability classes defined in panel C. Yellow, free destabilizing; orange, acetylation destabilized; light blue, free nondestabilizing; dark blue, acetylation stabilized.
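Panels A to C combine three simple steps: keep internal termini (start position >2), compare each residue's acetylated fraction with the overall acetylated fraction, and assign each terminus to one of four stability classes. The sketch below is an illustration under stated assumptions, not the authors' implementation. The destabilizing residue set is taken from the general mammalian Arg/N-end rule (primary R, K, H, L, F, W, Y, I; secondary D, E; tertiary N, Q, C), and the mapping of acetylated termini onto "acetylation stabilized" versus "acetylation destabilized" is an assumed reading of the panel C labels. The record layout `Terminus` and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (1-based start position, first residue, acetylated?)
Terminus = Tuple[int, str, bool]

# ASSUMPTION: general mammalian Arg/N-end rule destabilizing residues
# (primary | secondary | tertiary), not taken from the study itself.
DESTABILIZING = set("RKHLFWYI") | set("DE") | set("NQC")


def internal(termini: Iterable[Terminus]) -> List[Terminus]:
    """Keep internal N termini, i.e. those starting at positions > 2."""
    return [t for t in termini if t[0] > 2]


def acetylation_fold_change(termini: Iterable[Terminus]) -> Dict[str, float]:
    """Per-residue acetylated fraction divided by the overall acetylated fraction."""
    ts = internal(termini)
    total = Counter(aa for _, aa, _ in ts)
    acetylated = Counter(aa for _, aa, ac in ts if ac)
    if not ts or not acetylated:
        return {}  # undefined without any acetylated internal termini
    overall = sum(acetylated.values()) / len(ts)
    return {aa: (acetylated[aa] / n) / overall for aa, n in total.items()}


def stability_class(first_residue: str, acetylated: bool) -> str:
    """Map a terminus onto four panel C-style categories (assumed mapping)."""
    destabilizing = first_residue in DESTABILIZING
    if not acetylated:
        return "free destabilizing" if destabilizing else "free nondestabilizing"
    # ASSUMPTION: acetylation masks an Arg/N-end degron (stabilizing) but acts
    # as an Ac/N-end degron on otherwise nondestabilizing residues.
    return "acetylation stabilized" if destabilizing else "acetylation destabilized"
```

For example, with the made-up termini `[(1, "M", True), (10, "A", True), (20, "A", False), (30, "S", True), (40, "R", False)]`, the first entry is excluded as non-internal, half of the remaining termini are acetylated, and the fold changes come out as 1.0 for A, 2.0 for S, and 0.0 for R.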
Figure 5
Discrimination of degradation intermediates from functional protein species. The positions of identified N-termini are indicated relative to the annotated protein sequence (x axis). For each example, the top panels display the number of free (blue dots), acetylated (red dots), and total (gray dots) spectra (right y axis) measured at each position. The terminus prevalence score (TPS, black curve, left y axis) is a measure of the estimated relevance of each terminus. The terminus cluster score (TCS, green curve, left y axis) is a measure of the potential of a protein species to start within a given sequence span. TPS and TCS are considered relevant above a threshold of two (dashed gray horizontal line). The first of the two lower plots shows the number of biological replicates in which a terminus was identified (light gray to dark gray). The lowest plot displays the terminus stability classification introduced in Figure 4C. (A) N-termini of hemoglobin chains alpha and beta. In both chains, the predominant species start with their original Met-processed N termini, as indicated by high TPS scores. Four clusters with a TCS >2 are identified in the hemoglobin alpha chain, three of which (gray boxes: I, II, III) are also found in the hemoglobin beta chain (albeit II with a subthreshold score). (B) Degradation remnants of the nuclear protein NSFL cofactor p47, identified by few termini, each with a low TPS, scattered across the genome-encoded sequence with overall low TCS scores. (C) Degradation remnants and band-3 species with altered functions. All identified N-termini are located in cytoplasmic domains or loops. Four distinct regions (I–IV) are identified by high TPS or TCS scores. (I) A single, reproducibly identified stable N-terminus at position 29. (II) Region in the cytoplasmic domain susceptible to proteolysis. (III) Cluster of multiple free N-termini correlating with a cluster of phosphorylation sites in the cytoplasmic domain. (IV) Dense cluster of predominantly nondestabilizing free termini falling into a potentially cytoplasmic loop within the transmembrane region of band 3. Lower panel: protein feature representation; green box, cytoplasmic domain; brown box, transmembrane domain; red ^, phosphorylation site.
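Reading off regions such as the clusters labeled I–IV amounts to finding contiguous stretches of a per-position score that exceed the threshold of two. The sketch below assumes the TPS or TCS values have already been computed (their formulas are not given here) and only extracts 1-based spans strictly above the threshold. The function `regions_above` and its `min_length` parameter are hypothetical conveniences, not part of the published workflow.

```python
from typing import List, Optional, Sequence, Tuple


def regions_above(scores: Sequence[float], threshold: float = 2.0,
                  min_length: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) spans where score > threshold.

    scores: precomputed per-position scores (e.g. TPS or TCS) along a protein.
    min_length: hypothetical filter dropping spans shorter than this.
    """
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, s in enumerate(scores, start=1):
        if s > threshold and start is None:
            start = i                              # a span opens here
        elif s <= threshold and start is not None:
            if i - start >= min_length:            # span covered start .. i-1
                spans.append((start, i - 1))
            start = None
    # Close a span that runs to the last position.
    if start is not None and len(scores) - start + 1 >= min_length:
        spans.append((start, len(scores)))
    return spans
```

Positions scoring exactly two are treated as not relevant, matching the strict ">2" criterion used for the hemoglobin clusters.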
Figure 6
Functional protein species inferred from isoform comparison and crystal structure mapping. (A) Integrated terminus analysis of the 14-3-3-protein family. Family members 14-3-3 alpha/beta, theta, and zeta/delta all have a predominant species starting with original N termini (I). A second cluster of mostly stable N termini between positions 24–44 is conserved across all three proteins. (B) Crystal structure of 14-3-3 alpha/beta homodimer with terminus cluster II highlighted in red. Protein species starting with a terminus of this cluster lack the two N-terminal helices forming the interaction interface.
