Sci Rep. 2021 Jun 11;11(1):12358.
doi: 10.1038/s41598-021-91661-x.

Assessment of a complete and classified platelet proteome from genome-wide transcripts of human platelets and megakaryocytes covering platelet functions


Jingnan Huang et al. Sci Rep. 2021.

Abstract

Novel platelet and megakaryocyte transcriptome analysis allows prediction of the full or theoretical proteome of a representative human platelet. Here, we integrated the established platelet proteomes from six cohorts of healthy subjects, encompassing 5.2 k proteins, with two novel genome-wide transcriptomes (57.8 k mRNAs). For 14.8 k protein-coding transcripts, we assigned the proteins to 21 UniProt-based classes, based on their preferential intracellular localization and presumed function. This classified transcriptome-proteome profile of platelets revealed: (i) Absence of 37.2 k genome-wide transcripts. (ii) High quantitative similarity of platelet and megakaryocyte transcriptomes (R = 0.75) for 14.8 k protein-coding genes, but not for 3.8 k RNA genes or 1.9 k pseudogenes (R = 0.43–0.54), suggesting redistribution of mRNAs upon platelet shedding from megakaryocytes. (iii) Copy numbers of 3.5 k proteins that were restricted in size by the corresponding transcript levels. (iv) Near-complete coverage of identified proteins in the relevant transcriptome (log2fpkm > 0.20) except for plasma-derived secretory proteins, pointing to adhesion and uptake of such proteins. (v) Underrepresentation in the identified proteome of nuclear-related, membrane and signaling proteins, as well as proteins with low-level transcripts. We then constructed a prediction model, based on protein function, transcript level and (peri)nuclear localization, and calculated the achievable proteome at ~10 k proteins. Model validation identified 1.0 k additional proteins in the predicted classes. Network and database analysis revealed the presence of 2.4 k proteins with a possible role in thrombosis and hemostasis, and 138 proteins linked to platelet-related disorders. This genome-wide platelet transcriptome and (non)identified proteome database thus provides a scaffold for discovering the roles of unknown platelet proteins in health and disease.


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Classification scheme and decision tree for gene and protein assignment to 21 function classes. Assignment was based on primary subcellular localization of the protein and its assumed function according to UniProt-KB. (A) Class numbering in alphabetical order. (B) Hierarchical decision tree.
Figure 2
Dataflow of the numbers of transcripts and proteome proteins. Relevant transcripts were defined as those of log2fpkm ≥ 0.20. Identified proteins refer to proteins present in the combined proteome from six cohorts. Non-identified proteins refer to proteins with relevant transcript levels in the combined PLT and MGK transcriptome. Data from the validation cohort are also indicated.
Figure 3
Histograms of RNA levels in transcriptome of platelets (PLT) or megakaryocytes (MGK). (A,B) Distribution of all 57,289 genome-wide transcripts. (C,D) Distribution of all relevant transcripts (log2fpkm ≥ 0.20) for PLT (n = 17,629) or MGK (n = 16,843). (E,F) Distribution of protein-coding transcripts, as identified in the proteome, for PLT (n = 5,030) or MGK (n = 4,882). Levels of RNA expression (log2fpkm) were binned as < 0.20, 0.20–0.50, 0.50–1.00, 1.00–2.00, etc. For flow of numbers of transcripts and proteins, see Fig. 2.
Figure 4
Transcript distribution of identified and not identified proteins in the platelet proteome per function class. Examined were all relevant protein-coding transcripts (log2fpkm ≥ 0.20) of the combined relevant PLT/MGK transcriptome, with separation of identified proteins (n = 5,050) and not identified proteins (n = 9,721). For full data, see Suppl. Figure 3. (A) Numbers of transcripts per function class. (B) Percentage distribution of transcripts per function class.
Figure 5
Comparison of protein copy numbers with mRNA levels and class-based analysis. (A,B) Protein copy numbers compared per gene to transcript levels (log2fpkm) for datasets of platelets (PLT, n = 3,519) (A) or megakaryocytes (MGK, n = 3,442) (B). Note the triangular space; low-abundance proteins (< 500 copies/platelet) were normalized to 150 copies. (C,D) Over-representation of protein function classes in quantitative proteome-transcriptome space per predefined area (I–V). Area I is considered to represent a condition of high transcription (high mRNA level) and high translation (high copy number); area II of high transcription and low translation; area III of low transcription and translation; and area IV an intermediate condition. Area V represents proteins without relevant transcript levels in PLT. (C) Transcriptome-proteome triangle with analyzed areas. (D) Enlarged space indicating function classes (C01-C21) with significant over-representation per area. Statistics in Suppl. Table 1.
Figure 6
Distribution profile of relevant transcripts per protein function class. For the relevant platelet transcriptome (n = 17,629), heatmaps were constructed of the percentage distribution of transcript levels per function class (rainbow colors; blue = low, red = high). (A) Heatmap for transcripts of identified proteins (n = 5,030). (B) Heatmap for transcripts of non-identified proteins (n = 9,267), as well as RNA genes (n = 2,480) and pseudogenes (n = 852). Expression levels (log2fpkm) were binned as 0.20–0.50, 0.50–1.00, 1.00–2.00, etc. For numbers of transcripts, see Suppl. Figure 3.
Figure 7
Restraining factors per function class and prediction model of the full platelet proteome. Analysis of non-identified proteins (n = 9,721) from the relevant, combined PLT/MGK transcriptome per function class. Full dataset is provided in Suppl. Table 2. (A) Fraction of identified proteins in green. Well-identified classes with fractions > 0.55 are labeled as ID. Indicated in red are each of three restraining factors per class: (i) over-represented low copy number (areas II-III in Fig. 5D); (ii) low mRNA level (area V, LM = low mRNA > 45%); (iii) retention in the megakaryocyte (peri)nucleus upon platelet shedding. Bottom: means of identified fractions (weighted for the presence of multiple factors), and correction factor in comparison to 'well-identified'. (B) Based on identified proteins (n = 5,050), modelled prediction of increased identification of missing proteins per class at higher proteomic detection. Shown per class are fractions of total relevant transcripts (heatmapped), and total expected proteins (bottom line). (C) Validation of the prediction model based on a novel proteome with 5,341 identified proteins.
Figure 8
Network-based potential roles of (non)identified proteins in the platelet proteome in arterial thrombosis and hemostasis. Using a published meta-analysis of mouse genes in thrombosis and bleeding, the network was built in Cytoscape, containing 267 core genes (bait nodes) and 2,679 new nodes, connected by 19.7 k interactions. (A) Redrawn network visualization with color-coded proteins identified (green) or not identified (red) in the platelet proteome, with relevant transcript levels (node size, log2fpkm). Names are listed of the 40 proteins with highest mRNA expression levels. (B) Distribution profile of (non)identified proteins with transcript levels (median copy numbers, median log2fpkm). No mRNA = below relevant threshold. Attribute lists are given in Suppl. Datafile 4.
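
The relevance threshold (Fig. 2) and the expression-level binning (Fig. 3) described in the captions above can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the placeholder gene identifiers, and the rule that a transcript counts as relevant when it reaches the threshold in either the PLT or the MGK dataset are assumptions made for illustration. Because the captions do not spell out the bins beyond 2.00, the sketch collects all higher levels in a single open bin.

```python
from bisect import bisect_right
from collections import Counter

RELEVANT_THRESHOLD = 0.20  # log2fpkm cut-off stated in the figure captions

# Bin edges named in the Fig. 3 caption; the caption's further bins ("etc.")
# are not specified, so everything from 2.00 upward goes into one open bin.
EDGES = [0.20, 0.50, 1.00, 2.00]
LABELS = ["<0.20", "0.20-0.50", "0.50-1.00", "1.00-2.00", ">=2.00"]


def partition_genes(plt_levels, mgk_levels, proteome_genes):
    """Split genes into identified, non-identified, and identified-without-transcript.

    Assumption: "relevant in the combined transcriptome" is read as reaching
    the threshold in PLT or in MGK.
    """
    genes = set(plt_levels) | set(mgk_levels)
    relevant = {
        g for g in genes
        if max(plt_levels.get(g, 0.0), mgk_levels.get(g, 0.0)) >= RELEVANT_THRESHOLD
    }
    identified = set(proteome_genes)
    non_identified = relevant - identified
    identified_without_transcript = identified - relevant
    return identified, non_identified, identified_without_transcript


def bin_label(level):
    """Return the histogram bin label for one log2fpkm value."""
    return LABELS[bisect_right(EDGES, level)]


def histogram(levels):
    """Count values per bin, keeping empty bins so that the order is stable."""
    counts = Counter(bin_label(v) for v in levels)
    return {label: counts.get(label, 0) for label in LABELS}


if __name__ == "__main__":
    # Placeholder identifiers and values, not data from the study.
    plt = {"GENE_A": 5.1, "GENE_B": 0.05, "GENE_C": 0.3}
    mgk = {"GENE_A": 4.8, "GENE_B": 0.25, "GENE_D": 0.1}
    print(partition_genes(plt, mgk, {"GENE_A", "GENE_D"}))
    print(histogram(plt.values()))
```

In this reading, a protein detected in the proteome whose transcript stays below the threshold in both datasets lands in the third set, which loosely corresponds to the "no relevant transcript" group the captions associate with plasma-derived proteins.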

