Comparative Study
Mol Syst Biol. 2016 Oct 20;12(10):883.
doi: 10.15252/msb.20167144.

Gene-specific correlation of RNA and protein levels in human cells and tissues

Fredrik Edfors et al. Mol Syst Biol. 2016.

Abstract

An important question in molecular biology is whether the transcript level of a given gene can serve as a proxy for the level of the corresponding protein. Here, we developed a targeted proteomics approach based on parallel reaction monitoring to measure, under steady-state conditions, absolute protein copy numbers for a set of human non-secreted proteins across human tissues and cell lines, and compared these levels with the corresponding mRNA levels measured by transcriptomics. The study shows that transcript and protein levels do not correlate well unless a gene-specific RNA-to-protein (RTP) conversion factor, independent of tissue type, is introduced; applying this factor significantly improves the prediction of protein copy numbers from RNA levels. The RTP ratio varies widely between genes, from a few hundred protein copies per mRNA molecule for some genes to several hundred thousand for others. In conclusion, our data suggest that transcriptome analysis can be used to predict protein copy numbers per cell, forming an attractive link between the fields of genomics and proteomics.
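The idea of a gene-specific, tissue-independent conversion factor can be illustrated with a short sketch. It assumes the RTP factor for a gene is summarized as the median ratio of protein copies per cell to RNA level (TPM) over the measured samples, and that the prediction simply scales RNA by that factor. The use of the median, the function names and the numbers are assumptions for illustration, not the authors' published procedure.

```python
from statistics import median


def rtp_factor(protein_copies, rna_tpm):
    """Gene-specific RNA-to-protein (RTP) factor, taken here as the median protein/RNA ratio.

    Summarizing with the median is an assumption of this sketch; samples with zero RNA are skipped.
    """
    ratios = [p / r for p, r in zip(protein_copies, rna_tpm) if r > 0]
    if not ratios:
        raise ValueError("no sample with a non-zero RNA level")
    return median(ratios)


def predict_protein(rna_tpm, factor):
    """RNA-based prediction: scale each RNA level by the gene's RTP factor."""
    return [r * factor for r in rna_tpm]


# Hypothetical gene measured in three samples (protein copies per cell, RNA in TPM).
protein = [2.0e6, 4.1e6, 1.0e6]
rna = [10.0, 20.0, 5.0]
k = rtp_factor(protein, rna)
print(k)
print(predict_protein([15.0], k))
```

Because the same factor is reused for every tissue and cell line, a single number per gene is enough to turn a new RNA measurement into a protein estimate.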

Keywords: gene expression; protein quantification; targeted proteomics; transcriptomics.

Figures

Figure 1
Figure 1. Determination of cell counts using the histone abundance for normalization
  1. The core histones and overview of the corresponding QPrEST and peptide standards mapped out on the protein sequence.

  2. Relative quantification of all four histone proteins in each tissue replicate (order of appearance per replicate: H2A, H2B, H3.3, and H4).

  3. Immunohistochemistry images from the Human Protein Atlas (http://www.proteinatlas.org) for protein ANXA1 with nuclear staining (blue) for three selected tissues (scale bars = 100 μm).

  4. Calibration curves for two of the four histone peptides, with decreasing amount of QPrEST standard spiked into a U2OS cell lysate.
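A calibration series like the one in panel 4 is commonly checked by fitting a straight line to the measured signal against the amount of standard spiked in and inspecting R². The sketch below does an ordinary least-squares fit; treating the heavy/light ratio as the response, the fmol units, and the dilution values are hypothetical, and this is an assumed check rather than the study's documented analysis.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Coefficient of determination: share of variance in y explained by the line.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical dilution series: spiked QPrEST standard (fmol) vs measured heavy/light ratio.
spiked = [100.0, 50.0, 25.0, 12.5, 6.25]
ratio = [4.0, 2.0, 1.0, 0.5, 0.25]
slope, intercept, r2 = linear_fit(spiked, ratio)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.4f}")
```

A near-zero intercept and R² close to 1 indicate that the signal scales linearly with the amount of standard over the tested range.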

Figure 2
Figure 2. Absolute copy number of proteins in tissues and corresponding cell lines
  1. Absolute protein copy numbers in kidney tissue and human embryonic kidney cells (HEK293), liver tissue and a liver cancer cell line (HepG2), lung tissue and a lung cancer cell line (A549), and breast tissue and a breast cancer cell line (MCF7). Proteins are ordered by their abundance in each tissue, and the same order is used for the corresponding cell line.

  2. Direct correlation between RNA levels (TPM) and protein abundance (copy number) for all quantified genes in the same tissues and cell lines. Spearman's ρ and Pearson's r between the two values across the quantified genes are shown. The other seven tissues and five cell lines are shown in Fig EV4.
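The two coefficients reported in panel 2 answer different questions: Pearson's r measures linear agreement of the raw values, whereas Spearman's ρ is Pearson's r computed on ranks and only asks whether the relationship is monotonic. A standard-library sketch with invented data (the function names are illustrative):

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: linear correlation of the raw values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical genes in one sample: monotone but strongly non-linear relationship.
tpm = [1.0, 2.0, 3.0, 4.0, 100.0]
copies = [1e3, 2e3, 3e3, 4e3, 5e3]
print(round(pearson(tpm, copies), 3), spearman(tpm, copies))
```

Here the ranks agree perfectly (ρ = 1) while a single extreme RNA value pulls Pearson's r well below 1, which is why both statistics are worth reporting.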

Figure 3
Figure 3. The protein and RNA levels for three genes
Subcellular localization determined by immunofluorescence and by immunohistochemistry of tissue sections, using three different antibodies (SELENBP1, HPA011731; STOM, HPA010961; ASS1, HPA020896). Microtubule and nuclear probes are shown in red and blue, respectively; antibody staining is shown in green. RNA-to-protein ratios across nine cell lines and 11 tissues, with Spearman's ρ, Pearson's r and R² for each gene. All other genes are shown in Fig EV1.

Figure EV1
Figure EV1. Protein copy number and RNA levels (TPM) for all quantified genes
The absolute copy numbers of proteins (blue) and the level of RNA (purple) measured as TPM are shown for all 55 genes across nine cell lines and 11 tissues. All values can be found in Table EV3 and Table EV6.

Figure 4
Figure 4. The correlation between the absolute copy number of proteins and the corresponding RNA levels (measured as TPM) in nine cell lines and 11 tissues
  1. Gene-specific RNA-to-protein conversion factors for all 55 genes, shown as box plots summarizing each gene's factor and its variation across the nine cell lines and 11 tissues. The values for each cell line and tissue are listed in Table EV7. Horizontal lines, median; lower and upper hinges, first and third quartiles (25th and 75th percentiles); whiskers extend up to 1.5 × the interquartile range (IQR) from the hinges.

  2. Gene-specific correlation between protein copy number (x-axis) and protein copy number predicted from RNA levels (RNA-based prediction, y-axis) for four tissues and four cell lines. The other seven tissues and five cell lines are shown in Fig EV5, and predicted copy numbers are listed in Table EV9.
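The box-plot conventions in panel 1 (median line, hinges at the 25th and 75th percentiles, whiskers at 1.5 × IQR) can be made concrete with a small sketch. It assumes the common Tukey/ggplot2 rule that each whisker stops at the most extreme observation inside the 1.5 × IQR fences and uses linear-interpolation quantiles; the input values are invented.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of pre-sorted data (R's default 'type 7' rule)."""
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def box_stats(values, whisker=1.5):
    """Median, hinges, whisker ends and outliers for a Tukey-style box plot."""
    v = sorted(values)
    q1, med, q3 = quantile(v, 0.25), quantile(v, 0.5), quantile(v, 0.75)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [x for x in v if lo_fence <= x <= hi_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),   # most extreme point within the lower fence
        "whisker_high": max(inside),  # most extreme point within the upper fence
        "outliers": [x for x in v if x < lo_fence or x > hi_fence],
    }


# Hypothetical conversion factors for one gene across ten samples.
stats = box_stats([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 40.0])
print(stats)
```

With these values the hinges fall at 3.25 and 7.75, the upper fence at 14.5, and the sample at 40 is drawn as an outlier rather than stretching the whisker.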

Figure EV2
Figure EV2. RNA‐to‐protein ratio and variation versus protein length
  1. The variation across samples, measured as the coefficient of variation (CV), is plotted against protein length.

  2. The protein lengths for the 55 target proteins are plotted against the RTP ratio.

Data information: The test statistic is based on Pearson's product-moment correlation coefficient and follows a t-distribution with n − 2 degrees of freedom (n = number of data points) if the samples follow independent normal distributions. An asymptotic confidence interval is given based on Fisher's Z-transform.

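The test and interval in this note can be written out directly. The sketch below assumes the standard textbook formulas (the ones described in R's `cor.test` documentation): t = r·√(n − 2)/√(1 − r²) with n − 2 degrees of freedom, and a confidence interval built on Fisher's z = atanh(r) with standard error 1/√(n − 3). The function names and example values are illustrative; turning t into a P-value additionally needs the t-distribution CDF, which the Python standard library does not provide.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pearson_t_statistic(r, n):
    """t statistic for H0: rho = 0; t-distributed with n - 2 df under bivariate normality."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def fisher_ci(r, n, level=0.95):
    """Asymptotic confidence interval for rho via Fisher's z = atanh(r), SE = 1/sqrt(n - 3)."""
    if n < 4:
        raise ValueError("Fisher's z interval needs at least 4 observations")
    z = atanh(r)
    se = 1 / sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    # Build the interval on the z scale, then map both ends back with tanh.
    return tanh(z - crit * se), tanh(z + crit * se)


# Hypothetical example: r = 0.5 observed over n = 55 proteins.
print(pearson_t_statistic(0.5, 55))
print(fisher_ci(0.5, 55))
```

Working on the z scale keeps the interval inside (−1, 1) and makes it asymmetric around r, as expected for a bounded coefficient.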
Figure EV3
Figure EV3. RNA‐to‐protein ratios for proteins with different subcellular compartments
The number of GO-annotated proteins (UniProt, August 1, 2016) in each compartment with a given RTP ratio is plotted (blue) together with all proteins not annotated to that compartment (red). P-values were calculated using Student's t-test.

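Student's t-test here compares the mean RTP ratio of proteins annotated to a compartment with that of all remaining proteins. The sketch computes the classic pooled-variance (equal-variance) statistic; applying it to log10-transformed ratios, the sample values and the function name are assumptions for illustration. Converting t to a P-value requires a t-distribution CDF (for example via SciPy's `scipy.stats.ttest_ind`), which is omitted here.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    # Pooled estimate of the common variance, weighting each group by its df.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical log10 RTP ratios: proteins in one compartment vs all other proteins.
in_compartment = [4.1, 4.5, 4.3, 4.8, 4.4]
others = [3.6, 3.9, 4.0, 3.7, 3.8, 4.1]
t, df = student_t(in_compartment, others)
print(f"t = {t:.2f} with {df} degrees of freedom")
```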
Figure EV4
Figure EV4. Protein copies per cell versus RNA levels (TPM) for all quantified genes in all cell lines and tissues
  1. (A, B) The direct correlation between RNA (TPM, y-axis) and protein abundance (copy number, x-axis) for all quantified genes in all cell lines (A) and tissues (B). Spearman's ρ and Pearson's r between the two values across the quantified genes are shown.

Figure EV5
Figure EV5. Prediction of protein levels based on TPM levels as compared to the experimentally derived protein copy number for all quantified genes in all cell lines and tissues
  1. (A, B) The gene-specific correlation between protein copy number (x-axis) and protein copy number predicted from RNA levels (RNA-based prediction, y-axis) in all cell lines (A) and tissues (B). Spearman's ρ and Pearson's r between the two values across the quantified genes are shown.

Figure 5
Figure 5. The gene‐specific correlation between RNA and protein levels
  1. The gene-specific correlation between protein copy number (x-axis) and protein copy number predicted from RNA levels (RNA-based prediction) for all 55 genes and all 20 cell lines and tissues. Horizontal lines, median; lower and upper hinges, first and third quartiles (25th and 75th percentiles); whiskers extend up to 1.5 × IQR from the hinges; circles indicate outliers.

  2. The Pearson's correlation between RNA and protein levels for the 55 genes in the nine cell lines and 11 tissues is shown as a direct comparison of RNA and protein levels (purple, RNA versus protein) and after introducing the gene‐specific correlation factor (blue, RNA‐based prediction versus protein).

  3. Density plot of Pearson's correlations between RNA and protein levels before and after introducing the RTP conversion factor. Using the RTP conversion factor substantially improves Pearson's correlation for all cell lines and tissues, giving a median Pearson's correlation of 0.93.
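The before/after comparison in panels 2 and 3 amounts to computing, for each sample, Pearson's r across genes twice: once against the raw RNA levels and once against RNA levels multiplied by each gene's RTP factor, then summarizing over samples with the median. A self-contained sketch with invented numbers; the dictionary layout and function names are hypothetical.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def per_sample_correlations(rna, protein, factors):
    """Per sample, correlate protein with raw RNA ('before') and with RNA * factor ('after').

    rna and protein map sample name -> per-gene values in a fixed gene order;
    factors holds the per-gene RTP factors in that same order.
    """
    before, after = {}, {}
    for sample, rna_levels in rna.items():
        predicted = [r * k for r, k in zip(rna_levels, factors)]
        before[sample] = pearson(rna_levels, protein[sample])
        after[sample] = pearson(predicted, protein[sample])
    return before, after


# Hypothetical data: three genes with very different RTP factors, two samples.
factors = [1e5, 1e3, 1e4]
rna = {"A": [10.0, 200.0, 50.0], "B": [20.0, 100.0, 30.0]}
protein = {"A": [1e6, 2e5, 5e5], "B": [2.2e6, 9e4, 2.8e5]}
before, after = per_sample_correlations(rna, protein, factors)
print(median(before.values()), median(after.values()))
```

In this toy case the gene with the largest factor dominates protein abundance despite modest RNA, so the raw correlation is negative while the factor-corrected one is close to 1.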

Figure EV6
Figure EV6. Examples of quantitative peptide profiles using QPrEST for targeted proteomics
All peptides were quantified by comparing the relative amount of light endogenous peptide (red) against the heavy standard peptide (blue). The relative amount was thereafter normalized against the number of cells present in each biological replicate by accounting for the relative amount of histones present in each sample.
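The quantification described above (light/heavy ratio against a known amount of heavy standard, then normalization to cell number estimated from histones) can be sketched as below. All numbers are invented, and the fmol unit, the function names and the per-cell histone count are assumptions for illustration rather than parameters from the study.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def copies_from_ratio(light_to_heavy, spiked_fmol):
    """Endogenous molecules = (light/heavy ratio) x amount of heavy standard spiked in (fmol)."""
    return light_to_heavy * spiked_fmol * 1e-15 * AVOGADRO


def copies_per_cell(target_copies, histone_copies, histone_copies_per_cell):
    """Divide by an estimated cell count, taken as measured histone copies / copies per cell."""
    cells = histone_copies / histone_copies_per_cell
    return target_copies / cells


# Hypothetical numbers. HISTONE_PER_CELL is an assumed round figure, not a value from the study:
# roughly 3e7 nucleosomes per diploid human nucleus, each holding two copies of every core histone.
HISTONE_PER_CELL = 6e7
target = copies_from_ratio(light_to_heavy=0.5, spiked_fmol=10.0)
histone = copies_from_ratio(light_to_heavy=2.0, spiked_fmol=1000.0)
print(f"{copies_per_cell(target, histone, HISTONE_PER_CELL):.3g} copies per cell")
```

Because the target and the histones are measured in the same sample, the absolute sample size cancels, and only the assumed histone content per cell sets the scale.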