Nucleic Acids Res. 2019 Sep 5;47(15):8111-8125.
doi: 10.1093/nar/gkz646.

A hidden human proteome encoded by 'non-coding' genes

Shaohua Lu et al. Nucleic Acids Res.

Abstract

It has long been debated whether the 98% 'non-coding' fraction of the human genome can encode functional proteins besides short peptides. Using full-length translating mRNA sequencing and ribosome profiling, we found that up to 3330 long non-coding RNAs (lncRNAs) were bound to ribosomes with active translation elongation. Using shotgun proteomics, we detected 308 new lncRNA-encoded proteins. A total of 207 unique peptides from these new proteins were verified by multiple reaction monitoring (MRM) and/or parallel reaction monitoring (PRM), and 10 new proteins were verified by immunoblotting. We found that these new proteins differ from canonical proteins in various physical and chemical properties and emerged mostly in primates during evolution. We further inferred their functions from assays of translation efficiency, RNA folding and intracellular localization. Because the new protein UBAP1-AST6 localizes to the nucleoli and is preferentially expressed in lung cancer cell lines, we tested it biologically and verified that it has a function associated with cell proliferation. In sum, we provide experimental evidence for a hidden, functional human proteome encoded by purported lncRNAs, suggesting a resource for annotating new human proteins.

Figures

Figure 1.
The translating lncRNAs that can encode proteins (canonical ORF length ≥150 nt). (A) Number of translating lncRNAs detected by full-length translating mRNA sequencing (RNC-seq) in nine cell lines. (B) Ribosome footprints (RFP) of two translating lncRNAs as examples. Red bars denote RFP coverage along the RNA, and the grey region marks the predicted canonical ORF. (C) Chromosome enrichment of the translating lncRNAs. P-values were calculated using Fisher's exact test. (D, E) Expression patterns of the translating lncRNAs in nine cell lines. Thresholds of 0.1 RPKM (D) and 1.0 RPKM (E), respectively, were used to call a positive detection.
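The RPKM detection thresholds in Figure 1D and 1E can be made concrete with a short sketch. RPKM normalizes a read count by transcript length (per kilobase) and library size (per million mapped reads), i.e. RPKM = reads × 10⁹ / (length in nt × total mapped reads). The function names, the toy transcript IDs and counts, and the choice of "≥ threshold" as a positive call are hypothetical illustrations, not the authors' pipeline or data.

```python
def rpkm(reads: int, length_nt: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return reads * 1e9 / (length_nt * total_mapped_reads)


def detected(counts: dict[str, tuple[int, int]], total_mapped_reads: int,
             threshold: float) -> set[str]:
    """Return IDs whose RPKM reaches the detection threshold.

    counts maps a hypothetical transcript ID to (read count, length in nt).
    Assumption: a value equal to the threshold counts as detected.
    """
    return {
        tx for tx, (reads, length) in counts.items()
        if rpkm(reads, length, total_mapped_reads) >= threshold
    }


if __name__ == "__main__":
    # Toy numbers for illustration only.
    library = 20_000_000
    toy = {"lnc_A": (50, 2000), "lnc_B": (5, 1500), "lnc_C": (0, 800)}
    for cutoff in (0.1, 1.0):
        print(cutoff, sorted(detected(toy, library, cutoff)))
```

With these toy counts, lnc_A has an RPKM of 1.25 and lnc_B about 0.17, so the lower 0.1 cutoff calls both while the stricter 1.0 cutoff keeps only lnc_A, which is why the two panels can show different detection patterns.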
Figure 2.
Protein evidence of the translating lncRNAs. (A) Length distribution of the new proteins and the PE1 proteins. (B) Identification of new proteins by shotgun mass spectrometry of total and low-molecular-weight proteins. (C) Bioinformatics-based peptide evidence for the new proteins. (D) Data-independent analysis verification by non-synthetic-peptide-based MRM, PRM and heavy-synthetic-peptide-based MRM (heavy-MRM). (E) Immunoblotting verification of four MS-detected new proteins in human cell lines and human tissues. (F) Western blot analysis of six MS-undetected new proteins.
Figure 3.
The difficulty of detecting the new proteins. (A) Expression levels of the new proteins and PE1 proteins at the transcription level (mRNA) and the translation level (full-length translating mRNA). (B) Protein abundance of PE1 proteins and MS-detected new proteins in three hepatocellular carcinoma cell lines. (C) Distribution of the isoelectric point (pI) of MS-detected new proteins and PE1 proteins. (D) Amino acid properties of MS-detected new proteins and PE1 proteins. (E) Number of tryptic peptides per protein for MS-detected new proteins and PE1 proteins. (F) Calculated instability of MS-detected new proteins and PE1 proteins.
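Figure 3E compares how many tryptic peptides each protein yields, which bears on how easily shotgun MS can detect it. A minimal in-silico digestion sketch follows, assuming the common simplified trypsin rule (cleave C-terminal to K or R unless the next residue is P), no missed cleavages, and a hypothetical 7–30 residue window for MS-observable peptides; the paper's exact digestion settings are not given here, and the function names are illustrative.

```python
import re

# Simplified trypsin rule (assumption): cut after K/R, but not before P.
# The pattern matches an empty position; re.split accepts patterns that can
# match the empty string since Python 3.7.
_TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(protein: str) -> list[str]:
    """Fully cleaved tryptic peptides (no missed cleavages)."""
    # A trailing K/R produces an empty final piece, which is dropped here.
    return [p for p in _TRYPSIN_SITE.split(protein.upper()) if p]


def count_observable(protein: str, min_len: int = 7, max_len: int = 30) -> int:
    """Count peptides inside a hypothetical MS-observable length window."""
    return sum(min_len <= len(p) <= max_len for p in tryptic_peptides(protein))


if __name__ == "__main__":
    toy = "MAGKPLLRSTEKAA"  # toy sequence, not a protein from the paper
    print(tryptic_peptides(toy))
    print(count_observable(toy))
```

In the toy sequence the K before P is skipped, giving the pieces MAGKPLLR, STEK and AA, of which only the first falls in the assumed length window. A short protein rich in K/R, or one lacking them, yields few peptides in that window and is correspondingly harder to identify.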
Figure 4.
Origin of the new proteins. (A) ORF exon counts of MS-detected new proteins and PE1 proteins. (B) Homology of the 308 MS-detected new proteins and 308 randomly selected PE1 proteins across the phylogeny. Upper panels: number of homologous genes found in each species with at least 10% homology. Lower panels: distribution of the homologous genes across species. Homology is color-scaled. (C) Orthology of the 308 MS-detected new proteins. (D) Orthology of the 38 MS-confirmed new proteins shown in Figure 2D.
Figure 5.
Translation efficiency and subcellular localization of the new proteins. (A) Distribution of the translation ratio (TR) of the new proteins and PE1 proteins in each of nine cell lines. * denotes the cell lines in which the new proteins have a significantly higher TR than PE1 proteins (P < 10⁻¹⁶, Kolmogorov–Smirnov test). (B) RNA secondary structure stability near the AUG codons of the MS-detected new proteins and PE1 proteins, calculated with a sliding window of ±19 nt. Red lines show the average ΔG, and blue lines denote the upper and lower bounds of ΔG for each category. (C) Enrichment of subcellular localizations of the MS-detected new proteins, predicted by BLAST2GO. (D) Confocal fluorescence microscopy of the subcellular localization of four new proteins. Please refer to Supplementary Figure S6 for more examples.
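Figure 5A compares TR distributions with the two-sample Kolmogorov–Smirnov test, whose statistic D is the largest vertical gap between the two empirical cumulative distribution functions. The stdlib-only sketch below computes D under that textbook definition; it does not compute a P-value (scipy.stats.ks_2samp is a standard choice for that), and the sample values are hypothetical.

```python
def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample Kolmogorov–Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Walk through the pooled values in increasing order. After consuming
    # every copy of the current value x from both samples, i/n and j/m are
    # the two empirical CDFs evaluated at x.
    while i < n and j < m:
        x = min(xs[i], ys[j])
        while i < n and xs[i] <= x:
            i += 1
        while j < m and ys[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted its CDF is 1 and the gap can only shrink,
    # so the maximum has already been recorded.
    return d


if __name__ == "__main__":
    # Hypothetical translation ratios, for illustration only.
    new_tr = [0.8, 0.9, 1.1, 1.3]
    pe1_tr = [0.2, 0.4, 0.5, 0.9]
    print(ks_statistic(new_tr, pe1_tr))
```

Because D depends only on the ranks of the values, it is insensitive to the skewed, non-normal shape that ratio data such as TR often have, which is one reason a rank-based test like KS suits this comparison.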
Figure 6.
The potential biological function of the new protein UBAP1-AST6. (A) Subcellular localization of UBAP1-AST6 fused to EGFP. (B) Same as (A), with UBAP1-AST6 fused to mCherry. (C) Western blot verification of the nuclear localization of UBAP1-AST6 in three cell lines and three human tissues. (D) DNA sequence of the UBAP1-AST6-ATG mut plasmid. The start codon ATG of the UBAP1-AST6 ORF was mutated to GCG to abolish translation initiation; the sequences were verified by Sanger sequencing. (E) qRT-PCR analysis of relative UBAP1-AST6 RNA expression in over-expression (OV) and knock-out (KO) models. A549 cells were infected with LentiViral-flag (OV-Control) or LentiViral-UBAP1-AST6-flag (OV-UBAP1-AST6), followed by qPCR analysis of UBAP1-AST6 relative to GAPDH. A similar analysis was performed on the CRISPR/Cas9 KO and pcDNA3.1-UBAP1-AST6 rescue (KO-rescue) groups. In the rescue models, an ATG-mutated pcDNA3.1-UBAP1-AST6 group (KO-rescue-ATG-mut) was included as a control. (F) Immunoblotting validation of UBAP1-AST6 expression. (G) Proliferation assays using WST-1 (n = 3). (H) Colony formation assay (n = 3).
