Nat Chem Biol. 2013 Jan;9(1):59-64.
doi: 10.1038/nchembio.1120. Epub 2012 Nov 18.

Peptidomic discovery of short open reading frame-encoded peptides in human cells

Sarah A Slavoff et al. Nat Chem Biol. 2013 Jan.

Abstract

The complete extent to which the human genome is translated into polypeptides is of fundamental importance. We report a peptidomic strategy to detect short open reading frame (sORF)-encoded polypeptides (SEPs) in human cells. We identify 90 SEPs, 86 of which are previously uncharacterized, which is the largest number of human SEPs ever reported. SEP abundances range from 10-1,000 molecules per cell, identical to abundances of known proteins. SEPs arise from sORFs in noncoding RNAs as well as multicistronic mRNAs, and many SEPs initiate with non-AUG start codons, indicating that noncanonical translation may be more widespread in mammals than previously thought. In addition, coding sORFs are present in a small fraction (8 out of 1,866) of long intergenic noncoding RNAs. Together, these results provide strong evidence that the human proteome is more complex than previously appreciated.

Figures

Fig. 1. Discovering SEPs
(a) An LC-MS/MS-based peptidomics platform was used to profile K562 cells. The MS/MS data were searched against a custom protein database (RefSeq or RNA-seq) to identify polypeptides in K562 cells. Peptides shorter than 8 amino acids were discarded. Tryptic peptides that were exact matches to a segment of an annotated protein were computationally removed. In addition, tryptic peptides that differed from annotated proteins by a single amino acid were removed to avoid false identifications arising from point mutations in known proteins. The sequence assignment of these putative SEPs was validated by visual inspection of the tandem MS spectra. Lastly, K562 RNA-seq data were used to verify that the detected peptides were derived from a sORF rather than an unannotated ORF longer than 450 nucleotides or a mutated annotated ORF. Any tryptic peptide that fit these criteria was identified as arising from a novel human SEP. (b) We experimentally validated one of these assignments by chemically synthesizing the diagnostic peptide and comparing its tandem MS spectrum to that of the endogenous peptide. This particular peptide is derived from a sORF found on a non-coding RNA (chr16:86563805-86589025).
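The computational filtering steps in panel (a) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the plain-string protein representation, and the sliding-window mismatch count are all assumptions made for the example, and the later steps (spectrum inspection, RNA-seq verification) are not modeled.

```python
def within_one_substitution(peptide, protein):
    """Return True if the peptide matches some window of the protein
    exactly or with a single amino acid substitution."""
    n = len(peptide)
    for i in range(len(protein) - n + 1):
        window = protein[i:i + n]
        mismatches = sum(a != b for a, b in zip(peptide, window))
        if mismatches <= 1:
            return True
    return False


def candidate_sep_peptides(peptides, annotated_proteins, min_length=8):
    """Keep peptides that pass the length and annotation filters.

    min_length=8 follows the caption; everything else is a hypothetical
    simplification (no indels, no isobaric residues, no scoring).
    """
    kept = []
    for pep in peptides:
        # Discard peptides shorter than 8 amino acids.
        if len(pep) < min_length:
            continue
        # Discard exact matches and single-substitution variants of
        # annotated proteins.
        if any(within_one_substitution(pep, prot) for prot in annotated_proteins):
            continue
        kept.append(pep)
    return kept


# Hypothetical toy data, not sequences from the study.
annotated = ["MKTAYIAKQRQISFVKSHFSRQ"]
detected = ["AYIAKQR", "TAYIAKQR", "TAYIAKQK", "GGWLLPSTR"]
candidates = candidate_sep_peptides(detected, annotated)
```

In this toy input, the first peptide is too short, the second is an exact match, and the third differs from an annotated segment by one residue, so only the last peptide remains a candidate.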
Fig. 2. Overview of SEPs
(a) RNA maps illustrating the categories of sORFs that are translated into SEPs, including 5′UTR, CDS, 3′UTR, non-coding RNAs and antisense RNAs. The gray arrow represents the RNA, the blue arrow represents annotated protein CDS (if present), and the yellow arrow represents the sORF. (b) Incidence of SEPs in each category within RefSeq mRNAs. (c) Using protein databases derived from K562 RNA-seq data revealed an additional 54 SEPs for a total of 90 human SEPs, 86 of which are novel. SEP length was estimated by defining sORFs as follows: when present, an upstream in-frame AUG was assumed to be the initiation codon. If no upstream AUG was present, the initiation codon was assigned to an in-frame near-cognate non-AUG codon embedded within a Kozak-consensus sequence. In a few cases, neither of these conditions was met, so the codon immediately following an upstream stop codon was used to determine maximal SEP length. (d) Probable sORF initiation codon usage. (Note: RNA maps are not to scale. See Supplementary Fig. 12 for lengths of the RNAs and sORFs.)
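The three-tier rule for assigning an initiation codon in panel (c) can be sketched as below. This is an illustrative reading of the caption under several assumptions that are not stated in the source: sequences use the DNA alphabet (ATG for AUG), the most upstream qualifying codon is chosen when several are present, and the Kozak test is reduced to a purine at position -3 and a G at position +4. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Codons that differ from ATG at exactly one position.
NEAR_COGNATE = {"CTG", "GTG", "TTG", "AAG", "ACG", "AGG", "ATA", "ATC", "ATT"}


def kozak_like(seq, pos):
    """Simplified Kozak check (an assumption): purine at -3, G at +4."""
    if pos < 3 or pos + 3 >= len(seq):
        return False
    return seq[pos - 3] in "AG" and seq[pos + 3] == "G"


def assign_start(seq, peptide_pos):
    """Assign a putative start codon for a detected peptide.

    seq: transcript sequence (DNA alphabet).
    peptide_pos: 0-based index of the first nucleotide of the codon
        encoding the first residue of the detected peptide (assumed
        not to be a stop codon).
    Returns (start_index, rule_used).
    """
    # Walk upstream in frame until a stop codon or the transcript start.
    in_frame = []
    pos = peptide_pos
    while pos >= 0 and seq[pos:pos + 3] not in STOP_CODONS:
        in_frame.append(pos)
        pos -= 3
    in_frame.reverse()  # most upstream codon first

    # Rule 1: an upstream in-frame AUG.
    for p in in_frame:
        if seq[p:p + 3] == "ATG":
            return p, "AUG"
    # Rule 2: an in-frame near-cognate codon in Kozak-like context.
    for p in in_frame:
        if seq[p:p + 3] in NEAR_COGNATE and kozak_like(seq, p):
            return p, "near-cognate"
    # Rule 3: the codon right after the upstream stop (or the first
    # in-frame codon if the transcript has no upstream stop).
    return in_frame[0], "after-stop"


# Hypothetical toy transcripts, one per rule.
aug_case = assign_start("TAAGCCATGGCTAAACCC", 12)
near_cognate_case = assign_start("TAAGCCACGGCTAAA", 12)
fallback_case = assign_start("TGAGCCCCCGCT", 9)
```

The fallback gives a maximal length estimate only, which matches the caption's wording that it was used "to determine maximal SEP length".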
Fig. 3. SEP quantitation
(a) SEPs were quantified by isotope dilution mass spectrometry (IDMS). We synthesized a deuterated (heavy-labeled) variant of the diagnostic SEP peptide we detected. Upon isolation of K562 cells, this peptide was added, and the entire mixture was processed using our standard approach to isolate SEPs. SEPs were then quantified by LC-MS by comparing the peak area of the deuterated peptide to that of the endogenous peptide. Because the concentration of the deuterated SEP is known, the absolute amount of the endogenous SEP can be determined. The endogenous SEP and the deuterated SEP overlap in the LC-MS chromatogram. (b) Matching MS/MS spectra (note: 10 Da shift for heavy peptide for some fragments) confirm the peptide sequence assignment in addition to quantifying the peptide.
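The arithmetic behind the isotope dilution measurement in panel (a) can be written out as a short sketch. It assumes that the heavy and endogenous peptides give equal signal per mole and are recovered equally during sample preparation, so the peak-area ratio equals the molar ratio. The function name and all numeric inputs are hypothetical and are not values from the study.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_per_cell(area_endogenous, area_heavy, heavy_spike_mol, n_cells):
    """Estimate copies per cell from an isotope dilution measurement.

    area_endogenous, area_heavy: LC-MS peak areas (same arbitrary units).
    heavy_spike_mol: moles of heavy-labeled peptide added to the sample.
    n_cells: number of cells in the sample.
    """
    # Peak-area ratio scales the known heavy amount to the endogenous amount.
    endogenous_mol = heavy_spike_mol * (area_endogenous / area_heavy)
    return endogenous_mol * AVOGADRO / n_cells


# Hypothetical example: 1 fmol spike, endogenous peak half the size of
# the heavy peak, 10 million cells.
copies = molecules_per_cell(
    area_endogenous=5.0e5,
    area_heavy=1.0e6,
    heavy_spike_mol=1.0e-15,
    n_cells=1.0e7,
)
```

With these made-up inputs the estimate is about 30 molecules per cell. Adding the heavy standard before sample preparation, as the caption describes, is what lets preparation losses cancel out of the ratio.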
Fig. 4. Expression of SEPs
(a) Transient transfection of HEK293T cells with constructs containing a cDNA sequence corresponding to the full-length RefSeq mRNA (i.e., including the 5′- and 3′-UTRs). We appended a C-terminal FLAG-tag on the SEP coding sequence that could be detected by immunofluorescence. In these images the nuclei are stained with DAPI (blue) and the SEPs are detected with anti-FLAG antibody (green). The ASNSD1-SEP and FRAT2-SEP sORFs are both in the 5′-UTR (uORFs), but FRAT2-SEP starts with a non-AUG codon. DEDD2-SEP (CDS) and H2AFx-SEP (3′-UTR) were not translated from the RefSeq RNAs, which is consistent with a scanning model of eukaryotic translation. (b) DEDD2-SEP was subcloned and expressed in HeLa cells to examine its expression and localization by immunofluorescence. Co-staining with MitoTracker (red) indicated that the DEDD2-SEP localizes to the mitochondria (overlay). (Note: RNA maps are not to scale. See Supplementary Fig. 12 for lengths of the RNAs and sORFs.)
Fig. 5. Characterization of the non-AUG initiation codon of the FRAT2-SEP sORF
(a) An ACG was confirmed as the FRAT2-SEP initiation codon by site-directed mutagenesis followed by western blots of FRAT2-SEP-FLAG using an anti-FLAG antibody. Conversion of the ACG to an ATG resulted in higher expression (lane 2), while ablation of this codon removed all expression (lane 3). In addition, perturbation of the Kozak sequence (lanes 4-7) revealed the importance of context when using non-AUG codons, as substitution of less favorable residues at the most important positions in the Kozak sequence resulted in lower FRAT2-SEP-FLAG expression. (b) Epitope tagging of the sORF and CDS of the FRAT2 mRNA demonstrates that the FRAT2 mRNA is bi-cistronic. Specifically, the FRAT2 CDS was c-myc tagged and the FRAT2-SEP was FLAG tagged. Conversion of the FRAT2-SEP initiation codon from ACG to ATG ablates the expression of the downstream FRAT2-CDS, indicating the importance of alternate start codons for polycistronic expression. (Note: RNA maps are not to scale. See Supplementary Fig. 12 for lengths of the RNAs and sORFs.)
