J Proteome Res. 2014 Mar 7;13(3):1757-65.
doi: 10.1021/pr401280w. Epub 2014 Feb 14.

Discovery of human sORF-encoded polypeptides (SEPs) in cell lines and tissue

Jiao Ma et al. J Proteome Res.

Abstract

The existence of nonannotated protein-coding human short open reading frames (sORFs) has been revealed through the direct detection of their sORF-encoded polypeptide (SEP) products. The discovery of novel SEPs increases the size of the genome and the proteome and provides insights into the molecular biology of mammalian cells, such as the prevalent usage of non-AUG start codons. Through modifications of the existing SEP-discovery workflow, we discover an additional 195 SEPs in K562 cells and extend this methodology to identify novel human SEPs in additional cell lines and human tissue for a final tally of 237 new SEPs. These results continue to expand the human genome and proteome and demonstrate that SEPs are a ubiquitous class of nonannotated polypeptides that require further investigation.


Figures

Figure 1
Workflows tested in the discovery of novel human SEPs. (A) Schematic of the four different SEP discovery workflows used: MWCO+LC–MS; MWCO+ERLIC+LC–MS; PAGE+ERLIC+LC–MS; and PAGE+LC–MS. The K562 peptidome is separated by size using a 30 kDa MWCO filter (MWCO) or polyacrylamide gel electrophoresis (PAGE) and then analyzed directly by LC–MS (first and last lane) or fractionated by ERLIC prior to LC–MS analysis (middle lanes). (B) Total number of SEPs and number of novel SEPs identified in K562 cells using each of the four SEP discovery workflows.
Figure 2
Biological and technical replicates lead to the discovery of novel SEPs. (A) Number of SEPs detected in four biological replicates of K562 cells. Each of these samples was analyzed using the PAGE+LC–MS SEP discovery workflow. For each replicate, the detected SEPs include the total number of SEPs identified as well as the novel SEPs that were characterized for the first time. (B) Three technical replicates of biological replicate #4 from panel A were performed using the PAGE+LC–MS workflow with the K562 peptidome. Shown are the total number of SEPs detected in each run (black), nonoverlapping SEPs (gray; SEPs that were not present in either of the other two technical replicates), and novel SEPs (light gray; SEPs that were not detected in any other analysis).
Figure 3
Validating SEPs with targeted mass spectrometry. Analysis of PRR3-SEP by Skyline and subsequent MRM targeted LC–MS identifies additional peptides from this SEP. The tryptic peptide (blue box) that was detected in the original shotgun proteomics experiment led to the initial identification of PRR3-SEP. To identify additional peptides from PRR3-SEP, we used Skyline to predict MRM transitions for four tryptic peptides from PRR3-SEP and fed this information into a targeted LC–MS experiment. This experiment detected two of the four peptides, providing two additional peptides (red and purple boxes) to validate PRR3-SEP.
Figure 4
Overview of 195 novel SEPs identified in K562 cells. (A) Length of each SEP was determined using a defined set of criteria (see Methods), and the length distribution reveals that the majority (>90%) of SEPs discovered are between 8 and 100 amino acids. (B) SEPs utilize AUG, near-cognate codons (i.e., one base away from AUG), and unknown codons to initiate translation. (C) SEPs are primarily derived from nonannotated RNAs (i.e., not found in the RefSeq database), but RefSeq RNAs do account for the production of 24% of these SEPs. For the RefSeq RNAs, the sORFs are found on coding RNAs at the 3′-UTR and CDS and on noncoding RNAs such as antisense RNAs and noncoding RNAs.
Figure 5
SEPs derived from MCF10A and MDAMB231 cell lines. (A) Steps in the discovery and validation of SEPs from these cell lines. (B) A total of nine and five SEPs were validated using MRM in the MCF10A and MDAMB231 cell lines, respectively. (C) These 14 validated SEPs were targeted in MCF10A and MDAMB231; 12 SEPs were found in both cell lines, while two SEPs, TASP1-SEP and CAMD8-SEP, were specific to the MDAMB231 cell line.
Figure 6
Discovery of 25 tumor derived SEPs (tdSEPs). (A) Length distribution, (B) initiation codon usage, and (C) RNA source of tdSEPs were similar to the distributions seen for SEPs derived from cell lines.
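
The Figure 4 caption defines near-cognate codons as those one base away from AUG. As a minimal sketch of that definition only, the Python snippet below enumerates those codons and sorts a given start codon into a category. The function names and the "other" label are hypothetical and are not taken from the paper; in particular, the paper's "unknown" category depends on the authors' own criteria (see Methods), which are not reproduced here.

```python
BASES = "ACGU"


def near_cognate_codons(start="AUG"):
    """Return every codon that differs from `start` at exactly one position."""
    codons = []
    for i, original in enumerate(start):
        for base in BASES:
            if base != original:
                codons.append(start[:i] + base + start[i + 1:])
    return codons


def classify_start_codon(codon):
    """Label a codon as 'AUG', 'near-cognate', or 'other' (hypothetical labels)."""
    # Accept DNA or RNA spelling, in either case.
    codon = codon.upper().replace("T", "U")
    if codon == "AUG":
        return "AUG"
    if codon in near_cognate_codons():
        return "near-cognate"
    return "other"


if __name__ == "__main__":
    print(near_cognate_codons())
    print(classify_start_codon("CTG"))
```

Three positions with three alternative bases each give nine near-cognate codons, which is why the helper returns a list of length nine.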
