Genome-wide computational identification and manual annotation of human long noncoding RNA genes
- PMID: 20587619
- PMCID: PMC2905748
- DOI: 10.1261/rna.1951310
Abstract
Experimental evidence suggests that half or more of the mammalian transcriptome consists of noncoding RNA. Noncoding RNAs are divided into short noncoding RNAs (including microRNAs) and long noncoding RNAs (lncRNAs). We defined complementary DNAs (cDNAs) lacking any positive-strand open reading frames (ORFs) longer than 30 amino acids, as well as cDNAs lacking any evidence of interspecies conservation of their longer-than-30-amino acid ORFs, as noncoding. We have identified 5446 lncRNA genes in the human genome from approximately 24,000 full-length cDNAs, using our new ORF-prediction pipeline. We combined them nonredundantly with lncRNAs from four published sources to derive 6736 lncRNA genes. In an effort to distinguish standalone and antisense lncRNA genes from database artifacts, we stratified our catalog of lncRNAs according to the distance between each lncRNA gene candidate and its nearest known protein-coding gene. We concurrently examined the protein-coding capacity of known genes overlapping with lncRNAs. Remarkably, 62% of known genes with "hypothetical protein" names actually lacked protein-coding capacity. This study has greatly expanded the known human lncRNA catalog, increased its accuracy through manual annotation of cDNA-to-genome alignments, and revealed that a large set of hypothetical-protein genes in GenBank lacks protein-coding capacity. In addition, we have developed, independently of existing NCBI tools, command-line programs with high-throughput ORF-finding and BLASTP-parsing functionality, suitable for future automated assessments of protein-coding capacity of novel transcripts.
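The ORF-length criterion in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes that an ORF runs from an in-frame ATG to the first in-frame stop codon, that only the three forward (positive-strand) frames are scanned, and that ORFs without a stop codon are ignored. The function names `longest_forward_orf` and `is_candidate_noncoding` are hypothetical, and the interspecies-conservation test applied to longer ORFs is not modeled.

```python
# Illustrative sketch only; the ORF definition below is an assumption,
# not the published pipeline.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def longest_forward_orf(seq: str) -> int:
    """Return the length, in amino acids, of the longest ATG-to-stop ORF
    found in the three forward (positive-strand) reading frames.
    The stop codon is not counted; ORFs lacking a stop codon are ignored."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None  # close this ORF and look for the next ATG
    return best


def is_candidate_noncoding(seq: str, max_orf_aa: int = 30) -> bool:
    """True when no forward-strand ORF is longer than max_orf_aa amino acids."""
    return longest_forward_orf(seq) <= max_orf_aa


if __name__ == "__main__":
    # Toy sequences, not real cDNAs.
    short_orf = "ATG" + "GCT" * 5 + "TAA"
    long_orf = "ATG" + "GCT" * 40 + "TAA"
    print(is_candidate_noncoding(short_orf), is_candidate_noncoding(long_orf))
```

A cDNA that fails this length test would still need the conservation check described in the abstract before being called coding or noncoding.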
Similar articles
- [Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes]. Yi Chuan Xue Bao. 2004 May;31(5):431-43. PMID: 15478601. Chinese.
- A Support Vector Machine based method to distinguish long non-coding RNAs from protein coding transcripts. BMC Genomics. 2017 Oct 18;18(1):804. doi: 10.1186/s12864-017-4178-4. PMID: 29047334. Free PMC article.
- Long noncoding RNA repertoire in chicken liver and adipose tissue. Genet Sel Evol. 2017 Jan 10;49(1):6. doi: 10.1186/s12711-016-0275-0. PMID: 28073357. Free PMC article.
- Differentiating protein-coding and noncoding RNA: challenges and ambiguities. PLoS Comput Biol. 2008 Nov;4(11):e1000176. doi: 10.1371/journal.pcbi.1000176. Epub 2008 Nov 28. PMID: 19043537. Free PMC article. Review.
- Methods for distinguishing between protein-coding and long noncoding RNAs and the elusive biological purpose of translation of long noncoding RNAs. Biochim Biophys Acta. 2016 Jan;1859(1):31-40. doi: 10.1016/j.bbagrm.2015.07.017. Epub 2015 Aug 8. PMID: 26265145. Review.
Cited by
- The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 2012 Sep;22(9):1775-89. doi: 10.1101/gr.132159.111. PMID: 22955988. Free PMC article.
- Molecular mechanisms and function prediction of long noncoding RNA. ScientificWorldJournal. 2012;2012:541786. doi: 10.1100/2012/541786. Epub 2012 Dec 23. PMID: 23319885. Free PMC article. Review.
- High-throughput RNA sequencing reveals structural differences of orthologous brain-expressed genes between western lowland gorillas and humans. J Comp Neurol. 2016 Feb 1;524(2):288-308. doi: 10.1002/cne.23843. Epub 2015 Aug 20. PMID: 26132897. Free PMC article.
- The role of long non-coding RNAs in genome formatting and expression. Front Genet. 2015 Apr 29;6:165. doi: 10.3389/fgene.2015.00165. eCollection 2015. PMID: 25972893. Free PMC article. Review.
- Neurodegeneration as an RNA disorder. Prog Neurobiol. 2012 Dec;99(3):293-315. doi: 10.1016/j.pneurobio.2012.09.006. Epub 2012 Oct 10. PMID: 23063563. Free PMC article. Review.