GENCODE: producing a reference annotation for ENCODE
- PMID: 16925838
- PMCID: PMC1810553
- DOI: 10.1186/gb-2006-7-s1-s4
Abstract
Background: The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results.
Results: The GENCODE gene features are divided into eight categories, of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5' rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5' extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Of the 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions.
Conclusion: In total, 487 loci, of which 434 are coding, have been annotated as part of the GENCODE reference set available from the UCSC browser. Comparison of the GENCODE annotation with RefSeq and ENSEMBL shows that only 40% of GENCODE exons are contained within the two sets, which reflects the high number of alternative splice forms with unique exons annotated. Over 50% of coding loci have been experimentally verified by 5' RACE for EGASP, and the GENCODE collaboration is continuing to refine its annotation of the 1% of the human genome covered by the ENCODE regions with the aid of experimental validation.
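The proportions implied by the counts in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are assumptions introduced here, and the derived percentages are computed from the quoted counts rather than reported by the authors.

```python
# Counts quoted in the abstract; all names below are hypothetical labels.
total_loci = 487            # loci in the GENCODE reference set
coding_loci = 434           # of which coding
tested_predictions = 1215   # predictions tested outside the annotation
validated_exon_pairs = 14   # novel exon pairs validated
novel_transcripts = 31      # validated loci without coding evidence (novel)
putative_transcripts = 15   # validated loci without coding evidence (putative)


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


coding_share = percent(coding_loci, total_loci)
validation_rate = percent(validated_exon_pairs, tested_predictions)
noncoding_validated = novel_transcripts + putative_transcripts

print(f"Coding share of annotated loci: {coding_share:.1f}%")
print(f"Validated share of tested predictions: {validation_rate:.2f}%")
print(f"Validated loci without coding evidence: {noncoding_validated}")
```

On these figures, roughly nine in ten annotated loci are coding, and only about one in a hundred predictions outside the annotation was validated, which is consistent with the abstract's assessment of the annotation's comprehensiveness.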