Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution
- PMID: 17568002
- PMCID: PMC1891343
- DOI: 10.1101/gr.5586307
Abstract
Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are "genomic fossils" valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome's structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein-coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (approximately 80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed a systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high-throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.
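The abstract does not spell out how the consensus set was built, so the following is only an illustrative sketch of one common way to reconcile calls from several annotation methods: merge overlapping genomic intervals into loci and keep the loci reported by at least a minimum number of methods. The function name `consensus_loci`, the `min_support` parameter, the method labels, and the coordinates are all hypothetical and are not taken from the paper.

```python
def consensus_loci(call_sets, min_support=2):
    """Merge overlapping calls from several methods into loci and keep
    the loci supported by at least `min_support` distinct methods.

    call_sets: dict mapping a method name to a list of
               (chrom, start, end) half-open intervals.
    Returns a list of (chrom, start, end, [methods]) tuples.
    NOTE: an assumed, simplified rule, not the procedure used in the paper.
    """
    # Tag every call with its method and sort by chromosome and position.
    tagged = sorted(
        (chrom, start, end, method)
        for method, calls in call_sets.items()
        for chrom, start, end in calls
    )

    clusters = []
    for chrom, start, end, method in tagged:
        last = clusters[-1] if clusters else None
        if last and last["chrom"] == chrom and start < last["end"]:
            # Overlaps the current locus: extend it and record the method.
            last["end"] = max(last["end"], end)
            last["methods"].add(method)
        else:
            # Start a new locus.
            clusters.append(
                {"chrom": chrom, "start": start, "end": end, "methods": {method}}
            )

    return [
        (c["chrom"], c["start"], c["end"], sorted(c["methods"]))
        for c in clusters
        if len(c["methods"]) >= min_support
    ]


# Made-up calls from three hypothetical methods.
example_calls = {
    "method_a": [("chr1", 100, 500), ("chr1", 900, 1200), ("chr3", 10, 50)],
    "method_b": [("chr1", 450, 700), ("chr2", 100, 300)],
    "method_c": [("chr1", 950, 1100), ("chr2", 250, 400)],
}

for locus in consensus_loci(example_calls, min_support=2):
    print(locus)
```

Raising `min_support` trades sensitivity for agreement: with the made-up calls above, the `chr3` call made by only one method is dropped at `min_support=2` and retained at `min_support=1`.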