Iterative gene prediction and pseudogene removal improves genome annotation
- PMID: 16651666
- PMCID: PMC1457044
- DOI: 10.1101/gr.4766206
Abstract
Correct gene prediction is impaired by the presence of processed pseudogenes: nonfunctional, intronless copies of real genes found elsewhere in the genome. Gene prediction programs frequently mistake processed pseudogenes for real genes or exons, leading to biologically irrelevant gene predictions. While methods exist to identify processed pseudogenes in genomes, no attempt has been made to integrate pseudogene removal with gene prediction, or even to provide a freestanding tool that identifies such erroneous gene predictions. We have created PPFINDER (for Processed Pseudogene finder), a program that integrates several methods of processed pseudogene finding in mammalian gene annotations. We used PPFINDER to remove pseudogenes from N-SCAN gene predictions, and show that gene prediction improves substantially when gene prediction and pseudogene masking are interleaved. In addition, we used PPFINDER with gene predictions as a parent database, eliminating the need for libraries of known genes. This allows us to run the gene prediction/PPFINDER procedure on newly sequenced genomes for which few genes are known.
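The abstract describes the interleaved procedure only at a high level. Below is a minimal sketch of a predict, flag, mask, re-predict loop. It assumes a single criterion taken from the definition above: an intronless (single-exon) prediction that closely matches a multi-exon prediction at another locus, with the gene predictions themselves serving as the parent database. PPFINDER combines several methods that are not reproduced here. Everything in the sketch is hypothetical and chosen for illustration: `Prediction`, `find_processed_pseudogenes`, `iterative_annotation`, the `predict` callable (a stand-in for a gene predictor such as N-SCAN), the 0.9 threshold, and the use of `difflib` similarity in place of a real sequence alignment.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Set, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open interval


@dataclass(frozen=True)
class Prediction:
    """Hypothetical gene prediction: sorted exon coordinates plus spliced CDS."""
    gene_id: str
    chrom: str
    exons: Tuple[Tuple[int, int], ...]
    cds: str

    @property
    def span(self) -> Region:
        # Genomic extent from the first exon start to the last exon end.
        return (self.chrom, self.exons[0][0], self.exons[-1][1])


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open regions on the same chromosome intersect."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def find_processed_pseudogenes(
    preds: List[Prediction], min_similarity: float = 0.9
) -> List[Prediction]:
    """Flag single-exon predictions resembling a multi-exon prediction elsewhere.

    The predictions themselves act as the parent database, so no library of
    known genes is needed. SequenceMatcher.ratio() is a crude stand-in for a
    real alignment score (assumption, not PPFINDER's scoring).
    """
    parents = [p for p in preds if len(p.exons) > 1]
    flagged = []
    for cand in preds:
        if len(cand.exons) != 1:
            continue  # only intronless predictions can be processed pseudogenes
        for parent in parents:
            if overlaps(cand.span, parent.span):
                continue  # the parent must lie at a different locus
            sim = SequenceMatcher(None, cand.cds, parent.cds, autojunk=False).ratio()
            if sim >= min_similarity:
                flagged.append(cand)
                break
    return flagged


def iterative_annotation(
    predict: Callable[[Set[Region]], List[Prediction]], max_rounds: int = 5
) -> List[Prediction]:
    """Alternate gene prediction and pseudogene masking until nothing new is flagged."""
    masked: Set[Region] = set()
    preds = predict(masked)
    for _ in range(max_rounds):
        pseudo = find_processed_pseudogenes(preds)
        if not pseudo:
            break  # converged: no remaining predictions look like processed pseudogenes
        masked |= {p.span for p in pseudo}  # mask flagged loci
        preds = predict(masked)  # re-predict on the masked genome
    return preds
```

In a real pipeline `predict` would rerun the gene predictor on the masked genome. Masking a pseudogene can change the structure of neighboring predictions (for example, one that had absorbed the pseudogene as an exon), which is why the sketch repeats until no new pseudogenes are flagged, with the hypothetical `max_rounds` as a safety cap.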
Similar articles
- Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22. Genome Res. 2002 Feb;12(2):272-80. doi: 10.1101/gr.207102. PMID: 11827946.
- GENCODE pseudogenes. Methods Mol Biol. 2014;1167:129-55. doi: 10.1007/978-1-4939-0835-6_10. PMID: 24823776.
- Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 2006;7 Suppl 1(Suppl 1):S10.1-12. doi: 10.1186/gb-2006-7-s1-s10. Epub 2006 Aug 7. PMID: 16925832.
- Gauging the trends of pseudogenes in plants. Crit Rev Biotechnol. 2021 Nov;41(7):1114-1129. doi: 10.1080/07388551.2021.1901648. Epub 2021 May 17. PMID: 33993808. Review.
- Pseudogenes and Their Genome-Wide Prediction in Plants. Int J Mol Sci. 2016 Nov 28;17(12):1991. doi: 10.3390/ijms17121991. PMID: 27916797. Review.
Cited by
- MADS: a new and improved method for analysis of differential alternative splicing by exon-tiling microarrays. RNA. 2008 Aug;14(8):1470-9. doi: 10.1261/rna.1070208. Epub 2008 Jun 19. PMID: 18566192.
- Genome-wide investigation of WRKY transcription factors in Tartary buckwheat (Fagopyrum tataricum) and their potential roles in regulating growth and development. PeerJ. 2020 Mar 5;8:e8727. doi: 10.7717/peerj.8727. eCollection 2020. PMID: 32185114.
- Revisiting the missing protein-coding gene catalog of the domestic dog. BMC Genomics. 2009 Feb 4;10:62. doi: 10.1186/1471-2164-10-62. PMID: 19193219.
- PseudoChecker: an integrated online platform for gene inactivation inference. Nucleic Acids Res. 2020 Jul 2;48(W1):W321-W331. doi: 10.1093/nar/gkaa408. PMID: 32449938.
- Targeted discovery of novel human exons by comparative genomics. Genome Res. 2007 Dec;17(12):1763-73. doi: 10.1101/gr.7128207. Epub 2007 Nov 7. PMID: 17989246.