Finding errors in DNA sequences
- PMID: 1316617
- PMCID: PMC49150
- DOI: 10.1073/pnas.89.10.4698
Abstract
An algorithm is described that can detect certain errors within the coding regions of DNA sequences. The algorithm is based on the idea that an insertion or deletion error within a coding sequence would interrupt the reading frame, so that the correct translation of the DNA sequence would require one or more frameshifts. If the coding sequence shows similarity to a known protein sequence, such errors can be detected by comparing the conceptual translations of the DNA sequence in all six reading frames with every sequence in a protein sequence data base. We have incorporated these ideas into a computer program, called DETECT, that can serve as an aid to experimentalists determining new DNA sequences, so that obvious errors may be located and corrected. The program has been tested using raw experimental data and against sequences from the European Molecular Biology Laboratory data base that are annotated as containing frameshifts. We have also tested it using unidentified open reading frames that flank known, annotated genes in the GenBank data base. Many potential errors are apparent, and in some cases functions can be suggested for the "corrected" versions of these reading frames, leading to the identification of new genes. As more sequences are determined, the power of this method will increase substantially.
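The core idea above can be illustrated with a minimal Python sketch. It is not the DETECT program and does not reproduce its scoring. The sequence is translated in all six reading frames, and exact 5-mers of a single known protein are located in those translations. If consecutive matches jump from one frame to another, an insertion or deletion probably occurred near that point. Several things here are hypothetical and chosen only for illustration: the helper names (`translate`, `six_frames`, `frame_hits`, `frame_switches`), the use of exact k-mer matching instead of a scored alignment against a whole protein data base, and the toy 22-residue protein with its coding sequence.

```python
from typing import Dict, List, Tuple

BASES = "TCAG"
# NCBI translation table 1 (standard code); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate codon by codon; unknown codons give 'X', a trailing partial codon is dropped."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> Dict[str, str]:
    """Conceptual translations in frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    frames = {f"+{f + 1}": translate(dna[f:]) for f in range(3)}
    frames.update({f"-{f + 1}": translate(rev[f:]) for f in range(3)})
    return frames


def frame_hits(dna: str, protein: str, k: int = 5) -> List[Tuple[int, str]]:
    """(protein offset, frame) for every exact k-mer of `protein` found in any of the six frames.

    Hypothetical simplification: exact substring matching stands in for a scored alignment.
    """
    hits = []
    for frame, peptide in six_frames(dna).items():
        for p in range(len(protein) - k + 1):
            if protein[p:p + k] in peptide:
                hits.append((p, frame))
    return sorted(hits)


def frame_switches(dna: str, protein: str, k: int = 5) -> List[Tuple[int, str, str]]:
    """Protein offsets at which consecutive k-mer hits move to a different frame.

    A switch suggests that an insertion or deletion upstream of that point
    has moved the rest of the coding sequence into another reading frame.
    """
    switches = []
    previous = None
    for offset, frame in frame_hits(dna, protein, k):
        if previous is not None and frame != previous:
            switches.append((offset, previous, frame))
        previous = frame
    return switches


# Toy example (hypothetical data): a protein, a coding sequence for it,
# and a "read" in which one base (the A at position 33) was lost.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
cds = "ATGAAAACCGCCTATATTGCAAAGCAGCGTCAAATCTCTTTCGTGAAAAGCCATTTTTCCCGCCAG"
read = cds[:33] + cds[34:]

print(frame_switches(cds, protein))   # []
print(frame_switches(read, protein))  # [(11, '+1', '+3')]
```

The intact sequence matches the protein entirely in frame +1. In the read with one base deleted, the first 11 residues still match in frame +1, and the remainder matches in frame +3. The reported switch at protein offset 11 falls at the codon that lost a base. A realistic tool would search every protein in a data base and tolerate mismatches. It would also need a way to resolve cases where hits appear in more than one frame at the same offset. This sketch does not handle any of these.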
Similar articles
- [Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes]. Yi Chuan Xue Bao. 2004 May;31(5):431-43. PMID: 15478601. In Chinese.
- Comparison of DNA sequences with protein sequences. Genomics. 1997 Nov 15;46(1):24-36. doi: 10.1006/geno.1997.4995. PMID: 9403055.
- PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames. Nucleic Acids Res. 1996 Jul 15;24(14):2730-9. doi: 10.1093/nar/24.14.2730. PMID: 8759004. Free PMC article.
- Finding homologs to nucleic acid or protein sequences using the framesearch program. Curr Protoc Bioinformatics. 2002 Aug;Chapter 3:Unit 3.2. doi: 10.1002/0471250953.bi0302s00. PMID: 18792937. Review.
- Computer analysis of DNA and protein sequences. Eur J Biochem. 1991 Jul 15;199(2):253-6. doi: 10.1111/j.1432-1033.1991.tb16117.x. PMID: 1712725. Review.
Cited by
- A frameshift error detection algorithm for DNA sequencing projects. Nucleic Acids Res. 1995 Aug 11;23(15):2900-8. doi: 10.1093/nar/23.15.2900. PMID: 7659513. Free PMC article.
- Segmentally variable genes: a new perspective on adaptation. PLoS Biol. 2004 Apr;2(4):E81. doi: 10.1371/journal.pbio.0020081. Epub 2004 Apr 13. PMID: 15094797. Free PMC article.
- ICDS database: interrupted CoDing sequences in prokaryotic genomes. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D338-43. doi: 10.1093/nar/gkj060. PMID: 16381882. Free PMC article.
- Error and error mitigation in low-coverage genome assemblies. PLoS One. 2011 Feb 14;6(2):e17034. doi: 10.1371/journal.pone.0017034. PMID: 21340033. Free PMC article.
- Comparative proteogenomics: combining mass spectrometry and comparative genomics to analyze multiple genomes. Genome Res. 2008 Jul;18(7):1133-42. doi: 10.1101/gr.074344.107. Epub 2008 Apr 21. PMID: 18426904. Free PMC article.