Intrinsic and extrinsic approaches for detecting genes in a bacterial genome
- PMID: 7984428
- PMCID: PMC308528
- DOI: 10.1093/nar/22.22.4756
Abstract
The unannotated regions of the Escherichia coli genomic DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences with a combined length of 359,279 base pairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method, which is based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) a search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by both GeneMark and BLAST; these comprise 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. Seventy-three putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark but have no significant BLAST hits are nevertheless likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions, including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins.
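The intrinsic element of the strategy, scoring a candidate ORF with Markov chain models for coding and non-coding DNA, can be illustrated with a toy log-odds comparison. This is only a minimal sketch and not the GeneMark implementation: the abstract does not give the model details, so the first-order homogeneous chains, the pseudocount, the function names, and the tiny training sequences below are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"

def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev).

    Hypothetical helper: counts adjacent nucleotide pairs in the training
    sequences and normalises each row, with a pseudocount to avoid zeros.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in seqs:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev, row in counts.items():
        total = sum(row.values())
        probs[prev] = {nxt: c / total for nxt, c in row.items()}
    return probs

def log_likelihood(seq, probs):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(probs[p][n]) for p, n in zip(seq, seq[1:]))

def log_odds(seq, coding, noncoding):
    """Positive values favour the coding model, negative the non-coding one."""
    return log_likelihood(seq, coding) - log_likelihood(seq, noncoding)

# Invented toy training data (NOT real E. coli sequence).
coding_model = train_markov(["ATGGCGGCGCTGGCGGCG", "ATGGCGCTGGCGCTGGCG"])
noncoding_model = train_markov(["ATATTTAATATTAAATTT", "TTTAAATATATTTAATAT"])

score_gc = log_odds("GCGGCGCTGGCG", coding_model, noncoding_model)
score_at = log_odds("ATATTTAAATAT", coding_model, noncoding_model)
print(f"GC-rich candidate log-odds: {score_gc:.2f}")
print(f"AT-rich candidate log-odds: {score_at:.2f}")
```

With these invented training sets, the GC-rich candidate scores above zero and the AT-rich one below zero. In the strategy described above, such an intrinsic score would then be cross-checked against extrinsic evidence from the BLAST similarity searches.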