Bioinformatics. 2007 Apr 15;23(8):917-25.

doi: 10.1093/bioinformatics/btm048. Epub 2007 Feb 18.

HomologMiner: looking for homologous genomic groups in whole genomes

Minmei Hou¹, Piotr Berman, Chih-Hao Hsu, Robert S Harris

PMID: 17308341
DOI: 10.1093/bioinformatics/btm048

Minmei Hou et al. Bioinformatics. 2007.

Affiliation

¹ Department of Computer Science & Engineering, Penn State University, PA, USA. mhou@cse.psu.edu

Abstract

Motivation: Complex genomes contain numerous repeated sequences, and genomic duplication is believed to be a main evolutionary mechanism to obtain new functions. Several tools are available for de novo repeat sequence identification, and many approaches exist for clustering homologous protein sequences. We present an efficient new approach to identify and cluster homologous DNA sequences with high accuracy at the level of whole genomes, excluding low-complexity repeats, tandem repeats and annotated interspersed repeats. We also determine the boundaries of each group member so that it closely represents a biological unit, e.g. a complete gene, or a partial gene coding a protein domain.

Results: We developed a program called HomologMiner to identify homologous groups applicable to genome sequences that have been properly marked for low-complexity repeats and annotated interspersed repeats. We applied it to the whole genomes of human (hg17), macaque (rheMac2) and mouse (mm8). Groups obtained include gene families (e.g. olfactory receptor gene family, zinc finger families), unannotated interspersed repeats and additional homologous groups that resulted from recent segmental duplications. Our program incorporates several new methods: a new abstract definition of consistent duplicate units, a new criterion to remove moderately frequent tandem repeats, and new algorithmic techniques. We also provide preliminary analysis of the output on the three genomes mentioned above, and show several applications including identifying boundaries of tandem gene clusters and novel interspersed repeat families.

Availability: All programs and datasets are downloadable from www.bx.psu.edu/miller_lab.

Cited by

Identification of both copy number variation-type and constant-type core elements in a large segmental duplication region of the mouse genome.
Umemori J, Mori A, Ichiyanagi K, Uno T, Koide T. BMC Genomics. 2013 Jul 8;14:455. doi: 10.1186/1471-2164-14-455. PMID: 23834397. Free PMC article.

Grants and funding

HG02238/HG/NHGRI NIH HHS/United States
