BMC Bioinformatics. 2009 Jul 1;10:205.
doi: 10.1186/1471-2105-10-205.

Clique-based data mining for related genes in a biomedical database

Tsutomu Matsunaga et al. BMC Bioinformatics.

Abstract

Background: Progress in the life sciences cannot be made without integrating biomedical knowledge on numerous genes to help formulate hypotheses about the genetic mechanisms behind biological phenomena, including diseases. There is thus a strong need for a way to search biomedical databases automatically and comprehensively for related genes, such as genes in the same family and genes encoding components of the same pathway. Here we address the extraction of related genes by searching a biomedical relational graph for densely connected subgraphs, which we model as cliques.
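The idea of modeling a "densely connected subgraph" as a clique can be made concrete with a small sketch. It assumes an undirected graph stored as adjacency sets; the function names and the toy node labels ("D" for a disease, "G1" to "G3" for genes) are hypothetical and not taken from OMIM. Density is the fraction of possible node pairs that are linked, and a clique is the extreme case where that fraction is 1.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) node pairs; self-links are skipped."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def density(adj, nodes):
    """Fraction of the k*(k-1)/2 possible pairs among `nodes` that are linked."""
    nodes = list(nodes)
    k = len(nodes)
    if k < 2:
        return 1.0  # convention: a single node is trivially complete
    linked = sum(1 for u, v in combinations(nodes, 2) if v in adj.get(u, ()))
    return linked / (k * (k - 1) / 2)

def is_clique(adj, nodes):
    """A clique is the densest case: every pair of nodes is linked (density 1)."""
    return all(v in adj.get(u, ()) for u, v in combinations(nodes, 2))

# Illustrative toy graph: one disease node "D" and gene nodes "G1".."G3".
adj = build_adjacency([("D", "G1"), ("D", "G2"), ("G1", "G2"), ("G2", "G3")])
print(is_clique(adj, {"D", "G1", "G2"}))      # True
print(density(adj, {"D", "G1", "G2", "G3"}))  # 4 of 6 pairs linked, about 0.667
```

Requiring density exactly 1 is stricter than "densely connected" in general; it trades some recall for sets whose members are all mutually linked.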

Results: We constructed a graph whose nodes were the gene and disease pages of the Online Mendelian Inheritance in Man (OMIM) database and whose edges were the hyperlinks between those pages. By computationally enumerating cliques, we obtained over 20,000 sets of related genes (called 'gene modules'). The modules included genes in the same family, genes for proteins that form a complex, and genes for components of the same signaling pathway. Experiments using gene modules related to 'metabolic syndrome' show that the modules can provide a coherent, holistic picture that helps in interpreting relations among genes.
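The abstract does not say which enumeration algorithm was used, so the following is only a sketch of one standard choice: Bron-Kerbosch with pivoting, which lists every maximal clique of an undirected graph. It assumes that hyperlink direction is ignored and that a module must contain at least `min_size` nodes; both assumptions, the default of 3, and the function names are hypothetical.

```python
def build_graph(links):
    """Undirected graph from (source_page, target_page) hyperlinks.
    Assumption: link direction is ignored and self-links are dropped."""
    adj = {}
    for src, dst in links:
        if src != dst:
            adj.setdefault(src, set()).add(dst)
            adj.setdefault(dst, set()).add(src)
    return adj

def maximal_cliques(adj):
    """Enumerate all maximal cliques with Bron-Kerbosch plus pivoting."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: already-explored nodes
        if not p and not x:
            cliques.append(r)  # nothing can extend r, so it is maximal
            return
        # Pivot on the node with most neighbours in p to skip redundant branches.
        u = max(p | x, key=lambda n: len(adj[n] & p))
        for v in list(p - adj[u]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return cliques

def gene_modules(links, min_size=3):
    """Maximal cliques with at least `min_size` nodes (hypothetical threshold)."""
    return [c for c in maximal_cliques(build_graph(links)) if len(c) >= min_size]

# Toy hyperlinks between hypothetical pages; a repeated reverse link collapses.
links = [("G1", "G2"), ("G2", "G3"), ("G1", "G3"), ("G3", "D"),
         ("G2", "D"), ("D", "G4"), ("G4", "D")]
for module in gene_modules(links):
    print(sorted(module))
```

Pivoting keeps the enumeration from revisiting the same clique through many orderings, which matters on a graph with many thousands of pages.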

Conclusion: We presented a data mining approach that extracts related genes by enumerating cliques. The extracted gene sets provide a holistic picture that is useful for understanding complex disease mechanisms.

Figures

Figure 1
Example of a biomedical relational graph. Hypertension and hypertension-related genes are represented by nodes, and the associations between them are represented by edges.
Figure 2
188 modules associated with obesity, diabetes, hyperlipidemia, and hypertension. Columns represent modules, rows represent genes/diseases, and a vertical bar marks each gene/disease that belongs to a module. Rows and columns are sorted in ascending order of the scores computed by correspondence analysis (see Methods). Red, grey, blue, and green bars indicate cliques that contain the hypertension, hyperlipidemia, obesity, and diabetes nodes, respectively. The letters 't,' 'l,' 'o,' and 'd' on the right mark genes reported in the literature to be related to hypertension, hyperlipidemia, obesity, and diabetes, respectively (see text for the literature references).
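The caption says only that rows and columns are sorted by a correspondence-analysis score; the Methods are not part of this page. As a rough sketch under that assumption, the first non-trivial correspondence-analysis axis of a binary item-by-module matrix can be computed by reciprocal averaging (an iterative method equivalent to CA's first axis) and used as the sort key. The function names are hypothetical, and the authors' actual scaling or software may differ.

```python
def incidence(modules):
    """Binary item-by-module matrix; items (genes/diseases) in sorted order."""
    items = sorted(set().union(*modules))
    return items, [[1 if it in mod else 0 for mod in modules] for it in items]

def ca_first_axis(matrix, iters=200):
    """First non-trivial CA axis via reciprocal averaging.
    Assumes every row and every column has a nonzero total."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(n_rows)) for j in range(n_cols)]
    grand = sum(row_tot)

    def col_scores(r):
        # Column score = average score of the rows it contains.
        return [sum(matrix[i][j] * r[i] for i in range(n_rows)) / col_tot[j]
                for j in range(n_cols)]

    r = [float(i) for i in range(n_rows)]  # arbitrary non-constant start
    for _ in range(iters):
        c = col_scores(r)
        # Row score = average score of the columns it belongs to.
        r = [sum(matrix[i][j] * c[j] for j in range(n_cols)) / row_tot[i]
             for i in range(n_rows)]
        # Weighted centering removes the trivial constant solution.
        mean = sum(w * s for w, s in zip(row_tot, r)) / grand
        r = [s - mean for s in r]
        # Rescale to unit weighted variance so scores neither vanish nor blow up.
        sd = (sum(w * s * s for w, s in zip(row_tot, r)) / grand) ** 0.5
        if sd == 0:
            break  # degenerate matrix: no non-trivial axis
        r = [s / sd for s in r]
    return r, col_scores(r)

def ca_order(matrix):
    """Row and column indices in ascending order of first-axis score."""
    r, c = ca_first_axis(matrix)
    return (sorted(range(len(r)), key=r.__getitem__),
            sorted(range(len(c)), key=c.__getitem__))

# Shuffled "chain" of modules: CA ordering should recover the chain.
chain = [[0, 1, 1, 0],
         [1, 1, 0, 0],
         [0, 0, 1, 1]]
print(ca_order(chain))
```

Sorting by the first axis places items that share modules next to each other, which is what produces the banded look in a figure like this one; the sign of the axis is arbitrary, so the whole order may come out reversed.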
