BMC Syst Biol. 2013 Oct 16;7 Suppl 3(Suppl 3):S9.
doi: 10.1186/1752-0509-7-S3-S9.

A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships and networks

Zuoshuang Xiang et al. BMC Syst Biol. 2013.

Abstract

Background: The large amount of literature in the post-genomics era enables the study of gene interactions and networks using all available articles published for a specific organism. MeSH is a controlled vocabulary of medical and scientific terms that is used by biomedical scientists to manually index articles in the PubMed literature database. We hypothesized that genome-wide gene-MeSH term associations from the PubMed literature database could be used to predict implicit gene-to-gene relationships and networks. While the gene-MeSH associations have been used to detect gene-gene interactions in some studies, different methods have not been well compared, and such a strategy has not been evaluated for a genome-wide literature analysis. Genome-wide literature mining of gene-to-gene interactions allows ranking of the best gene interactions and investigation of comprehensive biological networks at a genome level.

Results: The genome-wide GenoMesh literature mining algorithm was developed by sequentially generating a gene-article matrix, a normalized gene-MeSH term matrix, and a gene-gene matrix. The gene-gene matrix relies on the calculation of pairwise gene dissimilarities based on gene-MeSH relationships. An optimized dissimilarity score was identified from six well-studied functions based on a receiver operating characteristic (ROC) analysis. Based on the studies with well-studied Escherichia coli and less-studied Brucella spp., GenoMesh was found to accurately identify gene functions using weighted MeSH terms, predict gene-gene interactions not reported in the literature, and cluster all the genes studied from an organism using the MeSH-based gene-gene matrix. A web-based GenoMesh literature mining program is also available at: http://genomesh.hegroup.org. GenoMesh also predicts gene interactions and networks among genes associated with specific MeSH terms or user-selected gene lists.
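The three-matrix pipeline described in the Results can be illustrated with a minimal sketch. The abstract does not state which normalization or which of the six dissimilarity functions was selected, so the TF-IDF-style weighting and the cosine dissimilarity below are assumptions chosen for illustration, not the authors' method. The gene names, PMIDs, and function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy inputs: articles mentioning each gene (the gene-article
# matrix) and the MeSH terms indexing each article.
gene_articles = {
    "geneA": {"pmid1", "pmid2"},
    "geneB": {"pmid2", "pmid3"},
    "geneC": {"pmid4"},
}
article_mesh = {
    "pmid1": {"Flagella", "Chemotaxis"},
    "pmid2": {"Flagella"},
    "pmid3": {"Flagella", "Bacterial Proteins"},
    "pmid4": {"DNA Repair", "Bacterial Proteins"},
}


def gene_mesh_matrix(gene_articles, article_mesh):
    """Build a weighted gene-MeSH matrix (assumed TF-IDF-style weighting)."""
    counts = {gene: defaultdict(int) for gene in gene_articles}
    for gene, pmids in gene_articles.items():
        for pmid in pmids:
            for term in article_mesh.get(pmid, ()):
                counts[gene][term] += 1

    # Number of genes associated with each MeSH term.
    genes_per_term = defaultdict(int)
    for terms in counts.values():
        for term in terms:
            genes_per_term[term] += 1

    n_genes = len(counts)
    weighted = {}
    for gene, terms in counts.items():
        total = sum(terms.values())
        weighted[gene] = {
            term: (count / total) * math.log(1 + n_genes / genes_per_term[term])
            for term, count in terms.items()
        }
    return weighted


def cosine_dissimilarity(u, v):
    """Return 1 - cosine similarity of two sparse MeSH weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def gene_gene_matrix(weighted):
    """Compute pairwise dissimilarities for every unordered gene pair."""
    genes = sorted(weighted)
    return {
        (a, b): cosine_dissimilarity(weighted[a], weighted[b])
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
    }


weighted = gene_mesh_matrix(gene_articles, article_mesh)
pairs = gene_gene_matrix(weighted)
ranked = sorted(pairs.items(), key=lambda item: item[1])  # closest pairs first
```

On this toy input, geneA and geneB rank as the closest pair because they share the term "Flagella", while geneA and geneC share no MeSH terms and receive the maximum dissimilarity of 1. Pairs that score low without ever co-occurring in an article are the kind of implicit relationship the abstract describes.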

Conclusions: The GenoMesh algorithm and web program provide the first genome-wide, MeSH-based literature mining system that effectively predicts implicit gene-gene relationships and networks.


Figures

Figure 1. The GenoMesh algorithm.
Figure 2. ROC curve comparison of different methods for MeSH term weighting and gene-to-gene dissimilarity calculations.
Figure 3. Clusters of E. coli genes involved in E. coli flagella biogenesis. (A) Thirty-two E. coli flagellar genes were clustered together; (B) Six E. coli flagellar genes were clustered together. The neighbouring branch of the six-gene branch includes five E. coli genes.
Figure 4. A cluster of Brucella genes that includes 8 virB genes.
Figure 5. Histogram analyses of average dissimilarity scores of random networks. The peaks and shapes of the curves are affected by the number of genes included in the random networks.
Figure 6. Analysis of the term “Neutrophil Activation” from the GenoMesh MeSHBrowse website. After browsing the MeSH hierarchical tree from “Phenomena and Processes” → “Immune System Phenomena” → “Immune System Processes” → “Neutrophil Activation”, 23 E. coli genes were found to be associated with the MeSH term “Neutrophil Activation”. The related genes and gene pairs are listed next to the hierarchical tree, and a network of these 23 E. coli genes is generated automatically (note: the network image is generated only if the number of genes is less than 100). Gray edges represent interactions and red edges represent predicted interactions. The GenoMesh annotation of the gene pair ytjC and yjhR is shown when a user moves the mouse cursor over the red line (edge) linking these two genes. Clicking this link leads to a detailed analysis of the gene pair (not shown).
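The random-network analysis summarized in the Figure 5 caption can be sketched as follows. This is a minimal illustration under stated assumptions: the dissimilarity scores are synthetic uniform random values, and the gene names, function names, and sample counts are hypothetical. It only shows why the number of genes in a random network changes the shape of the histogram of average scores.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def average_dissimilarity(genes, dissim):
    """Mean pairwise dissimilarity over all gene pairs in one network."""
    return mean(dissim[frozenset(pair)] for pair in combinations(genes, 2))


def random_network_averages(all_genes, dissim, size, n_samples, seed=0):
    """Average dissimilarity of n_samples random networks of a given size."""
    rng = random.Random(seed)
    return [
        average_dissimilarity(rng.sample(all_genes, size), dissim)
        for _ in range(n_samples)
    ]


# Synthetic stand-in for a gene-gene dissimilarity matrix (assumption:
# uniform random scores in [0, 1), not real GenoMesh output).
data_rng = random.Random(42)
all_genes = [f"g{i}" for i in range(30)]
dissim = {
    frozenset(pair): data_rng.random() for pair in combinations(all_genes, 2)
}

small = random_network_averages(all_genes, dissim, size=3, n_samples=500)
large = random_network_averages(all_genes, dissim, size=15, n_samples=500)

# Larger networks average over more pairs, so their scores vary less.
spread_small = pstdev(small)
spread_large = pstdev(large)
```

Averages from larger random networks cluster more tightly because each one is computed over more gene pairs, which is consistent with the caption's remark that the curves depend on network size. Comparing an observed network's average against the distribution for random networks of the same size would be one way to judge whether its genes are unusually close.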
