PLoS One. 2009;4(4):e5203.
doi: 10.1371/journal.pone.0005203. Epub 2009 Apr 13.

Exploring clinical associations using '-omics' based enrichment analyses


David A Hanauer et al. PLoS One. 2009.

Abstract

Background: The vast amount of clinical data collected in electronic health records (EHRs) is analogous to the data explosion of the "-omics" revolution. In the EHR, clinicians often maintain patient-specific problem summary lists, which provide a concise overview of significant medical diagnoses. We hypothesized that by tapping into the collective wisdom of hundreds of physicians entering problems into the EHR, we could detect significant associations among diagnoses that are not described in the literature.

Methodology/principal findings: We employed the Molecular Concept Map (MCM), an analytic approach originally developed for detecting associations between sets of gene expression data, to find significant associations among the 1.5 million clinical problem summary list entries for 327,000 patients in our institution's EHR. An odds ratio (OR) and p-value were calculated for each association. A subset of the 750,000 associations found was explored using the MCM tool. Expected associations were confirmed, and associations that had been recently reported but remain poorly known were uncovered. Novel associations that may warrant further exploration were also found. Examples of expected associations included those between non-insulin-dependent diabetes mellitus and diagnoses such as retinopathy, hypertension, and coronary artery disease. A recently reported association was found between irritable bowel and vulvodynia (OR 2.9, p = 5.6×10⁻⁴). Associations that are currently unknown or very poorly known included those between granuloma annulare and osteoarthritis (OR 4.3, p = 1.1×10⁻⁴) and between pyloric stenosis and ventricular septal defect (OR 12.1, p = 2.0×10⁻³).
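The abstract does not spell out how each OR and p-value is obtained. As a minimal sketch, assuming each pair of problems is summarized as a 2×2 table of patient counts and that significance is assessed with a one-sided Fisher's exact test (a common choice in enrichment analyses, but not confirmed here as the MCM's exact procedure), the two statistics can be computed as follows. The function `associate`, the `Association` container, and the argument names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Result for one pair of problems (hypothetical container, not the MCM's own)."""
    odds_ratio: float
    p_value: float


def _log_comb(n: int, k: int) -> float:
    # Log of the binomial coefficient C(n, k), via lgamma to avoid huge integers
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def associate(both: int, a_only: int, b_only: int, neither: int) -> Association:
    """Odds ratio and one-sided Fisher's exact p-value for a 2x2 patient table.

    both    -- patients whose problem list contains problem A and problem B
    a_only  -- patients with A but not B
    b_only  -- patients with B but not A
    neither -- patients with neither problem
    """
    # Odds ratio (ad)/(bc); infinity when an off-diagonal cell is empty
    if a_only * b_only == 0:
        odds_ratio = math.inf if both * neither > 0 else math.nan
    else:
        odds_ratio = (both * neither) / (a_only * b_only)

    # Hypergeometric upper tail: probability of at least `both` co-occurrences
    # given the row and column totals (Fisher's exact test, alternative "greater")
    total = both + a_only + b_only + neither
    with_a = both + a_only
    with_b = both + b_only
    log_denominator = _log_comb(total, with_b)
    upper = min(with_a, with_b)
    log_terms = [
        _log_comb(with_a, k) + _log_comb(total - with_a, with_b - k) - log_denominator
        for k in range(both, upper + 1)
    ]
    # Log-sum-exp: factor out the largest term before summing the tail
    peak = max(log_terms)
    p_value = math.exp(peak) * sum(math.exp(t - peak) for t in log_terms)
    return Association(odds_ratio, min(p_value, 1.0))
```

For a toy table with 3 patients having both problems, 1 with only A, 1 with only B, and 3 with neither, `associate(3, 1, 1, 3)` gives an odds ratio of 9.0 and a p-value of 17/70 (about 0.243). Working in log space through `math.lgamma` keeps the calculation tractable when the table totals run to hundreds of thousands of patients.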

Conclusions/significance: Computer programs developed for the analysis of "-omic" data can be successfully applied to clinical medicine. The results of the analysis may be useful for hypothesis generation and for supporting clinical care by reminding clinicians of problems likely to accompany a patient's existing problems.
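Reminding clinicians of problems likely to accompany a patient's existing ones amounts to a lookup over the association table. As a hypothetical sketch (the function name, the default cutoffs of OR ≥ 2.0 and p ≤ 1.0×10⁻³, and ranking by the strongest odds ratio are all assumptions, not the authors' design), candidate problems could be ranked like this:

```python
from typing import Dict, Iterable, List, Set, Tuple

# (problem_a, problem_b, odds_ratio, p_value); hypothetical record layout
Edge = Tuple[str, str, float, float]


def suggest_problems(
    patient_problems: Set[str],
    associations: Iterable[Edge],
    min_or: float = 2.0,   # hypothetical strength cutoff
    max_p: float = 1e-3,   # hypothetical significance cutoff
    limit: int = 5,
) -> List[Tuple[str, float]]:
    """Rank problems not yet on the list by their strongest link to a listed problem."""
    best: Dict[str, float] = {}
    for a, b, odds_ratio, p_value in associations:
        # Ignore associations that are too weak or not significant enough
        if odds_ratio < min_or or p_value > max_p:
            continue
        # Associations are undirected, so check both orientations of the pair
        for known, candidate in ((a, b), (b, a)):
            if known in patient_problems and candidate not in patient_problems:
                best[candidate] = max(best.get(candidate, 0.0), odds_ratio)
    # Highest odds ratio first; ties broken alphabetically for stable output
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

Problems already on the patient's list are never suggested, and a candidate linked to several listed problems is scored by its single strongest association rather than a combined score.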


Conflict of interest statement

Competing Interests: Commercial use of the MCM tool has been licensed to Compendia Biosciences, in which A.M.C. and D.R.R. are shareholders. D.R.R. also serves as the CEO of Compendia Biosciences.

Figures

Figure 1. Overall network diagram containing 1106 nodes and 1939 edges, showing the most significant problem category associations using an odds ratio > 100.0 and a p-value < 1.0×10⁻¹⁰ as thresholds for inclusion.
Node sizes are roughly proportional to the number of times each problem appears in the problem summary list (PSL), and only nodes with more than 100 occurrences are shown. Problems are color-coded by the general area of medicine in which the problem would likely be diagnosed or followed. At this level, several clusters of related problems can be seen, some of which are labeled in the figure.
Figure 2. Network graphs showing well-known clinical associations.
Node size represents the approximate number of diagnoses in the database, and edges represent significant associations between nodes. Node colors follow the legend in Figure 1. 2A displays the complex network of associations linked to the diagnosis “noninsulin dependent diabetes mellitus” (type 2 diabetes mellitus) using an odds ratio of 1.25 or greater. Although the network is mostly interconnected, “cataracts” is not directly associated with either “obesity” or “sleep apnea”. 2B displays the same diagnoses associated with NIDDM using an odds ratio of 8.0 or greater as the threshold for connections between nodes. At this threshold, weaker associations drop out and stronger ones persist. 2C shows common associations with the diagnosis “Turner syndrome” using an odds ratio of 1.25 or greater. “Horseshoe kidney” and “ovarian failure” are independently associated with Turner syndrome, whereas the cardiac defects are associated with one another. Coarctation appears twice because of free-text variability in how the diagnosis was entered.
Figure 3. Examples of network graphs used to help identify unexpected associations and form hypotheses about their meaning.
Figure 3A shows a network graph with selected associations for the diagnosis “vulvodynia”, using an edge threshold of odds ratio 2.5 or greater and p-value 1.0×10⁻³ or less. “Fibromyalgia” and “irritable bowel” are associated with “vulvodynia” independently of the other inter-related gynecologic diagnoses. Figure 3B displays a network graph showing the associations among “shingles”, “hypothyroidism”, and other cancer-related diagnoses, using an edge threshold of odds ratio 1.75 or greater and p-value 1.0×10⁻⁴ or less. Such a network helps suggest that the relationship between “shingles” and “hypothyroidism” may be due to cancer therapies. Node size represents the approximate number of diagnoses in the database. Node colors follow the legend in Figure 1.
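The figure captions apply the same filtering at different thresholds: nodes are kept only above an occurrence count, and edges only when the odds ratio and p-value pass their cutoffs (raising the odds-ratio cutoff from 1.25 in Figure 2A to 8.0 in Figure 2B removes the weaker edges). A minimal sketch of that filtering follows, assuming associations are available as hypothetical `(problem_a, problem_b, odds_ratio, p_value)` tuples and using inclusive comparisons as in Figures 2 and 3 (Figure 1 states strict inequalities). It illustrates the thresholding only and is not the MCM tool's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (problem_a, problem_b, odds_ratio, p_value); hypothetical record layout
Edge = Tuple[str, str, float, float]


def build_network(
    associations: Iterable[Edge],
    occurrences: Dict[str, int],
    min_or: float,
    max_p: float,
    min_occurrences: int = 0,
) -> Dict[str, Set[str]]:
    """Undirected adjacency sets for edges with OR >= min_or and p <= max_p.

    A node is kept only if it occurs more than `min_occurrences` times;
    problems missing from `occurrences` are treated as having zero occurrences.
    """
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b, odds_ratio, p_value in associations:
        # Node filter, e.g. Figure 1 showed only problems with > 100 occurrences
        if occurrences.get(a, 0) <= min_occurrences or occurrences.get(b, 0) <= min_occurrences:
            continue
        # Edge filter: strength and significance must both pass their cutoffs
        if odds_ratio >= min_or and p_value <= max_p:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)
```

Calling `build_network` twice on the same associations, once with `min_or=1.25` and once with `min_or=8.0`, yields the second graph as a subgraph of the first, which matches the behavior described for panels 2A and 2B.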
