PLoS Comput Biol. 2011 Aug;7(8):e1002141.
doi: 10.1371/journal.pcbi.1002141. Epub 2011 Aug 25.

Using electronic patient records to discover disease correlations and stratify patient cohorts

Francisco S Roque et al. PLoS Comput Biol. 2011 Aug.

Abstract

Electronic patient records remain a rather unexplored but potentially rich data source for discovering correlations between diseases. We describe a general approach for gathering phenotypic descriptions of patients from medical records in a systematic and non-cohort dependent manner. By extracting phenotype information from the free text in such records, we demonstrate that we can extend the information contained in the structured record data and use it to produce fine-grained patient stratification and disease co-occurrence statistics. The approach uses a dictionary based on the International Classification of Diseases ontology and is therefore in principle language independent. As a use case, we show how records from a Danish psychiatric hospital lead to the identification of disease correlations, which can subsequently be mapped to systems biology frameworks.
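
To make the dictionary-based idea concrete, the following is a minimal sketch of matching disease terms in free text and mapping them to ICD10 codes. It is not the authors' pipeline, which the abstract does not describe in detail. The three-entry dictionary `ICD10_TERMS`, the function `mine_codes`, and the example note are all hypothetical, and the sketch ignores negation, spelling variants, and context.

```python
import re

# Hypothetical toy dictionary: lower-cased term -> ICD10 code.
# A real dictionary would be derived from the full ICD10 terminology.
ICD10_TERMS = {
    "schizophrenia": "F20",
    "type 2 diabetes": "E11",
    "hypertension": "I10",
}

def mine_codes(note, dictionary=ICD10_TERMS):
    """Return the set of ICD10 codes whose terms occur in the note.

    Matching is case-insensitive and on word boundaries only; negated
    mentions ("no hypertension") are NOT handled in this sketch.
    """
    text = note.lower()
    found = set()
    for term, code in dictionary.items():
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            found.add(code)
    return found

# Hypothetical free-text note, for illustration only.
note = "Patient with schizophrenia; history of hypertension."
mined = mine_codes(note)
```

Codes mined this way could then be merged with the codes already assigned in the structured record, which is the sense in which the free text extends the structured data.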

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Disease chapter networks.
ICD10 chapters are shown as nodes; links represent correlations. Link weight represents the correlation strength between two chapters; node area represents the proportion of codes from that chapter in the entire corpus. (A) Network based on the assigned codes for each patient. The most frequent chapter is chapter V, ‘Mental and behavioral disorders’, with a frequency of 81%. The strongest correlation is between chapters V and XXI, with a cosine similarity score of 0.45. Chapters IX, ‘Diseases of the circulatory system’, and IV, ‘Endocrine, nutritional and metabolic diseases’, have a score of 0.3. (B) Full network containing both the assigned and mined codes for all patients. Chapters V and XVIII have frequencies of 24% and 35%, respectively, and a score of 0.92. After mining, chapter X, ‘Diseases of the respiratory system’, and chapter XIX, ‘Injury, poisoning and certain other consequences of external causes’, now have cosine similarity scores of 0.6 and 0.78, respectively.
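
As an illustration of the cosine similarity score used for the links, here is a minimal sketch. The caption does not say how the chapter vectors are built, so this assumes one binary entry per patient (1 if the patient has any code from the chapter). The patient data and the helper `chapter_vector` are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)

# Hypothetical toy data: the set of ICD10 chapters seen for each patient.
patients = [{"V", "XXI"}, {"V"}, {"V", "XXI", "IX"}, {"IX", "IV"}]

def chapter_vector(chapter, patients):
    """Assumed representation: 1 per patient with the chapter, else 0."""
    return [1 if chapter in p else 0 for p in patients]

sim_v_xxi = cosine_similarity(chapter_vector("V", patients),
                              chapter_vector("XXI", patients))
sim_ix_iv = cosine_similarity(chapter_vector("IX", patients),
                              chapter_vector("IV", patients))
```

With binary vectors the score ranges from 0 (the chapters never co-occur) to 1 (they occur in exactly the same patients).
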
Figure 2. Disease-disease correlations.
Heatmap of the 100 most significant ICD10 codes, based on ranking the list of 802 candidate pairs by their comorbidity scores. Chapter colors are highlighted next to the ICD10 codes. Diseases that often occur together are colored red in the heatmap, while those with lower than expected co-occurrence are colored blue. The color label shows the log2 change in comorbidity between two diseases compared with the expected level.
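
The log2 change relative to the expected level can be sketched as follows. The caption does not define the expected level, so this assumes the count expected if the two diseases were independent, `n_a * n_b / n_total`. The function name and all counts are hypothetical.

```python
import math

def log2_comorbidity(n_a, n_b, n_ab, n_total):
    """log2 of observed over expected co-occurrence of diseases A and B.

    Assumption: the expected count is the one under independence,
    n_a * n_b / n_total. Returns None when the ratio is undefined
    (no expected or no observed co-occurrence).
    """
    expected = n_a * n_b / n_total
    if expected == 0 or n_ab == 0:
        return None
    return math.log2(n_ab / expected)

# Hypothetical counts in a corpus of 1,000 patients.
over = log2_comorbidity(n_a=100, n_b=50, n_ab=20, n_total=1000)   # expected 5
under = log2_comorbidity(n_a=100, n_b=40, n_ab=1, n_total=1000)   # expected 4
```

Under this assumption, a positive value corresponds to a red cell (more co-occurrence than expected) and a negative value to a blue cell.
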
Figure 3. Patient cohort network.
(A) Nodes represent 1,497 patients from 26 clusters. Edges are correlations between patients. Node color denotes cluster membership. (B) Heatmap showing the ICD10 composition of each cluster. Values are the fraction of the cluster ICD10 vector covered by a given code. Only the 26 ICD10 codes that most distinguish a cluster are shown. The heatmap columns match the network clusters in a counterclockwise direction starting at cluster 27.
