BMC Med Inform Decis Mak. 2017 Sep 29;17(1):140.
doi: 10.1186/s12911-017-0537-y.

A novel data-driven workflow combining literature and electronic health records to estimate comorbidities burden for a specific disease: a case study on autoimmune comorbidities in patients with celiac disease

Jean-Baptiste Escudié et al. BMC Med Inform Decis Mak. 2017.

Abstract

Background: Data collected in electronic health records (EHRs) have been widely used to identify specific conditions; however, there is still a need for methods to define comorbidities and for data sources from which to estimate comorbidities burden. We propose an approach that uses the literature and EHR data sources to assess the comorbidities burden of a specific disease, applied here to autoimmune diseases in celiac disease (CD).

Methods: We generated a restricted set of comorbidities using the literature (via the MeSH® co-occurrence file). We extracted the 15 autoimmune diseases that co-occur most frequently with CD. We mapped these comorbidities to EHR terminologies: ICD-10 (billing codes), ATC (drugs) and UMLS (clinical reports). Finally, we extracted the mapped concepts from the different data sources. We evaluated our approach using the correlation between prevalence estimates in our cohort and co-occurrence ranking in the literature.
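The evaluation step described in the Methods, a rank correlation between literature co-occurrence and cohort prevalence, can be sketched as follows. This is a minimal illustration only, not the authors' code: the function names are hypothetical, and the two input lists are made-up placeholders rather than study data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (smallest) to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder inputs (NOT study data): one entry per comorbidity.
cooccurrence_counts = [500, 320, 150, 90, 40]   # literature co-occurrence
prevalence_pct = [10.0, 3.0, 2.0, 0.4, 0.9]     # cohort prevalence, in percent

rho = spearman(cooccurrence_counts, prevalence_pct)
print(f"Spearman's coefficient: {rho:.3f}")
```

In practice a statistics library would also supply the p-value; the hand-written version is used here only to keep the sketch free of dependencies.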

Results: We retrieved the comorbidities for 741 patients with CD. 18.1% of patients had at least one of the 15 studied autoimmune disorders. Overall, 79.3% of the mapped concepts were detected only in text, 5.3% only in ICD codes and/or drug prescriptions, and 15.4% in both sources. Prevalence estimates in our cohort were correlated with the literature (Spearman's coefficient 0.789, p = 0.0005). The three most prevalent comorbidities were thyroiditis 12.6% (95% CI 10.1-14.9), type 1 diabetes 2.3% (95% CI 1.2-3.4) and dermatitis herpetiformis 2.0% (95% CI 1.0-3.0).
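To make the prevalence figures concrete, the sketch below computes a proportion with a 95% confidence interval. Two things are assumptions rather than facts from the abstract: the interval method (a normal-approximation, or Wald, interval; the abstract does not say which method was used) and the case counts, which are hypothetical values chosen only to be consistent with the reported percentages for a cohort of 741 patients.

```python
from math import sqrt


def wald_ci(cases, total, z=1.96):
    """Return (proportion, lower, upper) using the normal-approximation interval."""
    p = cases / total
    half_width = z * sqrt(p * (1 - p) / total)
    # Clamp to [0, 1] because the Wald interval can overshoot for rare events.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


COHORT_SIZE = 741  # number of CD patients reported in the abstract

# Hypothetical case counts (assumption: back-calculated, not reported).
assumed_cases = {
    "thyroiditis": 93,
    "type 1 diabetes": 17,
    "dermatitis herpetiformis": 15,
}

for name, cases in assumed_cases.items():
    p, lower, upper = wald_ci(cases, COHORT_SIZE)
    print(f"{name}: {100 * p:.1f}% (95% CI {100 * lower:.1f}-{100 * upper:.1f})")
```

For small counts, an exact or Wilson interval is often preferred over the Wald interval; the simpler formula is used here only for readability.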

Conclusion: We introduced a process that leverages the MeSH terminology to identify relevant autoimmune comorbidities of CD and several EHR data sources to phenotype a large population of CD patients. We obtained prevalence estimates comparable to the literature.

Keywords: Addison disease; Antiphospholipid syndrome; Arthritis, juvenile; Arthritis, rheumatoid; Autoimmune diseases; Celiac disease; Dermatitis herpetiformis; Diabetes mellitus, type 1; Electronic health records; Graves’ disease; Hepatitis, autoimmune; Icd 10; Lupus erythematosus, systemic; Multiple sclerosis; Myasthenia gravis; Phenotype; Polyendocrinopathies, autoimmune; Prevalence study; Sjogren’s syndrome; Thyroiditis, autoimmune.

Conflict of interest statement

Ethics approval and consent to participate

This study has been approved by the institutional review board for observational study (Comité éthique de recherche des hôpitaux universitaires Paris Ouest) under registration #00001072.

Patient consent for reuse of their EHR on an opt-out basis is declared at the Commission Nationale de l’Informatique et des Libertés, references #1743502 and #1695855v0.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1. Patients inclusion flowchart
Fig. 2. Workflow from comorbidities selection to comorbidities burden phenotyping
Fig. 3. Phenotypes identified by text reviewing only (black), ICD codes or drugs only (grey), both (light grey), in percent
