Review
Am J Med Genet B Neuropsychiatr Genet. 2018 Oct;177(7):601-612.
doi: 10.1002/ajmg.b.32548. Epub 2017 May 30.

The use of electronic health records for psychiatric phenotyping and genomics

Jordan W Smoller. Am J Med Genet B Neuropsychiatr Genet. 2018 Oct.

Abstract

The widespread adoption of electronic health records (EHRs) in healthcare systems has created a vast and continuously growing resource of clinical data and provides new opportunities for population-based research. In particular, the linking of EHRs to biospecimens and genomic data in biobanks may help address what has become a rate-limiting step for genetic research: the need for large sample sizes. The principal roadblock to capitalizing on these resources is the need to establish the validity of phenotypes extracted from the EHR. For psychiatric genetic research, this represents a particular challenge given that diagnosis is based on patient reports and clinician observations that may not be well-captured in billing codes or narrative records. This review addresses the opportunities and pitfalls in EHR-based phenotyping with a focus on their application to psychiatric genetic research. A growing number of studies have demonstrated that diagnostic algorithms with high positive predictive value can be derived from EHRs, especially when structured data are supplemented by text mining approaches. Such algorithms enable semi-automated phenotyping for large-scale case-control studies. In addition, the scale and scope of EHR databases have been used successfully to identify phenotypic subgroups and derive algorithms for longitudinal risk prediction. EHR-based genomics are particularly well-suited to rapid look-up replication of putative risk genes, studies of pleiotropy (phenome-wide association studies or PheWAS), investigations of genetic networks and overlap across the phenome, and pharmacogenomic research. EHR phenotyping has been relatively under-utilized in psychiatric genomic research but may become a key component of efforts to advance precision psychiatry.

Keywords: EHR; PheWAS; electronic medical records; phenotyping; psychiatric genetics.

Figures

Figure 1:
Workflows for leveraging phenotypic data from the EHR. A) Extraction of clinical data into a research-ready database. Unstructured text can be transformed into standardized coded format through natural language processing (NLP); B) Stages in development of a phenotyping algorithm for case–control analyses. (1) An enriched datamart of cases or controls for the target phenotype is constructed using structured data filters followed by (2) selection of a subset for clinician chart review to establish gold-standard instances. (3) Potential predictors of case (or control) status are extracted from structured and text features in a subset of charts. (4) Using these selected features, a model is trained to predict the gold-standard cases/controls and model metrics are calculated and compared with the desired performance. (5) The model is applied to the full datamart and a chart review of a subset of cases (or controls) is conducted to determine positive predictive value (PPV) and negative predictive value (NPV). (6) If desired performance is not achieved, the model can be adjusted until adequate performance (e.g. PPV > .90) is obtained.
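Steps (5) and (6) of panel B can be illustrated with a minimal Python sketch. It is not the procedure used in the review: the function names `ppv_npv` and `lowest_cutoff_meeting_ppv`, the idea of adjusting the model by moving a score cutoff, and the eight scores and chart-review labels are all assumptions made up for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def ppv_npv(predicted: Sequence[bool],
            gold: Sequence[bool]) -> Tuple[Optional[float], Optional[float]]:
    """Return (PPV, NPV) for algorithm calls checked against chart-review labels.

    A value is None when the algorithm made no positive (or no negative) calls.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)          # called case, truly case
    fp = sum(1 for p, g in pairs if p and not g)      # called case, truly control
    tn = sum(1 for p, g in pairs if not p and not g)  # called control, truly control
    fn = sum(1 for p, g in pairs if not p and g)      # called control, truly case
    ppv = tp / (tp + fp) if (tp + fp) else None
    npv = tn / (tn + fn) if (tn + fn) else None
    return ppv, npv


def lowest_cutoff_meeting_ppv(scores: Sequence[float],
                              gold: Sequence[bool],
                              target_ppv: float = 0.90) -> Optional[float]:
    """Hypothetical adjustment step: try score cutoffs in ascending order and
    return the first one whose PPV reaches the target, or None if none does."""
    for cutoff in sorted(set(scores)):
        predicted = [s >= cutoff for s in scores]
        ppv, _ = ppv_npv(predicted, gold)
        if ppv is not None and ppv >= target_ppv:
            return cutoff
    return None


# Invented example: model scores for eight reviewed charts and the
# clinician's gold-standard label for each (True = confirmed case).
scores: List[float] = [0.95, 0.90, 0.85, 0.70, 0.60, 0.40, 0.30, 0.10]
gold: List[bool] = [True, True, True, False, True, False, False, False]

initial_calls = [s >= 0.5 for s in scores]
print(ppv_npv(initial_calls, gold))             # (0.8, 1.0)
print(lowest_cutoff_meeting_ppv(scores, gold))  # 0.85
```

In this toy data, a cutoff of 0.5 gives a PPV of 0.80, below the example target of 0.90; raising the cutoff to 0.85 reaches the target at the cost of missing one confirmed case, which is the trade-off that step (6) has to weigh.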
Figure 2:
Comparison of cost and time needed to obtain phenotypic data and samples based on experience in studies of bipolar disorder at Partners Healthcare. Approximately 700 samples were collected over a period of years at a cost of approximately $1000/subject in the STEP-BD study. At one-third of this cost, an even larger sample was collected as part of the ICCBD (see text). Finally, the Partners Biobank enables investigators to rapidly obtain validated EHR phenotypes and samples at minimal cost and genome-wide (GWAS) data at no cost.
