J Am Med Inform Assoc. 2016 Jul;23(4):731-40.
doi: 10.1093/jamia/ocw011. Epub 2016 Apr 23.

Electronic medical record phenotyping using the anchor and learn framework

Yoni Halpern et al. J Am Med Inform Assoc. 2016 Jul.

Abstract

Background: Electronic medical records (EMRs) hold a tremendous amount of information about patients that is relevant to determining the optimal approach to patient care. As medicine becomes increasingly precise, a patient's electronic medical record phenotype will play an important role in triggering clinical decision support systems that can deliver personalized recommendations in real time. Learning with anchors provides an efficient way to learn statistically driven phenotypes with minimal manual intervention.
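One common way to learn from an anchor follows the positive-unlabeled calibration of Elkan and Noto (2008): choose an anchor feature that is assumed to appear only in true positives and to be conditionally independent of the other features given the phenotype; train a classifier to predict the anchor from the remaining features with the anchor itself censored; then divide the predicted anchor probability by an estimate of P(anchor | positive), taken as the mean prediction over held-out anchored patients. The dependency-free Python below is a minimal sketch of that general idea under those assumptions, not the implementation evaluated in this paper; `fit_anchor_phenotype`, the set-of-strings patient encoding, and the hand-rolled logistic regression are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, epochs=300, lr=0.5):
    """Batch gradient-descent logistic regression on dense 0/1 feature vectors."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def fit_anchor_phenotype(patients, vocab, anchor, holdout_frac=0.3, seed=0):
    """Return a function mapping a patient's feature set to an estimated P(phenotype).

    patients: list of sets of feature names (hypothetical encoding).
    anchor: a feature assumed to be present only when the phenotype is present.
    """
    features = [f for f in vocab if f != anchor]  # censor the anchor from the inputs

    def vec(p):
        return [1.0 if f in p else 0.0 for f in features]

    idx = list(range(len(patients)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - holdout_frac))
    train, held = idx[:cut], idx[cut:]

    # Step 1: learn to predict "anchor present" from the other features.
    w, b = train_logistic([vec(patients[i]) for i in train],
                          [1.0 if anchor in patients[i] else 0.0 for i in train])

    def p_anchor(p):
        return sigmoid(sum(wj * xj for wj, xj in zip(w, vec(p))) + b)

    # Step 2: estimate c = P(anchor | positive) as the mean prediction on held-out anchored patients.
    anchored = [patients[i] for i in held if anchor in patients[i]]
    if not anchored:
        raise ValueError("no anchored patients in the held-out split")
    c = sum(p_anchor(p) for p in anchored) / len(anchored)

    # Step 3: rescale by c; an observed anchor is treated as a certain positive.
    def phenotype_prob(p):
        return 1.0 if anchor in p else min(1.0, p_anchor(p) / c)

    return phenotype_prob
```

No labels are needed at any step: the anchor plays the role of a noisy, incomplete label, and the held-out anchored patients supply the calibration constant.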

Materials and methods: We developed a phenotype library that uses both structured and unstructured data from the EMR to represent patients for real-time clinical decision support. Eight of the phenotypes were evaluated on retrospective EMR data from emergency department patients against a set of prospectively gathered gold-standard labels.
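A minimal sketch of one way to fold structured and unstructured data into a single bag-of-features record per patient, using source prefixes borrowed from the data groups named in Figure 4 (Med, Pyx, Lab, Tri, MD). The function name, argument layout, prefix scheme, lab-flag encoding, and regex tokenizer are all illustrative assumptions; real clinical text would also need handling of negation, abbreviations, and misspellings.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple stand-in for clinical NLP."""
    return re.findall(r"[a-z0-9]+", text.lower())


def patient_features(meds, dispensed, labs, triage_text, md_text):
    """Return a set of binary features, each prefixed by its (hypothetical) data source."""
    feats = set()
    feats.update(f"med:{m.lower()}" for m in meds)        # medication history before the visit
    feats.update(f"pyx:{m.lower()}" for m in dispensed)   # drugs dispensed during the visit
    for name, flag in labs.items():                       # e.g. {"WBC": "high"}
        feats.add(f"lab:{name.lower()}:{flag}")
    feats.update(f"tri:{t}" for t in tokenize(triage_text))  # triage nursing note
    feats.update(f"md:{t}" for t in tokenize(md_text))        # physician comments
    return feats


print(sorted(patient_features(["Warfarin"], [], {"INR": "high"}, "Fall at home", "")))
```

Keeping the source as a prefix lets the same word count as different features in nursing and physician text, which also makes it easy to restrict a classifier to one data source.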

Results: We built a phenotype library with 42 publicly available phenotype definitions. Using information available at triage, the phenotype classifiers achieved the following areas under the ROC curve (AUC): infection 0.89, cancer 0.88, immunosuppressed 0.85, septic shock 0.93, nursing home 0.87, anticoagulated 0.83, cardiac etiology 0.89, and pneumonia 0.90. Using information available at the time of disposition from the emergency department, the AUCs were: infection 0.91, cancer 0.95, immunosuppressed 0.90, septic shock 0.97, nursing home 0.91, anticoagulated 0.94, cardiac etiology 0.92, and pneumonia 0.97.
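An AUC can be read as the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that pairwise definition on toy scores (not study data); it is O(positives × negatives), so a rank-based or library implementation would be preferable at the scale of a full emergency department cohort.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75: 3 of 4 pairs ordered correctly
```

On this reading, an AUC of 0.90 means the classifier ranks a random positive above a random negative 90% of the time.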

Discussion: The resulting phenotypes are interpretable and fast to build, and they perform comparably to supervised phenotypes trained on 5000 manually labeled patients.

Conclusion: Learning with anchors is an attractive option for building a large public repository of phenotype definitions that can be used for a range of health IT applications, including real-time decision support.

Keywords: clinical decision support systems; electronic health records; knowledge representation; machine learning; natural language processing.

Figures

Figure 1
Comparison of the performance of phenotypes learned from 200 000 unlabeled patients with the semi-supervised anchor-based method and phenotypes learned by supervised classification using 5000 gold-standard labels. Error bars indicate two standard errors. For anticoagulated and cancer, there were not enough gold-standard labels to train with 5000 patients, so the fully supervised baseline is omitted.
Figure 2
Changes to patient records over time. The time of every change to the patient record is recorded (in minutes from arrival), and a non-parametric kernel density estimator is used to plot the distribution of times at which changes occur.
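The caption does not name the kernel or bandwidth, so the sketch below assumes a Gaussian kernel with the normal-reference rule-of-thumb bandwidth 1.06 · sd · n^(-1/5). The estimate at time t is the average of Gaussian bumps centred on each observed change time; the change times used here are made up. Because times are measured from arrival and cannot be negative, a plain Gaussian kernel leaks a little mass before time zero; a boundary correction or a log-time transform would be one way to avoid that.

```python
import math


def normal_reference_bandwidth(xs):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5); assumes at least two observations."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return 1.06 * sd * n ** (-1 / 5)


def gaussian_kde(xs, bandwidth=None):
    """Return a density function: the average of Gaussian bumps centred on each observation."""
    h = normal_reference_bandwidth(xs) if bandwidth is None else bandwidth
    scale = 1.0 / (len(xs) * h * math.sqrt(2 * math.pi))

    def density(t):
        return scale * sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in xs)

    return density


# Hypothetical record-change times, in minutes from arrival.
change_minutes = [3, 8, 15, 22, 40, 55, 90, 130, 200, 310]
f = gaussian_kde(change_minutes)
```

Evaluating `f` on a grid of minutes and plotting the result gives a smooth curve of the kind the figure describes; a smaller bandwidth would show sharper peaks at the cost of more noise.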
Figure 3
Influence and highly changing features for the pneumonia phenotype extractor as a function of time.
Figure 4
Additive change in AUC from baseline for phenotype extraction as a function of the features used. The baseline phenotype extractor uses only age, sex, and triage vitals; its AUC is given for each phenotype in the y-axis label. In each plot, the bars on the left use structured data and the center bars use free-text data. Hatched bars represent combinations of features. A star is placed below the single feature set with the highest performance. From left to right, the classifiers use:
Med – Medication history (prior to visit)
Pyx – Medication dispensing record (during visit)
Lab – Laboratory values
Strct – All structured data (Med + Pyx + Lab)
Tri – Triage nursing text
MD – Physician comments
Txt – All text (Tri + MD)
All – All features (Structured + Text)
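Each bar in this figure is a difference, AUC(feature set) minus AUC(baseline), and the starred bar is the largest such difference among the single-source sets (Med, Pyx, Lab, Tri, MD) as opposed to the combinations (Strct, Txt, All). A small sketch of that bookkeeping follows; the function name and dictionary layout are hypothetical, and any AUC values passed in would come from an evaluation like the one described above.

```python
def auc_gains(baseline_auc, auc_by_set, single_sets=("Med", "Pyx", "Lab", "Tri", "MD")):
    """Return (additive AUC change per feature set, name of the best single-source set).

    auc_by_set maps a feature-set name to its AUC; single-source sets missing from
    auc_by_set are skipped when choosing the starred set.
    """
    gains = {name: auc - baseline_auc for name, auc in auc_by_set.items()}
    candidates = [name for name in single_sets if name in gains]
    if not candidates:
        raise ValueError("no single-source feature sets to compare")
    best_single = max(candidates, key=lambda name: gains[name])
    return gains, best_single
```

Plotting gains rather than raw AUCs keeps phenotypes with very different baselines on a comparable scale, which is presumably why the baseline value is moved into the axis label.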