J Am Med Inform Assoc. 2016 Jul;23(4):731-40.

doi: 10.1093/jamia/ocw011. Epub 2016 Apr 23.

Electronic medical record phenotyping using the anchor and learn framework

Yoni Halpern¹, Steven Horng², Youngduck Choi³, David Sontag¹

Affiliations

¹ Department of Computer Science, New York University, New York, NY, USA. dsontag@cs.nyu.edu
² Department of Emergency Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA.
³ Department of Computer Science, New York University, New York, NY, USA.

PMID: 27107443
PMCID: PMC4926745
DOI: 10.1093/jamia/ocw011

Yoni Halpern et al. J Am Med Inform Assoc. 2016 Jul.

Abstract

Background: Electronic medical records (EMRs) hold a tremendous amount of information about patients that is relevant to determining the optimal approach to patient care. As medicine becomes increasingly precise, a patient's electronic medical record phenotype will play an important role in triggering clinical decision support systems that can deliver personalized recommendations in real time. Learning with anchors presents a method of efficiently learning statistically driven phenotypes with minimal manual intervention.

Materials and methods: We developed a phenotype library that uses both structured and unstructured data from the EMR to represent patients for real-time clinical decision support. Eight of the phenotypes were evaluated on retrospective EMR data from emergency department patients, against a set of prospectively gathered gold-standard labels.

Results: We built a phenotype library with 42 publicly available phenotype definitions. Using information from triage time, the phenotype classifiers have an area under the ROC curve (AUC) of infection 0.89, cancer 0.88, immunosuppressed 0.85, septic shock 0.93, nursing home 0.87, anticoagulated 0.83, cardiac etiology 0.89, and pneumonia 0.90. Using information available at the time of disposition from the emergency department, the AUC values are infection 0.91, cancer 0.95, immunosuppressed 0.90, septic shock 0.97, nursing home 0.91, anticoagulated 0.94, cardiac etiology 0.92, and pneumonia 0.97.
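
The AUC values reported above summarize how well each classifier ranks patients who have a phenotype ahead of patients who do not. As a minimal sketch of that metric (not the authors' evaluation code), the function below computes AUC as the fraction of positive/negative pairs that are ordered correctly, counting ties as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example,
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical gold-standard labels (1 = phenotype present) and classifier scores.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

The pairwise form is quadratic in the number of patients, so it is only suitable for small evaluation sets; a rank-based implementation gives the same value more efficiently.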

Discussion: The resulting phenotypes are interpretable and fast to build, and perform comparably to statistically learned phenotypes developed with 5000 manually labeled patients.

Conclusion: Learning with anchors is an attractive option for building a large public repository of phenotype definitions that can be used for a range of health IT applications, including real-time decision support.

Keywords: clinical decision support systems; electronic health records; knowledge representation; machine learning; natural language processing.

Figures

**Figure 1**
Comparison of performance of phenotypes learned with 200 000 unlabeled patients using the semi-supervised anchor based method, and phenotypes learned with supervised classification using 5000 gold-standard labels. Error bars indicate 2 * standard error. For anticoagulated and cancer, there were not a sufficient number of gold-standard labels to learn with 5000 patients, so the fully supervised baseline is omitted.

**Figure 2**
Changes to patient records over time. The time of every change to the patient record is recorded (measured in minutes from arrival) and a non-parametric kernel density estimator is used to plot the distribution of times at which changes occur.
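
The density estimate described in this caption can be sketched as follows. The caption does not state which kernel or bandwidth was used, so the Gaussian kernel, the 10-minute bandwidth, the function name `gaussian_kde`, and the toy change times are all assumptions made for illustration.

```python
import math
from typing import Callable, Sequence


def gaussian_kde(samples: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Return a density function estimated from samples with a Gaussian kernel.

    Each sample contributes a normal bump of standard deviation `bandwidth`;
    the bumps are averaged so the result integrates to 1.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")

    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(t: float) -> float:
        return norm * sum(math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical times (minutes from arrival) at which a patient record changed.
change_minutes = [5, 12, 15, 40, 45, 47, 120, 180]
density = gaussian_kde(change_minutes, bandwidth=10.0)  # bandwidth is an assumed value

# Evaluate on a coarse grid, as one would before plotting the curve.
grid = list(range(0, 241, 10))
curve = [density(t) for t in grid]
```

A smaller bandwidth would resolve individual bursts of record changes more sharply, at the cost of a noisier curve.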

**Figure 3**
Influence and highly changing features for the pneumonia phenotype extractor as a function of time.

**Figure 4**
Additive change in AUC from baseline for phenotype extraction as a function of the features used. The baseline phenotype extraction uses only features from age, sex, and triage vitals and its value is indicated for each phenotype on the y-axis label. In each plot, the bars on the left use structured data while the center bars use free-text data. Hatched lines represent a combination of features. A star is placed below the single feature that has the highest performance. From left to right, the classifiers used:
- Med – Medication history (prior to visit)
- Pyx – Medication dispensing record (during visit)
- Lab – Laboratory values
- Strct – All structured data (Med + Pyx + Labs)
- Tri – Triage nursing text
- MD – Physician comments
- Txt – All Text (Tri + MD)
- All – All features (Structured + Text)

Grants and funding

UL1 TR000038/TR/NCATS NIH HHS/United States
