Patterns (N Y). 2023 Dec 27;5(1):100906. doi: 10.1016/j.patter.2023.100906. eCollection 2024 Jan 12.

LATTE: Label-efficient incident phenotyping from longitudinal electronic health records

Jun Wen et al. Patterns (N Y).

Abstract

Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose a label-efficient incident phenotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning. LATTE then models the sequential dependency between the target event and the visit embeddings to derive the event timings. To improve label efficiency, LATTE constructs longitudinal silver-standard labels from unlabeled patients and uses them for semi-supervised training. We evaluate LATTE on the onset of type 2 diabetes and heart failure and on relapses of multiple sclerosis. LATTE consistently achieves substantial improvements over benchmark methods while providing highly interpretable predictions. We further show that the derived event timings help discover risk factors for heart failure among patients with rheumatoid arthritis.
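LATTE produces an incident prediction at each visit, and the abstract says event timings are derived from them, but the exact decision rule is not stated on this page. As a minimal sketch, assuming a hypothetical rule in which the onset is the first visit whose predicted incident probability reaches a fixed threshold (0.5 here, an arbitrary choice), turning a per-visit prediction curve into a single onset date could look like this. The function `onset_from_visit_probs` and its `threshold` parameter are illustrative only.

```python
from datetime import date
from typing import Optional, Sequence


def onset_from_visit_probs(
    visit_dates: Sequence[date],
    probs: Sequence[float],
    threshold: float = 0.5,
) -> Optional[date]:
    """Return the first visit date whose predicted incident probability
    reaches `threshold`, or None if no visit does.

    Hypothetical decision rule for illustration only; visits are assumed
    to be sorted in chronological order.
    """
    if len(visit_dates) != len(probs):
        raise ValueError("visit_dates and probs must have the same length")
    for visit_date, p in zip(visit_dates, probs):
        if p >= threshold:
            return visit_date
    return None


# Usage with made-up visits and probabilities.
visits = [date(2020, 1, 15), date(2020, 6, 2), date(2021, 3, 9)]
print(onset_from_visit_probs(visits, [0.10, 0.72, 0.91]))  # 2020-06-02
```

A fixed threshold is the simplest possible rule; a real pipeline might instead smooth the curve or calibrate the threshold on gold-standard labels.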

Conflict of interest statement

The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
LATTE framework. LATTE is an end-to-end neural network pipeline with four major components: (a) the concept re-weighting (CR) module, which selects important input features based on their semantic relationship to the target phenotype; (b) the visit attention network (VAN), which learns to attend to the most incident-indicative visits; (c) bidirectional gated recurrent unit (BiGRU) layers, which model the sequential dependency among visits; and (d) incident predictors, which generate an incident prediction at each visit.
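Component (b) learns which visits to attend to, but its equations are not given on this page. The sketch shows generic dot-product soft attention over a patient's visit embeddings, a standard mechanism swapped in for illustration rather than LATTE's published formulation. The `query` vector is a hypothetical stand-in for learned attention parameters, and the weighted-sum pooling is an assumption. How LATTE passes attention weights on to its BiGRU layers is not specified here.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def visit_attention(
    visits: Sequence[Vector], query: Vector
) -> Tuple[List[float], Vector]:
    """Dot-product soft attention over a patient's visit embeddings.

    `query` is a hypothetical stand-in for learned attention parameters.
    Returns the per-visit attention weights and the attention-weighted
    sum of the visit embeddings.
    """
    scores = [sum(q * x for q, x in zip(query, v)) for v in visits]
    weights = softmax(scores)
    dim = len(visits[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, visits)) for i in range(dim)]
    return weights, pooled


# Three toy 2-d visit embeddings; the query favors the first dimension.
weights, pooled = visit_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [2.0, 0.0])
print([round(w, 3) for w in weights])  # [0.468, 0.063, 0.468]
```

Visits whose embeddings align with the query receive most of the weight, which is the intuition behind attending to incident-indicative visits.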
Figure 2
Numerical results of incident phenotyping with varied gold-standard label sizes. We evaluate LATTE on the onset of type 2 diabetes (T2D) and heart failure (HF) and on the onset and relapse of multiple sclerosis (MS). Error bars indicate 95% confidence intervals.
Figure 3
t-SNE visualization of visit embedding vectors on T2D and HF. (A) Visit embedding vectors aggregated based on observed counts. (B) Visit embedding vectors aggregated with concept re-weighting (CR). (C) Visit embedding vectors aggregated with both CR and the visit attention network (VAN). Blue dots denote visits before the target incident, and red dots denote visits after it.
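Panels (A) and (B) contrast count-based visit embeddings with CR-weighted ones. A minimal sketch of the difference, assuming a hypothetical re-weighting in which each concept's weight is its cosine similarity to a target-phenotype embedding clipped at zero (the page does not give LATTE's actual CR formula), is shown here. The helpers `cr_weights` and `aggregate_visit` and the concept names are illustrative only.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cr_weights(concept_emb: Dict[str, Vector], target_emb: Vector) -> Dict[str, float]:
    # Hypothetical re-weighting: cosine similarity to the target phenotype
    # embedding, clipped at zero so unrelated concepts contribute nothing.
    return {c: max(0.0, cosine(e, target_emb)) for c, e in concept_emb.items()}


def aggregate_visit(
    counts: Dict[str, int],
    concept_emb: Dict[str, Vector],
    weights: Optional[Dict[str, float]] = None,
) -> Vector:
    """Sum concept embeddings in one visit, scaled by observed counts and,
    optionally, by per-concept weights."""
    dim = len(next(iter(concept_emb.values())))
    visit = [0.0] * dim
    for concept, n in counts.items():
        w = 1.0 if weights is None else weights.get(concept, 0.0)
        for i, x in enumerate(concept_emb[concept]):
            visit[i] += n * w * x
    return visit


# Toy embeddings; the concept names are hypothetical.
emb = {"dx_heart_failure": [1.0, 0.0], "rx_unrelated": [0.0, 1.0]}
counts = {"dx_heart_failure": 2, "rx_unrelated": 3}
w = cr_weights(emb, target_emb=[1.0, 0.0])
print(aggregate_visit(counts, emb))     # [2.0, 3.0]  count-based, as in panel A
print(aggregate_visit(counts, emb, w))  # [2.0, 0.0]  CR-weighted, as in panel B
```

With count-based aggregation a frequent but unrelated concept dominates the visit vector; re-weighting suppresses it, which is one way the separation seen in panel (B) could arise.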
Figure 4
Prediction interpretability of LATTE. We visualize the prediction curve (orange) and the longitudinal evidence that drives LATTE's incident predictions. Concepts are ranked from bottom to top by learned importance. Red, diagnosis code; green, medication code; purple, lab test code; blue, UMLS concept unique identifiers (CUIs) extracted from medical notes. The chart date is the annotated date of HF onset.
Figure 5
Hazard ratio (HR) for HF risk prediction among patients with rheumatoid arthritis (RA). We provide the estimated hazard ratios with 95% confidence intervals for risk prediction among RA patients with up to 5 years of follow-up.
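The caption does not state how the intervals were computed. For a Cox proportional hazards model, a common choice is the Wald interval on the log-hazard scale: HR = exp(β), with bounds exp(β ± 1.96·SE). The sketch assumes that construction. `hazard_ratio_ci` is a hypothetical helper, and the inputs are made up rather than taken from the paper.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def hazard_ratio_ci(beta: float, se: float, z: float = Z_95) -> Tuple[float, float, float]:
    """Hazard ratio and Wald confidence interval from a Cox log-hazard
    coefficient `beta` and its standard error `se`.

    The interval is built on the log scale and then exponentiated, so it is
    asymmetric around the HR.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Illustrative numbers only, not results from the paper.
hr, lower, upper = hazard_ratio_ci(beta=math.log(1.5), se=0.2)
print(f"HR {hr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because the bounds are symmetric in β, their geometric mean equals the HR, and an interval that excludes 1 corresponds to a coefficient that differs from zero at the 5% level under this construction.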
Figure 6
Estimated relative efficiency of coefficient estimation. We provide the estimated relative efficiency of analyses using LATTE-derived HF timings versus analyses using HF diagnosis code-derived outcomes in HF risk prediction among RA patients with up to 5 years of follow-up. LATTE-derived outcomes achieve systematically improved efficiency.
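The caption does not define relative efficiency. A standard definition, assumed here, is the ratio of the variances of two estimators of the same coefficient: Var(estimate with code-derived outcomes) / Var(estimate with LATTE-derived timings). Values above 1 favor the LATTE-derived analysis. The sketch computes this per coefficient from standard errors. `relative_efficiency` is a hypothetical helper, and the numbers are made up.

```python
from typing import List, Sequence


def relative_efficiency(
    se_reference: Sequence[float], se_candidate: Sequence[float]
) -> List[float]:
    """Per-coefficient relative efficiency Var(reference) / Var(candidate).

    Values above 1 mean the candidate outcome definition yields more
    precise coefficient estimates than the reference definition.
    """
    if len(se_reference) != len(se_candidate):
        raise ValueError("standard-error lists must have the same length")
    return [(r / c) ** 2 for r, c in zip(se_reference, se_candidate)]


# Made-up standard errors: code-derived outcomes (reference) vs. model-derived timings.
print(relative_efficiency([0.20, 0.30], [0.10, 0.30]))  # [4.0, 1.0]
```

A relative efficiency of 4 means the reference analysis would need roughly four times as many patients to match the candidate's precision for that coefficient.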
