Med Image Comput Comput Assist Interv. 2023 Oct;14221:649-659. doi: 10.1007/978-3-031-43895-0_61. Epub 2023 Oct 1.

Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification

Thomas Z Li et al. Med Image Comput Comput Assist Interv. 2023 Oct.

Abstract

The accuracy of predictive models for solitary pulmonary nodule (SPN) diagnosis can be greatly increased by incorporating repeat imaging and medical context, such as electronic health records (EHRs). However, clinically routine modalities such as imaging and diagnostic codes can be asynchronous and irregularly sampled over different time scales, which poses obstacles to longitudinal multimodal learning. In this work, we propose a transformer-based multimodal strategy that integrates repeat imaging with longitudinal clinical signatures from routinely collected EHRs for SPN classification. We perform unsupervised disentanglement of latent clinical signatures and leverage time-distance scaled self-attention to jointly learn from clinical signature expressions and chest computed tomography (CT) scans. Our classifier is pretrained on 2,668 scans from a public dataset and on 1,149 subjects with longitudinal chest CTs, billing codes, medications, and laboratory tests from the EHRs of our home institution. Evaluation on 227 subjects with challenging SPNs revealed a significant AUC improvement over a longitudinal multimodal baseline (0.824 vs 0.752 AUC), as well as improvements over a single cross-sectional multimodal scenario (0.809 AUC) and a longitudinal imaging-only scenario (0.741 AUC). This work demonstrates the significant advantages of a novel approach for co-learning longitudinal imaging and non-imaging phenotypes with transformers. Code available at https://github.com/MASILab/lmsignatures.
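As a rough illustration of the time-distance scaled self-attention described above, the sketch below penalizes attention scores by the elapsed time between tokens before the softmax, so that tokens acquired far apart in time attend to each other less. This is a minimal sketch under assumptions: the additive penalty, the decay rate lam, and the tensor layout are invented for illustration and may differ from the paper's exact formulation (the released implementation is at https://github.com/MASILab/lmsignatures).

    import torch
    import torch.nn.functional as F

    def time_distance_scaled_attention(q, k, v, times, lam=0.1):
        """One plausible form of time-distance scaled self-attention:
        standard scaled dot-product scores receive an additive penalty
        proportional to the absolute time gap between tokens.

        q, k, v: (batch, seq_len, dim); times: (batch, seq_len), e.g. in days.
        """
        d = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d ** 0.5            # (B, T, T)
        # Pairwise absolute time gaps between tokens in the sequence.
        gaps = (times.unsqueeze(-1) - times.unsqueeze(-2)).abs()
        scores = scores - lam * gaps                           # distant-in-time tokens attend less
        weights = F.softmax(scores, dim=-1)
        return weights @ v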

Keywords: Latent Clinical Signatures; Multimodal Transformers; Pulmonary Nodule Classification.

Figures

Fig. 1:
Left: Event streams for non-imaging variables are transformed into longitudinal curves. ICA learns independent latent signatures, S, in an unsupervised manner on a large non-imaging cohort. Right: Subject k’s expressions of the signatures, Ek, are sampled at scan dates. Input embeddings are the sum of 1) token embedding derived from signatures or imaging, 2) a fixed positional embedding indicating the token’s position in the sequence, and 3) a learnable segment embedding indicating imaging or non-imaging modality. The time interval between scans is used to compute a time-distance scaled self-attention. This is a flexible approach that handles asynchronous modalities, incompleteness over varying sequence lengths, and irregular time intervals.
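To make the three-part input embedding in Fig. 1 concrete, here is a minimal sketch assuming a PyTorch implementation. The class name, projection layers, dimensions, and the sinusoidal form of the fixed positional table are illustrative assumptions, not the released code (see https://github.com/MASILab/lmsignatures for the actual implementation). In the paper's setup, the non-imaging tokens would come from subject k's signature expressions E_k sampled at scan dates, and the imaging tokens from a CT feature extractor.

    import torch
    import torch.nn as nn

    class MultimodalTokenEmbedding(nn.Module):
        """Illustrative sketch of Fig. 1's input embedding: the sum of
        1) a token embedding from imaging features or signature expressions,
        2) a fixed positional embedding, and
        3) a learnable segment embedding marking imaging vs. non-imaging."""

        def __init__(self, img_feat_dim, sig_dim, d_model=256, max_len=32):
            super().__init__()
            self.img_proj = nn.Linear(img_feat_dim, d_model)  # CT features -> tokens
            self.sig_proj = nn.Linear(sig_dim, d_model)       # signature expressions -> tokens
            # Fixed (non-trainable) sinusoidal positional table.
            pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
            div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                            * (-torch.log(torch.tensor(10000.0)) / d_model))
            pe = torch.zeros(max_len, d_model)
            pe[:, 0::2] = torch.sin(pos * div)
            pe[:, 1::2] = torch.cos(pos * div)
            self.register_buffer("pos_emb", pe)
            # Learnable segment embedding: 0 = imaging token, 1 = non-imaging token.
            self.seg_emb = nn.Embedding(2, d_model)

        def forward(self, img_feats, sig_feats):
            # img_feats: (B, T_img, img_feat_dim); sig_feats: (B, T_sig, sig_dim)
            img_tok = self.img_proj(img_feats)
            sig_tok = self.sig_proj(sig_feats)
            tokens = torch.cat([img_tok, sig_tok], dim=1)     # (B, T, d_model)
            t = tokens.size(1)
            positions = self.pos_emb[:t].unsqueeze(0)
            segments = torch.cat([
                torch.zeros(img_tok.size(1), dtype=torch.long),
                torch.ones(sig_tok.size(1), dtype=torch.long),
            ]).to(tokens.device)
            return tokens + positions + self.seg_emb(segments).unsqueeze(0)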
Fig. 2:
A comparison of the median and interquartile range of predicted probabilities reveals that TDSig is more correctly confident than the baselines. Blue and red indicate subjects that were correctly and incorrectly reclassified by TDSig, respectively. Compared with these baselines, TDSig more often reclassifies correctly than not.
Fig. 3:
A control subject who developed a lesion over 3 months (a), to which the imaging-only approaches assigned a cancer probability of 0.4 (c). However, the subject's highest expressed clinical signature at the 3-month mark was a new pattern of bacterial pneumonia (b), offering the model a benign explanation for an image it would otherwise be less correctly confident about.

