Med Image Comput Comput Assist Interv. 2023 Oct;14221:649-659. doi: 10.1007/978-3-031-43895-0_61. Epub 2023 Oct 1.

Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification

Thomas Z Li et al. Med Image Comput Comput Assist Interv. 2023 Oct.

Abstract

The accuracy of predictive models for solitary pulmonary nodule (SPN) diagnosis can be greatly increased by incorporating repeat imaging and medical context, such as electronic health records (EHRs). However, clinically routine modalities such as imaging and diagnostic codes can be asynchronous and irregularly sampled over different time scales, which poses obstacles to longitudinal multimodal learning. In this work, we propose a transformer-based multimodal strategy that integrates repeat imaging with longitudinal clinical signatures from routinely collected EHRs for SPN classification. We perform unsupervised disentanglement of latent clinical signatures and leverage time-distance scaled self-attention to jointly learn from clinical signature expressions and chest computed tomography (CT) scans. Our classifier is pretrained on 2,668 scans from a public dataset and on 1,149 subjects with longitudinal chest CTs, billing codes, medications, and laboratory tests from the EHRs of our home institution. Evaluation on 227 subjects with challenging SPNs revealed a significant AUC improvement over a longitudinal multimodal baseline (0.824 vs 0.752 AUC), as well as improvements over a single cross-sectional multimodal scenario (0.809 AUC) and a longitudinal imaging-only scenario (0.741 AUC). This work demonstrates the significant advantages of a novel approach for co-learning longitudinal imaging and non-imaging phenotypes with transformers. Code available at https://github.com/MASILab/lmsignatures.
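As a rough illustration of the time-distance scaled self-attention described above, the sketch below penalizes attention scores by the elapsed time between tokens before the softmax, so that tokens acquired far apart in time attend to each other less. This is a minimal sketch under assumptions: the additive penalty, the decay rate lam, and the tensor layout are invented for illustration and may differ from the paper's exact formulation (the released implementation is at https://github.com/MASILab/lmsignatures).

    import torch
    import torch.nn.functional as F

    def time_distance_scaled_attention(q, k, v, times, lam=0.1):
        """One plausible form of time-distance scaled self-attention:
        standard scaled dot-product scores receive an additive penalty
        proportional to the absolute time gap between tokens.

        q, k, v: (batch, seq_len, dim); times: (batch, seq_len), e.g. in days.
        """
        d = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d ** 0.5            # (B, T, T)
        # Pairwise absolute time gaps between tokens in the sequence.
        gaps = (times.unsqueeze(-1) - times.unsqueeze(-2)).abs()
        scores = scores - lam * gaps                           # distant-in-time tokens attend less
        weights = F.softmax(scores, dim=-1)
        return weights @ v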

Keywords: Latent Clinical Signatures; Multimodal Transformers; Pulmonary Nodule Classification.

Figures

Fig. 1:
Left: Event streams for non-imaging variables are transformed into longitudinal curves. ICA learns independent latent signatures, S, in an unsupervised manner on a large non-imaging cohort. Right: Subject k’s expressions of the signatures, Ek, are sampled at scan dates. Input embeddings are the sum of 1) token embedding derived from signatures or imaging, 2) a fixed positional embedding indicating the token’s position in the sequence, and 3) a learnable segment embedding indicating imaging or non-imaging modality. The time interval between scans is used to compute a time-distance scaled self-attention. This is a flexible approach that handles asynchronous modalities, incompleteness over varying sequence lengths, and irregular time intervals.
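To make the three-part input embedding in Fig. 1 concrete, here is a minimal sketch assuming a PyTorch implementation. The class name, projection layers, dimensions, and the sinusoidal form of the fixed positional table are illustrative assumptions, not the released code (see https://github.com/MASILab/lmsignatures for the actual implementation). In the paper's setup, the non-imaging tokens would come from subject k's signature expressions E_k sampled at scan dates, and the imaging tokens from a CT feature extractor.

    import torch
    import torch.nn as nn

    class MultimodalTokenEmbedding(nn.Module):
        """Illustrative sketch of Fig. 1's input embedding: the sum of
        1) a token embedding from imaging features or signature expressions,
        2) a fixed positional embedding, and
        3) a learnable segment embedding marking imaging vs. non-imaging."""

        def __init__(self, img_feat_dim, sig_dim, d_model=256, max_len=32):
            super().__init__()
            self.img_proj = nn.Linear(img_feat_dim, d_model)  # CT features -> tokens
            self.sig_proj = nn.Linear(sig_dim, d_model)       # signature expressions -> tokens
            # Fixed (non-trainable) sinusoidal positional table.
            pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
            div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                            * (-torch.log(torch.tensor(10000.0)) / d_model))
            pe = torch.zeros(max_len, d_model)
            pe[:, 0::2] = torch.sin(pos * div)
            pe[:, 1::2] = torch.cos(pos * div)
            self.register_buffer("pos_emb", pe)
            # Learnable segment embedding: 0 = imaging token, 1 = non-imaging token.
            self.seg_emb = nn.Embedding(2, d_model)

        def forward(self, img_feats, sig_feats):
            # img_feats: (B, T_img, img_feat_dim); sig_feats: (B, T_sig, sig_dim)
            img_tok = self.img_proj(img_feats)
            sig_tok = self.sig_proj(sig_feats)
            tokens = torch.cat([img_tok, sig_tok], dim=1)     # (B, T, d_model)
            t = tokens.size(1)
            positions = self.pos_emb[:t].unsqueeze(0)
            segments = torch.cat([
                torch.zeros(img_tok.size(1), dtype=torch.long),
                torch.ones(sig_tok.size(1), dtype=torch.long),
            ]).to(tokens.device)
            return tokens + positions + self.seg_emb(segments).unsqueeze(0)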
Fig. 2:
A comparison of the median and interquartile range of predicted probabilities reveals that TDSig is more correctly confident than the baselines. Blue and red indicate subjects that were correctly and incorrectly reclassified by TDSig, respectively. Compared with these baselines, TDSig more often reclassifies correctly than not.
Fig. 3:
A control subject who developed a lesion over 3 months (a), to which the imaging-only approaches assigned a cancer probability of 0.4 (c). However, the subject's highest expressed clinical signature at the 3-month mark was a new pattern of bacterial pneumonia (b), offering the model a benign explanation for an image it would otherwise be less correctly confident about.

