J Am Med Inform Assoc. 2021 Mar 18;28(4):772-781.
doi: 10.1093/jamia/ocaa288.

High-throughput phenotyping with temporal sequences

Hossein Estiri et al. J Am Med Inform Assoc.

Abstract

Objective: High-throughput electronic phenotyping algorithms can accelerate translational research using data from electronic health record (EHR) systems. The temporal information buried in EHRs is often underutilized in developing computational phenotypic definitions. This study aims to develop a high-throughput phenotyping method, leveraging temporal sequential patterns from EHRs.

Materials and methods: We develop a representation mining algorithm to extract 5 classes of representations from EHR diagnosis and medication records: the aggregated vector of the records (aggregated vector representation), the standard sequential patterns (sequential pattern mining), the transitive sequential patterns (transitive sequential pattern mining), and 2 hybrid classes. Using EHR data on 10 phenotypes from the Mass General Brigham Biobank, we train and validate phenotyping algorithms.
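To make the difference between the representation classes concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that each patient's data is a list of (date, code) pairs, that only the first occurrence of each code is sequenced, that a "standard" sequence links each record to the one immediately after it, and that a "transitive" sequence links each record to every later record. The function names, the record layout, and the example codes are all hypothetical.

```python
from collections import Counter


def first_occurrences(records):
    """Return distinct codes ordered by the date each one first appears.

    `records` is a list of (ISO date string, code) pairs; this layout is
    an assumption made for the sketch.
    """
    seen = {}
    for date, code in sorted(records, key=lambda r: r[0]):
        seen.setdefault(code, date)  # keep only the earliest date per code
    return list(seen)  # dicts preserve insertion order


def aggregated_vector(records):
    """Count how often each code occurs, ignoring time entirely."""
    return Counter(code for _, code in records)


def adjacent_sequences(records):
    """Pair each code with the code that immediately follows it."""
    codes = first_occurrences(records)
    return {f"{a}->{b}" for a, b in zip(codes, codes[1:])}


def transitive_sequences(records):
    """Pair each code with every code that appears later in time."""
    codes = first_occurrences(records)
    return {
        f"{a}->{b}"
        for i, a in enumerate(codes)
        for b in codes[i + 1:]
    }


# Hypothetical patient timeline, for illustration only.
patient = [
    ("2019-01-05", "hypertension"),
    ("2019-03-10", "metformin"),
    ("2020-02-01", "T2DM"),
    ("2020-06-15", "metformin"),
]

print(aggregated_vector(patient))
print(sorted(adjacent_sequences(patient)))
print(sorted(transitive_sequences(patient)))
```

Under these assumptions, the transitive set contains every adjacent pair plus longer-range pairs such as `hypertension->T2DM`, which the adjacent-only representation misses. Each resulting sequence can then serve as a binary feature for a downstream classifier.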

Results: Phenotyping with temporal sequences resulted in a superior classification performance across all 10 phenotypes compared with the standard representations in electronic phenotyping. The high-throughput algorithm's classification performance was superior or similar to the performance of previously published electronic phenotyping algorithms. We characterize and evaluate the top transitive sequences of diagnosis records paired with the records of risk factors, symptoms, complications, medications, or vaccinations.

Discussion: The proposed high-throughput phenotyping approach enables seamless discovery of sequential record combinations that may be difficult to assume from raw EHR data. Transitive sequences offer more accurate characterization of the phenotype, compared with its individual components, and reflect the actual lived experiences of the patients with that particular disease.

Conclusion: Sequential data representations provide a precise mechanism for incorporating raw EHR records into downstream machine learning. Our approach starts with user interpretability and works backward to the technology.

Keywords: electronic health records; phenotyping; sequential pattern mining; temporal data representation.

Figures

Figure 1. The 2 steps involved in the high-throughput phenotyping with temporal sequences. AVR: aggregated vector representation; C.V.: cross validation; MSMR: minimize sparsity and maximize relevance; SPM: sequential pattern mining; tSPM: transitive sequential pattern mining.

Figure 2. Distribution of phenotyping areas under the receiver-operating characteristic curve (AUC ROCs) by phenotype and data representation. AD: Alzheimer’s disease; AFIB: atrial fibrillation; AVR: aggregated vector representation; CAD: coronary artery disease; CHF: congestive heart failure; COPD: chronic obstructive pulmonary disease; MSMR: minimize sparsity and maximize relevance; RA: rheumatoid arthritis; SPM: sequential pattern mining; T1DM: type 1 diabetes mellitus; T2DM: type 2 diabetes mellitus; tSPM: transitive sequential pattern mining; UC: ulcerative colitis.