Integr Med Res. 2025 Mar;14(1):101100.
doi: 10.1016/j.imr.2024.101100. Epub 2024 Nov 15.

Methods for identifying health status from routinely collected health data: An overview

Mei Liu et al. Integr Med Res. 2025 Mar.

Abstract

Routinely collected health data (RCD) are driving a rapid increase in publications that evaluate the effectiveness and safety of medicines and medical devices. A fundamental step in using these data is developing algorithms that identify health status for use in observational studies. However, the process and methodologies for identifying health status from RCD remain insufficiently understood. While most current methods rely on International Classification of Diseases (ICD) codes, these methods may not be universally applicable. Machine learning methods hold promise for identifying health status more accurately, but they remain underutilized in RCD studies. To address these methodological gaps, we outline key steps and methodological considerations for identifying health status in observational studies using RCD. This review has the potential to strengthen the credibility of findings from observational studies that use RCD.

Keywords: Health status; Machine learning algorithms; Routinely collected health data; Rule-based algorithms.

Conflict of interest statement

The authors declare that they have no conflicts of interest.

Figures

Fig. 1. Illustration of key steps for identifying health status from RCD. Rule-based approaches use expert-defined logic rules built on specific indicators such as ICD codes, medication prescriptions, or laboratory test results. These rules often combine Boolean operators (e.g., AND, OR, NOT) with other logical criteria to identify and classify features. Machine learning approaches may involve models such as random forests, decision trees, regression, and naïve Bayes, which are trained on a dataset to learn patterns and make classification predictions.
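
To make the rule-based approach described in the caption concrete, the following is a minimal sketch in Python of a Boolean rule that combines diagnosis codes, prescriptions, and laboratory results. It is not the authors' algorithm. The record layout (`PatientRecord`), the function names, the chosen condition (type 2 diabetes), the code prefixes, the drug list, and the HbA1c threshold are all assumptions made for illustration, and a real case definition would need clinical review and validation against a reference standard.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified patient record; all field names are illustrative."""
    patient_id: str
    icd10_codes: set = field(default_factory=set)      # diagnosis codes, e.g. "E11.9"
    medications: set = field(default_factory=set)      # lower-case generic drug names
    hba1c_percent: list = field(default_factory=list)  # HbA1c results in percent


# Assumed example criteria, chosen only to illustrate the rule structure.
DIABETES_ICD10_PREFIXES = ("E11",)            # type 2 diabetes mellitus
GESTATIONAL_ICD10_PREFIXES = ("O24.4",)       # gestational diabetes (exclusion)
GLUCOSE_LOWERING_DRUGS = {"metformin", "glipizide"}
HBA1C_THRESHOLD = 6.5                         # percent


def has_code(record, prefixes):
    """Return True if any diagnosis code starts with one of the given prefixes."""
    return any(code.startswith(prefixes) for code in record.icd10_codes)


def has_type2_diabetes(record):
    """Rule: (ICD code AND (drug OR lab result)) AND NOT exclusion code."""
    coded = has_code(record, DIABETES_ICD10_PREFIXES)
    treated = bool(record.medications & GLUCOSE_LOWERING_DRUGS)
    lab_positive = any(v >= HBA1C_THRESHOLD for v in record.hba1c_percent)
    excluded = has_code(record, GESTATIONAL_ICD10_PREFIXES)
    return coded and (treated or lab_positive) and not excluded


# Toy records (fabricated for the example, not real patient data).
records = [
    PatientRecord("p1", {"E11.9"}, {"metformin"}, [7.2]),       # code + drug + lab
    PatientRecord("p2", {"E11.9"}, set(), [5.4]),               # code only
    PatientRecord("p3", {"E11.9", "O24.4"}, {"metformin"}, []), # exclusion code present
    PatientRecord("p4", {"I10"}, {"metformin"}, [7.0]),         # no diabetes code
]

cases = [r.patient_id for r in records if has_type2_diabetes(r)]
print(cases)  # ['p1']
```

Requiring a supporting prescription or laboratory result in addition to the diagnosis code is one common way to trade sensitivity for specificity; relaxing the rule to the ICD code alone would also classify `p2` as a case.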
