lab2clean: a novel algorithm for automated cleaning of retrospective clinical laboratory results data for secondary uses
- PMID: 39227951
- PMCID: PMC11370074
- DOI: 10.1186/s12911-024-02652-7
Abstract
Background: The integrity of clinical research and machine learning models in healthcare heavily relies on the quality of underlying clinical laboratory data. However, the preprocessing of this data to ensure its reliability and accuracy remains a significant challenge due to variations in data recording and reporting standards.
Methods: We developed lab2clean, a novel algorithm aimed at automating and standardizing the cleaning of retrospective clinical laboratory results data. lab2clean was implemented as two R functions specifically designed to enhance data conformance and plausibility by standardizing result formats and validating result values. The functionality and performance of the algorithm were evaluated using two extensive electronic medical record (EMR) databases, encompassing various clinical settings.
Results: lab2clean effectively reduced the variability of laboratory results and identified potentially erroneous records. Upon deployment, it demonstrated effective and fast standardization and validation of substantial laboratory data records. The evaluation highlighted significant improvements in the conformance and plausibility of lab results, confirming the algorithm's efficacy in handling large-scale data sets.
Conclusions: lab2clean addresses the challenge of preprocessing and cleaning clinical laboratory data, a critical step in ensuring high-quality data for research outcomes. It offers researchers a straightforward, efficient tool for improving the quality of clinical laboratory data, which make up a major portion of healthcare data, thereby enhancing the reliability and reproducibility of clinical research outcomes and clinical machine learning models. Future developments aim to broaden its functionality and accessibility, solidifying its vital role in healthcare data management.
Keywords: Algorithms; Clinical laboratories; Data integrity; Data preprocessing; Electronic medical records.
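The two steps named in the Methods (standardizing result formats, then validating result values) can be illustrated with a minimal Python sketch. This is not the lab2clean implementation, which the abstract describes as two R functions. The function names, the regular expression, and the plausibility limits below are all hypothetical and chosen only for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical plausibility limits per test; illustrative values only,
# not taken from the paper or from lab2clean.
PLAUSIBLE_RANGE = {"potassium_mmol_L": (0.5, 15.0)}

# Optional comparison operator, then a number with "." or "," as decimal mark.
_NUMERIC = re.compile(r"^(<=|>=|<|>)?\s*(-?\d+(?:[.,]\d+)?)$")


def standardize_result(raw: str) -> Tuple[Optional[str], Optional[float]]:
    """Conformance step: parse a raw result string into (operator, value).

    Returns (None, None) when the string is not a recognizable numeric result.
    """
    match = _NUMERIC.match(raw.strip())
    if match is None:
        return None, None
    operator, number = match.groups()
    # Normalize a decimal comma to a decimal point before conversion.
    return operator, float(number.replace(",", "."))


def is_plausible(test: str, value: float) -> bool:
    """Plausibility step: check a value against the assumed limits for its test."""
    low, high = PLAUSIBLE_RANGE[test]
    return low <= value <= high
```

In this sketch, a record whose result cannot be parsed, or whose value falls outside the assumed limits, would be flagged for review rather than silently dropped.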
© 2024. The Author(s).
Conflict of interest statement
The authors declare no competing interests.