Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2024 Sep 3;24(1):245.
doi: 10.1186/s12911-024-02652-7.

lab2clean: a novel algorithm for automated cleaning of retrospective clinical laboratory results data for secondary uses

Affiliations

lab2clean: a novel algorithm for automated cleaning of retrospective clinical laboratory results data for secondary uses

Ahmed Medhat Zayed et al. BMC Med Inform Decis Mak. .

Abstract

Background: The integrity of clinical research and machine learning models in healthcare heavily relies on the quality of underlying clinical laboratory data. However, the preprocessing of this data to ensure its reliability and accuracy remains a significant challenge due to variations in data recording and reporting standards.

Methods: We developed lab2clean, a novel algorithm aimed at automating and standardizing the cleaning of retrospective clinical laboratory results data. lab2clean was implemented as two R functions specifically designed to enhance data conformance and plausibility by standardizing result formats and validating result values. The functionality and performance of the algorithm were evaluated using two extensive electronic medical record (EMR) databases, encompassing various clinical settings.

Results: lab2clean effectively reduced the variability of laboratory results and identified potentially erroneous records. Upon deployment, it demonstrated effective and fast standardization and validation of substantial laboratory data records. The evaluation highlighted significant improvements in the conformance and plausibility of lab results, confirming the algorithm's efficacy in handling large-scale data sets.

Conclusions: lab2clean addresses the challenge of preprocessing and cleaning clinical laboratory data, a critical step in ensuring high-quality data for research outcomes. It offers a straightforward, efficient tool for researchers, improving the quality of clinical laboratory data, a major portion of healthcare data. Thereby, enhancing the reliability and reproducibility of clinical research outcomes and clinical machine learning models. Future developments aim to broaden its functionality and accessibility, solidifying its vital role in healthcare data management.

Keywords: Algorithms; Clinical laboratories; Data integrity; Data preprocessing; Electronic medical records.

PubMed Disclaimer

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1
Outline of the three validation checks of function 2
Fig. 2
Fig. 2
Improved conformance (standardization rates) of result values by applying function 1 on the INTEGO database. Color map: ‘bamako’ by Fabio Crameri [24]

Similar articles

References

    1. Garbage. in, garbage out. In: Wikipedia. 2023. Available from: https://en.wikipedia.org/wiki/Garbage_in,_garbage_out. Cited 2024 Feb 12.
    1. Kandel S, Heer J, Plaisant C, Kennedy J, van Ham F, Riche NH, et al. Research directions in data wrangling: visualizations and transformations for usable and credible data. Inform Visual. 2011;10(4):271–88.
    1. Abhyankar S, Demner-Fushman D, McDonald CJ. Standardizing clinical laboratory data for secondary use. J Biomed Inform. 2012;45(4):642–50. 10.1016/j.jbi.2012.04.012 - DOI - PMC - PubMed
    1. Dixon BE, McGowan JJ, Grannis SJ. Electronic laboratory data quality and the value of a health information exchange to support public health reporting processes. AMIA Annu Symp Proc. 2011;2011:322–30. - PMC - PubMed
    1. Hauser RG, Quine DB, Ryder A. LabRS: a Rosetta stone for retrospective standardization of clinical laboratory test results. J Am Med Inf Assoc. 2017;25(2):121–6.10.1093/jamia/ocx046 - DOI - PMC - PubMed

LinkOut - more resources