Am J Clin Pathol. 2023 Dec 1;160(6):620-632.
doi: 10.1093/ajcp/aqad099.

A fully interpretable machine learning model for increasing the effectiveness of urine screening



Fabio Del Ben et al. Am J Clin Pathol. 2023.

Abstract

Objectives: This article addresses the need for effective screening methods that identify negative urine samples before urine culture, reducing the workload, cost, and result turnaround time in the microbiology laboratory. We aim to overcome the limitations of current solutions, which are either too simple (1 or 2 parameters), limiting effectiveness, or too complex ("black box" machine learning models), limiting interpretation, trust, and real-world implementation.

Methods: The study analyzed 15,312 samples from 10,534 patients, using clinical features and data from the Sysmex UF-1000i automated analyzer. Decision tree (DT) models with or without a lookahead strategy were used because they offer a transparent set of logical rules that medical professionals can easily understand and that can be implemented in automated analyzers.

Results: The best model achieved a sensitivity of 94.5% and classified samples as negative based on age, bacteria, mucus, and 2 scattering parameters. The model reduced the workload by an additional 16% compared with the current laboratory procedure, with an estimated financial impact of €40,000/y for 15,000 samples/y. The identified logical rules have a scientific rationale consistent with existing knowledge in the literature.
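The headline figures can be related with simple arithmetic. The sketch below computes sensitivity (the share of culture-positive samples that are still sent to culture) and the share of all samples screened out, from a confusion matrix whose counts are hypothetical. It also divides the reported €40,000/y by the number of cultures avoided, assuming (the abstract does not say this) that the additional 16% means 16% of the 15,000 yearly samples, i.e. 2,400 cultures. The function names are illustrative, not from the paper.

```python
def screening_metrics(tp, fn, tn, fp):
    """Sensitivity and workload reduction of a rule that screens out negatives.

    tp: culture-positive samples sent to culture
    fn: culture-positive samples screened out (missed)
    tn: culture-negative samples screened out
    fp: culture-negative samples still sent to culture
    """
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)
    # Workload reduction: share of all samples that no longer need a culture.
    workload_reduction = (tn + fn) / total
    return sensitivity, workload_reduction


def implied_saving_per_culture(annual_saving_eur, samples_per_year, extra_reduction):
    """Annual saving divided by the number of cultures avoided per year.

    ASSUMPTION: extra_reduction is a fraction of all yearly samples.
    """
    return annual_saving_eur / (samples_per_year * extra_reduction)


# HYPOTHETICAL confusion-matrix counts, for illustration only.
sens, reduction = screening_metrics(tp=189, fn=11, tn=500, fp=300)
print(f"sensitivity {sens:.3f}, workload reduction {reduction:.3f}")
# → sensitivity 0.945, workload reduction 0.511
print(f"EUR {implied_saving_per_culture(40_000, 15_000, 0.16):.2f} per avoided culture")
# → EUR 16.67 per avoided culture
```

The "additional 16%" in the abstract is a difference between two procedures, so in practice it would be the workload reduction of the model minus that of the current procedure, each computed as above on the same samples.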

Conclusions: Overall, this study provides an effective and interpretable screening method for urine culture in microbiology laboratories, using data from the Sysmex UF-1000i automated analyzer. Unlike other machine learning models, our model is interpretable, which generates trust and enables real-world implementation.

Keywords: data science; decision tree; machine learning; urinalysis.


Figures

FIGURE 1
A, Schematic of the architecture of the automated urine analyzer. B, Scatterplots and particle classification of the Sysmex UF-1000i. The left graph shows forward scatter of the sediment channel (S FSC) against high-gain fluorescence of the same channel (S FLH). The right graph shows forward scatter of the sediment channel (S FSC) against low-gain fluorescence of the same channel (S FLL). Every dot represents an object detected by the cytometer. The device uses an internal proprietary clustering algorithm to classify objects into classes (white blood cells [WBC], red blood cells [RBC], yeast-like cells [YLC], epithelial cells [EC]).
FIGURE 2
Tree-pruning strategy. A, Generation of the initial decision tree using the default configuration of scikit-learn. B, Selection of the end node: the node with the highest number of negative samples (in orange) whose number of positive samples still keeps sensitivity within the threshold.
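The end-node selection in panel B can be sketched as a constrained maximization: every positive sample in the end node becomes a false negative, so a node is eligible only if sensitivity stays at or above the threshold, and among eligible nodes the one with the most negatives wins. This is a minimal sketch assuming per-node class counts are already available (in scikit-learn they could be derived, for example, from `DecisionTreeClassifier.decision_path`); the node counts and the 0.95 threshold below are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NodeCounts:
    node_id: int
    n_negative: int  # culture-negative samples that reach this node
    n_positive: int  # culture-positive samples that reach this node


def select_end_node(nodes: Iterable[NodeCounts], total_positive: int,
                    min_sensitivity: float) -> Optional[NodeCounts]:
    """Return the node with the most negatives among those that keep
    sensitivity >= min_sensitivity when reported negative, or None."""
    best = None
    for node in nodes:
        # Every positive sample in the end node would become a false negative.
        sensitivity = (total_positive - node.n_positive) / total_positive
        if sensitivity < min_sensitivity:
            continue
        if best is None or node.n_negative > best.n_negative:
            best = node
    return best


# HYPOTHETICAL node counts, for illustration only.
nodes = [
    NodeCounts(1, n_negative=900, n_positive=40),  # sensitivity 0.80
    NodeCounts(2, n_negative=600, n_positive=8),   # sensitivity 0.96
    NodeCounts(3, n_negative=700, n_positive=15),  # sensitivity 0.925
    NodeCounts(4, n_negative=300, n_positive=2),   # sensitivity 0.99
]
end_node = select_end_node(nodes, total_positive=200, min_sensitivity=0.95)
print(end_node.node_id)  # → 2
```

Lowering the threshold admits nodes with more negatives (here, 0.90 would select node 3), which is the trade-off between sensitivity and workload reduction that the pruning strategy controls.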
FIGURE 3
Example of the lookahead procedure. A, The data set contains 8 samples, each with the patient's age and sex and the test outcome. The classic (greedy) decision tree (B) fails to recognize that the first split should be on the patient's sex, whereas the lookahead procedure (C) does and splits all the leaves perfectly. Colors indicate whether the samples are negative (orange), positive (blue), or without a majority (gray).
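The difference between greedy splitting and a one-step lookahead can be sketched in plain Python. This is a minimal illustration under stated assumptions: Gini impurity as the split criterion, midpoint thresholds, a depth limit of 2, and an 8-sample data set that is hypothetical (constructed so that only a sex-first split separates it perfectly; it is not the data of panel A). The lookahead scores each candidate split by the impurity reached after both children also take their best next split; the authors' exact procedure may differ.

```python
from collections import Counter

# HYPOTHETICAL toy data, not the samples of Figure 3: (age, sex, label),
# with sex 0 = female, 1 = male and label 1 = positive, 0 = negative.
DATA = [
    (20, 0, 1), (22, 1, 0), (25, 0, 1), (45, 0, 0),
    (50, 1, 0), (62, 1, 1), (66, 0, 0), (70, 1, 1),
]
FEATURES = {"age": 0, "sex": 1}


def gini(rows):
    """Gini impurity of the labels (last field) in a non-empty list of rows."""
    counts = Counter(r[-1] for r in rows)
    return 1.0 - sum((c / len(rows)) ** 2 for c in counts.values())


def candidate_splits(rows):
    """Yield (feature, threshold, left, right) for every midpoint threshold."""
    for name, idx in FEATURES.items():
        values = sorted({r[idx] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            yield (name, thr,
                   [r for r in rows if r[idx] < thr],
                   [r for r in rows if r[idx] >= thr])


def split_impurity(left, right):
    """Size-weighted Gini impurity of a binary split."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def best_one_level(rows):
    """Lowest impurity reachable from rows with at most one more split."""
    return min([gini(rows)] + [split_impurity(l, r) for _, _, l, r in candidate_splits(rows)])


def choose_split(rows, lookahead):
    """Greedy: minimize the split's own impurity. Lookahead: minimize the
    impurity after each child also takes its best next split."""
    best, best_score = None, float("inf")
    for name, thr, left, right in candidate_splits(rows):
        if lookahead:
            score = (len(left) * best_one_level(left)
                     + len(right) * best_one_level(right)) / len(rows)
        else:
            score = split_impurity(left, right)
        if score < best_score:
            best, best_score = (name, thr, left, right), score
    return best


def build(rows, depth, lookahead):
    """Grow a tree of at most `depth` levels; leaves predict the majority label."""
    split = None
    if depth > 0 and gini(rows) > 0:
        # Lookahead only helps if the depth budget allows the anticipated split.
        split = choose_split(rows, lookahead and depth > 1)
    if split is None:
        return ("leaf", Counter(r[-1] for r in rows).most_common(1)[0][0])
    name, thr, left, right = split
    return ("node", name, thr,
            build(left, depth - 1, lookahead), build(right, depth - 1, lookahead))


def predict(tree, row):
    """Follow the splits from the root to a leaf and return its label."""
    while tree[0] == "node":
        _, name, thr, left, right = tree
        tree = left if row[FEATURES[name]] < thr else right
    return tree[1]


def errors(tree):
    """Number of training samples the tree misclassifies."""
    return sum(predict(tree, r) != r[-1] for r in DATA)


greedy = build(DATA, depth=2, lookahead=False)
ahead = build(DATA, depth=2, lookahead=True)
print("greedy root:", greedy[1], "errors:", errors(greedy))
print("lookahead root:", ahead[1], "errors:", errors(ahead))
```

On this toy data a sex split alone does not reduce impurity at all, so the greedy tree splits on age at the root and misclassifies some samples, whereas the lookahead tree splits on sex first and classifies all 8 samples correctly. The price is cost: each candidate split is scored by searching the best split of both children, which multiplies the work per node.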
FIGURE 4
Decision tree created with all the available features. The orange node is the end node, which contains the samples classified as negative. All the other nodes, in blue, contain samples classified as positive, which are sent to urine culture. For definitions of abbreviations, see TABLE 1.
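Because only one node is labeled negative, the whole classifier reduces to a single logical rule: a sample is reported negative if and only if it satisfies every split condition on the path from the root to the end node. A minimal sketch of that rule, as it might be encoded in analyzer software, is shown below. The feature names echo the abstract, but the thresholds, directions, and the helper names `reaches_end_node` and `route` are hypothetical placeholders, not the published tree.

```python
import operator

# Comparison operators a split condition may use.
OPS = {"<=": operator.le, ">": operator.gt}

# HYPOTHETICAL end-node path: thresholds and directions are placeholders,
# not the values of the published tree.
END_NODE_PATH = [
    ("bacteria", "<=", 50.0),
    ("mucus", "<=", 10.0),
    ("age", "<=", 60.0),
]


def reaches_end_node(sample, path=END_NODE_PATH):
    """True if the sample meets every condition on the root-to-end-node path."""
    return all(OPS[op](sample[feature], threshold) for feature, op, threshold in path)


def route(sample, path=END_NODE_PATH):
    """End-node samples are reported negative; all others go to urine culture."""
    return "negative" if reaches_end_node(sample, path) else "culture"


print(route({"bacteria": 12.0, "mucus": 3.0, "age": 35}))   # → negative
print(route({"bacteria": 800.0, "mucus": 3.0, "age": 35}))  # → culture
```

Storing the path as data rather than nested `if` statements keeps the rule auditable: a reviewer can read each condition directly, which is the interpretability argument the article makes for decision trees.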
FIGURE 5
Decision tree created with all the available features, using a 1-step lookahead to split the nodes. The tree is more compact than its counterpart without lookahead (FIGURE 4) and is less prone to overfitting, as shown in TABLE 3. For definitions of abbreviations, see TABLE 1.
