Pract Lab Med. 2025 Jul 9;46:e00492.
doi: 10.1016/j.plabm.2025.e00492. eCollection 2025 Sep.

Differential Distributions: A refined methodology to indirect reference interval estimation by including Patient's health status according to associated ICD-10 codes

David Schär et al. Pract Lab Med.

Abstract

Background: Traditional methods for estimating reference intervals (RIs) from patients' routine clinical blood test results typically remove outliers without considering the nuanced health status of the patients, thereby discarding the vast majority of test results from reference interval estimation.

Methods: We introduce the Differential Distribution Method (DDM), which uses routine laboratory data coded with ICD-10 diagnoses to approximate the underlying non-diseased, age- and sex-stratified population from mixed clinical data. By removing test results that stem from subpopulations significantly different from the general population, the DDM generates reference intervals stratified by sex and age that take into account the patients' associated health conditions, as derived from the ICD-10 coding system.
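
The filtering idea in the Methods paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the statistical test, so the sketch assumes a two-sample Kolmogorov-Smirnov test (asymptotic p-value) that compares each ICD-10 code's subgroup with the global distribution of one sex/age stratum. Every result carrying a flagged code is dropped, and the RI is taken as the nonparametric 2.5th and 97.5th percentiles of the remaining differential distribution. The function names, `alpha`, `min_group_size` and the synthetic data are hypothetical.

```python
import math
import statistics
from collections import defaultdict

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every copy of x in both samples so ties are handled.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(d, na, nb):
    """Asymptotic two-sided p-value from the Kolmogorov series."""
    en = math.sqrt(na * nb / (na + nb))
    lam = (en + 0.12 + 0.11 / en) * d
    total, sign, prev = 0.0, 2.0, 0.0
    for k in range(1, 101):
        term = sign * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) <= 1e-3 * prev or abs(term) <= 1e-8 * total:
            return min(max(total, 0.0), 1.0)
        sign, prev = -sign, abs(term)
    return 1.0  # series did not converge: D is tiny, so p is close to 1

def differential_distribution(results, alpha=0.05, min_group_size=30):
    """results: (value, codes) pairs for ONE sex/age stratum; codes is a frozenset."""
    global_values = [v for v, _ in results]
    groups = defaultdict(list)
    for value, codes in results:
        for code in codes:
            groups[code].append(value)
    flagged = set()
    for code, values in groups.items():
        if len(values) < min_group_size:
            continue  # too few results for a meaningful test
        d = ks_statistic(values, global_values)
        if ks_pvalue(d, len(values), len(global_values)) < alpha:
            flagged.add(code)
    # Keep only results that carry none of the flagged codes.
    kept = [v for v, codes in results if not (codes & flagged)]
    return kept, flagged

def reference_interval(values):
    """Nonparametric 2.5th and 97.5th percentiles."""
    q = statistics.quantiles(values, n=40, method="inclusive")
    return q[0], q[-1]

# Synthetic example (not study data): four unremarkable code groups, one shifted group.
base = [3.5 + 0.01 * i for i in range(150)]
results = [(v, frozenset({f"code_{g}"})) for g in "abcd" for v in base]
results += [(5.5 + 0.01 * i, frozenset({"code_x"})) for i in range(40)]

kept, flagged = differential_distribution(results)
gd_ri = reference_interval([v for v, _ in results])  # global distribution (GD)
dd_ri = reference_interval(kept)                      # differential distribution (DD)
print(flagged, gd_ri, dd_ri)
```

Because each subgroup is compared with a global distribution that also contains it, this test is only a rough screen; in very large strata even small shifts reach significance, so an effect-size floor or a multiple-testing correction may be needed in practice.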

Results: Applying the DDM to blood plasma potassium levels demonstrated its ability to adjust RIs dynamically across different patient groups. In a decade-based stratification, the method effectively differentiated RIs, showing significant variability and tighter confidence intervals, particularly in older adults (above 60 years). The RIs widened slightly with advancing age in both males and females, while their standard deviation was reduced by removing large portions of test results that differed significantly, grouped either by individual ICD-10 code or by clusters of ICD-10 codes.
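
The abstract does not say how the decade strata or the confidence intervals were computed. The sketch below assumes one common approach: ages are binned by decade, each sex/decade stratum needs at least 120 results (the minimum commonly cited from CLSI EP28-A3c for nonparametric limits), and each reference limit gets a 90% percentile-bootstrap confidence interval. The function names, `n_boot`, the seed and the synthetic records are hypothetical. In a DDM workflow, the input would be results already filtered by the differential-distribution step.

```python
import random
import statistics
from collections import defaultdict

def decade(age):
    """Map an age in years to the lower bound of its decade, e.g. 67 -> 60."""
    return (age // 10) * 10

def ri_limits(values):
    """Nonparametric 2.5th and 97.5th percentiles."""
    q = statistics.quantiles(values, n=40, method="inclusive")
    return q[0], q[-1]

def bootstrap_ci(values, n_boot=500, level=0.90, seed=0):
    """Percentile-bootstrap CIs for the lower and upper reference limits."""
    rng = random.Random(seed)
    lows, highs = [], []
    for _ in range(n_boot):
        sample = rng.choices(values, k=len(values))  # resample with replacement
        lo, hi = ri_limits(sample)
        lows.append(lo)
        highs.append(hi)
    tail = (1 - level) / 2

    def pct(xs, p):
        xs = sorted(xs)
        return xs[min(int(p * len(xs)), len(xs) - 1)]

    return ((pct(lows, tail), pct(lows, 1 - tail)),
            (pct(highs, tail), pct(highs, 1 - tail)))

def stratified_ri(records, min_n=120):
    """records: (sex, age, value) tuples; returns {(sex, decade): (RI, CIs)}."""
    strata = defaultdict(list)
    for sex, age, value in records:
        strata[(sex, decade(age))].append(value)
    out = {}
    for key, values in sorted(strata.items()):
        if len(values) < min_n:
            continue  # too few results for a nonparametric RI
        out[key] = (ri_limits(values), bootstrap_ci(values))
    return out

# Deterministic synthetic records (not study data): 150 results per sex/decade.
records = [(sex, age, 4.2 + 0.05 * ((age * 7 + k * 13) % 17 - 8))
           for sex in "FM" for age in range(20, 80) for k in range(15)]
ris = stratified_ri(records)
for (sex, dec), ((lo, hi), _) in ris.items():
    print(f"{sex} {dec}-{dec + 9}: {lo:.2f}-{hi:.2f} mmol/L")
```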

Conclusions: This DDM data mining approach offers a robust framework for RI inference by generating adjusted RIs that incorporate clinical nuances reflected in ICD-10 codes. This approach not only enhances the accuracy of patient diagnostics but also facilitates the identification of potential multimorbidities affecting laboratory results.

Keywords: Clinical diagnostics; Laboratory Medicine; Machine Learning; Personalized Medicine; Reference intervals.

Conflict of interest statement

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests. Reports a relationship with that includes:. Has patent pending to. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
Workflow for Clustering ICD-10-GM Diagnoses Using Natural Language Processing. The process is based on laboratory data that is stratified by analyte and sex. The natural language processing algorithm Word2vec is employed to transform N diagnoses into an N-dimensional vector space. Each diagnosis is represented as a vector, and the angular relationship between these vectors is quantified using cosine similarity.
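
The cosine-similarity step in this caption can be illustrated with a short sketch. The caption does not name the clustering algorithm or a similarity cut-off, so the sketch assumes the Word2vec vectors are already available and groups diagnoses by single linkage: any two diagnoses whose cosine similarity reaches a hypothetical threshold of 0.8 end up in the same cluster. The toy 3-dimensional vectors and the labels such as `renal_1` are placeholders, not trained embeddings or real ICD-10-GM codes.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cluster_by_similarity(vectors, threshold=0.8):
    """Single-linkage grouping: diagnoses with cosine similarity >= threshold share a cluster."""
    codes = list(vectors)
    parent = {c: c for c in codes}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps the trees shallow
            c = parent[c]
        return c

    for i, a in enumerate(codes):
        for b in codes[i + 1:]:
            if cosine_similarity(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for c in codes:
        clusters.setdefault(find(c), []).append(c)
    return sorted(sorted(group) for group in clusters.values())

# Toy vectors standing in for trained Word2vec embeddings of diagnoses.
vectors = {
    "renal_1": [0.9, 0.1, 0.0],
    "renal_2": [0.8, 0.2, 0.1],
    "cardiac_1": [0.1, 0.9, 0.2],
    "cardiac_2": [0.0, 0.8, 0.3],
    "other": [0.1, 0.1, 0.9],
}
print(cluster_by_similarity(vectors))
```

Single linkage is only one option; with real embeddings, hierarchical or density-based clustering on the same similarity matrix would also fit the workflow the caption describes.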
Fig. 2
User interface of the Differential Distribution Method (DDM) R Shiny app. The left panel provides user input options for the reference interval estimation process to select variables such as sex, age range, hypothesis testing parameters and significance level thresholds. The main plot area (right) shows histograms representing the Global Distribution (GD) and the Differential Distribution (DD) of selected laboratory measurements. Reference interval estimates are depicted as vertical lines for both distributions. The lower plot section features two directional arrows that represent the magnitude and direction of changes in the reference limits, comparing those estimated from the GD to those from the DD.
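
The arrows in the app's lower panel compare each reference limit estimated from the GD with the corresponding DD limit. A minimal sketch of that comparison, assuming both RIs are given as (lower, upper) pairs in mmol/L and that a shift is reported as DD minus GD; `limit_shifts` is a hypothetical helper, and the numbers in the usage line are illustrative, not results from the study.

```python
def limit_shifts(gd_ri, dd_ri, ndigits=3):
    """Signed change of each reference limit from the GD to the DD (DD minus GD)."""
    shifts = {}
    for name, gd, dd in (("lower", gd_ri[0], dd_ri[0]), ("upper", gd_ri[1], dd_ri[1])):
        delta = round(dd - gd, ndigits)  # magnitude of the arrow
        direction = "up" if delta > 0 else "down" if delta < 0 else "unchanged"
        shifts[name] = (delta, direction)
    return shifts

# Illustrative limits in mmol/L (not study results).
print(limit_shifts((3.4, 5.3), (3.5, 5.0)))
```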
Fig. 3
Heatmap of all potassium measurements stratified by age and sex (left: female, right: male). The color intensity represents the density of available potassium measurements for each year of age (x-axis) and the respective potassium level in mmol/L (y-axis).
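
A heatmap like this can be built from a grid of counts keyed by sex, year of age and potassium bin. The caption gives neither the bin width nor whether counts are normalized, so the sketch assumes raw counts in hypothetical 0.1 mmol/L bins; `density_grid` is a hypothetical helper.

```python
import math
from collections import Counter

def density_grid(measurements, bin_width=0.1):
    """Count potassium results per (sex, age year, potassium bin index)."""
    counts = Counter()
    for sex, age, value in measurements:
        # Round before flooring so values such as 4.3 are not pushed into the bin below
        # by floating-point error (4.3 / 0.1 evaluates to 42.99999999999999).
        bin_index = math.floor(round(value / bin_width, 6))
        counts[(sex, int(age), bin_index)] += 1
    return counts

grid = density_grid([("F", 45, 4.3), ("F", 45, 4.34), ("M", 70, 5.0)])
for (sex, age, b), n in sorted(grid.items()):
    print(f"{sex} age {age}: {b * 0.1:.1f}-{(b + 1) * 0.1:.1f} mmol/L -> {n}")
```

Dividing each cell by the total number of results for that sex would turn the counts into relative densities.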