Orphanet J Rare Dis. 2024 Oct 13;19(1):378.
doi: 10.1186/s13023-024-03406-4.

A machine learning algorithm for the detection of paroxysmal nocturnal haemoglobinuria (PNH) in UK primary care electronic health records

Amanda Worker et al. Orphanet J Rare Dis. 2024.

Abstract

Background: Paroxysmal Nocturnal Haemoglobinuria (PNH) is an ultra-rare, acquired disorder that is challenging to diagnose due to varied symptoms, heterogeneous patient presentations, and lack of awareness among healthcare professionals. This leads to frequent misdiagnosis and delays in diagnosis. This study evaluated the feasibility of a machine learning model to identify undiagnosed PNH patients using structured electronic health records.

Methods: The study used data from the Optimum Patient Care Research Database, which contains electronic health records from general practitioner (GP) practices across the United Kingdom. PNH patients were identified by the presence, and control patients by the absence, of a PNH diagnosis code in their records. Clinical features (symptoms, diagnoses, healthcare utilisation) from 131 patients in the PNH group and 593,838 patients in the control group were input to a tree-based XGBoost machine learning model to classify patients as either "positive" or "negative" for PNH suspicion. The algorithm was finalised after additional exclusion and inclusion criteria were applied. Performance was assessed using positive predictive value (PPV), recall and specificity. As the sample used to develop the algorithm was not representative of the true population prevalence, PPV was additionally adjusted to reflect performance in the wider population.
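As a reading aid, the three performance measures named in the Methods can be written in terms of confusion-matrix counts. The sketch below is not the authors' code; the class name `ConfusionCounts`, the function names, and the example counts are all hypothetical and chosen only to show the standard definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of classifier outcomes (hypothetical container, not from the study)."""
    tp: int  # PNH patients classified as positive
    fp: int  # control patients classified as positive
    tn: int  # control patients classified as negative
    fn: int  # PNH patients classified as negative


def recall(c: ConfusionCounts) -> float:
    """Share of the PNH group classified as positive."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of the control group classified as negative."""
    return c.tn / (c.tn + c.fp)


def ppv(c: ConfusionCounts) -> float:
    """Share of positive classifications that carry a PNH diagnosis."""
    return c.tp / (c.tp + c.fp)


# Illustrative counts only; these are NOT the study's confusion matrix.
example = ConfusionCounts(tp=30, fp=20, tn=99_980, fn=70)
print(f"recall={recall(example):.4f}")
print(f"specificity={specificity(example):.4f}")
print(f"ppv={ppv(example):.4f}")
```

Note that PPV computed this way depends on the ratio of cases to controls in the sample, which is why the Methods mention a separate adjustment for population prevalence.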

Results: Of all the patients in the PNH group, 27% were classified as positive (recall), and 99.99% of the control group were classified as negative (specificity). Of all the patients classified as positive, 60.4% had a diagnosis of PNH in their record (PPV). The PPV adjusted for the population prevalence of PNH was 19.59%, suggesting that nearly 1 in 5 patients flagged may warrant further PNH investigation. The key clinical features in the model were aplastic anaemia, pancytopenia, haemolytic anaemia, myelodysplastic syndrome, and Budd-Chiari syndrome.
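The abstract does not state how the PPV was adjusted for prevalence. A common approach, assumed here purely for illustration, is Bayes' rule applied to recall (sensitivity), specificity and an assumed population prevalence. The function name `prevalence_adjusted_ppv` and the input values below are hypothetical, and the sketch is not expected to reproduce the reported 19.59%, since the abstract gives rounded inputs and no prevalence figure.

```python
def prevalence_adjusted_ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV at a given population prevalence, via Bayes' rule.

    Assumption: this is the textbook formula, not necessarily the
    adjustment used in the study.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    denominator = true_positive_rate + false_positive_rate
    if denominator == 0.0:
        raise ValueError("no positive classifications are possible with these inputs")
    return true_positive_rate / denominator


# Illustrative inputs only: the same classifier looks far less precise
# once the condition becomes rarer in the population it is applied to.
for prevalence in (0.1, 0.001, 0.00001):
    adjusted = prevalence_adjusted_ppv(sensitivity=0.5, specificity=0.999, prevalence=prevalence)
    print(f"prevalence={prevalence:g} adjusted_ppv={adjusted:.4f}")
```

The loop illustrates why the adjusted figure is lower than the PPV observed in the development sample: for an ultra-rare condition, even a very small false positive rate among the many unaffected patients outweighs the true positives.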

Conclusion: This is the first study to combine clinical understanding of PNH with machine learning, demonstrating the ability to discriminate between PNH and control patients in retrospective electronic health records. With further investigation and validation, this algorithm could be deployed on live health data, potentially leading to earlier diagnosis for patients who currently experience long diagnostic delays or remain undiagnosed.

Keywords: Machine learning; Paroxysmal Nocturnal Haemoglobinuria (PNH); Primary care; Rare disease; UK electronic health records.

Conflict of interest statement

During study conduct, AW, HM, JS, FBP, EM, RD, AW, EV, DO, CG and PF were employees of Mendelian. RJK receives research funding from Novartis and acts as a consultant for Sobi, AstraZeneca and Alexion. RJK also sits on the advisory boards for Alexion, AstraZeneca, Novartis, Sobi, Jazz and Amgen, and receives honoraria from Alexion, Sobi and Biologix.

Figures

Fig. 1. Flowchart demonstrating the process for final feature inclusion
Fig. 2. Illustration of how additional exclusion and inclusion criteria could impact the final algorithm
Fig. 3. Breakdown of the sample after cleaning and preprocessing
Fig. 4. A visual representation of the average performance of the algorithm across the five folds of cross-validation, including sensitivity, recall, positive predictive value (PPV) and adjusted PPV
Fig. 5. The 10 most important features in the XGBoost model, using XGBoost's inbuilt feature importance method