Eur Respir J. 2023 May 18;61(5):2201720.
doi: 10.1183/13993003.01720-2022. Print 2023 May.

Collaboration between explainable artificial intelligence and pulmonologists improves the accuracy of pulmonary function test interpretation

Nilakash Das et al. Eur Respir J.

Abstract

Background: Few studies have investigated the collaborative potential of artificial intelligence (AI) and pulmonologists in diagnosing pulmonary disease. We hypothesised that a pulmonologist collaborating with AI that provides explanations (explainable AI (XAI)) is superior to a pulmonologist without support in the diagnostic interpretation of pulmonary function tests (PFTs).

Methods: The study was conducted in two phases: a monocentre study (phase 1) and a multicentre intervention study (phase 2). Each phase used a different set of 24 PFT reports from patients with a clinically validated gold standard diagnosis. Each PFT was interpreted both without (control) and with (intervention) XAI's suggestions. Pulmonologists provided a differential diagnosis consisting of a preferential diagnosis and, optionally, up to three additional diagnoses. The primary end-point compared the accuracy of the preferential and additional diagnoses between control and intervention. Secondary end-points were the number of diagnoses in the differential diagnosis, diagnostic confidence and inter-rater agreement. We also analysed how XAI influenced pulmonologists' decisions.
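The two accuracy end-points can be illustrated with a short Python sketch. The abstract does not spell out the scoring rule, so the sketch assumes that a preferential diagnosis is correct when it matches the gold standard, and that a differential diagnosis is correct when the gold standard appears anywhere in it. The `Reading` class, the function names and the example labels are hypothetical; the labels reuse disease categories from Figure 1 but are not study data.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Reading:
    """One pulmonologist's reading of one PFT report (hypothetical structure)."""
    preferential: str
    additional: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The study allowed at most three additional diagnoses.
        if len(self.additional) > 3:
            raise ValueError("at most three additional diagnoses are allowed")


def preferential_correct(reading: Reading, gold: str) -> bool:
    # Assumed rule: only the top-ranked diagnosis counts.
    return reading.preferential == gold


def differential_correct(reading: Reading, gold: str) -> bool:
    # Assumed rule: the gold standard may appear anywhere in the differential.
    return gold == reading.preferential or gold in reading.additional


Scorer = Callable[[Reading, str], bool]


def accuracy(readings: list[Reading], golds: list[str], scorer: Scorer) -> float:
    """Percentage of readings that the scorer marks as correct."""
    if not readings or len(readings) != len(golds):
        raise ValueError("need one gold standard diagnosis per reading")
    hits = sum(scorer(r, g) for r, g in zip(readings, golds))
    return 100.0 * hits / len(readings)


if __name__ == "__main__":
    # Illustrative labels only; not data from the study.
    golds = ["COPD", "ILD", "OBD"]
    control = [Reading("OBD", ["COPD"]), Reading("ILD"), Reading("COPD")]
    intervention = [Reading("COPD", ["OBD"]), Reading("ILD", ["PVD"]), Reading("COPD", ["OBD"])]
    for name, readings in [("control", control), ("intervention", intervention)]:
        pref = accuracy(readings, golds, preferential_correct)
        diff = accuracy(readings, golds, differential_correct)
        print(name, round(pref, 1), round(diff, 1))
    # control 33.3 66.7
    # intervention 66.7 100.0
```

Scoring each reading twice, once per rule, lets the same set of readings yield both the preferential and the differential accuracy.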

Results: In phase 1 (n=16 pulmonologists), mean preferential and differential diagnostic accuracy increased significantly, by 10.4% and 9.4%, respectively, from control to intervention (p<0.001). Improvements were somewhat smaller but highly significant (p<0.0001) in phase 2 (5.4% and 8.7%, respectively; n=62 pulmonologists). In both phases, the number of diagnoses in the differential diagnosis did not decrease, but diagnostic confidence and inter-rater agreement increased significantly during the intervention. Pulmonologists updated their decisions in response to XAI's feedback and consistently improved on their baseline performance when the AI's predictions were correct.
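The abstract reports that inter-rater agreement rose during the intervention but does not name the statistic used. As an assumption, the sketch below uses Fleiss' kappa, a standard measure of agreement among a fixed number of raters assigning categorical labels. The input layout (one list of diagnostic labels per PFT report) and the function name are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for N reports, each labelled by the same number n of raters.

    ratings[i] holds the n diagnostic labels given to PFT report i.
    """
    if not ratings:
        raise ValueError("need at least one rated report")
    n = len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("every report needs the same number (>= 2) of raters")
    N = len(ratings)
    counts = [Counter(r) for r in ratings]  # n_ij: raters choosing category j for report i
    categories = set().union(*counts)

    # Observed agreement: mean proportion of agreeing rater pairs per report.
    p_bar = sum((sum(c * c for c in cnt.values()) - n) / (n * (n - 1)) for cnt in counts) / N

    # Chance agreement from the overall share of each category.
    p_e = sum((sum(cnt[cat] for cnt in counts) / (N * n)) ** 2 for cat in categories)

    if p_e == 1.0:
        # Every rater used one category: kappa is undefined; report perfect agreement.
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    unanimous = [["COPD", "COPD", "COPD"], ["ILD", "ILD", "ILD"]]
    split = [["COPD", "COPD"], ["COPD", "OBD"]]
    print(fleiss_kappa(unanimous))        # 1.0
    print(round(fleiss_kappa(split), 3))  # -0.333
```

Kappa corrects raw agreement for the agreement expected by chance, so a rise in kappa between control and intervention would indicate that readers converged beyond what their label frequencies alone would produce.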

Conclusion: A collaboration between a pulmonologist and XAI is better at interpreting PFTs than individual pulmonologists reading without XAI support or XAI alone.

Conflict of interest statement

Conflict of interest: N. Das holds a patent on automated quality control of spirometry. E. Derom reports consultancy fees from Chiesi, GlaxoSmithKline, AstraZeneca and Boehringer Ingelheim. G. Brusselle reports payment or honoraria for lectures from AstraZeneca, Boehringer Ingelheim, Chiesi, GlaxoSmithKline, Novartis and Sanofi. F. Burgos reports consultancy fees from Medical Graphics Corporation Diagnostics. M. Contoli reports grants from the University of Ferrara, Chiesi and GlaxoSmithKline, consultancy fees and honoraria from AstraZeneca, Boehringer Ingelheim, Chiesi, GlaxoSmithKline and Novartis, as well as support for attending meetings from Chiesi, AstraZeneca, GlaxoSmithKline and ALK-Abelló. W.D-C. Man is part-funded by an NIHR Artificial Intelligence Award, and reports grants from the NIHR and the British Lung Foundation, as well as honoraria from Mundipharma, Novartis, European Conference and Incentive Services DMC; and is the Honorary President of the Association for Respiratory Technology and Physiology (ARTP, UK). J.K. Quint reports grants from the MRC, HDR UK, GlaxoSmithKline, AstraZeneca and Chiesi, and consultancy fees from Insmed and Evidera. E. Vanderhelst reports grants from Chiesi, and consultancy fees and honoraria from Boehringer Ingelheim, Vertex and GlaxoSmithKline. M. Topalovic is part-funded by an NIHR Artificial Intelligence Award, and is co-founder and shareholder of ArtiQ. W. Janssens reports grants from Chiesi and AstraZeneca, consultancy and lecture fees from AstraZeneca, Chiesi and GlaxoSmithKline, and he is co-founder and shareholder of ArtiQ. The remaining authors report no potential conflicts of interest.

Figures

FIGURE 1
a) A sample pulmonary function test (PFT) report (see [–18] for details) and b) explainable artificial intelligence (XAI)'s diagnostic suggestions with Shapley value (SV) evidence. The gold standard diagnosis was COPD based on emphysema on computed tomography scan and passive smoke exposure during childhood (normal α1-antitrypsin levels). In this case, XAI makes two diagnostic suggestions (COPD and other obstructive disease (OBD)) since the probability of the second disease (OBD) is >15%. Additionally, we show a normalised SV plot of the top five PFT indices that contributed towards the prediction of COPD and OBD, respectively. A positive SV (in green) is supporting evidence, while a negative SV (in red) is counter-evidence. BMI: body mass index; PY: pack-years; ILD: interstitial lung disease; NMD: neuromuscular disease; PVD: pulmonary vascular disease; TD: thoracic deformity; FEV1: forced expiratory volume in 1 s; FVC: forced vital capacity; KCO: transfer coefficient of the lung for carbon monoxide; FEF25–75%: forced expiratory flow at 25–75% of FVC; DLCO: diffusing capacity of the lung for carbon monoxide.
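The suggestion rule and evidence plot described in this caption can be sketched in Python. Assumptions: XAI always suggests its most probable diagnosis and adds any other diagnosis whose probability exceeds 15%; Shapley values are taken as already computed by the model (the sketch does not compute them); and they are normalised by the sum of their absolute values before the five largest are kept. The function names, probabilities and Shapley values below are illustrative, not values from Figure 1.

```python
def suggest_diagnoses(probs: dict[str, float], threshold: float = 0.15) -> list[str]:
    """Top diagnosis plus every other diagnosis with probability above `threshold`."""
    ranked = sorted(probs, key=probs.__getitem__, reverse=True)
    return [ranked[0]] + [d for d in ranked[1:] if probs[d] > threshold]


def top_normalised_sv(sv: dict[str, float], k: int = 5) -> list[tuple[str, float]]:
    """Normalise Shapley values by their total absolute size and keep the k largest.

    The sign is preserved: positive = supporting evidence, negative = counter-evidence.
    """
    total = sum(abs(v) for v in sv.values())
    if total == 0:
        raise ValueError("all Shapley values are zero")
    normalised = {index: v / total for index, v in sv.items()}
    # Stable sort: indices with equal |SV| keep their input order.
    return sorted(normalised.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical class probabilities for one PFT report (they sum to 1).
    probs = {"COPD": 0.62, "OBD": 0.21, "ILD": 0.09, "NMD": 0.05, "PVD": 0.03}
    print(suggest_diagnoses(probs))  # ['COPD', 'OBD']

    # Hypothetical Shapley values of PFT indices for the COPD prediction.
    sv = {"FEV1/FVC": 0.40, "DLCO": 0.20, "PY": 0.15, "KCO": -0.10, "BMI": -0.10, "FVC": 0.05}
    for index, value in top_normalised_sv(sv):
        print(f"{index:>9} {value:+.2f}")
```

Keeping the sign after normalisation is what allows a plot like Figure 1b to show supporting evidence (green) and counter-evidence (red) on a common scale.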
FIGURE 2
Percentage change of preferential and differential diagnostic performance between control (individual pulmonologists) and intervention (pulmonologists and explainable artificial intelligence (XAI)) in a) the phase 1 (P1) study with 16 pulmonologists and b) the phase 2 (P2) study with 62 pulmonologists. Boxes indicate median and interquartile range. ***: p<0.001; ****: p<0.0001. GS: gold standard.

References

    1. Decramer M, Janssens W, Derom E, et al. Contribution of four common pulmonary function tests to diagnosis of patients with respiratory symptoms: a prospective cohort study. Lancet Respir Med 2013; 1: 705–713. doi:10.1016/S2213-2600(13)70184-X
    2. Ranu H, Wilde M, Madden B. Pulmonary function tests. Ulster Med J 2011; 80: 84–90.
    3. Robert OC. Pulmonary-function testing. N Engl J Med 1994; 331: 25–30. doi:10.1056/NEJM199407073310107
    4. Pellegrino R, Viegi G, Brusasco V, et al. Interpretative strategies for lung function tests. Eur Respir J 2005; 26: 948–968. doi:10.1183/09031936.05.00035205
    5. Johnson JD, Theurer WM. A stepwise approach to the interpretation of pulmonary function tests. Am Fam Physician 2014; 89: 359–366.