DeepBreath-automated detection of respiratory pathology from lung auscultation in 572 pediatric outpatients across 5 countries

Julien Heitmann et al. NPJ Digit Med. 2023 Jun 2;6(1):104. doi: 10.1038/s41746-023-00838-3.
Abstract

The interpretation of lung auscultation is highly subjective and relies on non-specific nomenclature. Computer-aided analysis has the potential to standardize and automate evaluation. We used 35.9 hours of auscultation audio from 572 pediatric outpatients to develop DeepBreath, a deep learning model that identifies the audible signatures of acute respiratory illness in children. It comprises a convolutional neural network followed by a logistic regression classifier, aggregating estimates on recordings from eight thoracic sites into a single patient-level prediction. Patients were either healthy controls (29%) or had one of three acute respiratory illnesses (71%): pneumonia, wheezing disorders (bronchitis/asthma), or bronchiolitis. To ensure objective estimates of model generalisability, DeepBreath was trained on patients from two countries (Switzerland, Brazil), and results are reported both on an internal 5-fold cross-validation and on external validation (extval) in three other countries (Senegal, Cameroon, Morocco). DeepBreath differentiated healthy and pathological breathing with an area under the receiver operating characteristic curve (AUROC) of 0.93 (standard deviation [SD] ± 0.01 on internal validation). Similarly promising results were obtained for pneumonia (AUROC 0.75 ± 0.10), wheezing disorders (AUROC 0.91 ± 0.03), and bronchiolitis (AUROC 0.94 ± 0.02). Extval AUROCs were 0.89, 0.74, 0.74, and 0.87, respectively. All either matched or significantly improved on a clinical baseline model using age and respiratory rate. Temporal attention showed clear alignment between model predictions and independently annotated respiratory cycles, providing evidence that DeepBreath extracts physiologically meaningful representations. DeepBreath provides a framework for interpretable deep learning to identify the objective audio signatures of respiratory pathology.


Conflict of interest statement

A.Ge. and A.P. intend to develop a smart stethoscope, ‘Onescope’, which may be commercialised. All other authors declare no competing financial or non-financial interests.

Figures

Fig. 1. DeepBreath ROC curves for the binary classifiers on internal and external validation data.
Each panel shows the ROC curves of one binary classifier: a Control, b Pneumonia, c Wheezing Disorder, d Bronchiolitis. Each iteration of the nested cross-validation (CV) yields a different model, which produces a receiver operating characteristic (ROC) curve on its internal test fold; the mean is computed over these curves. For the external validation data, predictions are averaged across the nested CV models and a single ROC curve is computed. Internal validation (blue) is performed on the test folds from the Geneva and Porto Alegre data. External validation (green) is performed on independently collected, unseen data from Dakar, Marrakesh, Rabat and Yaoundé.
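This fold-averaging and ensembling can be reproduced with a short script. The sketch below, with hypothetical variable names, interpolates each fold's ROC curve onto a common false-positive-rate grid to form the internal mean curve, and averages per-patient scores across the nested-CV models before computing the single external curve.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def mean_internal_roc(fold_labels, fold_scores, grid_size=100):
    """Average per-fold ROC curves on a common false-positive-rate grid."""
    fpr_grid = np.linspace(0.0, 1.0, grid_size)
    tprs = []
    for y_true, y_score in zip(fold_labels, fold_scores):
        fpr, tpr, _ = roc_curve(y_true, y_score)
        tprs.append(np.interp(fpr_grid, fpr, tpr))  # interpolate onto the grid
    mean_tpr = np.mean(tprs, axis=0)
    return fpr_grid, mean_tpr, auc(fpr_grid, mean_tpr)

def external_roc(y_true, model_scores):
    """Average each patient's score across the nested-CV models (ensemble),
    then compute one ROC curve on the external data."""
    ensemble = np.mean(model_scores, axis=0)  # (n_models, n_patients) -> (n_patients,)
    fpr, tpr, _ = roc_curve(y_true, ensemble)
    return fpr, tpr, auc(fpr, tpr)
```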
Fig. 2. DeepBreath confusion matrices for multi-class predictions.
A confusion matrix is computed for every CV model on its corresponding test fold. a The internal confusion matrix is obtained by averaging these intermediate confusion matrices. b The external confusion matrix is computed on the aggregated patient predictions (ensemble output). Rows are normalized to sum to 1.
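A minimal sketch of this aggregation with scikit-learn, assuming hypothetical per-fold label and prediction arrays:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

LABELS = [0, 1, 2, 3]  # control, pneumonia, wheezing disorder, bronchiolitis

def internal_matrix(fold_true, fold_pred):
    """Row-normalize each fold's confusion matrix, then average across folds."""
    mats = [confusion_matrix(y_t, y_p, labels=LABELS, normalize="true")
            for y_t, y_p in zip(fold_true, fold_pred)]
    return np.mean(mats, axis=0)

def external_matrix(y_true, y_pred):
    """Single matrix on the ensembled external predictions, rows summing to 1."""
    return confusion_matrix(y_true, y_pred, labels=LABELS, normalize="true")
```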
Fig. 3. Example attention curves returned by the CNN classifier that discriminates healthy from pathological recordings.
The attention curves are overlaid on the recording spectrograms that are given as inputs to the CNN classifier. Depending on the MAD value, either inspiration or expiration phases are shown as a reference; the respiration phases were extracted from the recording annotations. a A recording with a negative MAD, with its attention curve shown against the inspiration phases. b A recording with a positive MAD, with its attention curve shown against the expiration phases.
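An overlay of this kind can be drawn with matplotlib. The sketch below uses synthetic data throughout; the spectrogram shape, attention curve, and phase spans are placeholders, not the authors' values.

```python
import numpy as np
import matplotlib.pyplot as plt

spec = np.random.rand(64, 300)                               # log-mel spectrogram: (mel bins, frames)
attention = np.abs(np.sin(np.linspace(0, 6 * np.pi, 300)))   # synthetic per-frame attention
phases = [(20, 60), (120, 160), (220, 260)]                  # annotated respiration phases, in frames

fig, ax = plt.subplots(figsize=(8, 3))
ax.imshow(spec, aspect="auto", origin="lower", cmap="magma")
ax2 = ax.twinx()                                             # second y-axis for the attention curve
ax2.plot(attention, color="cyan", lw=1.5, label="attention")
for start, end in phases:                                    # shade the reference phase spans
    ax.axvspan(start, end, color="white", alpha=0.2)
ax.set_xlabel("time (frames)")
ax.set_ylabel("mel bin")
ax2.set_ylabel("attention")
ax2.legend(loc="upper right")
plt.tight_layout()
plt.show()
```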
Fig. 4. Optimal duration and combination of inference audio.
Each graph represents a trained binary model for a Control, b Pneumonia, c Wheezing Disorders and d Bronchiolitis. Each solid line shows the AUROC on the external validation data for one of four combinations of anatomical positions (1, 2, 4 and 8 sites), plotted against the duration of the test-set samples, from 2.5 to 30 seconds. The dashed lines show the performance of the clinical baseline models that use only age and respiratory rate.
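A duration sweep like this can be expressed as a small evaluation loop. In the sketch below, `clips`, `labels`, and `predict_fn` are hypothetical stand-ins for the test waveforms, patient labels, and a trained model's scoring function; the sample rate is also an assumption.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

DURATIONS = [2.5, 5, 10, 15, 20, 25, 30]  # seconds, as in Fig. 4
SR = 4000                                  # assumed audio sample rate (Hz)

def auroc_by_duration(clips, labels, predict_fn):
    """predict_fn maps a list of truncated waveforms to patient-level scores."""
    results = {}
    for dur in DURATIONS:
        n = int(dur * SR)
        truncated = [c[:n] for c in clips]   # keep only the first `dur` seconds
        scores = predict_fn(truncated)
        results[dur] = roc_auc_score(labels, scores)
    return results
```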
Fig. 5. Dataset partitioning strategy.
Geneva (GVA) and Porto Alegre (POA) are used for training, internal validation (tuning), model selection and testing. External validation is performed on independently collected recordings from Dakar (DKR), Marrakesh (MAR), Rabat (RBA) and Yaoundé (YAO). CV Cross Validation.
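Assuming the metadata live in a pandas DataFrame with hypothetical `site` and `label` columns, the partitioning could be sketched as follows; the site codes are taken from the caption.

```python
from sklearn.model_selection import StratifiedKFold

TRAIN_SITES = {"GVA", "POA"}
EXTERNAL_SITES = {"DKR", "MAR", "RBA", "YAO"}

def partition(df):
    """Split into training sites (5-fold CV) and a fully held-out external set."""
    train_df = df[df["site"].isin(TRAIN_SITES)]
    extval_df = df[df["site"].isin(EXTERNAL_SITES)]      # never used for tuning
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    folds = list(cv.split(train_df, train_df["label"]))  # (train_idx, test_idx) pairs
    return train_df, extval_df, folds
```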
Fig. 6. Overview of the DeepBreath binary classification model.
This binary classification architecture is trained for each of the four diagnostic classes. Top to bottom: a Data collection. Every patient has 8 lung audio recordings acquired at the indicated anatomical sites. b Pre-processing. A band-pass filter is applied to clips before transformation to log-mel spectrograms, which are batch-normalized, augmented, and then fed into an (c) Audio classifier. Here, a CNN outputs both a segment-level prediction and attention values, which are aggregated into a single clip-wise output for each site. These are then (d) Aggregated by concatenation to obtain a feature vector of size 8 for every patient, which is evaluated by a logistic regression. Finally, (e) Patient-level classification is performed by thresholding to get a binary output. The segment-wise outputs of the audio classifier are extracted for further analysis. Note that the way the 5-second frames are created during training (zero padding or random start) is not shown here.
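The pre-processing step (b) can be sketched with scipy and librosa. The filter order, cutoff frequencies, sample rate, and mel parameters below are placeholders, not the paper's exact values.

```python
import librosa
from scipy.signal import butter, sosfiltfilt

def preprocess(waveform, sr=4000, low_hz=100, high_hz=1800, n_mels=64):
    """Band-pass filter a clip, then compute the log-mel spectrogram for the CNN."""
    # Band-pass to suppress low-frequency rumble and high-frequency noise.
    sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=sr, output="sos")
    filtered = sosfiltfilt(sos, waveform)
    # Log-mel spectrogram (batch normalization and augmentation happen downstream).
    mel = librosa.feature.melspectrogram(y=filtered, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)
```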
Fig. 7. Feature construction for multi-class classification.
For every patient, the four binary models each produce a feature vector of size 8, corresponding to the predictions for the recordings from the 8 anatomical sites. These feature vectors are concatenated to form a prediction array of size 4 (classes) × 8 (sites). The following operations are then applied to the prediction array: (a) column normalization, and (b) flattening to obtain a feature vector of size 32. The final feature vector is then given as input to the multinomial logistic regression.
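A sketch of this construction with numpy and scikit-learn, using synthetic scores and labels; the normalization axis (each site's column across the four classes) is an assumption read off the caption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_features(pred_array):
    """pred_array: (4 classes, 8 sites) of binary-model scores for one patient."""
    col_sums = pred_array.sum(axis=0, keepdims=True)          # sum over classes per site
    normalized = pred_array / np.clip(col_sums, 1e-12, None)  # (a) column normalization
    return normalized.ravel()                                 # (b) flatten to size 32

rng = np.random.default_rng(0)
patient_preds = rng.random((572, 4, 8))   # synthetic (patients, classes, sites) scores
y = rng.integers(0, 4, size=572)          # synthetic diagnostic labels

X = np.stack([build_features(p) for p in patient_preds])      # (572, 32) feature matrix
clf = LogisticRegression(max_iter=1000).fit(X, y)             # multinomial softmax by default
```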
