Physiol Meas. 2019 Dec 2;40(11):115001. doi: 10.1088/1361-6579/ab525c.

Cardio-respiratory signal extraction from video camera data for continuous non-contact vital sign monitoring using deep learning

Sitthichok Chaichulee et al.

Abstract

Objective: Non-contact vital sign monitoring enables the estimation of vital signs, such as heart rate, respiratory rate and oxygen saturation (SpO2), by measuring subtle color changes on the skin surface using a video camera. For patients in a hospital ward, the main challenges in the development of continuous and robust non-contact monitoring techniques are the identification of time periods and the segmentation of skin regions of interest (ROIs) from which vital signs can be estimated. We propose a deep learning framework to tackle these challenges.

Approach: This paper presents two convolutional neural network (CNN) models. The first network was designed for detecting the presence of a patient and segmenting the patient's skin area. The second network combined the output from the first network with optical flow for identifying time periods of clinical intervention so that these periods can be excluded from the estimation of vital signs. Both networks were trained using video recordings from a clinical study involving 15 pre-term infants conducted in the high dependency area of the neonatal intensive care unit (NICU) of the John Radcliffe Hospital in Oxford, UK.
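As a rough illustration of how the two networks would be chained at run time, the sketch below processes one analysis window; patient_net and intervention_net are hypothetical callables standing in for the two trained models, not the paper's actual interfaces.

    import numpy as np

    def process_window(frames, flow_stack, patient_net, intervention_net):
        """Chain the two networks for one time window (hypothetical API)."""
        skin_maps = []
        for frame in frames:
            present, skin_map = patient_net(frame)   # network 1
            if not present:
                return None            # patient absent: skip estimation
            skin_maps.append(skin_map)
        if intervention_net(np.stack(skin_maps), flow_stack):  # network 2
            return None                # intervention: exclude this window
        return skin_maps               # usable skin ROIs for vital signs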

Main results: Our proposed methods achieved an accuracy of 98.8% for patient detection, a mean intersection-over-union (IOU) score of 88.6% for skin segmentation and an accuracy of 94.5% for clinical intervention detection using two-fold cross validation. Our deep learning models produced accurate results and were robust to different skin tones, changes in light conditions, pose variations and different clinical interventions by medical staff and family visitors.
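For reference, intersection-over-union compares a predicted skin mask against its annotation; below is a minimal NumPy sketch of one common formulation, averaging IOU over evaluation frames, which may differ from the paper's exact protocol.

    import numpy as np

    def iou(pred, target):
        """IOU between two binary skin masks."""
        pred, target = pred.astype(bool), target.astype(bool)
        union = np.logical_or(pred, target).sum()
        if union == 0:
            return 1.0  # both masks empty: count as perfect agreement
        return np.logical_and(pred, target).sum() / union

    def mean_iou(preds, targets):
        """Mean IOU over a set of frames."""
        return float(np.mean([iou(p, t) for p, t in zip(preds, targets)]))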

Significance: Our approach allows cardio-respiratory signals to be derived continuously from the patient's skin during periods in which the patient is present and no clinical intervention is undertaken.


Figures

Figure 1. The proposed framework consists of two deep learning networks: the patient detection and skin segmentation network; and the intervention detection network. These networks operate in sequence to identify appropriate time periods and ROIs from which vital signs can be estimated.
Figure 2. Equipment set-up for video recording: (a) camera, recording workstation and incubator; and (b) sample video frame.
Figure 3. The proposed patient detection and skin segmentation network has two output streams. The patient detection stream performs global average pooling over feature maps to predict the presence of the patient in the scene. The skin segmentation stream performs hierarchical upsampling of feature maps across the shared core network to produce a skin label. The network was designed to evaluate the skin segmentation stream only if the infant was present in the scene.
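The caption above describes a shared core network feeding a global-average-pooling detection head and an upsampling segmentation head. A minimal PyTorch sketch of that topology follows; the encoder, layer sizes and fusion scheme are illustrative placeholders, not the architecture used in the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PatientSkinNet(nn.Module):
        """Illustrative two-stream network: shared encoder, two heads."""
        def __init__(self):
            super().__init__()
            # Shared core network (sizes are placeholders).
            self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1),
                                      nn.ReLU(), nn.MaxPool2d(2))
            self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1),
                                      nn.ReLU(), nn.MaxPool2d(2))
            # Detection head: global average pooling + linear classifier.
            self.detect = nn.Linear(64, 2)   # patient present / absent
            # Segmentation head: hierarchical upsampling across stages.
            self.up2 = nn.Conv2d(64, 1, 1)
            self.up1 = nn.Conv2d(32, 1, 1)

        def forward(self, x):
            f1 = self.enc1(x)                # 1/2 resolution features
            f2 = self.enc2(f1)               # 1/4 resolution features
            logits = self.detect(f2.mean(dim=(2, 3)))  # global avg pool
            # Fuse coarse and fine maps, then upsample to input size.
            s = F.interpolate(self.up2(f2), size=f1.shape[2:],
                              mode="bilinear", align_corners=False)
            s = s + self.up1(f1)
            skin = F.interpolate(s, size=x.shape[2:],
                                 mode="bilinear", align_corners=False)
            return logits, torch.sigmoid(skin)

In use, the segmentation output would only be consulted when the detection head predicts that the infant is present, mirroring the conditional evaluation described in the caption.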
Figure 4. Flowchart of semi-automatic skin annotation. Each annotator was asked to label skin areas in the first image of each session. The label was then propagated to the next frame using GMMs. The annotator could interact with seeds (green and red circles corresponding to skin and non-skin areas, respectively) to modify the skin label for the new image frame.
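The GMM-based propagation in this caption can be pictured as two colour models, one for skin and one for background, applied to the next frame; below is a minimal scikit-learn sketch under that assumption (the exact feature space and seeding used in the paper are not specified here).

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def propagate_skin_label(frame, skin_pixels, nonskin_pixels, k=3):
        """Label a new frame with two colour GMMs (illustrative only).

        skin_pixels / nonskin_pixels: (N, 3) colour samples taken from
        the previous frame's label or the annotator's seed circles.
        """
        gmm_skin = GaussianMixture(n_components=k).fit(skin_pixels)
        gmm_bg = GaussianMixture(n_components=k).fit(nonskin_pixels)
        pixels = frame.reshape(-1, 3).astype(float)
        # Assign each pixel to whichever mixture explains it better.
        mask = gmm_skin.score_samples(pixels) > gmm_bg.score_samples(pixels)
        return mask.reshape(frame.shape[:2])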
Figure 5. Lighting augmentation was applied to generate additional training images with different lighting conditions. The histogram of the average lighting components of all training images was divided into four uniform intervals. The mean of each interval was computed (marked with a red asterisk). Three additional images were generated by scaling the lighting component of the original image to the means of intervals 2, 3 and 4, respectively.
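One way to realise this augmentation is to treat the L channel of the Lab colour space as the lighting component; that choice is an assumption of this sketch, since the caption does not name the colour space. interval_means would be the histogram-interval means computed beforehand over the whole training set.

    import numpy as np
    import cv2

    def lighting_augment(image_bgr, interval_means):
        """Rescale the lighting component to each target mean (sketch)."""
        lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
        current = lab[..., 0].mean()
        variants = []
        for target in interval_means:
            out = lab.copy()
            # Scale the L channel so its mean matches the interval mean.
            out[..., 0] = np.clip(out[..., 0] * (target / current), 0, 255)
            variants.append(cv2.cvtColor(out.astype(np.uint8),
                                         cv2.COLOR_LAB2BGR))
        return variants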
Figure 6. The proposed intervention detection network operates on a 5 s time window. The network consists of two input streams. The first input stream (context stream) processes a stack of skin confidence maps produced by the patient detection and skin segmentation network. The second input stream (optical flow stream) handles a stack of dense optical flow. The outputs from both input streams are then combined to predict the occurrence of a clinical intervention in a given time window.
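A compact PyTorch sketch of this two-stream design is given below; the channel counts follow figure 7 (six skin confidence maps; five flows with two components each), and the convolutional stems are illustrative placeholders rather than the paper's layers.

    import torch
    import torch.nn as nn

    class InterventionNet(nn.Module):
        """Illustrative two-stream intervention detector."""
        def __init__(self, context_ch=6, flow_ch=10):
            super().__init__()
            def stream(in_ch):
                # Small convolutional stem; sizes are placeholders.
                return nn.Sequential(
                    nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.context = stream(context_ch)   # skin confidence maps
            self.flow = stream(flow_ch)         # stacked optical flow
            self.head = nn.Linear(128, 2)       # intervention / none

        def forward(self, skin_maps, flows):
            fused = torch.cat([self.context(skin_maps),
                               self.flow(flows)], dim=1)
            return self.head(fused)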
Figure 7. Processing of the input to the optical flow stream. For each 5 s time window, six video frames were taken, one image per second. Five optical flow fields were computed, one from each pair of consecutive video frames. The horizontal and vertical components of each flow field were then stacked together.
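Assembling that input might look like the sketch below; OpenCV's Farneback flow is used as a stand-in, since the caption does not name the flow algorithm.

    import numpy as np
    import cv2

    def flow_stack(frames_1hz):
        """Stack flow components for one 5 s window (sketch).

        frames_1hz: six grayscale frames sampled one per second.
        Returns an array of shape (10, H, W): the horizontal and
        vertical components of the five flows between consecutive
        frame pairs.
        """
        channels = []
        for prev, nxt in zip(frames_1hz[:-1], frames_1hz[1:]):
            flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            channels.extend([flow[..., 0], flow[..., 1]])
        return np.stack(channels)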
Figure 8. Example images for skin segmentation results.
Figure 9. Extraction of PPGi and respiratory signals from the segmented skin area. (a) Video frames with the segmented skin area provided by our proposed framework. (b) Timeline of patient activities over a 60 min segment of a typical recording session, manually annotated on a minute-by-minute basis. (c) Timeline of predicted time periods of infant absence and clinical intervention provided by the proposed algorithms. (d) 60 min time series of the PPGi signal extracted from the mean pixel intensity of the entire segmented skin region in the green channel. (e) 60 min time series of the respiratory signal extracted from the area of the entire segmented skin region. (f) Comparison of non-contact PPGi, contact ECG and contact PPG signals for the area highlighted in (d). Each signal contains 78 peaks, corresponding to a heart rate of 156 beats min−1. (g) Comparison of non-contact respiratory and contact impedance pneumography (IP) signals for the area highlighted in (e). Each signal contains 35 peaks, corresponding to a respiratory rate of 70 breaths min−1.
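Panels (d) and (e) describe two simple per-frame statistics; below is a minimal NumPy sketch of extracting both time series from a sequence of frames and skin masks.

    import numpy as np

    def ppgi_and_resp(frames_bgr, skin_masks):
        """PPGi and respiratory samples from segmented skin (sketch).

        PPGi: mean green-channel intensity over the skin region (d);
        respiratory: area of the skin region in pixels (e).
        """
        ppgi, resp = [], []
        for frame, mask in zip(frames_bgr, skin_masks):
            ppgi.append(frame[..., 1][mask].mean())  # green in BGR order
            resp.append(int(mask.sum()))             # skin area
        return np.asarray(ppgi), np.asarray(resp)

Heart and respiratory rates would then follow from peak counting or spectral analysis of these two series, as in panels (f) and (g).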
Figure 10. Comparisons of non-contact and contact signals from different subjects. (a) Signals extracted from a mixed-race subject. Each cardiac signal contains 27 peaks, corresponding to a heart rate of 162 beats min−1. Each respiratory signal contains 13 peaks, corresponding to a respiratory rate of 78 breaths min−1. (b) Signals extracted from a subject with dark skin. Each cardiac signal contains 24 peaks, corresponding to a heart rate of 144 beats min−1. Each respiratory signal contains nine peaks, corresponding to a respiratory rate of 54 breaths min−1.


