Multimodal automatic assessment of acute pain through facial videos and heart rate signals utilizing transformer-based architectures

Stefanos Gkikas et al. Front Pain Res (Lausanne). 2024 Mar 27;5:1372814. doi: 10.3389/fpain.2024.1372814. eCollection 2024.

Abstract

Accurate and objective pain evaluation is crucial in developing effective pain management protocols, aiming to alleviate distress and prevent patients from experiencing decreased functionality. This study introduces a multimodal automatic assessment framework for acute pain utilizing video and heart rate signals. The proposed framework comprises four pivotal modules: the Spatial Module, responsible for extracting embeddings from videos; the Heart Rate Encoder, tasked with mapping heart rate signals into a higher-dimensional space; the AugmNet, designed to create learning-based augmentations in the latent space; and the Temporal Module, which utilizes the extracted video and heart rate embeddings for the final assessment. The Spatial Module undergoes a two-stage pre-training strategy: first with a face recognition objective to learn universal facial features, and second with an emotion recognition objective in a multitask learning approach, enabling the extraction of high-quality embeddings for automatic pain assessment. Experiments with the facial videos and the heart rate extracted from electrocardiograms of the BioVid database, along with a direct comparison to 29 studies, demonstrate state-of-the-art performance in both unimodal and multimodal settings while maintaining high efficiency. In the multimodal setting, accuracies of 82.74% and 39.77% were achieved for the binary and multi-level pain classification tasks, respectively, with 9.62 million parameters for the entire framework.
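As a rough illustration of how the four modules named in the abstract could fit together, the PyTorch-style sketch below wires a Spatial Module, Heart Rate Encoder, AugmNet, and Temporal Module into one classifier. All internals, dimensions, and the additive fusion strategy are assumptions for illustration; they are not the authors' implementation.

```python
# Hypothetical sketch of the four-module framework; module names follow the paper,
# but every architectural detail below is an illustrative guess.
import torch
import torch.nn as nn


class SpatialModule(nn.Module):
    """Per-frame embedding extractor (stand-in for the pre-trained vision transformer)."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, frames):                    # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = self.backbone(frames.flatten(0, 1))   # (B*T, D)
        return x.view(b, t, -1)                   # (B, T, D)


class HeartRateEncoder(nn.Module):
    """Maps a heart-rate sequence into the same embedding space as the video frames."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, embed_dim))

    def forward(self, hr):                        # hr: (B, T, 1)
        return self.proj(hr)                      # (B, T, D)


class AugmNet(nn.Module):
    """Learning-based augmentation applied directly in the latent space."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.perturb = nn.Linear(embed_dim, embed_dim)

    def forward(self, z):
        return z + 0.1 * torch.tanh(self.perturb(z))  # small learned perturbation


class TemporalModule(nn.Module):
    """Transformer over the (fused) embedding sequence, ending in a classifier head."""
    def __init__(self, embed_dim=256, num_classes=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, z):                         # z: (B, T, D)
        return self.head(self.encoder(z).mean(dim=1))


class PainAssessmentFramework(nn.Module):
    def __init__(self, embed_dim=256, num_classes=2):
        super().__init__()
        self.spatial = SpatialModule(embed_dim)
        self.hr_encoder = HeartRateEncoder(embed_dim)
        self.augm = AugmNet(embed_dim)
        self.temporal = TemporalModule(embed_dim, num_classes)

    def forward(self, frames, hr):
        z = self.spatial(frames) + self.hr_encoder(hr)  # simple additive fusion (assumed)
        return self.temporal(self.augm(z))


if __name__ == "__main__":
    model = PainAssessmentFramework()
    logits = model(torch.randn(2, 8, 3, 112, 112), torch.randn(2, 8, 1))
    print(logits.shape)  # torch.Size([2, 2])
```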

Keywords: ECG; data fusion; deep learning; pain recognition; vision transformer.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
ECG signal preprocessing stages (43). (1st row) Raw ECG signal. (2nd row, left) Signal after band-pass filtering (BPF) to isolate the frequency range of interest. (2nd row, right) Signal after derivative filtering to highlight the QRS complex. (3rd row, left) Squared signal to accentuate dominant peaks. (3rd row, right) Moving-window average applied to the squared signal, illustrating the final signal with the identified R peaks, noise level, signal level, and adaptive threshold.
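The chain in Figure 1 (band-pass filtering, derivative filtering, squaring, and moving-window integration with an adaptive threshold) corresponds to a Pan-Tompkins-style QRS detection pipeline. A minimal NumPy/SciPy sketch of those stages is shown below; the cut-off frequencies, window length, and thresholding rule are assumptions, not the settings used in the paper.

```python
# Illustrative Pan-Tompkins-style ECG preprocessing; all filter settings are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks


def preprocess_ecg(ecg, fs=250.0):
    """Return the integrated signal and detected R-peak indices."""
    # 1) Band-pass filter to isolate the QRS frequency band (~5-15 Hz assumed).
    b, a = butter(2, [5.0 / (fs / 2), 15.0 / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, ecg)

    # 2) Derivative filter to emphasise the steep QRS slopes.
    derivative = np.diff(filtered, prepend=filtered[0])

    # 3) Squaring to make all samples positive and accentuate dominant peaks.
    squared = derivative ** 2

    # 4) Moving-window integration (window of ~150 ms assumed).
    window = int(0.150 * fs)
    integrated = np.convolve(squared, np.ones(window) / window, mode="same")

    # 5) Simple adaptive threshold from the running signal/noise estimate.
    threshold = 0.5 * integrated.mean() + 0.5 * integrated.std()
    r_peaks, _ = find_peaks(integrated, height=threshold, distance=int(0.25 * fs))
    return integrated, r_peaks
```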
Figure 2
Overview of the proposed framework for automatic pain assessment. (A) Video analysis pipeline. (B) ECG analysis pipeline. (C) Fusion analysis pipeline.
Figure 3
Comparison of average accuracy and inference time for unimodal and multimodal methodologies across NP vs. P4 and MC tasks. Note: The plot employs a dual-y-axis format (left for accuracy, right for time) to illustrate the relation between performance and efficiency, with methodologies listed on the x-axis.
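A dual-y-axis comparison of this kind can be reproduced with matplotlib's twinx(); the sketch below is only a plotting illustration, and all numbers in it are placeholders rather than the results reported in the paper.

```python
# Sketch of a dual-y-axis accuracy-vs-inference-time plot; values are placeholders.
import matplotlib.pyplot as plt

methods = ["Unimodal video", "Unimodal HR", "Multimodal"]
accuracy = [78.0, 70.0, 82.0]        # % (placeholder numbers)
inference_ms = [12.0, 3.0, 15.0]     # ms (placeholder numbers)

fig, ax_acc = plt.subplots()
ax_time = ax_acc.twinx()             # second y-axis sharing the same x-axis

ax_acc.bar(methods, accuracy, color="steelblue", alpha=0.7)
ax_time.plot(methods, inference_ms, color="darkred", marker="o")

ax_acc.set_xlabel("Methodology")
ax_acc.set_ylabel("Accuracy (%)")
ax_time.set_ylabel("Inference time (ms)")
plt.tight_layout()
plt.show()
```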
Figure 4
(A) Attention maps from the Spatial-Module. (B) Attention maps from the Temporal-Module. Yellow and red colors indicate regions receiving high attention. (A) (1st row) Original frame sequence. (2nd row) Computed from the Spatial-Module following the first-stage pretraining. (3rd row) Computed from the Spatial-Module following the second-stage pretraining. (4th row) Computed from the Spatial-Module trained on BioVid. (B) (1st row) Computed from the Temporal-Module with video embedding. (2nd row) Computed from the Temporal-Module with heart rate embedding. (3rd row) Computed from the Temporal-Module with fused (video & heart rate) embedding.
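Attention maps like those in Figure 4 are typically obtained by averaging one transformer layer's attention weights over heads and projecting the patch-token attention back onto the image grid. The generic sketch below illustrates that idea for a ViT-style encoder with a CLS token; the token layout and shapes are assumptions, and this is not the authors' code.

```python
# Generic sketch: turning ViT attention weights into a spatial attention heatmap.
# The CLS-token convention and 14x14 patch grid are assumptions about a typical ViT.
import torch
import torch.nn.functional as F


def attention_map(attn_weights, grid_size, image_size):
    """attn_weights: (num_heads, tokens, tokens) from one transformer layer,
    where token 0 is the CLS token and the remaining tokens are image patches."""
    cls_to_patches = attn_weights.mean(dim=0)[0, 1:]           # average heads, CLS row
    grid = cls_to_patches.reshape(1, 1, grid_size, grid_size)  # back to the patch grid
    heat = F.interpolate(grid, size=image_size, mode="bilinear", align_corners=False)
    heat = heat.squeeze()
    return (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)  # normalise to [0, 1]


# Example with random weights: 4 heads, 1 CLS + 14x14 patch tokens, 224x224 frame.
dummy = torch.softmax(torch.randn(4, 197, 197), dim=-1)
print(attention_map(dummy, grid_size=14, image_size=(224, 224)).shape)  # (224, 224)
```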
Figure A1
Attention maps from the Spatial-Module. Yellow and red colors indicate regions receiving high attention. (1st row) Original frame sequence. (2nd row) Computed from the Spatial-Module following the first-stage pretraining. (3rd row) Computed from the Spatial-Module following the second-stage pretraining. (4th row) Computed from the Spatial-Module trained on BioVid.
Figure A2
Attention maps from the Spatial-Module. Yellow and red colors indicate regions receiving high attention. (1st row) Original frame sequence. (2nd row) Computed from the Spatial-Module following the first-stage pretraining. (3rd row) Computed from the Spatial-Module following the second-stage pretraining. (4th row) Computed from the Spatial-Module trained on BioVid.


References

    1. Williams ACDC, Craig KD. Updating the definition of pain. Pain. 2016;157(11):2420–3. doi: 10.1097/j.pain.0000000000000613.
    2. Khalid S, Tubbs RS. Neuroanatomy, neuropsychology of pain. Cureus. 2017;9(10). doi: 10.7759/CUREUS.1754.
    3. Turk DC, Melzack R. The measurement of pain and the assessment of people experiencing pain. In: Handbook of Pain Assessment. The Guilford Press; 2011. p. 3–16.
    4. Sinatra R. Causes and consequences of inadequate management of acute pain. Pain Med. 2010;11(12):1859–71. doi: 10.1111/j.1526-4637.2010.00983.x.
    5. De Ruddere L, Tait R. Facing Others in Pain: Why Context Matters. Cham: Springer International Publishing; 2018. p. 241–69.
