MedVidDeID: Protecting privacy in clinical encounter video recordings
- PMID: 40886813
- DOI: 10.1016/j.jbi.2025.104901
Abstract
Objective: The increasing use of audio-video (AV) data in healthcare has improved patient care, clinical training, and medical and ethnographic research. However, it has also introduced major challenges in preserving patient-provider privacy due to Protected Health Information (PHI) in such data. Traditional de-identification methods are inadequate for AV data, which can reveal identifiable information such as faces, voices, and environmental details. Our goal was to create a pipeline for de-identifying AV healthcare data that minimized the human effort required to guarantee successful de-identification.
Methods: We combined open-source tools with novel methods and infrastructure into a six-stage pipeline: (1) transcript extraction using WhisperX, (2) transcript de-identification with an adapted PHIlter, (3) audio de-identification through scrubbing, (4) video de-identification using YOLOv11 for pose detection and blurring, (5) recombining de-identified audio and video, and (6) validation and correction via manual quality control (QC). We developed two de-identification strategies to support different tolerances for lossy video images. We evaluated this pipeline using 10 h of simulated clinical AV recordings, comprising nearly 1.1 million video frames and approximately 72,000 words.
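Stage (3), audio scrubbing, can be illustrated with a minimal sketch. It assumes the transcript stage yields word-level timestamps and the transcript de-identification stage marks each word as PHI or not. The abstract does not say how scrubbing is implemented, so the `Word` record, the `phi_intervals` and `scrub_audio` helpers, and the choice to overwrite flagged spans with silence are all hypothetical and are not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical record: one transcribed word with its time span in seconds."""
    text: str
    start: float
    end: float
    is_phi: bool  # assumed to be set by the transcript de-identification stage


def phi_intervals(words, pad=0.0):
    """Merge the time spans of PHI-flagged words into non-overlapping intervals.

    `pad` widens each span on both sides to absorb timestamp jitter.
    """
    spans = sorted(
        (max(0.0, w.start - pad), w.end + pad) for w in words if w.is_phi
    )
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def scrub_audio(samples, sample_rate, intervals):
    """Return a copy of `samples` with every interval replaced by silence (0)."""
    out = list(samples)
    for start, end in intervals:
        lo = max(0, round(start * sample_rate))
        hi = min(len(out), round(end * sample_rate))
        for i in range(lo, hi):
            out[i] = 0
    return out
```

Merging adjacent spans before scrubbing means a multi-word name is silenced as one continuous region rather than word by word, which avoids leaving short audible fragments at word boundaries.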
Results: In Precision Privacy Preservation (PPP) mode, MedVidDeID achieved a success rate of 50%, while in Greedy Privacy Preservation (GPP) mode, it achieved a 97.5% success rate. Compared to manual methods for a 15-minute video segment, the pipeline reduced de-identification time by 26.7% in PPP mode and 64.2% in GPP mode.
Conclusion: The MedVidDeID pipeline offers a viable, efficient hybrid solution for handling AV healthcare data and privacy preservation. Future work will focus on reducing upstream errors at each stage and minimizing the role of the human in the loop.
Keywords: Audio-video; De-identification; Healthcare; Privacy; Protected health information.
Copyright © 2025. Published by Elsevier Inc.
Conflict of interest statement
Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.