AMIA Jt Summits Transl Sci Proc. 2025 Jun 10:2025:235-241. eCollection 2025.

Toward Automated Clinical Transcriptions


Mitchell A Klusty et al. AMIA Jt Summits Transl Sci Proc.

Abstract

Administrative documentation is a major driver of rising healthcare costs and is linked to adverse outcomes, including physician burnout and diminished quality of care. This paper introduces a secure system that applies recent advancements in speech-to-text transcription and speaker-labeling (diarization) to patient-provider conversations. This system is optimized to produce accurate transcriptions and highlight potential errors to promote rapid human verification, further reducing the necessary manual effort. Applied to over 40 hours of simulated conversations, this system offers a promising foundation for automating clinical transcriptions.
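
To make the combination of transcription and diarization more concrete, the sketch below assigns each transcribed segment to the speaker whose diarization turns overlap it most in time. It is only an illustration of the general idea: the `Turn` and `Segment` classes, the `assign_speaker` function, and the sample timestamps are hypothetical and are not taken from the authors' implementation, which may use a different rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Turn:
    """One diarization turn: who spoke, and when (seconds). Hypothetical type."""
    speaker: str
    start: float
    end: float


@dataclass
class Segment:
    """One transcribed segment with its time span (seconds). Hypothetical type."""
    text: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker(segment: Segment, turns: List[Turn]) -> Tuple[Optional[str], float]:
    """Return the speaker with the largest total overlap and the fraction
    of the segment's duration that this speaker covers."""
    totals: Dict[str, float] = {}
    for turn in turns:
        shared = overlap(segment.start, segment.end, turn.start, turn.end)
        if shared > 0:
            totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
    if not totals:
        return None, 0.0  # no diarization turn covers this segment
    best = max(totals, key=totals.get)
    duration = segment.end - segment.start
    fraction = totals[best] / duration if duration > 0 else 0.0
    return best, fraction


# Made-up example data, for illustration only.
turns = [Turn("provider", 0.0, 4.0), Turn("patient", 4.0, 9.0)]
segments = [
    Segment("How are you feeling today?", 0.5, 3.0),
    Segment("A bit better, thanks.", 3.5, 6.0),
]

for seg in segments:
    speaker, fraction = assign_speaker(seg, turns)
    print(f"{speaker}: {seg.text} (overlap fraction {fraction:.2f})")
```

A low overlap fraction could serve as one possible signal that a speaker label deserves human review, although the abstract does not state how the system selects the potential errors it highlights.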

Figures

Figure 1. Diagram of the full system showing how each individual component interacts
Figure 2. Example showing calculation of transcription-diarization overlap to predict the speaker
Figure 3. A graph showing the distributions of Word Error Rates
Figure 4. A graph showing the distributions of mislabeled speakers
Figure 5. Pie charts detailing the percentages of words in the original text and transcribed text
Figure 6. Bar graph showing the breakdown of incorrect words in the transcription and the percentage of words with the speaker mislabeled, separated by domain
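
Several of the figures report word-level errors. Word error rate (WER) is conventionally defined as (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words in the transcription and N is the number of words in the reference text. The sketch below computes it with a standard word-level edit distance. The function name and the normalization (lowercasing and whitespace splitting) are assumptions made for illustration; the preprocessing the authors used is not given in this abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up sentences, for illustration only: one word is dropped.
print(word_error_rate("the patient has a mild fever",
                      "the patient has mild fever"))
```

Because insertions count as errors, WER can exceed 1.0 when the transcription contains many more words than the reference.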

