Continuous speech recognition for clinicians

A Zafar¹, J M Overhage, C J McDonald

PMID: 10332653
PMCID: PMC61360
DOI: 10.1136/jamia.1999.0060195

A Zafar et al. J Am Med Inform Assoc. 1999 May-Jun;6(3):195-204.

Affiliation

¹ Indiana University, Regenstrief Institute for Health Care, Indianapolis 46202-2859, USA. zafar_a@regenstrief.iupui.edu

Abstract

The current generation of continuous speech recognition systems claims to offer high accuracy (greater than 95 percent) speech recognition at natural speech rates (150 words per minute) on low-cost (under $2000) platforms. This paper presents a state-of-the-technology summary, along with insights the authors have gained through testing one such product extensively and other products superficially. The authors have identified a number of issues that are important in managing accuracy and usability. First, for efficient recognition users must start with a dictionary containing the phonetic spellings of all words they anticipate using. The authors dictated 50 discharge summaries using one inexpensive internal medicine dictionary ($30) and found that they needed to add an additional 400 terms to get recognition rates of 98 percent. However, if they used either of two more expensive and extensive commercial medical vocabularies ($349 and $695), they did not need to add terms to get a 98 percent recognition rate. Second, users must speak clearly and continuously, distinctly pronouncing all syllables. Users must also correct errors as they occur, because accuracy improves with error correction by at least 5 percent over two weeks. Users may find it difficult to train the system to recognize certain terms, regardless of the amount of training, and appropriate substitutions must be created. For example, the authors had to substitute "twice a day" for "bid" when using the less expensive dictionary, but not when using the other two dictionaries. From trials they conducted in settings ranging from an emergency room to hospital wards and clinicians' offices, they learned that ambient noise has minimal effect. Finally, they found that a minimal "usable" hardware configuration (which keeps up with dictation) comprises a 300-MHz Pentium processor with 128 MB of RAM and a "speech quality" sound card (e.g., SoundBlaster, $99). Anything less powerful will result in the system lagging behind the speaking rate. The authors obtained 97 percent accuracy with just 30 minutes of training when using the latest edition of one of the speech recognition systems supplemented by a commercial medical dictionary. This technology has advanced considerably in recent years and is now a serious contender to replace some or all of the increasingly expensive alternative methods of dictation with human transcription.
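
The abstract reports recognition rates as percentages but does not say how they were scored. As a rough illustration only, one common way to score a dictation is word-level accuracy: one minus the word edit distance between the reference text and the recognized text, divided by the number of reference words. The sketch below assumes that definition; the function names and the sample sentences are hypothetical and do not come from the paper.

```python
def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Count the fewest word substitutions, insertions, and deletions
    needed to turn the recognized text into the reference text."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the distance between the reference words seen so far
    # and the first j recognized words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from output
                            curr[j - 1] + 1,      # extra word in output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed scoring rule: 1 - (word errors / reference word count)."""
    n_words = len(reference.split())
    if n_words == 0:
        raise ValueError("reference text must contain at least one word")
    return 1.0 - word_edit_distance(reference, hypothesis) / n_words


# Hypothetical dictation: what was said versus what was recognized.
spoken = "take one tablet twice a day"
recognized = "take one tablet twice today"
errors = word_edit_distance(spoken, recognized)
accuracy = word_accuracy(spoken, recognized)
print(f"{errors} word errors, accuracy {accuracy:.1%}")
```

Under this definition, a 98 percent recognition rate corresponds to about two word errors per 100 dictated words. Other scoring rules exist, so figures computed this way are not necessarily comparable to the ones the authors report.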

Figures

**Figure 1**
Spectral analysis of the words “free speech” as spoken by an author (A.Z.): *top*, the raw speech waveform; *bottom*, the power spectrum. Notice how the areas that correspond to the phoneme ee look similar and generate two resonance frequencies (formants) in the power spectrum. Also notice how the consonant sounds p and c produce a relative pause in the power spectrum.
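
To make the idea of a power spectrum with two resonance peaks concrete, the following minimal sketch computes the power spectrum of a synthetic signal with a plain discrete Fourier transform and reports its two strongest frequency peaks. It is not the authors' analysis: the signal is a sum of two sine waves standing in for two formants, and the frequencies (300 Hz and 2300 Hz), amplitudes, sample rate, and function names are illustrative assumptions.

```python
import cmath
import math


def power_spectrum(samples):
    """Return the power in each DFT bin from 0 Hz up to the Nyquist frequency."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT sum; slow (O(n^2)) but dependency-free.
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def strongest_frequencies(samples, sample_rate, count=2):
    """Return the frequencies (Hz) of the `count` largest local peaks, ascending."""
    power = power_spectrum(samples)
    n = len(samples)
    peaks = [k for k in range(1, len(power) - 1)
             if power[k] > power[k - 1] and power[k] >= power[k + 1]]
    peaks.sort(key=lambda k: power[k], reverse=True)
    return sorted(k * sample_rate / n for k in peaks[:count])


# Synthetic stand-in for a vowel: two sine components (assumed values).
SAMPLE_RATE = 8000   # samples per second
N_SAMPLES = 400      # 50 ms of signal, giving 20 Hz per DFT bin
signal = [math.sin(2 * math.pi * 300 * t / SAMPLE_RATE)
          + 0.6 * math.sin(2 * math.pi * 2300 * t / SAMPLE_RATE)
          for t in range(N_SAMPLES)]

print(strongest_frequencies(signal, SAMPLE_RATE))
```

Real speech is far less tidy than this synthetic signal, so a practical analysis would normally window short frames and use an FFT library rather than the direct sum shown here.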
