Review

Harnessing the Power of Artificial Intelligence in Otolaryngology and the Communication Sciences

Blake S Wilson et al. J Assoc Res Otolaryngol. 2022 Jun;23(3):319-349. doi: 10.1007/s10162-022-00846-2. Epub 2022 Apr 20.

Abstract

Use of artificial intelligence (AI) is a burgeoning field in otolaryngology and the communication sciences. A virtual symposium on the topic was convened from Duke University on October 26, 2020, and was attended by more than 170 participants worldwide. This review presents summaries of all but one of the talks presented during the symposium; recordings of all the talks, along with their accompanying discussions, are available at https://www.youtube.com/watch?v=ktfewrXvEFg and https://www.youtube.com/watch?v=-gQ5qX2v3rg. Each summary is about 2500 words in length and includes two figures. This level of detail far exceeds that of the brief summaries in traditional reviews and thus provides a better-informed glimpse into the power and diversity of current AI applications in otolaryngology and the communication sciences, and into how that power can be harnessed for future applications.

Keywords: Artificial intelligence; Auditory prostheses; Auditory system; Brain-computer interfaces; Cochlear implants; Deep learning; Hearing; Hearing aids; Hearing loss; Human communication; Laryngeal pathology; Machine learning; Neural prostheses; Neuroprostheses; Otolaryngology; Speech perception; Speech production; Thyroid pathology.


Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1
Flyer for the symposium
Fig. 2
Program for the symposium
Fig. 3
Speech decoding approaches using brain activity and artificial intelligence. To decode speech from neural activity in the brain, two general types of approaches can be used: an approach to synthesize speech or an approach to decode text. In both approaches, neural signals are acquired from a human user and the relevant features are extracted from these signals. To synthesize speech, articulatory kinematic features can be inferred from the neural activity, and then these features can be used to synthesize speech. To decode text, sub-word (for example, phonetic) features can be inferred from the neural activity, and then language-modeling techniques can be used to decode text from these features. For either approach, neural features can be mapped directly to the target output using end-to-end modeling. It is possible to convert decoded text into synthesized speech using text-to-speech synthesis and vice versa using speech recognition. Artificial intelligence techniques can be used to enable or improve the quality of each step in this schematic
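
To make the two routes in this schematic concrete, here is a runnable Python toy of the pipeline; every model is a random linear map standing in for a trained network, and all function names are hypothetical placeholders rather than methods from the cited studies.

    # Toy version of Fig. 3: synthesize-speech route vs. decode-text route.
    # Random projections stand in for trained models (illustration only).
    import numpy as np

    rng = np.random.default_rng(0)
    T, N_ELECTRODES, N_FEAT, N_KIN, N_PHONES = 100, 128, 32, 13, 40

    def extract_features(signals):        # (T, electrodes) -> (T, features)
        W = rng.standard_normal((signals.shape[1], N_FEAT))
        return signals @ W                # stand-in for, e.g., high-gamma features

    def infer_kinematics(feats):          # features -> articulatory trajectories
        return feats @ rng.standard_normal((N_FEAT, N_KIN))

    def synthesize_audio(kin):            # trajectories -> waveform (toy: sum)
        return kin.sum(axis=1)

    def infer_subword_units(feats):       # features -> per-frame phone probabilities
        logits = feats @ rng.standard_normal((N_FEAT, N_PHONES))
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def language_model_decode(probs):     # toy "LM": greedy argmax per frame
        return probs.argmax(axis=1)

    signals = rng.standard_normal((T, N_ELECTRODES))
    audio = synthesize_audio(infer_kinematics(extract_features(signals)))
    phones = language_model_decode(infer_subword_units(extract_features(signals)))
    print(audio.shape, phones.shape)      # (100,) (100,)
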
Fig. 4
Artificial intelligence techniques for speech brain-computer interface (BCI) applications. Schematic depictions and brief descriptions are provided for artificial intelligence (AI) techniques to model speech-related neural activity. This is not an exhaustive compilation of all relevant AI techniques; multiple variants of a depicted technique may exist, and the techniques are depicted in an arbitrary order. The depicted techniques (and references for each technique) are recurrent neural network modeling (Hochreiter and Schmidhuber; Gers et al.; Berezutskaya et al.; Anumanchipalli et al.; Makin et al.; Sun et al.; Moses et al. 2021), temporal convolution (Zhang et al.; Makin et al.; Moses et al. 2021), end-to-end network modeling (Graves et al.; Collobert et al.; Oord et al.; Kim et al.; Wang et al.; Zhang et al.; Makin et al.; Sun et al. 2020), data augmentation (Krizhevsky et al.; Moses et al. 2021), model ensembling (Sollich and Krogh; Szegedy et al.; Moses et al. 2021), multi-task learning (Caruana; Szegedy et al.; Kim et al.; Makin et al.; Sun et al. 2020), and transfer learning (Pratt et al.; Caruana; Makin et al.; Peterson et al. 2021)
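
As one concrete instance of the first technique listed above (recurrent neural network modeling), the following PyTorch sketch maps a sequence of neural feature vectors to per-frame phone logits with an LSTM; all sizes and names are illustrative assumptions, not parameters from any of the cited studies.

    # Minimal LSTM decoder: neural feature sequences -> per-frame phone logits.
    # Hypothetical sizes; not the architecture of any cited work.
    import torch
    import torch.nn as nn

    class NeuralToPhoneRNN(nn.Module):
        def __init__(self, n_features=128, n_hidden=256, n_phones=40):
            super().__init__()
            self.lstm = nn.LSTM(n_features, n_hidden, num_layers=2,
                                batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * n_hidden, n_phones)

        def forward(self, x):             # x: (batch, time, n_features)
            h, _ = self.lstm(x)
            return self.head(h)           # (batch, time, n_phones) logits

    model = NeuralToPhoneRNN()
    x = torch.randn(4, 100, 128)          # 4 trials, 100 time steps
    print(model(x).shape)                 # torch.Size([4, 100, 40])
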
Fig. 5
Accuracy of prediction by neural predictors and traditional characteristics. Panel A shows the median split of subjects’ improvement on the speech recognition index in quiet (SRI-Q) test. Panel B presents a comparison of prediction accuracy (Acc), sensitivity (Sens), specificity (Spec), and area under the curve (AUC), which combines sensitivity and specificity, for brain-based models versus the characteristics of age at implant and residual hearing. Panel A is from Feng et al. (2018) and is reproduced here with permission
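
For readers unfamiliar with the four metrics in Panel B, this small scikit-learn example computes them on made-up labels and scores (not the study's data):

    # Worked example of Acc, Sens, Spec, and AUC on invented data.
    import numpy as np
    from sklearn.metrics import confusion_matrix, roc_auc_score

    y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])   # 1 = "improved"
    y_score = np.array([.9, .8, .4, .7, .2, .3, .6, .1, .75, .35])
    y_pred = (y_score >= 0.5).astype(int)                # threshold the scores

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)    # Acc: fraction correct overall
    sens = tp / (tp + fn)                    # Sens: true positive rate
    spec = tn / (tn + fp)                    # Spec: true negative rate
    auc = roc_auc_score(y_true, y_score)     # AUC combines Sens and Spec
    print(f"Acc={acc:.2f} Sens={sens:.2f} Spec={spec:.2f} AUC={auc:.2f}")
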
Fig. 6
Brain connectivity predictive area mapping from diffusion tensor imaging (DTI) scans. Probability maps of combined fractional anisotropy and radial and axial diffusivity demonstrate little overlap between the brain areas predicting baseline and those predicting improvement at six months (blue = baseline; green = 6-month improvement; red = overlap); coronal slices of the brain are shown
Fig. 7
Three artificial intelligence approaches to addressing the hearing-in-noise complaint of people with hearing loss; the label “Either” refers to “Yes” or “No”
Fig. 8
How artificial intelligence might restore normal or nearly normal hearing in hearing aid or cochlear implant users (modified from Lesica and presented here with permission)
Fig. 9
Loss and accuracy functions for training of a 12-layer convolutional neural network to classify laryngeal images as normal versus cancerous; acc = accuracy and epochs indicate iterations in the training of the network
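
The curves in Fig. 9 come from a training run of this general shape. The following PyTorch sketch trains a small CNN as a binary (normal vs. cancerous) classifier and logs loss and accuracy per epoch; synthetic tensors stand in for laryngeal images, and the architecture is deliberately smaller than the study's 12-layer network.

    # Compact CNN training loop that produces loss/accuracy curves per epoch.
    # Synthetic data; illustrative only, not the study's network or images.
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(32 * 16 * 16, 2),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randn(64, 3, 64, 64)        # 64 fake 64x64 RGB "laryngeal images"
    y = torch.randint(0, 2, (64,))        # fake normal/cancerous labels

    for epoch in range(5):
        opt.zero_grad()
        logits = model(x)
        loss = loss_fn(logits, y)
        loss.backward()
        opt.step()
        acc = (logits.argmax(1) == y).float().mean().item()
        print(f"epoch {epoch}: loss={loss.item():.3f} acc={acc:.2f}")
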
Fig. 10
Normalized confusion matrix showing the performance of a convolutional neural network, on a test set of 36 images, in classifying laryngeal images as normal, benign, or suspicious for malignancy
Fig. 11
The top panel shows a whole-slide cytopathology scan produced from a stained smear of cells and other tissue obtained by fine needle aspiration (FNA) biopsy of a thyroid nodule; the middle panel shows a heat map of predictions for the regions of interest (ROIs) identified by the first of the two machine learning algorithms (MLAs) developed in the present study. The bottom panels show magnified images corresponding to the red rectangles in the top and middle panels. The figure is from Dov et al. and is reproduced here with permission
Fig. 12
Receiver operating characteristic (ROC) curves for three expert pathologists (experts 1–3), the expert pathologists whose reports were included in the electronic medical records for the patients (MR), and the cascade of the two machine learning algorithms (proposed). The experts and the algorithms used the five diagnostic categories II–VI of the Bethesda System (TBS) as outcome measures (the first category is a non-diagnostic category). The areas under the ROC curves (auc) indicate performance. TPR = true positive rate (or sensitivity) and FPR = false positive rate (or 1.0 – specificity). The illustration is from Dov et al. (2021) and is reproduced here with permission. A description of ROC curves and their meanings is presented in Kumar and Indrayan (2011)
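
As a toy version of one such curve, the following sketch sweeps a threshold over continuous scores to obtain TPR/FPR pairs and the AUC; binary labels stand in for the dichotomized Bethesda categories, and the data are invented.

    # ROC curve and AUC on invented scores (binary stand-in for TBS categories).
    import numpy as np
    from sklearn.metrics import roc_curve, auc

    rng = np.random.default_rng(1)
    y_true = np.r_[np.ones(50), np.zeros(50)]        # 1 = malignant, 0 = benign
    scores = np.r_[rng.normal(0.7, 0.2, 50),         # malignant cases score higher
                   rng.normal(0.4, 0.2, 50)]

    fpr, tpr, thresholds = roc_curve(y_true, scores)
    print(f"AUC = {auc(fpr, tpr):.2f}")              # area under the ROC curve
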

References

    1. Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med. 2018;1:39. doi: 10.1038/s41746-018-0040-6.
    2. Angrick M, Herff C, Mugler E, Tate MC, Slutzky MW, Krusienski DJ, Schultz T. Speech synthesis from ECoG using densely connected 3D convolutional neural networks. J Neural Eng. 2019;16(3):036019. doi: 10.1088/1741-2552/ab0c59.
    3. Angrick M, Ottenhoff MC, Diener L, Ivucic D, Ivucic G, Goulis S, Saal J, Colon AJ, Wagner L, Krusienski DJ, Kubben PL, Schultz T, Herff C. Real-time synthesis of imagined speech processes from minimally invasive recordings of neural activity. Commun Biol. 2021;4(1):1055. doi: 10.1038/s42003-021-02578-0.
    4. Anon. Listen to this. Nat Mach Intell. 2021;3(2):101. doi: 10.1038/s42256-021-00313-2.
    5. Anumanchipalli GK, Chartier J, Chang EF. Speech synthesis from neural decoding of spoken sentences. Nature. 2019;568(7753):493. doi: 10.1038/s41586-019-1119-1.
