Review

Harnessing the Power of Artificial Intelligence in Otolaryngology and the Communication Sciences

Blake S Wilson et al. J Assoc Res Otolaryngol. 2022 Jun;23(3):319-349. doi: 10.1007/s10162-022-00846-2. Epub 2022 Apr 20.

Abstract

Use of artificial intelligence (AI) is a burgeoning field in otolaryngology and the communication sciences. A virtual symposium on the topic was convened from Duke University on October 26, 2020, and was attended by more than 170 participants worldwide. This review presents summaries of all but one of the talks presented during the symposium; recordings of all the talks, along with their accompanying discussions, are available at https://www.youtube.com/watch?v=ktfewrXvEFg and https://www.youtube.com/watch?v=-gQ5qX2v3rg. Each summary is about 2500 words in length and includes two figures. This level of detail far exceeds that of the brief summaries in traditional reviews and thus provides a better-informed glimpse into the power and diversity of current AI applications in otolaryngology and the communication sciences, and into how that power can be harnessed for future applications.

Keywords: Artificial intelligence; Auditory prostheses; Auditory system; Brain-computer interfaces; Cochlear implants; Deep learning; Hearing; Hearing aids; Hearing loss; Human communication; Laryngeal pathology; Machine learning; Neural prostheses; Neuroprostheses; Otolaryngology; Speech perception; Speech production; Thyroid pathology.


Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1
Flyer for the symposium
Fig. 2
Program for the symposium
Fig. 3
Speech decoding approaches using brain activity and artificial intelligence. To decode speech from neural activity in the brain, two general types of approaches can be used: an approach to synthesize speech or an approach to decode text. In both approaches, neural signals are acquired from a human user and the relevant features are extracted from these signals. To synthesize speech, articulatory kinematic features can be inferred from the neural activity, and then these features can be used to synthesize speech. To decode text, sub-word (for example, phonetic) features can be inferred from the neural activity, and then language-modeling techniques can be used to decode text from these features. For either approach, neural features can be mapped directly to the target output using end-to-end modeling. It is possible to convert decoded text into synthesized speech using text-to-speech synthesis and vice versa using speech recognition. Artificial intelligence techniques can be used to enable or improve the quality of each step in this schematic
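
To make the two routes in this schematic concrete, here is a runnable Python toy of the pipeline; every model is a random linear map standing in for a trained network, and all function names are hypothetical placeholders rather than methods from the cited studies.

    # Toy version of Fig. 3: synthesize-speech route vs. decode-text route.
    # Random projections stand in for trained models (illustration only).
    import numpy as np

    rng = np.random.default_rng(0)
    T, N_ELECTRODES, N_FEAT, N_KIN, N_PHONES = 100, 128, 32, 13, 40

    def extract_features(signals):        # (T, electrodes) -> (T, features)
        W = rng.standard_normal((signals.shape[1], N_FEAT))
        return signals @ W                # stand-in for, e.g., high-gamma features

    def infer_kinematics(feats):          # features -> articulatory trajectories
        return feats @ rng.standard_normal((N_FEAT, N_KIN))

    def synthesize_audio(kin):            # trajectories -> waveform (toy: sum)
        return kin.sum(axis=1)

    def infer_subword_units(feats):       # features -> per-frame phone probabilities
        logits = feats @ rng.standard_normal((N_FEAT, N_PHONES))
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def language_model_decode(probs):     # toy "LM": greedy argmax per frame
        return probs.argmax(axis=1)

    signals = rng.standard_normal((T, N_ELECTRODES))
    audio = synthesize_audio(infer_kinematics(extract_features(signals)))
    phones = language_model_decode(infer_subword_units(extract_features(signals)))
    print(audio.shape, phones.shape)      # (100,) (100,)
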
Fig. 4
Artificial intelligence techniques for speech brain-computer interface (BCI) applications. Schematic depictions and brief descriptions are provided for artificial intelligence (AI) techniques to model speech-related neural activity. This is not an exhaustive compilation of all relevant AI techniques; multiple variants of a depicted technique may exist, and the techniques are depicted in an arbitrary order. The depicted techniques (and references for each technique) are recurrent neural network modeling (Hochreiter and Schmidhuber; Gers et al.; Berezutskaya et al.; Anumanchipalli et al.; Makin et al.; Sun et al.; Moses et al. 2021), temporal convolution (Zhang et al.; Makin et al.; Moses et al. 2021), end-to-end network modeling (Graves et al.; Collobert et al.; Oord et al.; Kim et al.; Wang et al.; Zhang et al.; Makin et al.; Sun et al. 2020), data augmentation (Krizhevsky et al.; Moses et al. 2021), model ensembling (Sollich and Krogh; Szegedy et al.; Moses et al. 2021), multi-task learning (Caruana; Szegedy et al.; Kim et al.; Makin et al.; Sun et al. 2020), and transfer learning (Pratt et al.; Caruana; Makin et al.; Peterson et al. 2021)
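
As one concrete instance of the first technique listed above (recurrent neural network modeling), the following PyTorch sketch maps a sequence of neural feature vectors to per-frame phone logits with an LSTM; all sizes and names are illustrative assumptions, not parameters from any of the cited studies.

    # Minimal LSTM decoder: neural feature sequences -> per-frame phone logits.
    # Hypothetical sizes; not the architecture of any cited work.
    import torch
    import torch.nn as nn

    class NeuralToPhoneRNN(nn.Module):
        def __init__(self, n_features=128, n_hidden=256, n_phones=40):
            super().__init__()
            self.lstm = nn.LSTM(n_features, n_hidden, num_layers=2,
                                batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * n_hidden, n_phones)

        def forward(self, x):             # x: (batch, time, n_features)
            h, _ = self.lstm(x)
            return self.head(h)           # (batch, time, n_phones) logits

    model = NeuralToPhoneRNN()
    x = torch.randn(4, 100, 128)          # 4 trials, 100 time steps
    print(model(x).shape)                 # torch.Size([4, 100, 40])
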
Fig. 5
Accuracy of prediction by neural predictors and traditional characteristics. Panel A shows the median split of subjects’ improvement on the speech recognition index in quiet (SRI-Q) test. Panel B presents a comparison of prediction accuracy (Acc), sensitivity (Sens), specificity (Spec), and area under the curve (AUC), which combines sensitivity and specificity, for brain-based models versus the characteristics of age at implant and residual hearing. Panel A is from Feng et al. (2018) and is reproduced here with permission
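
For readers unfamiliar with the four metrics in Panel B, this small scikit-learn example computes them on made-up labels and scores (not the study's data):

    # Worked example of Acc, Sens, Spec, and AUC on invented data.
    import numpy as np
    from sklearn.metrics import confusion_matrix, roc_auc_score

    y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])   # 1 = "improved"
    y_score = np.array([.9, .8, .4, .7, .2, .3, .6, .1, .75, .35])
    y_pred = (y_score >= 0.5).astype(int)                # threshold the scores

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)    # Acc: fraction correct overall
    sens = tp / (tp + fn)                    # Sens: true positive rate
    spec = tn / (tn + fp)                    # Spec: true negative rate
    auc = roc_auc_score(y_true, y_score)     # AUC combines Sens and Spec
    print(f"Acc={acc:.2f} Sens={sens:.2f} Spec={spec:.2f} AUC={auc:.2f}")
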
Fig. 6
Brain connectivity predictive area mapping from diffusion tensor imaging (DTI) scans. Probability maps of combined fractional anisotropy and radial and axial diffusivity demonstrate little overlap between the brain areas predicting baseline and those predicting improvement at six months (blue = baseline; green = 6-month improvement; red = overlap); coronal slices of the brain are shown
Fig. 7
Three artificial intelligence approaches to addressing the hearing-in-noise complaint of people with hearing loss; the label “Either” refers to “Yes” or “No”
Fig. 8
How artificial intelligence might restore normal or nearly normal hearing in hearing aid or cochlear implant users (modified from Lesica and presented here with permission)
Fig. 9
Loss and accuracy functions for training of a 12-layer convolutional neural network to classify laryngeal images as normal versus cancerous; acc = accuracy and epochs indicate iterations in the training of the network
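
The curves in Fig. 9 come from a training run of this general shape. The following PyTorch sketch trains a small CNN as a binary (normal vs. cancerous) classifier and logs loss and accuracy per epoch; synthetic tensors stand in for laryngeal images, and the architecture is deliberately smaller than the study's 12-layer network.

    # Compact CNN training loop that produces loss/accuracy curves per epoch.
    # Synthetic data; illustrative only, not the study's network or images.
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(32 * 16 * 16, 2),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randn(64, 3, 64, 64)        # 64 fake 64x64 RGB "laryngeal images"
    y = torch.randint(0, 2, (64,))        # fake normal/cancerous labels

    for epoch in range(5):
        opt.zero_grad()
        logits = model(x)
        loss = loss_fn(logits, y)
        loss.backward()
        opt.step()
        acc = (logits.argmax(1) == y).float().mean().item()
        print(f"epoch {epoch}: loss={loss.item():.3f} acc={acc:.2f}")
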
Fig. 10
Normalized confusion matrix showing the performance of a convolutional neural network, on a test set of 36 images, in classifying laryngeal images as normal, benign, or suspicious for malignancy
Fig. 11
The top panel shows a whole-slide cytopathology scan produced from a stained smear of cells and other tissue obtained by fine needle aspiration (FNA) biopsy of a thyroid nodule; the middle panel shows a heat map of predictions for the regions of interest (ROIs) identified by the first of the two machine learning algorithms (MLAs) developed in the present study. The bottom panels show magnified images corresponding to the red rectangles in the top and middle panels. The figure is from Dov et al. and is reproduced here with permission
Fig. 12
Receiver operating characteristic (ROC) curves for three expert pathologists (experts 1–3), the expert pathologists whose reports were included in the electronic medical records for the patients (MR), and the cascade of the two machine learning algorithms (proposed). The experts and the algorithms used the five diagnostic categories II–VI of the Bethesda System (TBS) as outcome measures (the first category is a non-diagnostic category). The areas under the ROC curves (auc) indicate performance. TPR = true positive rate (or sensitivity) and FPR = false positive rate (or 1.0 – specificity). The illustration is from Dov et al. (2021) and is reproduced here with permission. A description of ROC curves and their meanings is presented in Kumar and Indrayan (2011)
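
As a toy version of one such curve, the following sketch sweeps a threshold over continuous scores to obtain TPR/FPR pairs and the AUC; binary labels stand in for the dichotomized Bethesda categories, and the data are invented.

    # ROC curve and AUC on invented scores (binary stand-in for TBS categories).
    import numpy as np
    from sklearn.metrics import roc_curve, auc

    rng = np.random.default_rng(1)
    y_true = np.r_[np.ones(50), np.zeros(50)]        # 1 = malignant, 0 = benign
    scores = np.r_[rng.normal(0.7, 0.2, 50),         # malignant cases score higher
                   rng.normal(0.4, 0.2, 50)]

    fpr, tpr, thresholds = roc_curve(y_true, scores)
    print(f"AUC = {auc(fpr, tpr):.2f}")              # area under the ROC curve
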

References

    1. Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med. 2018;1:39. doi: 10.1038/s41746-018-0040-6.
    2. Angrick M, Herff C, Mugler E, Tate MC, Slutzky MW, Krusienski DJ, Schultz T. Speech synthesis from ECoG using densely connected 3D convolutional neural networks. J Neural Eng. 2019;16(3):036019. doi: 10.1088/1741-2552/ab0c59.
    3. Angrick M, Ottenhoff MC, Diener L, Ivucic D, Ivucic G, Goulis S, Saal J, Colon AJ, Wagner L, Krusienski DJ, Kubben PL, Schultz T, Herff C. Real-time synthesis of imagined speech processes from minimally invasive recordings of neural activity. Commun Biol. 2021;4(1):1055. doi: 10.1038/s42003-021-02578-0.
    4. Anon. Listen to this. Nat Mach Intell. 2021;3(2):101. doi: 10.1038/s42256-021-00313-2.
    5. Anumanchipalli GK, Chartier J, Chang EF. Speech synthesis from neural decoding of spoken sentences. Nature. 2019;568(7753):493. doi: 10.1038/s41586-019-1119-1.
