Detection of dementia on voice recordings using deep learning: a Framingham Heart Study
- PMID: 34465384
- PMCID: PMC8409004
- DOI: 10.1186/s13195-021-00888-3
Abstract
Background: Reliable, affordable, and easy-to-use strategies for detecting dementia are sorely needed. Digital technologies, such as individual voice recordings, offer an attractive modality for assessing cognition, but methods that can automatically analyze such data are not readily available.
Methods and findings: We used 1264 voice recordings of neuropsychological examinations administered to participants from the Framingham Heart Study (FHS), a community-based longitudinal observational study. The recordings were 73 min in duration, on average, and contained at least two speakers (participant and examiner). Of the total, 483 recordings were of participants with normal cognition (NC), 451 were of participants with mild cognitive impairment (MCI), and 330 were of participants with dementia (DE). We developed two deep learning models, a two-level long short-term memory (LSTM) network and a convolutional neural network (CNN), which used the audio recordings for two tasks: classifying whether a recording was of a participant with NC or with DE, and differentiating recordings of participants with DE from recordings of participants without DE (NDE, i.e., NC+MCI). Based on 5-fold cross-validation, the LSTM model achieved a mean (±std) area under the receiver operating characteristic curve (AUC) of 0.740 ± 0.017, mean balanced accuracy of 0.647 ± 0.027, and mean weighted F1 score of 0.596 ± 0.047 in distinguishing DE from NC. The CNN model achieved a mean AUC of 0.805 ± 0.027, mean balanced accuracy of 0.743 ± 0.015, and mean weighted F1 score of 0.742 ± 0.033 on the same task. In distinguishing DE from NDE, the LSTM model achieved a mean AUC of 0.734 ± 0.014, mean balanced accuracy of 0.675 ± 0.013, and mean weighted F1 score of 0.671 ± 0.015, and the CNN model achieved a mean AUC of 0.746 ± 0.021, mean balanced accuracy of 0.652 ± 0.020, and mean weighted F1 score of 0.635 ± 0.031.
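For readers unfamiliar with the three metrics reported above, the following is a minimal sketch of how AUC, balanced accuracy, and weighted F1 score can be computed for a binary task and then summarized as mean ± std across folds. It is not the authors' code: the function names, the toy labels and scores, the 0.5 decision threshold, and the use of the sample standard deviation are all assumptions made for illustration.

```python
from statistics import mean, stdev


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the recall obtained on each of the two classes."""
    recalls = []
    for c in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return mean(recalls)


def weighted_f1(y_true, y_pred):
    """Per-class F1 scores averaged with weights equal to each class's support."""
    total = 0.0
    for c in (0, 1):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        total += f1 * (tp + fn)  # tp + fn is the number of true members of class c
    return total / len(y_true)


def summarize(fold_values):
    """Mean and sample standard deviation across folds (std convention is an assumption)."""
    return mean(fold_values), stdev(fold_values)


# Toy data for one fold (hypothetical): 1 = DE, 0 = NC.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("AUC:", roc_auc(y_true, scores))
print("Balanced accuracy:", balanced_accuracy(y_true, y_pred))
print("Weighted F1:", weighted_f1(y_true, y_pred))

# Hypothetical per-fold AUC values, summarized as mean and std.
fold_aucs = [0.70, 0.80]
print("Mean, std across folds:", summarize(fold_aucs))
```

Balanced accuracy and weighted F1 both account for class imbalance, which matters here because the DE versus NDE task compares 330 recordings against 934.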
Conclusion: This proof-of-concept study demonstrates that automated deep learning-driven processing of audio recordings of neuropsychological testing performed on individuals recruited within a community cohort setting can facilitate dementia screening.
Keywords: Dementia; Digital health; Machine learning; Neuropsychological testing; Voice recording.
© 2021. The Author(s).
Conflict of interest statement
The authors declare that they have no competing interests.
Grants and funding
- R01 AG054076/AG/NIA NIH HHS/United States
- R01 AG016495/AG/NIA NIH HHS/United States
- R01 AG049810/AG/NIA NIH HHS/United States
- RF1 AG062109/AG/NIA NIH HHS/United States
- HHSN268201500001I/HL/NHLBI NIH HHS/United States
- RF1 AG072654/AG/NIA NIH HHS/United States
- R21-CA253498/CA/NCI NIH HHS/United States
- N01HC25195/HL/NHLBI NIH HHS/United States
- R01 GM135930/GM/NIGMS NIH HHS/United States
- R56 AG062109/AG/NIA NIH HHS/United States
- P30 AG013846/AG/NIA NIH HHS/United States
- U24 DK115255/DK/NIDDK NIH HHS/United States
- U19 AG068753/AG/NIA NIH HHS/United States
- R01 AG008122/AG/NIA NIH HHS/United States
- P30 AG066546/AG/NIA NIH HHS/United States
- R01 AG033040/AG/NIA NIH HHS/United States
