Look who's talking: A comparison of automated and human-generated speaker tags in naturalistic day-long recordings

Federica Bulgarelli¹, Elika Bergelson²

Affiliations

¹ Duke University, 417 Chapel Drive, Box 90086, Durham, NC, 27708-0086, USA. fedebul@gmail.com.
² Duke University, 417 Chapel Drive, Box 90086, Durham, NC, 27708-0086, USA.

PMID: 31342467
PMCID: PMC6980911
DOI: 10.3758/s13428-019-01265-7

Comparative Study

Federica Bulgarelli et al. Behav Res Methods. 2020 Apr;52(2):641-653. doi: 10.3758/s13428-019-01265-7.

Abstract

The LENA system has revolutionized research on language acquisition, providing both a wearable device to collect day-long recordings of children's environments, and a set of automated outputs that process, identify, and classify speech using proprietary algorithms. This output includes information about input sources (e.g., adult male, electronics). While this system has been tested across a variety of settings, here we delve deeper into validating the accuracy and reliability of LENA's automated diarization, i.e., tags of who is talking. Specifically, we compare LENA's output with a gold standard set of manually generated talker tags from a dataset of 88 day-long recordings, taken from 44 infants at 6 and 7 months, which includes 57,983 utterances. We compare accuracy across a range of classifications from the original LENA Technical Report, alongside a set of analyses examining classification accuracy by utterance type (e.g., declarative, singing). Consistent with previous validations, we find overall high agreement between the human and LENA-generated speaker tags for adult speech in particular, with poorer performance identifying child, overlap, noise, and electronic speech (accuracy range across all measures: 0-92%). We discuss several clear benefits of using this automated system alongside potential caveats based on the error patterns we observe, concluding with implications for research using LENA-generated speaker tags.

Keywords: LENA system; LENA system reliability; Talker variability.

Figures

**Figure 1.**
Confusion matrix displaying recall for LENA-generated labels compared to human-generated labels. Each column constitutes all of the instances labeled by human coders as belonging to that category. Each cell displays how LENA software tags were labeled for each human category, as well as the total number of segments in each cell. Darker colors represent a higher proportion of LENA software tags.

**Figure 2.**
Classification accuracy distribution by utterance type across the four main categories: adult, child, electronic, or overlap. The box plot reflects the median of the means for each infant for each utterance type. Each point (jittered horizontally) represents one child; diamonds (unjittered) indicate outliers.

**Figure 3.**
Confusion matrix displaying proportion correct (i.e., recall) for LENA-generated labels compared to human-generated labels. Each column constitutes all of the instances labeled by human coders. Each cell displays how the LENA system tags were labeled for each human category, as well as the total number of segments in each cell. Darker colors represent a higher proportion of LENA system tags.

**Figure 4.**
Classification accuracy distribution by utterance type. Each point (jittered horizontally) represents one child; diamonds (unjittered) indicate outliers. N.B. Not all participants contributed data to each utterance type for each comparison.
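
The recall-style confusion matrices described in the captions for Figures 1 and 3 can be illustrated with a minimal sketch. It assumes each segment carries one human-generated tag and one LENA-generated tag; the function name `recall_confusion`, the label strings, and the example tags are all hypothetical and are not taken from the paper's data or from LENA's actual tag set.

```python
from collections import Counter


def recall_confusion(human_tags, lena_tags):
    """Return {human_label: {lena_label: proportion}}.

    Proportions are normalized within each human label, so the values
    for one human label sum to 1 (one column of the figure).
    """
    if len(human_tags) != len(lena_tags):
        raise ValueError("tag lists must be the same length")
    pair_counts = Counter(zip(human_tags, lena_tags))  # segments per cell
    human_totals = Counter(human_tags)                 # segments per column
    matrix = {}
    for (human, lena), n in pair_counts.items():
        matrix.setdefault(human, {})[lena] = n / human_totals[human]
    return matrix


# Made-up tags for eight segments (illustration only, not real data).
human = ["adult", "adult", "adult", "adult",
         "child", "child", "electronic", "overlap"]
lena = ["adult", "adult", "adult", "overlap",
        "child", "adult", "noise", "overlap"]

matrix = recall_confusion(human, lena)

# Recall for a category is the cell where the LENA tag matches the human tag.
recall = {label: column.get(label, 0.0) for label, column in matrix.items()}
```

Normalizing within each human-coded category, rather than over all segments, keeps a frequent category such as adult speech from dominating the picture of how rarer categories are classified.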

Grants and funding

DP5 OD019812/OD/NIH HHS/United States
