Behav Res Methods. 2023 Jan;55(1):428-447.

doi: 10.3758/s13428-022-01832-5. Epub 2022 Apr 19.

Measuring event segmentation: An investigation into the stability of event boundary agreement across groups

Karen Sasmita¹, Khena M Swallow²

Affiliations

¹ Department of Psychology, Cornell University, 211 Uris Hall, Ithaca, NY, 14850, USA.
² Department of Psychology, Cornell University, 211 Uris Hall, Ithaca, NY, 14850, USA. kms424@cornell.edu.

PMID: 35441362
PMCID: PMC9017965
DOI: 10.3758/s13428-022-01832-5

Karen Sasmita et al. Behav Res Methods. 2023 Jan.

Abstract

People spontaneously divide everyday experience into smaller units (event segmentation). To measure event segmentation, studies typically ask participants to explicitly mark the boundaries between events as they watch a movie (segmentation task). Their data may then be used to infer how others are likely to segment the same movie. However, significant variability in performance across individuals could undermine the ability to generalize across groups, especially as more research moves online. To address this concern, we used several widely employed and novel measures to quantify segmentation agreement across different sized groups (n = 2-32) using data collected on different platforms and movie types (in-lab & commercial film vs. online & everyday activities). All measures captured nonrandom and video-specific boundaries, but with notable between-sample variability. Samples of 6-18 participants were required to reliably detect video-driven segmentation behavior within a single sample. As sample size increased, agreement values improved and eventually stabilized at comparable sample sizes for in-lab & commercial film data and online & everyday activities data. Stabilization occurred at smaller sample sizes when measures reflected (1) agreement between two groups versus agreement between an individual and group, and (2) boundary identification between small (fine-grained) rather than large (coarse-grained) events. These analyses inform the tailoring of sample sizes based on the comparison of interest, materials, and data collection platform. In addition to demonstrating the reliability of online and in-lab segmentation performance at moderate sample sizes, this study supports the use of segmentation data to infer when events are likely to be segmented.
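As a rough illustration of how agreement between two groups can be tracked as group size grows, the sketch below repeatedly draws two disjoint random groups of size n, converts each group's button presses to a binned proportion time series, and correlates the two series. This is a simplified stand-in and not one of the measures evaluated in the study: the function names, the use of Pearson correlation, the default of 200 iterations, and the toy data are all assumptions made for illustration.

```python
import random
from statistics import mean


def binned_proportion(group, n_bins, bin_s=1.0):
    """Proportion of a group that pressed at least once in each time bin."""
    counts = [0] * n_bins
    for presses in group:
        # A set ensures each participant counts at most once per bin.
        for b in {int(t // bin_s) for t in presses if 0 <= t < n_bins * bin_s}:
            counts[b] += 1
    return [c / len(group) for c in counts]


def pearson(x, y):
    """Pearson correlation; returns NaN if either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def between_group_agreement(participants, n, n_bins, n_iter=200, seed=0):
    """Mean correlation between two disjoint random groups of size n.

    Hypothetical helper: the resampling scheme and correlation measure
    are illustrative assumptions, not the study's procedure.
    """
    if 2 * n > len(participants):
        raise ValueError("need at least 2 * n participants")
    rng = random.Random(seed)
    values = []
    for _ in range(n_iter):
        drawn = rng.sample(participants, 2 * n)
        r = pearson(binned_proportion(drawn[:n], n_bins),
                    binned_proportion(drawn[n:], n_bins))
        if r == r:  # skip NaN from constant series
            values.append(r)
    return mean(values) if values else float("nan")


if __name__ == "__main__":
    # Toy data: press times in seconds for eight made-up participants.
    toy = [[2.1, 7.4], [2.6, 7.0], [2.9, 5.2], [1.8, 7.7],
           [2.2, 7.1], [3.4, 7.9], [2.0, 6.8], [2.5, 7.3]]
    for n in (2, 3, 4):
        print(n, between_group_agreement(toy, n, n_bins=10))
```

Plotting the returned value against n would give a curve loosely comparable in spirit to the stabilization analysis described in the abstract, under the assumptions stated above.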

Keywords: Event cognition; Event segmentation; Naturalistic perception; Online data collection; Segmentation agreement.

Conflict of interest statement

The authors have no conflicts of interest to declare.

Figures

**Fig. 1**
Illustration (A) and description (B) of group- and individual-level agreement measures. Group time series are illustrated as the density of button presses over time (peakiness, peak-to-peak distance, and surprise index) or as the proportion of participants who pressed a button within a 1-s-long time bin (agreement index). Individual time series are represented as vertical lines marking button presses at every 1-s time bin (for agreement index) or continuously over time (for surprise index). Normative boundaries are defined as the times of the n highest peaks, where n = mean number of button presses.

**Fig. 2**
(A) Example of density estimates with different bandwidth adjustments for small (n = 2; upper panel) and large (n = 32; lower panel) sample sizes. In all cases, the lower adjustment value (0.01) seems to capture individual button presses rather than the group’s consensus button presses. For the large sample size (lower panel), the middle and higher adjustment values do not strongly influence the shape of the peaks and valleys of the density estimate. However, for the small sample size (upper panel), distinctive peaks and valleys are formed in the density estimate using the middle adjustment value (0.1 for coarse, 0.05 for fine). The highest adjustment value (0.2 for coarse and 0.1 for fine) reduces the difference between the peaks and valleys, and even eliminates several peaks (arrows). Therefore, we chose the middle bandwidth adjustment (0.1 for coarse and 0.05 for fine) for our density estimation for all sample sizes. (B) Examples of growth (left) and decay (right) function fits. Small dots represent the average agreement estimate for an individual bootstrap iteration. Large dots represent the average agreement estimate across all bootstrap iterations at each sample size. Functions with the lowest BIC value were selected as the best-fitting curve.

**Fig. 3**
Log₁₀-transformed peakiness values over increasing sample sizes for: (A) commercial-lab and (B) everyday-online data sets. Small shapes depict the values calculated from a single bootstrap iteration (subsample; only a randomly selected 10% of the bootstrapped values are plotted). Larger shapes depict the average value across all bootstrapping iterations. One low peakiness value and seven high peakiness values for coarse everyday activity segmentation were excluded from the plot due to the y-axis limit. Error bars represent 95% confidence intervals and carets (^) represent the elbows.

**Fig. 4**
Log₁₀-transformed peak-to-peak distance over increasing sample sizes for segmentation of: (A) commercial-lab and (B) everyday-online. Small shapes depict values calculated from a single bootstrap iteration (subsample; only a randomly selected 10% of the bootstrapped values are plotted). Larger shapes depict the average value across all bootstrapping iterations. The minimum and maximum values of the y-axis for each plot are adjusted between grains to better capture the degree of change in peak-to-peak distance values, but the ranges are kept consistent. Twelve high peak-to-peak distance values for coarse commercial-lab and five high peak-to-peak distance values for coarse everyday-online were excluded from the plot due to the limits set for the y-axes. Error bars represent 95% confidence intervals and carets (^) represent the elbows.

**Fig. 5**
Agreement index over increasing sample size for segmentation in: (A) commercial-lab and (B) everyday-online. Small shapes depict values calculated from a single bootstrap iteration (subsample; only a randomly selected 10% of the bootstrapped values are plotted). Larger shapes depict the average value across all bootstrapping iterations. Error bars represent 95% confidence intervals and carets (^) represent the elbows.

**Fig. 6**
Surprise index over increasing sample size for segmentation in: (A) commercial-lab and (B) everyday-online. Small shapes depict values calculated from a single bootstrap iteration (subsample; only a randomly selected 10% of the bootstrapped values are plotted). Larger shapes depict the average value across all bootstrapping iterations. Error bars represent 95% confidence intervals and carets (^) represent the elbows.
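The Fig. 1 caption describes a group time series as the proportion of participants who pressed a button within each 1-s bin, and normative boundaries as the times of the highest peaks, with the number of peaks set to the mean number of button presses. A minimal sketch of that description follows. Two simplifications are assumed here: peaks are taken as local maxima of the binned series rather than of a smoothed density estimate, and the function names and toy data are hypothetical.

```python
import math


def group_time_series(participants, duration_s, bin_s=1.0):
    """Proportion of participants with at least one press in each bin."""
    n_bins = math.ceil(duration_s / bin_s)
    counts = [0] * n_bins
    for presses in participants:
        # Clamp a press at exactly duration_s into the final bin.
        bins = {min(int(t // bin_s), n_bins - 1)
                for t in presses if 0 <= t <= duration_s}
        for b in bins:
            counts[b] += 1
    return [c / len(participants) for c in counts]


def normative_boundaries(participants, duration_s, bin_s=1.0):
    """Start times of the n highest local maxima of the binned series.

    n is the mean number of presses per participant, rounded.
    Assumption: peaks come from the binned series, not a density estimate.
    """
    series = group_time_series(participants, duration_s, bin_s)
    n_peaks = round(sum(len(p) for p in participants) / len(participants))
    last = len(series) - 1
    peaks = [i for i, v in enumerate(series)
             if v > 0
             and (i == 0 or v >= series[i - 1])
             and (i == last or v > series[i + 1])]
    # Rank by height, breaking ties in favor of earlier bins.
    ranked = sorted(peaks, key=lambda i: (-series[i], i))[:n_peaks]
    return sorted(i * bin_s for i in ranked)


if __name__ == "__main__":
    # Toy data: press times in seconds for three made-up participants.
    toy = [[2.3, 10.1], [2.8, 10.4], [2.1, 5.5]]
    print(group_time_series(toy, duration_s=12))
    print(normative_boundaries(toy, duration_s=12))
```

Returning bin start times keeps the output on the same time axis as the button presses; a bin midpoint would be an equally reasonable convention.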

Cited by

Large language models can segment narrative events similarly to humans.
Michelmann S, Kumar M, Norman KA, Toneva M. Behav Res Methods. 2025 Jan 3;57(1):39. doi: 10.3758/s13428-024-02569-z. PMID: 39751673
Neural state changes during movie watching relate to episodic memory in younger and older adults.
Henderson SE, Oetringer D, Geerligs L, Campbell KL. Cereb Cortex. 2025 May 1;35(5):bhaf114. doi: 10.1093/cercor/bhaf114. PMID: 40386868
Eye movements as predictors of student experiences during nursing simulation learning events.
Mason ML, Vatral C, Cohn C, Davalos E, Jessee MA, Biswas G, Levin DT. Cogn Res Princ Implic. 2025 Jul 1;10(1):37. doi: 10.1186/s41235-025-00640-7. PMID: 40591191
People can reliably detect action changes and goal changes during naturalistic perception.
Su X, Swallow KM. Mem Cognit. 2024 Jul;52(5):1093-1111. doi: 10.3758/s13421-024-01525-8. Epub 2024 Feb 5. PMID: 38315292
Language-agnostic, Automated Assessment of Listeners' Speech Recall Using Large Language Models.
Herrmann B. Trends Hear. 2025 Jan-Dec;29:23312165251347131. doi: 10.1177/23312165251347131. Epub 2025 May 30. PMID: 40448324
