Auton Robots. 2023;47(2):249-265. doi: 10.1007/s10514-022-10074-5. Epub 2022 Dec 3.

That was not what I was aiming at! Differentiating human intent and outcome in a physically dynamic throwing task


Vidullan Surendran et al. Auton Robots. 2023.

Abstract

Recognising intent in collaborative human-robot tasks can improve team performance and human perception of robots. Intent can differ from the observed outcome in the presence of mistakes, which are likely in physically dynamic tasks. We created a dataset of 1227 throws of a ball at a target from 10 participants and observed that 47% of throws were mistakes, with 16% completely missing the target. Our approach leverages facial images capturing the person's reaction to the outcome of a throw to predict when the resulting throw is a mistake, and then determines the actual intent of the throw. The approach we propose for outcome prediction performs 38% better than the two-stream architecture previously used for this task on front-on videos. In addition, we propose a 1D-CNN model that is used in conjunction with priors learned from the frequency of mistakes to provide an end-to-end pipeline for outcome and intent recognition in this throwing task.
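
The pipeline described above combines a facial-reaction-based mistake prediction with priors learned from how often mistakes occur. As a minimal illustration of that combination step (not the published implementation; the function name, zone indexing, and example prior are assumptions), a Bayesian mixture over intended zones could look like this in Python:

    import numpy as np

    N_ZONES = 9  # 3x3 target grid; throws that miss the grid entirely are omitted for brevity

    def infer_intent(outcome_zone, p_mistake, prior_intent_given_miss):
        """Estimate a distribution over intended zones from an observed outcome.

        outcome_zone:            index 0-8 of the zone the ball actually hit
        p_mistake:               P(throw was a mistake | facial reaction), scalar in [0, 1]
        prior_intent_given_miss: (9, 9) array; row i gives P(intent = j | outcome = i, mistake),
                                 e.g. estimated from mistake frequencies in the dataset
        """
        posterior = np.zeros(N_ZONES)
        posterior[outcome_zone] += 1.0 - p_mistake                       # no mistake: intent == outcome
        posterior += p_mistake * prior_intent_given_miss[outcome_zone]   # mistake: fall back on the prior
        return posterior / posterior.sum()

    # Purely illustrative prior: a mistaken throw was aimed at any of the other 8 zones with equal probability.
    prior = np.full((N_ZONES, N_ZONES), 1.0 / (N_ZONES - 1))
    np.fill_diagonal(prior, 0.0)
    print(infer_intent(outcome_zone=4, p_mistake=0.3, prior_intent_given_miss=prior))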

Keywords: Computer vision; Human robot interaction; Intent recognition; Surface cues.


Conflict of interest statement

Conflict of interest: The authors declare that they have no affiliations with or involvement in any organization or entity with any conflict of interest regarding the subject matter or materials discussed in this manuscript.

Figures

Fig. 1
Throwing task setup with relevant dimensions
Fig. 2
3×3 target grid with dimensions and zone labels
Fig. 3
Dual camera setup consisting of the Intel D435 and a Pi camera fitted with a 50 mm lens. The Raspberry Pi 4 used to process images from the Pi Camera can be seen vertically mounted behind the cameras
Fig. 4
Probability of the intended target given the observed outcome target in the presence of a mistake, i.e., the subject misses the target they aimed at. ‘Missed’ refers to the subject missing the target grid entirely, i.e., not hitting any of the 9 zones (a plausible formalisation of this conditional probability is given after the figure list)
Fig. 5
The top-left graph shows the filtered X, Y values of the throwing wrist along with the composite score for a sample captured by the 0° D435 camera. The score was scaled for graphing, and the maximum value was observed at frame 77, denoting the throw frame. The bottom left shows the raw, interpolated, and filtered throwing-wrist Y coordinate values, illustrating the effect of the preprocessing discussed in Sect. 4.1 (a generic sketch of this step follows the figure list). The right shows the throw frame from all 6 cameras
Fig. 6
The top shows the LSTM model used to classify 2D pose data into one of 9 outcome classes; ‘Batch’ refers to the variable data batch size used during training/inference. The bottom shows the multi-branch 1D CNN model used to detect congruence between outcome and intent using features from a pre-trained emotion model. A and B denote the two input branches, whereas C is the concatenated branch. The layer parameters are shown in Table 4 (an illustrative sketch of the pose classifier follows the figure list)
Fig. 7
Mean accuracy over the 5 folds for each target zone, ordered left to right showing grids for the 0°, 45°, and 90° D435 camera views
Fig. 8
Anonymized image showing the position of the ball in Li et al. (2020) for each of the 9 target zones when thrown by a single participant. Even with the naked eye, one can differentiate between the outcome targets, especially which column of the target grid the ball might strike. The image at the top left represents target zone 1, while the image at the bottom right shows target zone 9
Fig. 9
Position of the ball for each of the 9 target zones for a single participant in our dataset, showing the difficulty of determining the outcome target from the ball position in the frame. The image at the top left represents target zone 1, while the image at the bottom right shows target zone 9
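
Reading Fig. 4 as a conditional distribution, one plausible empirical estimate from a labelled set of throws (notation ours, not taken from the paper) is

    P(\text{intent} = j \mid \text{outcome} = i, \text{mistake}) \approx \frac{n_{ij}}{\sum_{k \neq i} n_{ik}}, \qquad j \neq i,

where n_{ij} is the number of throws that were aimed at zone j but landed in zone i (with an extra outcome row for throws that missed the grid entirely).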
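
The Fig. 5 caption describes interpolating and filtering the throwing-wrist trajectory and taking the frame with the maximum composite score as the throw frame. The snippet below is only a generic sketch of that interpolate-filter-argmax pattern; the actual preprocessing is given in Sect. 4.1 of the paper, and both the smoothing filter and the composite score here are placeholders:

    import numpy as np
    from scipy.signal import savgol_filter

    def find_throw_frame(wrist_y_raw, composite_score_fn):
        """wrist_y_raw: per-frame wrist Y values, with NaN for dropped detections.
        composite_score_fn: stand-in for the paper's composite score."""
        y = np.asarray(wrist_y_raw, dtype=float)
        idx = np.arange(len(y))
        valid = ~np.isnan(y)
        y_interp = np.interp(idx, idx[valid], y[valid])                  # fill missing detections
        y_filt = savgol_filter(y_interp, window_length=11, polyorder=3)  # placeholder smoothing filter
        score = composite_score_fn(y_filt)
        return int(np.argmax(score))                                     # frame with the maximum score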
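
For Fig. 6 (top), the LSTM classifies a sequence of 2D pose keypoints into one of the 9 outcome zones; the actual layer parameters are listed in Table 4 of the paper. The PyTorch sketch below shows only the general shape, with placeholder sizes and an assumed 17-keypoint (x, y) input:

    import torch.nn as nn

    class PoseLSTM(nn.Module):
        """Illustrative pose-sequence classifier; sizes are placeholders, not the Table 4 values."""
        def __init__(self, n_coords=34, hidden=64, n_classes=9):  # 34 = 17 keypoints x (x, y), assumed
            super().__init__()
            self.lstm = nn.LSTM(n_coords, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, x):            # x: (batch, frames, n_coords), variable batch size
            _, (h, _) = self.lstm(x)     # h: (num_layers, batch, hidden)
            return self.head(h[-1])      # logits over the 9 outcome zones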

References

    1. Akilan T, Wu QJ, Safaei A, Huo J, Yang Y. A 3D CNN-LSTM-based image-to-image foreground segmentation. IEEE Transactions on Intelligent Transportation Systems. 2019;21(3):959–971. doi: 10.1109/TITS.2019.2900426.
    2. Alikhani M, Khalid B, Shome R, Mitash C, Bekris KE, Stone M. That and there: Judging the intent of pointing actions with robotic arms. In: AAAI. 2020. p. 10343–10351.
    3. Arriaga O, Valdenegro-Toro M, Plöger P. Real-time convolutional neural networks for emotion and gender classification. arXiv preprint arXiv:1710.07557. 2017.
    4. Cheuk T. Can AI be racist? Color-evasiveness in the application of machine learning to science assessments. Science Education. 2021;105(5):825–836. doi: 10.1002/sce.21671.
    5. Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics. 2020;21(1):6. doi: 10.1186/s12864-019-6413-7.
