Adding Knowledge to Unsupervised Algorithms for the Recognition of Intent
- PMID: 34211258
- PMCID: PMC8240413
- DOI: 10.1007/s11263-020-01404-0
Adding Knowledge to Unsupervised Algorithms for the Recognition of Intent
Abstract
Computer vision algorithms performance are near or superior to humans in the visual problems including object recognition (especially those of fine-grained categories), segmentation, and 3D object reconstruction from 2D views. Humans are, however, capable of higher-level image analyses. A clear example, involving theory of mind, is our ability to determine whether a perceived behavior or action was performed intentionally or not. In this paper, we derive an algorithm that can infer whether the behavior of an agent in a scene is intentional or unintentional based on its 3D kinematics, using the knowledge of self-propelled motion, Newtonian motion and their relationship. We show how the addition of this basic knowledge leads to a simple, unsupervised algorithm. To test the derived algorithm, we constructed three dedicated datasets from abstract geometric animation to realistic videos of agents performing intentional and non-intentional actions. Experiments on these datasets show that our algorithm can recognize whether an action is intentional or not, even without training data. The performance is comparable to various supervised baselines quantitatively, with sensible intentionality segmentation qualitatively.
Keywords: Action Recognition; Commonsense; Intent; Theory of Mind; Unsupervised.
Figures








References
-
- Aditya S, Yang Y, Baral C, Fermuller C, Aloimonos Y: Visual commonsense for scene understanding using perception, semantic parsing and reasoning. In: 2015 AAAI Spring Symposium Series (2015)
-
- Aristotle F: The art of rhetoric, vol. 2. Harvard University Press; Cambridge, MA: (1926)
-
- Bahdanau D, Cho K, Bengio Y: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)
-
- Cao Z, Hidalgo G, Simon T, Wei SE, Sheikh Y: OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields. In: arXiv preprint arXiv:1812.08008 (2018) - PubMed
Grants and funding
LinkOut - more resources
Full Text Sources