Confidence-based progress-driven self-generated goals for skill acquisition in developmental robots
- PMID: 24324448
- PMCID: PMC3840616
- DOI: 10.3389/fpsyg.2013.00833
Abstract
A reinforcement learning agent that autonomously explores its environment can use a curiosity drive to learn skills continually, in the absence of any external rewards. We formulate curiosity-driven exploration, and eventual skill acquisition, as a selective sampling problem. Each environment setting provides the agent with a stream of instances. An instance is a sensory observation that, when queried, causes an outcome that the agent is trying to predict. After an instance is observed, a query condition, derived herein, tells whether its outcome is statistically known or unknown to the agent, based on the confidence interval of an online linear classifier. Upon encountering the first unknown instance, the agent "queries" the environment to observe the outcome, which is expected to improve its confidence in the corresponding predictor. If the environment is in a setting where all instances are known, the agent generates a plan of actions to reach a new setting, where an unknown instance is likely to be encountered. The desired setting is a self-generated goal, and the plan of actions, essentially a program to solve a problem, is a skill. The success of the plan depends on the quality of the agent's predictors, which are improved as described above. For validation, this method is applied to a Katana robot arm, both simulated and real, in its "blocks-world" environment. Results show that the proposed method generates sample-efficient curious exploration behavior, which exhibits developmental stages, continual learning, and skill acquisition, in an intrinsically motivated playful agent.
Keywords: AI planning; artificial curiosity; continual learning; developmental robotics; intrinsic motivation; Markov decision processes; online active learning; systematic exploration.
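The query condition described in the abstract — declaring an instance "unknown" when the confidence interval of an online linear classifier is still wide, and querying only then — can be illustrated with a minimal sketch. The class name, the regularized least-squares update, and the width threshold below are illustrative assumptions, not the paper's exact derivation:

```python
import numpy as np

class SelectiveSampler:
    """Sketch of confidence-based selective sampling with an online
    regularized least-squares linear classifier (illustrative only)."""

    def __init__(self, dim, reg=1.0, threshold=0.2):
        self.A = reg * np.eye(dim)   # accumulated input covariance
        self.b = np.zeros(dim)       # accumulated outcome-weighted inputs
        self.threshold = threshold   # width below which an outcome is "known"

    def confidence_width(self, x):
        # Width of the confidence interval around the prediction at x;
        # large when x lies in a direction the model has rarely seen.
        return float(np.sqrt(x @ np.linalg.solve(self.A, x)))

    def is_unknown(self, x):
        # Query condition: the outcome is statistically unknown
        # while the confidence interval is still wide.
        return self.confidence_width(x) > self.threshold

    def update(self, x, y):
        # After querying, observe the outcome y in {-1, +1}
        # and refine the predictor.
        self.A += np.outer(x, x)
        self.b += y * x

    def predict(self, x):
        # Sign of the least-squares estimate w = A^{-1} b at x.
        w = np.linalg.solve(self.A, self.b)
        return 1 if w @ x >= 0 else -1
```

Repeatedly querying the same instance shrinks its confidence width, so the agent eventually classifies it as known and must seek a new setting — the self-generated-goal step of the method.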
