AngoraPy: A Python toolkit for modeling anthropomorphic goal-driven sensorimotor systems

Tonio Weidler et al. Front Neuroinform. 2023 Dec 22;17:1223687. doi: 10.3389/fninf.2023.1223687. eCollection 2023.

Abstract

Goal-driven deep learning increasingly supplements classical modeling approaches in computational neuroscience. The strength of deep neural networks as models of the brain lies in their ability to autonomously learn the connectivity required to solve complex and ecologically valid tasks, obviating the need for hand-engineered or hypothesis-driven connectivity patterns. Consequently, goal-driven models can generate hypotheses about the neurocomputations underlying cortical processing that are grounded in macro- and mesoscopic anatomical properties of the network's biological counterpart. Whereas goal-driven modeling is already becoming prevalent in the neuroscience of perception, its application to the sensorimotor domain is currently hampered by the complexity of the methods required to train models comprising the closed sensation-action loop. This paper describes AngoraPy, a Python library that mitigates this obstacle by providing researchers with the tools necessary to train complex recurrent convolutional neural networks that model the human sensorimotor system. To make the technical details of this toolkit more approachable, the theoretical remarks are accompanied by an illustrative example that trains a recurrent toy model on in-hand object manipulation. An extensive benchmark demonstrates AngoraPy's general applicability to a wide range of classical, 3D robotic, and anthropomorphic control tasks. Together with its ability to adaptively handle custom architectures, this flexibility demonstrates the toolkit's power for goal-driven sensorimotor modeling.

Keywords: anthropomorphic robotics; computational modeling; cortex; deep learning; goal-driven modeling; recurrent convolutional neural networks; reinforcement learning; sensorimotor control.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Schematic interaction between agent and physics simulation. The agent consists of a brain and a body, whereas the physics simulation covers the environment and the body. As such, the agent's body constitutes the interface between the brain and the environment. At every time step in the simulation, the environment causes an effect on the body, which generates sensory stimulation. The body's sensors read this stimulation and communicate the information to the agent's brain. The brain then maps the description of the perceived state to a motor command, which it sends back to the body. The body executes the action and thereby affects the environment as well as itself. With readings of the new sensory state, this cycle recurs until the environment ends the episode, yielding a trajectory of state transitions.
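
As a minimal sketch of this sensation-action loop, the following assumes a classic Gym-style environment API (pre-0.26, as in Brockman et al., 2016) with a random policy standing in for the trained network; AngoraPy's own agent and task wrappers may differ in detail.

    import gym  # classic OpenAI Gym API (pre-0.26) assumed here

    # Sensation-action loop: the environment produces a sensory state, the
    # "brain" maps it to a motor command, and the "body" executes the command.
    env = gym.make("Pendulum-v1")        # any continuous-control task serves as an example
    state = env.reset()                  # initial sensory reading

    done = False
    while not done:
        # Brain: map the perceived state to a motor command.  A random policy
        # stands in here for the trained recurrent network.
        action = env.action_space.sample()
        # Body: execute the command; the environment returns the new sensation,
        # a reward, and whether the episode (the trajectory) has ended.
        state, reward, done, info = env.step(action)
    env.close()
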
Figure 2
Three different policy distribution classes as described in Section 3.5. (A) Multicategorical distribution. (B) Beta distribution for different combinations of α- and β-values. (C) Gaussian distributions with different means and standard deviations. For equal α and β parameters, the Beta distribution resembles the Gaussian distribution. For diverging parameters, it becomes increasingly skewed. Importantly, the Beta distribution's domain is entirely confined to the interval [0, 1].
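
To illustrate the practical difference between the bounded Beta and the unbounded Gaussian policy, the short sketch below samples actions from both using NumPy; the parameter values are arbitrary examples and would normally be predicted by the policy head.

    import numpy as np

    rng = np.random.default_rng(0)

    # Parameters that a policy head would output (arbitrary example values).
    alpha, beta = 2.0, 5.0               # Beta distribution parameters
    mu, sigma = 0.0, 1.0                 # Gaussian mean and standard deviation

    beta_actions = rng.beta(alpha, beta, size=1000)    # always within [0, 1]
    gauss_actions = rng.normal(mu, sigma, size=1000)   # unbounded support

    # The bounded Beta samples can be rescaled to an actuator's range [low, high].
    low, high = -1.0, 1.0
    scaled = low + (high - low) * beta_actions
    assert scaled.min() >= low and scaled.max() <= high
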
Figure 3
Requirements for models trained with AngoraPy. The set of input modalities must be a (sub)set of the modalities available in a Sensation. The outputs must comprise a policy head matching the desired distribution's shape, followed by a value head projecting to a single scalar.
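
A hypothetical model meeting these requirements might look as follows in TensorFlow/Keras (Abadi et al., 2016); the input modality, layer sizes, and head names are illustrative assumptions rather than AngoraPy's prescribed interface.

    import tensorflow as tf

    # Illustrative sketch of a model with a policy head and a scalar value head.
    n_proprio, n_actions = 48, 20        # assumed input and actuator dimensions

    proprioception = tf.keras.Input(shape=(n_proprio,), name="proprioception")
    x = tf.keras.layers.Dense(64, activation="relu")(proprioception)

    # Policy head: parameters of the action distribution, here alpha and beta of
    # a Beta distribution per actuator, kept positive via softplus.
    policy = tf.keras.layers.Dense(2 * n_actions, activation="softplus", name="policy")(x)
    # Value head: a single scalar estimate of the state value.
    value = tf.keras.layers.Dense(1, name="value")(x)

    model = tf.keras.Model(inputs=proprioception, outputs=[policy, value])
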
Figure 4
Truncated Backpropagation Through Time (TBPTT) during gathering (top) and optimization (bottom). During gathering, the transition sequence is cut off every k elements. If the episode ends before a cutoff point, the remaining positions up to the next cutoff point are filled with dummy transitions. During optimization, this yields a dataset of equally sized sequences. Dummy transitions are masked when backpropagating the error. Cutoff points at nonterminal states are handled by copying the last hidden state of the previous sequence into the initial hidden state of the current one.
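
The chunking and masking scheme can be sketched in a few lines of NumPy; this is a simplified stand-in for the toolkit's internal data pipeline, using scalar transitions for brevity.

    import numpy as np

    def chunk_episode(transitions, k, pad_value=0.0):
        """Cut one episode into fixed-length sequences of k transitions, padding
        the last chunk with dummy entries and returning a mask that excludes the
        padding from the backpropagated error."""
        transitions = np.asarray(transitions, dtype=float)
        n = len(transitions)
        n_chunks = int(np.ceil(n / k))
        padded = np.full(n_chunks * k, pad_value)
        mask = np.zeros(n_chunks * k, dtype=bool)
        padded[:n] = transitions
        mask[:n] = True                  # True = real transition, False = dummy
        return padded.reshape(n_chunks, k), mask.reshape(n_chunks, k)

    # Example: an episode of 10 transitions cut every k = 4 steps yields 3 chunks;
    # the last two positions are dummies, and the mask is False there.
    chunks, mask = chunk_episode(np.arange(10), k=4)
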
Figure 5
Analysis of the dynamics in the in-hand object manipulation environment to determine the temporal dependencies between state variables. (Top row) Violin plots depicting the distribution of cross-correlations between individual variables at each lag. (First column) Mean and maximum cross-correlations between individual variables at different lags. (Second column) Mean and maximum autocorrelations (each variable is compared only to itself) at different lags. (Third column) Correlations between state vectors (as opposed to time series of single variables) taken at different lag distances. Dashed red lines mark lag 16, at which the example presented here cuts off gradient propagation to past time steps. All plots are based on 15 i.i.d. time series of 100 time steps that were collected by taking random actions in the environment. Standard errors around the mean are indicated by light blue shaded areas.
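
A simplified version of this lag analysis can be reproduced with NumPy as below; the random-walk series is a stand-in for a recorded state variable, and the exact procedure behind the published plots may differ.

    import numpy as np

    def lagged_autocorrelation(series, lag):
        # Pearson correlation between a series and itself shifted by `lag` steps.
        return np.corrcoef(series[:-lag], series[lag:])[0, 1]

    rng = np.random.default_rng(0)
    # Stand-in for one state variable recorded over 100 random-action time steps.
    series = np.cumsum(rng.normal(size=100))

    autocorrs = [lagged_autocorrelation(series, lag) for lag in range(1, 17)]
    # A slow decay of these correlations over lags indicates long temporal
    # dependencies and motivates the choice of the TBPTT cutoff (here lag 16).
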
Figure 6
Distributed training with MPI. An exemplary depiction of the distributed training cycle comprising gathering and optimization on two compute nodes. Distribution scales to an arbitrary number of nodes. Spawned processes (e.g., one per CPU thread or core) all share the load of gathering experience. The GPU(s) on a node are assigned to the lowest-rank process on that node, and all GPU processes share the computational load of optimization. At cycle 0, every process initializes the policy, which is then synced to match the single initial policy on the root process. Every process then rolls out the policy to generate experience up to horizon T. The data is stored in shards. Every optimization process then collects and merges the shards produced by workers on the same node. Based on this data, the optimization process calculates a gradient. The gradients of all optimizers are then reduced into one by averaging and applied as an update to the policy on the root process. πnew is then broadcast to all processes, which repeat the cycle by rolling out the new policy.
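
The central reduce-and-broadcast step of this cycle can be sketched with mpi4py; random numbers stand in for the actual policy gradients, and the rollout sharding and GPU assignment described above are omitted. Run with, e.g., mpirun -n 4 python sketch.py.

    import numpy as np
    from mpi4py import MPI               # assumes an MPI installation is available

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Each optimization process computes a gradient from its local data shard
    # (random numbers stand in for the actual policy gradient).
    local_gradient = np.random.default_rng(rank).normal(size=8)

    # Reduce all gradients onto the root process and average them there.
    summed = np.empty_like(local_gradient)
    comm.Reduce(local_gradient, summed, op=MPI.SUM, root=0)
    averaged = summed / size if rank == 0 else None

    # The root applies the averaged update to its policy; the new policy is then
    # broadcast so that every process rolls out the same parameters next cycle.
    averaged = comm.bcast(averaged, root=0)
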
Figure 7
Training of in-hand object manipulation agents using a toy model. Average episode returns (left) and average goals achieved per episode (middle) show the progression of performance over training cycles. The mean performance over 9 independent runs is shown in blue, with a shaded 95% confidence interval. The curve of the best-performing agent is shown in red. The distribution (boxplot) of 480 episodes over the number of consecutive goals reached (right) shows the performance of the agent after training. Red lines indicate medians, white circles indicate means, and red triangles indicate modes.
Figure 8
Learning curves of agents trained on different classical control tasks implemented by Gym (Brockman et al., 2016). All agents use the same two-layer neural network architecture with a Beta/categorical policy head. Weights were not shared between policy and value network. Blue lines represent the average returns of 16 independent agents, and the lighter area around them is their standard deviation.
Figure 9
Benchmark experiments on three-dimensional control tasks simulated in MuJoCo (Todorov et al., 2012) and implemented by Gym (Brockman et al., 2016). Tasks with italic titles are anthropomorphic reaching tasks. In all other tasks, the model controls non-anthropomorphic motor plants. The latter agents use the same two-layer architecture used in Figure 8. For anthropomorphic tasks, a recurrent network with an LSTM cell builds the policy. Lines represent the average returns of 16 independent agents, and the lighter area around them is their standard deviation. Blue lines correspond to agents using a Gaussian policy, whereas red lines summarize the performance of Beta distributed policies.

References

    1. Abadi M., Barham P., Chen J., Chen Z., Davis A., Dean J., et al. (2016). “TensorFlow: a system for large-scale machine learning,” in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (Savannah, GA), 265–283.
    2. Bradbury J., Frostig R., Hawkins P., Johnson M. J., Leary C., Maclaurin D., et al. (2018). JAX: Composable Transformations of Python+NumPy Programs. Available online at: http://github.com/google/jax
    3. Braver T. S. (2012). The variable nature of cognitive control: a dual mechanisms framework. Trends Cogn. Sci. 16, 106–113. 10.1016/j.tics.2011.12.010
    4. Brockman G., Cheung V., Pettersson L., Schneider J., Schulman J., Tang J., et al. (2016). OpenAI Gym. arXiv [Preprint]. 10.48550/arXiv.1606.01540
    5. Cadieu C., Kouh M., Pasupathy A., Connor C. E., Riesenhuber M., Poggio T. (2007). A model of V4 shape selectivity and invariance. J. Neurophysiol. 98, 1733–1750. 10.1152/jn.01265.2006
