2024 May 9;11:1329270.
doi: 10.3389/frobt.2024.1329270. eCollection 2024.

A comparison of visual and auditory EEG interfaces for robot multi-stage task control

Kai Arulkumaran et al. Front Robot AI. 2024.

Abstract

Shared autonomy holds promise for assistive robotics, whereby physically impaired people can direct robots to perform various tasks for them. However, a robot that is capable of many tasks also introduces many choices for the user, such as which object or location should be the target of interaction. In the context of non-invasive brain-computer interfaces for shared autonomy (most commonly electroencephalography-based), the two most common choices are to provide either auditory or visual stimuli to the user, each with their respective pros and cons. Using the oddball paradigm, we designed comparable auditory and visual interfaces to speak/display the choices to the user, and had users complete a multi-stage robotic manipulation task involving location and object selection. Users displayed differing competencies, and preferences, for the different interfaces, highlighting the importance of considering modalities beyond vision when constructing human-robot interfaces.
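In the oddball paradigm, the attended (target) stimulus is rare relative to the stream of non-target stimuli, which is what elicits a detectable P300 response. A minimal sketch of how such a stimulus sequence might be generated; the choice names, repetition count, and shuffling scheme here are illustrative assumptions, not the study's actual protocol:

```python
import random

def oddball_sequence(choices, target, repetitions=10, seed=0):
    """Build a randomised stimulus sequence in which every choice is
    presented `repetitions` times; the attended `target` is the rare
    event relative to all other (non-target) presentations combined."""
    rng = random.Random(seed)
    seq = [c for c in choices for _ in range(repetitions)]
    rng.shuffle(seq)
    # Label each presentation: 1 = target (rare), 0 = non-target (frequent).
    return [(stim, int(stim == target)) for stim in seq]

seq = oddball_sequence(["drawer_top", "drawer_mid", "drawer_bottom"],
                       target="drawer_mid")
print(len(seq), sum(label for _, label in seq))  # 30 presentations, 10 targets
```

With more choices in the set, the target fraction drops further, strengthening the oddball effect; the decoder's job is then to classify which presentations carry the target label from the EEG epochs.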

Keywords: brain-computer interface; human-robot interaction; imitation learning; multitask learning; shared autonomy.


Conflict of interest statement

Authors KA, MD, RJ, SA, DL, MS, KT, and SS were employed by Araya Inc.

Figures

FIGURE 1
(A) Experimental layout. The user with the EEG cap sits in front of a small table facing the robot, with a display (for the visual P300 interface only). The robot is behind a table with a chest of drawers and three kitchen objects: a cup, a spoon, and a bottle. (B) Multi-stage task control flow. After the robot environment is initialised, the robot provides a set of actions that can be performed. These actions are used to create the user interface. Once an action has been decoded by the user interface, the robot performs the chosen action and then provides the next set of actions. This generic process can be repeated; in our particular task there are two decision points for the user (picking a drawer to open, and an item to pick and place), after which the robot performs a third and final action (closing the drawer) autonomously.
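The control flow in panel (B) can be sketched as a simple loop. Everything below is illustrative: `ScriptedRobot`, `run_task`, and the action names are assumptions made for the sketch, not the paper's actual API.

```python
class ScriptedRobot:
    """Hypothetical stand-in for the real robot: serves a fixed list of
    action sets, mirroring the drawer/object/close stages of the task."""
    def __init__(self, stages):
        self.stages = list(stages)
        self.executed = []

    def get_available_actions(self):
        return self.stages.pop(0) if self.stages else []

    def execute(self, action):
        self.executed.append(action)

def run_task(robot, decode_choice, max_stages=10):
    """Generic multi-stage loop: the robot proposes actions, the BCI
    interface decodes the user's choice, and the robot executes it.
    A stage with a single available action (e.g. closing the drawer)
    runs autonomously, without querying the user."""
    for _ in range(max_stages):
        actions = robot.get_available_actions()
        if not actions:
            break  # no actions left: task complete
        if len(actions) == 1:
            choice = actions[0]              # autonomous stage
        else:
            choice = decode_choice(actions)  # P300 selection stage
        robot.execute(choice)

robot = ScriptedRobot([
    ["open_top_drawer", "open_middle_drawer", "open_bottom_drawer"],
    ["place_cup", "place_spoon", "place_bottle"],
    ["close_drawer"],  # single action: performed autonomously
])
run_task(robot, decode_choice=lambda actions: actions[0])
print(robot.executed)  # ['open_top_drawer', 'place_cup', 'close_drawer']
```

The `decode_choice` callback is where either the visual or the auditory oddball interface would plug in, which is what makes the two modalities directly comparable on the same task.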
FIGURE 2
EEG decoding results: hatched bars correspond to decoder training, plain bars to online decoding. The average balanced accuracy, precision and accuracy across users were 0.50 ± 0.05, 0.17 ± 0.03 and 0.47 ± 0.15 for the visual interface, and 0.55 ± 0.09, 0.20 ± 0.06 and 0.48 ± 0.19 for the auditory interface, respectively. Average task success was 0.23 ± 0.42 for the visual interface and 0.25 ± 0.43 for the auditory interface. Only one user experienced timeouts (three, on the auditory interface).
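Balanced accuracy (the mean of per-class recall) is a sensible headline metric for P300 decoding because non-target presentations greatly outnumber targets, so plain accuracy can look respectable for a decoder that never predicts the target class at all. A minimal sketch with made-up labels, not the study's data:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean recall over classes; robust to class imbalance."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# Imbalanced example: 8 non-targets (0), 2 targets (1), one target missed.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]
print(balanced_accuracy(y_true, y_pred))  # 0.75 (recalls 1.0 and 0.5)
```

Note that with two classes, a degenerate decoder that always predicts the non-target class scores exactly 0.5 on this metric, which gives a chance-level reference point when reading the reported averages.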
FIGURE 3
Usability questionnaire results. Users understood the interfaces without much difficulty, but were frustrated with poor decoding accuracy. The most significant difference between the interfaces was the ability to ignore the non-targets, with users finding this particularly difficult with the visual interface.
FIGURE 4
PerAct success rates. On average, PerAct had a success rate of 72% on opening the drawer, 78% on closing the drawer, and 45% on picking and placing objects.

