ASSETS. 2022 Oct;2022:14. doi: 10.1145/3517428.3544824. Epub 2022 Oct 22.

Blind Users Accessing Their Training Images in Teachable Object Recognizers

Jonggi Hong et al. ASSETS. 2022 Oct.

Abstract

Teachable object recognizers address a very practical need for blind people: instance-level object recognition. However, they assume that users can visually inspect the photos they provide for training, a critical step that is inaccessible to those who are blind. In this work, we engineer data descriptors that address this challenge. They indicate in real time whether the object in the photo is cropped or too small, whether a hand is included, whether the photo is blurred, and how much the photos vary from each other. Our descriptors are built into an open-source testbed iOS app called MYCam. In a remote user study in (N = 12) blind participants' homes, we show how descriptors, even when error-prone, support experimentation and have a positive impact on the quality of the training set, which can translate to model performance, though this gain is not uniform. Participants found the app simple to use, indicating that they could effectively train it and that the descriptors were useful. However, many found training tedious, opening discussions around the need for a balance between information, time, and cognitive load.
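The abstract does not spell out how each descriptor is computed. As an illustration only, a blur descriptor like the one described could be approximated with the classic variance-of-the-Laplacian heuristic; the function names and the threshold below are assumptions for this sketch, not the paper's actual implementation:

```python
import numpy as np

def blur_score(gray: np.ndarray) -> float:
    """Variance of the discrete Laplacian over a grayscale image.

    Sharp images have strong second derivatives (high variance);
    blurred images have weak ones (low variance).
    """
    # 4-neighbor Laplacian on interior pixels, computed with slicing.
    lap = (
        -4.0 * gray[1:-1, 1:-1]
        + gray[:-2, 1:-1] + gray[2:, 1:-1]   # vertical neighbors
        + gray[1:-1, :-2] + gray[1:-1, 2:]   # horizontal neighbors
    )
    return float(lap.var())

def is_blurred(gray: np.ndarray, threshold: float = 100.0) -> bool:
    # Threshold is a hypothetical tuning parameter, not from the paper.
    return blur_score(gray) < threshold
```

A real-time descriptor like this is cheap enough to run on every camera frame, which matches the paper's goal of giving blind users immediate feedback while they capture training photos.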

Keywords: blind; machine teaching; object recognition; participatory machine learning; visual impairment.


Figures

Figure 1:
A blind participant in our study training the MYCam app in their home to recognize Lays with real-time descriptors. A dual video-conferencing setup captures the participant's activities via a laptop camera and smart glasses worn by the participant.
Figure 2:
The user flow of MYCam. MYCam has three main parts: recognizing an object in the camera view (purple thread), reviewing and editing the information of the objects (red thread), and teaching an object to the model (green thread).
Figure 3:
The architecture of the MYCam system, indicating approaches for estimating the descriptors and recognizing the object.
Figure 4:
Object stimuli in the study, chosen for a challenging fine-grained classification task: Fritos, Cheetos, and Lays.
Figure 5:
Photos from P10 and manually annotated attributes, to be compared with automatically estimated descriptors.
Figure 6:
Scatter plots indicating correlations between manual annotations (x-axis) and estimations (y-axis) for each descriptor.
Figure 7:
Contrasting descriptor values between initial and retraining attempts for P1, P3, P5, P8, and P10. Red dots indicate means.
Figure 8:
The average values of annotated photo-level attributes for individual photos among 12 participants. The charts include photos of the first three training sets (1-30: first set, 31-60: second set, 61-90: third set). The lines are fitted to the dots using LOWESS smoothing.
Figure 9:
The average annotated values of set-level attributes and the annotated number of photos with photo-level attributes for all 12 participants across three training sets (one training set per object).
Figure 10:
When testing their models, participants' experiences varied (a), which seems to be reflected in their satisfaction scores (b).
Figure 11:
Model accuracy when tested on individual test images, aggregated test images from all 12 blind participants in this remote study, and aggregated test images from all 9 blind participants in a prior in-lab study [41].
Figure 12:
Participants' feedback on training with the MYCam testbed.
Figure 13:
Participants' feedback on the descriptors.
Figure 14:
Participants' feedback on the performance of their object recognition models.
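Figure 3 indicates that the system estimates a descriptor for how much the photos in a training set vary from each other. One plausible way to score that (an assumption for illustration; the paper's actual method may differ) is the mean pairwise distance between per-photo feature vectors:

```python
import numpy as np

def set_variation(features: np.ndarray) -> float:
    """Mean pairwise Euclidean distance between per-photo feature vectors.

    `features` has shape (n_photos, feature_dim); higher values mean the
    training photos differ more from one another.
    """
    n = len(features)
    if n < 2:
        return 0.0
    # Broadcast to all pairwise differences, then reduce to distances.
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Average over unique unordered pairs (upper triangle, no diagonal).
    iu = np.triu_indices(n, k=1)
    return float(dists[iu].mean())
```

In a teachable recognizer, the feature vectors would come from the recognition model's embedding of each photo; a near-zero score would flag a training set of near-identical photos, which tends to generalize poorly.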

References

    1. Abdolrahmani Ali, Easley William, Williams Michele, Branham Stacy, and Hurst Amy. 2017. Embracing Errors: Examining How Context of Use Impacts Blind Individuals’ Acceptance of Navigation Aid Errors. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI ’17). Association for Computing Machinery, New York, NY, USA, 4158–4169. doi: 10.1145/3025453.3025528
    2. Ahmetovic Dragan, Bernareggi Cristian, Gerino Andrea, and Mascetti Sergio. 2014. ZebraRecognizer: Efficient and Precise Localization of Pedestrian Crossings. In 2014 22nd International Conference on Pattern Recognition. 2566–2571. doi: 10.1109/ICPR.2014.443
    3. Ahmetovic Dragan, Sato Daisuke, Oh Uran, Ishihara Tatsuya, Kitani Kris, and Asakawa Chieko. 2020. ReCog: Supporting Blind People in Recognizing Personal Objects. Association for Computing Machinery, New York, NY, USA, 1–12. doi: 10.1145/3313831.3376143
    4. Aira. 2017. Your Life, Your Schedule, Right Now. https://aira.io
    5. Akter Taslima, Dosono Bryan, Ahmed Tousif, Kapadia Apu, and Semaan Bryan. 2020. "I am uncomfortable sharing what I can’t see": Privacy Concerns of the Visually Impaired with Camera Based Assistive Applications. In 29th USENIX Security Symposium (USENIX Security 20). USENIX Association, 1929–1948. https://www.usenix.org/conference/usenixsecurity20/presentation/akter
