ASSETS. 2022 Oct;2022:14. doi: 10.1145/3517428.3544824. Epub 2022 Oct 22.

Blind Users Accessing Their Training Images in Teachable Object Recognizers

Jonggi Hong et al. ASSETS. 2022 Oct.

Abstract

Teachable object recognizers address a very practical need for blind people: instance-level object recognition. However, they assume that users can visually inspect the photos they provide for training, a critical step that is inaccessible to those who are blind. In this work, we engineer data descriptors that address this challenge. They indicate in real time whether the object in the photo is cropped or too small, whether a hand is included, whether the photo is blurred, and how much the photos vary from each other. Our descriptors are built into an open-source testbed iOS app called MYCam. In a remote user study in (N = 12) blind participants' homes, we show how descriptors, even when error-prone, support experimentation and have a positive impact on the quality of the training set, which can translate to model performance, though this gain is not uniform. Participants found the app simple to use, indicating that they could effectively train it and that the descriptors were useful. However, many found training tedious, opening discussions around the need for a balance between information, time, and cognitive load.
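The abstract does not spell out how each descriptor is computed. As an illustration only, a blur descriptor like the one described could be approximated with the classic variance-of-the-Laplacian heuristic; the function names and the threshold below are assumptions for this sketch, not the paper's actual implementation:

```python
import numpy as np

def blur_score(gray: np.ndarray) -> float:
    """Variance of the discrete Laplacian over a grayscale image.

    Sharp images have strong second derivatives (high variance);
    blurred images have weak ones (low variance).
    """
    # 4-neighbor Laplacian on interior pixels, computed with slicing.
    lap = (
        -4.0 * gray[1:-1, 1:-1]
        + gray[:-2, 1:-1] + gray[2:, 1:-1]   # vertical neighbors
        + gray[1:-1, :-2] + gray[1:-1, 2:]   # horizontal neighbors
    )
    return float(lap.var())

def is_blurred(gray: np.ndarray, threshold: float = 100.0) -> bool:
    # Threshold is a hypothetical tuning parameter, not from the paper.
    return blur_score(gray) < threshold
```

A real-time descriptor like this is cheap enough to run on every camera frame, which matches the paper's goal of giving blind users immediate feedback while they capture training photos.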

Keywords: blind; machine teaching; object recognition; participatory machine learning; visual impairment.


Figures

Figure 1:
A blind participant in our study training the MYCam app in their home to recognize Lays with real-time descriptors. A dual video-conferencing setup captures the participant's activities via a laptop camera and smart glasses worn by the participant.
Figure 2:
The user flow of MYCam. MYCam has three main parts: recognizing an object in the camera view (purple thread), reviewing and editing the information of the objects (red thread), and teaching an object to the model (green thread).
Figure 3:
The architecture of the MYCam system, indicating approaches for estimating the descriptors and recognizing the object.
Figure 4:
Object stimuli in the study, chosen for a challenging fine-grained classification task: Fritos, Cheetos, and Lays.
Figure 5:
Photos from P10 and manually annotated attributes, to be compared with automatically estimated descriptors.
Figure 6:
Scatter plots indicating correlations between manual annotations (x-axis) and estimations (y-axis) for each descriptor.
Figure 7:
Contrasting descriptor values between initial and retraining attempts for P1, P3, P5, P8, and P10. Red dots indicate means.
Figure 8:
The average values of annotated photo-level attributes for individual photos among 12 participants. The charts include photos of the first three training sets (1-30: first set, 31-60: second set, 61-90: third set). The lines are fitted to the dots using LOWESS smoothing.
Figure 9:
The average annotated values of set-level attributes and the annotated number of photos with photo-level attributes for all 12 participants across three training sets (one training set per object).
Figure 10:
When testing their models, participants' experiences varied (a), which seems to be reflected in their satisfaction scores (b).
Figure 11:
Model accuracy when tested on individual test images, aggregated test images from all 12 blind participants in this remote study, and aggregated test images from all 9 blind participants in a prior in-lab study [41].
Figure 12:
Participants' feedback on training with the MYCam testbed.
Figure 13:
Participants' feedback on the descriptors.
Figure 14:
Participants' feedback on the performance of their object recognition models.
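Figure 3 indicates that the system estimates a descriptor for how much the photos in a training set vary from each other. One plausible way to score that (an assumption for illustration; the paper's actual method may differ) is the mean pairwise distance between per-photo feature vectors:

```python
import numpy as np

def set_variation(features: np.ndarray) -> float:
    """Mean pairwise Euclidean distance between per-photo feature vectors.

    `features` has shape (n_photos, feature_dim); higher values mean the
    training photos differ more from one another.
    """
    n = len(features)
    if n < 2:
        return 0.0
    # Broadcast to all pairwise differences, then reduce to distances.
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Average over unique unordered pairs (upper triangle, no diagonal).
    iu = np.triu_indices(n, k=1)
    return float(dists[iu].mean())
```

In a teachable recognizer, the feature vectors would come from the recognition model's embedding of each photo; a near-zero score would flag a training set of near-identical photos, which tends to generalize poorly.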

References

    1. Abdolrahmani Ali, Easley William, Williams Michele, Branham Stacy, and Hurst Amy. 2017. Embracing Errors: Examining How Context of Use Impacts Blind Individuals’ Acceptance of Navigation Aid Errors. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI ’17). Association for Computing Machinery, New York, NY, USA, 4158–4169. doi: 10.1145/3025453.3025528
    2. Ahmetovic Dragan, Bernareggi Cristian, Gerino Andrea, and Mascetti Sergio. 2014. ZebraRecognizer: Efficient and Precise Localization of Pedestrian Crossings. In 2014 22nd International Conference on Pattern Recognition. 2566–2571. doi: 10.1109/ICPR.2014.443
    3. Ahmetovic Dragan, Sato Daisuke, Oh Uran, Ishihara Tatsuya, Kitani Kris, and Asakawa Chieko. 2020. ReCog: Supporting Blind People in Recognizing Personal Objects. Association for Computing Machinery, New York, NY, USA, 1–12. doi: 10.1145/3313831.3376143
    4. Aira. 2017. Your Life, Your Schedule, Right Now. https://aira.io
    5. Akter Taslima, Dosono Bryan, Ahmed Tousif, Kapadia Apu, and Semaan Bryan. 2020. "I am uncomfortable sharing what I can’t see": Privacy Concerns of the Visually Impaired with Camera Based Assistive Applications. In 29th USENIX Security Symposium (USENIX Security 20). USENIX Association, 1929–1948. https://www.usenix.org/conference/usenixsecurity20/presentation/akter
