Proc SIGCHI Conf Hum Factor Comput Syst. 2019 May:2019:336.
doi: 10.1145/3290605.3300566.

Hands Holding Clues for Object Recognition in Teachable Machines

Kyungjun Lee et al. Proc SIGCHI Conf Hum Factor Comput Syst. 2019 May.

Abstract

Camera manipulation confounds the use of object recognition applications by blind people. This is exacerbated when photos from this population are also used to train models, as with teachable machines, where out-of-frame or partially included objects against cluttered backgrounds degrade performance. Leveraging prior evidence on the ability of blind people to coordinate hand movements using proprioception, we propose a deep learning system that jointly models hand segmentation and object localization for object classification. We investigate the utility of hands as a natural interface for including and indicating the object of interest in the camera frame. We confirm the potential of this approach by analyzing existing datasets from people with visual impairments for object recognition. With a new publicly available egocentric dataset and an extensive error analysis, we provide insights into this approach in the context of teachable recognizers.

Keywords: blind; egocentric; hand; k-shot learning; object recognition.
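
The abstract and Figure 2 describe a three-step pipeline: a hand segmentation model is fine-tuned to estimate the center of the object held near the hand, a fixed-size box placed at that center crops the object out of the frame, and the crop is passed to a classifier. The following is a minimal sketch of that flow, assuming PyTorch; the module definitions, the 128-pixel crop, and names such as CenterEstimator, crop_around_center, and HandGuidedRecognizer are illustrative placeholders, not the authors' implementation (only the 19-object count is taken from Figure 5).

    # Minimal sketch of the hand-guided pipeline in Figure 2 (illustrative, not the paper's code).
    import torch
    import torch.nn as nn


    class CenterEstimator(nn.Module):
        """Stand-in for the fine-tuned hand segmentation network (Steps I-II):
        maps an RGB image to a single-channel heatmap whose peak is taken as
        the center of the object held near the hand."""
        def __init__(self):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 1, 1),
            )

        def forward(self, image):                     # image: (B, 3, H, W)
            return self.backbone(image)               # heatmap: (B, 1, H, W)


    def crop_around_center(image, heatmap, box=128):
        """Step II -> III glue: place a fixed-size box at the heatmap peak and
        crop the image so only the object of interest reaches the classifier."""
        _, _, h, w = image.shape
        flat = heatmap.flatten(1).argmax(dim=1)       # peak index per sample
        ys = torch.div(flat, w, rounding_mode="floor")
        xs = flat % w
        crops = []
        for img, y, x in zip(image, ys, xs):
            top = int(torch.clamp(y - box // 2, 0, h - box))
            left = int(torch.clamp(x - box // 2, 0, w - box))
            crops.append(img[:, top:top + box, left:left + box])
        return torch.stack(crops)


    class HandGuidedRecognizer(nn.Module):
        """Step III: classify the cropped region around the estimated center."""
        def __init__(self, num_classes=19, box=128):  # 19 objects as in Figure 5
            super().__init__()
            self.center = CenterEstimator()
            self.box = box
            self.classifier = nn.Sequential(
                nn.Flatten(), nn.Linear(3 * box * box, num_classes),
            )

        def forward(self, image):
            heatmap = self.center(image)
            crop = crop_around_center(image, heatmap, self.box)
            return self.classifier(crop)


    if __name__ == "__main__":
        model = HandGuidedRecognizer()
        logits = model(torch.randn(2, 3, 256, 256))   # two dummy photos
        print(logits.shape)                           # torch.Size([2, 19])

In the teachable setting the figures describe, presumably only this final classification step is retrained from the user's own k examples (k = 1, 5, 20 in Figure 9), while the hand-guided localization stages stay fixed.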

Figures

Figure 1:
An illustration of our hand-guided object recognition approach on an example from our egocentric dataset. Given a photo of an object held near a hand, the system first identifies the hand and then estimates the object's center; the region around that center is cropped and passed to the recognition model.
Figure 2:
In our approach, a hand segmentation model (Step I) is fine-tuned to estimate the center of the object in proximity to the hand (Step II). A bounding box placed at that center isolates the object and crops the image, which is then passed to the object classification model (Step III).
Figure 3:
An (input, annotation, output) example for our hand segmentation (a) and object localization (b) models.
Figure 4:
Examples from each dataset. Glassense-Vision, VizWiz, and our benchmark examples are selected to include hands.
Figure 5:
Nineteen objects used in our data collection. Objects in the same category are displayed in proximity.
Figure 6:
Positive and negative outputs of our object localization model on the Glassense-Vision and VizWiz datasets.
Figure 7:
Our hand-guided object recognition method (CO) tends to improve recognition accuracy on average for S and B compared to the original HO and O methods.
Figure 8:
The accuracy gain of our method (CO) over HO and O is more pronounced in cluttered backgrounds (wild).
Figure 9:
On average CO outperforms HO and O consistently across training sample sizes k = 1, 5, 20.
Figure 10:
The presence of hands (HO and CO) seems to have a different effect for generic vs. teachable models.
Figure 11:
Positive and negative results on TEgO, with out-of-frame hands in some of the negative examples.
Figure 12:
Confusion matrix for the CO models showing that misclassification occurs among objects of similar shape. Cans and bottles are indicated as “-c” and “-b”, respectively.
