Proc Natl Acad Sci U S A. 2022 May 17;119(20):e2117184119.
doi: 10.1073/pnas.2117184119. Epub 2022 May 12.

Gaze following requires early visual experience

Ehud Zohary et al. Proc Natl Acad Sci U S A.

Abstract

Gaze understanding—a suggested precursor for understanding others’ intentions—requires recovery of gaze direction from the observed person's head and eye position. This challenging computation is naturally acquired at infancy without explicit external guidance, but can it be learned later if vision is extremely poor throughout early childhood? We addressed this question by studying gaze following in Ethiopian patients with early bilateral congenital cataracts diagnosed and treated by us only at late childhood. This sight restoration provided a unique opportunity to directly address basic issues on the roles of “nature” and “nurture” in development, as it caused a selective perturbation to the natural process, eliminating some gaze-direction cues while leaving others still available. Following surgery, the patients’ visual acuity typically improved substantially, allowing discrimination of pupil position in the eye. Yet, the patients failed to show eye gaze-following effects and fixated less than controls on the eyes—two spontaneous behaviors typically seen in controls. Our model for unsupervised learning of gaze direction explains how head-based gaze following can develop under severe image blur, resembling preoperative conditions. It also suggests why, despite acquiring sufficient resolution to extract eye position, automatic eye gaze following is not established after surgery due to lack of detailed early visual experience. We suggest that visual skills acquired in infancy in an unsupervised manner will be difficult or impossible to acquire when internal guidance is no longer available, even when sufficient image resolution for the task is restored. This creates fundamental barriers to spontaneous vision recovery following prolonged deprivation in early age.

Keywords: blind; cataract; gaze; joint attention; vision.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Gaze following development: modeling and findings. (A and B) A diagram describing the model’s necessary requirements for developing head (A) and eye (B) gaze following by observing the actions of another person. In the congenital cataract patients, prior to surgery, the conditions described in A are available, but those in B are not. After the operation, the conditions depicted in both A and B are available, but despite this, eye gaze following is not established. (C and D) Our findings show the capacities for each task during the preoperative and postoperative stages for head (C) and eye (D) gaze following. The behavioral and model results indicate that the self-teaching mechanism for gaze following is unavailable beyond early development. V denotes an intact capacity, X denotes a deficit, and --- denotes an untested capacity; preop, preoperative; postop, postoperative.
Fig. 2.
Visual development and spatial acuity of the participants. (A) A scatter plot depicting the individual participants’ visual development. The abscissa denotes the duration of visual deprivation in years (i.e., in the cataract-treated groups, the age at surgery; in controls, 0). The ordinate denotes the years of visual experience (i.e., in the cataract-treated groups, time since surgery; in controls, the age of the participant at testing). (B) The contrast sensitivity function (CSF) of one late-treated participant tested both before (red circles) and after (yellow circles) surgery. The results illustrate the characteristic improvement of spatial vision following surgery. The cutoff frequency is defined as the crossing point of the inverse parabola fitted to the data with the abscissa. (C) Scatter plot showing the postoperative cutoff frequency of the patients at the date of gaze-cueing testing as a function of their preoperative visual acuity. Yellow and red circles indicate late-treated patients who performed the gaze-cueing test after surgery and before surgery, respectively. Late-treated patients (n = 6) who did not pass the inclusion criteria for the gaze-cueing task are depicted by ×. Four late-treated patients (denoted by a superimposed + sign) did not do the CSF test before surgery, and thus their preoperative visual acuity denoted here is their first CSF test result after surgery (<1 mo after surgery). The visual acuity of the early-treated patients (n = 11) was not assessed prior to surgery. Their postoperative cutoff frequency at the time of gaze-cueing testing was always better than the maximum spatial frequency. Their acuity is therefore depicted by the light blue region (above 13.6 cpd). The cutoff frequency for legal blindness (3 cpd according to NIH guidelines) is highlighted by a gray background square. Note that most late-treated patients were legally blind before surgery, but their visual acuity improved substantially after surgery, such that they were no longer considered legally blind.
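The cutoff-frequency definition in (B) can be illustrated with a minimal sketch. The caption does not specify the axes, so the sketch assumes the parabola is fitted to log10 sensitivity as a function of log10 spatial frequency, which puts the abscissa crossing where log sensitivity reaches zero. The function names are hypothetical and the fitting procedure is not necessarily the one the authors used.

```python
import math


def fit_parabola(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c, solved with Cramer's rule."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    # Normal equations of the quadratic fit: matrix @ (a, b, c) = rhs
    matrix = [[s4, s3, s2], [s3, s2, s1], [s2, s1, n]]
    rhs = [t2, t1, t0]
    d = det3(matrix)
    coeffs = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


def cutoff_frequency(freqs_cpd, sensitivities):
    """High-frequency abscissa crossing (in cpd) of the fitted inverted parabola.

    Assumption: both axes are log10-scaled, so the crossing is where
    log10(sensitivity) equals zero.
    """
    xs = [math.log10(f) for f in freqs_cpd]
    ys = [math.log10(s) for s in sensitivities]
    a, b, c = fit_parabola(xs, ys)
    disc = b * b - 4 * a * c
    if a >= 0 or disc < 0:
        raise ValueError("fit is not an inverted parabola crossing the abscissa")
    # With a < 0, this root is the larger (high-frequency) one.
    x_high = (-b - math.sqrt(disc)) / (2 * a)
    return 10 ** x_high
```

Calling `cutoff_frequency` with a list of tested spatial frequencies (cpd) and the measured sensitivities returns the estimated cutoff in cpd; a lower value corresponds to poorer spatial vision.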
Fig. 3.
Gaze-cueing experiment stimuli, design, and group results. (A) Experimental design of the main gaze-cueing experiments, testing the compatibility effect for eye (Left) and head (Right) gaze cues. (B) Example of blurred stimuli seen by controls. (C) Group results for the eye (Left) and the head (Right) direction experiments, depicting the group average cue-compatibility effect (RT of incompatible minus compatible trials) in the control (white), the early-treated (blue), and the late-treated (yellow) groups. Error bars denote SEM. The numbers of participants from each group in the two experiments (N) are indicated at the top. Horizontal bars indicate direct comparisons between group effects. Two asterisks (**) denote statistically significant differences; P < 0.001.
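As an illustration of the cue-compatibility effect plotted in (C), the following sketch computes each participant's effect as the mean RT on incompatible trials minus the mean RT on compatible trials, then the group average and SEM. The function names and reaction times are hypothetical; the caption does not state how trials were filtered or averaged, so plain means are assumed.

```python
from math import sqrt
from statistics import mean, stdev


def compatibility_effect(rt_compatible_ms, rt_incompatible_ms):
    """Mean RT on incompatible trials minus mean RT on compatible trials."""
    return mean(rt_incompatible_ms) - mean(rt_compatible_ms)


def group_mean_and_sem(effects_ms):
    """Group average effect and its standard error of the mean."""
    return mean(effects_ms), stdev(effects_ms) / sqrt(len(effects_ms))


# Hypothetical reaction times (ms) per participant: (compatible, incompatible)
participants = [
    ([410, 430, 420], [455, 465, 460]),
    ([390, 400, 410], [420, 430, 440]),
    ([500, 520, 510], [530, 520, 540]),
]
effects = [compatibility_effect(c, i) for c, i in participants]
group_mean, group_sem = group_mean_and_sem(effects)
print(effects, group_mean, group_sem)
```

A positive effect means responses were slower when the cue pointed away from the target, which is the signature of automatic gaze following.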
Fig. 4.
Eye movement patterns during free viewing. (A) Fixation maps of controls (left column; n = 31) and late-treated participants (middle column; n = 9) during observation of an actor gazing at a target object (depicted, for illustration only, by a red arrow) indicated by head orientation (upper row) or eye position (lower row). Predefined interest areas are depicted by red ellipses (right column). The numbers on each image denote the maximum time spent fixating on a specific position in the image (group average smoothed with a Gaussian kernel of 1°). (Right) Bar plots depicting the mean cued-object preference index ([cue congruent – incongruent]/[congruent + incongruent] fixation dwell times) for the control (white) and the late-treated (yellow) groups. Positive values indicate a fixation preference for the cued object. (B and C) Fixation maps of controls (first column; n = 11) and late-treated participants (second column; n = 9) when observing an image of an exemplary face (B) and people in action (C), respectively. (Right) Bar plots depict the cumulative duration of fixations (dwell time) in each interest area (IA) for the two groups. An asterisk (*) denotes P < 0.05; **, P < 0.005. In all tests, controls viewed a blurred version of the images (smoothed with a Gaussian kernel). Error bars depict SEM.
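The cued-object preference index defined in (A) can be written as a short function. This is a minimal sketch: the function name and the handling of the zero-dwell case are assumptions, since the caption gives only the formula.

```python
def preference_index(congruent_dwell_ms, incongruent_dwell_ms):
    """(congruent - incongruent) / (congruent + incongruent) fixation dwell times.

    Returns a value in [-1, 1]; positive values indicate a preference for
    the cued object. Returns None when neither object was fixated, because
    the index is undefined in that case (an assumption, not from the source).
    """
    total = congruent_dwell_ms + incongruent_dwell_ms
    if total == 0:
        return None
    return (congruent_dwell_ms - incongruent_dwell_ms) / total
```

Normalizing by the total dwell time makes the index comparable across participants who spend different amounts of time looking at the objects overall.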
Fig. 5.
Development of gaze following in various blur conditions. (A) Examples of blurred images at various cutoff frequencies (1.7, 0.8, and 0.4 cpd). A mover event (green box) is clearly detectable even at the largest blur (0.4 cpd) in the dynamic sequence, although the hand is difficult to recognize by its own appearance. The gaze direction (yellow arrow) can be interpreted from the head orientation but not from the eyes’ gaze. (B) Prediction of gaze direction. The angular error in degrees (deg) between the true and predicted gaze directions of a neural network (ResNet50). Error bars denote SEM. The neural network was trained to predict gaze-direction angle from face images under increasing input blur levels. Mover event locations were used as gaze target positions. Gaze directions were in the range of (−90°, +90°) left to right, respectively. Input images were blurred using Gaussian filters with a cutoff frequency of 15.7 (no blur), 1.7, 0.8, and 0.4 cpd (corresponding to kernel spatial SD of 0, 8, 16, and 32 pixels, respectively).
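The angular error reported in (B) can be sketched as the absolute difference between true and predicted gaze angles, averaged over test images. The function name and sample angles are hypothetical, and the sketch assumes a single horizontal angle per image, as the stated range of (−90°, +90°) suggests.

```python
from statistics import mean


def mean_angular_error(true_deg, predicted_deg):
    """Mean absolute difference (deg) between true and predicted gaze angles.

    Angles are assumed to lie within (-90, +90) degrees, so no wrap-around
    correction is applied.
    """
    if len(true_deg) != len(predicted_deg):
        raise ValueError("true and predicted angle lists must have equal length")
    return mean(abs(t - p) for t, p in zip(true_deg, predicted_deg))


# Hypothetical angles for three test images
print(mean_angular_error([-30, 0, 45], [-20, 5, 30]))
```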
Fig. 6.
Computational discrimination between faces looking left or right. Model discrimination was based on the activation of intermediate layers of the network (ResNet50). (A–C) Training. Three networks were trained to identify faces using images of either full faces (head condition) or only the eyes region (eye condition) at various blur levels. Blue arrows indicate the input to each network, and red arrows indicate the network’s development (phases of training) in time. (A) Regimen I: “presurgery” (in black). Training on images at a high-blur level (using a Gaussian filter with cutoff frequency of 0.8 cpd) similar to or worse than preoperative conditions. (B) Regimen II: “postsurgery” (in green). Training first on images at a high-blur level (mimicking preoperative vision; cutoff frequency of 0.8 cpd) and then further training on images at low-blur levels (cutoff at 3.3 cpd) similar to or worse than the postoperative visual acuity. (C) Regimen III: “control (normal)” (in gray). Training on images at the highest resolution with no blur (cutoff at 17.6 cpd). (D and E) Testing. The networks under the three training regimens were evaluated for left/right discrimination of head orientations (D) or eye directions (E), as seen in B. Bar colors correspond to the three regimens (black, high blur; green, high then low blur; gray, no blur). Note that head orientation/eye direction were not explicitly learned during training (to identify faces). The input for testing was either highly or moderately blurred (0.8 or 3.3 cpd, respectively). Error bars represent SE over test images. Chance level is 50%.
