Sci Rep. 2020 Aug 3;10(1):13035. doi: 10.1038/s41598-020-69807-0.

Computational discrimination between natural images based on gaze during mental imagery

Xi Wang et al. Sci Rep.

Abstract

When retrieving an image from memory, humans usually move their eyes spontaneously, as if the image were in front of them. Such eye movements correlate strongly with the spatial layout of the recalled image content and act as memory cues that facilitate retrieval. However, how closely eye movements during imagery match the eye movements made while looking at the original image has so far been unclear. In this work we first quantify the similarity of eye movements between recalling an image and encoding the same image, and then investigate whether comparing such pairs of eye movements can be used for computational image retrieval. Our results show that computational image retrieval based on eye movements during spontaneous imagery is feasible. Furthermore, we show that this retrieval approach generalizes to unseen images.


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Experimental paradigm and eye movement statistics. (a) Paradigm of a single trial in the experiment. After an initial fixation (of 500 ms) at the center of the display, one image stimulus was presented for 5 s, followed by a noise mask. Briefly after that, observers were asked to recall the image they had just seen for another 5 s. (b) Average fixation duration during encoding (blue) and recall (orange). The x-axis indicates time and the y-axis corresponds to the duration of each fixation. The black curves in the middle of each plot show the average duration within the five seconds, and the light coloured areas indicate the central 50% intervals. (c) Main sequence diagrams during encoding (left) and recall (right), where peak velocity is plotted as a function of saccade amplitude. (d) Histogram of the spatial coverage of all fixations over the 100 stimuli. For each image, one bounding box of all fixations in one sequence is drawn, and we compute the percentage of the screen area covered by that bounding box. The distribution of the spatial coverage of all encoding fixations is drawn in blue and the distribution computed from all recall fixations in orange. ((a) contains public domain imagery downloaded from the dataset published in Judd et al.)
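The spatial coverage measure in (d) can be sketched directly: take the axis-aligned bounding box of one fixation sequence and divide its area by the screen area. This is a minimal illustration, assuming pixel coordinates; the screen size and fixation points below are invented, not the experimental values.

```python
def coverage_fraction(fixations, screen_w, screen_h):
    """Area of the bounding box of (x, y) fixation points,
    as a fraction of the total screen area."""
    xs = [x for x, _ in fixations]
    ys = [y for _, y in fixations]
    box_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return box_area / (screen_w * screen_h)

# Illustrative fixation sequence on a 1920x1080 display.
fixations = [(100, 200), (500, 250), (300, 600)]
print(round(coverage_fraction(fixations, 1920, 1080), 4))  # 0.0772
```

One histogram per condition (encoding vs. recall) is then built from these per-sequence fractions.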
Figure 2
Recall-based retrieval using CNN trained with two tasks. The black curve shows the averaged result of all leave-one-out tests. Light areas depict the 50% intervals of all ROC curves.
Figure 3
Generalization to unknown stimuli. ROC curve of retrieval based on matched eye movements during recalling. Blue area depicts the 50% intervals of all ROC curves of each subject.
Figure 4
Area under the curve of each leave-one-out test using the CNN. (a) AUC when the data of each subject is used for testing. Classification based on encoding eye movements is shown in blue and classification based on recall eye movements in orange. The mean values and standard errors (one standard deviation) of the AUCs are shown in (b).
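The AUC reported per leave-one-out test can be computed without tracing the full ROC curve, via its rank-statistic interpretation: the probability that a true-match candidate is scored above a non-match, with ties counting half. A hedged sketch follows; the labels and scores are made-up illustration data, not values from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic: fraction of
    (positive, negative) pairs where the positive scores higher
    (ties contribute 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 0, 0, 1, 0]            # 1 = correct image, 0 = distractor
scores = [0.9, 0.4, 0.35, 0.8, 0.1]  # hypothetical match scores
print(roc_auc(labels, scores))       # perfect separation -> 1.0
```

Chance performance gives 0.5, so per-subject AUCs above 0.5 indicate that recall eye movements carry retrievable image identity.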
Figure 5
Top-5 image pairs that are most often confused (a) and never confused (b) in encoding-based retrieval using the CNN. (a) Image pairs are sorted by the number of confusions from left to right: the first column shows the two images for which the eye movements during encoding are confused most often. In each pair, the image in the first row is classified as the image in the second row. Histograms aggregated over the complete dataset are shown in the third row. Histograms of the images in the first row are shown in red and histograms of the images in the second row in green. (b) Examples of the most distinct pairs of images, i.e., images in the first row are never misclassified as images in the second row. Similar to (a), aggregated histograms are plotted in the third row. (Public domain imagery downloaded from the dataset published in Judd et al.)
Figure 6
Top-5 most confused image pairs (one per column) in recall-based retrieval. Images in the first row are most often misclassified as the images below them. The frequency of each pair's confusion decreases from left to right. Notably, the interesting scene elements in the images of each pair have very similar layouts. For example, the two images in the second column have dominant features in the right half: the boy in the top image and the text and display in the bottom one. In the third pair, a dog and a house are placed similarly to the positions of the boy and the adult in the bottom image. (Public domain imagery downloaded from the dataset published in Judd et al.)
Figure 7
(a) CNN architecture employed for direct image retrieval and (b) for descriptor learning. Width and height of the layers are indicated by the surrounding numbers; the number below each layer is its number of channels. (c) Each histogram was first fed through the network described in (b) to generate a 16-dimensional descriptor. Two distinct networks (truncated pyramids) were used for encoding and recall, respectively. We used two triplet losses for descriptor learning, forcing matching pairs closer to each other than non-matching ones.
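The matching objective in (c) can be illustrated with the standard triplet-margin loss on 16-dimensional descriptors: the distance from an anchor to its matching descriptor must be smaller, by a margin, than the distance to a non-matching one. The margin value and the toy descriptors below are assumptions for illustration, not the paper's settings.

```python
import math

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet-margin loss max(0, d(a,p) - d(a,n) + margin)
    with Euclidean distance: zero once the matching pair is closer
    than the non-matching pair by at least `margin`."""
    d = lambda u, v: math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return max(0.0, d(anchor, positive) - d(anchor, negative) + margin)

enc = [0.1] * 16          # descriptor of an encoding histogram (toy values)
rec_match = [0.12] * 16   # recall descriptor of the same image
rec_other = [0.9] * 16    # recall descriptor of a different image
print(triplet_loss(enc, rec_match, rec_other))  # well separated -> 0.0
```

Using two such losses, one anchored on encoding descriptors and one on recall descriptors, would push both networks toward a shared space where matching pairs sit closest.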

References

    1. Moore CS. Control of the memory image. Psychol. Rev. Monogr. Suppl. 1903;4:277–306.
    2. Perky CW. An experimental study of imagination. Am. J. Psychol. 1910;21:422–452. doi: 10.2307/1413350.
    3. Jacobson E. Electrophysiology of mental activities. Am. J. Psychol. 1932;44:677–694. doi: 10.2307/1414531.
    4. Neisser U. Cognitive psychology. New York: Appleton-Century-Crofts; 1967.
    5. Hebb DO. Concerning imagery. Psychol. Rev. 1968;75:466. doi: 10.1037/h0026771.
