Sci Rep. 2020 Aug 3;10(1):13035. doi: 10.1038/s41598-020-69807-0.

Computational discrimination between natural images based on gaze during mental imagery

Xi Wang et al. Sci Rep.

Abstract

When retrieving an image from memory, humans usually move their eyes spontaneously, as if the image were in front of them. Such eye movements correlate strongly with the spatial layout of the recalled image content and act as memory cues that facilitate retrieval. However, how closely eye movements during imagery match the eye movements made while looking at the original image has so far been unclear. In this work we first quantify the similarity of eye movements between recalling an image and encoding the same image, and then investigate whether comparing such pairs of eye movements can be used for computational image retrieval. Our results show that computational image retrieval based on eye movements during spontaneous imagery is feasible. Furthermore, we show that this retrieval approach generalizes to unseen images.


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Experimental paradigm and eye movement statistics. (a) Paradigm of a single trial in the experiment. After an initial fixation (of 500 ms) at the center of the display, one image stimulus was presented for 5 s, followed by a noise mask. Briefly after that, observers were asked to recall the image they had just seen for another 5 s. (b) Average fixation duration during encoding (blue) and recall (orange). The x-axis indicates time and the y-axis corresponds to the duration of each fixation. The black curves in the middle of each plot show the average duration within the five seconds, and the light coloured areas indicate the central 50% intervals. (c) Main sequence diagrams during encoding (left) and recall (right), where peak velocity is plotted as a function of saccade amplitude. (d) Histogram of the spatial coverage of all fixations over the 100 stimuli. For each image, one bounding box of all fixations in one sequence is drawn, and we compute the percentage of the screen area covered by that bounding box. The distribution of the spatial coverage of all encoding fixations is drawn in blue and the distribution computed from all recall fixations in orange. ((a) contains public domain imagery downloaded from the dataset published in Judd et al.)
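The spatial coverage measure in (d) can be sketched directly: take the axis-aligned bounding box of one fixation sequence and divide its area by the screen area. This is a minimal illustration, assuming pixel coordinates; the screen size and fixation points below are invented, not the experimental values.

```python
def coverage_fraction(fixations, screen_w, screen_h):
    """Area of the bounding box of (x, y) fixation points,
    as a fraction of the total screen area."""
    xs = [x for x, _ in fixations]
    ys = [y for _, y in fixations]
    box_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return box_area / (screen_w * screen_h)

# Illustrative fixation sequence on a 1920x1080 display.
fixations = [(100, 200), (500, 250), (300, 600)]
print(round(coverage_fraction(fixations, 1920, 1080), 4))  # 0.0772
```

One histogram per condition (encoding vs. recall) is then built from these per-sequence fractions.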
Figure 2
Recall-based retrieval using CNN trained with two tasks. The black curve shows the averaged result of all leave-one-out tests. Light areas depict the 50% intervals of all ROC curves.
Figure 3
Generalization to unknown stimuli. ROC curve of retrieval based on matched eye movements during recalling. Blue area depicts the 50% intervals of all ROC curves of each subject.
Figure 4
Area under the curve of each leave-one-out test using the CNN. (a) AUC when the data of each subject is used for testing. Classification based on encoding eye movements is shown in blue and classification based on recall eye movements in orange. The mean values and standard errors (one standard deviation) of the AUCs are shown in (b).
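The AUC reported per leave-one-out test can be computed without tracing the full ROC curve, via its rank-statistic interpretation: the probability that a true-match candidate is scored above a non-match, with ties counting half. A hedged sketch follows; the labels and scores are made-up illustration data, not values from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic: fraction of
    (positive, negative) pairs where the positive scores higher
    (ties contribute 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 0, 0, 1, 0]            # 1 = correct image, 0 = distractor
scores = [0.9, 0.4, 0.35, 0.8, 0.1]  # hypothetical match scores
print(roc_auc(labels, scores))       # perfect separation -> 1.0
```

Chance performance gives 0.5, so per-subject AUCs above 0.5 indicate that recall eye movements carry retrievable image identity.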
Figure 5
Top-5 image pairs that are most often confused (a) and never confused (b) in encoding-based retrieval using the CNN. (a) Image pairs are sorted by the number of confusions from left to right: the first column shows the two images for which the eye movements during encoding are confused most often. In each pair, the image in the first row is classified as the image in the second row. Histograms aggregated over the complete dataset are shown in the third row. Histograms of the images in the first row are shown in red and histograms of the images in the second row in green. (b) Examples of the most distinct pairs of images, i.e., images in the first row are never misclassified as images in the second row. Similar to (a), aggregated histograms are plotted in the third row. (Public domain imagery downloaded from the dataset published in Judd et al.)
Figure 6
Top-5 most confused image pairs (one per column) in recall-based retrieval. Images in the first row are most often misclassified as the images below them. The frequency of each pair's confusion decreases from left to right. Notably, the interesting scene elements in the images of each pair have very similar layouts. For example, the two images in the second column have dominant features in the right half: the boy in the top image and the text and display in the bottom one. In the third pair, a dog and a house are placed similarly to the positions of the boy and the adult in the bottom image. (Public domain imagery downloaded from the dataset published in Judd et al.)
Figure 7
(a) CNN architecture employed for direct image retrieval and (b) for descriptor learning. Width and height of the layers are indicated by the surrounding numbers; the number below each layer is its number of channels. (c) Each histogram was first fed through the network described in (b) to generate a 16-dimensional descriptor. Two distinct networks (truncated pyramids) were used for encoding and recall, respectively. We used two triplet losses for descriptor learning, forcing matching pairs closer to each other than non-matching ones.
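The matching objective in (c) can be illustrated with the standard triplet-margin loss on 16-dimensional descriptors: the distance from an anchor to its matching descriptor must be smaller, by a margin, than the distance to a non-matching one. The margin value and the toy descriptors below are assumptions for illustration, not the paper's settings.

```python
import math

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet-margin loss max(0, d(a,p) - d(a,n) + margin)
    with Euclidean distance: zero once the matching pair is closer
    than the non-matching pair by at least `margin`."""
    d = lambda u, v: math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return max(0.0, d(anchor, positive) - d(anchor, negative) + margin)

enc = [0.1] * 16          # descriptor of an encoding histogram (toy values)
rec_match = [0.12] * 16   # recall descriptor of the same image
rec_other = [0.9] * 16    # recall descriptor of a different image
print(triplet_loss(enc, rec_match, rec_other))  # well separated -> 0.0
```

Using two such losses, one anchored on encoding descriptors and one on recall descriptors, would push both networks toward a shared space where matching pairs sit closest.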

References

    1. Moore CS. Control of the memory image. Psychol. Rev. Monogr. Suppl. 1903;4:277–306.
    2. Perky CW. An experimental study of imagination. Am. J. Psychol. 1910;21:422–452. doi: 10.2307/1413350.
    3. Jacobson E. Electrophysiology of mental activities. Am. J. Psychol. 1932;44:677–694. doi: 10.2307/1414531.
    4. Neisser U. Cognitive psychology. New York: Appleton-Century-Crofts; 1967.
    5. Hebb DO. Concerning imagery. Psychol. Rev. 1968;75:466. doi: 10.1037/h0026771.
