A deep-learning framework for human perception of abstract art composition

Pierre Lelièvre et al. J Vis. 2021 May 3;21(5):9. doi: 10.1167/jov.21.5.9.
Abstract

Artistic composition (the structural organization of pictorial elements) is often characterized by some basic rules and heuristics, but art history does not offer quantitative tools for segmenting individual elements, measuring their interactions and related operations. To discover whether a metric description of this kind is even possible, we exploit a deep-learning algorithm that attempts to capture the perceptual mechanism underlying composition in humans. We rely on a robust behavioral marker with known relevance to higher-level vision: orientation judgements, that is, telling whether a painting is hung "right-side up." Humans can perform this task, even for abstract paintings. To account for this finding, existing models rely on "meaningful" content or specific image statistics, often in accordance with explicit rules from art theory. Our approach does not commit to any such assumptions/schemes, yet it outperforms previous models and for a larger database, encompassing a wide range of painting styles. Moreover, our model correctly reproduces human performance across several measurements from a new web-based experiment designed to test whole paintings, as well as painting fragments matched to the receptive-field size of different depths in the model. By exploiting this approach, we show that our deep learning model captures relevant characteristics of human orientation perception across styles and granularities. Interestingly, the more abstract the painting, the more our model relies on extended spatial integration of cues, a property supported by deeper layers.


Figures

Figure 1.
Gallery of genres and styles mentioned throughout the paper. Ordering is chronological. (Mona Lisa by Leonardo da Vinci (1503-1519), Still-Life with Drinking-Horn by Willem Kalf (1653), The Meeting (Bonjour Monsieur Courbet) by Gustave Courbet (1854), Argenteuil seen from the small arm of the Seine by Claude Monet (1872), Young Girls on the Edge of the Sea by Pierre Puvis de Chavannes (1879), The Scream by Edvard Munch (1893), Seated man with his arms crossed by Pablo Picasso (1915), Komposition VII by Wassily Kandinsky (1913), A Naturalist's Study by Pierre Roy (1928)).
Figure 2.
Schematic architecture of the multilevel orientation classification model employed in this study. Each of five convolutional blocks is associated with a classifier (indicated by classifier-n with n = 1 to 5). The output dimensionality of each classifier is indicated by (x, x, 4), where x is the number of samples across each spatial dimension (see density of circle array within insets overlaying local filters onto painting), and 4 is the number of orientation labels {up,90,180,270}. The four values within [ ] show one example of the categorical distribution generated by the network for Komposition VIII by Wassily Kandinsky (1923). In the legend, k/s stand for kernel/stride size.
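For readers who want a concrete picture of this multilevel design, the following is a minimal PyTorch-style sketch, not the authors' exact implementation: the channel counts and k/s settings are illustrative assumptions. Each of five convolutional blocks feeds a 1x1-convolution classifier head that emits the (x, x, 4) grid of orientation logits described in the caption.

    # Minimal sketch of a five-level orientation classifier (PyTorch).
    # Channel counts and kernel/stride settings are illustrative assumptions.
    import torch
    import torch.nn as nn

    class MultilevelOrientationNet(nn.Module):
        def __init__(self, channels=(64, 128, 256, 512, 512)):
            super().__init__()
            self.blocks = nn.ModuleList()
            self.heads = nn.ModuleList()
            in_ch = 3
            for out_ch in channels:
                # Convolutional block: 3x3 conv (k/s = 3/1) + ReLU, then 2x2 max-pool (k/s = 2/2).
                self.blocks.append(nn.Sequential(
                    nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                    nn.ReLU(),
                    nn.MaxPool2d(kernel_size=2, stride=2)))
                # classifier-n: 1x1 conv mapping each spatial sample to 4 orientation
                # logits, i.e., the (x, x, 4) output described in the caption.
                self.heads.append(nn.Conv2d(out_ch, 4, kernel_size=1))
                in_ch = out_ch

        def forward(self, x):
            outputs = []
            for block, head in zip(self.blocks, self.heads):
                x = block(x)
                outputs.append(head(x))  # (batch, 4, x, x) logits for {up, 90, 180, 270}
            return outputs

    # Five categorical maps, one per classifier depth; spatial density x shrinks
    # with depth, as in the circle-array insets of the figure.
    maps = MultilevelOrientationNet()(torch.randn(1, 3, 256, 256))
    probs = [m.softmax(dim=1) for m in maps]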
Figure 3.
Effect of median filtering on network attention, visualized through guided error back-propagation. The error map is inverted and thresholded for legibility. Light gray indicates pixels where attention reaches at least 1% of its maximum (moderate attention); dark gray indicates pixels where it exceeds 10% (high attention). (a) Original images used for training. (b) Directed attention without median filtering applied to the borders; (c) with median filtering. Two examples by Paul Klee are shown: The Place of the Twins (1929) and After Annealing (1940).
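The border filtering and two-level thresholding described here can be sketched as follows; the margin width, filter size, and gray levels are assumptions for illustration, not the values used in the paper.

    # Sketch of (1) median filtering of painting borders, which suppresses
    # framing cues, and (2) the two-level attention threshold used for
    # visualization. Works on 2-D grayscale arrays; parameters are assumed.
    import numpy as np
    from scipy.ndimage import median_filter

    def filter_borders(img, margin=8, size=9):
        """Replace a band of `margin` pixels along each border with a median-filtered copy."""
        out = img.copy()
        filt = median_filter(img, size=size)
        out[:margin], out[-margin:] = filt[:margin], filt[-margin:]
        out[:, :margin], out[:, -margin:] = filt[:, :margin], filt[:, -margin:]
        return out

    def threshold_attention(attn, low=0.01, high=0.10):
        """Map a non-negative attention array to white / light gray / dark gray, as in the figure."""
        a = attn / attn.max()
        vis = np.full(a.shape, 255, dtype=np.uint8)  # white: negligible attention
        vis[a >= low] = 192                          # light gray: >= 1% of maximum
        vis[a > high] = 96                           # dark gray: > 10% of maximum
        return vis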
Figure 4.
Model performance on whole paintings grouped by genre (a) and style (b).
Figure 5.
Network attention through guided error back-propagation (see Methods). (a) Five examples of original inputs for validation (Komposition VII by Wassily Kandinsky (1913), Still-Life with Drinking-Horn by Willem Kalf (1653), Argenteuil seen from the small arm of the Seine by Claude Monet (1872), The Meeting (Bonjour Monsieur Courbet) by Gustave Courbet (1854), Mona Lisa by Leonardo da Vinci (1503-1519)). (b) Error maps with inverted and thresholded intensity. Light gray indicates pixels where attention reaches at least 1% of its maximum (moderate attention); dark gray indicates pixels where it exceeds 10% (high attention). Numeric values report light and dark pixel percentages over the entire painting surface. (c) Average surface ratio of high attention, plotted separately for different genres.
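Guided back-propagation of this kind can be approximated with backward hooks; the sketch below assumes a PyTorch model returning (batch, 4) orientation logits and non-inplace ReLUs, and is not the authors' exact procedure.

    # Minimal sketch of guided back-propagation: ReLU backward gradients are
    # clamped to be non-negative, so only positive evidence flows to the input.
    import torch
    import torch.nn as nn

    def add_guided_relu_hooks(model):
        hooks = []
        for m in model.modules():
            if isinstance(m, nn.ReLU):  # ReLUs must be non-inplace for full backward hooks
                hooks.append(m.register_full_backward_hook(
                    lambda mod, grad_in, grad_out: (torch.clamp(grad_in[0], min=0.0),)))
        return hooks

    def guided_attention(model, image, target):
        """image: (1, 3, H, W); target: orientation index in {0..3}. Returns a (1, H, W) map."""
        image = image.clone().requires_grad_(True)
        hooks = add_guided_relu_hooks(model)
        logits = model(image)               # assumes (batch, 4) logits
        logits[:, target].sum().backward()
        for h in hooks:
            h.remove()                      # restore normal back-propagation
        return image.grad.abs().amax(dim=1) # per-pixel attention, max over color channels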
Figure 6.
(a) Model performance across classifiers, grouped by style (as in Figure 4b) and displayed separately for the five distinct classifiers. (b) The values from (a) after rescaling between chance and the maximum value for a given style (corresponding to the performance of classifier-5).
Figure 7.
Predicted orientations from individual receptive field units within each classifier. Different classifiers (1–5) are plotted from left to right. Relative size of the four wedges within each circle reflects prediction strength across the four different orientations. Examples are shown for three paintings (dates given when known): Argenteuil seen from the small arm of the Seine by Claude Monet (1872), The Waterfall of Amida behind the Kiso Road by Katsushika Hokusai, After Annealing by Paul Klee (1940).
Figure 8.
Redundancy between adjacent classifiers, grouped by style. This metric corresponds to a rescaled cross-entropy between the classifier distributions at level n and those at level n+1 (see Methods). Values are averaged across fragments. Along the x axis, c.p. stands for ceiling performance.
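As a rough illustration of such a metric (the exact rescaling is defined in the paper's Methods, so treat this as an assumption), one can rescale the cross-entropy between two 4-way orientation distributions so that 1 means the distributions coincide and 0 means the comparison distribution is uninformative (uniform):

    # Illustrative redundancy score between adjacent classifiers.
    import numpy as np

    def cross_entropy(p, q, eps=1e-12):
        return -np.sum(p * np.log(q + eps))

    def redundancy(p_n, p_next):
        """p_n, p_next: 4-way orientation distributions from classifiers n and n+1."""
        h_cross = cross_entropy(p_n, p_next)             # H(p_n, p_{n+1})
        h_self = cross_entropy(p_n, p_n)                 # H(p_n): lower bound
        h_chance = cross_entropy(p_n, np.full(4, 0.25))  # uniform (chance) reference
        return (h_chance - h_cross) / (h_chance - h_self + 1e-12)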
Figure 9.
Human versus model performance for whole paintings and fragments. In (a), model performance from classifier-5 is plotted alongside human performance on whole paintings (dark versus light bars, respectively), grouped by style. In (b-c), model performance from different classifiers (1–5) is plotted alongside human performance on image fragments, separately for abstract (b) and figurative styles (c).
Figure 10.
Normalized frequency of incorrectly predicted orientations: across classifiers for the model, over all styles (a) and abstract styles only (b); across fragment sizes for humans, abstract styles only (c). eq. stands for equi-frequency. Examples are shown for three paintings: Argenteuil seen from the small arm of the Seine by Claude Monet (1872), Komposition VII by Wassily Kandinsky (1913), Komposition VIII by Wassily Kandinsky (1923).
Figure 11.
Density distribution of joint orientation choices generated by model and humans for individual abstract paintings, computed separately for different fragment-size/classifier combinations from small/early (a) to large/late (e). Diagonal values correspond to matching responses (humans and model generate the same response); the diagonal sum (indicated by large white digits) is therefore termed “mutual agreement.” Its value is z-scored against the null hypothesis that human and model choices are independent (see main text for clarification). Intensity of the white digits and thickness of the diagonal orange line scale with the corresponding z score. The bottom-left value reports agreement on the target orientation.
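One way to reproduce the z-scoring described here, under the stated null hypothesis that human and model choices are independent, is a permutation test over trials; this is a sketch of the approach, not necessarily the paper's exact computation.

    # Mutual agreement (fraction of trials where human and model pick the same
    # orientation) and its z score under an independence null, approximated by
    # permuting the model's choices across trials.
    import numpy as np

    def mutual_agreement(human, model):
        return np.mean(np.asarray(human) == np.asarray(model))

    def agreement_zscore(human, model, n_perm=10_000, seed=0):
        rng = np.random.default_rng(seed)
        observed = mutual_agreement(human, model)
        null = np.array([mutual_agreement(human, rng.permutation(model))
                         for _ in range(n_perm)])
        return (observed - null.mean()) / null.std()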
Figure 12.
Comparison between our model and the results reported by Mather (2012). (a) Average human and model performance. The original article reports mean human performance per painting; this quantity is not directly comparable to the model's top-1 accuracy, because the latter does not reflect the level of uncertainty for each painting. We therefore plot the raw prediction value for the correct orientation as the model metric against human performance (b).
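The distinction between the two model metrics can be made concrete with a toy example (the numbers are invented for illustration):

    # Top-1 accuracy collapses the categorical output to 0/1 per painting,
    # whereas the raw prediction for the correct orientation keeps uncertainty.
    import numpy as np

    probs = np.array([0.55, 0.20, 0.15, 0.10])  # model output for one painting
    target = 0                                  # index of the correct orientation
    top1 = float(np.argmax(probs) == target)    # 1.0 -- hides how confident the model was
    raw = probs[target]                         # 0.55 -- graded, comparable to mean human performance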
Figure 13.
Painting-by-painting human agreement with the network model (top), with the artists who painted the images used in our study (middle), and with other humans from our sample of participants (bottom). This analysis was restricted to abstract material.
