Sci Rep. 2024 Dec 30;14(1):31649. doi: 10.1038/s41598-024-80647-0

Pattern memory cannot be completely and truly realized in deep neural networks


Tingting Li et al.

Abstract

The unknown boundary between the superior computational capability of deep neural networks (DNNs) and human cognitive ability has become a crucial and foundational theoretical problem in the evolution of AI. Undoubtedly, DNN-empowered AI is increasingly surpassing human intelligence in handling general intelligent tasks. However, the lack of interpretability in DNNs and their recurrent erratic behavior remain incontrovertible facts. Inspired by the perceptual characteristics of human vision on optical illusions, we propose a novel working-capability analysis framework for DNNs based on their cognitive response characteristics to visual illusion images, accompanied by a finely adjustable sample-image construction strategy. Our findings indicate that, although DNNs can infinitely approximate human-provided empirical standards in pattern classification, object detection and semantic segmentation, they are still unable to truly realize independent pattern memorization. All super-cognitive abilities of DNNs come purely from their powerful sample-classification performance on similar known scenes. This discovery establishes a new foundation for advancing artificial general intelligence.


Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Figure 1
CRVIS. This framework comprises generation methods for visual illusion scene images, deep neural networks for cognitive boundary detection, and adjustable cognitive response processes. The scene images consist of the MNIST-Abutting grating, Kanizsa Polygon-Abutting grating, ColorMNIST-Abutting grating and COCO-Abutting grating scenes. The grating images include both horizontal and vertical gratings. White arrows illustrate the connections between the different types of visual illusion scenes; the DNN shown on the transparent arrow indicates the detection results on the corresponding scene images. The module in the middle serves as both the research question driving the experiments and the conclusion derived from the detection results.
Figure 2
Cognitive response experiments of DNNs on MNIST-Abutting grating scene images. (a) Overall architecture of the scene image generation method for the MNIST-Abutting grating visual illusion. The method operates as a systematic top-down inference process with three stages: first, standard MNIST images (P1) are generated; then, the corresponding labels (P2) are generated automatically; finally, using this information, MNIST-Abutting grating label images (P3) are produced. (b) Ridgeline plots depicting the pattern classification performance of DNN models on MNIST-Abutting grating visual illusion images.
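As a rough illustration of stage P3, the sketch below renders a binary MNIST digit as an abutting grating: stripes inside the digit region are phase-shifted by half a period relative to the background, so the digit's outline appears only as an illusory contour. The stripe period, line thickness and half-period shift are illustrative assumptions, not the paper's exact parameters.

# Minimal sketch of an abutting-grating generator for a binary MNIST digit.
# Period, thickness and the half-period phase shift are illustrative choices.
import numpy as np

def abutting_grating(digit, period=8, thickness=1, horizontal=True, threshold=0.5):
    """Render a 2-D MNIST digit (values in [0, 1]) as an abutting grating:
    line segments inside the digit are phase-shifted by half a period relative
    to the background, so the digit appears only as an illusory contour."""
    h, w = digit.shape
    mask = digit > threshold                                 # foreground (digit) region
    yy, xx = np.mgrid[0:h, 0:w]
    coord = yy if horizontal else xx                         # stripes run across the other axis
    bg_lines = (coord % period) < thickness                  # background grating
    fg_lines = ((coord + period // 2) % period) < thickness  # phase-shifted grating
    return np.where(mask, fg_lines, bg_lines).astype(np.uint8) * 255

# Usage with a synthetic "digit" (a filled rectangle standing in for MNIST data):
toy = np.zeros((28, 28)); toy[8:20, 10:18] = 1.0
img = abutting_grating(toy, period=6, horizontal=True)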
Figure 3
Overall architecture of the generation method for the Kanizsa Polygon-Abutting grating dataset and semantic segmentation results. (a) The generation method is a top-down process. Initially, standard Kanizsa Polygon images (P1) are generated. Subsequently, the corresponding label data (P2) are generated automatically. Finally, using the generated information, adjustable label images of visual illusion scenes (P3) are produced. This sequential process ensures a standardized and comprehensive scene image dataset for the Kanizsa Polygon-Abutting grating visual illusion. (b) Segmentation results of the YOLOv8 and Mask R-CNN models under different training regimes. The “Training Ways” row at the top lists the training image variants (Original, Hor, Ver, Ver2, Hor2) together with their training methods (Scratch/Pre, i.e. training from scratch or fine-tuning a pre-trained model). On the right, segmentation results of YOLOv8 and Mask R-CNN are shown for various test image patterns, including Original, Hor, Hor1, Ver and Ver1. Each cell shows the model’s segmentation of the input, highlighting how the models behave differently under complex visual phenomena. This structure provides a clear comparison of detection accuracy and robustness across the different visual illusion images for each model.
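The Scratch/Pre comparison in (b) can be sketched with the Ultralytics YOLOv8 segmentation API roughly as follows; the dataset YAML name (kanizsa_abutting.yaml), model size, epochs and image size are placeholder assumptions rather than the authors' actual configuration.

# Minimal sketch of the "Scratch vs. Pre" training comparison for YOLOv8-seg.
# Dataset YAML, model size and hyperparameters are illustrative assumptions.
from ultralytics import YOLO

# "Pre": fine-tune from COCO-pretrained segmentation weights.
pretrained = YOLO("yolov8n-seg.pt")
pretrained.train(data="kanizsa_abutting.yaml", epochs=100, imgsz=640)

# "Scratch": build the same architecture from its YAML definition, no pretraining.
scratch = YOLO("yolov8n-seg.yaml")
scratch.train(data="kanizsa_abutting.yaml", epochs=100, imgsz=640)

# Segment held-out illusion variants (Original, Hor, Hor1, Ver, Ver1, ...).
results = pretrained.predict("test_images/kanizsa_hor1.png")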
Figure 4
Cognitive response experiments of DNNs on ColorMNIST-Abutting grating scene images. (a) Overall architecture of the visual illusion scene generation method for the ColorMNIST-Abutting grating. The method comprises five steps in a top-down inference manner. Initially, standard Color Blindness images (C1) and MNIST images (M1) are generated. Subsequently, label pixels are clustered in the images, considering both the background and foreground patterns of the Color Blindness (C2) and MNIST (M2) images. Using these data, new Color Blindness images, referred to as ColorMNIST (P3), and the corresponding label data (P4) are generated automatically. Finally, adjustable label images of ColorMNIST-Abutting grating visual illusion scenes are produced. (b) Radar chart of precision, recall and F1 score. The center represents the training set, with rays for each test set labeled with metric values. Accurately segmented images are highlighted with red circles.
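A minimal sketch of the per-test-set metrics plotted on the radar chart, under the assumption that they are computed pixel-wise from predicted and ground-truth binary masks (the paper's exact aggregation may differ):

# Pixel-wise precision, recall and F1 for a pair of binary segmentation masks.
import numpy as np

def segmentation_scores(pred, gt):
    """pred, gt: same-shaped arrays of 0/1."""
    tp = np.logical_and(pred == 1, gt == 1).sum()
    fp = np.logical_and(pred == 1, gt == 0).sum()
    fn = np.logical_and(pred == 0, gt == 1).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1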
Figure 5
Comparison of detection and segmentation performance of different models across various visual illusion scene images. (a) Object detection accuracy of models such as MobileViT, Swin Transformer and Vision Transformer, with percentages indicating the classification accuracy of each model fine-tuned on different image patterns. (b) Examples of different image modes from the COCO dataset, highlighting variations in stripe width, transparency and overall visual complexity. These samples are valuable for assessing the ability of cognitive models to recognize patterns under challenging conditions. (c) Segmentation performance of YOLOv8 and Mask R-CNN on the COCO dataset, where color intensity represents segmentation accuracy: darker shades indicate higher accuracy, while lighter shades indicate lower performance.
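For reference, a rough sketch of how a grating with adjustable stripe width and transparency might be composited over a COCO image to produce the image modes in (b); the alpha-blending formula and parameter values are illustrative assumptions, not the authors' generation code.

# Overlay translucent grating stripes on an H x W x 3 uint8 image.
import numpy as np

def overlay_grating(image, stripe_width=4, period=12, alpha=0.7, horizontal=True):
    """Alpha-blend white stripes (width and transparency adjustable) onto the image."""
    h, w = image.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    coord = yy if horizontal else xx
    stripes = ((coord % period) < stripe_width)[..., None]   # H x W x 1 stripe mask
    white = np.full_like(image, 255)
    blended = (1 - alpha) * image + alpha * white            # translucent stripe color
    return np.where(stripes, blended, image).astype(np.uint8)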
Figure 6
Training performance of YOLOv8 models on Kanizsa Polygon-Abutting grating, ColorMNIST-Abutting grating and COCO-Abutting grating visual illusion images. The confusion matrices reflect classification accuracy, with the Kanizsa dataset showing the best performance and the COCO dataset displaying more misclassifications. The feature-space plots show the distribution of samples from the different categories: the Kanizsa dataset exhibits clear separability, while the COCO dataset has more dispersed distributions. The loss trend graphs illustrate how the loss functions change during training, indicating smooth convergence on simpler tasks and slower, more fluctuating convergence on more complex ones.


