1. Bougou V, Vanhoyland M, Bertrand A, Van Paesschen W, Op De Beeck H, Janssen P, Theys T. Neuronal tuning and population representations of shape and category in human visual cortex. Nat Commun 2024; 15:4608. PMID: 38816391; PMCID: PMC11139926; DOI: 10.1038/s41467-024-49078-3.
Abstract
Object recognition and categorization are essential cognitive processes which engage considerable neural resources in the human ventral visual stream. However, the tuning properties of human ventral stream neurons for object shape and category are virtually unknown. We performed large-scale recordings of spiking activity in human Lateral Occipital Complex in response to stimuli in which the shape dimension was dissociated from the category dimension. Consistent with studies in nonhuman primates, the neuronal representations were primarily shape-based, although we also observed category-like encoding for images of animals. Surprisingly, linear decoders could reliably classify stimulus category even in data sets that were entirely shape-based. In addition, many recording sites showed an interaction between shape and category tuning. These results represent a detailed study on shape and category coding at the neuronal level in the human ventral visual stream, furnishing essential evidence that reconciles human imaging and macaque single-cell studies.
Affiliation(s)
- Vasiliki Bougou
- Research Group of Experimental Neurosurgery and Neuroanatomy, Department of Neurosciences, KU Leuven and the Leuven Brain Institute, Leuven, Belgium
- Laboratory for Neuro-and Psychophysiology, Research Group Neurophysiology, Department of Neurosciences, KU Leuven and the Leuven Brain Institute, Leuven, Belgium
- Michaël Vanhoyland
- Research Group of Experimental Neurosurgery and Neuroanatomy, Department of Neurosciences, KU Leuven and the Leuven Brain Institute, Leuven, Belgium
- Laboratory for Neuro-and Psychophysiology, Research Group Neurophysiology, Department of Neurosciences, KU Leuven and the Leuven Brain Institute, Leuven, Belgium
- Department of Neurosurgery, University Hospitals Leuven, Leuven, Belgium
- Wim Van Paesschen
- Department of Neurology, University Hospitals Leuven, Leuven, Belgium
- Laboratory for Epilepsy Research, KU Leuven, Leuven, Belgium
- Hans Op De Beeck
- Laboratory Biological Psychology, Department of Neurosciences, KU Leuven, Leuven, Belgium
- Peter Janssen
- Laboratory for Neuro-and Psychophysiology, Research Group Neurophysiology, Department of Neurosciences, KU Leuven and the Leuven Brain Institute, Leuven, Belgium.
- Tom Theys
- Research Group of Experimental Neurosurgery and Neuroanatomy, Department of Neurosciences, KU Leuven and the Leuven Brain Institute, Leuven, Belgium
- Department of Neurosurgery, University Hospitals Leuven, Leuven, Belgium
2. Mukherjee K, Rogers TT. Using drawings and deep neural networks to characterize the building blocks of human visual similarity. Mem Cognit 2024. PMID: 38814385; DOI: 10.3758/s13421-024-01580-1.
Abstract
Early in life and without special training, human beings discern resemblance between abstract visual stimuli, such as drawings, and the real-world objects they represent. We used this capacity for visual abstraction as a tool for evaluating deep neural networks (DNNs) as models of human visual perception. Contrasting five contemporary DNNs, we evaluated how well each explains human similarity judgments among line drawings of recognizable and novel objects. For object sketches, human judgments were dominated by semantic category information; DNN representations contributed little additional information. In contrast, such features explained significant unique variance in the perceived similarity of abstract drawings. In both cases, a vision transformer trained to blend representations of images and their natural language descriptions showed the greatest ability to explain human perceptual similarity, an observation consistent with contemporary views of semantic representation and processing in the human mind and brain. Together, the results suggest that the building blocks of visual similarity may arise within systems that learn to use visual information, not for specific classification, but in service of generating semantic representations of objects.
Affiliation(s)
- Kushin Mukherjee
- Department of Psychology & Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA.
- Timothy T Rogers
- Department of Psychology & Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
3. Jang H, Tong F. Improved modeling of human vision by incorporating robustness to blur in convolutional neural networks. Nat Commun 2024; 15:1989. PMID: 38443349; PMCID: PMC10915141; DOI: 10.1038/s41467-024-45679-0.
Abstract
Whenever a visual scene is cast onto the retina, much of it will appear degraded due to poor resolution in the periphery; moreover, optical defocus can cause blur in central vision. However, the pervasiveness of blurry or degraded input is typically overlooked in the training of convolutional neural networks (CNNs). We hypothesized that the absence of blurry training inputs may cause CNNs to rely excessively on high spatial frequency information for object recognition, thereby causing systematic deviations from biological vision. We evaluated this hypothesis by comparing standard CNNs with CNNs trained on a combination of clear and blurry images. We show that blur-trained CNNs outperform standard CNNs at predicting neural responses to objects across a variety of viewing conditions. Moreover, blur-trained CNNs acquire increased sensitivity to shape information and greater robustness to multiple forms of visual noise, leading to improved correspondence with human perception. Our results provide multi-faceted neurocomputational evidence that blurry visual experiences may be critical for conferring robustness to biological visual systems.
Affiliation(s)
- Hojin Jang
- Department of Psychology, Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, USA.
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Department of Brain and Cognitive Engineering, Korea University, Seoul, South Korea.
- Frank Tong
- Department of Psychology, Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, USA.
4. Zhang H, Yoshida S, Li Z. Brain-like illusion produced by Skye's Oblique Grating in deep neural networks. PLoS One 2024; 19:e0299083. PMID: 38394261; PMCID: PMC10889903; DOI: 10.1371/journal.pone.0299083.
Abstract
The analogy between the brain and deep neural networks (DNNs) has sparked interest in neuroscience. Although DNNs have limitations, they remain valuable for modeling specific brain characteristics. This study used Skye's Oblique Grating illusion to assess DNNs' relevance to brain neural networks. We collected data on human perceptual responses to a series of visual illusions. These data were then used to assess how DNN responses to these illusions paralleled or differed from human behavior. We performed two analyses: (1) we trained DNNs to perform horizontal vs. non-horizontal classification on images with bars tilted different degrees (non-illusory images) and tested them on images with horizontal bars with different illusory strengths measured by human behavior (illusory images), finding that DNNs showed human-like illusions; (2) we performed representational similarity analysis to assess whether illusory representation existed in different layers within DNNs, finding that DNNs showed illusion-like responses to illusory images. The representational similarity between real tilted images and illusory images was calculated, which showed the highest values in the early layers and decreased layer-by-layer. Our findings suggest that DNNs could serve as potential models for explaining the mechanism of visual illusions in the human brain, particularly those that may originate in early visual areas such as the primary visual cortex (V1). While promising, further research is necessary to understand the nuanced differences between DNNs and human visual pathways.
Affiliation(s)
- Hongtao Zhang
- Graduate School of Engineering, Kochi University of Technology, Kami, Kochi, Japan
- Shinichi Yoshida
- School of Information, Kochi University of Technology, Kami, Kochi, Japan
- Zhen Li
- Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University, Shenzhen, China
- Department of Engineering, Shenzhen MSU-BIT University, Shenzhen, China
5. Shoham A, Grosbard ID, Patashnik O, Cohen-Or D, Yovel G. Using deep neural networks to disentangle visual and semantic information in human perception and memory. Nat Hum Behav 2024. PMID: 38332339; DOI: 10.1038/s41562-024-01816-9.
Abstract
Mental representations of familiar categories are composed of visual and semantic information. Disentangling the contributions of visual and semantic information in humans is challenging because they are intermixed in mental representations. Deep neural networks that are trained either on images, on text, or by pairing images and text now enable us to disentangle human mental representations into their visual, visual-semantic and semantic components. Here we used these deep neural networks to uncover the content of human mental representations of familiar faces and objects when they are viewed or recalled from memory. The results show a larger visual than semantic contribution when images are viewed and a reversed pattern when they are recalled. We further reveal a previously unknown unique contribution of an integrated visual-semantic representation in both perception and memory. We propose a new framework in which visual and semantic information contribute independently and interactively to mental representations in perception and memory.
Affiliation(s)
- Adva Shoham
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel.
- Idan Daniel Grosbard
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Or Patashnik
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Daniel Cohen-Or
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Galit Yovel
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel.
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel.
6. Schnell AE, Leemans M, Vinken K, Op de Beeck H. A computationally informed comparison between the strategies of rodents and humans in visual object recognition. eLife 2023; 12:RP87719. PMID: 38079481; PMCID: PMC10712954; DOI: 10.7554/elife.87719.
Abstract
Many species are able to recognize objects, but it has proven difficult to pinpoint and compare how different species solve this task. Recent research suggested combining computational and animal modelling in order to obtain a more systematic understanding of task complexity and compare strategies between species. In this study, we created a large multidimensional stimulus set and designed a visual discrimination task partially based upon modelling with a convolutional deep neural network (CNN). Experiments included rats (N = 11; 1115 daily sessions in total for all rats together) and humans (N = 45). Each species was able to master the task and generalize to a variety of new images. Nevertheless, rats and humans showed very little convergence in terms of which object pairs were associated with high and low performance, suggesting the use of different strategies. There was an interaction between species and whether stimulus pairs favoured early or late processing in a CNN. A direct comparison with CNN representations and visual feature analyses revealed that rat performance was best captured by late convolutional layers and partially by visual features such as brightness and pixel-level similarity, while human performance related more to the higher-up fully connected layers. These findings highlight the additional value of using a computational approach for the design of object recognition tasks. Overall, this computationally informed investigation of object recognition behaviour reveals a strong discrepancy in strategies between rodent and human vision.
Affiliation(s)
- Maarten Leemans
- Department of Brain and Cognition & Leuven Brain Institute, Leuven, Belgium
- Kasper Vinken
- Department of Neurobiology, Harvard Medical School, Boston, United States
- Hans Op de Beeck
- Department of Brain and Cognition & Leuven Brain Institute, Leuven, Belgium
7. Kellman PJ, Baker N, Garrigan P, Phillips A, Lu H. For deep networks, the whole equals the sum of the parts. Behav Brain Sci 2023; 46:e396. PMID: 38054331; DOI: 10.1017/s0140525x23001541.
Abstract
Deep convolutional networks exceed humans in sensitivity to local image properties, but unlike biological vision systems, do not discover and encode abstract relations that capture important properties of objects and events in the world. Coupling network architectures with additional machinery for encoding abstract relations will make deep networks better models of human abilities and more versatile and capable artificial devices.
Affiliation(s)
- Philip J Kellman
- Department of Psychology and David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; https://kellmanlab.psych.ucla.edu/
- Nicholas Baker
- Department of Psychology, Loyola University of Chicago, Chicago, IL, USA; https://www.luc.edu/psychology/people/staff/facultyandstaff/nicholasbaker/
- Patrick Garrigan
- Department of Psychology, St. Joseph's University, Philadelphia, PA, USA; https://sjupsych.org/faculty_pg.php
- Austin Phillips
- Department of Psychology, University of California, Los Angeles, Los Angeles, CA, USA; https://kellmanlab.psych.ucla.edu/
- Hongjing Lu
- Department of Psychology and Department of Statistics, University of California, Los Angeles, Los Angeles, CA, USA; https://cvl.psych.ucla.edu/
8. Bowers JS, Malhotra G, Dujmović M, Montero ML, Tsvetkov C, Biscione V, Puebla G, Adolfi F, Hummel JE, Heaton RF, Evans BD, Mitchell J, Blything R. Clarifying status of DNNs as models of human vision. Behav Brain Sci 2023; 46:e415. PMID: 38054298; DOI: 10.1017/s0140525x23002777.
Abstract
On several key issues we agree with the commentators. Perhaps most importantly, everyone seems to agree that psychology has an important role to play in building better models of human vision, and (most) everyone agrees (including us) that deep neural networks (DNNs) will play an important role in modelling human vision going forward. But there are also disagreements about what models are for, how DNN-human correspondences should be evaluated, the value of alternative modelling approaches, and the impact of marketing hype in the literature. In our view, these latter issues are contributing to many unjustified claims regarding DNN-human correspondences in vision and other domains of cognition. We explore all these issues in this response.
Affiliation(s)
- Jeffrey S Bowers
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Gaurav Malhotra
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Marin Dujmović
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Milton L Montero
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Christian Tsvetkov
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Valerio Biscione
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Federico Adolfi
- Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
- John E Hummel
- Psychology Department, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Rachel F Heaton
- Psychology Department, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Benjamin D Evans
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Jeffrey Mitchell
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Ryan Blything
- School of Psychology, Aston University, Birmingham, UK
9. Op de Beeck H, Bracci S. Going after the bigger picture: Using high-capacity models to understand mind and brain. Behav Brain Sci 2023; 46:e404. PMID: 38054291; DOI: 10.1017/s0140525x2300153x.
Abstract
Deep neural networks (DNNs) provide a unique opportunity to move towards a generic modelling framework in psychology. The high representational capacity of these models combined with the possibility for further extensions has already allowed us to investigate the forest, namely the complex landscape of representations and processes that underlie human cognition, without forgetting about the trees, which include individual psychological phenomena.
Affiliation(s)
- Stefania Bracci
- Center for Mind/Brain Sciences, University of Trento, Rovereto, Italy; https://webapps.unitn.it/du/en/Persona/PER0076943/Curriculum
10. Tomizawa Y, Minamino N, Shimokawa E, Kawamura S, Komatsu A, Hiwatashi T, Nishihama R, Ueda T, Kohchi T, Kondo Y. Harnessing deep learning to analyze cryptic morphological variability of Marchantia polymorpha. Plant Cell Physiol 2023; 64:1343-1355. PMID: 37797211; DOI: 10.1093/pcp/pcad117.
Abstract
Characterizing phenotypes is a fundamental aspect of biological sciences, although it can be challenging due to various factors. For instance, the liverwort Marchantia polymorpha is a model system for plant biology and exhibits morphological variability, making it difficult to identify and quantify distinct phenotypic features using objective measures. To address this issue, we utilized a deep-learning-based image classifier that can handle plant images directly without manual extraction of phenotypic features and analyzed pictures of M. polymorpha. This dioicous plant species exhibits morphological differences between male and female wild accessions at an early stage of gemmaling growth, although it remains elusive whether the differences are attributable to sex chromosomes. To isolate the effects of sex chromosomes from autosomal polymorphisms, we established a male and female set of recombinant inbred lines (RILs) from a set of male and female wild accessions. We then trained deep learning models to classify the sexes of the RILs and the wild accessions. Our results showed that the trained classifiers accurately classified male and female gemmalings of wild accessions in the first week of growth, confirming the intuition of researchers in a reproducible and objective manner. In contrast, the RILs were less distinguishable, indicating that the differences between the parental wild accessions arose from autosomal variations. Furthermore, we validated our trained models by an 'eXplainable AI' technique that highlights image regions relevant to the classification. Our findings demonstrate that the classifier-based approach provides a powerful tool for analyzing plant species that lack standardized phenotyping metrics.
Affiliation(s)
- Yoko Tomizawa
- Quantitative Biology Research Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, 5-1 Higashiyama, Myodaiji-cho, Okazaki, Aichi, 444-8787 Japan
- Naoki Minamino
- Division of Cellular Dynamics, National Institute for Basic Biology, Nishigonaka 38, Myodaiji, Okazaki, Aichi, 444-8585 Japan
- Eita Shimokawa
- Graduate School of Biostudies, Kyoto University, Kitashirakawa-Oiwakecho, Sakyo, Kyoto, 606-8502 Japan
- Shogo Kawamura
- Graduate School of Biostudies, Kyoto University, Kitashirakawa-Oiwakecho, Sakyo, Kyoto, 606-8502 Japan
- Aino Komatsu
- Graduate School of Biostudies, Kyoto University, Kitashirakawa-Oiwakecho, Sakyo, Kyoto, 606-8502 Japan
- Takuma Hiwatashi
- Division of Cellular Dynamics, National Institute for Basic Biology, Nishigonaka 38, Myodaiji, Okazaki, Aichi, 444-8585 Japan
- Ryuichi Nishihama
- Graduate School of Biostudies, Kyoto University, Kitashirakawa-Oiwakecho, Sakyo, Kyoto, 606-8502 Japan
- Department of Applied Biological Science, Faculty of Science and Technology, Tokyo University of Science, 2641 Yamazaki, Noda, Chiba, 278-8510 Japan
- Takashi Ueda
- Division of Cellular Dynamics, National Institute for Basic Biology, Nishigonaka 38, Myodaiji, Okazaki, Aichi, 444-8585 Japan
- Department of Basic Biology, SOKENDAI (The Graduate University for Advanced Studies), Nishigonaka 38, Myodaiji, Okazaki, Aichi, 444-8585 Japan
- Takayuki Kohchi
- Graduate School of Biostudies, Kyoto University, Kitashirakawa-Oiwakecho, Sakyo, Kyoto, 606-8502 Japan
- Yohei Kondo
- Quantitative Biology Research Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, 5-1 Higashiyama, Myodaiji-cho, Okazaki, Aichi, 444-8787 Japan
- Division of Quantitative Biology, National Institute for Basic Biology, National Institutes of Natural Sciences, 5-1 Higashiyama, Myodaiji-cho, Okazaki, Aichi, 444-8787 Japan
- Department of Basic Biology, School of Life Science, SOKENDAI (The Graduate University for Advanced Studies), 5-1 Higashiyama, Myodaiji-cho, Okazaki, Aichi, 444-8787 Japan
11. Moore JA, Wilms M, Gutierrez A, Ismail Z, Fakhar K, Hadaeghi F, Hilgetag CC, Forkert ND. Simulation of neuroplasticity in a CNN-based in-silico model of neurodegeneration of the visual system. Front Comput Neurosci 2023; 17:1274824. PMID: 38105786; PMCID: PMC10722164; DOI: 10.3389/fncom.2023.1274824.
Abstract
The aim of this work was to enhance the biological feasibility of a deep convolutional neural network-based in-silico model of neurodegeneration of the visual system by equipping it with a mechanism to simulate neuroplasticity. To this end, deep convolutional networks of multiple sizes were trained for object recognition tasks and progressively lesioned to simulate neurodegeneration of the visual cortex. More specifically, the injured parts of the network remained injured while we investigated how the added retraining steps were able to recover some of the model's baseline object recognition performance. The results showed that, with retraining, the model's object recognition abilities decline more smoothly and gradually with increasing injury levels than without retraining, and are therefore more similar to the longitudinal cognitive impairments of patients diagnosed with Alzheimer's disease (AD). Moreover, with retraining, the injured model exhibits internal activation patterns more similar to those of the healthy baseline model than the injured model without retraining does. Furthermore, we conducted this analysis on a network that had been extensively pruned, resulting in an optimized number of parameters or synapses. Our findings show that this pruned network exhibited a remarkably similar capability to recover task performance despite decreasingly viable pathways through the network. In conclusion, adding a retraining step to the in-silico setup that simulates neuroplasticity considerably improves the model's biological feasibility and could prove valuable for testing different rehabilitation approaches in-silico.
Affiliation(s)
- Jasmine A. Moore
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Biomedical Engineering Program, University of Calgary, Calgary, AB, Canada
- Matthias Wilms
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Alejandro Gutierrez
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Biomedical Engineering Program, University of Calgary, Calgary, AB, Canada
- Zahinoor Ismail
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Department of Clinical Neurosciences, University of Calgary, Calgary, AB, Canada
- Kayson Fakhar
- Institute of Computational Neuroscience, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Fatemeh Hadaeghi
- Institute of Computational Neuroscience, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Claus C. Hilgetag
- Institute of Computational Neuroscience, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Department of Health Sciences, Boston University, Boston, MA, United States
- Nils D. Forkert
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
12. Gu Z, Jamison K, Sabuncu MR, Kuceyeski A. Human brain responses are modulated when exposed to optimized natural images or synthetically generated images. Commun Biol 2023; 6:1076. PMID: 37872319; PMCID: PMC10593916; DOI: 10.1038/s42003-023-05440-7.
Abstract
Understanding how human brains interpret and process information is important. Here, we investigated the selectivity and inter-individual differences in human brain responses to images via functional MRI. In our first experiment, we found that images predicted to achieve maximal activations using a group-level encoding model evoke higher responses than images predicted to achieve average activations, and the activation gain is positively associated with the encoding model accuracy. Furthermore, the anterior temporal lobe face area (aTLfaces) and fusiform body area 1 had higher activation in response to maximal synthetic images compared to maximal natural images. In our second experiment, we found that synthetic images derived using a personalized encoding model elicited higher responses compared to synthetic images from group-level or other subjects' encoding models. The finding that aTLfaces favored synthetic over natural images was also replicated. Our results indicate the possibility of using data-driven and generative approaches to modulate macro-scale brain region responses and to probe inter-individual differences in, and functional specialization of, the human visual system.
Affiliation(s)
- Zijin Gu
- School of Electrical and Computer Engineering, Cornell University and Cornell Tech, New York, NY, USA
- Keith Jamison
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
- Mert R Sabuncu
- School of Electrical and Computer Engineering, Cornell University and Cornell Tech, New York, NY, USA
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
- Amy Kuceyeski
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA.
13. Magri C, Elmoznino E, Bonner MF. Scene context is predictive of unconstrained object similarity judgments. Cognition 2023; 239:105535. PMID: 37481806; DOI: 10.1016/j.cognition.2023.105535.
Abstract
What makes objects alike in the human mind? Computational approaches for characterizing object similarity have largely focused on the visual forms of objects or their linguistic associations. However, intuitive notions of object similarity may depend heavily on contextual reasoning: objects may be grouped together in the mind if they occur in the context of similar scenes or events. Using large-scale analyses of natural scene statistics and human behavior, we found that a computational model of the associations between objects and their scene contexts is strongly predictive of how humans spontaneously group objects by similarity. Specifically, we learned contextual prototypes for a diverse set of object categories by taking the average response of a convolutional neural network (CNN) to the scene contexts in which the objects typically occurred. In behavioral experiments, we found that contextual prototypes were strongly predictive of human similarity judgments for a large set of objects and rivaled the performance of models based on CNN representations of the objects themselves or word embeddings for their names. Together, our findings reveal the remarkable degree to which the natural statistics of context predict commonsense notions of object similarity.
Affiliation(s)
- Caterina Magri
- Department of Cognitive Science, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, United States of America
- Eric Elmoznino
- Department of Cognitive Science, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, United States of America
- Michael F Bonner
- Department of Cognitive Science, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, United States of America

14
Farahat A, Effenberger F, Vinck M. A novel feature-scrambling approach reveals the capacity of convolutional neural networks to learn spatial relations. Neural Netw 2023; 167:400-414. [PMID: 37673027 DOI: 10.1016/j.neunet.2023.08.021]
Abstract
Convolutional neural networks (CNNs) are among the most successful computer vision systems for object recognition. Furthermore, CNNs have major applications in understanding the nature of visual representations in the human brain. Yet it remains poorly understood how CNNs actually make their decisions, what the nature of their internal representations is, and how their recognition strategies differ from those of humans. Specifically, there is a major debate about whether CNNs primarily rely on surface regularities of objects or whether, like humans, they can exploit the spatial arrangement of features. Here, we develop a novel feature-scrambling approach to explicitly test whether CNNs use the spatial arrangement of features (i.e., object parts) to classify objects. We combine this approach with a systematic manipulation of effective receptive field sizes of CNNs as well as minimal recognizable configurations (MIRCs) analysis. In contrast to much previous literature, we provide evidence that CNNs are in fact capable of using relatively long-range spatial relationships for object classification. Moreover, the extent to which CNNs use spatial relationships depends heavily on the dataset, e.g., texture vs. sketch. In fact, CNNs even use different strategies for different classes within heterogeneous datasets (ImageNet), suggesting CNNs have a continuous spectrum of classification strategies. Finally, we show that CNNs learn the spatial arrangement of features only up to an intermediate level of granularity, which suggests that intermediate rather than global shape features provide the optimal trade-off between sensitivity and specificity in object classification. These results provide novel insights into the nature of CNN representations and the extent to which they rely on the spatial arrangement of features for object classification.
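The paper's manipulation scrambles learned features; a simplified spatial analogue (an assumption for illustration, not the authors' exact procedure) shuffles an image's patches, preserving local statistics while destroying the global arrangement of parts:

```python
import numpy as np

def block_scramble(img, block=8, seed=0):
    """Permute non-overlapping block x block patches of a square image.

    Local texture inside each patch survives; the global spatial
    arrangement of parts does not.
    """
    h, w = img.shape
    assert h % block == 0 and w % block == 0
    # Cut the image into a flat list of patches.
    patches = (img.reshape(h // block, block, w // block, block)
                  .transpose(0, 2, 1, 3)
                  .reshape(-1, block, block))
    rng = np.random.default_rng(seed)
    patches = patches[rng.permutation(len(patches))]
    # Reassemble the shuffled patches into an image.
    return (patches.reshape(h // block, w // block, block, block)
                   .transpose(0, 2, 1, 3)
                   .reshape(h, w))

img = np.arange(64 * 64, dtype=float).reshape(64, 64)
scrambled = block_scramble(img)
```

A model that relies only on local texture should be indifferent to this transform; one that uses long-range spatial relations should not.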
Affiliation(s)
- Amr Farahat
- Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, Frankfurt, Germany; Donders Centre for Neuroscience, Department of Neuroinformatics, Radboud University, Nijmegen, The Netherlands
- Felix Effenberger
- Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, Frankfurt, Germany; Frankfurt Institute for Advanced Studies, Frankfurt, Germany
- Martin Vinck
- Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, Frankfurt, Germany; Donders Centre for Neuroscience, Department of Neuroinformatics, Radboud University, Nijmegen, The Netherlands

15
Veerabadran V, Goldman J, Shankar S, Cheung B, Papernot N, Kurakin A, Goodfellow I, Shlens J, Sohl-Dickstein J, Mozer MC, Elsayed GF. Subtle adversarial image manipulations influence both human and machine perception. Nat Commun 2023; 14:4933. [PMID: 37582834 PMCID: PMC10427626 DOI: 10.1038/s41467-023-40499-0]
Abstract
Although artificial neural networks (ANNs) were inspired by the brain, ANNs exhibit a brittleness not generally observed in human perception. One shortcoming of ANNs is their susceptibility to adversarial perturbations: subtle modulations of natural images that change classification decisions, such as confidently mislabelling an image of an elephant, initially classified correctly, as a clock. In contrast, a human observer might well dismiss the perturbations as an innocuous imaging artifact. This phenomenon may point to a fundamental difference between human and machine perception, but it drives one to ask whether human sensitivity to adversarial perturbations might be revealed with appropriate behavioral measures. Here, we find that adversarial perturbations that fool ANNs similarly bias human choice. We further show that the effect is more likely driven by higher-order statistics of natural images to which both humans and ANNs are sensitive, rather than by the detailed architecture of the ANN.
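A standard way to build such perturbations is the fast gradient sign method. The sketch below applies it to a toy logistic "network" whose input-gradient can be written out by hand; the study's actual models and attack budgets are not reproduced here:

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """Fast gradient sign perturbation for a logistic-regression model.

    loss = -log p(y|x) with p = sigmoid(w.x + b); its gradient w.r.t. the
    input is (p - y) * w, so the attack steps along sign((p - y) * w),
    bounded by eps in every coordinate.
    """
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    grad = (p - y) * w
    return x + eps * np.sign(grad)

rng = np.random.default_rng(1)
w = rng.normal(size=32)
b = 0.0
x = rng.normal(size=32)
y = 1.0 if x @ w + b > 0 else 0.0   # the model's own (correct) label

x_adv = fgsm_perturb(x, w, b, y, eps=0.3)
p_clean = 1.0 / (1.0 + np.exp(-(x @ w + b)))
p_adv = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))
```

The perturbation pushes the model's confidence away from the correct label while each pixel changes by at most `eps`, which is why such images can look innocuous to a casual observer.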
Affiliation(s)
- Vijay Veerabadran
- Google, Mountain View, CA, USA
- Department of Cognitive Science, University of California, San Diego, CA, USA
- Shreya Shankar
- Google, Mountain View, CA, USA
- University of California, Berkeley, CA, USA
- Brian Cheung
- Google, Mountain View, CA, USA
- MIT Brain and Cognitive Sciences, Cambridge, MA, USA

16
Ferrández MC, Golla SSV, Eertink JJ, de Vries BM, Lugtenburg PJ, Wiegers SE, Zwezerijnen GJC, Pieplenbosch S, Kurch L, Hüttmann A, Hanoun C, Dührsen U, de Vet HCW, Zijlstra JM, Boellaard R. An artificial intelligence method using FDG PET to predict treatment outcome in diffuse large B cell lymphoma patients. Sci Rep 2023; 13:13111. [PMID: 37573446 PMCID: PMC10423266 DOI: 10.1038/s41598-023-40218-1]
Abstract
Convolutional neural networks (CNNs) may improve response prediction in diffuse large B-cell lymphoma (DLBCL). The aim of this study was to investigate the feasibility of a CNN using maximum intensity projection (MIP) images from 18F-fluorodeoxyglucose (18F-FDG) positron emission tomography (PET) baseline scans to predict the probability of time-to-progression (TTP) within 2 years, and to compare it with the International Prognostic Index (IPI), i.e., a clinically used score. In total, 296 DLBCL 18F-FDG PET/CT baseline scans collected from a prospective clinical trial (HOVON-84) were analysed. Cross-validation was performed using coronal and sagittal MIPs. An external dataset (340 DLBCL patients) was used to validate the model. The association between the probabilities, metabolic tumour volume and Dmaxbulk was assessed. Probabilities for PET scans with synthetically removed tumours were also assessed. The CNN provided a 2-year TTP prediction with an area under the curve (AUC) of 0.74, outperforming the IPI-based model (AUC = 0.68). Furthermore, high probabilities (> 0.6) for the original MIPs were considerably decreased after removing the tumours (generally < 0.4). These findings suggest that MIP-based CNNs are able to predict treatment outcome in DLBCL.
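The MIP inputs themselves are simple to construct from a PET volume: project the maximum voxel value along one anatomical axis. A toy sketch follows; which axis corresponds to "coronal" or "sagittal" is an assumption that depends on the scan's coordinate convention:

```python
import numpy as np

# A maximum intensity projection collapses a 3D PET volume to 2D by
# keeping, for each projection ray, the highest voxel value along it.
# Toy volume: a bright focal "lesion" in a low-uptake background.
vol = np.full((16, 16, 16), 0.1)
vol[5:8, 9:12, 3:6] = 4.0   # hypothetical 3x3x3 region of high uptake

coronal_mip = vol.max(axis=0)   # axis names are illustrative only;
sagittal_mip = vol.max(axis=1)  # real orientation depends on the scanner
```

High-uptake structures survive the projection regardless of depth, which is what makes MIPs a compact 2D summary for a CNN.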
Affiliation(s)
- Maria C Ferrández
- Cancer Center Amsterdam, Department of Radiology and Nuclear Medicine, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, The Netherlands
- Sandeep S V Golla
- Cancer Center Amsterdam, Department of Radiology and Nuclear Medicine, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, The Netherlands
- Jakoba J Eertink
- Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, The Netherlands
- Cancer Center Amsterdam, Department of Hematology, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Bart M de Vries
- Cancer Center Amsterdam, Department of Radiology and Nuclear Medicine, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, The Netherlands
- Pieternella J Lugtenburg
- Department of Hematology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Sanne E Wiegers
- Cancer Center Amsterdam, Department of Radiology and Nuclear Medicine, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, The Netherlands
- Gerben J C Zwezerijnen
- Cancer Center Amsterdam, Department of Radiology and Nuclear Medicine, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, The Netherlands
- Simone Pieplenbosch
- Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, The Netherlands
- Cancer Center Amsterdam, Department of Hematology, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Lars Kurch
- Department of Nuclear Medicine, Clinic and Polyclinic for Nuclear Medicine, University of Leipzig, Leipzig, Germany
- Andreas Hüttmann
- Department of Hematology, West German Cancer Center, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Christine Hanoun
- Department of Hematology, West German Cancer Center, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Ulrich Dührsen
- Department of Hematology, West German Cancer Center, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Henrica C W de Vet
- Department of Epidemiology and Data Science, Amsterdam Public Health Research Institute, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Department of Methodology, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Josée M Zijlstra
- Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, The Netherlands
- Cancer Center Amsterdam, Department of Hematology, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Ronald Boellaard
- Cancer Center Amsterdam, Department of Radiology and Nuclear Medicine, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, The Netherlands

17
Jang H, Tong F. Improved modeling of human vision by incorporating robustness to blur in convolutional neural networks. bioRxiv [Preprint] 2023:2023.07.29.551089. [PMID: 37577646 PMCID: PMC10418076 DOI: 10.1101/2023.07.29.551089]
Abstract
Whenever a visual scene is cast onto the retina, much of it will appear degraded due to poor resolution in the periphery; moreover, optical defocus can cause blur in central vision. However, the pervasiveness of blurry or degraded input is typically overlooked in the training of convolutional neural networks (CNNs). We hypothesized that the absence of blurry training inputs may cause CNNs to rely excessively on high spatial frequency information for object recognition, thereby causing systematic deviations from biological vision. We evaluated this hypothesis by comparing standard CNNs with CNNs trained on a combination of clear and blurry images. We show that blur-trained CNNs outperform standard CNNs at predicting neural responses to objects across a variety of viewing conditions. Moreover, blur-trained CNNs acquire increased sensitivity to shape information and greater robustness to multiple forms of visual noise, leading to improved correspondence with human perception. Our results provide novel neurocomputational evidence that blurry visual experiences are very important for conferring robustness to biological visual systems.
Affiliation(s)
- Hojin Jang
- Department of Psychology and Vanderbilt Vision Research Center, Vanderbilt University
- Frank Tong
- Department of Psychology and Vanderbilt Vision Research Center, Vanderbilt University

18
Celeghin A, Borriero A, Orsenigo D, Diano M, Méndez Guerrero CA, Perotti A, Petri G, Tamietto M. Convolutional neural networks for vision neuroscience: significance, developments, and outstanding issues. Front Comput Neurosci 2023; 17:1153572. [PMID: 37485400 PMCID: PMC10359983 DOI: 10.3389/fncom.2023.1153572]
Abstract
Convolutional Neural Networks (CNNs) are a class of machine learning models predominately used in computer vision tasks and can achieve human-like performance through learning from experience. Their striking similarities to the structural and functional principles of the primate visual system allow for comparisons between these artificial networks and their biological counterparts, enabling exploration of how visual functions and neural representations may emerge in the real brain from a limited set of computational principles. After considering the basic features of CNNs, we discuss the opportunities and challenges of endorsing CNNs as in silico models of the primate visual system. Specifically, we highlight several emerging notions about the anatomical and physiological properties of the visual system that still need to be systematically integrated into current CNN models. These tenets include the implementation of parallel processing pathways from the early stages of retinal input and the reconsideration of several assumptions concerning the serial progression of information flow. We suggest design choices and architectural constraints that could facilitate a closer alignment with biology and provide causal evidence of the predictive link between the artificial and biological visual systems. Adopting this principled perspective could potentially lead to new research questions and applications of CNNs beyond modeling object recognition.
Affiliation(s)
- Davide Orsenigo
- Department of Psychology, University of Torino, Turin, Italy
- Matteo Diano
- Department of Psychology, University of Torino, Turin, Italy
- Marco Tamietto
- Department of Psychology, University of Torino, Turin, Italy
- Department of Medical and Clinical Psychology, and CoRPS (Center of Research on Psychology in Somatic Diseases), Tilburg University, Tilburg, Netherlands

19
Bracci S, Mraz J, Zeman A, Leys G, Op de Beeck H. The representational hierarchy in human and artificial visual systems in the presence of object-scene regularities. PLoS Comput Biol 2023; 19:e1011086. [PMID: 37115763 PMCID: PMC10171658 DOI: 10.1371/journal.pcbi.1011086]
Abstract
Human vision is still largely unexplained. Computer vision has made impressive progress on this front, but it is still unclear to what extent artificial neural networks approximate human object vision at the behavioral and neural levels. Here, we investigated whether machine object vision mimics the representational hierarchy of human object vision with an experimental design that allows testing within-domain representations for animals and scenes, as well as across-domain representations reflecting their real-world contextual regularities, such as animal-scene pairs that often co-occur in the visual environment. We found that DCNNs trained on object recognition acquire representations, in their late processing stage, that closely capture human conceptual judgements about the co-occurrence of animals and their typical scenes. Likewise, the DCNNs' representational hierarchy shows surprising similarities with the representational transformations emerging in domain-specific ventrotemporal areas up to domain-general frontoparietal areas. Despite these remarkable similarities, the underlying information processing differs. The ability of neural networks to learn a human-like high-level conceptual representation of object-scene co-occurrence depends upon the amount of object-scene co-occurrence present in the image set, thus highlighting the fundamental role of training history. Further, although mid/high-level DCNN layers represent the category division for animals and scenes as observed in VTC, their information content shows reduced domain-specific representational richness. To conclude, by testing within- and between-domain selectivity while manipulating contextual regularities, we reveal unknown similarities and differences in the information processing strategies employed by human and artificial visual systems.
Affiliation(s)
- Stefania Bracci
- Center for Mind/Brain Sciences (CIMeC), University of Trento, Rovereto, Italy
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Jakob Mraz
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Astrid Zeman
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Gaëlle Leys
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Hans Op de Beeck
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium

20
Hawkins RD, Sano M, Goodman ND, Fan JE. Visual resemblance and interaction history jointly constrain pictorial meaning. Nat Commun 2023; 14:2199. [PMID: 37069160 PMCID: PMC10110538 DOI: 10.1038/s41467-023-37737-w]
Abstract
How do drawings, ranging from detailed illustrations to schematic diagrams, reliably convey meaning? Do viewers understand drawings based on how strongly they resemble an entity (i.e., as images) or based on socially mediated conventions (i.e., as symbols)? Here we evaluate a cognitive account of pictorial meaning in which visual and social information jointly support visual communication. Pairs of participants used drawings to repeatedly communicate the identity of a target object among multiple distractor objects. We manipulated social cues across three experiments and a full replication, finding that participants developed object-specific and interaction-specific strategies for communicating more efficiently over time, beyond what task practice or a resemblance-based account alone could explain. Leveraging model-based image analyses and crowdsourced annotations, we further determined that drawings did not drift toward "arbitrariness," as predicted by a pure convention-based account, but preserved visually diagnostic features. Taken together, these findings advance psychological theories of how successful graphical conventions emerge.
Affiliation(s)
- Robert D Hawkins
- Department of Psychology, Stanford University, Stanford, CA, USA
- Department of Psychology, Princeton University, Princeton, NJ, USA
- Megumi Sano
- Department of Psychology, Stanford University, Stanford, CA, USA
- Noah D Goodman
- Department of Psychology, Stanford University, Stanford, CA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Judith E Fan
- Department of Psychology, Stanford University, Stanford, CA, USA
- Department of Psychology, University of California, San Diego, CA, USA

21
Tsvetkov C, Malhotra G, Evans BD, Bowers JS. The role of capacity constraints in Convolutional Neural Networks for learning random versus natural data. Neural Netw 2023; 161:515-524. [PMID: 36805266 DOI: 10.1016/j.neunet.2023.01.011]
Abstract
Convolutional neural networks (CNNs) are often described as promising models of human vision, yet they show many differences from human abilities. We focus on a superhuman capacity of top-performing CNNs, namely, their ability to learn very large datasets of random patterns. We verify that human learning on such tasks is extremely limited, even with few stimuli. We argue that the performance difference is due to CNNs' overcapacity and introduce biologically inspired mechanisms to constrain it, while retaining the good test-set generalisation to structured images that is characteristic of CNNs. We investigate the efficacy of adding noise to hidden units' activations, restricting early convolutional layers with a bottleneck, and using a bounded activation function. Internal noise was the most potent intervention and the only one which, by itself, could reduce random data performance in the tested models to chance levels. We also investigated whether networks with biologically inspired capacity constraints show improved generalisation to out-of-distribution stimuli; however, little benefit was observed. Our results suggest that constraining networks with biologically motivated mechanisms paves the way for closer correspondence between network and human performance, but the few manipulations we have tested are only a small step towards that goal.
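The strongest intervention reported, internal noise, amounts to corrupting hidden activations during the forward pass. A minimal sketch of such a noisy unit follows; this is a generic construction, not the paper's exact parameterisation:

```python
import numpy as np

def noisy_relu(x, sigma, rng):
    """ReLU whose pre-activations are corrupted by additive Gaussian
    noise, one biologically inspired way to limit how precisely a unit
    can transmit any single value (and hence the layer's capacity)."""
    return np.maximum(0.0, x + rng.normal(scale=sigma, size=x.shape))

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 10_000)

clean = np.maximum(0.0, x)          # the deterministic transfer function
noisy = noisy_relu(x, sigma=0.5, rng=rng)
```

On average the noisy unit tracks the clean one closely (rectification adds a small upward bias), but no individual output can be trusted to fine precision, which is what blocks the memorisation of arbitrary random labels.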
Affiliation(s)
- Christian Tsvetkov
- School of Psychological Science, University of Bristol, 12a Priory Road, Bristol BS8 1TU, UK
- Gaurav Malhotra
- School of Psychological Science, University of Bristol, 12a Priory Road, Bristol BS8 1TU, UK
- Benjamin D Evans
- School of Psychological Science, University of Bristol, 12a Priory Road, Bristol BS8 1TU, UK; Department of Informatics, School of Engineering and Informatics, University of Sussex, Falmer, Brighton, BN1 9RH, UK
- Jeffrey S Bowers
- School of Psychological Science, University of Bristol, 12a Priory Road, Bristol BS8 1TU, UK

22
Fan J, Zeng Y. Challenging deep learning models with image distortion based on the abutting grating illusion. Patterns (N Y) 2023; 4:100695. [PMID: 36960449 PMCID: PMC10028432 DOI: 10.1016/j.patter.2023.100695]
Abstract
Even state-of-the-art deep learning models lack fundamental abilities compared with humans. While many image distortions have been proposed to compare deep learning with humans, they depend on mathematical transformations instead of human cognitive functions. Here, we propose an image distortion based on the abutting grating illusion, which is a phenomenon discovered in humans and animals. The distortion generates illusory contour perception using line gratings abutting each other. We applied the method to MNIST, high-resolution MNIST, and "16-class-ImageNet" silhouettes. Many models, including models trained from scratch and 109 models pretrained with ImageNet or various data augmentation techniques, were tested. Our results show that abutting grating distortion is challenging even for state-of-the-art deep learning models. We discovered that DeepAugment models outperformed other pretrained models. Visualization of early layers indicates that better-performing models exhibit the endstopping property, which is consistent with neuroscience discoveries. Twenty-four human subjects classified distorted samples to validate the distortion.
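The stimulus itself is straightforward to generate: two fields of line gratings meet out of phase, inducing an illusory contour at the junction even though no luminance edge exists there. A minimal sketch, with image size, grating period, and edge position chosen arbitrarily for illustration:

```python
import numpy as np

def abutting_grating(size=64, period=8, edge_col=32):
    """Illusory vertical contour at edge_col: horizontal line gratings
    abut with opposite phase, so no physical luminance edge is present."""
    img = np.zeros((size, size))
    rows = np.arange(size)
    img[rows % period < period // 2, :edge_col] = 1.0               # phase 0
    img[(rows + period // 2) % period < period // 2, edge_col:] = 1.0  # shifted
    return img

g = abutting_grating()
```

The two halves have identical mean luminance and are exact row-wise complements, so any "edge" a model (or observer) reports at the junction must be inferred rather than measured.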
Affiliation(s)
- Jinyu Fan
- Brain-inspired Cognitive Intelligence Lab, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Yi Zeng
- Brain-inspired Cognitive Intelligence Lab, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China

23
Bracci S, Op de Beeck HP. Understanding Human Object Vision: A Picture Is Worth a Thousand Representations. Annu Rev Psychol 2023; 74:113-135. [PMID: 36378917 DOI: 10.1146/annurev-psych-032720-041031]
Abstract
Objects are the core meaningful elements in our visual environment. Classic theories of object vision focus upon object recognition and are elegant and simple. Some of their proposals still stand, yet the simplicity is gone. Recent evolutions in behavioral paradigms, neuroscientific methods, and computational modeling have allowed vision scientists to uncover the complexity of the multidimensional representational space that underlies object vision. We review these findings and propose that the key to understanding this complexity is to relate object vision to the full repertoire of behavioral goals that underlie human behavior, running far beyond object recognition. There might be no such thing as core object recognition, and if it exists, then its importance is more limited than traditionally thought.
Affiliation(s)
- Stefania Bracci
- Center for Mind/Brain Sciences, University of Trento, Rovereto, Italy
- Hans P Op de Beeck
- Leuven Brain Institute, Research Unit Brain & Cognition, KU Leuven, Leuven, Belgium

24
Jha A, Peterson JC, Griffiths TL. Extracting Low-Dimensional Psychological Representations from Convolutional Neural Networks. Cogn Sci 2023; 47:e13226. [PMID: 36617318 DOI: 10.1111/cogs.13226]
Abstract
Convolutional neural networks (CNNs) are increasingly widely used in psychology and neuroscience to predict how human minds and brains respond to visual images. Typically, CNNs represent these images using thousands of features that are learned through extensive training on image datasets. This raises a question: How many of these features are really needed to model human behavior? Here, we attempt to estimate the number of dimensions in CNN representations that are required to capture human psychological representations in two ways: (1) directly, using human similarity judgments and (2) indirectly, in the context of categorization. In both cases, we find that low-dimensional projections of CNN representations are sufficient to predict human behavior. We show that these low-dimensional representations can be easily interpreted, providing further insight into how people represent visual information. A series of control studies indicate that these findings are not due to the size of the dataset we used and may be due to a high level of redundancy in the features appearing in CNN representations.
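Such low-dimensional projections are typically obtained with PCA; a sketch via the SVD follows, using synthetic "CNN features" whose variance is concentrated in a few latent dimensions (the item counts, feature counts, and data here are invented for illustration):

```python
import numpy as np

def project_low_dim(features, k):
    """Project feature vectors (n_items x n_features) onto their top-k
    principal components; also return the full singular-value spectrum."""
    centered = features - features.mean(axis=0)
    # SVD of the centered matrix: rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T, s

rng = np.random.default_rng(0)
# Stand-in features: 200 items, 1000 dims, variance dominated by 5 latents.
latent = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 1000)) * 3.0
features = latent + rng.normal(size=(200, 1000)) * 0.1

proj, s = project_low_dim(features, k=5)
explained = (s[:5] ** 2).sum() / (s ** 2).sum()
```

When the underlying representation is genuinely low-dimensional, a handful of components captures nearly all the variance, which is the redundancy the abstract points to.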
Affiliation(s)
- Aditi Jha
- Department of Electrical and Computer Engineering, Princeton University; Princeton Neuroscience Institute, Princeton University
- Thomas L Griffiths
- Department of Computer Science, Princeton University; Department of Psychology, Princeton University

25
Early experience with low-pass filtered images facilitates visual category learning in a neural network model. PLoS One 2023; 18:e0280145. [PMID: 36608003 PMCID: PMC9821476 DOI: 10.1371/journal.pone.0280145]
Abstract
Humans are born with very low contrast sensitivity, meaning that inputs to the infant visual system are both blurry and low contrast. Is this solely a byproduct of maturational processes, or is there a functional advantage to beginning life with poor visual acuity? We addressed the impact of poor vision during early learning by exploring whether reduced visual acuity facilitated the acquisition of basic-level categories in a convolutional neural network model (CNN), as well as whether any such benefit transferred to subordinate-level category learning. Using the ecoset dataset to simulate basic-level category learning, we manipulated model training curricula along three dimensions: presence of blurred inputs early in training, rate of blur reduction over time, and grayscale versus color inputs. First, a training regime where blur was initially high and was gradually reduced over time, as in human development, improved basic-level categorization performance in a CNN relative to a regime in which non-blurred inputs were used throughout training. Second, when basic-level models were fine-tuned on a task including both basic-level and subordinate-level categories (using the ImageNet dataset), models initially trained with blurred inputs showed a greater performance benefit as compared to models trained exclusively on non-blurred inputs, suggesting that the benefit of blurring generalized from basic-level to subordinate-level categorization. Third, analogous to the low sensitivity to color that infants experience during the first 4-6 months of development, these advantages were observed only when grayscale images were used as inputs. We conclude that poor visual acuity in human newborns may confer functional advantages, including, as demonstrated here, more rapid and accurate acquisition of visual object categories at multiple levels.
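The developmental curriculum above can be sketched as a blur schedule applied to training inputs; the linear decay and the 1-D Gaussian blur below are illustrative assumptions, not the paper's exact schedule or image pipeline:

```python
import numpy as np

def blur_sigma(epoch, n_epochs, sigma_start=4.0):
    """Curriculum: blur strength starts high and decays linearly to zero
    over training, mimicking the developmental sharpening of acuity."""
    return sigma_start * max(0.0, 1.0 - epoch / (n_epochs - 1))

def gaussian_blur_1d(signal, sigma):
    """Separable Gaussian blur of a 1-D signal (stand-in for image blur)."""
    if sigma == 0:
        return signal.copy()
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    return np.convolve(signal, kernel, mode="same")

sigmas = [blur_sigma(e, 10) for e in range(10)]

edge = np.r_[np.zeros(50), np.ones(50)]        # a sharp luminance edge
early = gaussian_blur_1d(edge, sigmas[0])      # what the model sees first
late = gaussian_blur_1d(edge, sigmas[-1])      # what it sees at the end
```

Early inputs carry only coarse, low-spatial-frequency structure; sharp, high-frequency detail is introduced gradually as the schedule decays.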
26
Bowers JS, Malhotra G, Dujmović M, Llera Montero M, Tsvetkov C, Biscione V, Puebla G, Adolfi F, Hummel JE, Heaton RF, Evans BD, Mitchell J, Blything R. Deep problems with neural network models of human vision. Behav Brain Sci 2022; 46:e385. [PMID: 36453586 DOI: 10.1017/s0140525x22002813]
Abstract
Deep neural networks (DNNs) have had extraordinary successes in classifying photographic images of objects and are often described as the best models of biological vision. This conclusion is largely based on three sets of findings: (1) DNNs are more accurate than any other model in classifying images taken from various datasets, (2) DNNs do the best job in predicting the pattern of human errors in classifying objects taken from various behavioral datasets, and (3) DNNs do the best job in predicting brain signals in response to images taken from various brain datasets (e.g., single cell responses or fMRI data). However, these behavioral and brain datasets do not test hypotheses regarding what features are contributing to good predictions and we show that the predictions may be mediated by DNNs that share little overlap with biological vision. More problematically, we show that DNNs account for almost no results from psychological research. This contradicts the common claim that DNNs are good, let alone the best, models of human object recognition. We argue that theorists interested in developing biologically plausible models of human vision need to direct their attention to explaining psychological findings. More generally, theorists need to build models that explain the results of experiments that manipulate independent variables designed to test hypotheses rather than compete on making the best predictions. We conclude by briefly summarizing various promising modeling approaches that focus on psychological data.
Collapse
Affiliation(s)
- Jeffrey S Bowers
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
| | - Gaurav Malhotra
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
| | - Marin Dujmović
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
| | - Milton Llera Montero
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
| | - Christian Tsvetkov
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
| | - Valerio Biscione
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
| | - Guillermo Puebla
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
| | - Federico Adolfi
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
- Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
| | - John E Hummel
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
| | - Rachel F Heaton
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
| | - Benjamin D Evans
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
| | - Jeffrey Mitchell
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
| | - Ryan Blything
- School of Psychology, Aston University, Birmingham, UK
| |
Collapse
|
27
|
Li YF, Ying H. Disrupted visual input unveils the computational details of artificial neural networks for face perception. Front Comput Neurosci 2022; 16:1054421. [PMID: 36523327 PMCID: PMC9744930 DOI: 10.3389/fncom.2022.1054421] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2022] [Accepted: 11/10/2022] [Indexed: 09/19/2023] Open
Abstract
Background: The deep convolutional neural network (DCNN), with its great performance, has attracted the attention of researchers from many disciplines. Studies of DCNNs and of biological neural systems have inspired each other reciprocally: brain-inspired neural networks not only achieve great performance but also serve as computational models of biological neural systems. Methods: In this study, we trained and tested several typical DCNNs (AlexNet, VGG11, VGG13, VGG16, DenseNet, MobileNet, and EfficientNet) on a face-ethnicity categorization task (experiment 1) and an emotion categorization task (experiment 2). We measured the performance of the DCNNs on original and lossy visual inputs (various kinds of image occlusion) and compared their performance with that of human participants. Moreover, the class activation map (CAM) method allowed us to visualize the foci of the "attention" of these DCNNs. Results: VGG13 performed best: its performance closely resembled that of human participants in psychophysical measurements, it utilized similar areas of the visual input as humans, and its performance was the most consistent across inputs with various kinds of impairment. Discussion: We examined the processing mechanisms of DCNNs using a new paradigm and found that VGG13 may be the most human-like DCNN for this task. This study also highlights a possible paradigm for studying and developing DCNNs using human perception as a benchmark.
Collapse
Affiliation(s)
| | - Haojiang Ying
- Department of Psychology, Soochow University, Suzhou, China
| |
Collapse
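The class activation map (CAM) method mentioned in the abstract above has a simple closed form: each class's map is a weighted sum of the final convolutional feature maps, using that class's weights from the linear layer that follows global average pooling. A minimal NumPy sketch on toy data (not the authors' code; array shapes are illustrative):

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Class activation map: weight each final-conv feature map by the
    corresponding weight of the target class in the linear layer that
    follows global average pooling, then sum over channels."""
    w = fc_weights[class_idx]                    # (K,) weights for this class
    cam = np.tensordot(w, feature_maps, axes=1)  # sum_k w_k * A_k -> (H, W)
    cam -= cam.min()                             # normalize to [0, 1] for display
    if cam.max() > 0:
        cam /= cam.max()
    return cam

# Toy example: K=3 feature maps of size 4x4, 2 output classes
rng = np.random.default_rng(0)
maps = rng.random((3, 4, 4))      # stand-in for last-conv-layer activations
weights = rng.random((2, 3))      # stand-in for the trained linear layer
cam = class_activation_map(maps, weights, class_idx=1)
```

In practice the map is upsampled to the input resolution and overlaid on the image to show which regions drove the classification.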
|
28
|
Xu Y, Vaziri-Pashkam M. Understanding transformation tolerant visual object representations in the human brain and convolutional neural networks. Neuroimage 2022; 263:119635. [PMID: 36116617 DOI: 10.1016/j.neuroimage.2022.119635] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2022] [Revised: 09/12/2022] [Accepted: 09/14/2022] [Indexed: 11/16/2022] Open
Abstract
Forming transformation-tolerant object representations is critical to high-level primate vision. Despite its significance, many details of tolerance in the human brain remain unknown. Likewise, despite the ability of convolutional neural networks (CNNs) to exhibit human-like object categorization performance, whether CNNs form tolerance similar to that of the human brain is unknown. Here we provide the first comprehensive documentation and comparison of three tolerance measures in the human brain and CNNs. We measured fMRI responses from human ventral visual areas to real-world objects across both Euclidean and non-Euclidean feature changes. In single fMRI voxels in higher visual areas, we observed robust object response rank-order preservation across feature changes. This is indicative of functional smoothness in tolerance at the fMRI meso-scale level that has never been reported before. At the voxel population level, we found highly consistent object representational structure across feature changes towards the end of ventral processing. Rank-order preservation, consistency, and a third tolerance measure, cross-decoding success (i.e., a linear classifier's ability to generalize performance across feature changes) showed an overall tight coupling. These tolerance measures were in general lower for Euclidean than non-Euclidean feature changes in lower visual areas, but increased over the course of ventral processing for all feature changes. These characteristics of tolerance, however, were absent in eight CNNs pretrained with ImageNet images with varying network architecture, depth, the presence/absence of recurrent processing, or whether a network was pretrained with the original or stylized ImageNet images that encouraged shape processing. CNNs do not appear to develop the same kind of tolerance as the human brain over the course of visual processing.
Collapse
Affiliation(s)
- Yaoda Xu
- Psychology Department, Yale University, New Haven, CT 06520, USA.
| | | |
Collapse
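The cross-decoding measure described in the abstract above (train a classifier on responses to one feature condition, test on another) can be sketched with simulated voxel patterns. Here a nearest-centroid linear classifier stands in for the paper's decoder, and all data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels, n_trials = 50, 40
# Fixed voxel-pattern "signatures" for two object classes
sig = {0: rng.normal(0, 1, n_voxels), 1: rng.normal(0, 1, n_voxels)}

def condition(shift):
    """Simulate one feature condition: class signatures plus a
    condition-specific additive shift and trial-by-trial noise."""
    X = np.vstack([sig[c] + shift + rng.normal(0, 0.5, (n_trials, n_voxels))
                   for c in (0, 1)])
    y = np.repeat([0, 1], n_trials)
    return X, y

# Train on one feature condition (e.g. objects at the original size)...
X_tr, y_tr = condition(np.zeros(n_voxels))
centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in (0, 1)])

# ...and test on another (e.g. size-changed objects): cross-decoding
X_te, y_te = condition(rng.normal(0, 0.3, n_voxels))
pred = np.argmin(((X_te[:, None, :] - centroids) ** 2).sum(axis=-1), axis=1)
cross_acc = (pred == y_te).mean()
```

High cross-decoding accuracy in this simulation reflects tolerance by construction: the class signatures survive the feature change. The paper's CNN result corresponds to this accuracy dropping when a representation fails to generalize.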
|
29
|
Baker N, Elder JH. Deep learning models fail to capture the configural nature of human shape perception. iScience 2022; 25:104913. [PMID: 36060067 PMCID: PMC9429800 DOI: 10.1016/j.isci.2022.104913] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2022] [Revised: 05/06/2022] [Accepted: 08/08/2022] [Indexed: 11/26/2022] Open
|
30
|
Abstract
Sentences contain structure that determines their meaning beyond that of the individual words. An influential study by Ding and colleagues (2016) used frequency tagging of phrases and sentences to show that the human brain is sensitive to this structure, finding peaks of neural power at the rates at which the structures were presented. Since then, there has been a rich debate on how best to explain this pattern of results, with profound impact on the language sciences. Models that use hierarchical structure building, as well as models based on associative sequence processing, can predict the neural response, creating an inferential impasse as to which class of models explains the nature of the linguistic computations reflected in the neural readout. In the current manuscript, we discuss pitfalls and common fallacies in the conclusions drawn in the literature, illustrated by various simulations. We conclude that inferring the neural operations of sentence processing from these neural data alone, or any data like them, is insufficient. We discuss how best to evaluate models and how to approach the modeling of neural readouts of sentence processing in a manner that remains faithful to cognitive, neural, and linguistic principles.
Collapse
Affiliation(s)
- Sanne Ten Oever
- Language and Computation in Neural Systems Group, Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, the Netherlands
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands
| | - Karthikeya Kaushik
- Language and Computation in Neural Systems Group, Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, the Netherlands
| | - Andrea E. Martin
- Language and Computation in Neural Systems Group, Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, the Netherlands
| |
Collapse
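Frequency tagging, as used in the Ding et al. (2016) study discussed above, looks for peaks of spectral power at the rates at which words, phrases, and sentences are presented. A toy simulation of the analysis (synthetic signal; the rates follow the Ding et al. design but everything else is illustrative):

```python
import numpy as np

fs, dur = 100, 40.0                       # sampling rate (Hz), duration (s)
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(2)

# Words presented at 4 Hz, phrases at 2 Hz, sentences at 1 Hz,
# embedded in white noise
signal = (1.0 * np.sin(2 * np.pi * 4 * t)
          + 0.6 * np.sin(2 * np.pi * 2 * t)
          + 0.4 * np.sin(2 * np.pi * 1 * t)
          + rng.normal(0, 1, t.size))

power = np.abs(np.fft.rfft(signal)) ** 2
freqs = np.fft.rfftfreq(t.size, 1 / fs)

def peak_power(f):
    """Spectral power at the bin closest to frequency f."""
    return power[np.argmin(np.abs(freqs - f))]

# Tagged rates should stand out against the surrounding spectrum
baseline = np.median(power[(freqs > 0.5) & (freqs < 8)])
```

The debate summarized in the abstract is precisely that such peaks, on their own, do not discriminate hierarchical structure building from associative sequence processing: both model classes can produce them.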
|
31
|
Fiser J, Lengyel G. Statistical Learning in Vision. Annu Rev Vis Sci 2022; 8:265-290. [PMID: 35727961 DOI: 10.1146/annurev-vision-100720-103343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Vision and learning have long been considered to be two areas of research linked only distantly. However, recent developments in vision research have changed the conceptual definition of vision from a signal-evaluating process to a goal-oriented interpreting process, and this shift binds learning, together with the resulting internal representations, intimately to vision. In this review, we consider various types of learning (perceptual, statistical, and rule/abstract) associated with vision in the past decades and argue that they represent differently specialized versions of the fundamental learning process, which must be captured in its entirety when applied to complex visual processes. We show why the generalized version of statistical learning can provide the appropriate setup for such a unified treatment of learning in vision, what computational framework best accommodates this kind of statistical learning, and what plausible neural scheme could feasibly implement this framework. Finally, we list the challenges that the field of statistical learning faces in fulfilling the promise of being the right vehicle for advancing our understanding of vision in its entirety.
Collapse
Affiliation(s)
- József Fiser
- Department of Cognitive Science, Center for Cognitive Computation, Central European University, Vienna 1100, Austria
| | - Gábor Lengyel
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
| |
Collapse
|
32
|
Malhotra G, Dujmović M, Bowers JS. Feature blindness: A challenge for understanding and modelling visual object recognition. PLoS Comput Biol 2022; 18:e1009572. [PMID: 35560155 PMCID: PMC9132323 DOI: 10.1371/journal.pcbi.1009572] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Revised: 05/25/2022] [Accepted: 03/19/2022] [Indexed: 12/02/2022] Open
Abstract
Humans rely heavily on the shape of objects to recognise them. Recently, it has been argued that Convolutional Neural Networks (CNNs) can also show a shape-bias, provided their learning environment contains this bias. This has led to the proposal that CNNs provide good mechanistic models of shape-bias and, more generally, human visual processing. However, it is also possible that humans and CNNs show a shape-bias for very different reasons, namely, shape-bias in humans may be a consequence of architectural and cognitive constraints whereas CNNs show a shape-bias as a consequence of learning the statistics of the environment. We investigated this question by exploring shape-bias in humans and CNNs when they learn in a novel environment. We observed that, in this new environment, humans (i) focused on shape and overlooked many non-shape features, even when non-shape features were more diagnostic, (ii) learned based on only one out of multiple predictive features, and (iii) failed to learn when global features, such as shape, were absent. This behaviour contrasted with the predictions of a statistical inference model with no priors, showing the strong role that shape-bias plays in human feature selection. It also contrasted with CNNs that (i) preferred to categorise objects based on non-shape features, and (ii) increased reliance on these non-shape features as they became more predictive. This was the case even when the CNN was pre-trained to have a shape-bias and the convolutional backbone was frozen. These results suggest that shape-bias has a different source in humans and CNNs: while learning in CNNs is driven by the statistical properties of the environment, humans are highly constrained by their previous biases, which suggests that cognitive constraints play a key role in how humans learn to recognise novel objects.

Any object consists of hundreds of visual features that can be used to recognise it. How do humans select which feature to use? Do we always choose features that are best at predicting the object? In a series of experiments using carefully designed stimuli, we find that humans frequently ignore many features that are clearly visible and highly predictive. This behaviour is statistically inefficient and we show that it contrasts with statistical inference models such as state-of-the-art neural networks. Unlike humans, these models learn to rely on the most predictive feature when trained on the same data. We argue that the reason underlying human behaviour may be a bias to look for features that are less hungry for cognitive resources and generalise better to novel instances. Models that incorporate cognitive constraints may not only allow us to better understand human vision but also help us develop machine learning models that are more robust to changes in incidental features of objects.
Collapse
Affiliation(s)
- Gaurav Malhotra
- School of Psychological Sciences, University of Bristol, Bristol, United Kingdom
| | - Marin Dujmović
- School of Psychological Sciences, University of Bristol, Bristol, United Kingdom
| | - Jeffrey S. Bowers
- School of Psychological Sciences, University of Bristol, Bristol, United Kingdom
| |
Collapse
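The contrast the abstract draws — a purely statistical learner should come to rely on whichever feature is most predictive, while humans keep relying on shape — can be illustrated with a toy simulation (synthetic labels and feature diagnosticities, not the paper's stimuli):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
y = rng.integers(0, 2, n)  # object category on each trial

# "Shape" cue agrees with the category on 80% of trials; a local
# "non-shape" cue (e.g. a coloured patch) agrees on 95% of trials
shape = np.where(rng.random(n) < 0.80, y, 1 - y)
nonshape = np.where(rng.random(n) < 0.95, y, 1 - y)

acc_shape = (shape == y).mean()
acc_nonshape = (nonshape == y).mean()

# A learner that simply tracks predictiveness should pick the non-shape cue;
# the paper reports that humans nevertheless keep relying on shape
best_feature = "non-shape" if acc_nonshape > acc_shape else "shape"
```

The statistically efficient choice here is the non-shape cue, which is what the paper reports CNNs do; human participants' persistent shape reliance is what makes their behaviour "feature blind" in the authors' terms.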
|
33
|
Tiedemann H, Morgenstern Y, Schmidt F, Fleming RW. One-shot generalization in humans revealed through a drawing task. eLife 2022; 11:75485. [PMID: 35536739 PMCID: PMC9090327 DOI: 10.7554/elife.75485] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2021] [Accepted: 05/01/2022] [Indexed: 11/13/2022] Open
Abstract
Humans have the amazing ability to learn new visual concepts from just a single exemplar. How we achieve this remains mysterious. State-of-the-art theories suggest observers rely on internal 'generative models', which not only describe observed objects, but can also synthesize novel variations. However, compelling evidence for generative models in human one-shot learning remains sparse. In most studies, participants merely compare candidate objects created by the experimenters, rather than generating their own ideas. Here, we overcame this key limitation by presenting participants with 2D 'Exemplar' shapes and asking them to draw their own 'Variations' belonging to the same class. The drawings reveal that participants inferred-and synthesized-genuine novel categories that were far more varied than mere copies. Yet, there was striking agreement between participants about which shape features were most distinctive, and these tended to be preserved in the drawn Variations. Indeed, swapping distinctive parts caused objects to swap apparent category. Our findings suggest that internal generative models are key to how humans generalize from single exemplars. When observers see a novel object for the first time, they identify its most distinctive features and infer a generative model of its shape, allowing them to mentally synthesize plausible variants.
Collapse
Affiliation(s)
- Henning Tiedemann
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
| | - Yaniv Morgenstern
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
- Laboratory of Experimental Psychology, University of Leuven (KU Leuven), Leuven, Belgium
| | - Filipp Schmidt
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
- Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Giessen, Germany
| | - Roland W Fleming
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
- Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Giessen, Germany
| |
Collapse
|
34
|
Dai D, Li Y, Wang Y, Bao H, Wang G. Rethinking the image feature biases exhibited by deep convolutional neural network models in image recognition. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY 2022. [DOI: 10.1049/cit2.12097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Affiliation(s)
- Dawei Dai
- College of Computer Science and Technology Chongqing University of Posts and Telecommunications Chongqing China
| | - Yutang Li
- College of Computer Science and Technology Chongqing University of Posts and Telecommunications Chongqing China
| | - Yuqi Wang
- College of Computer Science and Technology Chongqing University of Posts and Telecommunications Chongqing China
| | - Huanan Bao
- College of Computer Science and Technology Chongqing University of Posts and Telecommunications Chongqing China
| | - Guoyin Wang
- College of Computer Science and Technology Chongqing University of Posts and Telecommunications Chongqing China
| |
Collapse
|
35
|
BTN: Neuroanatomical aligning between visual object tracking in deep neural network and smooth pursuit in brain. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.02.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
36
|
Charles Leek E, Leonardis A, Heinke D. Deep neural networks and image classification in biological vision. Vision Res 2022; 197:108058. [PMID: 35487146 DOI: 10.1016/j.visres.2022.108058] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2021] [Revised: 04/12/2022] [Accepted: 04/13/2022] [Indexed: 10/18/2022]
Abstract
In this paper we consider recent advances in the use of deep convolutional neural networks to understanding biological vision. We focus on claims about the plausibility of feedforward deep convolutional neural networks (fDCNNs) as models of image classification in the biological system. Despite the putative similarity of these networks to some properties of the biological vision system, and the remarkable levels of performance accuracy of some fDCNNs, we argue that their plausibility as a framework for understanding image classification remains unclear. We highlight two key issues that we suggest are relevant to the evaluation of any form of DNN used to examine biological vision: (1) Network transparency under analysis - that is, the challenge of understanding what networks do, and how they do it. (2) Identifying appropriate benchmarks for comparing network performance and the biological system using both quantitative and qualitative performance measures. We show that there are important divergences between fDCNNs and biological vision that reflect fundamental differences in computational architectures, and representational structures, supporting image classification in these networks and the biological system.
Collapse
Affiliation(s)
| | | | - Dietmar Heinke
- School of Computer Science, University of Birmingham, UK
| |
Collapse
|
37
|
Zhou L, Yang A, Meng M, Zhou K. Emerged human-like facial expression representation in a deep convolutional neural network. SCIENCE ADVANCES 2022; 8:eabj4383. [PMID: 35319988 PMCID: PMC8942361 DOI: 10.1126/sciadv.abj4383] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/12/2021] [Accepted: 02/02/2022] [Indexed: 06/14/2023]
Abstract
Recent studies found that deep convolutional neural networks (DCNNs) trained to recognize facial identities spontaneously learned features that support facial expression recognition, and vice versa. Here, we showed that the self-emerged expression-selective units in a VGG-Face trained for facial identification were tuned to distinct basic expressions and, importantly, exhibited hallmarks of human expression recognition (i.e., facial expression confusion and categorical perception). We then investigated whether the emergence of expression-selective units is attributable to face-specific experience or to domain-general processing, by conducting the same analysis on a VGG-16 trained for object classification and on an untrained VGG-Face without any visual experience, both having an architecture identical to that of the pretrained VGG-Face. Although similar expression-selective units were found in both DCNNs, they did not exhibit reliable human-like characteristics of facial expression perception. Together, these findings reveal the necessity of domain-specific visual experience with face identity for the development of facial expression perception, highlighting the contribution of nurture to the formation of human-like facial expression perception.
Collapse
Affiliation(s)
- Liqin Zhou
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing 100875, China
| | - Anmin Yang
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing 100875, China
| | - Ming Meng
- Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou 510631, China
- Guangdong Key Laboratory of Mental Health and Cognitive Science, School of Psychology, South China Normal University, Guangzhou 510631, China
| | - Ke Zhou
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing 100875, China
| |
Collapse
|
38
|
Ferko KM, Blumenthal A, Martin CB, Proklova D, Minos AN, Saksida LM, Bussey TJ, Khan AR, Köhler S. Activity in perirhinal and entorhinal cortex predicts perceived visual similarities among category exemplars with highest precision. eLife 2022; 11:66884. [PMID: 35311645 PMCID: PMC9020819 DOI: 10.7554/elife.66884] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2021] [Accepted: 03/17/2022] [Indexed: 01/22/2023] Open
Abstract
Vision neuroscience has made great strides in understanding the hierarchical organization of object representations along the ventral visual stream (VVS). How VVS representations capture fine-grained visual similarities between objects that observers subjectively perceive has received limited examination so far. In the current study, we addressed this question by focussing on perceived visual similarities among subordinate exemplars of real-world categories. We hypothesized that these perceived similarities are reflected with highest fidelity in neural activity patterns downstream from inferotemporal regions, namely in perirhinal (PrC) and anterolateral entorhinal cortex (alErC) in the medial temporal lobe. To address this issue with functional magnetic resonance imaging (fMRI), we administered a modified 1-back task that required discrimination between category exemplars as well as categorization. Further, we obtained observer-specific ratings of perceived visual similarities, which predicted behavioural discrimination performance during scanning. As anticipated, we found that activity patterns in PrC and alErC predicted the structure of perceived visual similarity relationships among category exemplars, including its observer-specific component, with higher precision than any other VVS region. Our findings provide new evidence that subjective aspects of object perception that rely on fine-grained visual differentiation are reflected with highest fidelity in the medial temporal lobe.
Collapse
Affiliation(s)
- Kayla M Ferko
- Brain and Mind Institute, University of Western Ontario, London, Canada
- Robarts Research Institute, Schulich School of Medicine and Dentistry, University of Western Ontario, London, Canada
| | - Anna Blumenthal
- Brain and Mind Institute, University of Western Ontario, London, Canada
- Cervo Brain Research Center, University of Laval, Quebec, Canada
| | - Chris B Martin
- Department of Psychology, Florida State University, Tallahassee, United States
| | - Daria Proklova
- Brain and Mind Institute, University of Western Ontario, London, Canada
| | - Alexander N Minos
- Brain and Mind Institute, University of Western Ontario, London, Canada
| | - Lisa M Saksida
- Brain and Mind Institute, University of Western Ontario, London, Canada
- Robarts Research Institute, Schulich School of Medicine and Dentistry, University of Western Ontario, London, Canada
- Department of Physiology and Pharmacology, University of Western Ontario, London, Canada
| | - Timothy J Bussey
- Brain and Mind Institute, University of Western Ontario, London, Canada
- Robarts Research Institute, Schulich School of Medicine and Dentistry, University of Western Ontario, London, Canada
- Department of Physiology and Pharmacology, University of Western Ontario, London, Canada
| | - Ali R Khan
- Brain and Mind Institute, University of Western Ontario, London, Canada
- Robarts Research Institute, Schulich School of Medicine and Dentistry, University of Western Ontario, London, Canada
- School of Biomedical Engineering, University of Western Ontario, London, Canada
- Department of Medical Biophysics, University of Western Ontario, London, Canada
| | - Stefan Köhler
- Brain and Mind Institute, University of Western Ontario, London, Canada
- Department of Psychology, University of Western Ontario, London, Canada
| |
Collapse
|
39
|
Tamura H, Prokott KE, Fleming RW. Distinguishing mirror from glass: A "big data" approach to material perception. J Vis 2022; 22:4. [PMID: 35266961 PMCID: PMC8934559 DOI: 10.1167/jov.22.4.4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Distinguishing mirror from glass is a challenging visual inference, because both materials derive their appearance from their surroundings, yet we rarely experience difficulties in telling them apart. Very few studies have investigated how the visual system distinguishes reflections from refractions and to date, there is no image-computable model that emulates human judgments. Here we sought to develop a deep neural network that reproduces the patterns of visual judgments human observers make. To do this, we trained thousands of convolutional neural networks on more than 750,000 simulated mirror and glass objects, and compared their performance with human judgments, as well as alternative classifiers based on "hand-engineered" image features. For randomly chosen images, all classifiers and humans performed with high accuracy, and therefore correlated highly with one another. However, to assess how similar models are to humans, it is not sufficient to compare accuracy or correlation on random images. A good model should also predict the characteristic errors that humans make. We, therefore, painstakingly assembled a diagnostic image set for which humans make systematic errors, allowing us to isolate signatures of human-like performance. A large-scale, systematic search through feedforward neural architectures revealed that relatively shallow (three-layer) networks predicted human judgments better than any other models we tested. This is the first image-computable model that emulates human errors and succeeds in distinguishing mirror from glass, and hints that mid-level visual processing might be particularly important for the task.
Collapse
Affiliation(s)
- Hideki Tamura
- Department of Computer Science and Engineering, Toyohashi University of Technology, Toyohashi, Aichi, Japan
| | - Konrad Eugen Prokott
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
| | - Roland W Fleming
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
- Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Germany
| |
Collapse
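The model-comparison logic above — a good model must reproduce humans' characteristic errors rather than merely match overall accuracy — is often quantified by correlating per-image error rates on a diagnostic image set. A toy sketch (synthetic error profiles, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)
n_images = 100
human_err = rng.random(n_images)  # per-image human error rates (synthetic)

# Model A tracks the human error pattern; model B has a similar overall
# error level but makes its mistakes on different images
model_a = 0.9 * human_err + 0.1 * rng.random(n_images)
model_b = rng.random(n_images)

def error_consistency(model_err):
    """Pearson correlation between per-image model and human error rates."""
    return np.corrcoef(model_err, human_err)[0, 1]
```

On randomly chosen images both models could look equally human-like by accuracy alone; it is the per-image correlation on diagnostic images that separates the model that fails like humans from the one that merely performs like them.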
|
40
|
Mei N, Santana R, Soto D. Informative neural representations of unseen contents during higher-order processing in human brains and deep artificial networks. Nat Hum Behav 2022; 6:720-731. [PMID: 35115676 DOI: 10.1038/s41562-021-01274-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2021] [Accepted: 12/08/2021] [Indexed: 11/09/2022]
Abstract
A framework to pinpoint the scope of unconscious processing is critical to improve models of visual consciousness. Previous research observed brain signatures of unconscious processing in visual cortex, but these were not reliably identified. Further, whether unconscious contents are represented in high-level stages of the ventral visual stream and linked parieto-frontal areas remains unknown. Using a within-subject, high-precision functional magnetic resonance imaging approach, we show that unconscious contents can be decoded from multi-voxel patterns that are highly distributed alongside the ventral visual pathway and also involving parieto-frontal substrates. Classifiers trained with multi-voxel patterns of conscious items generalized to predict the unconscious counterparts, indicating that their neural representations overlap. These findings suggest revisions to models of consciousness such as the neuronal global workspace. We then provide a computational simulation of visual processing/representation without perceptual sensitivity by using deep neural networks performing a similar visual task. The work provides a framework for pinpointing the representation of unconscious knowledge across different task domains.
Collapse
Affiliation(s)
- Ning Mei
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain.
| | - Roberto Santana
- Computer Science and Artificial Intelligence Department, University of Basque Country, San Sebastian, Spain
| | - David Soto
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain
- Ikerbasque, Basque Foundation for Science, Bilbao, Spain
| |
Collapse
41
Singer JJD, Seeliger K, Kietzmann TC, Hebart MN. From photos to sketches - how humans and deep neural networks process objects across different levels of visual abstraction. J Vis 2022; 22:4. PMID: 35129578; PMCID: PMC8822363; DOI: 10.1167/jov.22.2.4.
Abstract
Line drawings convey meaning with just a few strokes. Despite strong simplifications, humans can recognize objects depicted in such abstracted images without effort. To what degree do deep convolutional neural networks (CNNs) mirror this human ability to generalize to abstracted object images? While CNNs trained on natural images have been shown to exhibit poor classification performance on drawings, other work has demonstrated highly similar latent representations in the networks for abstracted and natural images. Here, we address these seemingly conflicting findings by analyzing the activation patterns of a CNN trained on natural images across a set of photographs, drawings, and sketches of the same objects and comparing them to human behavior. We find a highly similar representational structure across levels of visual abstraction in early and intermediate layers of the network. This similarity, however, does not translate to later stages in the network, resulting in low classification performance for drawings and sketches. We identified that texture bias in CNNs contributes to the dissimilar representational structure in late layers and the poor performance on drawings. Finally, by fine-tuning late network layers with object drawings, we show that performance can be largely restored, demonstrating the general utility of features learned on natural images in early and intermediate layers for the recognition of drawings. In conclusion, generalization to abstracted images, such as drawings, seems to be an emergent property of CNNs trained on natural images, which is, however, suppressed by domain-related biases that arise during later processing stages in the network.
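The comparison of representational structure across depiction styles can be sketched with representational dissimilarity matrices (RDMs); the random "activations" and noise level below are illustrative assumptions, not the network data from the study:

```python
import numpy as np

def rdm(activations):
    # Representational dissimilarity matrix: 1 - Pearson correlation
    # between the activation patterns for each pair of images.
    return 1.0 - np.corrcoef(activations)

rng = np.random.default_rng(0)

# Toy activations for 6 objects as photos, and as drawings that keep the
# photo structure plus noise (a stand-in for an abstracted depiction).
photos = rng.normal(size=(6, 20))
drawings = photos + 0.3 * rng.normal(size=(6, 20))

# Similar representational structure across abstraction levels shows up
# as a high correlation between the two RDMs' upper triangles.
iu = np.triu_indices(6, k=1)
structure_similarity = np.corrcoef(rdm(photos)[iu], rdm(drawings)[iu])[0, 1]
```

Running such a comparison layer by layer is how one would observe similarity in early and intermediate layers diverging in later ones.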
Affiliation(s)
- Johannes J D Singer: Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; Department of Psychology, Ludwig Maximilian University, Munich, Germany.
- Katja Seeliger: Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.
- Tim C Kietzmann: Donders Institute for Brain, Cognition and Behavior, Nijmegen, The Netherlands.
- Martin N Hebart: Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.
42
Deep neural network models of sound localization reveal how perception is adapted to real-world environments. Nat Hum Behav 2022; 6:111-133. PMID: 35087192; PMCID: PMC8830739; DOI: 10.1038/s41562-021-01244-z.
Abstract
Mammals localize sounds using information from their two ears. Localization in real-world conditions is challenging, as echoes provide erroneous information, and noises mask parts of target sounds. To better understand real-world localization, we equipped a deep neural network with human ears and trained it to localize sounds in a virtual environment. The resulting model localized accurately in realistic conditions with noise and reverberation. In simulated experiments, the model exhibited many features of human spatial hearing: sensitivity to monaural spectral cues and interaural time and level differences, integration across frequency, biases for sound onsets, and limits on localization of concurrent sources. But when trained in unnatural environments without either reverberation, noise, or natural sounds, these performance characteristics deviated from those of humans. The results show how biological hearing is adapted to the challenges of real-world environments and illustrate how artificial neural networks can reveal the real-world constraints that shape perception.
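One of the binaural cues listed above, the interaural time difference (ITD), can be estimated by cross-correlating the two ear signals; the sample rate, delay, and noise source below are illustrative choices, not the paper's virtual-acoustics setup:

```python
import numpy as np

fs = 48_000      # sample rate in Hz (illustrative)
true_delay = 12  # interaural delay in samples (0.25 ms at this rate)

rng = np.random.default_rng(1)
src = rng.normal(size=4096)

# The right-ear signal is a delayed copy of the left-ear signal.
left = src
right = np.concatenate([np.zeros(true_delay), src])[: len(src)]

# Estimate the ITD as the lag that maximizes the cross-correlation
# between the two ear signals.
lags = np.arange(-64, 65)

def xcorr_at(l):
    n = len(src)
    return np.sum(left[max(0, -l): n - max(0, l)] * right[max(0, l): n - max(0, -l)])

itd_samples = lags[np.argmax([xcorr_at(l) for l in lags])]
itd_ms = 1000.0 * itd_samples / fs
```

In a reverberant scene the cross-correlation develops spurious peaks from echoes, which is one way to see why real-world localization is the hard version of the problem.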
43
Liu K, Cao G, Zhou F, Liu B, Duan J, Qiu G. Towards Disentangling Latent Space for Unsupervised Semantic Face Editing. IEEE Trans Image Process 2022; 31:1475-1489. PMID: 35044915; DOI: 10.1109/tip.2022.3142527.
Abstract
Facial attributes in StyleGAN-generated images are entangled in the latent space, which makes it very difficult to control a specific attribute independently without affecting the others. Supervised attribute editing requires annotated training data, which is difficult to obtain and limits the editable attributes to those with labels. Unsupervised attribute editing in a disentangled latent space is therefore key to performing neat and versatile semantic face editing. In this paper, we present a new technique termed Structure-Texture Independent Architecture with Weight Decomposition and Orthogonal Regularization (STIA-WO) to disentangle the latent space for unsupervised semantic face editing. By applying STIA-WO to a GAN, we have developed a StyleGAN variant termed STGAN-WO, which performs weight decomposition by utilizing the style vector to construct a fully controllable weight matrix that regulates image synthesis, and employs orthogonal regularization to ensure that each entry of the style vector controls only one independent feature matrix. To further disentangle the facial attributes, STGAN-WO introduces a structure-texture independent architecture which utilizes two independently and identically distributed (i.i.d.) latent vectors to control the synthesis of the texture and structure components in a disentangled way. Unsupervised semantic editing is achieved by moving the latent code in the coarse layers along its orthogonal directions to change texture-related attributes, or by changing the latent code in the fine layers to manipulate structure-related ones. We present experimental results which show that STGAN-WO achieves better attribute editing than state-of-the-art methods.
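The orthogonal-regularization idea can be illustrated with a generic penalty that pushes a weight matrix toward orthonormal columns. This is the common Frobenius-norm formulation (a penalty on W^T W minus the identity), offered as a sketch rather than the exact STGAN-WO loss:

```python
import numpy as np

def orthogonal_penalty(W):
    # Frobenius-norm penalty ||W^T W - I||^2: zero exactly when the
    # columns of W are orthonormal. A generic orthogonality regularizer,
    # not the specific STGAN-WO formulation.
    gram = W.T @ W
    return float(np.sum((gram - np.eye(W.shape[1])) ** 2))

rng = np.random.default_rng(0)
W_random = rng.normal(size=(8, 4))   # far from orthogonal
W_ortho, _ = np.linalg.qr(W_random)  # orthonormal columns via QR

penalty_random = orthogonal_penalty(W_random)
penalty_ortho = orthogonal_penalty(W_ortho)
```

Adding such a term to the training loss is what drives each latent entry toward controlling one independent direction, which is the precondition for editing one attribute without disturbing the rest.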
44
Sörensen LKA, Zambrano D, Slagter HA, Bohté SM, Scholte HS. Leveraging Spiking Deep Neural Networks to Understand the Neural Mechanisms Underlying Selective Attention. J Cogn Neurosci 2022; 34:655-674. DOI: 10.1162/jocn_a_01819.
Abstract
Spatial attention enhances sensory processing of goal-relevant information and improves perceptual sensitivity. Yet, the specific neural mechanisms underlying the effects of spatial attention on performance are still contested. Here, we examine different attention mechanisms in spiking deep convolutional neural networks. We directly contrast effects of precision (internal noise suppression) and two different gain modulation mechanisms on performance on a visual search task with complex real-world images. Unlike standard artificial neurons, biological neurons have saturating activation functions, permitting implementation of attentional gain as gain on a neuron's input or on its outgoing connection. We show that modulating the connection is most effective in selectively enhancing information processing by redistributing spiking activity and by introducing additional task-relevant information, as shown by representational similarity analyses. Precision only produced minor attentional effects in performance. Our results, which mirror empirical findings, show that it is possible to adjudicate between attention mechanisms using more biologically realistic models and natural stimuli.
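The contrast between the two gain mechanisms can be sketched with a single saturating unit; the sigmoid nonlinearity and gain value below are illustrative choices, not the spiking networks used in the study:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

drive = np.linspace(-4.0, 4.0, 9)
gain = 2.0

# Input gain is applied before the saturating nonlinearity, so its effect
# is squashed once the unit is strongly driven...
response_input_gain = sigmoid(gain * drive)

# ...whereas gain on the outgoing connection scales the (bounded) output
# directly and can exceed the nonlinearity's ceiling.
response_output_gain = gain * sigmoid(drive)
```

With a bounded activation, only the connection-side gain keeps amplifying responses after saturation, which is one way to picture why the two mechanisms redistribute activity so differently.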
Affiliation(s)
- Davide Zambrano: Centrum Wiskunde & Informatica, Amsterdam, The Netherlands; École Polytechnique Fédérale de Lausanne, Switzerland.
- Sander M. Bohté: University of Amsterdam, The Netherlands; Centrum Wiskunde & Informatica, Amsterdam, The Netherlands; Rijksuniversiteit Groningen, The Netherlands.
45
Wammes J, Norman KA, Turk-Browne N. Increasing stimulus similarity drives nonmonotonic representational change in hippocampus. eLife 2022; 11:e68344. PMID: 34989336; PMCID: PMC8735866; DOI: 10.7554/elife.68344.
Abstract
Studies of hippocampal learning have obtained seemingly contradictory results, with manipulations that increase coactivation of memories sometimes leading to differentiation of these memories, but sometimes not. These results could potentially be reconciled using the nonmonotonic plasticity hypothesis, which posits that representational change (memories moving apart or together) is a U-shaped function of the coactivation of these memories during learning. Testing this hypothesis requires manipulating coactivation over a wide enough range to reveal the full U-shape. To accomplish this, we used a novel neural network image synthesis procedure to create pairs of stimuli that varied parametrically in their similarity in high-level visual regions that provide input to the hippocampus. Sequences of these pairs were shown to human participants during high-resolution fMRI. As predicted, learning changed the representations of paired images in the dentate gyrus as a U-shaped function of image similarity, with neural differentiation occurring only for moderately similar images.
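The nonmonotonic plasticity hypothesis can be sketched as a U-shaped function mapping memory coactivation to representational change; the thresholds and functional form below are illustrative, not fitted to the fMRI data:

```python
import numpy as np

def nmph(coactivation, low=0.3, high=0.7):
    # Toy U-shaped rule: no change at low coactivation, weakening
    # (differentiation) at moderate coactivation, strengthening
    # (integration) at high coactivation. Thresholds are illustrative.
    c = np.asarray(coactivation, dtype=float)
    change = np.zeros_like(c)
    mid = (c >= low) & (c < high)
    change[mid] = -np.sin(np.pi * (c[mid] - low) / (high - low))
    top = c >= high
    change[top] = (c[top] - high) / (1.0 - high)
    return change

levels = np.array([0.1, 0.5, 0.9])  # low / moderate / high coactivation
delta = nmph(levels)
```

Varying stimulus similarity parametrically, as the study does, amounts to sampling this curve densely enough to reveal the dip at moderate coactivation.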
Affiliation(s)
- Jeffrey Wammes: Department of Psychology, Yale University, New Haven, United States; Department of Psychology, Queen's University, Kingston, Canada.
- Kenneth A Norman: Department of Psychology, Princeton University, Princeton, United States; Princeton Neuroscience Institute, Princeton University, Princeton, United States.
46
Baker N, Garrigan P, Phillips A, Kellman PJ. Configural relations in humans and deep convolutional neural networks. Front Artif Intell 2022; 5:961595. PMID: 36937367; PMCID: PMC10014814; DOI: 10.3389/frai.2022.961595.
Abstract
Deep convolutional neural networks (DCNNs) have attracted considerable interest as useful devices and as possible windows into understanding perception and cognition in biological systems. In earlier work, we showed that DCNNs differ dramatically from human perceivers in that they have no sensitivity to global object shape. Here, we investigated whether those findings are symptomatic of broader limitations of DCNNs regarding the use of relations. We tested learning and generalization of DCNNs (AlexNet and ResNet-50) for several relations involving objects. One involved classifying two shapes in an otherwise empty field as same or different. Another involved enclosure. Every display contained a closed figure among contour noise fragments and one dot; correct responding depended on whether the dot was inside or outside the figure. The third relation we tested involved a classification that depended on which of two polygons had more sides. One polygon always contained a dot, and correct classification of each display depended on whether the polygon with the dot had a greater number of sides. We used DCNNs that had been trained on the ImageNet database, and we used both restricted and unrestricted transfer learning (connection weights at all layers could change with training). For the same-different experiment, there was little restricted transfer learning (82.2%). Generalization tests showed near chance performance for new shapes. Results for enclosure were at chance for restricted transfer learning and somewhat better for unrestricted (74%). Generalization with two new kinds of shapes showed reduced but above-chance performance (≈66%). Follow-up studies indicated that the networks did not access the enclosure relation in their responses. For the relation of more or fewer sides of polygons, DCNNs showed successful learning with polygons having 3-5 sides under unrestricted transfer learning, but showed chance performance in generalization tests with polygons having 6-10 sides. Experiments with human observers showed learning from relatively few examples of all of the relations tested and complete generalization of relational learning to new stimuli. These results using several different relations suggest that DCNNs have crucial limitations that derive from their lack of computations involving abstraction and relational processing of the sort that are fundamental in human perception.
Affiliation(s)
- Nicholas Baker: Department of Psychology, Loyola University Chicago, Chicago, IL, United States.
- Patrick Garrigan: Department of Psychology, Saint Joseph's University, Philadelphia, PA, United States.
- Austin Phillips: UCLA Human Perception Laboratory, Department of Psychology, University of California, Los Angeles, Los Angeles, CA, United States.
- Philip J. Kellman (corresponding author): UCLA Human Perception Laboratory, Department of Psychology, University of California, Los Angeles, Los Angeles, CA, United States.
47
Biological convolutions improve DNN robustness to noise and generalisation. Neural Netw 2021; 148:96-110. PMID: 35114495; DOI: 10.1016/j.neunet.2021.12.005.
Abstract
Deep convolutional neural networks (DNNs) have achieved superhuman accuracy on standard image classification benchmarks. Their success has reignited significant interest in their use as models of the primate visual system, bolstered by claims of their architectural and representational similarities. However, closer scrutiny of these models suggests that they rely on various forms of shortcut learning to achieve their impressive performance, such as using texture rather than shape information. Such superficial solutions to image recognition have been shown to make DNNs brittle in the face of more challenging tests such as noise-perturbed or out-of-distribution images, casting doubt on their similarity to their biological counterparts. In the present work, we demonstrate that adding fixed biological filter banks, in particular banks of Gabor filters, helps to constrain the networks to avoid reliance on shortcuts, leading them to develop more structured internal representations and greater tolerance to noise. Importantly, they also gained around 20-35% accuracy over standard end-to-end trained architectures when generalising to our novel out-of-distribution test image sets. We take these findings to suggest that these properties of the primate visual system should be incorporated into DNNs to make them better able to cope with real-world vision and better capture some of the more impressive aspects of human visual perception, such as generalisation.
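A fixed Gabor filter bank of the kind described can be constructed directly; the kernel size and parameter values here are illustrative, not the paper's exact first-layer front end:

```python
import numpy as np

def gabor_kernel(size=15, wavelength=5.0, theta=0.0, sigma=3.0):
    # A 2-D Gabor filter: a Gaussian envelope multiplied by an oriented
    # sinusoidal carrier, mimicking V1 simple-cell receptive fields.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + y_r ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / wavelength)
    return envelope * carrier

# A small fixed bank covering four orientations; freezing such filters in
# the first layer is the kind of constraint argued to reduce shortcut
# learning in favour of more structured representations.
orientations = np.linspace(0.0, np.pi, 4, endpoint=False)
bank = np.stack([gabor_kernel(theta=t) for t in orientations])
```

Because the filters are fixed rather than learned, the network cannot adapt its earliest features toward texture shortcuts, which is the design choice the abstract credits for the robustness gains.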
48
Daube C, Xu T, Zhan J, Webb A, Ince RA, Garrod OG, Schyns PG. Grounding deep neural network predictions of human categorization behavior in understandable functional features: The case of face identity. Patterns (N Y) 2021; 2:100348. PMID: 34693374; PMCID: PMC8515012; DOI: 10.1016/j.patter.2021.100348.
Abstract
Deep neural networks (DNNs) can resolve real-world categorization tasks with apparent human-level performance. However, true equivalence of behavioral performance between humans and their DNN models requires that their internal mechanisms process equivalent features of the stimulus. To develop such feature equivalence, our methodology leveraged an interpretable and experimentally controlled generative model of the stimuli (realistic three-dimensional textured faces). Humans rated the similarity of randomly generated faces to four familiar identities. We predicted these similarity ratings from the activations of five DNNs trained with different optimization objectives. Using information theoretic redundancy, reverse correlation, and the testing of generalization gradients, we show that DNN predictions of human behavior improve because their shape and texture features overlap with those that subsume human behavior. Thus, we must equate the functional features that subsume the behavioral performances of the brain and its models before comparing where, when, and how these features are processed.
Affiliation(s)
- Christoph Daube: Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK.
- Tian Xu: Department of Computer Science and Technology, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, England, UK.
- Jiayu Zhan: Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK.
- Andrew Webb: Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK.
- Robin A.A. Ince: Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK.
- Oliver G.B. Garrod: Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK.
- Philippe G. Schyns: Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK.
49
Sun ED, Dekel R. ImageNet-trained deep neural networks exhibit illusion-like response to the Scintillating grid. J Vis 2021; 21:15. PMID: 34677575; PMCID: PMC8543405; DOI: 10.1167/jov.21.11.15.
Abstract
Deep neural network (DNN) models for computer vision are capable of human-level object recognition. Consequently, similarities between DNN and human vision are of interest. Here, we characterize DNN representations of Scintillating grid visual illusion images in which white disks are perceived to be partially black. Specifically, we use VGG-19 and ResNet-101 DNN models that were trained for image classification and consider the representational dissimilarity (L1 distance in the penultimate layer) between pairs of images: one with white Scintillating grid disks and the other with disks of decreasing luminance levels. Results showed a nonmonotonic relation, such that decreasing disk luminance led to an increase and subsequently a decrease in representational dissimilarity. That is, the Scintillating grid image with white disks was closer, in terms of the representation, to images with black disks than images with gray disks. In control nonillusion images, such nonmonotonicity was rare. These results suggest that nonmonotonicity in a deep computational representation is a potential test for illusion-like response geometry in DNN models.
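The dissimilarity measure used here is simply the L1 distance between penultimate-layer activation vectors; the toy vectors below are made-up stand-ins chosen to reproduce the qualitative "white closer to black than to gray" pattern, not network outputs:

```python
import numpy as np

def l1_dissimilarity(a, b):
    # Representational dissimilarity as the L1 distance between two
    # activation vectors (e.g. from a network's penultimate layer).
    return float(np.sum(np.abs(np.asarray(a) - np.asarray(b))))

# Hypothetical penultimate-layer activations for a white-disk grid image
# and variants with gray and black disks (illustrative values only).
act_white = np.array([0.2, 0.9, 0.1])
act_gray = np.array([0.8, 0.1, 0.7])
act_black = np.array([0.3, 0.8, 0.2])

d_white_gray = l1_dissimilarity(act_white, act_gray)
d_white_black = l1_dissimilarity(act_white, act_black)
# A nonmonotonic profile: the white-disk image sits closer in the
# representation to the black-disk image than to the gray one.
```

Sweeping disk luminance and plotting this distance is what produces the rise-then-fall profile the paper treats as an illusion-like signature.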
Affiliation(s)
- Eric D Sun: Mather House, Harvard University, Cambridge, MA, USA.
- Ron Dekel: Department of Neurobiology, Weizmann Institute of Science, Rehovot, Israel.
50
Abudarham N, Grosbard I, Yovel G. Face Recognition Depends on Specialized Mechanisms Tuned to View-Invariant Facial Features: Insights from Deep Neural Networks Optimized for Face or Object Recognition. Cogn Sci 2021; 45:e13031. PMID: 34490907; DOI: 10.1111/cogs.13031.
Abstract
Face recognition is a computationally challenging classification task. Deep convolutional neural networks (DCNNs) are brain-inspired algorithms that have recently reached human-level performance in face and object recognition. However, it is not clear to what extent DCNNs generate a human-like representation of face identity. We have recently revealed a subset of facial features that are used by humans for face recognition. This enables us now to ask whether DCNNs rely on the same facial information and whether this human-like representation depends on a system that is optimized for face identification. In the current study, we examined the representation of DCNNs of faces that differ in features that are critical or non-critical for human face recognition. Our findings show that DCNNs optimized for face identification are tuned to the same facial features used by humans for face recognition. Sensitivity to these features was highly correlated with performance of the DCNN on a benchmark face recognition task. Moreover, sensitivity to these features and a view-invariant face representation emerged at higher layers of a DCNN optimized for face recognition but not for object recognition. This finding parallels the division to a face and an object system in high-level visual cortex. Taken together, these findings validate human perceptual models of face recognition, enable us to use DCNNs to test predictions about human face and object recognition as well as contribute to the interpretability of DCNNs.
Affiliation(s)
- Galit Yovel: School of Psychological Sciences, Tel Aviv University; Sagol School of Neuroscience, Tel Aviv University.