1. Gupta P, Dobs K. Human-like face pareidolia emerges in deep neural networks optimized for face and object recognition. PLoS Comput Biol 2025;21:e1012751. PMID: 39869654; PMCID: PMC11790231; DOI: 10.1371/journal.pcbi.1012751.
Abstract
The human visual system possesses a remarkable ability to detect and process faces across diverse contexts, including the phenomenon of face pareidolia: seeing faces in inanimate objects. Despite extensive research, it remains unclear why the visual system employs such broadly tuned face detection capabilities. We hypothesized that face pareidolia results from the visual system's optimization for recognizing both faces and objects. To test this hypothesis, we used task-optimized deep convolutional neural networks (CNNs) and evaluated their alignment with human behavioral signatures and neural responses, measured via magnetoencephalography (MEG), related to pareidolia processing. Specifically, we trained CNNs on tasks involving combinations of face identification, face detection, object categorization, and object detection. Using representational similarity analysis, we found that CNNs that included object categorization in their training tasks represented pareidolia faces, real faces, and matched objects more similarly to neural responses than those that did not. Although these CNNs showed similar overall alignment with neural data, a closer examination of their internal representations revealed that specific training tasks had distinct effects on how pareidolia faces were represented across layers. Finally, interpretability methods revealed that only a CNN trained for both face identification and object categorization relied on face-like features, such as 'eyes', to classify pareidolia stimuli as faces, mirroring findings in human perception. Our results suggest that human-like face pareidolia may emerge from the visual system's optimization for face identification within the context of generalized object categorization.
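A minimal sketch of the representational similarity analysis named in the abstract (synthetic data; stimulus counts and variable names are illustrative, not the authors' code): compute each system's pairwise stimulus dissimilarities, then rank-correlate the two dissimilarity structures.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses):
    """Condensed representational dissimilarity matrix (RDM):
    one row of `responses` per stimulus, correlation distance."""
    return pdist(responses, metric="correlation")

# Illustrative inputs: responses of one CNN layer and of MEG sensors
# to the same 150 stimuli (faces, pareidolia objects, matched objects).
rng = np.random.default_rng(0)
cnn_layer = rng.normal(size=(150, 512))     # (n_stimuli, n_units)
meg_patterns = rng.normal(size=(150, 306))  # (n_stimuli, n_sensors)

# RSA score: rank correlation between the two RDMs.
rho, p = spearmanr(rdm(cnn_layer), rdm(meg_patterns))
print(f"model-brain RSA: rho = {rho:.3f}, p = {p:.3g}")
```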
Affiliation(s)
- Pranjul Gupta: Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
- Katharina Dobs: Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany; Center for Mind, Brain, and Behavior, Universities of Marburg, Giessen and Darmstadt, Marburg, Germany

2. Mathis MW, Perez Rotondo A, Chang EF, Tolias AS, Mathis A. Decoding the brain: From neural representations to mechanistic models. Cell 2024;187:5814-5832. PMID: 39423801; PMCID: PMC11637322; DOI: 10.1016/j.cell.2024.08.051.
Abstract
A central principle in neuroscience is that neurons within the brain act in concert to produce perception, cognition, and adaptive behavior. Neurons are organized into specialized brain areas, dedicated to different functions to varying extents, and their function relies on distributed circuits to continuously encode relevant environmental and body-state features, enabling other areas to decode (interpret) these representations for computing meaningful decisions and executing precise movements. Thus, the distributed brain can be thought of as a series of computations that act to encode and decode information. In this perspective, we detail important concepts of neural encoding and decoding and highlight the mathematical tools used to measure them, including deep learning methods. We provide case studies where decoding concepts enable foundational and translational science in motor, visual, and language processing.
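A toy illustration of the encode/decode framing discussed above (synthetic data; all names hypothetical): a population encodes a stimulus through noisy tuned responses, and a cross-validated linear decoder quantifies how much stimulus information the population carries.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score

# Synthetic example: decode which of 4 stimuli was shown from the
# firing rates of 80 neurons, each noisily tuned to the stimuli.
rng = np.random.default_rng(1)
n_trials, n_neurons, n_stim = 400, 80, 4
stim = rng.integers(0, n_stim, size=n_trials)            # encoded variable
tuning = rng.normal(size=(n_stim, n_neurons))            # encoding weights
rates = tuning[stim] + rng.normal(scale=2.0, size=(n_trials, n_neurons))

# Linear read-out: if stimulus identity is decodable, the population
# carries a linearly accessible representation of it.
acc = cross_val_score(RidgeClassifier(), rates, stim, cv=5).mean()
print(f"cross-validated decoding accuracy: {acc:.2f} (chance = {1/n_stim:.2f})")
```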
Affiliation(s)
- Mackenzie Weygandt Mathis: Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland; Neuro-X Institute, École Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland
- Adriana Perez Rotondo: Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland; Neuro-X Institute, École Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland
- Edward F Chang: Department of Neurological Surgery, UCSF, San Francisco, CA, USA
- Andreas S Tolias: Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, CA, USA; Department of Electrical Engineering, Stanford University, Stanford, CA, USA; Stanford BioX, Stanford University, Stanford, CA, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Alexander Mathis: Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland; Neuro-X Institute, École Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland

3. Chen Y, Beech P, Yin Z, Jia S, Zhang J, Yu Z, Liu JK. Decoding dynamic visual scenes across the brain hierarchy. PLoS Comput Biol 2024;20:e1012297. PMID: 39093861; PMCID: PMC11324145; DOI: 10.1371/journal.pcbi.1012297.
Abstract
Understanding the computational mechanisms that underlie the encoding and decoding of environmental stimuli is a central problem in neuroscience, and a key part of it is how the brain represents visual information across its hierarchical architecture. A prominent challenge is discerning the neural underpinnings of the processing of dynamic natural visual scenes. Although considerable effort has gone into characterizing individual components of the visual pathway, a systematic understanding of how the neural code for visual stimuli changes as signals traverse this hierarchy remains elusive. In this study, we leverage the comprehensive Allen Visual Coding-Neuropixels dataset and deep learning neural network models to study neural coding in response to dynamic natural visual scenes across an expansive array of brain regions. Our decoding model reliably deciphers visual scenes from the neural spiking patterns of each brain area. Comparing decoding performance across areas reveals strong encoding in the visual cortex and subcortical nuclei, in contrast to relatively weak encoding in hippocampal neurons. Strikingly, our decoding metrics correlate robustly with well-established anatomical and functional hierarchy indices. These findings corroborate existing knowledge of visual coding obtained with artificial stimuli and illuminate the functional role of deeper brain regions under dynamic stimulation. Our results thus suggest that decoding neural network models can serve as a metric for quantifying how well neural responses encode dynamic natural visual scenes, advancing our comprehension of visual coding within the brain's complex hierarchy.
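The paper's decoder maps spiking activity to video frames; below is a minimal sketch of this kind of spikes-to-frame decoding (the architecture, shapes, and loss are assumptions, not the authors' model).

```python
import torch
import torch.nn as nn

# Hypothetical shapes: decode a 36x64 grayscale frame from 50 ms-binned
# spike counts of one brain area over a short history window.
n_neurons, n_bins, frame_hw = 120, 10, (36, 64)

class FrameDecoder(nn.Module):
    """Toy spikes-to-frame decoder (illustrative architecture)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_neurons * n_bins, 512), nn.ReLU(),
            nn.Linear(512, frame_hw[0] * frame_hw[1]), nn.Sigmoid(),
        )
    def forward(self, spikes):                 # (batch, n_neurons, n_bins)
        return self.net(spikes).view(-1, *frame_hw)

decoder = FrameDecoder()
spikes = torch.rand(8, n_neurons, n_bins)      # synthetic spike counts
frames = torch.rand(8, *frame_hw)              # synthetic target frames
loss = nn.functional.mse_loss(decoder(spikes), frames)
loss.backward()                                # gradients for one training step
print(f"reconstruction loss: {loss.item():.4f}")
```

Training one such decoder per brain area and comparing their reconstruction performance is what allows decoding quality to be read as an encoding metric across the hierarchy.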
Affiliation(s)
- Ye Chen: School of Computer Science, Peking University, Beijing, China; Institute for Artificial Intelligence, Peking University, Beijing, China
- Peter Beech: School of Computing, University of Leeds, Leeds, United Kingdom
- Ziwei Yin: School of Computer Science, Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom
- Shanshan Jia: School of Computer Science, Peking University, Beijing, China; Institute for Artificial Intelligence, Peking University, Beijing, China
- Jiayi Zhang: Institutes of Brain Science, State Key Laboratory of Medical Neurobiology, MOE Frontiers Center for Brain Science and Institute for Medical and Engineering Innovation, Eye & ENT Hospital, Fudan University, Shanghai, China
- Zhaofei Yu: School of Computer Science, Peking University, Beijing, China; Institute for Artificial Intelligence, Peking University, Beijing, China
- Jian K. Liu: School of Computing, University of Leeds, Leeds, United Kingdom; School of Computer Science, Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom

4. Rudelt L, González Marx D, Spitzner FP, Cramer B, Zierenberg J, Priesemann V. Signatures of hierarchical temporal processing in the mouse visual system. PLoS Comput Biol 2024;20:e1012355. PMID: 39173067; PMCID: PMC11373856; DOI: 10.1371/journal.pcbi.1012355.
Abstract
A core challenge for the brain is to process information across various timescales. This could be achieved by a hierarchical organization of temporal processing through intrinsic mechanisms (e.g., recurrent coupling or adaptation), but recent evidence from spike recordings of the rodent visual system seems to conflict with this hypothesis. Here, we used an optimized information-theoretic and classical autocorrelation analysis to show that information- and correlation timescales of spiking activity increase along the anatomical hierarchy of the mouse visual system under visual stimulation, while information-theoretic predictability decreases. Moreover, intrinsic timescales for spontaneous activity displayed a similar hierarchy, whereas the hierarchy of predictability was stimulus-dependent. We could reproduce these observations in a basic recurrent network model with correlated sensory input. Our findings suggest that the rodent visual system employs intrinsic mechanisms to achieve longer integration for higher cortical areas, while simultaneously reducing predictability for an efficient neural code.
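A sketch of the classical autocorrelation analysis mentioned in the abstract, estimating an intrinsic timescale by fitting an exponential decay to the autocorrelation of binned activity (this omits the authors' optimized information-theoretic estimator; all parameters are illustrative).

```python
import numpy as np
from scipy.optimize import curve_fit

def intrinsic_timescale(binned_activity, dt):
    """Fit an exponential decay to the autocorrelation function and
    return the decay constant (the intrinsic timescale) in seconds."""
    x = binned_activity - binned_activity.mean()
    ac = np.correlate(x, x, mode="full")[x.size - 1:]
    ac /= ac[0]                                   # normalize so r(0) = 1
    lags = np.arange(ac.size) * dt
    n_fit = min(100, ac.size)                     # fit the early lags only
    (tau,), _ = curve_fit(lambda t, tau: np.exp(-t / tau),
                          lags[1:n_fit], ac[1:n_fit], p0=[5 * dt])
    return tau

# Synthetic check: an AR(1) process with a known 50 ms timescale.
dt, tau_true = 0.005, 0.05                        # 5 ms bins
rng = np.random.default_rng(2)
x = np.zeros(10000)
for t in range(1, x.size):
    x[t] = np.exp(-dt / tau_true) * x[t - 1] + rng.normal()
print(f"estimated tau: {intrinsic_timescale(x, dt) * 1000:.1f} ms "
      f"(true: {tau_true * 1000:.0f} ms)")
```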
Affiliation(s)
- Lucas Rudelt: Max-Planck-Institute for Dynamics and Self-Organization, Göttingen, Germany; Institute for the Dynamics of Complex Systems, University of Göttingen, Göttingen, Germany
- Daniel González Marx: Max-Planck-Institute for Dynamics and Self-Organization, Göttingen, Germany; Institute for the Dynamics of Complex Systems, University of Göttingen, Göttingen, Germany
- F Paul Spitzner: Max-Planck-Institute for Dynamics and Self-Organization, Göttingen, Germany; Institute for the Dynamics of Complex Systems, University of Göttingen, Göttingen, Germany
- Benjamin Cramer: Kirchhoff-Institute for Physics, Heidelberg University, Heidelberg, Germany
- Johannes Zierenberg: Max-Planck-Institute for Dynamics and Self-Organization, Göttingen, Germany; Institute for the Dynamics of Complex Systems, University of Göttingen, Göttingen, Germany
- Viola Priesemann: Max-Planck-Institute for Dynamics and Self-Organization, Göttingen, Germany; Institute for the Dynamics of Complex Systems, University of Göttingen, Göttingen, Germany; Bernstein Center for Computational Neuroscience (BCCN), Göttingen, Germany

5. Parthasarathy N, Hénaff OJ, Simoncelli EP. Layerwise complexity-matched learning yields an improved model of cortical area V2. arXiv preprint 2024: arXiv:2312.11436v3. PMID: 39070038; PMCID: PMC11275700.
Abstract
The human ability to recognize complex visual patterns arises through transformations performed by successive areas in the ventral visual cortex. Deep neural networks trained end-to-end for object recognition approach human capabilities and offer the best descriptions to date of neural responses in the late stages of the hierarchy. But these networks provide a poor account of the early stages compared to traditional hand-engineered models or to models optimized for coding efficiency or prediction. Moreover, the gradient backpropagation used in end-to-end learning is generally considered biologically implausible. Here, we overcome both of these limitations by developing a bottom-up self-supervised training methodology that operates independently on successive layers. Specifically, we maximize feature similarity between pairs of locally deformed natural image patches, while decorrelating features across patches sampled from other images. Crucially, the deformation amplitudes are adjusted proportionally to receptive field sizes in each layer, thus matching the task complexity to the capacity at each stage of processing. In comparison with architecture-matched versions of previous models, we demonstrate that our layerwise complexity-matched learning (LCL) formulation produces a two-stage model (LCL-V2) that is better aligned with selectivity properties and neural activity in primate area V2. We demonstrate that the complexity-matched learning paradigm is responsible for much of the improved biological alignment. Finally, when the two-stage model is used as a fixed front-end for a deep network trained to perform object recognition, the resulting model (LCL-V2Net) is significantly better than standard end-to-end self-supervised, supervised, and adversarially trained models in terms of generalization to out-of-distribution tasks and alignment with human behavior. Our code and pre-trained checkpoints are available at https://github.com/nikparth/LCL-V2.git.
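A rough sketch of a layerwise objective in the spirit described above, pairing an invariance term over two deformations of the same patch with a decorrelation penalty across features (the paper's exact formulation, weighting, and deformation schedule differ; names are illustrative).

```python
import torch
import torch.nn.functional as F

def lcl_style_loss(z1, z2, decor_weight=0.01):
    """Pull together the features of two locally-deformed views of the
    same patches; penalize off-diagonal feature covariance so that
    features stay decorrelated across the batch."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    invariance = (1 - (z1 * z2).sum(dim=1)).mean()   # similarity term
    z = torch.cat([z1, z2], dim=0)
    z = z - z.mean(dim=0)
    cov = (z.T @ z) / (z.shape[0] - 1)               # feature covariance
    off_diag = cov - torch.diag(torch.diag(cov))
    return invariance + decor_weight * (off_diag ** 2).sum()

# Hypothetical usage: z1, z2 are one layer's pooled responses to two
# deformed crops of the same image patches; the loss is applied per
# layer, with no gradient flowing between layers.
z1, z2 = torch.randn(64, 128), torch.randn(64, 128)
print(lcl_style_loss(z1, z2).item())
```

The complexity-matching idea enters through the data, not the loss: the deformation amplitude fed to each layer is scaled to that layer's receptive field size.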
Affiliation(s)
- Nikhil Parthasarathy: Center for Neural Science, New York University; Center for Computational Neuroscience, Flatiron Institute
- Eero P Simoncelli: Center for Neural Science, New York University; Center for Computational Neuroscience, Flatiron Institute

6. Matteucci G, Piasini E, Zoccolan D. Unsupervised learning of mid-level visual representations. Curr Opin Neurobiol 2024;84:102834. PMID: 38154417; DOI: 10.1016/j.conb.2023.102834.
Abstract
Recently, a confluence between trends in neuroscience and machine learning has brought a renewed focus on unsupervised learning, where sensory processing systems learn to exploit the statistical structure of their inputs in the absence of explicit training targets or rewards. Sophisticated experimental approaches have enabled the investigation of the influence of sensory experience on neural self-organization and its synaptic bases. Meanwhile, novel algorithms for unsupervised and self-supervised learning have become increasingly popular both as inspiration for theories of the brain, particularly for the function of intermediate visual cortical areas, and as building blocks of real-world learning machines. Here we review some of these recent developments, placing them in historical context and highlighting some research lines that promise exciting breakthroughs in the near future.
Affiliation(s)
- Giulio Matteucci: Department of Basic Neurosciences, University of Geneva, Geneva 1206, Switzerland (https://twitter.com/giulio_matt)
- Eugenio Piasini: International School for Advanced Studies (SISSA), Trieste 34136, Italy
- Davide Zoccolan: International School for Advanced Studies (SISSA), Trieste 34136, Italy

7. Dyballa L, Rudzite AM, Hoseini MS, Thapa M, Stryker MP, Field GD, Zucker SW. Population encoding of stimulus features along the visual hierarchy. Proc Natl Acad Sci U S A 2024;121:e2317773121. PMID: 38227668; PMCID: PMC10823231; DOI: 10.1073/pnas.2317773121.
Abstract
The retina and primary visual cortex (V1) both exhibit diverse neural populations sensitive to diverse visual features. Yet it remains unclear how neural populations in each area partition stimulus space to span these features. One possibility is that neural populations are organized into discrete groups of neurons, with each group signaling a particular constellation of features. Alternatively, neurons could be continuously distributed across feature-encoding space. To distinguish these possibilities, we presented a battery of visual stimuli to the mouse retina and V1 while measuring neural responses with multi-electrode arrays. Using machine learning approaches, we developed a manifold embedding technique that captures how neural populations partition feature space and how visual responses correlate with physiological and anatomical properties of individual neurons. We show that retinal populations discretely encode features, while V1 populations provide a more continuous representation. Applying the same analysis approach to convolutional neural networks that model visual processing, we demonstrate that they partition features much more similarly to the retina, indicating they are more like big retinas than little brains.
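A generic illustration of the embedding-plus-clustering logic used to distinguish discrete from continuous population codes (the authors' manifold embedding technique is their own; here an off-the-shelf spectral embedding and a silhouette score stand in, with synthetic tuning vectors).

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans

# Hypothetical input: each row is one neuron's responses across a battery
# of stimuli (its tuning vector).
rng = np.random.default_rng(3)
retina_like = np.vstack([rng.normal(loc=c, scale=0.3, size=(40, 30))
                         for c in rng.normal(size=(5, 30))])  # 5 discrete groups
v1_like = rng.normal(size=(200, 30))                          # a continuum

for name, tuning in [("retina-like", retina_like), ("V1-like", v1_like)]:
    emb = SpectralEmbedding(n_components=2, affinity="rbf").fit_transform(tuning)
    labels = KMeans(n_clusters=5, n_init=10).fit_predict(emb)
    # High silhouette -> discrete functional groups; low -> continuum.
    print(f"{name}: silhouette = {silhouette_score(emb, labels):.2f}")
```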
Affiliation(s)
- Luciano Dyballa: Department of Computer Science, Yale University, New Haven, CT 06511
- Mahmood S. Hoseini: Department of Physiology, University of California, San Francisco, CA 94143
- Mishek Thapa: Department of Neurobiology, Duke University, Durham, NC 27708; Department of Ophthalmology, David Geffen School of Medicine, Stein Eye Institute, University of California, Los Angeles, CA 90095
- Michael P. Stryker: Department of Physiology, University of California, San Francisco, CA 94143; Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, CA 94143
- Greg D. Field: Department of Neurobiology, Duke University, Durham, NC 27708; Department of Ophthalmology, David Geffen School of Medicine, Stein Eye Institute, University of California, Los Angeles, CA 90095
- Steven W. Zucker: Department of Computer Science, Yale University, New Haven, CT 06511; Department of Biomedical Engineering, Yale University, New Haven, CT 06511

8. Schnell AE, Leemans M, Vinken K, Op de Beeck H. A computationally informed comparison between the strategies of rodents and humans in visual object recognition. eLife 2023;12:RP87719. PMID: 38079481; PMCID: PMC10712954; DOI: 10.7554/elife.87719.
Abstract
Many species are able to recognize objects, but it has proven difficult to pinpoint and compare how different species solve this task. Recent research has suggested combining computational and animal modelling to obtain a more systematic understanding of task complexity and to compare strategies between species. In this study, we created a large multidimensional stimulus set and designed a visual discrimination task partially based upon modelling with a convolutional deep neural network (CNN). Experiments included rats (N = 11; 1115 daily sessions in total across all rats) and humans (N = 45). Each species was able to master the task and generalize to a variety of new images. Nevertheless, rats and humans showed very little convergence in terms of which object pairs were associated with high and low performance, suggesting the use of different strategies. There was an interaction between species and whether stimulus pairs favoured early or late processing in a CNN. A direct comparison with CNN representations and visual feature analyses revealed that rat performance was best captured by late convolutional layers and partially by visual features such as brightness and pixel-level similarity, while human performance related more to the later fully connected layers. These findings highlight the additional value of using a computational approach for the design of object recognition tasks. Overall, this computationally informed investigation of object recognition behaviour reveals a strong discrepancy in strategies between rodent and human vision.
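A sketch of how per-pair behavioral performance can be compared against CNN layer representations, in the spirit of the paper's layer analysis (data are fully synthetic, with the matching layers built into the toy data; names are illustrative).

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical inputs: per-stimulus-pair discrimination accuracy for each
# species, plus each CNN layer's dissimilarity for the same pairs.
rng = np.random.default_rng(4)
n_pairs, n_layers = 300, 8
layer_dissim = rng.random(size=(n_layers, n_pairs))  # e.g., 1 - cosine sim
rat_acc = 0.5 + 0.4 * layer_dissim[5] + rng.normal(scale=0.05, size=n_pairs)
human_acc = 0.5 + 0.4 * layer_dissim[7] + rng.normal(scale=0.05, size=n_pairs)

# If a layer's dissimilarity predicts which pairs are easy for a species,
# that species' strategy resembles that stage of processing.
for species, acc in [("rat", rat_acc), ("human", human_acc)]:
    rhos = [spearmanr(d, acc)[0] for d in layer_dissim]
    print(f"{species}: best-matching layer = {int(np.argmax(rhos))}, "
          f"rho = {max(rhos):.2f}")
```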
Affiliation(s)
- Maarten Leemans: Department of Brain and Cognition & Leuven Brain Institute, Leuven, Belgium
- Kasper Vinken: Department of Neurobiology, Harvard Medical School, Boston, United States
- Hans Op de Beeck: Department of Brain and Cognition & Leuven Brain Institute, Leuven, Belgium

9. Xu A, Hou Y, Niell CM, Beyeler M. Multimodal Deep Learning Model Unveils Behavioral Dynamics of V1 Activity in Freely Moving Mice. Adv Neural Inf Process Syst 2023;36:15341-15357. PMID: 39005944; PMCID: PMC11242920.
Abstract
Despite their immense success as a model of macaque visual cortex, deep convolutional neural networks (CNNs) have struggled to predict activity in visual cortex of the mouse, which is thought to be strongly dependent on the animal's behavioral state. Furthermore, most computational models focus on predicting neural responses to static images presented under head fixation, which are dramatically different from the dynamic, continuous visual stimuli that arise during movement in the real world. Consequently, it is still unknown how natural visual input and different behavioral variables may integrate over time to generate responses in primary visual cortex (V1). To address this, we introduce a multimodal recurrent neural network that integrates gaze-contingent visual input with behavioral and temporal dynamics to explain V1 activity in freely moving mice. We show that the model achieves state-of-the-art predictions of V1 activity during free exploration and demonstrate the importance of each component in an extensive ablation study. Analyzing our model using maximally activating stimuli and saliency maps, we reveal new insights into cortical function, including the prevalence of mixed selectivity for behavioral variables in mouse V1. In summary, our model offers a comprehensive deep-learning framework for exploring the computational principles underlying V1 neurons in freely-moving animals engaged in natural behavior.
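A skeletal version of a multimodal recurrent model of this kind (layer sizes, behavioral variables, and loss are assumptions, not the published architecture): a small frame encoder is fused with behavioral inputs and integrated over time by a recurrent layer to predict firing rates.

```python
import torch
import torch.nn as nn

class MultimodalV1Model(nn.Module):
    """Illustrative fusion of gaze-contingent video and behavior."""
    def __init__(self, n_behavior=4, n_neurons=100, hidden=128):
        super().__init__()
        self.vision = nn.Sequential(            # tiny frame encoder
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.rnn = nn.GRU(16 * 16 + n_behavior, hidden, batch_first=True)
        self.readout = nn.Linear(hidden, n_neurons)

    def forward(self, frames, behavior):
        # frames: (batch, time, 1, H, W); behavior: (batch, time, n_behavior)
        b, t = frames.shape[:2]
        v = self.vision(frames.flatten(0, 1)).view(b, t, -1)
        h, _ = self.rnn(torch.cat([v, behavior], dim=-1))
        return self.readout(h).exp()            # nonnegative firing rates

model = MultimodalV1Model()
frames = torch.rand(2, 20, 1, 32, 32)           # synthetic gaze-corrected video
behavior = torch.rand(2, 20, 4)                 # e.g., speed, pupil, head angles
rates = model(frames, behavior)                 # (2, 20, 100) predicted rates
spikes = torch.poisson(torch.full_like(rates.detach(), 0.5))
loss = nn.PoissonNLLLoss(log_input=False)(rates, spikes)
print(loss.item())
```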
Affiliation(s)
- Aiwen Xu: Department of Computer Science, University of California, Santa Barbara, Santa Barbara, CA 93117
- Yuchen Hou: Department of Computer Science, University of California, Santa Barbara, Santa Barbara, CA 93117
- Cristopher M Niell: Department of Biology, Institute of Neuroscience, University of Oregon, Eugene, OR 97403
- Michael Beyeler: Department of Computer Science and Department of Psychological & Brain Sciences, University of California, Santa Barbara, Santa Barbara, CA 93117

10. Singer Y, Taylor L, Willmore BDB, King AJ, Harper NS. Hierarchical temporal prediction captures motion processing along the visual pathway. eLife 2023;12:e52599. PMID: 37844199; PMCID: PMC10629830; DOI: 10.7554/elife.52599.
Abstract
Visual neurons respond selectively to features that become increasingly complex from the eyes to the cortex. Retinal neurons prefer flashing spots of light, primary visual cortical (V1) neurons prefer moving bars, and those in higher cortical areas favor complex features like moving textures. Previously, we showed that V1 simple cell tuning can be accounted for by a basic model implementing temporal prediction, that is, representing features that predict future sensory input from past input (Singer et al., 2018). Here, we show that hierarchical application of temporal prediction can capture how tuning properties change across at least two levels of the visual system. This suggests that the brain does not efficiently represent all incoming information; instead, it selectively represents sensory inputs that help in predicting the future. When applied hierarchically, temporal prediction extracts time-varying features that depend on increasingly high-level statistics of the sensory input.
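A minimal temporal-prediction objective of the kind the abstract describes (network size, data, and training schedule are illustrative): features of a short past window are learned so that a linear read-out predicts the next frame.

```python
import torch
import torch.nn as nn

# Learn features of the past `past` frames that predict the next frame.
past, future, hw = 5, 1, 16 * 16

encoder = nn.Sequential(nn.Linear(past * hw, 64), nn.Tanh())  # feature units
predictor = nn.Linear(64, future * hw)                        # next-frame read-out
opt = torch.optim.Adam([*encoder.parameters(), *predictor.parameters()], lr=1e-3)

movie = torch.rand(1000, hw)                  # synthetic flattened movie frames
for step in range(200):
    t = torch.randint(past, movie.shape[0] - future, (32,))
    x = torch.stack([movie[i - past:i].reshape(-1) for i in t])    # past window
    y = torch.stack([movie[i:i + future].reshape(-1) for i in t])  # next frame
    loss = nn.functional.mse_loss(predictor(encoder(x)), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
# Stacking such networks, each predicting the future of the layer below,
# gives the hierarchical application the authors describe.
print(f"final prediction loss: {loss.item():.4f}")
```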
Affiliation(s)
- Yosef Singer: Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Luke Taylor: Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Ben DB Willmore: Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Andrew J King: Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Nicol S Harper: Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom