101
Ranjbar A, Suratgar AA, Menhaj MB, Abbasi-Asl R. Structurally-constrained encoding framework using a multi-voxel reduced-rank latent model for human natural vision. J Neural Eng 2024; 21:046027. PMID: 38986451; DOI: 10.1088/1741-2552/ad6184.
Abstract
Objective. Voxel-wise visual encoding models based on convolutional neural networks (CNNs) have emerged as one of the prominent predictive tools of human brain activity via functional magnetic resonance imaging signals. While CNN-based models imitate the hierarchical structure of the human visual cortex to generate explainable features in response to natural visual stimuli, there is still a need for a brain-inspired model to predict brain responses accurately based on biomedical data. Approach. To bridge this gap, we propose a response prediction module called the Structurally Constrained Multi-Output (SCMO) module to include homologous correlations that arise between a group of voxels in a cortical region and predict more accurate responses. Main results. This module employs all the responses across a visual area to predict individual voxel-wise BOLD responses and therefore accounts for the population activity and collective behavior of voxels. Such a module can determine the relationships within each visual region by creating a structure matrix that represents the underlying voxel-to-voxel interactions. Moreover, since each response module in visual encoding tasks relies on the image features, we conducted experiments using two different feature extraction modules to assess the predictive performance of our proposed module. Specifically, we employed a recurrent CNN that integrates both feedforward and recurrent interactions, as well as the popular AlexNet model that utilizes feedforward connections. Significance. We demonstrate that the proposed framework provides a reliable predictive ability to generate brain responses across multiple areas, outperforming benchmark models in terms of stability and coherency of features.
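Editor's sketch: the core idea (predicting each voxel from stimulus features, then refining with a learned voxel-to-voxel structure matrix) can be illustrated with a two-stage ridge regression. This is hypothetical code, not the authors' SCMO implementation; all dimensions, the solver, and the regularizer are illustrative assumptions.

```python
# Minimal sketch of a structurally constrained multi-output encoder:
# each voxel is first predicted from stimulus features, then refined by a
# voxel-to-voxel "structure" matrix that couples voxels within a region.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_voxels = 500, 64, 30
X = rng.normal(size=(n_samples, n_features))   # CNN features per stimulus
Y = rng.normal(size=(n_samples, n_voxels))     # BOLD responses per voxel

# Stage 1: independent ridge weights, features -> voxels.
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ Y)
Y0 = X @ W

# Stage 2: structure matrix S couples voxel predictions within the region,
# capturing homologous voxel-to-voxel correlations.
S = np.linalg.solve(Y0.T @ Y0 + lam * np.eye(n_voxels), Y0.T @ Y)

Y_hat = Y0 @ S   # structurally refined prediction
print("residual variance fraction:", np.var(Y - Y_hat) / np.var(Y))
```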
Affiliation(s)
- Amin Ranjbar
- Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran
- Distributed and Intelligence Optimization Research Laboratory (DIOR Lab.), Tehran, Iran
- Amir Abolfazl Suratgar
- Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran
- Distributed and Intelligence Optimization Research Laboratory (DIOR Lab.), Tehran, Iran
- Mohammad Bagher Menhaj
- Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran
- Distributed and Intelligence Optimization Research Laboratory (DIOR Lab.), Tehran, Iran
- Reza Abbasi-Asl
- Department of Neurology, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, United States of America
- UCSF Weill Institute for Neurosciences, San Francisco, CA, United States of America
102
Lahner B, Dwivedi K, Iamshchinina P, Graumann M, Lascelles A, Roig G, Gifford AT, Pan B, Jin S, Ratan Murty NA, Kay K, Oliva A, Cichy R. Modeling short visual events through the BOLD moments video fMRI dataset and metadata. Nat Commun 2024; 15:6241. PMID: 39048577; PMCID: PMC11269733; DOI: 10.1038/s41467-024-50310-3.
Abstract
Studying the neural basis of human dynamic visual perception requires extensive experimental data to evaluate the large swathes of functionally diverse brain networks driven by perceiving visual events. Here, we introduce the BOLD Moments Dataset (BMD), a repository of whole-brain fMRI responses to over 1000 short (3 s) naturalistic video clips of visual events across ten human subjects. We use the videos' extensive metadata to show how the brain represents word- and sentence-level descriptions of visual events and identify correlates of video memorability scores extending into the parietal cortex. Furthermore, we reveal a match in hierarchical processing between cortical regions of interest and video-computable deep neural networks, and we showcase that BMD successfully captures temporal dynamics of visual events at second-scale resolution. With its rich metadata, BMD offers new perspectives and accelerates research on the human brain basis of visual event perception.
Affiliation(s)
- Benjamin Lahner
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- Kshitij Dwivedi
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Department of Computer Science, Goethe University Frankfurt, Frankfurt am Main, Germany
- Polina Iamshchinina
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Monika Graumann
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Alex Lascelles
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- Gemma Roig
- Department of Computer Science, Goethe University Frankfurt, Frankfurt am Main, Germany
- The Hessian Center for AI (hessian.AI), Darmstadt, Germany
- Bowen Pan
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- SouYoung Jin
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- N Apurva Ratan Murty
- Department of Brain and Cognitive Science, MIT, Cambridge, MA, USA
- School of Psychology, Georgia Institute of Technology, Atlanta, GA, USA
- Kendrick Kay
- Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, MN, USA
- Aude Oliva
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- Radoslaw Cichy
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
103
Parthasarathy N, Hénaff OJ, Simoncelli EP. Layerwise complexity-matched learning yields an improved model of cortical area V2. arXiv 2024: arXiv:2312.11436v3. PMID: 39070038; PMCID: PMC11275700.
Abstract
Human ability to recognize complex visual patterns arises through transformations performed by successive areas in the ventral visual cortex. Deep neural networks trained end-to-end for object recognition approach human capabilities, and offer the best descriptions to date of neural responses in the late stages of the hierarchy. But these networks provide a poor account of the early stages, compared to traditional hand-engineered models, or models optimized for coding efficiency or prediction. Moreover, the gradient backpropagation used in end-to-end learning is generally considered to be biologically implausible. Here, we overcome both of these limitations by developing a bottom-up self-supervised training methodology that operates independently on successive layers. Specifically, we maximize feature similarity between pairs of locally-deformed natural image patches, while decorrelating features across patches sampled from other images. Crucially, the deformation amplitudes are adjusted proportionally to receptive field sizes in each layer, thus matching the task complexity to the capacity at each stage of processing. In comparison with architecture-matched versions of previous models, we demonstrate that our layerwise complexity-matched learning (LCL) formulation produces a two-stage model (LCL-V2) that is better aligned with selectivity properties and neural activity in primate area V2. We demonstrate that the complexity-matched learning paradigm is responsible for much of the emergence of the improved biological alignment. Finally, when the two-stage model is used as a fixed front-end for a deep network trained to perform object recognition, the resultant model (LCL-V2Net) is significantly better than standard end-to-end self-supervised, supervised, and adversarially-trained models in terms of generalization to out-of-distribution tasks and alignment with human behavior. Our code and pre-trained checkpoints are available at https://github.com/nikparth/LCL-V2.git.
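Editor's sketch: the per-layer objective described above (pull together features of two locally deformed views, decorrelate across images, with deformation amplitude tied to receptive field size) resembles a layerwise redundancy-reduction loss. The code below is a hedged illustration under those assumptions, not the authors' implementation (their code is at https://github.com/nikparth/LCL-V2.git).

```python
# Sketch of a layerwise complexity-matched objective for ONE layer: an
# invariance term across two deformed views plus a decorrelation term.
import torch

def lcl_layer_loss(z1, z2, decorrelation_weight=5e-3):
    """z1, z2: (batch, dim) features of two deformed views from one layer."""
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    similarity = ((z1 - z2) ** 2).sum(dim=1).mean()   # invariance term
    c = (z1.T @ z2) / z1.shape[0]                     # cross-correlation matrix
    off_diag = c - torch.diag(torch.diagonal(c))
    decorrelation = (off_diag ** 2).sum()             # redundancy-reduction term
    return similarity + decorrelation_weight * decorrelation

# Complexity matching: deformation amplitude grows with each layer's receptive
# field size, so deeper layers face a proportionally harder invariance task.
rf_sizes = {"layer1": 9, "layer2": 27}                # hypothetical RF sizes (px)
deform_amplitude = {k: 0.25 * v for k, v in rf_sizes.items()}

z1, z2 = torch.randn(128, 256), torch.randn(128, 256)
print(lcl_layer_loss(z1, z2).item())
```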
Affiliation(s)
- Nikhil Parthasarathy
- Center for Neural Science, New York University
- Center for Computational Neuroscience, Flatiron Institute
- Eero P Simoncelli
- Center for Neural Science, New York University
- Center for Computational Neuroscience, Flatiron Institute
104
Margalit E, Lee H, Finzi D, DiCarlo JJ, Grill-Spector K, Yamins DLK. A unifying framework for functional organization in early and higher ventral visual cortex. Neuron 2024; 112:2435-2451.e7. PMID: 38733985; PMCID: PMC11257790; DOI: 10.1016/j.neuron.2024.04.018.
Abstract
A key feature of cortical systems is functional organization: the arrangement of functionally distinct neurons in characteristic spatial patterns. However, the principles underlying the emergence of functional organization in the cortex are poorly understood. Here, we develop the topographic deep artificial neural network (TDANN), the first model to predict several aspects of the functional organization of multiple cortical areas in the primate visual system. We analyze the factors driving the TDANN's success and find that it balances two objectives: learning a task-general sensory representation and maximizing the spatial smoothness of responses according to a metric that scales with cortical surface area. In turn, the representations learned by the TDANN are more brain-like than in spatially unconstrained models. Finally, we provide evidence that the TDANN's functional organization balances performance with between-area connection length. Our results offer a unified principle for understanding the functional organization of the primate ventral visual system.
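Editor's sketch: the TDANN's second objective, spatial smoothness of responses on a simulated cortical sheet, can be illustrated with a regularizer that pushes unit-to-unit response correlations to fall off with cortical distance. This is a hypothetical simplification of the published objective; the falloff target and all parameters are assumptions.

```python
# Sketch of a TDANN-style spatial smoothness loss: units get fixed 2D
# positions on a simulated cortical sheet, and the loss penalizes mismatch
# between response correlation and an ideal falloff with distance.
import torch

def spatial_smoothness_loss(responses, positions, eps=1e-6):
    """responses: (batch, units); positions: (units, 2) on the sheet."""
    r = responses - responses.mean(0)
    r = r / (r.norm(dim=0) + eps)
    corr = r.T @ r                              # (units, units) correlations
    dist = torch.cdist(positions, positions)    # pairwise cortical distances
    target = 1.0 / (dist + 1.0)                 # assumed falloff with distance
    return ((corr - target) ** 2).mean()

# Total objective would be: task loss + alpha * spatial loss, with alpha
# trading off representation quality against smoothness.
responses = torch.randn(64, 100)
positions = torch.rand(100, 2) * 10.0   # sheet scaled to mimic cortical area
print(spatial_smoothness_loss(responses, positions).item())
```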
Affiliation(s)
- Eshed Margalit
- Neurosciences Graduate Program, Stanford University, Stanford, CA 94305, USA
- Hyodong Lee
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Dawn Finzi
- Department of Psychology, Stanford University, Stanford, CA 94305, USA; Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- James J DiCarlo
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Center for Brains Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Kalanit Grill-Spector
- Department of Psychology, Stanford University, Stanford, CA 94305, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA 94305, USA
- Daniel L K Yamins
- Department of Psychology, Stanford University, Stanford, CA 94305, USA; Department of Computer Science, Stanford University, Stanford, CA 94305, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA 94305, USA
105
Idrees S, Manookin MB, Rieke F, Field GD, Zylberberg J. Biophysical neural adaptation mechanisms enable artificial neural networks to capture dynamic retinal computation. Nat Commun 2024; 15:5957. PMID: 39009568; PMCID: PMC11251147; DOI: 10.1038/s41467-024-50114-5.
Abstract
Adaptation is a universal aspect of neural systems that changes circuit computations to match prevailing inputs. These changes facilitate efficient encoding of sensory inputs while avoiding saturation. Conventional artificial neural networks (ANNs) have limited adaptive capabilities, hindering their ability to reliably predict neural output under dynamic input conditions. Can embedding neural adaptive mechanisms in ANNs improve their performance? To answer this question, we develop a new deep learning model of the retina that incorporates the biophysics of photoreceptor adaptation at the front-end of conventional convolutional neural networks (CNNs). These conventional CNNs build on 'Deep Retina,' a previously developed model of retinal ganglion cell (RGC) activity. CNNs that include this new photoreceptor layer outperform conventional CNN models at predicting male and female primate and rat RGC responses to naturalistic stimuli that include dynamic local intensity changes and large changes in the ambient illumination. These improved predictions result directly from adaptation within the phototransduction cascade. This research underscores the potential of embedding models of neural adaptation in ANNs and using them to determine how neural circuits manage the complexities of encoding natural inputs that are dynamic and span a large range of light levels.
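Editor's sketch: the idea of an adaptive front-end ahead of a conventional CNN can be illustrated with a toy divisive-adaptation ODE, in which a slow state tracks recent light intensity and divisively scales the output. This is not the paper's phototransduction cascade model; the dynamics and constants below are assumptions for illustration only.

```python
# Toy photoreceptor-adaptation front-end: output gain drops as a slow state
# tracks the recent stimulus, before any downstream CNN sees the signal.
import numpy as np

def adaptive_frontend(stimulus, dt=0.01, tau=0.5, k=1.0):
    """stimulus: (time,) light intensity; returns adapted response."""
    a = np.zeros_like(stimulus)       # slow adaptation state
    out = np.zeros_like(stimulus)
    for t in range(1, len(stimulus)):
        a[t] = a[t - 1] + dt * (stimulus[t - 1] - a[t - 1]) / tau
        out[t] = stimulus[t] / (1.0 + k * a[t])   # divisive gain control
    return out

t = np.arange(0, 4, 0.01)
light = np.where(t > 2, 10.0, 1.0)    # step change in ambient illumination
resp = adaptive_frontend(light)
# Before the step, a large transient just after it, then re-adapted:
print(resp[150], resp[205], resp[390])
```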
Affiliation(s)
- Saad Idrees
- Department of Physics and Astronomy, York University, Toronto, ON, Canada
- Centre for Vision Research, York University, Toronto, ON, Canada
- Fred Rieke
- Department of Physiology and Biophysics, University of Washington, Seattle, WA, USA
- Greg D Field
- Stein Eye Institute, Department of Ophthalmology, University of California, Los Angeles, CA, USA
- Joel Zylberberg
- Department of Physics and Astronomy, York University, Toronto, ON, Canada
- Centre for Vision Research, York University, Toronto, ON, Canada
- Learning in Machines and Brains Program, Canadian Institute for Advanced Research, Toronto, ON, Canada
106
Quaia C, Krauzlis RJ. Object recognition in primates: what can early visual areas contribute? Front Behav Neurosci 2024; 18:1425496. PMID: 39070778; PMCID: PMC11272660; DOI: 10.3389/fnbeh.2024.1425496.
Abstract
Introduction: If neuroscientists were asked which brain area is responsible for object recognition in primates, most would probably answer infero-temporal (IT) cortex. While IT is likely responsible for fine discriminations, and it is accordingly dominated by foveal visual inputs, there is more to object recognition than fine discrimination. Importantly, foveation of an object of interest usually requires recognizing, with reasonable confidence, its presence in the periphery. Arguably, IT plays a secondary role in such peripheral recognition, and other visual areas might instead be more critical. Methods: To investigate how signals carried by early visual processing areas (such as LGN and V1) could be used for object recognition in the periphery, we focused here on the task of distinguishing faces from non-faces. We tested how sensitive various models were to nuisance parameters, such as changes in scale and orientation of the image, and the type of image background. Results: We found that a model of V1 simple or complex cells could provide quite reliable information, resulting in performance better than 80% in realistic scenarios. An LGN model performed considerably worse. Discussion: Because peripheral recognition is both crucial to enable fine recognition (by bringing an object of interest on the fovea), and probably sufficient to account for a considerable fraction of our daily recognition-guided behavior, we think that the current focus on area IT and foveal processing is too narrow. We propose that rather than a hierarchical system with IT-like properties as its primary aim, object recognition should be seen as a parallel process, with high-accuracy foveal modules operating in parallel with lower-accuracy and faster modules that can operate across the visual field.
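Editor's sketch: a V1 complex-cell ("energy") stage of the general kind evaluated here pools quadrature Gabor pairs into phase-invariant local contrast energy, which can then feed a simple classifier. The filter parameters below are illustrative assumptions, not the paper's.

```python
# Hedged sketch of a V1 complex-cell energy model: quadrature Gabor pairs
# squared and summed give phase-invariant local energy per orientation.
import numpy as np
from scipy.signal import fftconvolve

def gabor(size=21, wavelength=8.0, theta=0.0, phase=0.0, sigma=4.0):
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength + phase)

def complex_cell_energy(image, theta):
    even = fftconvolve(image, gabor(theta=theta, phase=0.0), mode="same")
    odd = fftconvolve(image, gabor(theta=theta, phase=np.pi / 2), mode="same")
    return even**2 + odd**2

image = np.random.rand(64, 64)   # stand-in for a peripheral image patch
features = np.stack([complex_cell_energy(image, th)
                     for th in np.linspace(0, np.pi, 4, endpoint=False)])
print(features.shape)  # (orientations, H, W); pooled before a face/non-face classifier
```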
Affiliation(s)
- Christian Quaia
- Laboratory of Sensorimotor Research, National Eye Institute, NIH, Bethesda, MD, United States
107
Turishcheva P, Fahey PG, Vystrčilová M, Hansel L, Froebe R, Ponder K, Qiu Y, Willeke KF, Bashiri M, Baikulov R, Zhu Y, Ma L, Yu S, Huang T, Li BM, Wulf WD, Kudryashova N, Hennig MH, Rochefort NL, Onken A, Wang E, Ding Z, Tolias AS, Sinz FH, Ecker AS. Retrospective for the Dynamic Sensorium Competition for predicting large-scale mouse primary visual cortex activity from videos. arXiv 2024: arXiv:2407.09100v1. PMID: 39040641; PMCID: PMC11261979.
Abstract
Understanding how biological visual systems process information is challenging because of the nonlinear relationship between visual input and neuronal responses. Artificial neural networks allow computational neuroscientists to create predictive models that connect biological and machine vision. Machine learning has benefited tremendously from benchmarks that compare different models on the same task under standardized conditions. However, there was no standardized benchmark to identify state-of-the-art dynamic models of the mouse visual system. To address this gap, we established the SENSORIUM 2023 Benchmark Competition with dynamic input, featuring a new large-scale dataset from the primary visual cortex of ten mice. This dataset includes responses from 78,853 neurons to 2 hours of dynamic stimuli per neuron, together with behavioral measurements such as running speed, pupil dilation, and eye movements. The competition ranked models in two tracks based on predictive performance for neuronal responses on a held-out test set: one focusing on predicting in-domain natural stimuli and another on out-of-distribution (OOD) stimuli to assess model generalization. As part of the NeurIPS 2023 competition track, we received more than 160 model submissions from 22 teams. Several new architectures for predictive models were proposed, and the winning teams improved the previous state-of-the-art model by 50%. Access to the dataset as well as the benchmarking infrastructure will remain online at www.sensorium-competition.net.
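Editor's sketch: held-out evaluation in benchmarks of this kind typically reduces to a per-neuron correlation between predicted and observed responses, averaged over neurons. The snippet below illustrates that style of metric; it is an assumption, not the official Sensorium scoring code.

```python
# Per-neuron correlation between predictions and held-out responses,
# averaged over neurons (illustrative metric only).
import numpy as np

def mean_per_neuron_correlation(y_true, y_pred):
    """y_true, y_pred: (time, neurons) held-out responses and predictions."""
    yt = y_true - y_true.mean(0)
    yp = y_pred - y_pred.mean(0)
    corr = (yt * yp).sum(0) / (np.linalg.norm(yt, axis=0)
                               * np.linalg.norm(yp, axis=0) + 1e-8)
    return corr.mean()

y_true = np.random.rand(300, 50)
y_pred = y_true + 0.5 * np.random.rand(300, 50)   # a decent fake model
print(mean_per_neuron_correlation(y_true, y_pred))
```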
Affiliation(s)
- Polina Turishcheva
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Paul G. Fahey
- Department of Neuroscience & Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
- Stanford Bio-X, Stanford University, Stanford, CA, US
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
- Michaela Vystrčilová
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Laura Hansel
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Rachel Froebe
- Department of Neuroscience & Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
- Stanford Bio-X, Stanford University, Stanford, CA, US
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
- Kayla Ponder
- Department of Neuroscience & Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, USA
- Yongrong Qiu
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
- Stanford Bio-X, Stanford University, Stanford, CA, US
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
- Konstantin F. Willeke
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- International Max Planck Research School for Intelligent Systems, Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, Tübingen University, Germany
- Mohammad Bashiri
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- International Max Planck Research School for Intelligent Systems, Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, Tübingen University, Germany
- Yu Zhu
- Institute of Automation, Chinese Academy of Sciences, China
- Beijing Academy of Artificial Intelligence, China
- Lei Ma
- Beijing Academy of Artificial Intelligence, China
- Shan Yu
- Institute of Automation, Chinese Academy of Sciences, China
- Tiejun Huang
- Beijing Academy of Artificial Intelligence, China
- Bryan M. Li
- The Alan Turing Institute, UK
- School of Informatics, University of Edinburgh, UK
- Wolf De Wulf
- School of Informatics, University of Edinburgh, UK
- Nathalie L. Rochefort
- Centre for Discovery Brain Sciences, University of Edinburgh, UK
- Simons Initiative for the Developing Brain, University of Edinburgh, UK
- Arno Onken
- School of Informatics, University of Edinburgh, UK
- Eric Wang
- Department of Neuroscience & Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, USA
- Zhiwei Ding
- Department of Neuroscience & Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, USA
- Andreas S. Tolias
- Department of Neuroscience & Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
- Stanford Bio-X, Stanford University, Stanford, CA, US
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
- Department of Electrical Engineering, Stanford University, Stanford, CA, US
- Fabian H. Sinz
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Department of Neuroscience & Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, USA
- International Max Planck Research School for Intelligent Systems, Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, Tübingen University, Germany
- Alexander S Ecker
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
108
Turishcheva P, Fahey PG, Vystrčilová M, Hansel L, Froebe R, Ponder K, Qiu Y, Willeke KF, Bashiri M, Wang E, Ding Z, Tolias AS, Sinz FH, Ecker AS. The Dynamic Sensorium competition for predicting large-scale mouse visual cortex activity from videos. arXiv 2024: arXiv:2305.19654v2. PMID: 37396602; PMCID: PMC10312815.
Abstract
Understanding how biological visual systems process information is challenging due to the complex nonlinear relationship between neuronal responses and high-dimensional visual input. Artificial neural networks have already improved our understanding of this system by allowing computational neuroscientists to create predictive models and bridge biological and machine vision. During the Sensorium 2022 competition, we introduced benchmarks for vision models with static input (i.e. images). However, animals operate and excel in dynamic environments, making it crucial to study and understand how the brain functions under these conditions. Moreover, many biological theories, such as predictive coding, suggest that previous input is crucial for current input processing. Currently, there is no standardized benchmark to identify state-of-the-art dynamic models of the mouse visual system. To address this gap, we propose the Sensorium 2023 Benchmark Competition with dynamic input (https://www.sensorium-competition.net/). This competition includes the collection of a new large-scale dataset from the primary visual cortex of ten mice, containing responses from over 78,000 neurons to over 2 hours of dynamic stimuli per neuron. Participants in the main benchmark track will compete to identify the best predictive models of neuronal responses for dynamic input (i.e. video). We will also host a bonus track in which submission performance will be evaluated on out-of-domain input, using withheld neuronal responses to dynamic input stimuli whose statistics differ from the training set. Both tracks will offer behavioral data along with video stimuli. As before, we will provide code, tutorials, and strong pre-trained baseline models to encourage participation. We hope this competition will continue to strengthen the accompanying Sensorium benchmarks collection as a standard tool to measure progress in large-scale neural system identification models of the entire mouse visual hierarchy and beyond.
Affiliation(s)
- Polina Turishcheva
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Paul G Fahey
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
- Stanford Bio-X, Stanford University, Stanford, CA, US
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
- Michaela Vystrčilová
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Laura Hansel
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Rachel Froebe
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
- Stanford Bio-X, Stanford University, Stanford, CA, US
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
- Kayla Ponder
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Yongrong Qiu
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
- Stanford Bio-X, Stanford University, Stanford, CA, US
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
- Konstantin F Willeke
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
- Stanford Bio-X, Stanford University, Stanford, CA, US
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
- International Max Planck Research School for Intelligent Systems, University of Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Germany
- Mohammad Bashiri
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- International Max Planck Research School for Intelligent Systems, University of Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Germany
- Eric Wang
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Zhiwei Ding
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Andreas S Tolias
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, US
- Stanford Bio-X, Stanford University, Stanford, CA, US
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, US
- Department of Electrical Engineering, Stanford University, Stanford, CA, US
- Fabian H Sinz
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- International Max Planck Research School for Intelligent Systems, University of Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Germany
- Alexander S Ecker
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
109
Almasi A, Sun SH, Jung YJ, Ibbotson M, Meffin H. Data-driven modelling of visual receptive fields: comparison between the generalized quadratic model and the nonlinear input model. J Neural Eng 2024; 21:046014. PMID: 38941988; DOI: 10.1088/1741-2552/ad5d15.
Abstract
Objective: Neurons in primary visual cortex (V1) display a range of sensitivity in their response to translations of their preferred visual features within their receptive field: from high specificity to a precise position through to complete invariance. This visual feature selectivity and invariance is frequently modeled by applying a selection of linear spatial filters to the input image, which define the feature selectivity, followed by a nonlinear function that combines the filter outputs and defines the invariance, to predict the neural response. We compare two such classes of model, both popular and parsimonious: the generalized quadratic model (GQM) and the nonlinear input model (NIM). These two classes of model differ primarily in that the NIM can accommodate a greater diversity in the form of nonlinearity that is applied to the outputs of the filters. Approach: We compare the two model types by applying them to data from multielectrode recordings in cat primary visual cortex in response to spatially white Gaussian noise. After fitting both classes of model to a database of 342 single units (SUs), we analyze the qualitative and quantitative differences in the visual feature processing performed by the two models and their ability to predict neural responses. Main results: We find that the NIM predicts response rates on held-out data at least as well as the GQM for 95% of SUs. Superior performance occurs predominantly for those units with above-average spike rates and is largely due to the NIM's ability to capture aspects of the nonlinear function that cannot be captured with the GQM, rather than differences in the visual features being processed by the two models. Significance: These results can help guide model choice for data-driven receptive field modelling.
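Editor's sketch: the two model classes differ only in how filter outputs are combined before the spiking nonlinearity. The snippet below shows the standard functional forms side by side; filters and parameters are random placeholders, not fitted values from the paper.

```python
# GQM vs. NIM functional forms on one stimulus frame.
import numpy as np

rng = np.random.default_rng(1)
n_pix, n_filt = 256, 3
K = rng.normal(size=(n_filt, n_pix))   # linear spatial filters
s = rng.normal(size=n_pix)             # stimulus frame (white noise)
g = K @ s                              # filter outputs ("generator" signals)

# GQM: fixed quadratic combination of filter outputs, then a softplus
# spiking nonlinearity.
w, b, c = rng.normal(size=n_filt), rng.normal(size=n_filt), 0.1
r_gqm = np.log1p(np.exp(np.sum(w * g**2) + b @ g + c))

# NIM: each filter output passes through its own upstream nonlinearity
# (here a rectifier; in general a learned 1D function) before summation --
# the extra flexibility the paper highlights.
def f_upstream(x):
    return np.maximum(x, 0.0)

r_nim = np.log1p(np.exp(sum(f_upstream(gi) for gi in g) + c))
print(r_gqm, r_nim)
```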
Affiliation(s)
- Ali Almasi
- National Vision Research Institute, Carlton, VIC 3053, Australia
- Shi H Sun
- National Vision Research Institute, Carlton, VIC 3053, Australia
- Young Jun Jung
- National Vision Research Institute, Carlton, VIC 3053, Australia
- Michael Ibbotson
- National Vision Research Institute, Carlton, VIC 3053, Australia
- Department of Optometry and Vision Sciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Hamish Meffin
- National Vision Research Institute, Carlton, VIC 3053, Australia
- Department of Biomedical Engineering, The University of Melbourne, Parkville, VIC 3010, Australia
110
Chandran KS, Ghosh K. A deep learning based cognitive model to probe the relation between psychophysics and electrophysiology of flicker stimulus. Brain Inform 2024; 11:18. PMID: 38987386; PMCID: PMC11236830; DOI: 10.1186/s40708-024-00231-0.
Abstract
The flicker stimulus is a visual stimulus of intermittent illumination. A flicker stimulus can appear flickering or steady to a human subject, depending on the physical parameters associated with the stimulus. When the flickering light appears steady, flicker fusion is said to have occurred. This work aims to bridge the gap between the psychophysics of flicker fusion and the electrophysiology associated with the flicker stimulus through a deep-learning-based computational model of flicker perception. Convolutional recurrent neural networks (CRNNs) were trained with flicker-stimulus psychophysics data obtained from a human subject. We claim that many of the reported features of the electrophysiology of the flicker stimulus, including the presence of fundamentals and harmonics of the stimulus, can be explained as the result of a temporal convolution operation on the flicker stimulus. We further show that the convolution-layer output of a CRNN trained with psychophysics data is more responsive to specific frequencies, as in the human EEG response to flicker, and that the convolution layer of a trained CRNN can give a nearly sinusoidal output for a 10 Hz flicker stimulus, as reported for some human subjects.
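Editor's sketch: the central claim, that temporal convolution of a flicker stimulus produces an output spectrum containing the fundamental and its harmonics, can be verified in a few lines. The kernel below is a placeholder lowpass filter, not a trained CRNN layer.

```python
# Temporally convolve a 10 Hz on/off flicker and inspect the output spectrum:
# a square wave carries odd harmonics (10, 30, 50 Hz), which survive filtering.
import numpy as np

fs, dur, f_flicker = 1000, 2.0, 10.0          # sample rate (Hz), s, flicker Hz
t = np.arange(0, dur, 1 / fs)
flicker = (np.sin(2 * np.pi * f_flicker * t) > 0).astype(float)

kernel = np.exp(-np.arange(0, 0.2, 1 / fs) / 0.02)   # toy temporal filter
response = np.convolve(flicker, kernel, mode="same")

spectrum = np.abs(np.fft.rfft(response - response.mean()))
freqs = np.fft.rfftfreq(len(response), 1 / fs)
peaks = freqs[np.argsort(spectrum)[-3:]]
print(sorted(peaks))   # expect energy near 10 Hz and its odd harmonics
```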
Affiliation(s)
- Keerthi S Chandran
- Center for Soft Computing Research, Indian Statistical Institute, 203 BT Road, Kolkata, West Bengal, 700108, India
- Machine Intelligence Unit, Indian Statistical Institute, 203 BT Road, Kolkata, West Bengal, 700108, India
- Kuntal Ghosh
- Center for Soft Computing Research, Indian Statistical Institute, 203 BT Road, Kolkata, West Bengal, 700108, India
- Machine Intelligence Unit, Indian Statistical Institute, 203 BT Road, Kolkata, West Bengal, 700108, India
111
Johnsen KA, Cruzado NA, Menard ZC, Willats AA, Charles AS, Markowitz JE, Rozell CJ. Bridging model and experiment in systems neuroscience with Cleo: the Closed-Loop, Electrophysiology, and Optophysiology simulation testbed. bioRxiv 2024: 2023.01.27.525963. PMID: 39026717; PMCID: PMC11257437; DOI: 10.1101/2023.01.27.525963.
Abstract
Systems neuroscience has experienced an explosion of new tools for reading and writing neural activity, enabling exciting new experiments such as all-optical or closed-loop control that effect powerful causal interventions. At the same time, improved computational models are capable of reproducing behavior and neural activity with increasing fidelity. Unfortunately, these advances have drastically increased the complexity of integrating different lines of research, resulting in missed opportunities and suboptimal experiments. Experiment simulation can help bridge this gap, allowing model and experiment to better inform each other by providing a low-cost testbed for experiment design, model validation, and methods engineering. Specifically, this can be achieved by incorporating the simulation of the experimental interface into our models, but no existing tool integrates optogenetics, two-photon calcium imaging, electrode recording, and flexible closed-loop processing with neural population simulations. To address this need, we have developed Cleo: the Closed-Loop, Electrophysiology, and Optophysiology experiment simulation testbed. Cleo is a Python package enabling injection of recording and stimulation devices as well as closed-loop control with realistic latency into a Brian spiking neural network model. It is the only publicly available tool currently supporting two-photon and multi-opsin/wavelength optogenetics. To facilitate adoption and extension by the community, Cleo is open-source, modular, tested, and documented, and can export results to various data formats. Here we describe the design and features of Cleo, validate output of individual components and integrated experiments, and demonstrate its utility for advancing optogenetic techniques in prospective experiments using previously published systems neuroscience models.
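Editor's sketch: the closed-loop idea can be illustrated with a plain Brian2 network and a hand-rolled feedback controller. This deliberately does not use Cleo's actual API (which we do not reproduce here); it only shows the kind of read-activity-then-adjust-stimulation loop that Cleo is designed to simulate, with all parameters as assumptions.

```python
# Hand-rolled closed-loop stimulation in Brian2: every 10 ms, read the recent
# population spike rate and nudge an injected current toward a target rate.
from brian2 import (NeuronGroup, SpikeMonitor, network_operation,
                    run, ms, mV)

eqs = """
dv/dt = (I_stim - v) / (10*ms) : volt
I_stim : volt
"""
group = NeuronGroup(100, eqs, threshold="v > 15*mV", reset="v = 0*mV",
                    method="exact")
spikes = SpikeMonitor(group)
target_rate, gain, last_count = 20.0, 0.5, 0   # Hz, feedback gain, counter

@network_operation(dt=10 * ms)   # controller tick; a real loop adds latency
def controller():
    global last_count
    rate = (spikes.num_spikes - last_count) / 100 / 0.010   # Hz per neuron
    last_count = spikes.num_spikes
    group.I_stim += gain * (target_rate - rate) * mV        # simple feedback law

run(500 * ms)
print("mean rate (Hz):", spikes.num_spikes / 100 / 0.5)
```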
Affiliation(s)
- Kyle A. Johnsen
- Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Zachary C. Menard
- Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Adam A. Willats
- Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Adam S. Charles
- Department of Biomedical Engineering, The Johns Hopkins University, Baltimore, MD, USA
- Jeffrey E. Markowitz
- Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
112
Lindsey JW, Issa EB. Factorized visual representations in the primate visual system and deep neural networks. eLife 2024; 13:RP91685. PMID: 38968311; PMCID: PMC11226229; DOI: 10.7554/elife.91685.
Abstract
Object classification has been proposed as a principal objective of the primate ventral visual stream and has been used as an optimization target for deep neural network models (DNNs) of the visual system. However, visual brain areas represent many different types of information, and optimizing for classification of object identity alone does not constrain how other information may be encoded in visual representations. Information about different scene parameters may be discarded altogether ('invariance'), represented in non-interfering subspaces of population activity ('factorization') or encoded in an entangled fashion. In this work, we provide evidence that factorization is a normative principle of biological visual representations. In the monkey ventral visual hierarchy, we found that factorization of object pose and background information from object identity increased in higher-level regions and strongly contributed to improving object identity decoding performance. We then conducted a large-scale analysis of factorization of individual scene parameters - lighting, background, camera viewpoint, and object pose - in a diverse library of DNN models of the visual system. Models which best matched neural, fMRI, and behavioral data from both monkeys and humans across 12 datasets tended to be those which factorized scene parameters most strongly. Notably, invariance to these parameters was not as consistently associated with matches to neural and behavioral data, suggesting that maintaining non-class information in factorized activity subspaces is often preferred to dropping it altogether. Thus, we propose that factorization of visual scene information is a widely used strategy in brains and DNN models thereof.
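Editor's sketch: a factorization measure in the spirit of the abstract asks how much nuisance-driven variance (pose, background, etc.) lies orthogonal to the identity-coding subspace. The exact published definition may differ; the construction below is an assumption for illustration.

```python
# Fraction of within-identity (nuisance-driven) variance that lies orthogonal
# to the subspace spanned by identity-mean responses.
import numpy as np

def factorization(resp, identity):
    """resp: (samples, units); identity: (samples,) integer labels."""
    ids = np.unique(identity)
    means = np.stack([resp[identity == i].mean(0) for i in ids])
    centered_means = means - means.mean(0)
    _, _, vt = np.linalg.svd(centered_means, full_matrices=False)
    basis = vt.T                                   # identity-subspace basis
    resid = resp - means[np.searchsorted(ids, identity)]   # nuisance variation
    var_in_subspace = ((resid @ basis) ** 2).sum()
    return 1.0 - var_in_subspace / (resid ** 2).sum()

rng = np.random.default_rng(0)
identity = np.repeat(np.arange(8), 20)
resp = rng.normal(size=(160, 50)) + np.repeat(rng.normal(size=(8, 50)), 20, axis=0)
print(factorization(resp, identity))   # near 1 = strongly factorized
```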
Affiliation(s)
- Jack W Lindsey
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
- Department of Neuroscience, Columbia University, New York, United States
- Elias B Issa
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
- Department of Neuroscience, Columbia University, New York, United States
113
Quaia C, Krauzlis RJ. Object recognition in primates: What can early visual areas contribute? arXiv 2024: arXiv:2407.04816v1. PMID: 39398202; PMCID: PMC11468158.
Abstract
If neuroscientists were asked which brain area is responsible for object recognition in primates, most would probably answer infero-temporal (IT) cortex. While IT is likely responsible for fine discriminations, and it is accordingly dominated by foveal visual inputs, there is more to object recognition than fine discrimination. Importantly, foveation of an object of interest usually requires recognizing, with reasonable confidence, its presence in the periphery. Arguably, IT plays a secondary role in such peripheral recognition, and other visual areas might instead be more critical. To investigate how signals carried by early visual processing areas (such as LGN and V1) could be used for object recognition in the periphery, we focused here on the task of distinguishing faces from non-faces. We tested how sensitive various models were to nuisance parameters, such as changes in scale and orientation of the image, and the type of image background. We found that a model of V1 simple or complex cells could provide quite reliable information, resulting in performance better than 80% in realistic scenarios. An LGN model performed considerably worse. Because peripheral recognition is both crucial to enable fine recognition (by bringing an object of interest on the fovea), and probably sufficient to account for a considerable fraction of our daily recognition-guided behavior, we think that the current focus on area IT and foveal processing is too narrow. We propose that rather than a hierarchical system with IT-like properties as its primary aim, object recognition should be seen as a parallel process, with high-accuracy foveal modules operating in parallel with lower-accuracy and faster modules that can operate across the visual field.
Affiliation(s)
- Christian Quaia
- Laboratory of Sensorimotor Research, National Eye Institute, NIH, Bethesda, MD, USA
- Richard J Krauzlis
- Laboratory of Sensorimotor Research, National Eye Institute, NIH, Bethesda, MD, USA
114
Wu N, Valera I, Sinz F, Ecker A, Euler T, Qiu Y. Probabilistic neural transfer function estimation with Bayesian system identification. PLoS Comput Biol 2024; 20:e1012354. PMID: 39083559; PMCID: PMC11318871; DOI: 10.1371/journal.pcbi.1012354.
Abstract
Neural population responses in sensory systems are driven by external physical stimuli. This stimulus-response relationship is typically characterized by receptive fields, which have been estimated by neural system identification approaches. Such models usually require a large amount of training data, yet the recording time for animal experiments is limited, giving rise to epistemic uncertainty for the learned neural transfer functions. While deep neural network models have demonstrated excellent performance on neural prediction, they usually do not provide the uncertainty of the resulting neural representations and derived statistics, such as most exciting inputs (MEIs), from in silico experiments. Here, we present a Bayesian system identification approach to predict neural responses to visual stimuli and explore whether explicitly modeling network weight variability can be beneficial for identifying neural response properties. To this end, we use variational inference to estimate the posterior distribution of each model weight given the training data. Tests with different neural datasets demonstrate that this method can achieve higher or comparable performance on neural prediction, with much higher data efficiency compared to Monte Carlo dropout methods and traditional models using point estimates of the model parameters. At the same time, our variational method provides us with an effectively infinite ensemble, avoiding the idiosyncrasy of any single model, to generate MEIs. This allows us to estimate the uncertainty of the stimulus-response function, which we have found to be negatively correlated with the predictive performance at the model level and may serve to evaluate models. Furthermore, our approach enables us to identify response properties with credible intervals and to determine whether the inferred features are meaningful by performing statistical tests on MEIs. Finally, in silico experiments show that our model generates stimuli driving neuronal activity significantly better than traditional models in the limited-data regime.
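Editor's sketch: mean-field variational inference over the weights of an encoding model can be written in a few lines with the reparameterization trick. The architecture (a single linear-nonlinear Poisson unit) and the standard-normal prior are simplified placeholders, not the authors' full model.

```python
# Variational posterior q(w) = N(mu, sigma^2) over encoding-model weights,
# trained by maximizing the ELBO (Poisson likelihood + KL to a N(0,1) prior).
import torch

n_pix, n_samples = 100, 500
X = torch.randn(n_samples, n_pix)
true_w = torch.randn(n_pix) * 0.1
y = torch.poisson(torch.exp(X @ true_w).clamp(max=20))   # synthetic spike counts

mu = torch.zeros(n_pix, requires_grad=True)              # posterior mean
log_sigma = torch.full((n_pix,), -2.0, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)

for step in range(2000):
    opt.zero_grad()
    eps = torch.randn(n_pix)
    w = mu + torch.exp(log_sigma) * eps                  # reparameterization
    rate = torch.exp((X @ w).clamp(max=20))
    nll = (rate - y * torch.log(rate + 1e-8)).sum()      # Poisson NLL
    kl = 0.5 * (mu**2 + torch.exp(2 * log_sigma) - 2 * log_sigma - 1).sum()
    (nll + kl).backward()
    opt.step()

# Per-weight posterior sd quantifies epistemic uncertainty; sampling w many
# times yields an effectively infinite ensemble for MEI generation.
print("posterior sd range:", torch.exp(log_sigma).min().item(),
      torch.exp(log_sigma).max().item())
```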
Affiliation(s)
- Nan Wu
- Department of Computer Science, Saarland University, Saarbrücken, Germany
- Institute for Ophthalmic Research and Centre for Integrative Neuroscience (CIN), Tübingen University, Tübingen, Germany
- Isabel Valera
- Department of Computer Science, Saarland University, Saarbrücken, Germany
- Fabian Sinz
- Department of Computer Science and Campus Institute Data Science (CIDAS), Göttingen University, Göttingen, Germany
- Alexander Ecker
- Department of Computer Science and Campus Institute Data Science (CIDAS), Göttingen University, Göttingen, Germany
- Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
- Thomas Euler
- Institute for Ophthalmic Research and Centre for Integrative Neuroscience (CIN), Tübingen University, Tübingen, Germany
- Yongrong Qiu
- Institute for Ophthalmic Research and Centre for Integrative Neuroscience (CIN), Tübingen University, Tübingen, Germany
- Department of Computer Science and Campus Institute Data Science (CIDAS), Göttingen University, Göttingen, Germany
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, California, United States of America
- Stanford Bio-X, Stanford University, Stanford, California, United States of America
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
115
Rathkopf C, Heinrichs B. Learning to Live with Strange Error: Beyond Trustworthiness in Artificial Intelligence Ethics. Camb Q Healthc Ethics 2024; 33:333-345. PMID: 36621773; DOI: 10.1017/s0963180122000688.
Abstract
Position papers on artificial intelligence (AI) ethics are often framed as attempts to work out technical and regulatory strategies for attaining what is commonly called trustworthy AI. In such papers, the technical and regulatory strategies are frequently analyzed in detail, but the concept of trustworthy AI is not. As a result, it remains unclear. This paper lays out a variety of possible interpretations of the concept and concludes that none of them is appropriate. The central problem is that, by framing the ethics of AI in terms of trustworthiness, we reinforce unjustified anthropocentric assumptions that stand in the way of clear analysis. Furthermore, even if we insist on a purely epistemic interpretation of the concept, according to which trustworthiness just means measurable reliability, it turns out that the analysis will, nevertheless, suffer from a subtle form of anthropocentrism. The paper goes on to develop the concept of strange error, which serves both to sharpen the initial diagnosis of the inadequacy of trustworthy AI and to articulate the novel epistemological situation created by the use of AI. The paper concludes with a discussion of how strange error puts pressure on standard practices of assessing moral culpability, particularly in the context of medicine.
Affiliation(s)
- Bert Heinrichs
- INM-7, Forschungszentrum Jülich GmbH, Jülich, Germany
- The Institute for Science and Ethics (IWE), The University of Bonn, Bonner Talweg 57, 53113, Germany
116
Ostojic S, Fusi S. Computational role of structure in neural activity and connectivity. Trends Cogn Sci 2024; 28:677-690. PMID: 38553340; DOI: 10.1016/j.tics.2024.03.003.
Abstract
One major challenge of neuroscience is identifying structure in seemingly disorganized neural activity. Different types of structure have different computational implications that can help neuroscientists understand the functional role of a particular brain area. Here, we outline a unified approach to characterize structure by inspecting the representational geometry and the modularity properties of the recorded activity and show that a similar approach can also reveal structure in connectivity. We start by setting up a general framework for determining geometry and modularity in activity and connectivity and relating these properties with computations performed by the network. We then use this framework to review the types of structure found in recent studies of model networks performing three classes of computations.
Affiliation(s)
- Srdjan Ostojic
- Laboratoire de Neurosciences Cognitives et Computationnelles, INSERM U960, Ecole Normale Superieure - PSL Research University, 75005 Paris, France
- Stefano Fusi
- Center for Theoretical Neuroscience, Columbia University, New York, NY, USA; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Department of Neuroscience, Columbia University, New York, NY, USA; Kavli Institute for Brain Science, Columbia University, New York, NY, USA
117
Driscoll LN, Shenoy K, Sussillo D. Flexible multitask computation in recurrent networks utilizes shared dynamical motifs. Nat Neurosci 2024; 27:1349-1363. PMID: 38982201; PMCID: PMC11239504; DOI: 10.1038/s41593-024-01668-6.
Abstract
Flexible computation is a hallmark of intelligent behavior. However, little is known about how neural networks contextually reconfigure for different computations. In the present work, we identified an algorithmic neural substrate for modular computation through the study of multitasking artificial recurrent neural networks. Dynamical systems analyses revealed learned computational strategies mirroring the modular subtask structure of the training task set. Dynamical motifs, which are recurring patterns of neural activity that implement specific computations through dynamics, such as attractors, decision boundaries and rotations, were reused across tasks. For example, tasks requiring memory of a continuous circular variable repurposed the same ring attractor. We showed that dynamical motifs were implemented by clusters of units when the unit activation function was restricted to be positive. Cluster lesions caused modular performance deficits. Motifs were reconfigured for fast transfer learning after an initial phase of learning. This work establishes dynamical motifs as a fundamental unit of compositional computation, intermediate between neuron and network. As whole-brain studies simultaneously record activity from multiple specialized systems, the dynamical motif framework will guide questions about specialization and generalization.
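Editor's sketch: the dynamical-systems analyses behind "dynamical motifs" typically start by locating approximate fixed points of the trained network, e.g., by minimizing the state speed q(h) = ||F(h, 0) - h||^2 (after Sussillo & Barak, 2013). The snippet below shows that procedure on an untrained RNN cell; it is illustrative, not the authors' analysis code.

```python
# Find slow points / approximate fixed points of an RNN under constant input
# by gradient descent on the state speed, from many initial states at once.
import torch

rnn = torch.nn.RNNCell(input_size=3, hidden_size=64)
x_const = torch.zeros(1, 3)                    # constant (here zero) input

h = torch.randn(32, 64, requires_grad=True)   # 32 initial guesses
opt = torch.optim.Adam([h], lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    speed = ((rnn(x_const.expand(32, -1), h) - h) ** 2).sum(dim=1)
    speed.sum().backward()
    opt.step()

slow = h[speed < 1e-6]   # candidate fixed points; linearize around these
print("slow points found:", len(slow))         # to classify motifs (attractors,
                                               # decision boundaries, rotations)
```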
Affiliation(s)
- Laura N Driscoll
- Department of Electrical Engineering, Stanford University, Stanford, CA, USA
- Krishna Shenoy
- Department of Electrical Engineering, Stanford University, Stanford, CA, USA
- Department of Neurosurgery, Stanford University, Stanford, CA, USA
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Department of Neurobiology, Stanford University, Stanford, CA, USA
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Bio-X Institute, Stanford University, Stanford, CA, USA
- Howard Hughes Medical Institute at Stanford University, Stanford, CA, USA
- David Sussillo
- Department of Electrical Engineering, Stanford University, Stanford, CA, USA
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
118
Zhang J, Zhou H, Wang S. Distinct visual processing networks for foveal and peripheral visual fields. bioRxiv 2024: 2024.06.24.600415. PMID: 38979165; PMCID: PMC11230199; DOI: 10.1101/2024.06.24.600415.
Abstract
Foveal and peripheral vision are two distinct modes of visual processing essential for navigating the world. However, it remains unclear if they engage different neural mechanisms and circuits within the visual attentional system. Here, we trained macaques to perform a free-gaze visual search task using natural face and object stimuli and recorded 14,588 visually responsive neurons from a broadly distributed network of brain regions involved in visual attentional processing. Foveal and peripheral units had substantially different proportions across brain regions and exhibited systematic differences in encoding visual information and visual attention. The spike-LFP coherence of foveal units was more extensively modulated by both attention and visual selectivity, indicating differential engagement of the attention and visual coding network compared to peripheral units. Furthermore, we delineated the interaction and coordination between foveal and peripheral processing for spatial attention and saccade selection. Finally, the search became more efficient with increasing target-induced desynchronization, and foveal and peripheral units exhibited different correlations between neural responses and search behavior. Together, the systematic differences between foveal and peripheral processing provide valuable insights into how the brain processes and integrates visual information from different regions of the visual field. Significance statement: This study investigates the systematic differences between foveal and peripheral vision, two crucial components of visual processing essential for navigating our surroundings. By simultaneously recording from a large number of neurons in the visual attentional neural network, we revealed substantial variations in the proportion and functional characteristics of foveal and peripheral units across different brain regions. We uncovered differential modulation of functional connectivity by attention and visual selectivity, elucidated the intricate interplay between foveal and peripheral processing in spatial attention and saccade selection, and linked neural responses to search behavior. Overall, our study contributes to a deeper understanding of how the brain processes and integrates visual information for active visual behaviors.
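Editor's sketch: spike-LFP coherence of the kind reported here can be estimated with SciPy's coherence function on a binarized spike train and the LFP. The synthetic data, modulation depth, and window length below are assumptions for illustration.

```python
# Spike-LFP coherence: Poisson spikes whose rate follows the LFP phase should
# show a coherence peak at the LFP's oscillation frequency.
import numpy as np
from scipy.signal import coherence

fs, dur = 1000, 30.0                     # sample rate (Hz), duration (s)
t = np.arange(0, dur, 1 / fs)
lfp = np.sin(2 * np.pi * 12 * t) + 0.5 * np.random.randn(len(t))

rate = 5 * (1 + 0.8 * np.sin(2 * np.pi * 12 * t))     # spikes/s, phase-locked
spikes = (np.random.rand(len(t)) < rate / fs).astype(float)

f, cxy = coherence(spikes, lfp, fs=fs, nperseg=1024)
print("peak coherence %.3f at %.1f Hz" % (cxy.max(), f[np.argmax(cxy)]))
```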
119
Zhang J, Cao R, Zhu X, Zhou H, Wang S. Distinct attentional profile and functional connectivity of neurons with visual feature coding in the primate brain. bioRxiv 2024:2024.06.24.600401. [PMID: 38979388] [PMCID: PMC11230157] [DOI: 10.1101/2024.06.24.600401]
Abstract
Visual attention and object recognition are two critical cognitive functions that significantly influence our perception of the world. While these neural processes converge on the temporal cortex, the exact nature of their interactions remains largely unclear. Here, we systematically investigated the interplay between visual attention and object feature coding by training macaques to perform a free-gaze visual search task using natural face and object stimuli. With a large number of units recorded from multiple brain areas, we discovered that units exhibiting visual feature coding displayed a distinct attentional response profile and functional connectivity compared to units not exhibiting feature coding. Attention directed towards search targets enhanced the pattern separation of stimuli across brain areas, and this enhancement was more pronounced for units encoding visual features. Our findings suggest two stages of neural processing, with the early stage primarily focused on processing visual features and the late stage dedicated to processing attention. Importantly, feature coding in the early stage could predict the attentional effect in the late stage. Together, our results suggest an intricate interplay between visual feature and attention coding in the primate brain, which can be attributed to the differential functional connectivity and neural networks engaged in these processes.
120
Subramaniam V, Conwell C, Wang C, Kreiman G, Katz B, Cases I, Barbu A. Revealing Vision-Language Integration in the Brain with Multimodal Networks. arXiv 2024:arXiv:2406.14481v1. [PMID: 38947929] [PMCID: PMC11213144]
Abstract
We use (multi)modal deep neural networks (DNNs) to probe for sites of multimodal integration in the human brain by predicting stereoencephalography (SEEG) recordings taken while human subjects watched movies. We operationalize sites of multimodal integration as regions where a multimodal vision-language model predicts recordings better than unimodal language, unimodal vision, or linearly-integrated language-vision models. Our target DNN models span different architectures (e.g., convolutional networks and transformers) and multimodal training techniques (e.g., cross-attention and contrastive learning). As a key enabling step, we first demonstrate that trained vision and language models systematically outperform their randomly initialized counterparts in their ability to predict SEEG signals. We then compare unimodal and multimodal models against one another. Because our target DNN models often have different architectures, numbers of parameters, and training sets (possibly obscuring those differences attributable to integration), we carry out a controlled comparison of two models (SLIP and SimCLR), which keep all of these attributes the same aside from input modality. Using this approach, we identify a sizable number of neural sites (on average 141 out of 1090 total sites, or 12.94%) and brain regions where multimodal integration seems to occur. Additionally, we find that among the variants of multimodal training techniques we assess, CLIP-style training is the best suited for downstream prediction of the neural activity in these sites.
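The operational definition above reduces to a per-site model comparison. A hedged sketch of that logic, with random matrices standing in for DNN activations and SEEG responses (names and dimensions are ours, not the paper's):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_sites = 400, 5
y = rng.standard_normal((n_samples, n_sites))        # SEEG responses (placeholder)
feature_sets = {
    "vision":     rng.standard_normal((n_samples, 256)),
    "language":   rng.standard_normal((n_samples, 256)),
    "multimodal": rng.standard_normal((n_samples, 256)),
}

# A site would count as "multimodal" when the multimodal features win.
for site in range(n_sites):
    scores = {name: cross_val_score(Ridge(alpha=10.0), X, y[:, site],
                                    cv=5, scoring="r2").mean()
              for name, X in feature_sets.items()}
    best = max(scores, key=scores.get)
    print(f"site {site}: best feature set = {best}")
```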
Affiliation(s)
| | - Colin Conwell
- Department of Cognitive Science, Johns Hopkins University
121
Waldrop MM. Can ChatGPT help researchers understand how the human brain handles language? Proc Natl Acad Sci U S A 2024; 121:e2410196121. [PMID: 38875152] [PMCID: PMC11194597] [DOI: 10.1073/pnas.2410196121]
122
Wang R, Chen ZS. Large-scale foundation models and generative AI for BigData neuroscience. Neurosci Res 2024:S0168-0102(24)00075-0. [PMID: 38897235] [PMCID: PMC11649861] [DOI: 10.1016/j.neures.2024.06.003]
Abstract
Recent advances in machine learning have led to revolutionary breakthroughs in computer games, image and natural language understanding, and scientific discovery. Foundation models and large-scale language models (LLMs) have recently achieved human-like intelligence thanks to BigData. With the help of self-supervised learning (SSL) and transfer learning, these models may reshape the landscape of neuroscience research and make a significant impact on the future. Here we present a mini-review of recent advances in foundation models and generative AI models as well as their applications in neuroscience, including natural language and speech, semantic memory, brain-machine interfaces (BMIs), and data augmentation. We argue that this paradigm-shift framework will open new avenues for many neuroscience research directions and discuss the accompanying challenges and opportunities.
Affiliation(s)
- Ran Wang
- Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA
| | - Zhe Sage Chen
- Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA; Department of Neuroscience and Physiology, Neuroscience Institute, New York University Grossman School of Medicine, New York, NY 10016, USA; Department of Biomedical Engineering, New York University Tandon School of Engineering, Brooklyn, NY 11201, USA.
123
Li Y, Yang H, Gu S. Enhancing neural encoding models for naturalistic perception with a multi-level integration of deep neural networks and cortical networks. Sci Bull (Beijing) 2024; 69:1738-1747. [PMID: 38490889] [DOI: 10.1016/j.scib.2024.02.035]
Abstract
Cognitive neuroscience aims to develop computational models that can accurately predict and explain neural responses to sensory inputs in the cortex. Recent studies attempt to leverage the representation power of deep neural networks (DNNs) to predict the brain response and suggest a correspondence between artificial and biological neural networks in their feature representations. However, typical voxel-wise encoding models tend to rely on specific networks designed for computer vision tasks, leading to suboptimal brain-wide correspondence during cognitive tasks. To address this challenge, this work proposes a novel approach that upgrades voxel-wise encoding models through multi-level integration of features from DNNs and information from brain networks. Our approach combines DNN feature-level ensemble learning and brain atlas-level model integration, resulting in significant improvements in predicting whole-brain neural activity during naturalistic video perception. Furthermore, this multi-level integration framework enables a deeper understanding of the brain's neural representation mechanism, accurately predicting the neural response to complex visual concepts. We demonstrate that neural encoding models can be optimized by leveraging a framework that integrates both data-driven approaches and theoretical insights into the functional structure of the cortical networks.
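As a rough illustration of the feature-level ensemble idea described here (one encoding model per DNN layer, held-out predictions averaged), consider this sketch; the layer dimensions, regularization, and data are placeholder assumptions rather than the authors' implementation:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_stim = 300
layer_feats = [rng.standard_normal((n_stim, d)) for d in (64, 128, 256)]  # per-layer DNN features
voxels = rng.standard_normal((n_stim, 10))                                # fMRI responses (placeholder)

train, test = train_test_split(np.arange(n_stim), test_size=0.2, random_state=0)

# One ridge encoding model per layer; average the held-out predictions.
preds = [Ridge(alpha=1.0).fit(X[train], voxels[train]).predict(X[test])
         for X in layer_feats]
ensemble = np.mean(preds, axis=0)

r = [np.corrcoef(ensemble[:, v], voxels[test, v])[0, 1]
     for v in range(voxels.shape[1])]
print(f"median voxel-wise r: {np.median(r):.3f}")
```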
Affiliation(s)
- Yuanning Li
- School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai 201210, China.
| | - Huzheng Yang
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Shi Gu
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen 518110, China.
124
Hosoda K, Nishida K, Seno S, Mashita T, Kashioka H, Ohzawa I. A single fast Hebbian-like process enabling one-shot class addition in deep neural networks without backbone modification. Front Neurosci 2024; 18:1344114. [PMID: 38933813] [PMCID: PMC11202076] [DOI: 10.3389/fnins.2024.1344114]
Abstract
One-shot learning, the ability to learn a new concept from a single instance, is a distinctive brain function that has garnered substantial interest in machine learning. While modeling physiological mechanisms poses challenges, advancements in artificial neural networks have led to performance on specific tasks that rivals human capabilities. Proposing one-shot learning methods that build on these advancements, especially methods involving simple mechanisms, not only advances technological development but also contributes to neuroscience by supplying functionally valid hypotheses. Among the simplest methods for one-shot class addition with deep learning image classifiers is "weight imprinting," which uses the neural activity evoked by an image of a new class as that class's new synaptic weights. Despite its simplicity, its relevance to neuroscience is ambiguous, and it often interferes with the original image classification, a significant drawback in practical applications. This study introduces a novel interpretation in which part of the weight imprinting process aligns with the Hebbian rule. We show that a single Hebbian-like process enables pre-trained deep learning image classifiers to perform one-shot class addition without any modification to the original classifier's backbone. Using non-parametric normalization to mimic the brain's fast Hebbian plasticity significantly reduces the interference observed in previous methods. Our method is one of the simplest and most practical for one-shot class addition tasks, and its reliance on a single fast Hebbian-like process contributes valuable insights to neuroscience hypotheses.
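The weight-imprinting operation the paper builds on is compact enough to sketch directly: the L2-normalized embedding of a single new-class image becomes that class's weight vector. The dimensions and the cosine-similarity readout below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, d = 10, 512
W = rng.standard_normal((n_classes, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)        # normalized class weight vectors

def imprint(W, embedding):
    """One-shot class addition: the normalized embedding becomes the new row."""
    return np.vstack([W, embedding / np.linalg.norm(embedding)])

x_new = rng.standard_normal(d)                        # backbone embedding of one new-class image
W = imprint(W, x_new)

# Classification by cosine similarity against all (now 11) class vectors.
query = x_new + 0.1 * rng.standard_normal(d)          # another view of the new class
scores = W @ (query / np.linalg.norm(query))
print("predicted class:", scores.argmax())            # 10, the imprinted class
```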
Affiliation(s)
- Kazufumi Hosoda
- Center for Information and Neural Networks, Advanced ICT Research Institute, National Institute of Information and Communications Technology, Suita, Japan
- Life and Medical Sciences Area, Health Sciences Discipline, Kobe University, Kobe, Japan
| | - Keigo Nishida
- Laboratory for Computational Molecular Design, RIKEN Center for Biosystems Dynamics Research, Suita, Japan
| | - Shigeto Seno
- Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, Suita, Japan
| | | | - Hideki Kashioka
- Center for Information and Neural Networks, Advanced ICT Research Institute, National Institute of Information and Communications Technology, Suita, Japan
| | - Izumi Ohzawa
- Center for Information and Neural Networks, Advanced ICT Research Institute, National Institute of Information and Communications Technology, Suita, Japan
125
Bredenberg C, Savin C. Desiderata for Normative Models of Synaptic Plasticity. Neural Comput 2024; 36:1245-1285. [PMID: 38776950] [DOI: 10.1162/neco_a_01671]
Abstract
Normative models of synaptic plasticity use computational rationales to arrive at predictions of behavioral and network-level adaptive phenomena. In recent years, there has been an explosion of theoretical work in this realm, but experimental confirmation remains limited. In this review, we organize work on normative plasticity models in terms of a set of desiderata that, when satisfied, ensure that a given model demonstrates a clear link between plasticity and adaptive behavior, is consistent with known biological evidence about neural plasticity, and yields specific testable predictions. As a prototype, we include a detailed analysis of the REINFORCE algorithm. We also discuss how new models have begun to improve on the identified criteria and suggest avenues for further development. Overall, we provide a conceptual guide to help develop neural learning theories that are precise, powerful, and experimentally testable.
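Since the review's prototype is REINFORCE, a toy instance may help: for a Bernoulli-sigmoid policy on a two-armed bandit, the weight change is the reward times the gradient of the log action probability. The learning rate and reward probabilities are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
w, lr = 0.0, 0.1
p_reward = np.array([0.2, 0.8])            # arm 1 pays off more often

for _ in range(2000):
    p1 = 1.0 / (1.0 + np.exp(-w))          # policy: P(choose arm 1)
    a = int(rng.random() < p1)             # sample an action
    r = float(rng.random() < p_reward[a])  # sample a reward
    grad_logp = a - p1                     # d/dw log pi(a) for this policy
    w += lr * r * grad_logp                # REINFORCE: reward x eligibility

print(f"learned P(arm 1) = {1.0 / (1.0 + np.exp(-w)):.2f}")   # approaches 1
```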
Affiliation(s)
- Colin Bredenberg
- Center for Neural Science, New York University, New York, NY 10003, U.S.A
- Mila-Quebec AI Institute, Montréal, QC H2S 3H1, Canada
| | - Cristina Savin
- Center for Neural Science, New York University, New York, NY 10003, U.S.A
- Center for Data Science, New York University, New York, NY 10011, U.S.A.
126
Srinath R, Ni AM, Marucci C, Cohen MR, Brainard DH. Orthogonal neural representations support perceptual judgements of natural stimuli. bioRxiv 2024:2024.02.14.580134. [PMID: 38464018] [PMCID: PMC10925131] [DOI: 10.1101/2024.02.14.580134]
Abstract
In natural behavior, observers must separate relevant information from a barrage of irrelevant information. Many studies have investigated the neural underpinnings of this ability using artificial stimuli presented on simple backgrounds. Natural viewing, however, carries a set of challenges that are inaccessible using artificial stimuli, including neural responses to background objects that are task-irrelevant. An emerging body of evidence suggests that the visual abilities of humans and animals can be modeled through the linear decoding of task-relevant information from visual cortex. This idea suggests the hypothesis that irrelevant features of a natural scene should impair performance on a visual task only if their neural representations intrude on the linear readout of the task-relevant feature, as would occur if the representations of task-relevant and irrelevant features are not orthogonal in the underlying neural population. We tested this hypothesis with human psychophysics and monkey neurophysiology, using parametrically variable naturalistic stimuli. We demonstrate that 1) the neural representation of one feature (the position of a central object) in visual area V4 is orthogonal to those of several background features, 2) the ability of human observers to precisely judge object position was largely unaffected by task-irrelevant variation in those background features, and 3) many features of the object and the background are orthogonally represented by V4 neural responses. Our observations are consistent with the hypothesis that orthogonal neural representations can support stable perception of objects and features despite the tremendous richness of natural visual scenes.
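The orthogonality logic in this abstract can be made concrete with a small simulation (ours, not the authors' analysis): fit linear readouts for a task-relevant and a task-irrelevant feature from the same synthetic population and measure the angle between the two weight vectors:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_trials, n_neurons = 500, 80
position = rng.standard_normal(n_trials)        # task-relevant feature
background = rng.standard_normal(n_trials)      # task-irrelevant feature

# Each feature drives the population along its own (random) direction.
axis_pos, axis_bg = rng.standard_normal((2, n_neurons))
resp = (np.outer(position, axis_pos) + np.outer(background, axis_bg)
        + 0.5 * rng.standard_normal((n_trials, n_neurons)))

w_pos = LinearRegression().fit(resp, position).coef_
w_bg = LinearRegression().fit(resp, background).coef_

cos = w_pos @ w_bg / (np.linalg.norm(w_pos) * np.linalg.norm(w_bg))
angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
print(f"angle between readout axes: {angle:.1f} deg")   # near 90: orthogonal readouts
```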
Affiliation(s)
- Ramanujan Srinath
- equal contribution
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL 60637, USA
| | - Amy M. Ni
- equal contribution
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL 60637, USA
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Claire Marucci
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Marlene R. Cohen
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL 60637, USA
- equal contribution
| | - David H. Brainard
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
- equal contribution
127
Miao HY, Tong F. Convolutional neural network models applied to neuronal responses in macaque V1 reveal limited nonlinear processing. J Vis 2024; 24:1. [PMID: 38829629] [PMCID: PMC11156204] [DOI: 10.1167/jov.24.6.1]
Abstract
Computational models of the primary visual cortex (V1) have suggested that V1 neurons behave like Gabor filters followed by simple nonlinearities. However, recent work employing convolutional neural network (CNN) models has suggested that V1 relies on far more nonlinear computations than previously thought. Specifically, unit responses in an intermediate layer of VGG-19 were found to best predict macaque V1 responses to thousands of natural and synthetic images. Here, we evaluated the hypothesis that the poor performance of lower layer units in VGG-19 might be attributable to their small receptive field size rather than to their lack of complexity per se. We compared VGG-19 with AlexNet, which has much larger receptive fields in its lower layers. Whereas the best-performing layer of VGG-19 occurred after seven nonlinear steps, the first convolutional layer of AlexNet best predicted V1 responses. Although the predictive accuracy of VGG-19 was somewhat better than that of standard AlexNet, we found that a modified version of AlexNet could match the performance of VGG-19 after only a few nonlinear computations. Control analyses revealed that decreasing the size of the input images caused the best-performing layer of VGG-19 to shift to a lower layer, consistent with the hypothesis that the relationship between image size and receptive field size can strongly affect model performance. We conducted additional analyses using a Gabor pyramid model to test for nonlinear contributions of normalization and contrast saturation. Overall, our findings suggest that the feedforward responses of V1 neurons can be well explained by assuming only a few nonlinear processing stages.
Affiliation(s)
- Hui-Yuan Miao
- Department of Psychology, Vanderbilt University, Nashville, TN, USA
| | - Frank Tong
- Department of Psychology, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, USA
128
Yax N, Anlló H, Palminteri S. Studying and improving reasoning in humans and machines. Commun Psychol 2024; 2:51. [PMID: 39242743] [PMCID: PMC11332180] [DOI: 10.1038/s44271-024-00091-8]
Abstract
In the present study, we investigate and compare reasoning in large language models (LLMs) and humans, using a selection of cognitive psychology tools traditionally dedicated to the study of (bounded) rationality. We presented to human participants and an array of pretrained LLMs new variants of classical cognitive experiments, and cross-compared their performance. Our results showed that most of the included models presented reasoning errors akin to those frequently ascribed to error-prone, heuristic-based human reasoning. Notwithstanding this superficial similarity, an in-depth comparison between humans and LLMs indicated important departures from human-like reasoning, with the models' limitations disappearing almost entirely in more recent LLM releases. Moreover, we show that while it is possible to devise strategies to induce better performance, humans and machines are not equally responsive to the same prompting schemes. We conclude by discussing the epistemological implications and challenges of comparing human and machine behavior for both artificial intelligence and cognitive psychology.
Affiliation(s)
- Nicolas Yax
- Laboratoire de neurosciences cognitives et computationnelles, Institut national de la santé et de la recherche médicale, Paris, France
- Département d'études cognitives, Ecole normale supérieure - PSL Research University, Paris, France
- FLOWERS Lab, Institut national de recherche en informatique et en automatique, Bordeaux, France
| | - Hernán Anlló
- Laboratoire de neurosciences cognitives et computationnelles, Institut national de la santé et de la recherche médicale, Paris, France
- Département d'études cognitives, Ecole normale supérieure - PSL Research University, Paris, France
| | - Stefano Palminteri
- Laboratoire de neurosciences cognitives et computationnelles, Institut national de la santé et de la recherche médicale, Paris, France.
- Département d'études cognitives, Ecole normale supérieure - PSL Research University, Paris, France.
129
Romeni S, Toni L, Artoni F, Micera S. Decoding electroencephalographic responses to visual stimuli compatible with electrical stimulation. APL Bioeng 2024; 8:026123. [PMID: 38894958] [PMCID: PMC11184972] [DOI: 10.1063/5.0195680]
Abstract
Electrical stimulation of the visual nervous system could improve the quality of life of patients affected by acquired blindness by restoring some visual sensations, but it requires careful optimization of stimulation parameters to produce useful perceptions. Neural correlates of elicited perceptions could be used for fast automatic optimization, with electroencephalography a natural choice because it can be acquired non-invasively. Nonetheless, its low signal-to-noise ratio may hinder discrimination of similar visual patterns, preventing its use in the optimization of electrical stimulation. Our work investigates for the first time the discriminability of electroencephalographic responses to visual stimuli compatible with electrical stimulation, employing a newly acquired dataset whose stimuli encompass the concurrent variation of several features, whereas neuroscience research tends to study the neural correlates of single visual features. We then performed above-chance single-trial decoding of multiple features of our newly crafted visual stimuli using relatively simple machine learning algorithms. A decoding scheme employing the information from multiple stimulus presentations was implemented, substantially improving our decoding performance and suggesting that such methods should be used systematically in future applications. The significance of the present work lies in the determination of which visual features can be decoded from electroencephalographic responses to electrical stimulation-compatible stimuli and at which granularity they can be discriminated. Our methods pave the way to using electroencephalographic correlates to optimize electrical stimulation parameters, thus increasing the effectiveness of current visual neuroprostheses.
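The multi-presentation decoding scheme mentioned above amounts to averaging predicted class probabilities over repeats before taking the argmax. A self-contained sketch with synthetic "EEG" features (all names and numbers are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_classes, n_feat, reps = 4, 32, 5
templates = rng.standard_normal((n_classes, n_feat))    # class-specific EEG patterns

def trials(label, n):                                   # noisy single-trial responses
    return templates[label] + 2.0 * rng.standard_normal((n, n_feat))

X_train = np.vstack([trials(c, 100) for c in range(n_classes)])
y_train = np.repeat(np.arange(n_classes), 100)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

true = 2
single = clf.predict(trials(true, 1))[0]
# Multi-presentation decoding: average predicted probabilities over repeats.
multi = clf.predict_proba(trials(true, reps)).mean(axis=0).argmax()
print(f"single-trial: {single}, {reps}-presentation: {multi}, true: {true}")
```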
Affiliation(s)
| | | | - Fiorenzo Artoni
- Department of Clinical Neurosciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland
130
Xiao W, Sharma S, Kreiman G, Livingstone MS. Feature-selective responses in macaque visual cortex follow eye movements during natural vision. Nat Neurosci 2024; 27:1157-1166. [PMID: 38684892] [PMCID: PMC11156562] [DOI: 10.1038/s41593-024-01631-5]
Abstract
In natural vision, primates actively move their eyes several times per second via saccades. It remains unclear whether, during this active looking, visual neurons exhibit classical retinotopic properties, anticipate gaze shifts or mirror the stable quality of perception, especially in complex natural scenes. Here, we let 13 monkeys freely view thousands of natural images across 4.6 million fixations, recorded 883 h of neuronal responses in six areas spanning primary visual to anterior inferior temporal cortex and analyzed spatial, temporal and featural selectivity in these responses. Face neurons tracked their receptive field contents, indicated by category-selective responses. Self-consistency analysis showed that general feature-selective responses also followed eye movements and remained gaze-dependent over seconds of viewing the same image. Computational models of feature-selective responses located retinotopic receptive fields during free viewing. We found limited evidence for feature-selective predictive remapping and no viewing-history integration. Thus, ventral visual neurons represent the world in a predominantly eye-centered reference frame during natural vision.
Affiliation(s)
- Will Xiao
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA.
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA.
| | - Saloni Sharma
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA
| | - Gabriel Kreiman
- Department of Ophthalmology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
131
Djambazovska S, Zafer A, Ramezanpour H, Kreiman G, Kar K. The Impact of Scene Context on Visual Object Recognition: Comparing Humans, Monkeys, and Computational Models. bioRxiv 2024:2024.05.27.596127. [PMID: 38854011] [PMCID: PMC11160639] [DOI: 10.1101/2024.05.27.596127]
Abstract
During natural vision, we rarely see objects in isolation but rather embedded in rich and complex contexts. Understanding how the brain recognizes objects in natural scenes by integrating contextual information remains a key challenge. To elucidate neural mechanisms compatible with human visual processing, we need an animal model that behaves similarly to humans, so that inferred neural mechanisms can provide hypotheses relevant to the human brain. Here we assessed whether rhesus macaques could model human context-driven object recognition by quantifying visual object identification abilities across variations in the amount, quality, and congruency of contextual cues. Behavioral metrics revealed strikingly similar context-dependent patterns between humans and monkeys. However, neural responses in the inferior temporal (IT) cortex of monkeys that were never explicitly trained to discriminate objects in context, as well as current artificial neural network models, could only partially explain this cross-species correspondence. The shared behavioral variance unexplained by context-naive neural data or computational models highlights fundamental knowledge gaps. Our findings demonstrate an intriguing alignment of human and monkey visual object processing that defies full explanation by either brain activity in a key visual region or state-of-the-art models.
Affiliation(s)
- Sara Djambazovska
- York University, Department of Biology and Centre for Vision Research, Toronto, Canada
- Children’s Hospital, Harvard Medical School, MA, USA
| | - Anaa Zafer
- York University, Department of Biology and Centre for Vision Research, Toronto, Canada
| | - Hamidreza Ramezanpour
- York University, Department of Biology and Centre for Vision Research, Toronto, Canada
| | | | - Kohitij Kar
- York University, Department of Biology and Centre for Vision Research, Toronto, Canada
132
Balwani A, Cho S, Choi H. Exploring the Architectural Biases of the Canonical Cortical Microcircuit. bioRxiv 2024:2024.05.23.595629. [PMID: 38826320] [PMCID: PMC11142214] [DOI: 10.1101/2024.05.23.595629]
Abstract
The cortex plays a crucial role in various perceptual and cognitive functions, driven by its basic unit, the canonical cortical microcircuit. Yet, we remain short of a framework that definitively explains the structure-function relationships of this fundamental neuroanatomical motif. To better understand how physical substrates of cortical circuitry facilitate their neuronal dynamics, we employ a computational approach using recurrent neural networks and representational analyses. We examine the differences manifested by the inclusion and exclusion of biologically-motivated inter-areal laminar connections on the computational roles of different neuronal populations in the microcircuit of two hierarchically-related areas, throughout learning. Our findings show that the presence of feedback connections correlates with the functional modularization of cortical populations in different layers, and provides the microcircuit with a natural inductive bias to differentiate expected and unexpected inputs at initialization. Furthermore, when testing the effects of training the microcircuit and its variants with a predictive-coding inspired strategy, we find that doing so helps better encode noisy stimuli in areas of the cortex that receive feedback, all of which combine to suggest evidence for a predictive-coding mechanism serving as an intrinsic operative logic in the cortex.
Affiliation(s)
- Aishwarya Balwani
- School of Electrical & Computer Engineering, Georgia Institute of Technology
| | - Suhee Cho
- Department of Brain and Cognitive Sciences, Korea Advanced Institute of Science and Technology
| | - Hannah Choi
- School of Mathematics, Georgia Institute of Technology
133
Morales-Torres R, Wing EA, Deng L, Davis SW, Cabeza R. Visual Recognition Memory of Scenes Is Driven by Categorical, Not Sensory, Visual Representations. J Neurosci 2024; 44:e1479232024. [PMID: 38569925] [PMCID: PMC11112637] [DOI: 10.1523/jneurosci.1479-23.2024]
Abstract
When we perceive a scene, our brain processes various types of visual information simultaneously, ranging from sensory features, such as line orientations and colors, to categorical features, such as objects and their arrangements. Whereas the role of sensory and categorical visual representations in predicting subsequent memory has been studied using isolated objects, their impact on memory for complex scenes remains largely unknown. To address this gap, we conducted an fMRI study in which female and male participants encoded pictures of familiar scenes (e.g., an airport picture) and later recalled them, while rating the vividness of their visual recall. Outside the scanner, participants had to distinguish each seen scene from three similar lures (e.g., three airport pictures). We modeled the sensory and categorical visual features of multiple scenes using both early and late layers of a deep convolutional neural network. Then, we applied representational similarity analysis to determine which brain regions represented stimuli in accordance with the sensory and categorical models. We found that categorical, but not sensory, representations predicted subsequent memory. In line with the previous result, only for the categorical model, the average recognition performance of each scene exhibited a positive correlation with the average visual dissimilarity between the item in question and its respective lures. These results strongly suggest that even in memory tests that ostensibly rely solely on visual cues (such as forced-choice visual recognition with similar distractors), memory decisions for scenes may be primarily influenced by categorical rather than sensory representations.
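The core analysis here is representational similarity analysis against early-layer (sensory) and late-layer (categorical) CNN models. A minimal sketch of that comparison, with random activations standing in for real features and voxel patterns:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_scenes = 40
early = rng.standard_normal((n_scenes, 512))     # early CNN layer (sensory model)
late = rng.standard_normal((n_scenes, 128))      # late CNN layer (categorical model)
brain = rng.standard_normal((n_scenes, 200))     # voxel patterns per scene

def rdm(X):
    """Vector of pairwise correlation distances between condition patterns."""
    return pdist(X, metric="correlation")

brain_rdm = rdm(brain)
for name, X in [("sensory (early)", early), ("categorical (late)", late)]:
    rho, _ = spearmanr(rdm(X), brain_rdm)
    print(f"{name} model vs. brain RDM: rho = {rho:.3f}")
```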
Affiliation(s)
| | - Erik A Wing
- Rotman Research Institute, Baycrest Health Sciences, Toronto, Ontario M6A 2E1, Canada
| | - Lifu Deng
- Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
| | - Simon W Davis
- Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
- Department of Neurology, Duke University School of Medicine, Durham, North Carolina 27708
| | - Roberto Cabeza
- Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
134
Katayama R, Shiraki R, Ishii S, Yoshida W. Belief inference for hierarchical hidden states in spatial navigation. Commun Biol 2024; 7:614. [PMID: 38773301] [PMCID: PMC11109253] [DOI: 10.1038/s42003-024-06316-0]
Abstract
Uncertainty abounds in the real world, and in environments with multiple layers of unobservable hidden states, decision-making requires resolving uncertainties based on mutual inference. Focusing on a spatial navigation problem, we develop a Tiger maze task that involves simultaneously inferring the local hidden state and the global hidden state from probabilistically uncertain observations. We adopt a Bayesian computational approach by proposing a hierarchical inference model. Applying this to human task behaviour, alongside functional magnetic resonance brain imaging, allows us to separate the neural correlates associated with reinforcement and reassessment of belief in hidden states. The imaging results also suggest that different layers of uncertainty differentially involve the basal ganglia and dorsomedial prefrontal cortex, and that the regions responsible are organised along the rostral axis of these areas according to the type of inference and the level of abstraction of the hidden state, i.e., higher-order state inference involves more anterior parts.
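The belief inference at the heart of such tasks is a Bayes update over hidden states from unreliable observations. A toy sketch (the 0.7 observation reliability and two-state space are our illustrative assumptions, not the task's actual parameters):

```python
import numpy as np

p_correct = 0.7                          # P(observation matches the true state), assumed
belief = np.array([0.5, 0.5])            # prior over hidden states {0: left, 1: right}

def update(belief, obs):
    """One Bayes step: weight each state by the observation likelihood, renormalize."""
    likelihood = np.where(np.arange(2) == obs, p_correct, 1 - p_correct)
    posterior = likelihood * belief
    return posterior / posterior.sum()

for obs in [0, 0, 1, 0]:                 # a sequence of noisy observations
    belief = update(belief, obs)
    print(f"obs={obs} -> P(state=left) = {belief[0]:.3f}")
```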
Affiliation(s)
- Risa Katayama
- Graduate School of Informatics, Kyoto University, Kyoto, 606-8501, Japan.
- Department of AI-Brain Integration, Advanced Telecommunications Research Institute International, Kyoto, 619-0288, Japan.
| | - Ryo Shiraki
- Graduate School of Informatics, Kyoto University, Kyoto, 606-8501, Japan
| | - Shin Ishii
- Graduate School of Informatics, Kyoto University, Kyoto, 606-8501, Japan
- Neural Information Analysis Laboratories, Advanced Telecommunications Research Institute International, Kyoto, 619-0288, Japan
- International Research Center for Neurointelligence, the University of Tokyo, Tokyo, 113-0033, Japan
| | - Wako Yoshida
- Department of Neural Computation for Decision-Making, Advanced Telecommunications Research Institute International, Kyoto, 619-0288, Japan
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
135
Caplette L, Turk-Browne NB. Computational reconstruction of mental representations using human behavior. Nat Commun 2024; 15:4183. [PMID: 38760341] [PMCID: PMC11101448] [DOI: 10.1038/s41467-024-48114-6]
Abstract
Revealing how the mind represents information is a longstanding goal of cognitive science. However, there is currently no framework for reconstructing the broad range of mental representations that humans possess. Here, we ask participants to indicate what they perceive in images made of random visual features in a deep neural network. We then infer associations between the semantic features of their responses and the visual features of the images. This allows us to reconstruct the mental representations of multiple visual concepts, both those supplied by participants and other concepts extrapolated from the same semantic space. We validate these reconstructions in separate participants and further generalize our approach to predict behavior for new stimuli and in a new task. Finally, we reconstruct the mental representations of individual observers and of a neural network. This framework enables a large-scale investigation of conceptual representations.
Affiliation(s)
| | - Nicholas B Turk-Browne
- Department of Psychology, Yale University, New Haven, CT, USA
- Wu Tsai Institute, Yale University, New Haven, CT, USA
136
Lengyel M. Marr's three levels of analysis are useful as a framework for neuroscience. J Physiol 2024; 602:1911-1914. [PMID: 38628044] [DOI: 10.1113/jp279549]
Affiliation(s)
- Máté Lengyel
- Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge, UK
- Center for Cognitive Computation, Department of Cognitive Science, Central European University, Budapest, Hungary
137
Cadena SA, Willeke KF, Restivo K, Denfield G, Sinz FH, Bethge M, Tolias AS, Ecker AS. Diverse task-driven modeling of macaque V4 reveals functional specialization towards semantic tasks. PLoS Comput Biol 2024; 20:e1012056. [PMID: 38781156] [PMCID: PMC11115319] [DOI: 10.1371/journal.pcbi.1012056]
Abstract
Responses to natural stimuli in area V4-a mid-level area of the visual ventral stream-are well predicted by features from convolutional neural networks (CNNs) trained on image classification. This result has been taken as evidence for the functional role of V4 in object classification. However, we currently do not know if and to what extent V4 plays a role in solving other computational objectives. Here, we investigated normative accounts of V4 (and V1 for comparison) by predicting macaque single-neuron responses to natural images from the representations extracted by 23 CNNs trained on different computer vision tasks including semantic, geometric, 2D, and 3D types of tasks. We found that V4 was best predicted by semantic classification features and exhibited high task selectivity, while the choice of task was less consequential to V1 performance. Consistent with traditional characterizations of V4 function that show its high-dimensional tuning to various 2D and 3D stimulus directions, we found that diverse non-semantic tasks explained aspects of V4 function that are not captured by individual semantic tasks. Nevertheless, jointly considering the features of a pair of semantic classification tasks was sufficient to yield one of our top V4 models, solidifying V4's main functional role in semantic processing and suggesting that V4's selectivity to 2D or 3D stimulus properties found by electrophysiologists can result from semantic functional goals.
Affiliation(s)
- Santiago A. Cadena
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany
- Institute for Theoretical Physics and Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Bernstein Center for Computational Neuroscience, Tübingen, Germany
- International Max Planck Research School for Intelligent Systems, Tübingen, Germany
| | - Konstantin F. Willeke
- Bernstein Center for Computational Neuroscience, Tübingen, Germany
- International Max Planck Research School for Intelligent Systems, Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
| | - Kelli Restivo
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States of America
| | - George Denfield
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States of America
| | - Fabian H. Sinz
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany
- Bernstein Center for Computational Neuroscience, Tübingen, Germany
- International Max Planck Research School for Intelligent Systems, Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
| | - Matthias Bethge
- Institute for Theoretical Physics and Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Bernstein Center for Computational Neuroscience, Tübingen, Germany
| | - Andreas S. Tolias
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Electrical and Computer Engineering, Rice University, Houston, Texas, United States of America
| | - Alexander S. Ecker
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany
- Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
138
Scherberger H. Modeling proprioception with task-driven neural network models. Neuron 2024; 112:1384-1386. [PMID: 38614104] [DOI: 10.1016/j.neuron.2024.03.021]
Abstract
In a recent issue of Cell, Vargas and colleagues demonstrate that task-driven neural network models are superior at predicting proprioceptive activity in the primate cuneate nucleus and sensorimotor cortex compared with other models. This provides valuable insights for better understanding the proprioceptive pathway.
Affiliation(s)
- Hansjörg Scherberger
- German Primate Center, 37077 Göttingen, Germany; University of Göttingen, Department of Biology and Psychology, 37077 Göttingen, Germany.
139
Ramdya P. AI networks reveal how flies find a mate. Nature 2024; 629:1010-1011. [PMID: 38778186] [DOI: 10.1038/d41586-024-01320-0]
140
Dado T, Papale P, Lozano A, Le L, Wang F, van Gerven M, Roelfsema P, Güçlütürk Y, Güçlü U. Brain2GAN: Feature-disentangled neural encoding and decoding of visual perception in the primate brain. PLoS Comput Biol 2024; 20:e1012058. [PMID: 38709818] [PMCID: PMC11098503] [DOI: 10.1371/journal.pcbi.1012058]
Abstract
A challenging goal of neural coding is to characterize the neural representations underlying visual perception. To this end, multi-unit activity (MUA) of macaque visual cortex was recorded in a passive fixation task upon presentation of faces and natural images. We analyzed the relationship between MUA and latent representations of state-of-the-art deep generative models, including the conventional and feature-disentangled representations of generative adversarial networks (GANs) (i.e., z- and w-latents of StyleGAN, respectively) and language-contrastive representations of latent diffusion networks (i.e., CLIP-latents of Stable Diffusion). A mass univariate neural encoding analysis of the latent representations showed that feature-disentangled w representations outperform both z and CLIP representations in explaining neural responses. Further, w-latent features were found to be positioned at the higher end of the complexity gradient which indicates that they capture visual information relevant to high-level neural activity. Subsequently, a multivariate neural decoding analysis of the feature-disentangled representations resulted in state-of-the-art spatiotemporal reconstructions of visual perception. Taken together, our results not only highlight the important role of feature-disentanglement in shaping high-level neural representations underlying visual perception but also serve as an important benchmark for the future of neural coding.
Affiliation(s)
- Thirza Dado
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| | - Paolo Papale
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
| | - Antonio Lozano
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
| | - Lynn Le
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| | - Feng Wang
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
| | - Marcel van Gerven
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| | - Pieter Roelfsema
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
- Laboratory of Visual Brain Therapy, Sorbonne University, Paris, France
- Department of Integrative Neurophysiology, VU Amsterdam, Amsterdam, Netherlands
- Department of Psychiatry, Amsterdam UMC, Amsterdam, Netherlands
| | - Yağmur Güçlütürk
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| | - Umut Güçlü
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
141
She L, Benna MK, Shi Y, Fusi S, Tsao DY. Temporal multiplexing of perception and memory codes in IT cortex. Nature 2024; 629:861-868. [PMID: 38750353] [PMCID: PMC11111405] [DOI: 10.1038/s41586-024-07349-5]
Abstract
A central assumption of neuroscience is that long-term memories are represented by the same brain areas that encode sensory stimuli. Neurons in inferotemporal (IT) cortex represent the sensory percept of visual objects using a distributed axis code. Whether and how the same IT neural population represents the long-term memory of visual objects remains unclear. Here we examined how familiar faces are encoded in the IT anterior medial face patch (AM), perirhinal face patch (PR) and temporal pole face patch (TP). In AM and PR we observed that the encoding axis for familiar faces is rotated relative to that for unfamiliar faces at long latency; in TP this memory-related rotation was much weaker. Contrary to previous claims, the relative response magnitude to familiar versus unfamiliar faces was not a stable indicator of familiarity in any patch. The mechanism underlying the memory-related axis change is likely intrinsic to IT cortex, because inactivation of PR did not affect axis change dynamics in AM. Overall, our results suggest that memories of familiar faces are represented in AM and perirhinal cortex by a distinct long-latency code, explaining how the same cell population can encode both the percept and memory of faces.
Affiliation(s)
- Liang She
- Division of Biology and Biological Engineering, Caltech, Pasadena, CA, USA.
| | - Marcus K Benna
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York City, NY, USA
- Neurobiology Section, Division of Biological Sciences, University of California, San Diego, San Diego, CA, USA
| | - Yuelin Shi
- Division of Biology and Biological Engineering, Caltech, Pasadena, CA, USA
| | - Stefano Fusi
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York City, NY, USA
| | - Doris Y Tsao
- Division of Biology and Biological Engineering, Caltech, Pasadena, CA, USA.
- Howard Hughes Medical Institute, University of California, Berkeley, CA, USA.
- Department of Neuroscience, University of California, Berkeley, CA, USA.
142
Nguyen P, Sooriyaarachchi J, Huang Q, Baker CL. Estimating receptive fields of simple and complex cells in early visual cortex: A convolutional neural network model with parameterized rectification. PLoS Comput Biol 2024; 20:e1012127. [PMID: 38820562] [PMCID: PMC11168683] [DOI: 10.1371/journal.pcbi.1012127]
Abstract
Neurons in the primary visual cortex respond selectively to simple features of visual stimuli, such as orientation and spatial frequency. Simple cells, which have phase-sensitive responses, can be modeled by a single receptive field filter in a linear-nonlinear model. However, it is challenging to analyze phase-invariant complex cells, which require more elaborate models having a combination of nonlinear subunits. Estimating the parameters of these models is made more difficult by cortical neurons' trial-to-trial response variability. We develop a simple convolutional neural network method to estimate receptive field models for both simple and complex visual cortex cells from their responses to natural images. The model consists of a spatiotemporal filter, a parameterized rectifier unit (PReLU), and a two-dimensional Gaussian "map" of the receptive field envelope. A single model parameter determines the simple vs. complex nature of the receptive field, capturing complex cell responses as a summation of homogeneous subunits, and collapsing to a linear-nonlinear model for simple-type cells. The convolutional method predicts simple and complex cell responses to natural image stimuli as well as grating tuning curves. The fitted models yield a continuum of values for the PReLU parameter across the sampled neurons, showing that the simple/complex nature of cells can vary in a continuous manner. We demonstrate that complex-like cells respond less reliably than simple-like cells. However, compensating for this unreliability with noise-ceiling analysis reveals predictive performance for complex cells proportionately closer to that for simple cells. Most spatial receptive field structures are well fit by Gabor functions, whose parameters confirm well-known properties of cat A17/18 receptive fields.
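A minimal PyTorch sketch of the model family described, assuming our own sizes and parameter names (this is not the authors' implementation): a convolutional filter, a PReLU whose learned slope spans the simple-complex continuum, and a Gaussian map that pools the rectified output over space.

```python
import torch
import torch.nn as nn

class SimpleComplexRF(nn.Module):
    def __init__(self, size=16):
        super().__init__()
        self.filter = nn.Conv2d(1, 1, kernel_size=9, padding=4, bias=False)
        # Learnable slope a in y = max(0, x) + a * min(0, x). In this sketch,
        # a near 0 behaves like a halfwave-rectified (simple-like) model, while
        # a near -1 fully rectifies, so spatial pooling becomes phase-invariant
        # (complex-like).
        self.prelu = nn.PReLU(init=0.25)
        self.log_sigma = nn.Parameter(torch.zeros(1))   # Gaussian map width
        self.center = nn.Parameter(torch.zeros(2))      # Gaussian map center
        g = torch.arange(size, dtype=torch.float32) / size - 0.5
        self.register_buffer("grid", torch.stack(torch.meshgrid(g, g, indexing="ij")))

    def forward(self, x):
        z = self.prelu(self.filter(x))
        d2 = ((self.grid - self.center.view(2, 1, 1)) ** 2).sum(0)
        gauss = torch.exp(-d2 / (2 * torch.exp(self.log_sigma) ** 2))
        return (z * gauss).sum(dim=(1, 2, 3))           # one response per image

model = SimpleComplexRF()
images = torch.randn(8, 1, 16, 16)                      # placeholder image patches
print(model(images).shape)                              # torch.Size([8])
```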
Affiliation(s)
- Philippe Nguyen
- Department of Biomedical Engineering, McGill University, Montreal, Quebec, Canada
| | | | - Qianyu Huang
- Department of Biology, McGill University, Montreal, Quebec, Canada
| | - Curtis L. Baker
- Department of Ophthalmology and Visual Sciences, McGill University, Montreal, Quebec, Canada
143
Ren Y, Bashivan P. How well do models of visual cortex generalize to out of distribution samples? PLoS Comput Biol 2024; 20:e1011145. [PMID: 38820563] [PMCID: PMC11216589] [DOI: 10.1371/journal.pcbi.1011145]
Abstract
Unit activity in certain deep neural networks (DNNs) is remarkably similar to neuronal population responses to static images along the primate ventral visual cortex. Linear combinations of DNN unit activities are widely used to build predictive models of neuronal activity in the visual cortex. Nevertheless, prediction performance in these models is often investigated on stimulus sets consisting of everyday objects under naturalistic settings. Recent work has revealed a generalization gap when predicting neuronal responses to synthetically generated out-of-distribution (OOD) stimuli. Here, we investigated how recent progress in improving DNNs' object recognition generalization, as well as various DNN design choices such as architecture, learning algorithm, and dataset, has impacted the generalization gap in neural predictivity. We came to the surprising conclusion that performance on none of the common computer vision OOD object recognition benchmarks is predictive of OOD neural predictivity performance. Furthermore, we found that adversarially robust models often yield substantially higher generalization in neural predictivity, although the degree of robustness itself was not predictive of the neural predictivity score. These results suggest that improving object recognition behavior on current benchmarks alone may not lead to more general models of neurons in the primate ventral visual cortex.
Affiliation(s)
- Yifei Ren
- Department of Computer Science, McGill University, Montreal, Canada
| | - Pouya Bashivan
- Department of Computer Science, McGill University, Montreal, Canada
- Department of Physiology, McGill University, Montreal, Canada
- Mila, Université de Montréal, Montreal, Canada
144
Farzmahdi A, Zarco W, Freiwald WA, Kriegeskorte N, Golan T. Emergence of brain-like mirror-symmetric viewpoint tuning in convolutional neural networks. eLife 2024; 13:e90256. [PMID: 38661128] [PMCID: PMC11142642] [DOI: 10.7554/elife.90256]
Abstract
Primates can recognize objects despite 3D geometric variations such as in-depth rotations. The computational mechanisms that give rise to such invariances are yet to be fully understood. A curious case of partial invariance occurs in the macaque face-patch AL and in fully connected layers of deep convolutional networks in which neurons respond similarly to mirror-symmetric views (e.g. left and right profiles). Why does this tuning develop? Here, we propose a simple learning-driven explanation for mirror-symmetric viewpoint tuning. We show that mirror-symmetric viewpoint tuning for faces emerges in the fully connected layers of convolutional deep neural networks trained on object recognition tasks, even when the training dataset does not include faces. First, using 3D objects rendered from multiple views as test stimuli, we demonstrate that mirror-symmetric viewpoint tuning in convolutional neural network models is not unique to faces: it emerges for multiple object categories with bilateral symmetry. Second, we show why this invariance emerges in the models. Learning to discriminate among bilaterally symmetric object categories induces reflection-equivariant intermediate representations. AL-like mirror-symmetric tuning is achieved when such equivariant responses are spatially pooled by downstream units with sufficiently large receptive fields. These results explain how mirror-symmetric viewpoint tuning can emerge in neural networks, providing a theory of how they might emerge in the primate brain. Our theory predicts that mirror-symmetric viewpoint tuning can emerge as a consequence of exposure to bilaterally symmetric objects beyond the category of faces, and that it can generalize beyond previously experienced object categories.
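One simple way to quantify the mirror-symmetric viewpoint tuning discussed here is to correlate a unit's responses to mirrored views across stimuli. A synthetic sketch (the index and data are our illustration, not the paper's exact metric):

```python
import numpy as np

rng = np.random.default_rng(0)
n_objects = 50
angles = [45, 90]

# Synthetic unit with mirror-symmetric tuning: its response pattern across
# objects is (noisily) shared between the +a and -a views.
resp = {}
for a in angles:
    shared = rng.standard_normal(n_objects)
    resp[a] = shared + 0.2 * rng.standard_normal(n_objects)
    resp[-a] = shared + 0.2 * rng.standard_normal(n_objects)

# Index: mean correlation between responses to mirrored views.
msi = np.mean([np.corrcoef(resp[a], resp[-a])[0, 1] for a in angles])
print(f"mirror-symmetry index: {msi:.2f}")   # close to 1 for this unit
```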
Collapse
Affiliation(s)
- Amirhossein Farzmahdi
- Laboratory of Neural Systems, The Rockefeller University, New York, United States
- School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Islamic Republic of Iran
| | - Wilbert Zarco
- Laboratory of Neural Systems, The Rockefeller University, New York, United States
| | - Winrich A Freiwald
- Laboratory of Neural Systems, The Rockefeller University, New York, United States
- The Center for Brains, Minds & Machines, Cambridge, United States
| | - Nikolaus Kriegeskorte
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
- Department of Psychology, Columbia University, New York, United States
- Department of Neuroscience, Columbia University, New York, United States
- Department of Electrical Engineering, Columbia University, New York, United States
| | - Tal Golan
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
| |
Collapse
|
145
|
Lu Z, Wang Y, Golomb JD. Achieving more human brain-like vision via human EEG representational alignment. ARXIV 2024:arXiv:2401.17231v2. [PMID: 38351926 PMCID: PMC10862929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/19/2024]
Abstract
Despite advancements in artificial intelligence, object recognition models still lag behind the human brain in emulating visual information processing. Recent studies have highlighted the potential of using neural data to mimic brain processing; however, these often rely on invasive neural recordings from non-human subjects, leaving a critical gap in our understanding of human visual perception. Addressing this gap, we present, for the first time, 'Re(presentational)Al(ignment)net', a vision model aligned with human brain activity based on non-invasive EEG, which demonstrates significantly higher similarity to human brain representations. Our image-to-brain multi-layer encoding framework advances human neural alignment by optimizing multiple model layers, enabling the model to efficiently learn and mimic the human brain's visual representational patterns across object categories and different modalities. Our findings suggest that ReAlnet represents a breakthrough in bridging the gap between artificial and human vision, paving the way for more brain-like artificial intelligence systems.
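A hedged sketch of the multi-layer encoding idea: linear probes map activations from several model layers to the recorded EEG response, and the summed probe error is added to the task loss during training. The toy backbone, probe form, EEG dimensions, and loss weighting below are assumptions for illustration, not ReAlnet's actual configuration.

```python
# Multi-layer EEG alignment sketch: per-layer linear probes predict the EEG
# response, and their summed error serves as an alignment loss.
import torch
import torch.nn as nn

class EEGAlignedEncoder(nn.Module):
    def __init__(self, layer_dims=(64, 128), eeg_dim=17 * 100):  # e.g. 17 channels x 100 time points
        super().__init__()
        self.layer1 = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU())
        self.layer2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)
        # one linear probe per aligned layer
        self.probes = nn.ModuleList(nn.Linear(d, eeg_dim) for d in layer_dims)

    def forward(self, x):
        h1 = self.layer1(x)
        h2 = self.layer2(h1)
        return [self.pool(h).flatten(1) for h in (h1, h2)]

model = EEGAlignedEncoder()
images = torch.randn(8, 3, 64, 64)
eeg = torch.randn(8, 17 * 100)  # flattened EEG response per image

acts = model(images)
align_loss = sum(nn.functional.mse_loss(probe(a), eeg) for probe, a in zip(model.probes, acts))
# total_loss = task_loss + lambda_align * align_loss  (the weighting is a free design choice)
print(float(align_loss))
```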
Collapse
Affiliation(s)
- Zitong Lu
- Department of Psychology, The Ohio State University
| | - Yile Wang
- Department of Neuroscience, The University of Texas at Dallas
| | | |
Collapse
|
146
|
Tamura H. An analysis of information segregation in parallel streams of a multi-stream convolutional neural network. Sci Rep 2024; 14:9097. [PMID: 38643326 PMCID: PMC11032341 DOI: 10.1038/s41598-024-59930-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2023] [Accepted: 04/16/2024] [Indexed: 04/22/2024] Open
Abstract
Visual information is processed in hierarchically organized parallel streams in the primate brain. In the present study, information segregation in parallel streams was examined by constructing a convolutional neural network with parallel architecture in all of the convolutional layers. Although filter weights for convolution were initially set to random values, color information was segregated from shape information in most model instances after training. Deletion of the color-related stream decreased recognition accuracy of animate images, whereas deletion of the shape-related stream decreased recognition accuracy of both animate and inanimate images. The results suggest that properties of filters and functions of a stream are spontaneously segregated in parallel streams of neural networks.
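A minimal sketch of such a parallel architecture: every convolutional stage is split into streams that exchange no information before a late merge, so any color/shape segregation across streams must emerge during training. A stream can then be 'deleted' at test time by zeroing its contribution, mirroring the paper's lesion analysis. Stream count and widths are illustrative, not the paper's exact configuration.

```python
# Multi-stream CNN sketch: parallel, non-interacting conv streams merged late.
import torch
import torch.nn as nn

class MultiStreamCNN(nn.Module):
    def __init__(self, n_streams=2, width=32, n_classes=10):
        super().__init__()
        def stream():
            return nn.Sequential(
                nn.Conv2d(3, width, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(width, width, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
        self.streams = nn.ModuleList(stream() for _ in range(n_streams))
        self.head = nn.Linear(n_streams * width, n_classes)

    def forward(self, x, drop_stream=None):
        # drop_stream "deletes" one stream at test time by zeroing its output,
        # as in the paper's stream-deletion analysis.
        outs = []
        for i, s in enumerate(self.streams):
            h = s(x).mean(dim=(2, 3))  # global average pool per stream
            if i == drop_stream:
                h = torch.zeros_like(h)
            outs.append(h)
        return self.head(torch.cat(outs, dim=1))

net = MultiStreamCNN()
x = torch.randn(4, 3, 32, 32)
print(net(x).shape, net(x, drop_stream=0).shape)
```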
Collapse
Affiliation(s)
- Hiroshi Tamura
- Cognitive Neuroscience Group, Graduate School of Frontier Biosciences, The University of Osaka, 1-4 Yamadaoka, Suita, Osaka, 565-0871, Japan.
- Center for Information and Neural Networks, Suita, Osaka, 565-0871, Japan.
| |
Collapse
|
147
|
Qu Y, Wei C, Du P, Che W, Zhang C, Ouyang W, Bian Y, Xu F, Hu B, Du K, Wu H, Liu J, Liu Q. Integration of cognitive tasks into artificial general intelligence test for large models. iScience 2024; 27:109550. [PMID: 38595796 PMCID: PMC11001637 DOI: 10.1016/j.isci.2024.109550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/11/2024] Open
Abstract
During the evolution of large models, performance evaluation is necessary for assessing their capabilities. However, current model evaluations mainly rely on specific tasks and datasets, lacking a unified framework for assessing the multidimensional intelligence of large models. In this perspective, we advocate for a comprehensive framework of cognitive science-inspired artificial general intelligence (AGI) tests, covering crystallized, fluid, social, and embodied intelligence. The AGI tests consist of well-designed cognitive tests adapted from human intelligence tests, which are then naturally encapsulated in an immersive virtual community. We propose increasing the complexity of AGI testing tasks commensurate with advancements in large models, and we emphasize the necessity of interpreting test results to avoid false negatives and false positives. We believe that cognitive science-inspired AGI tests will effectively guide the targeted improvement of large models in specific dimensions of intelligence and accelerate their integration into human society.
Collapse
Affiliation(s)
- Youzhi Qu
- Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen 518055, China
| | - Chen Wei
- Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen 518055, China
| | - Penghui Du
- Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen 518055, China
| | - Wenxin Che
- Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen 518055, China
| | - Chi Zhang
- Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen 518055, China
| | | | | | - Feiyang Xu
- iFLYTEK AI Research, Hefei 230088, China
| | - Bin Hu
- School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Kai Du
- Institute for Artificial Intelligence, Peking University, Beijing 100871, China
| | - Haiyan Wu
- Centre for Cognitive and Brain Sciences and Department of Psychology, University of Macau, Macau 999078, China
| | - Jia Liu
- Department of Psychology, Tsinghua University, Beijing 100084, China
| | - Quanying Liu
- Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen 518055, China
| |
Collapse
|
148
|
Wang Y, Cao R, Wang S. Encoding of Visual Objects in the Human Medial Temporal Lobe. J Neurosci 2024; 44:e2135232024. [PMID: 38429107 PMCID: PMC11026346 DOI: 10.1523/jneurosci.2135-23.2024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2023] [Revised: 02/10/2024] [Accepted: 02/25/2024] [Indexed: 03/03/2024] Open
Abstract
The human medial temporal lobe (MTL) plays a crucial role in recognizing visual objects, a key cognitive function that relies on the formation of semantic representations. Nonetheless, it remains unknown how visual information of general objects is translated into semantic representations in the MTL. Furthermore, the debate about whether the human MTL is involved in perception has endured for a long time. To address these questions, we investigated three distinct models of neural object coding (semantic coding, axis-based feature coding, and region-based feature coding) in each subregion of the human MTL, using high-resolution fMRI in two male and six female participants. Our findings revealed the presence of semantic coding throughout the MTL, with a higher prevalence observed in the parahippocampal cortex (PHC) and perirhinal cortex (PRC), while axis coding and region coding were primarily observed in the earlier regions of the MTL. Moreover, we demonstrated that voxels exhibiting axis coding supported the transition to region coding and contained information relevant to semantic coding. Together, by providing a detailed characterization of neural object coding schemes and offering a comprehensive summary of visual coding information for each MTL subregion, our results not only emphasize a clear role of the MTL in perceptual processing but also shed light on the translation of perception-driven representations of visual features into memory-driven representations of semantics along the MTL processing pathway.
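The three coding schemes can be compared per voxel as cross-validated encoding models, sketched below: axis coding as a linear readout of a feature space, region coding as proximity to feature-space clusters, and semantic coding as category indicators. The synthetic features, cluster count, and scoring are assumptions for illustration, not the paper's exact models.

```python
# Per-voxel comparison of semantic, axis-based, and region-based coding models.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_stim, n_feat, n_cats = 200, 50, 5
feats = rng.standard_normal((n_stim, n_feat))  # stimulus features (e.g. DNN-derived)
cats = rng.integers(0, n_cats, n_stim)         # semantic category labels
# synthetic voxel whose ground truth is axis-like (linear in the features)
voxel = feats @ rng.standard_normal(n_feat) * 0.1 + rng.standard_normal(n_stim)

centers = KMeans(n_clusters=n_cats, n_init=10, random_state=0).fit(feats).cluster_centers_
designs = {
    "axis": feats,                                                # linear feature readout
    "region": -np.linalg.norm(feats[:, None] - centers, axis=2),  # proximity to feature-space regions
    "semantic": np.eye(n_cats)[cats],                             # category indicators
}
for name, X in designs.items():
    r2 = cross_val_score(RidgeCV(), X, voxel, cv=5).mean()
    print(f"{name} coding, cross-validated R^2: {r2:.3f}")
```

The best-fitting design per voxel then labels that voxel's dominant coding scheme, which is the kind of voxel-wise characterization the abstract summarizes per MTL subregion.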
Collapse
Affiliation(s)
- Yue Wang
- Department of Radiology, Washington University in St. Louis, St. Louis, Missouri 63110
| | - Runnan Cao
- Department of Radiology, Washington University in St. Louis, St. Louis, Missouri 63110
| | - Shuo Wang
- Department of Radiology, Washington University in St. Louis, St. Louis, Missouri 63110
| |
Collapse
|
149
|
Zhang Q, Zhang Y, Liu N, Sun X. Understanding of facial features in face perception: insights from deep convolutional neural networks. Front Comput Neurosci 2024; 18:1209082. [PMID: 38655070 PMCID: PMC11035738 DOI: 10.3389/fncom.2024.1209082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Accepted: 03/18/2024] [Indexed: 04/26/2024] Open
Abstract
Introduction: Face recognition has been a longstanding subject of interest in cognitive neuroscience and computer vision research. One key focus has been to understand the relative importance of different facial features in identifying individuals. Previous studies in humans have demonstrated the crucial role of eyebrows in face recognition, potentially even surpassing the importance of the eyes. However, eyebrows are not only vital for face recognition but also play a significant role in recognizing facial expressions and intentions, processes that may occur simultaneously and influence face recognition. Methods: To address these challenges, we leveraged deep convolutional neural networks (DCNNs), artificial face recognition systems that can be specifically tailored to face recognition tasks. We investigated the relative importance of various facial features in face recognition by selectively blocking feature information in the input to the DCNN, and we additionally conducted experiments in which we systematically blurred the eyebrow information to varying degrees. Results: Our findings aligned with previous human research, revealing that eyebrows are the most critical feature for face recognition, followed by the eyes, mouth, and nose, in that order. The presence of eyebrows was more crucial than their specific high-frequency details, such as edges and textures, in contrast to other facial features, for which such details also played a significant role. Furthermore, unlike for other facial features, the activation map indicated that the significance of eyebrow areas could not be readily adjusted to compensate for the absence of eyebrow information, which explains why masking eyebrows led to larger deficits in face recognition performance. Additionally, we observed a synergistic relationship among facial features, providing evidence for holistic face processing within the DCNN. Discussion: Overall, our study sheds light on the underlying mechanisms of face recognition and underscores the potential of DCNNs as valuable tools for further exploration in this field.
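The feature-occlusion protocol can be sketched as follows: block one facial region in the input and measure the resulting drop in identification accuracy. The region coordinates and the stand-in classifier below are placeholders; a real run would use a trained face-recognition DCNN and landmark-based masks.

```python
# Feature-occlusion sketch: mask one facial region, measure the accuracy drop.
import numpy as np

rng = np.random.default_rng(0)

# illustrative bounding boxes (row_min, row_max, col_min, col_max) in a 224x224 face crop
REGIONS = {
    "eyebrows": (60, 80, 50, 174),
    "eyes": (85, 110, 50, 174),
    "nose": (110, 150, 95, 130),
    "mouth": (155, 185, 75, 150),
}

def mask_region(img, region):
    """Return a copy of img with one facial region replaced by the mean pixel value."""
    r0, r1, c0, c1 = REGIONS[region]
    out = img.copy()
    out[r0:r1, c0:c1] = img.mean()
    return out

def dummy_model(img):
    # placeholder for a trained face-recognition DCNN: any callable image -> identity label
    return int(img[:112].sum() * 1000) % 5

faces = [rng.random((224, 224)) for _ in range(20)]
labels = [dummy_model(f) for f in faces]  # "ground truth" = intact-image predictions

for region in REGIONS:
    masked = [mask_region(f, region) for f in faces]
    acc = np.mean([dummy_model(m) == y for m, y in zip(masked, labels)])
    print(f"{region}: identification accuracy after masking = {acc:.2f}")
```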
Collapse
Affiliation(s)
- Qianqian Zhang
- MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
| | - Yueyi Zhang
- MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
| | - Ning Liu
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Xiaoyan Sun
- MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
| |
Collapse
|
150
|
Lahner B, Mohsenzadeh Y, Mullin C, Oliva A. Visual perception of highly memorable images is mediated by a distributed network of ventral visual regions that enable a late memorability response. PLoS Biol 2024; 22:e3002564. [PMID: 38557761 PMCID: PMC10984539 DOI: 10.1371/journal.pbio.3002564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2023] [Accepted: 02/26/2024] [Indexed: 04/04/2024] Open
Abstract
Behavioral and neuroscience studies in humans and primates have shown that memorability is an intrinsic property of an image that predicts its strength of encoding into and retrieval from memory. While previous work has independently probed when or where this memorability effect may occur in the human brain, a description of its spatiotemporal dynamics has been missing. Here, we used representational similarity analysis (RSA) to combine functional magnetic resonance imaging (fMRI) with source-estimated magnetoencephalography (MEG) and simultaneously measure when and where the human cortex is sensitive to differences in image memorability. Results reveal that visual perception of high-memorability images, compared to low-memorability images, recruits a set of regions of interest (ROIs) distributed throughout the ventral visual cortex, with a late memorability response (from around 300 ms) in early visual cortex (EVC), inferior temporal cortex, lateral occipital cortex, fusiform gyrus, and the banks of the superior temporal sulcus. The magnitude of image memorability is represented after high-level feature processing in visual regions and is reflected in classical memory regions in the medial temporal lobe (MTL). Our results present, to our knowledge, the first unified spatiotemporal account of the visual memorability effect across the human cortex, further supporting the levels-of-processing theory of perception and memory.
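The MEG-fMRI fusion by RSA can be sketched as follows: build a representational dissimilarity matrix (RDM) from each fMRI ROI's response patterns and from each MEG time point's patterns, then correlate their condensed forms across time. Shapes and data below are random placeholders; contrasting high- versus low-memorability stimulus sets on top of this skeleton is what localizes the effect in space and time.

```python
# RSA-based MEG-fMRI fusion sketch: one ROI RDM correlated with per-time-point MEG RDMs.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_images, n_voxels, n_sensors, n_times = 40, 200, 100, 120

fmri = rng.standard_normal((n_images, n_voxels))           # one ROI's patterns
meg = rng.standard_normal((n_times, n_images, n_sensors))  # sensor patterns over time

fmri_rdm = pdist(fmri, metric="correlation")  # condensed lower triangle of the RDM

fusion = np.array([
    spearmanr(fmri_rdm, pdist(meg[t], metric="correlation"))[0]
    for t in range(n_times)
])
# Peaks in `fusion` indicate when the ROI's representational geometry emerges;
# repeating this per ROI yields the spatiotemporal map described in the abstract.
print(fusion.shape, float(fusion.max()))
```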
Collapse
Affiliation(s)
- Benjamin Lahner
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Yalda Mohsenzadeh
- The Brain and Mind Institute, The University of Western Ontario, London, Canada
- Department of Computer Science, The University of Western Ontario, London, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
| | - Caitlin Mullin
- Vision: Science to Application (VISTA), York University, Toronto, Ontario, Canada
| | - Aude Oliva
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| |
Collapse
|