1
Carboni L, Nwaigwe D, Mainsant M, Bayle R, Reyboz M, Mermillod M, Dojat M, Achard S. Exploring continual learning strategies in artificial neural networks through graph-based analysis of connectivity: Insights from a brain-inspired perspective. Neural Netw 2025; 185:107125. [PMID: 39847940 DOI: 10.1016/j.neunet.2025.107125]
Abstract
Artificial Neural Networks (ANNs) aim to mimic information processing in biological networks. In cognitive neuroscience, graph modeling is a powerful framework widely used to study brain structural and functional connectivity. Yet the extension of graph modeling to ANNs has been poorly explored, especially in terms of functional connectivity (i.e., context-dependent changes in unit activity across the network). With the goal of designing more robust and interpretable ANNs, we study how a brain-inspired, graph-based approach can be extended and used to investigate ANN properties and behaviors. We focus our study on different continual learning strategies inspired by biological mechanisms and modeled with ANNs. We show that graph modeling offers a simple and elegant framework to deeply investigate ANNs, compare their performance, and explore deleterious behaviors such as catastrophic forgetting.
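To make the graph-based analysis concrete, here is a minimal sketch (not the authors' pipeline) of how a functional-connectivity graph can be built for one ANN layer: correlate unit activations across a set of inputs, threshold the correlation matrix, and analyze the resulting graph. The activation matrix and the 0.3 threshold are illustrative assumptions.

    import numpy as np
    import networkx as nx

    rng = np.random.default_rng(0)
    # Stand-in for recorded unit activations: (n_stimuli, n_units).
    activations = rng.standard_normal((1000, 64))

    # Functional connectivity: pairwise Pearson correlation between units.
    fc = np.corrcoef(activations.T)
    np.fill_diagonal(fc, 0.0)

    # Keep only strong edges, as is common in brain-graph analyses.
    adjacency = (np.abs(fc) > 0.3).astype(int)
    graph = nx.from_numpy_array(adjacency)

    # Graph statistics of the kind used to compare learning strategies.
    print(nx.density(graph), nx.number_connected_components(graph))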
Affiliation(s)
- Lucrezia Carboni
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France; Univ. Grenoble Alpes, Inserm, U1216, Grenoble Institut Neurosciences, GIN, 38000 Grenoble, France
- Dwight Nwaigwe
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France; Univ. Grenoble Alpes, Inserm, U1216, Grenoble Institut Neurosciences, GIN, 38000 Grenoble, France
- Marion Mainsant
- Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France; Univ. Grenoble Alpes, CEA, LIST, 38000 Grenoble, France
- Raphael Bayle
- Univ. Grenoble Alpes, CEA, LIST, 38000 Grenoble, France
- Marina Reyboz
- Univ. Grenoble Alpes, CEA, LIST, 38000 Grenoble, France
- Martial Mermillod
- Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France
- Michel Dojat
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France; Univ. Grenoble Alpes, Inserm, U1216, Grenoble Institut Neurosciences, GIN, 38000 Grenoble, France
- Sophie Achard
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France
2
Wang EY, Fahey PG, Ding Z, Papadopoulos S, Ponder K, Weis MA, Chang A, Muhammad T, Patel S, Ding Z, Tran D, Fu J, Schneider-Mizell CM, Reid RC, Collman F, da Costa NM, Franke K, Ecker AS, Reimer J, Pitkow X, Sinz FH, Tolias AS. Foundation model of neural activity predicts response to new stimulus types. Nature 2025; 640:470-477. [PMID: 40205215 PMCID: PMC11981942 DOI: 10.1038/s41586-025-08829-y]
Abstract
The complexity of neural circuits makes it challenging to decipher the brain's algorithms of intelligence. Recent breakthroughs in deep learning have produced models that accurately simulate brain activity, enhancing our understanding of the brain's computational objectives and neural coding. However, it is difficult for such models to generalize beyond their training distribution, limiting their utility. The emergence of foundation models [1] trained on vast datasets has introduced a new artificial intelligence paradigm with remarkable generalization capabilities. Here we collected large amounts of neural activity from visual cortices of multiple mice and trained a foundation model to accurately predict neuronal responses to arbitrary natural videos. This model generalized to new mice with minimal training and successfully predicted responses across various new stimulus domains, such as coherent motion and noise patterns. Beyond neural response prediction, the model also accurately predicted anatomical cell types, dendritic features and neuronal connectivity within the MICrONS functional connectomics dataset [2]. Our work is a crucial step towards building foundation models of the brain. As neuroscience accumulates larger, multimodal datasets, foundation models will reveal statistical regularities, enable rapid adaptation to new tasks and accelerate research.
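A hedged sketch of the transfer idea described in the abstract: keep a pretrained "core" frozen and fit only a new linear readout for a new animal's neurons. The toy core, the Poisson objective, and all array shapes are illustrative assumptions; the paper's actual architecture and training procedure differ.

    import torch
    import torch.nn as nn

    core = nn.Sequential(nn.Linear(512, 256), nn.ReLU())  # stand-in pretrained core
    for p in core.parameters():
        p.requires_grad = False                           # freeze the core

    readout = nn.Linear(256, 100)                         # new mouse: 100 neurons
    opt = torch.optim.Adam(readout.parameters(), lr=1e-3)
    loss_fn = nn.PoissonNLLLoss(log_input=False)          # spike counts are non-negative

    features = torch.randn(32, 512)                       # stand-in stimulus features
    targets = torch.rand(32, 100)                         # stand-in responses

    for _ in range(100):
        rates = torch.relu(readout(core(features)))       # non-negative rate predictions
        loss = loss_fn(rates, targets)
        opt.zero_grad()
        loss.backward()
        opt.step()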
Affiliation(s)
- Eric Y Wang
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Paul G Fahey
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Bio-X, Stanford University, Stanford, CA, USA
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Zhuokun Ding
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Bio-X, Stanford University, Stanford, CA, USA
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Stelios Papadopoulos
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Bio-X, Stanford University, Stanford, CA, USA
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Kayla Ponder
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Marissa A Weis
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany
- Andersen Chang
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Taliah Muhammad
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Saumil Patel
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Bio-X, Stanford University, Stanford, CA, USA
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Zhiwei Ding
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Dat Tran
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Jiakun Fu
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- R Clay Reid
- Allen Institute for Brain Science, Seattle, WA, USA
- Katrin Franke
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Bio-X, Stanford University, Stanford, CA, USA
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Alexander S Ecker
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany
- Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
- Jacob Reimer
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Xaq Pitkow
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Fabian H Sinz
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Andreas S Tolias
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Bio-X, Stanford University, Stanford, CA, USA
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Department of Electrical Engineering, Stanford University, Stanford, CA, USA
3
Scott H, Murphy AJ, Briggs F, Snyder AC. Using Generative Models of Naturalistic Scenes to Sample Neural Population Tuning Manifolds. Eur J Neurosci 2025; 61:e70088. [PMID: 40162802 DOI: 10.1111/ejn.70088]
Abstract
Investigations into sensory coding in the visual system have typically relied on either simple, unnatural visual stimuli or natural images. Simple stimuli, such as Gabor patches, have been effective for studying single neurons in early visual areas such as V1 but seldom produce large responses from mid-level visual neurons or neural populations with diverse tuning. Many types of "naturalistic" image models have been developed recently, which bridge the gap between overly simple stimuli and experimentally infeasible natural images. These stimuli can vary along a large number of feature dimensions, introducing new challenges when trying to map those features to neural activity. This "curse of dimensionality" is exacerbated when neural responses are themselves high dimensional, such as when recording neural populations with implanted multielectrode arrays. We propose a method that searches high-dimensional stimulus spaces to characterize neural population manifolds in a closed-loop experimental design. In each block, stimuli were generated with a deep neural network, using the neural responses to previous stimuli to predict the relationship between the latent space of the image model and neural responses. We found that the latent variables of the deep generative image model had stronger linear relationships with neural activity than various alternative forms of image compression. This result reinforces the potential of deep generative image models for efficient characterization of high-dimensional tuning manifolds in visual neural populations.
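The closed-loop logic might look like the following sketch, under assumed interfaces: generator.decode(z) maps latents to images and record_responses(images) returns a population response matrix. Both names are hypothetical stand-ins for the generative model and the recording rig, and the block logic is a simplification of the paper's procedure.

    import numpy as np
    from sklearn.linear_model import RidgeCV

    def run_block(generator, record_responses, z_bank, model=None, n_stim=50):
        rng = np.random.default_rng()
        if model is None:
            # First block: sample latents broadly.
            idx = rng.choice(len(z_bank), size=n_stim, replace=False)
        else:
            # Later blocks: favor latents predicted to drive the population.
            scores = model.predict(z_bank).sum(axis=1)
            idx = np.argsort(scores)[-n_stim:]
        z = z_bank[idx]
        responses = record_responses(generator.decode(z))  # (n_stim, n_neurons)
        # Refit the latent-to-response map with cross-validated ridge regression.
        model = RidgeCV(alphas=np.logspace(-2, 3, 12)).fit(z, responses)
        return model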
Affiliation(s)
- Hayden Scott
- Brain and Cognitive Sciences, University of Rochester, Rochester, New York, USA
- Center for Visual Science, University of Rochester, Rochester, New York, USA
- Allison J Murphy
- Center for Visual Science, University of Rochester, Rochester, New York, USA
- Neuroscience, University of Rochester Medical Center, Rochester, New York, USA
- Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, Maryland, USA
- Farran Briggs
- Brain and Cognitive Sciences, University of Rochester, Rochester, New York, USA
- Center for Visual Science, University of Rochester, Rochester, New York, USA
- Neuroscience, University of Rochester Medical Center, Rochester, New York, USA
- Laboratory of Sensorimotor Research, National Eye Institute, Bethesda, Maryland, USA
- Adam C Snyder
- Brain and Cognitive Sciences, University of Rochester, Rochester, New York, USA
- Center for Visual Science, University of Rochester, Rochester, New York, USA
- Neuroscience, University of Rochester Medical Center, Rochester, New York, USA
4
Altavini TS, Chen M, Astorga G, Yan Y, Li W, Freiwald W, Gilbert CD. Expectation-dependent stimulus selectivity in the ventral visual cortical pathway. Proc Natl Acad Sci U S A 2025; 122:e2406684122. [PMID: 40146852 PMCID: PMC12002251 DOI: 10.1073/pnas.2406684122]
Abstract
The hierarchical view of the ventral object recognition pathway is based primarily on feedforward mechanisms, starting from a fixed basis set of object primitives and ending with a representation of whole objects in the inferotemporal cortex. Here, we provide a different view. Rather than acting as fixed "labeled lines" for specific features, neurons continually change their stimulus selectivities on a moment-to-moment basis, as dictated by top-down influences of object expectation and perceptual task. We also derived stimulus-feature selectivity from an ethologically curated stimulus set, presented in a delayed match-to-sample task, which identified object components that are informative for object recognition in addition to full objects; the top-down effects, however, were seen for both informative and uninformative components. Cortical areas responding to these stimuli were identified with functional MRI to guide the placement of chronically implanted electrode arrays.
Affiliation(s)
- Tiago S. Altavini
- Laboratory of Neurobiology, The Rockefeller University, New York, NY 10065
- Minggui Chen
- Laboratory of Neurobiology, The Rockefeller University, New York, NY 10065
- Guadalupe Astorga
- Laboratory of Neurobiology, The Rockefeller University, New York, NY 10065
- Yin Yan
- Beijing Normal University, Beijing 100875, China
- Wu Li
- Beijing Normal University, Beijing 100875, China
- Winrich Freiwald
- Laboratory of Neurobiology, The Rockefeller University, New York, NY 10065
- Charles D. Gilbert
- Laboratory of Neurobiology, The Rockefeller University, New York, NY 10065
5
An NM, Roh H, Kim S, Kim JH, Im M. Machine Learning Techniques for Simulating Human Psychophysical Testing of Low-Resolution Phosphene Face Images in Artificial Vision. Adv Sci (Weinh) 2025; 12:e2405789. [PMID: 39985243 PMCID: PMC12005743 DOI: 10.1002/advs.202405789]
Abstract
To evaluate the quality of artificial visual percepts generated by emerging methodologies, researchers often rely on labor-intensive and tedious human psychophysical experiments, which must be repeated after any major or minor modification of the hardware or software configuration. Here, we investigate the capacity of standard machine learning (ML) models to accurately replicate quaternary match-to-sample tasks using low-resolution facial images represented by arrays of phosphenes as input stimuli. We first analyze the performance of ML models trained to approximate innate human facial recognition abilities on a dataset comprising 3600 phosphene images of human faces. Because of time constraints and the potential for subject fatigue, the psychophysical test was limited to presenting 720 low-resolution phosphene images to 36 human subjects. Notably, the best model closely mirrors the behavioral trend of the human subjects, offering precise predictions for 8 out of 9 phosphene quality levels on the overlapping test queries. We then predict human recognition performance for untested phosphene images, streamlining the process and minimizing the need for additional psychophysical tests. These findings underscore the transformative potential of ML in reshaping the research paradigm of visual prosthetics and facilitating the expedited advancement of prostheses.
Affiliation(s)
- Na Min An
- Brain Science Institute, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Present address: Kim Jaechul Graduate School of AI, KAIST, Seoul 02455, Republic of Korea
- Hyeonhee Roh
- Brain Science Institute, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Sein Kim
- Brain Science Institute, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Jae Hun Kim
- Brain Science Institute, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Sensor System Research Center, Advanced Materials and Systems Research Division, KIST, Seoul 02792, Republic of Korea
- Maesoon Im
- Brain Science Institute, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Division of Bio-Medical Science and Technology, University of Science and Technology (UST), Seoul 02792, Republic of Korea
- KHU-KIST Department of Converging Science and Technology, Kyung Hee University, Seoul 02447, Republic of Korea
6
Zhang J, Cao R, Zhu X, Zhou H, Wang S. Distinct attentional characteristics of neurons with visual feature coding in the primate brain. Sci Adv 2025; 11:eadq0332. [PMID: 40117351 PMCID: PMC11927616 DOI: 10.1126/sciadv.adq0332]
Abstract
Visual attention and object recognition are two critical cognitive functions that shape our perception of the world. While these neural processes converge in the temporal cortex, the nature of their interactions remains largely unclear. Here, we systematically investigated the interplay between visual attention and stimulus feature coding by training macaques to perform a free-gaze visual search task with natural stimuli. Recording from a large number of units across multiple brain areas, we found that units exhibiting visual feature coding showed stronger attentional modulation of responses and spike-local field potential coherence than units without feature coding. Across brain areas, attention directed toward search targets enhanced the neuronal pattern separation of stimuli, with this enhancement more pronounced for units encoding visual features. Together, our results suggest a complex interplay between visual feature and attention coding in the primate brain, likely driven by interactions between brain areas engaged in these processes.
Affiliation(s)
- Jie Zhang
- Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Peng Cheng Laboratory, Shenzhen 518000, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Runnan Cao
- Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Xiaocang Zhu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Huihui Zhou
- Peng Cheng Laboratory, Shenzhen 518000, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Shuo Wang
- Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA
7
Jang G, Kragel PA. Understanding human amygdala function with artificial neural networks. J Neurosci 2025; 45:e1436242025. [PMID: 40086868 PMCID: PMC12044042 DOI: 10.1523/jneurosci.1436-24.2025]
Abstract
The amygdala is a cluster of subcortical nuclei that receives diverse sensory inputs and projects to the cortex, midbrain, and other subcortical structures. Numerous accounts of amygdalar contributions to social and emotional behavior have been offered, yet an overarching description of amygdala function remains elusive. Here we adopt a computationally explicit framework that aims to develop a model of amygdala function based on the types of sensory inputs it receives, rather than individual constructs such as threat, arousal, or valence. Characterizing human fMRI signal acquired as male and female participants viewed a full-length film, we developed encoding models that predict both patterns of amygdala activity and self-reported valence evoked by naturalistic images. We use deep image synthesis to generate artificial stimuli that distinctly engage encoding models of amygdala subregions that systematically differ from one another in terms of their low-level visual properties. These findings characterize how the amygdala compresses high-dimensional sensory inputs into low-dimensional representations relevant for behavior.
Significance Statement
The amygdala is a cluster of subcortical nuclei critical for motivation, emotion, and social behavior. Characterizing the contribution of the amygdala to behavior has been challenging due to its structural complexity, broad connectivity, and functional heterogeneity. Here we use a combination of human neuroimaging and computational modeling to investigate how visual inputs relate to low-dimensional representations encoded in the amygdala. We find that the amygdala encodes an array of visual features, which systematically vary across specific nuclei and relate to the affective properties of the sensory environment.
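As a rough illustration of the encoding-model step, the sketch below ridge-regresses voxel (or subregion) responses on image features and scores held-out prediction accuracy per voxel. The feature extraction, preprocessing, and arrays are all stand-ins, not the authors' data or code.

    import numpy as np
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    X = rng.standard_normal((400, 128))         # (n_images, n_features), stand-in
    W = rng.standard_normal((128, 50))
    Y = X @ W + rng.standard_normal((400, 50))  # stand-in voxel responses

    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)
    enc = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X_tr, Y_tr)

    # Per-voxel correlation between predicted and observed held-out responses.
    pred = enc.predict(X_te)
    r = [np.corrcoef(pred[:, v], Y_te[:, v])[0, 1] for v in range(Y.shape[1])]
    print(np.mean(r))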
8
Jung T, Zeng N, Fabbri JD, Eichler G, Li Z, Zabeh E, Das A, Willeke K, Wingel KE, Dubey A, Huq R, Sharma M, Hu Y, Ramakrishnan G, Tien K, Mantovani P, Parihar A, Yin H, Oswalt D, Misdorp A, Uguz I, Shinn T, Rodriguez GJ, Nealley C, Sanborn S, Gonzales I, Roukes M, Knecht J, Yoshor D, Canoll P, Spinazzi E, Carloni LP, Pesaran B, Patel S, Jacobs J, Youngerman B, Cotton RJ, Tolias A, Shepard KL. Stable, chronic in-vivo recordings from a fully wireless subdural-contained 65,536-electrode brain-computer interface device. bioRxiv [Preprint] 2025:2024.05.17.594333. [PMID: 38798494 PMCID: PMC11118429 DOI: 10.1101/2024.05.17.594333]
Abstract
Minimally invasive, high-bandwidth brain-computer interface (BCI) devices can revolutionize human applications. With orders-of-magnitude improvements in volumetric efficiency over other BCI technologies, we developed a 50-μm-thick, mechanically flexible micro-electrocorticography (μECoG) BCI, integrating a 256×256 array of electrodes, signal processing, data telemetry, and wireless powering on a single complementary metal-oxide-semiconductor (CMOS) substrate containing 65,536 recording channels, from which we can simultaneously record a selectable subset of up to 1024 channels at a given time. Fully implanted below the dura, our chip is wirelessly powered and communicates bidirectionally with an external relay station outside the body. We demonstrated chronic, reliable recordings for up to two weeks in pigs and up to two months in behaving non-human primates from somatosensory, motor, and visual cortices, decoding brain signals at high spatiotemporal resolution.
Affiliation(s)
- Taesung Jung
- Department of Electrical Engineering, Columbia University; New York, NY 10027, USA
- Nanyu Zeng
- Department of Electrical Engineering, Columbia University; New York, NY 10027, USA
- Jason D. Fabbri
- Department of Electrical Engineering, Columbia University; New York, NY 10027, USA
- Guy Eichler
- Department of Computer Science, Columbia University; New York, NY 10027, USA
- Zhe Li
- Department of Ophthalmology, Byers Eye Institute, Stanford University; Stanford, CA 94305, USA
- Stanford Bio-X, Stanford University; Stanford, CA 94304, USA
- Wu Tsai Neurosciences Institute, Stanford University; Stanford, CA 94304, USA
- Erfan Zabeh
- Department of Biomedical Engineering, Columbia University; New York, NY 10027, USA
- Anup Das
- Department of Biomedical Engineering, Columbia University; New York, NY 10027, USA
- Konstantin Willeke
- Department of Ophthalmology, Byers Eye Institute, Stanford University; Stanford, CA 94305, USA
- Stanford Bio-X, Stanford University; Stanford, CA 94304, USA
- Wu Tsai Neurosciences Institute, Stanford University; Stanford, CA 94304, USA
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen; Göttingen, Germany
- Katie E. Wingel
- Center for Neural Science, New York University; New York, NY 10003, USA
- Department of Neurosurgery, University of Pennsylvania; Philadelphia, PA 19118, USA
- Agrita Dubey
- Center for Neural Science, New York University; New York, NY 10003, USA
- Department of Neurosurgery, University of Pennsylvania; Philadelphia, PA 19118, USA
- Rizwan Huq
- Department of Electrical Engineering, Columbia University; New York, NY 10027, USA
- Mohit Sharma
- Department of Electrical Engineering, Columbia University; New York, NY 10027, USA
- Yaoxing Hu
- Department of Electrical Engineering, Columbia University; New York, NY 10027, USA
- Girish Ramakrishnan
- Department of Electrical Engineering, Columbia University; New York, NY 10027, USA
- Kevin Tien
- Department of Electrical Engineering, Columbia University; New York, NY 10027, USA
- Paolo Mantovani
- Department of Computer Science, Columbia University; New York, NY 10027, USA
- Abhinav Parihar
- Department of Electrical Engineering, Columbia University; New York, NY 10027, USA
- Heyu Yin
- Department of Electrical Engineering, Columbia University; New York, NY 10027, USA
- Denise Oswalt
- Department of Neurosurgery, University of Pennsylvania; Philadelphia, PA 19118, USA
- Department of Neuroscience, University of Pennsylvania; Philadelphia, PA 19118, USA
- Department of Bioengineering, University of Pennsylvania; Philadelphia, PA 19118, USA
- Alexander Misdorp
- Department of Computer Science, Columbia University; New York, NY 10027, USA
- Ilke Uguz
- Department of Electrical Engineering, Columbia University; New York, NY 10027, USA
- Tori Shinn
- Department of Bioengineering, University of Pennsylvania; Philadelphia, PA 19118, USA
- Gabrielle J. Rodriguez
- Department of Ophthalmology, Byers Eye Institute, Stanford University; Stanford, CA 94305, USA
- Stanford Bio-X, Stanford University; Stanford, CA 94304, USA
- Wu Tsai Neurosciences Institute, Stanford University; Stanford, CA 94304, USA
- Cate Nealley
- Department of Ophthalmology, Byers Eye Institute, Stanford University; Stanford, CA 94305, USA
- Stanford Bio-X, Stanford University; Stanford, CA 94304, USA
- Wu Tsai Neurosciences Institute, Stanford University; Stanford, CA 94304, USA
- Sophia Sanborn
- Stanford Bio-X, Stanford University; Stanford, CA 94304, USA
- Wu Tsai Neurosciences Institute, Stanford University; Stanford, CA 94304, USA
- Ian Gonzales
- Department of Neurological Surgery, Columbia University; New York, NY 10032, USA
- Michael Roukes
- Departments of Physics, Applied Physics, and Bioengineering, Caltech; Pasadena, CA 91125, USA
- Jeffrey Knecht
- Lincoln Laboratory, Massachusetts Institute of Technology; Lexington, MA 02421, USA
- Daniel Yoshor
- Department of Neurosurgery, University of Pennsylvania; Philadelphia, PA 19118, USA
- Peter Canoll
- Department of Pathology and Cell Biology, Columbia University; New York, NY 10032, USA
- Eleonora Spinazzi
- Department of Neurological Surgery, Columbia University; New York, NY 10032, USA
- Luca P. Carloni
- Department of Computer Science, Columbia University; New York, NY 10027, USA
- Bijan Pesaran
- Center for Neural Science, New York University; New York, NY 10003, USA
- Department of Neurosurgery, University of Pennsylvania; Philadelphia, PA 19118, USA
- Department of Neuroscience, University of Pennsylvania; Philadelphia, PA 19118, USA
- Department of Bioengineering, University of Pennsylvania; Philadelphia, PA 19118, USA
- Saumil Patel
- Department of Ophthalmology, Byers Eye Institute, Stanford University; Stanford, CA 94305, USA
- Stanford Bio-X, Stanford University; Stanford, CA 94304, USA
- Wu Tsai Neurosciences Institute, Stanford University; Stanford, CA 94304, USA
- Joshua Jacobs
- Department of Biomedical Engineering, Columbia University; New York, NY 10027, USA
- Department of Neurological Surgery, Columbia University; New York, NY 10032, USA
- Brett Youngerman
- Department of Neurological Surgery, Columbia University; New York, NY 10032, USA
- R. James Cotton
- Shirley Ryan AbilityLab; Chicago, IL, USA
- Department of Physical Medicine and Rehabilitation, Northwestern University; Chicago, IL, USA
- Andreas Tolias
- Department of Ophthalmology, Byers Eye Institute, Stanford University; Stanford, CA 94305, USA
- Stanford Bio-X, Stanford University; Stanford, CA 94304, USA
- Wu Tsai Neurosciences Institute, Stanford University; Stanford, CA 94304, USA
- Center for Neuroscience and Artificial Intelligence, Department of Neuroscience, Baylor College of Medicine; Houston, TX 77030, USA
- Department of Electrical Engineering, Stanford University; Stanford, CA 94304, USA
- Kenneth L. Shepard
- Department of Electrical Engineering, Columbia University; New York, NY 10027, USA
- Department of Biomedical Engineering, Columbia University; New York, NY 10027, USA
- Department of Neurological Surgery, Columbia University; New York, NY 10032, USA
9
Papale P, Wang F, Self MW, Roelfsema PR. An extensive dataset of spiking activity to reveal the syntax of the ventral stream. Neuron 2025; 113:539-553.e5. [PMID: 39809277 DOI: 10.1016/j.neuron.2024.12.003]
Abstract
Visual neuroscience benefits from high-quality datasets with neuronal responses to many images. Several neuroimaging datasets have been published in recent years, but no comparable dataset with spiking activity exists. Here, we introduce the THINGS ventral stream spiking dataset (TVSD). We extensively sampled neuronal activity in response to >25,000 natural images from the THINGS database in macaques, using high-channel-count implants in three key cortical regions: primary visual cortex (V1), V4, and the inferotemporal cortex. We showcase the utility of TVSD by using an artificial neural network to visualize the tuning of neurons. We also characterize the correlated fluctuations in activity within and between areas and demonstrate that these noise correlations are strongest between neurons with similar tuning. The TVSD allows researchers to answer many questions about neuronal tuning, analyze the interactions within and between cortical regions, and compare spiking activity in monkeys to human neuroimaging data.
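The noise-correlation analysis mentioned above can be sketched as follows on stand-in data: signal correlation is the correlation of trial-averaged tuning curves, noise correlation is the correlation of trial-to-trial residuals, and the paper's observation predicts a positive relation between the two. Shapes and firing rates here are illustrative only.

    import numpy as np

    rng = np.random.default_rng(2)
    # Stand-in spike counts: (n_trials, n_images, n_neurons).
    spikes = rng.poisson(5.0, size=(20, 100, 30))

    tuning = spikes.mean(axis=0)                   # trial-averaged tuning curves
    residuals = (spikes - tuning).reshape(-1, 30)  # stimulus-driven part removed

    signal_corr = np.corrcoef(tuning.T)            # similarity of tuning
    noise_corr = np.corrcoef(residuals.T)          # shared trial-to-trial noise

    # Relate the two across all neuron pairs (upper triangle only).
    iu = np.triu_indices(30, k=1)
    print(np.corrcoef(signal_corr[iu], noise_corr[iu])[0, 1])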
Affiliation(s)
- Paolo Papale
- Department of Vision & Cognition, Netherlands Institute for Neuroscience (KNAW), 1105 BA Amsterdam, the Netherlands
- Feng Wang
- Department of Vision & Cognition, Netherlands Institute for Neuroscience (KNAW), 1105 BA Amsterdam, the Netherlands
- Matthew W Self
- Department of Vision & Cognition, Netherlands Institute for Neuroscience (KNAW), 1105 BA Amsterdam, the Netherlands
- Pieter R Roelfsema
- Department of Vision & Cognition, Netherlands Institute for Neuroscience (KNAW), 1105 BA Amsterdam, the Netherlands; Department of Integrative Neurophysiology, VU University, De Boelelaan 1085, 1081 HV Amsterdam, the Netherlands; Department of Neurosurgery, Academic Medical Centre, Postbus 22660, 1100 DD Amsterdam, the Netherlands; Laboratory of Visual Brain Therapy, Sorbonne Université, INSERM, CNRS, Institut de la Vision, 17 rue Moreau, 75012 Paris, France
10
Srinath R, Ni AM, Marucci C, Cohen MR, Brainard DH. Orthogonal neural representations support perceptual judgments of natural stimuli. Sci Rep 2025; 15:5316. [PMID: 39939679 PMCID: PMC11821992 DOI: 10.1038/s41598-025-88910-8]
Abstract
In natural visually guided behavior, observers must separate relevant information from a barrage of irrelevant information. Many studies have investigated the neural underpinnings of this ability using artificial stimuli presented on blank backgrounds. Natural images, however, contain task-irrelevant background elements that might interfere with the perception of object features. Recent studies suggest that visual feature estimation can be modeled through the linear decoding of task-relevant information from visual cortex. So, if the representations of task-relevant and irrelevant features are not orthogonal in the neural population, then variation in the task-irrelevant features would impair task performance. We tested this hypothesis using human psychophysics and monkey neurophysiology combined with parametrically variable naturalistic stimuli. We demonstrate that (1) the neural representation of one feature (the position of an object) in visual area V4 is orthogonal to those of several background features, (2) the ability of human observers to precisely judge object position was largely unaffected by those background features, and (3) many features of the object and the background (and of objects from a separate stimulus set) are orthogonally represented in V4 neural population responses. Our observations are consistent with the hypothesis that orthogonal neural representations can support stable perception of object features despite the richness of natural visual scenes.
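A minimal sketch of the orthogonality test, under assumptions: fit linear decoders for the task-relevant feature (object position) and a task-irrelevant background feature from the same population responses, then measure the cosine of the angle between the two weight vectors. All arrays are synthetic stand-ins, not the study's data.

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(3)
    responses = rng.standard_normal((500, 80))        # (n_trials, n_neurons)
    position = responses @ rng.standard_normal(80)    # stand-in relevant feature
    background = responses @ rng.standard_normal(80)  # stand-in irrelevant feature

    w_pos = Ridge(alpha=1.0).fit(responses, position).coef_
    w_bg = Ridge(alpha=1.0).fit(responses, background).coef_

    # Near-zero cosine similarity would indicate orthogonal representations.
    cosine = w_pos @ w_bg / (np.linalg.norm(w_pos) * np.linalg.norm(w_bg))
    print(cosine)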
Affiliation(s)
- Ramanujan Srinath
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL, 60637, USA
- Amy M Ni
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL, 60637, USA
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Claire Marucci
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Marlene R Cohen
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL, 60637, USA
- David H Brainard
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, 19104, USA
11
Cao R, Brunner P, Chakravarthula PN, Wahlstrom KL, Inman C, Smith EH, Li X, Mamelak AN, Brandmeir NJ, Rutishauser U, Willie JT, Wang S. A neuronal code for object representation and memory in the human amygdala and hippocampus. Nat Commun 2025; 16:1510. [PMID: 39929825 PMCID: PMC11811184 DOI: 10.1038/s41467-025-56793-y]
Abstract
How the brain encodes, recognizes, and memorizes general visual objects is a fundamental question in neuroscience. Here, we investigated the neural processes underlying visual object perception and memory by recording from 3173 single neurons in the human amygdala and hippocampus across four experiments. We employed both passive-viewing and recognition memory tasks involving a diverse range of naturalistic object stimuli. Our findings reveal a region-based feature code for general objects, where neurons exhibit receptive fields in the high-level visual feature space. This code can be validated by independent new stimuli and replicated across all experiments, including fixation-based analyses with large natural scenes. This region code explains the long-standing visual category selectivity, preferentially enhances memory of encoded stimuli, predicts memory performance, encodes image memorability, and exhibits intricate interplay with memory contexts. Together, region-based feature coding provides an important mechanism for visual object processing in the human brain.
Affiliation(s)
- Runnan Cao
- Department of Radiology, Washington University in St. Louis, St. Louis, MO, USA
- Peter Brunner
- Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO, USA
- Cory Inman
- Department of Psychology, University of Utah, Salt Lake City, UT, USA
- Elliot H Smith
- Department of Neurosurgery, University of Utah, Salt Lake City, UT, USA
- Xin Li
- Department of Computer Science, University at Albany, Albany, NY, USA
- Adam N Mamelak
- Department of Neurosurgery, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Ueli Rutishauser
- Department of Neurosurgery, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Jon T Willie
- Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO, USA
- Shuo Wang
- Department of Radiology, Washington University in St. Louis, St. Louis, MO, USA
- Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO, USA
12
Greco A, Siegel M. A spatiotemporal style transfer algorithm for dynamic visual stimulus generation. Nat Comput Sci 2025; 5:155-169. [PMID: 39706876 PMCID: PMC11860245 DOI: 10.1038/s43588-024-00746-w]
Abstract
Understanding how visual information is encoded in biological and artificial systems often requires the generation of appropriate stimuli to test specific hypotheses, but available methods for video generation are scarce. Here we introduce the spatiotemporal style transfer (STST) algorithm, a dynamic visual stimulus generation framework that allows the manipulation and synthesis of video stimuli for vision research. We show how stimuli can be generated that match the low-level spatiotemporal features of their natural counterparts, but lack their high-level semantic features, providing a useful tool to study object recognition. We used these stimuli to probe PredNet, a predictive coding deep network, and found that its next-frame predictions were not disrupted by the omission of high-level information, with human observers also confirming the preservation of low-level features and lack of high-level information in the generated stimuli. We also introduce a procedure for the independent spatiotemporal factorization of dynamic stimuli. Testing such factorized stimuli on humans and deep vision models suggests a spatial bias in how humans and deep vision models encode dynamic visual information. These results showcase potential applications of the STST algorithm as a versatile tool for dynamic stimulus generation in vision science.
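One core ingredient of style-transfer methods like STST is matching feature statistics through Gram matrices, which preserves low-level texture while discarding the spatial arrangement that carries semantics. The sketch below shows this ingredient in isolation; the feature source and the spatiotemporal extension are assumptions, not the published algorithm.

    import torch
    import torch.nn.functional as F

    def gram(features: torch.Tensor) -> torch.Tensor:
        # features: (batch, channels, height, width) activations from some CNN layer.
        b, c, h, w = features.shape
        flat = features.reshape(b, c, h * w)
        return flat @ flat.transpose(1, 2) / (c * h * w)

    def style_loss(generated_feats: torch.Tensor, target_feats: torch.Tensor):
        # Matching Gram matrices matches second-order feature statistics only,
        # so low-level structure is kept while semantic layout is lost.
        return F.mse_loss(gram(generated_feats), gram(target_feats))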
Affiliation(s)
- Antonino Greco
- Department of Neural Dynamics and Magnetoencephalography, Hertie Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
- Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- MEG Center, University of Tübingen, Tübingen, Germany
- Markus Siegel
- Department of Neural Dynamics and Magnetoencephalography, Hertie Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
- Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- MEG Center, University of Tübingen, Tübingen, Germany
- German Center for Mental Health (DZPG), Tübingen, Germany
13
Cao R, Brunner P, Brandmeir NJ, Willie JT, Wang S. A human single-neuron dataset for object recognition. Sci Data 2025; 12:79. [PMID: 39814742 PMCID: PMC11735812 DOI: 10.1038/s41597-024-04265-1]
Abstract
Object recognition is fundamental to how we interact with and interpret the world around us. The human amygdala and hippocampus play a key role in object recognition, contributing to both the encoding and retrieval of visual information. Here, we recorded single-neuron activity from the human amygdala and hippocampus when neurosurgical epilepsy patients performed a one-back task using naturalistic object stimuli. We employed two sets of naturalistic object images from leading datasets extensively used in primate neural recordings and computer vision models: we recorded 1204 neurons using the ImageNet stimuli, which included broader object categories (10 different images per category for 50 categories), and we recorded 512 neurons using the Microsoft COCO stimuli, which featured a higher number of images per category (50 different images per category for 10 categories). Together, our extensive dataset, offering the highest spatial and temporal resolution currently available in humans, will not only facilitate a comprehensive analysis of the neural correlates of object recognition but also provide valuable opportunities for training and validating computational models.
Affiliation(s)
- Runnan Cao
- Department of Radiology, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Peter Brunner
- Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Nicholas J Brandmeir
- Department of Neurosurgery, West Virginia University, Morgantown, WV, 26506, USA
- Jon T Willie
- Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Shuo Wang
- Department of Radiology, Washington University in St. Louis, St. Louis, MO, 63110, USA
14
Marczak-Czajka A, Redgrave T, Mitcheff M, Villano M, Czajka A. Assessment of human emotional reactions to visual stimuli "deep-dreamed" by artificial neural networks. Front Psychol 2024; 15:1509392. [PMID: 39776961 PMCID: PMC11703666 DOI: 10.3389/fpsyg.2024.1509392]
Abstract
Introduction
While it is documented that visual stimuli synthesized by Artificial Neural Networks (ANNs) can evoke emotional reactions, the precise mechanisms that connect the strength and type of such reactions with the way the ANN is used to synthesize the stimuli are yet to be discovered. Understanding these mechanisms would allow the design of methods that synthesize images attenuating or enhancing selected emotional states, which may provide an unobtrusive and widely applicable treatment for mental dysfunctions and disorders.
Methods
A Convolutional Neural Network (CNN), a type of ANN used in computer vision tasks that models the way humans solve visual tasks, was applied to synthesize ("dream" or "hallucinate") images with no semantic content that maximize the activations of neurons in precisely selected layers of the CNN. The emotions evoked in 150 human subjects observing these images were self-reported on a two-dimensional scale (arousal and valence) using self-assessment manikin (SAM) figures. Correlations were calculated between the arousal and valence values and both the images' visual properties (e.g., color, brightness, clutter feature congestion, and clutter sub-band entropy) and the position of the CNN layers stimulated to obtain a given image.
Results
Synthesized images that maximized the activations of some CNN layers led to significantly higher or lower arousal and valence levels than the average subject's reactions. Multiple linear regression analysis found that a small set of global visual features (hue, feature congestion, and sub-band entropy) are significant predictors of the measured arousal; however, no statistically significant dependencies were found between global visual features and the measured valence.
Conclusion
This study demonstrates that synthesizing images by maximizing the activations of small, precisely selected parts of a CNN can produce visual stimuli that enhance or attenuate emotional reactions. This method paves the way for tools that provide non-invasive support for wellbeing (managing stress, enhancing mood) and assist patients with certain mental conditions, complementing traditional therapeutic interventions.
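For readers unfamiliar with the synthesis procedure, the sketch below shows the generic "deep-dream" loop: gradient ascent on the input pixels to maximize the mean activation of one chosen CNN layer. The model, layer index, learning rate, and iteration count are illustrative assumptions, not the study's exact configuration.

    import torch
    from torchvision.models import vgg16

    model = vgg16(weights=None).features.eval()  # stand-in CNN (untrained here)
    layer_index = 10                             # which layer to "dream" from

    image = torch.rand(1, 3, 224, 224, requires_grad=True)
    optimizer = torch.optim.Adam([image], lr=0.05)

    for _ in range(50):
        x = image
        for i, layer in enumerate(model):
            x = layer(x)
            if i == layer_index:
                break
        loss = -x.mean()                 # ascend on the layer's mean activation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        image.data.clamp_(0.0, 1.0)      # keep pixels in a displayable range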
Affiliation(s)
- Agnieszka Marczak-Czajka
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States
- Timothy Redgrave
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States
- Mahsa Mitcheff
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States
- Michael Villano
- Department of Psychology, University of Notre Dame, Notre Dame, IN, United States
- Adam Czajka
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States
15
Ramirez JG, Vanhoyland M, Ratan Murty NA, Decramer T, Van Paesschen W, Bracci S, Op de Beeck H, Kanwisher N, Janssen P, Theys T. Intracortical recordings reveal the neuronal selectivity for bodies and body parts in the human visual cortex. Proc Natl Acad Sci U S A 2024; 121:e2408871121. [PMID: 39652751 PMCID: PMC11665852 DOI: 10.1073/pnas.2408871121]
Abstract
Body perception plays a fundamental role in social cognition. Yet, the neural mechanisms underlying this process in humans remain elusive given the spatiotemporal constraints of functional imaging. Here, we present intracortical recordings of single- and multiunit spiking activity in two epilepsy surgery patients in or near the extrastriate body area, a critical region for body perception. Our recordings revealed a strong preference for human bodies over a large range of control stimuli. Notably, body selectivity was driven by a distinct selectivity for body parts. The observed body selectivity generalized to nonphotographic depictions of bodies including silhouettes and stick figures. Overall, our study provides unique neural data that bridge the gap between human neuroimaging and macaque electrophysiology studies, laying a solid foundation for computational models of human body processing.
Affiliation(s)
- Jesus Garcia Ramirez
- Research group Experimental Neurosurgery and Neuroanatomy, Katholieke Universiteit Leuven, and the Leuven Brain Institute, Leuven B-3000, Belgium
- Laboratory for Neuro- and Psychophysiology, Department of Neurosciences, Katholieke Universiteit Leuven and the Leuven Brain Institute, Leuven B-3000, Belgium
- Michael Vanhoyland
- Research group Experimental Neurosurgery and Neuroanatomy, Katholieke Universiteit Leuven, and the Leuven Brain Institute, Leuven B-3000, Belgium
- Laboratory for Neuro- and Psychophysiology, Department of Neurosciences, Katholieke Universiteit Leuven and the Leuven Brain Institute, Leuven B-3000, Belgium
- Department of Neurosurgery, Universitaire Ziekenhuizen Leuven, Katholieke Universiteit Leuven, Leuven B-3000, Belgium
- N. A. Ratan Murty
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- The Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139
- Thomas Decramer
- Research group Experimental Neurosurgery and Neuroanatomy, Katholieke Universiteit Leuven, and the Leuven Brain Institute, Leuven B-3000, Belgium
- Department of Neurosurgery, Universitaire Ziekenhuizen Leuven, Katholieke Universiteit Leuven, Leuven B-3000, Belgium
- Wim Van Paesschen
- Laboratory for Epilepsy Research, Katholieke Universiteit Leuven, Leuven B-3000, Belgium
- Stefania Bracci
- Department of Psychology and Cognitive Science, University of Trento, Trento 38068, Italy
- Hans Op de Beeck
- Laboratory for Biological Psychology, Katholieke Universiteit Leuven, Leuven B-3000, Belgium
- Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- The Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139
- Peter Janssen
- Laboratory for Neuro- and Psychophysiology, Department of Neurosciences, Katholieke Universiteit Leuven and the Leuven Brain Institute, Leuven B-3000, Belgium
- Tom Theys
- Research group Experimental Neurosurgery and Neuroanatomy, Katholieke Universiteit Leuven, and the Leuven Brain Institute, Leuven B-3000, Belgium
- Department of Neurosurgery, Universitaire Ziekenhuizen Leuven, Katholieke Universiteit Leuven, Leuven B-3000, Belgium
16
Pandey L, Lee D, Wood SMW, Wood JN. Parallel development of object recognition in newborn chicks and deep neural networks. PLoS Comput Biol 2024; 20:e1012600. [PMID: 39621774 DOI: 10.1371/journal.pcbi.1012600]
Abstract
How do newborns learn to see? We propose that visual systems are space-time fitters, meaning that visual development can be understood as a blind fitting process (akin to evolution) in which visual systems gradually adapt to the spatiotemporal data distributions in the newborn's environment. To test whether space-time fitting is a viable theory of learning how to see, we performed parallel controlled-rearing experiments on newborn chicks and deep neural networks (DNNs), including CNNs and transformers. First, we raised newborn chicks in impoverished environments containing a single object, then simulated those environments in a video game engine. Second, we recorded first-person images from agents moving through the virtual animal chambers and used those images to train DNNs. Third, we compared the viewpoint-invariant object recognition performance of the chicks and the DNNs. When DNNs received the same visual diet (training data) as the chicks, the models developed the same object recognition skills as the chicks. DNNs that used time as a teaching signal (space-time fitters) also showed the same patterns of successes and failures across the test viewpoints as the chicks. Thus, DNNs can learn object recognition in the same impoverished environments as newborn animals. We argue that space-time fitters can serve as formal scientific models of newborn visual systems, providing image-computable models for studying how newborns learn to see from raw visual experience.
Affiliation(s)
- Lalit Pandey
- Informatics Department, Indiana University, Bloomington, Indiana, United States of America
- Donsuk Lee
- Informatics Department, Indiana University, Bloomington, Indiana, United States of America
- Samantha M W Wood
- Informatics Department, Indiana University, Bloomington, Indiana, United States of America
- Cognitive Science Program, Indiana University, Bloomington, Indiana, United States of America
- Department of Neuroscience, Indiana University, Bloomington, Indiana, United States of America
- Justin N Wood
- Informatics Department, Indiana University, Bloomington, Indiana, United States of America
- Cognitive Science Program, Indiana University, Bloomington, Indiana, United States of America
- Department of Neuroscience, Indiana University, Bloomington, Indiana, United States of America
- Center for the Integrated Study of Animal Behavior, Indiana University, Bloomington, Indiana, United States of America
17
Ahmed B, Downer JD, Malone BJ, Makin JG. Deep Neural Networks Explain Spiking Activity in Auditory Cortex. bioRxiv [Preprint] 2024:2024.11.12.623280. [PMID: 39605715 PMCID: PMC11601425 DOI: 10.1101/2024.11.12.623280]
Abstract
For static stimuli or at gross (~1-s) time scales, artificial neural networks (ANNs) that have been trained on challenging engineering tasks, like image classification and automatic speech recognition, are now the best predictors of neural responses in primate visual and auditory cortex. It is, however, unknown whether this success can be extended to spiking activity at fine time scales, which are particularly relevant to audition. Here we address this question with ANNs trained on speech audio and with acute multi-electrode recordings from the auditory cortex of squirrel monkeys. We show that layers of trained ANNs can predict the spike counts of neurons responding to speech audio and to monkey vocalizations at bin widths of 50 ms and below. For some neurons, the ANNs explain close to all of the explainable variance, much more than traditional spectrotemporal-receptive-field models and more than untrained networks. Non-primary neurons tend to be more predictable by deeper layers of the ANNs, but there is substantial variation across neurons, which would be invisible to coarser recording modalities.
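The layer-to-spikes mapping can be sketched as a cross-validated ridge regression from time-aligned ANN layer activations to binned spike counts, scored by correlation on held-out bins. The arrays below are synthetic stand-ins for the real features and recordings, and the split is a simplification of a proper cross-validation scheme.

    import numpy as np
    from sklearn.linear_model import RidgeCV

    rng = np.random.default_rng(4)
    layer_acts = rng.standard_normal((2000, 256))  # (n_bins, n_features)
    spike_counts = rng.poisson(3.0, size=2000)     # one neuron, 50-ms bins

    split = 1500                                   # simple train/test split
    model = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(
        layer_acts[:split], spike_counts[:split])

    pred = model.predict(layer_acts[split:])
    r = np.corrcoef(pred, spike_counts[split:])[0, 1]
    print(r ** 2)                                  # cf. fraction of explainable variance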
Affiliation(s)
- Bilal Ahmed
- Elmore School of Electrical and Computer Engineering, Purdue University
- Joshua D Downer
- Otolaryngology and Head and Neck Surgery, University of California, San Francisco
- Brian J Malone
- Otolaryngology and Head and Neck Surgery, University of California, San Francisco
- Center for Neuroscience, UC Davis
- Joseph G Makin
- Elmore School of Electrical and Computer Engineering, Purdue University
18
|
Papale P, Zuiderbaan W, Teeuwen RRM, Gilhuis A, Self MW, Roelfsema PR, Dumoulin SO. V1 neurons are tuned to perceptual borders in natural scenes. Proc Natl Acad Sci U S A 2024; 121:e2221623121. [PMID: 39495929 PMCID: PMC11572972 DOI: 10.1073/pnas.2221623121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2023] [Accepted: 09/30/2024] [Indexed: 11/06/2024] Open
Abstract
The visual system needs to identify perceptually relevant borders to segment complex natural scenes. The primary visual cortex (V1) is thought to extract local borders, and higher visual areas are thought to identify the perceptually relevant borders between objects and the background. To test this conjecture, we used natural images that had been annotated by human observers who marked the perceptually relevant borders. We assessed the effect of perceptual relevance on V1 responses using human neuroimaging, macaque electrophysiology, and computational modeling. We report that perceptually relevant borders elicit stronger responses in the early visual cortex than irrelevant ones, even if simple features, such as contrast and the energy of oriented filters, are matched. Moreover, V1 neurons discriminate perceptually relevant borders surprisingly fast, during the early feedforward-driven activity at a latency of ~50 ms, indicating that they are tuned to the features that characterize them. We also revealed a delayed, contextual effect that enhances the V1 responses that are elicited by perceptually relevant borders at a longer latency. Our results reveal multiple mechanisms that allow V1 neurons to infer the layout of objects in natural images.
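The control analysis mentioned above, matching "the energy of oriented filters" across relevant and irrelevant borders, can be made concrete with a quadrature Gabor energy model. The sketch below computes phase-invariant oriented energy for an image; the filter size, spatial frequency, and bandwidth are illustrative assumptions.

```python
# Sketch of the low-level control: local oriented-filter energy computed
# with a quadrature pair of Gabor filters, summed over orientations.
import numpy as np
from scipy.signal import fftconvolve

def gabor_pair(size=21, freq=0.15, theta=0.0, sigma=4.0):
    """Return even (cosine) and odd (sine) Gabor kernels at orientation theta."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    xr = xx * np.cos(theta) + yy * np.sin(theta)
    envelope = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * xr), envelope * np.sin(2 * np.pi * freq * xr)

def oriented_energy(image, n_orientations=8):
    """Sum of quadrature energies across orientations (phase-invariant)."""
    energy = np.zeros_like(image, dtype=float)
    for theta in np.linspace(0, np.pi, n_orientations, endpoint=False):
        even, odd = gabor_pair(theta=theta)
        e = fftconvolve(image, even, mode="same")
        o = fftconvolve(image, odd, mode="same")
        energy += e**2 + o**2
    return energy

image = np.random.rand(128, 128)      # stand-in for a natural scene
print(oriented_energy(image).mean())  # the feature to match across border types
```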
Affiliation(s)
- Paolo Papale: Department of Vision and Cognition, Netherlands Institute for Neuroscience (KNAW), Amsterdam 1105 BA, Netherlands; Momilab Research Unit, Institutions, Markets, Technologies School for Advanced Studies Lucca, Lucca 55100, Italy
- Wietske Zuiderbaan: Department of Computational Cognitive Neuroscience and Neuroimaging, Netherlands Institute for Neuroscience (Koninklijke Nederlandse Akademie van Wetenschappen), Amsterdam 1105 BA, Netherlands; Spinoza Centre for Neuroimaging, Amsterdam 1105 BK, Netherlands
- Rob R. M. Teeuwen: Department of Vision and Cognition, Netherlands Institute for Neuroscience (KNAW), Amsterdam 1105 BA, Netherlands
- Amparo Gilhuis: Department of Vision and Cognition, Netherlands Institute for Neuroscience (KNAW), Amsterdam 1105 BA, Netherlands
- Matthew W. Self: Department of Vision and Cognition, Netherlands Institute for Neuroscience (KNAW), Amsterdam 1105 BA, Netherlands
- Pieter R. Roelfsema: Department of Vision and Cognition, Netherlands Institute for Neuroscience (KNAW), Amsterdam 1105 BA, Netherlands; Department of Integrative Neurophysiology, Vrije Universiteit Amsterdam, Amsterdam 1081 HV, Netherlands; Department of Neurosurgery, Academic Medical Centre, Amsterdam 1100 DD, Netherlands; Laboratory of Visual Brain Therapy, Institut National de la Santé et de la Recherche Médicale, Centre National de la Recherche Scientifique, Institut de la Vision, Sorbonne Université, Paris F-75012, France
- Serge O. Dumoulin: Department of Computational Cognitive Neuroscience and Neuroimaging, Netherlands Institute for Neuroscience (Koninklijke Nederlandse Akademie van Wetenschappen), Amsterdam 1105 BA, Netherlands; Spinoza Centre for Neuroimaging, Amsterdam 1105 BK, Netherlands; Department of Experimental and Applied Psychology, Vrije Universiteit Amsterdam, Amsterdam 1181 BT, Netherlands; Department of Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht 3584 CS, Netherlands

19
Jarvers C, Neumann H. Teaching deep networks to see shape: Lessons from a simplified visual world. PLoS Comput Biol 2024; 20:e1012019. [PMID: 39527647 PMCID: PMC11581402 DOI: 10.1371/journal.pcbi.1012019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2024] [Revised: 11/21/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024] Open
Abstract
Deep neural networks have been remarkably successful as models of the primate visual system. One crucial problem is that they fail to account for the strong shape-dependence of primate vision. Whereas humans base their judgements of category membership to a large extent on shape, deep networks rely much more strongly on other features such as color and texture. While this problem has been widely documented, the underlying reasons remain unclear. We design simple, artificial image datasets in which shape, color, and texture features can be used to predict the image class. By training networks from scratch to classify images with single features and feature combinations, we show that some network architectures are unable to learn to use shape features, whereas others are able to use shape in principle but are biased towards the other features. We show that the bias can be explained by the interactions between the weight updates for many images in mini-batch gradient descent. This suggests that different learning algorithms with sparser, more local weight changes are required to make networks more sensitive to shape and improve their capability to describe human vision.
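The single-feature and feature-combination datasets are straightforward to emulate. The sketch below generates toy images in which a shape cue, a color cue, and a texture cue can each be made predictive of the class or randomized independently; all rendering choices are assumptions, not the authors' stimulus code.

```python
# Minimal sketch of an artificial dataset where shape (square vs. disk),
# color, and texture cues can be correlated or decorrelated with the label.
import numpy as np

def make_image(shape_cls, color_cls, texture_cls, size=32, rng=None):
    rng = rng or np.random.default_rng()
    img = np.zeros((size, size, 3))
    yy, xx = np.mgrid[:size, :size]
    c = size // 2
    if shape_cls == 0:                                   # square
        mask = (abs(xx - c) < size // 4) & (abs(yy - c) < size // 4)
    else:                                                # disk
        mask = (xx - c) ** 2 + (yy - c) ** 2 < (size // 4) ** 2
    color = np.array([1.0, 0.1, 0.1]) if color_cls == 0 else np.array([0.1, 0.1, 1.0])
    img[mask] = color
    if texture_cls == 1:                                 # noisy texture cue
        img[mask] *= rng.uniform(0.3, 1.0, size=(mask.sum(), 1))
    return img

rng = np.random.default_rng(0)
# "All features predictive": shape, color, and texture all agree with the label.
congruent = [make_image(k, k, k, rng=rng) for k in rng.integers(0, 2, 100)]
# "Shape only": color and texture randomized, so only shape predicts the class.
shape_only = [make_image(k, rng.integers(0, 2), rng.integers(0, 2), rng=rng)
              for k in rng.integers(0, 2, 100)]
```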
Affiliation(s)
- Christian Jarvers: Institute for Neural Information Processing, Ulm University, Ulm, Germany
- Heiko Neumann: Institute for Neural Information Processing, Ulm University, Ulm, Germany

20
Conwell C, Prince JS, Kay KN, Alvarez GA, Konkle T. A large-scale examination of inductive biases shaping high-level visual representation in brains and machines. Nat Commun 2024; 15:9383. [PMID: 39477923 PMCID: PMC11526138 DOI: 10.1038/s41467-024-53147-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2023] [Accepted: 10/01/2024] [Indexed: 11/02/2024] Open
Abstract
The rapid release of high-performing computer vision models offers new potential to study the impact of different inductive biases on the emergent brain alignment of learned representations. Here, we perform controlled comparisons among a curated set of 224 diverse models to test the impact of specific model properties on visual brain predictivity, a process requiring over 1.8 billion regressions and 50.3 thousand representational similarity analyses. We find that models with qualitatively different architectures (e.g. CNNs versus Transformers) and task objectives (e.g. purely visual contrastive learning versus vision-language alignment) achieve near equivalent brain predictivity, when other factors are held constant. Instead, variation across visual training diets yields the largest, most consistent effect on brain predictivity. Many models achieve similarly high brain predictivity, despite clear variation in their underlying representations, suggesting that standard methods used to link models to brains may be too flexible. Broadly, these findings challenge common assumptions about the factors underlying emergent brain alignment, and outline how we can leverage controlled model comparison to probe the common computational principles underlying biological and artificial visual systems.
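One of the two workhorse analyses here, representational similarity analysis, is compact enough to sketch. The snippet below correlates the condensed representational dissimilarity matrices (RDMs) of a model feature space and a brain response matrix; both matrices are random placeholders.

```python
# Hedged sketch of one RSA of the sort run at scale above: rank-correlate
# the condensed (upper-triangle) RDMs of model features and brain responses.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
model_feats = rng.standard_normal((100, 2048))  # (stimuli, model units)
brain_resp = rng.standard_normal((100, 500))    # (stimuli, voxels or neurons)

# 1 - Pearson correlation distance between all stimulus pairs.
model_rdm = pdist(model_feats, metric="correlation")
brain_rdm = pdist(brain_resp, metric="correlation")

rho, _ = spearmanr(model_rdm, brain_rdm)        # rank correlation of RDMs
print(f"model-brain RSA (Spearman rho): {rho:.3f}")
```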
Affiliation(s)
- Colin Conwell: Department of Psychology, Harvard University, Cambridge, MA, USA
- Jacob S Prince: Department of Psychology, Harvard University, Cambridge, MA, USA
- Kendrick N Kay: Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, MN, USA
- George A Alvarez: Department of Psychology, Harvard University, Cambridge, MA, USA
- Talia Konkle: Department of Psychology, Harvard University, Cambridge, MA, USA; Center for Brain Science, Harvard University, Cambridge, MA, USA; Kempner Institute for Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA

21
Pavuluri A, Kohn A. The representational geometry for naturalistic textures in macaque V1 and V2. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.10.18.619102. [PMID: 39484570 PMCID: PMC11526966 DOI: 10.1101/2024.10.18.619102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/03/2024]
Abstract
Our understanding of visual cortical processing has relied primarily on studying the selectivity of individual neurons in different areas. A complementary approach is to study how the representational geometry of neuronal populations differs across areas. Though the geometry is derived from individual neuronal selectivity, it can reveal encoding strategies difficult to infer from single neuron responses. In addition, recent theoretical work has begun to relate distinct functional objectives to different representational geometries. To understand how the representational geometry changes across stages of processing, we measured neuronal population responses in primary visual cortex (V1) and area V2 of macaque monkeys to an ensemble of synthetic, naturalistic textures. Responses were lower dimensional in V2 than V1, and there was a better alignment of V2 population responses to different textures. The representational geometry in V2 afforded better discriminability between out-of-sample textures. We performed complementary analyses of standard convolutional network models, which did not replicate the representational geometry of cortex. We conclude that there is a shift in the representational geometry between V1 and V2, with the V2 representation exhibiting features of a low-dimensional, systematic encoding of different textures and of different instantiations of each texture. Our results suggest that comparisons of representational geometry can reveal important transformations that occur across successive stages of visual processing.
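Claims like "responses were lower dimensional in V2 than V1" are typically cashed out with a spectrum-based dimensionality measure. The sketch below computes the participation ratio of the population covariance eigenspectrum; the population matrices are synthetic stand-ins.

```python
# Sketch of a common dimensionality measure for population geometry:
# the participation ratio of the covariance eigenspectrum.
import numpy as np

def participation_ratio(responses):
    """responses: (n_stimuli, n_neurons). PR = (sum lambda)^2 / sum lambda^2,
    ranging from 1 (one dominant axis) to n_neurons (isotropic)."""
    centered = responses - responses.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(centered, rowvar=False))
    eigvals = np.clip(eigvals, 0, None)  # guard against tiny negative values
    return eigvals.sum() ** 2 / (eigvals ** 2).sum()

rng = np.random.default_rng(0)
v1 = rng.standard_normal((300, 80))                                # high-dimensional stand-in
v2 = rng.standard_normal((300, 5)) @ rng.standard_normal((5, 80))  # low-rank stand-in
print(participation_ratio(v1), participation_ratio(v2))            # expect v1 >> v2
```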
22
Mathis MW, Perez Rotondo A, Chang EF, Tolias AS, Mathis A. Decoding the brain: From neural representations to mechanistic models. Cell 2024; 187:5814-5832. [PMID: 39423801 PMCID: PMC11637322 DOI: 10.1016/j.cell.2024.08.051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2024] [Revised: 07/29/2024] [Accepted: 08/26/2024] [Indexed: 10/21/2024]
Abstract
A central principle in neuroscience is that neurons within the brain act in concert to produce perception, cognition, and adaptive behavior. Neurons are organized into specialized brain areas, dedicated to different functions to varying extents, and their function relies on distributed circuits to continuously encode relevant environmental and body-state features, enabling other areas to decode (interpret) these representations for computing meaningful decisions and executing precise movements. Thus, the distributed brain can be thought of as a series of computations that act to encode and decode information. In this perspective, we detail important concepts of neural encoding and decoding and highlight the mathematical tools used to measure them, including deep learning methods. We provide case studies where decoding concepts enable foundational and translational science in motor, visual, and language processing.
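As a minimal illustration of the decoding concept the review formalizes, the sketch below trains a cross-validated linear readout that predicts a binary stimulus class from population firing rates; the data are synthetic placeholders.

```python
# A minimal decoding sketch: a cross-validated linear readout that
# "interprets" a population representation by predicting the stimulus class.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_neurons = 400, 120
labels = rng.integers(0, 2, n_trials)            # two stimulus classes
tuning = rng.standard_normal(n_neurons)          # per-neuron class preference
rates = rng.standard_normal((n_trials, n_neurons)) + np.outer(labels, tuning)

decoder = LogisticRegression(max_iter=1000)
acc = cross_val_score(decoder, rates, labels, cv=5).mean()
print(f"5-fold decoding accuracy: {acc:.2f}")    # chance is 0.50
```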
Affiliation(s)
- Mackenzie Weygandt Mathis: Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland; Neuro-X Institute, École Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland
- Adriana Perez Rotondo: Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland; Neuro-X Institute, École Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland
- Edward F Chang: Department of Neurological Surgery, UCSF, San Francisco, CA, USA
- Andreas S Tolias: Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, CA, USA; Department of Electrical Engineering, Stanford University, Stanford, CA, USA; Stanford BioX, Stanford University, Stanford, CA, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Alexander Mathis: Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland; Neuro-X Institute, École Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland

23
Höfling L, Szatko KP, Behrens C, Deng Y, Qiu Y, Klindt DA, Jessen Z, Schwartz GW, Bethge M, Berens P, Franke K, Ecker AS, Euler T. A chromatic feature detector in the retina signals visual context changes. eLife 2024; 13:e86860. [PMID: 39365730 PMCID: PMC11452179 DOI: 10.7554/elife.86860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2023] [Accepted: 08/25/2024] [Indexed: 10/06/2024] Open
Abstract
The retina transforms patterns of light into visual feature representations supporting behaviour. These representations are distributed across various types of retinal ganglion cells (RGCs), whose spatial and temporal tuning properties have been studied extensively in many model organisms, including the mouse. However, it has been difficult to link the potentially nonlinear retinal transformations of natural visual inputs to specific ethological purposes. Here, we discover a nonlinear selectivity to chromatic contrast in an RGC type that allows the detection of changes in visual context. We trained a convolutional neural network (CNN) model on large-scale functional recordings of RGC responses to natural mouse movies, and then used this model to search in silico for stimuli that maximally excite distinct types of RGCs. This procedure predicted centre colour opponency in transient suppressed-by-contrast (tSbC) RGCs, a cell type whose function is being debated. We confirmed experimentally that these cells indeed responded very selectively to Green-OFF, UV-ON contrasts. This type of chromatic contrast was characteristic of transitions from ground to sky in the visual scene, as might be elicited by head or eye movements across the horizon. Because tSbC cells performed best among all RGC types at reliably detecting these transitions, we suggest a role for this RGC type in providing contextual information (i.e. sky or ground) necessary for the selection of appropriate behavioural responses to other stimuli, such as looming objects. Our work showcases how a combination of experiments with natural stimuli and computational modelling allows discovering novel types of stimulus selectivity and identifying their potential ethological relevance.
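The in-silico search for maximally exciting stimuli amounts to gradient ascent on the input of a fitted response model. The sketch below shows that loop in PyTorch; the `model` is a stand-in for a CNN digital twin, and the stimulus shape, norm constraint, and optimizer settings are assumptions.

```python
# Hedged sketch of an in-silico MEI search: gradient ascent on the input to
# maximize one model cell's predicted response, under a fixed-energy constraint.
import torch

def find_mei(model, cell_index, shape=(1, 2, 36, 64), steps=200, lr=0.05, max_norm=10.0):
    """shape: (batch, chromatic channels [e.g. green, UV], height, width)."""
    stim = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([stim], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        response = model(stim)[0, cell_index]   # predicted response of one cell
        (-response).backward()                  # ascend the response
        opt.step()
        with torch.no_grad():                   # keep stimulus in a fixed-energy ball
            norm = stim.norm()
            if norm > max_norm:
                stim.mul_(max_norm / norm)
    return stim.detach()

# Usage with a stand-in model (a real fitted CNN would replace this):
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(2 * 36 * 64, 32))
mei = find_mei(model, cell_index=7)
```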
Affiliation(s)
- Larissa Höfling: Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany; Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Klaudia P Szatko: Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany; Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Christian Behrens: Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
- Yuyao Deng: Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany; Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Yongrong Qiu: Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany; Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Zachary Jessen: Feinberg School of Medicine, Department of Ophthalmology, Northwestern University, Chicago, United States
- Gregory W Schwartz: Feinberg School of Medicine, Department of Ophthalmology, Northwestern University, Chicago, United States
- Matthias Bethge: Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany; Tübingen AI Center, University of Tübingen, Tübingen, Germany
- Philipp Berens: Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany; Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany; Tübingen AI Center, University of Tübingen, Tübingen, Germany; Hertie Institute for AI in Brain Health, Tübingen, Germany
- Katrin Franke: Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
- Alexander S Ecker: Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany; Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
- Thomas Euler: Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany; Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany

24
Zhang J, Zhou H, Wang S. Distinct visual processing networks for foveal and peripheral visual fields. Commun Biol 2024; 7:1259. [PMID: 39367101 PMCID: PMC11452663 DOI: 10.1038/s42003-024-06980-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2024] [Accepted: 09/27/2024] [Indexed: 10/06/2024] Open
Abstract
Foveal and peripheral vision are two distinct modes of visual processing essential for navigating the world. However, it remains unclear if they engage different neural mechanisms and circuits within the visual attentional system. Here, we trained macaques to perform a free-gaze visual search task using natural face and object stimuli and recorded 14,588 visually responsive units from a broadly distributed network of brain regions involved in visual attentional processing. Foveal and peripheral units had substantially different proportions across brain regions and exhibited systematic differences in encoding visual information and visual attention. The spike-local field potential (LFP) coherence of foveal units was more extensively modulated by both attention and visual selectivity, thus indicating differential engagement of the attention and visual coding network compared to peripheral units. Furthermore, we delineated the interaction and coordination between foveal and peripheral processing for spatial attention and saccade selection. Together, the systematic differences between foveal and peripheral processing provide valuable insights into how the brain processes and integrates visual information from different regions of the visual field.
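Spike-LFP coherence, the coupling measure used here, can be estimated with standard Welch-based spectral tools. The sketch below computes coherence between a synthetic spike train weakly locked to a 12-Hz LFP; the sampling rate and window length are illustrative assumptions.

```python
# Sketch of spike-LFP coherence using Welch-based coherence between a
# binarized spike train and the LFP. All signals are synthetic placeholders.
import numpy as np
from scipy.signal import coherence

fs = 1000                                  # Hz
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
lfp = np.sin(2 * np.pi * 12 * t) + 0.5 * rng.standard_normal(t.size)

# Spikes weakly locked to the 12-Hz LFP phase (placeholder for recorded spikes).
rate = 20 * (1 + 0.6 * np.sin(2 * np.pi * 12 * t)) / fs
spikes = (rng.random(t.size) < rate).astype(float)

f, coh = coherence(spikes, lfp, fs=fs, nperseg=1024)
band = (f >= 8) & (f <= 16)
print(f"peak coherence near 12 Hz: {coh[band].max():.2f}")
```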
Affiliation(s)
- Jie Zhang: Department of Radiology, Washington University in St. Louis, St. Louis, MO, 63110, USA; Peng Cheng Laboratory, Shenzhen, 518000, China; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Huihui Zhou: Peng Cheng Laboratory, Shenzhen, 518000, China; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Shuo Wang: Department of Radiology, Washington University in St. Louis, St. Louis, MO, 63110, USA

25
Reilly J, Goodwin JD, Lu S, Kozlov AS. Bidirectional generative adversarial representation learning for natural stimulus synthesis. J Neurophysiol 2024; 132:1156-1169. [PMID: 39196986 PMCID: PMC11495180 DOI: 10.1152/jn.00421.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2023] [Revised: 08/12/2024] [Accepted: 08/14/2024] [Indexed: 08/30/2024] Open
Abstract
Thousands of species use vocal signals to communicate with one another. Vocalizations carry rich information, yet characterizing and analyzing these complex, high-dimensional signals is difficult and prone to human bias. Moreover, animal vocalizations are ethologically relevant stimuli whose representation by auditory neurons is an important subject of research in sensory neuroscience. A method that can efficiently generate naturalistic vocalization waveforms would offer an unlimited supply of stimuli with which to probe neuronal computations. Although unsupervised learning methods allow for the projection of vocalizations into low-dimensional latent spaces learned from the waveforms themselves, and generative modeling allows for the synthesis of novel vocalizations for use in downstream tasks, we are not aware of any model that combines these tasks to synthesize naturalistic vocalizations in the waveform domain for stimulus playback. In this paper, we demonstrate BiWaveGAN: a bidirectional generative adversarial network (GAN) capable of learning a latent representation of ultrasonic vocalizations (USVs) from mice. We show that BiWaveGAN can be used to generate, and interpolate between, realistic vocalization waveforms. We then use these synthesized stimuli along with natural USVs to probe the sensory input space of mouse auditory cortical neurons. We show that stimuli generated from our method evoke neuronal responses as effectively as real vocalizations, and produce receptive fields with the same predictive power. BiWaveGAN is not restricted to mouse USVs but can be used to synthesize naturalistic vocalizations of any animal species and interpolate between vocalizations of the same or different species, which could be useful for probing categorical boundaries in representations of ethologically relevant auditory signals.

NEW & NOTEWORTHY: A new type of artificial neural network is presented that can be used to generate animal vocalization waveforms and interpolate between them to create new vocalizations. We find that our synthetic naturalistic stimuli drive auditory cortical neurons in the mouse equally well and produce receptive field features with the same predictive power as those obtained with natural mouse vocalizations, confirming the quality of the stimuli produced by the neural network.
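Interpolating between vocalizations reduces to interpolation in the generator's latent space. The sketch below uses spherical interpolation (a common choice for Gaussian GAN latents) with a placeholder generator; BiWaveGAN's actual decoder is not reproduced here.

```python
# Hedged sketch of latent-space morphing between two vocalizations:
# spherical interpolation between latent codes, decoded by a generator.
import numpy as np

def slerp(z0, z1, alpha):
    """Spherical interpolation, which stays on the shell where GAN latents live."""
    omega = np.arccos(np.clip(
        np.dot(z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1 - alpha) * z0 + alpha * z1
    return (np.sin((1 - alpha) * omega) * z0 + np.sin(alpha * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(0)
z_a, z_b = rng.standard_normal(100), rng.standard_normal(100)  # two latent codes
W = rng.standard_normal((100, 16384))                          # stand-in decoder weights
generator = lambda z: np.tanh(z @ W)                           # placeholder for the GAN

# Ten stimuli morphing from vocalization A to vocalization B for playback.
morph_waveforms = [generator(slerp(z_a, z_b, a)) for a in np.linspace(0, 1, 10)]
```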
Affiliation(s)
- Johnny Reilly: Department of Bioengineering, Imperial College London, London, United Kingdom
- John D Goodwin: Department of Bioengineering, Imperial College London, London, United Kingdom
- Sihao Lu: Department of Bioengineering, Imperial College London, London, United Kingdom
- Andriy S Kozlov: Department of Bioengineering, Imperial College London, London, United Kingdom

26
Nielsen KJ, Connor CE. How Shape Perception Works, in Two Dimensions and Three Dimensions. Annu Rev Vis Sci 2024; 10:47-68. [PMID: 38848596 DOI: 10.1146/annurev-vision-112823-031607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/09/2024]
Abstract
The ventral visual pathway transforms retinal images into neural representations that support object understanding, including exquisite appreciation of precise 2D pattern shape and 3D volumetric shape. We articulate a framework for understanding the goals of this transformation and how they are achieved by neural coding at successive ventral pathway stages. The critical goals are (a) radical compression to make shape information communicable across axonal bundles and storable in memory, (b) explicit coding to make shape information easily readable by the rest of the brain and thus accessible for cognition and behavioral control, and (c) representational stability to maintain consistent perception across highly variable viewing conditions. We describe how each transformational step in ventral pathway vision serves one or more of these goals. This three-goal framework unifies discoveries about ventral shape processing into a neural explanation for our remarkable experience of shape as a vivid, richly detailed aspect of the natural world.
Affiliation(s)
- Kristina J Nielsen: Krieger Mind/Brain Institute and Department of Neuroscience, Johns Hopkins University, Baltimore, Maryland, USA
- Charles E Connor: Krieger Mind/Brain Institute and Department of Neuroscience, Johns Hopkins University, Baltimore, Maryland, USA

27
Kar K, DiCarlo JJ. The Quest for an Integrated Set of Neural Mechanisms Underlying Object Recognition in Primates. Annu Rev Vis Sci 2024; 10:91-121. [PMID: 38950431 DOI: 10.1146/annurev-vision-112823-030616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/03/2024]
Abstract
Inferences made about objects via vision, such as rapid and accurate categorization, are core to primate cognition despite the algorithmic challenge posed by varying viewpoints and scenes. Until recently, the brain mechanisms that support these capabilities were deeply mysterious. However, over the past decade, this scientific mystery has been illuminated by the discovery and development of brain-inspired, image-computable, artificial neural network (ANN) systems that rival primates in these behavioral feats. Apart from fundamentally changing the landscape of artificial intelligence, modified versions of these ANN systems are the current leading scientific hypotheses of an integrated set of mechanisms in the primate ventral visual stream that support core object recognition. What separates brain-mapped versions of these systems from prior conceptual models is that they are sensory computable, mechanistic, anatomically referenced, and testable (SMART). In this article, we review and provide perspective on the brain mechanisms addressed by the current leading SMART models. We review their empirical brain and behavioral alignment successes and failures, discuss the next frontiers for an even more accurate mechanistic understanding, and outline the likely applications.
Affiliation(s)
- Kohitij Kar: Department of Biology, Centre for Vision Research, and Centre for Integrative and Applied Neuroscience, York University, Toronto, Ontario, Canada
- James J DiCarlo: Department of Brain and Cognitive Sciences, MIT Quest for Intelligence, and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

28
Wang EY, Fahey PG, Ding Z, Papadopoulos S, Ponder K, Weis MA, Chang A, Muhammad T, Patel S, Ding Z, Tran D, Fu J, Schneider-Mizell CM, Reid RC, Collman F, da Costa NM, Franke K, Ecker AS, Reimer J, Pitkow X, Sinz FH, Tolias AS. Foundation model of neural activity predicts response to new stimulus types and anatomy. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.03.21.533548. [PMID: 36993435 PMCID: PMC10055288 DOI: 10.1101/2023.03.21.533548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
The complexity of neural circuits makes it challenging to decipher the brain's algorithms of intelligence. Recent breakthroughs in deep learning have produced models that accurately simulate brain activity, enhancing our understanding of the brain's computational objectives and neural coding. However, these models struggle to generalize beyond their training distribution, limiting their utility. The emergence of foundation models, trained on vast datasets, has introduced a new AI paradigm with remarkable generalization capabilities. We collected large amounts of neural activity from visual cortices of multiple mice and trained a foundation model to accurately predict neuronal responses to arbitrary natural videos. This model generalized to new mice with minimal training and successfully predicted responses across various new stimulus domains, such as coherent motion and noise patterns. It could also be adapted to new tasks beyond neural prediction, accurately predicting anatomical cell types, dendritic features, and neuronal connectivity within the MICrONS functional connectomics dataset. Our work is a crucial step toward building foundation brain models. As neuroscience accumulates larger, multi-modal datasets, foundation models will uncover statistical regularities, enabling rapid adaptation to new tasks and accelerating research.
Affiliation(s)
- Eric Y Wang: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, USA; Department of Neuroscience, Baylor College of Medicine, Houston, USA
- Paul G Fahey: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, USA; Department of Neuroscience, Baylor College of Medicine, Houston, USA; Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA; Stanford Bio-X, Stanford University, Stanford, CA, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Zhuokun Ding: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, USA; Department of Neuroscience, Baylor College of Medicine, Houston, USA; Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA; Stanford Bio-X, Stanford University, Stanford, CA, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Stelios Papadopoulos: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, USA; Department of Neuroscience, Baylor College of Medicine, Houston, USA
- Kayla Ponder: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, USA; Department of Neuroscience, Baylor College of Medicine, Houston, USA
- Marissa A Weis: Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
- Andersen Chang: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, USA; Department of Neuroscience, Baylor College of Medicine, Houston, USA
- Taliah Muhammad: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, USA; Department of Neuroscience, Baylor College of Medicine, Houston, USA
- Saumil Patel: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, USA; Department of Neuroscience, Baylor College of Medicine, Houston, USA; Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA; Stanford Bio-X, Stanford University, Stanford, CA, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Zhiwei Ding: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, USA; Department of Neuroscience, Baylor College of Medicine, Houston, USA
- Dat Tran: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, USA; Department of Neuroscience, Baylor College of Medicine, Houston, USA
- Jiakun Fu: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, USA; Department of Neuroscience, Baylor College of Medicine, Houston, USA
- R Clay Reid: Allen Institute for Brain Science, Seattle, WA, USA
- Katrin Franke: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, USA; Department of Neuroscience, Baylor College of Medicine, Houston, USA; Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA; Stanford Bio-X, Stanford University, Stanford, CA, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Alexander S Ecker: Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany; Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
- Jacob Reimer: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, USA; Department of Neuroscience, Baylor College of Medicine, Houston, USA
- Xaq Pitkow: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, USA; Department of Neuroscience, Baylor College of Medicine, Houston, USA; Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Fabian H Sinz: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, USA; Department of Neuroscience, Baylor College of Medicine, Houston, USA; Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany; Institute for Bioinformatics and Medical Informatics, University of Tübingen, Germany
- Andreas S Tolias: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, USA; Department of Neuroscience, Baylor College of Medicine, Houston, USA; Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA; Stanford Bio-X, Stanford University, Stanford, CA, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA; Department of Electrical Engineering, Stanford University, Stanford, CA, USA

29
Fu J, Pierzchlewicz PA, Willeke KF, Bashiri M, Muhammad T, Diamantaki M, Froudarakis E, Restivo K, Ponder K, Denfield GH, Sinz F, Tolias AS, Franke K. Heterogeneous orientation tuning in the primary visual cortex of mice diverges from Gabor-like receptive fields in primates. Cell Rep 2024; 43:114639. [PMID: 39167488 PMCID: PMC11463840 DOI: 10.1016/j.celrep.2024.114639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2022] [Revised: 06/19/2024] [Accepted: 07/31/2024] [Indexed: 08/23/2024] Open
Abstract
A key feature of neurons in the primary visual cortex (V1) of primates is their orientation selectivity. Recent studies using deep neural network models showed that the most exciting inputs (MEIs) for mouse V1 neurons exhibit complex spatial structures that predict non-uniform orientation selectivity across the receptive field (RF), in contrast to the classical Gabor filter model. Using local patches of drifting gratings, we identified heterogeneous orientation tuning in mouse V1 that varied up to 90° across sub-regions of the RF. This heterogeneity correlated with deviations from optimal Gabor filters and was consistent across cortical layers and recording modalities (calcium vs. spikes). In contrast, model-synthesized MEIs for macaque V1 neurons were predominantly Gabor like, consistent with previous studies. These findings suggest that complex spatial feature selectivity emerges earlier in the visual pathway in mice than in primates. This may provide a faster, though less general, method of extracting task-relevant information.
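The local-grating probe is easy to schematize: present small grating patches at different receptive-field locations and ask whether the preferred orientation shifts across sub-regions. In the sketch below, `predict_response` is a placeholder for a digital-twin model of one neuron, and the patch geometry and orientation grid are assumptions.

```python
# Sketch of the local-grating probe: small windowed gratings at different RF
# sub-regions, with the preferred orientation read out per location.
import numpy as np

def grating_patch(size=48, center=(24, 24), radius=8, theta=0.0, freq=0.15):
    yy, xx = np.mgrid[:size, :size]
    carrier = np.cos(2 * np.pi * freq * (xx * np.cos(theta) + yy * np.sin(theta)))
    window = ((xx - center[1]) ** 2 + (yy - center[0]) ** 2 < radius ** 2)
    return carrier * window

rng = np.random.default_rng(0)
w = rng.standard_normal((48, 48))  # stand-in linear "neuron" (a model would replace this)
predict_response = lambda img: float(np.maximum(0, (w * img).sum()))

thetas = np.linspace(0, np.pi, 12, endpoint=False)
for center in [(16, 16), (16, 32), (32, 16), (32, 32)]:   # RF sub-regions
    tuning = [predict_response(grating_patch(center=center, theta=th)) for th in thetas]
    pref = np.degrees(thetas[int(np.argmax(tuning))])
    print(f"sub-region {center}: preferred orientation ~ {pref:.0f} deg")
```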
Affiliation(s)
- Jiakun Fu: Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA; Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX 77030, USA
- Paweł A Pierzchlewicz: Institute for Bioinformatics and Medical Informatics, Tübingen University, Tübingen, Germany; Georg-August University Göttingen, Göttingen, Germany
- Konstantin F Willeke: Institute for Bioinformatics and Medical Informatics, Tübingen University, Tübingen, Germany; Georg-August University Göttingen, Göttingen, Germany
- Mohammad Bashiri: Institute for Bioinformatics and Medical Informatics, Tübingen University, Tübingen, Germany; Georg-August University Göttingen, Göttingen, Germany
- Taliah Muhammad: Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA; Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX 77030, USA
- Maria Diamantaki: Institute of Molecular Biology & Biotechnology, Foundation of Research & Technology - Hellas, Heraklion, Crete, Greece; School of Medicine, University of Crete, Heraklion, Crete, Greece
- Emmanouil Froudarakis: Institute of Molecular Biology & Biotechnology, Foundation of Research & Technology - Hellas, Heraklion, Crete, Greece; School of Medicine, University of Crete, Heraklion, Crete, Greece
- Kelli Restivo: Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA; Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX 77030, USA
- Kayla Ponder: Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA; Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX 77030, USA
- George H Denfield: Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA; Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX 77030, USA
- Fabian Sinz: Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA; Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX 77030, USA; Institute for Bioinformatics and Medical Informatics, Tübingen University, Tübingen, Germany; Georg-August University Göttingen, Göttingen, Germany
- Andreas S Tolias: Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA; Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX 77030, USA; Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA 94303, USA; Stanford Bio-X, Stanford University, Stanford, CA 94305, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA 94305, USA; Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
- Katrin Franke: Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA; Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX 77030, USA; Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA 94303, USA; Stanford Bio-X, Stanford University, Stanford, CA 94305, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA 94305, USA

30
Tuckute G, Kanwisher N, Fedorenko E. Language in Brains, Minds, and Machines. Annu Rev Neurosci 2024; 47:277-301. [PMID: 38669478 DOI: 10.1146/annurev-neuro-120623-101142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/28/2024]
Abstract
It has long been argued that only humans could produce and understand language. But now, for the first time, artificial language models (LMs) achieve this feat. Here we survey the new purchase LMs are providing on the question of how language is implemented in the brain. We discuss why, a priori, LMs might be expected to share similarities with the human language system. We then summarize evidence that LMs represent linguistic information similarly enough to humans to enable relatively accurate brain encoding and decoding during language processing. Finally, we examine which LM properties-their architecture, task performance, or training-are critical for capturing human neural responses to language and review studies using LMs as in silico model organisms for testing hypotheses about language. These ongoing investigations bring us closer to understanding the representations and processes that underlie our ability to comprehend sentences and express thoughts in language.
Affiliation(s)
- Greta Tuckute: Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Nancy Kanwisher: Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Evelina Fedorenko: Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

31
Jang G, Kragel PA. Understanding human amygdala function with artificial neural networks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.29.605621. [PMID: 39131372 PMCID: PMC11312467 DOI: 10.1101/2024.07.29.605621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 08/13/2024]
Abstract
The amygdala is a cluster of subcortical nuclei that receives diverse sensory inputs and projects to the cortex, midbrain and other subcortical structures. Numerous accounts of amygdalar contributions to social and emotional behavior have been offered, yet an overarching description of amygdala function remains elusive. Here we adopt a computationally explicit framework that aims to develop a model of amygdala function based on the types of sensory inputs it receives, rather than individual constructs such as threat, arousal, or valence. Characterizing human fMRI signal acquired as participants viewed a full-length film, we developed encoding models that predict both patterns of amygdala activity and self-reported valence evoked by naturalistic images. We use deep image synthesis to generate artificial stimuli that distinctly engage encoding models of amygdala subregions that systematically differ from one another in terms of their low-level visual properties. These findings characterize how the amygdala compresses high-dimensional sensory inputs into low-dimensional representations relevant for behavior.
32
Wang T, Lee TS, Yao H, Hong J, Li Y, Jiang H, Andolina IM, Tang S. Large-scale calcium imaging reveals a systematic V4 map for encoding natural scenes. Nat Commun 2024; 15:6401. [PMID: 39080309 PMCID: PMC11289446 DOI: 10.1038/s41467-024-50821-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2023] [Accepted: 07/22/2024] [Indexed: 08/02/2024] Open
Abstract
Biological visual systems have evolved to process natural scenes. A full understanding of visual cortical functions requires a comprehensive characterization of how neuronal populations in each visual area encode natural scenes. Here, we utilized widefield calcium imaging to record V4 cortical response to tens of thousands of natural images in male macaques. Using this large dataset, we developed a deep-learning digital twin of V4 that allowed us to map the natural image preferences of the neural population at 100-µm scale. This detailed map revealed a diverse set of functional domains in V4, each encoding distinct natural image features. We validated these model predictions using additional widefield imaging and single-cell resolution two-photon imaging. Feature attribution analysis revealed that these domains lie along a continuum from preferring spatially localized shape features to preferring spatially dispersed surface features. These results provide insights into the organizing principles that govern natural scene encoding in V4.
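The feature-attribution step mentioned above can be sketched as a single backward pass: the gradient of one modeled site's predicted response with respect to the input image highlights the pixels driving the prediction. The `twin` network below is a placeholder, not the paper's digital twin.

```python
# Hedged sketch of gradient-based feature attribution on a response model:
# the input gradient shows which pixels drive one site's predicted response.
import torch

twin = torch.nn.Sequential(                     # placeholder response model
    torch.nn.Conv2d(3, 8, 7), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
    torch.nn.Linear(8, 100),                    # 100 modeled cortical sites
)

image = torch.rand(1, 3, 96, 96, requires_grad=True)
response = twin(image)[0, 42]                   # predicted response of one site
response.backward()

attribution = image.grad[0].abs().sum(dim=0)    # (H, W) saliency map
print(attribution.shape)  # would highlight localized shape vs. dispersed surface features
```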
Affiliation(s)
- Tianye Wang: Peking University School of Life Sciences, Beijing, 100871, China; Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China; IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China; Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
- Tai Sing Lee: Computer Science Department and Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Haoxuan Yao: Peking University School of Life Sciences, Beijing, 100871, China; Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China; IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China; Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
- Jiayi Hong: Peking University School of Life Sciences, Beijing, 100871, China
- Yang Li: Peking University School of Life Sciences, Beijing, 100871, China; Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China; IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China; Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
- Hongfei Jiang: Peking University School of Life Sciences, Beijing, 100871, China; Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China; IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China; Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
- Ian Max Andolina: The Center for Excellence in Brain Science and Intelligence Technology, State Key Laboratory of Neuroscience, Key Laboratory of Primate Neurobiology, Institute of Neuroscience, Chinese Academy of Sciences, Shanghai, 200031, China
- Shiming Tang: Peking University School of Life Sciences, Beijing, 100871, China; Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China; IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China; Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China

33
Wu N, Valera I, Sinz F, Ecker A, Euler T, Qiu Y. Probabilistic neural transfer function estimation with Bayesian system identification. PLoS Comput Biol 2024; 20:e1012354. [PMID: 39083559 PMCID: PMC11318871 DOI: 10.1371/journal.pcbi.1012354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2023] [Revised: 08/12/2024] [Accepted: 07/22/2024] [Indexed: 08/02/2024] Open
Abstract
Neural population responses in sensory systems are driven by external physical stimuli. This stimulus-response relationship is typically characterized by receptive fields, which have been estimated by neural system identification approaches. Such models usually require a large amount of training data, yet, the recording time for animal experiments is limited, giving rise to epistemic uncertainty for the learned neural transfer functions. While deep neural network models have demonstrated excellent power on neural prediction, they usually do not provide the uncertainty of the resulting neural representations and derived statistics, such as most exciting inputs (MEIs), from in silico experiments. Here, we present a Bayesian system identification approach to predict neural responses to visual stimuli, and explore whether explicitly modeling network weight variability can be beneficial for identifying neural response properties. To this end, we use variational inference to estimate the posterior distribution of each model weight given the training data. Tests with different neural datasets demonstrate that this method can achieve higher or comparable performance on neural prediction, with a much higher data efficiency compared to Monte Carlo dropout methods and traditional models using point estimates of the model parameters. At the same time, our variational method provides us with an effectively infinite ensemble, avoiding the idiosyncrasy of any single model, to generate MEIs. This allows us to estimate the uncertainty of the stimulus-response function, which we have found to be negatively correlated with the predictive performance at the model level and may serve to evaluate models. Furthermore, our approach enables us to identify response properties with credible intervals and to determine whether the inferred features are meaningful by performing statistical tests on MEIs. Finally, in silico experiments show that our model generates stimuli driving neuronal activity significantly better than traditional models in the limited-data regime.
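As a simplified, conjugate stand-in for the variational approach described here, the sketch below fits a Bayesian linear encoding model whose posterior yields predictive uncertainty rather than point estimates; the paper's method places variational posteriors over deep-network weights, which this linear analogue does not capture.

```python
# A minimal Bayesian system-identification stand-in: a conjugate Bayesian
# linear encoding model whose posterior gives predictive mean and std.
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
X = rng.standard_normal((150, 40))    # (stimuli, stimulus features): small-data regime
w = rng.standard_normal(40)
y = X @ w + rng.standard_normal(150)  # one neuron's responses with noise

model = BayesianRidge().fit(X, y)
X_new = rng.standard_normal((5, 40))
mean, std = model.predict(X_new, return_std=True)  # posterior predictive mean and std
for m, s in zip(mean, std):
    print(f"predicted response {m:+.2f} +/- {s:.2f}")  # credible-interval-style readout
```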
Affiliation(s)
- Nan Wu: Department of Computer Science, Saarland University, Saarbrücken, Germany; Institute for Ophthalmic Research and Centre for Integrative Neuroscience (CIN), Tübingen University, Tübingen, Germany
- Isabel Valera: Department of Computer Science, Saarland University, Saarbrücken, Germany
- Fabian Sinz: Department of Computer Science and Campus Institute Data Science (CIDAS), Göttingen University, Göttingen, Germany
- Alexander Ecker: Department of Computer Science and Campus Institute Data Science (CIDAS), Göttingen University, Göttingen, Germany; Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
- Thomas Euler: Institute for Ophthalmic Research and Centre for Integrative Neuroscience (CIN), Tübingen University, Tübingen, Germany
- Yongrong Qiu: Institute for Ophthalmic Research and Centre for Integrative Neuroscience (CIN), Tübingen University, Tübingen, Germany; Department of Computer Science and Campus Institute Data Science (CIDAS), Göttingen University, Göttingen, Germany; Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, California, United States of America; Stanford Bio-X, Stanford University, Stanford, California, United States of America; Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America

34
Zhang J, Zhou H, Wang S. Distinct visual processing networks for foveal and peripheral visual fields. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.24.600415. [PMID: 38979165 PMCID: PMC11230199 DOI: 10.1101/2024.06.24.600415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/10/2024]
Abstract
Foveal and peripheral vision are two distinct modes of visual processing essential for navigating the world. However, it remains unclear if they engage different neural mechanisms and circuits within the visual attentional system. Here, we trained macaques to perform a free-gaze visual search task using natural face and object stimuli and recorded 14,588 visually responsive neurons from a broadly distributed network of brain regions involved in visual attentional processing. Foveal and peripheral units had substantially different proportions across brain regions and exhibited systematic differences in encoding visual information and visual attention. The spike-LFP coherence of foveal units was more extensively modulated by both attention and visual selectivity, thus indicating differential engagement of the attention and visual coding network compared to peripheral units. Furthermore, we delineated the interaction and coordination between foveal and peripheral processing for spatial attention and saccade selection. Finally, the search became more efficient with increasing target-induced desynchronization, and foveal and peripheral units exhibited different correlations between neural responses and search behavior. Together, the systematic differences between foveal and peripheral processing provide valuable insights into how the brain processes and integrates visual information from different regions of the visual field.

Significance Statement: This study investigates the systematic differences between foveal and peripheral vision, two crucial components of visual processing essential for navigating our surroundings. By simultaneously recording from a large number of neurons in the visual attentional neural network, we revealed substantial variations in the proportion and functional characteristics of foveal and peripheral units across different brain regions. We uncovered differential modulation of functional connectivity by attention and visual selectivity, elucidated the intricate interplay between foveal and peripheral processing in spatial attention and saccade selection, and linked neural responses to search behavior. Overall, our study contributes to a deeper understanding of how the brain processes and integrates visual information for active visual behaviors.
35
Zhang J, Cao R, Zhu X, Zhou H, Wang S. Distinct attentional profile and functional connectivity of neurons with visual feature coding in the primate brain. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.24.600401. [PMID: 38979388 PMCID: PMC11230157 DOI: 10.1101/2024.06.24.600401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/10/2024]
Abstract
Visual attention and object recognition are two critical cognitive functions that significantly influence our perception of the world. While these neural processes converge on the temporal cortex, the exact nature of their interactions remains largely unclear. Here, we systematically investigated the interplay between visual attention and object feature coding by training macaques to perform a free-gaze visual search task using natural face and object stimuli. With a large number of units recorded from multiple brain areas, we discovered that units exhibiting visual feature coding displayed a distinct attentional response profile and functional connectivity compared to units not exhibiting feature coding. Attention directed towards search targets enhanced the pattern separation of stimuli across brain areas, and this enhancement was more pronounced for units encoding visual features. Our findings suggest two stages of neural processing, with the early stage primarily focused on processing visual features and the late stage dedicated to processing attention. Importantly, feature coding in the early stage could predict the attentional effect in the late stage. Together, our results suggest an intricate interplay between visual feature and attention coding in the primate brain, which can be attributed to the differential functional connectivity and neural networks engaged in these processes.
36
Li Y, Yang H, Gu S. Enhancing neural encoding models for naturalistic perception with a multi-level integration of deep neural networks and cortical networks. Sci Bull (Beijing) 2024; 69:1738-1747. [PMID: 38490889 DOI: 10.1016/j.scib.2024.02.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2023] [Revised: 06/27/2023] [Accepted: 02/23/2024] [Indexed: 03/17/2024]
Abstract
Cognitive neuroscience aims to develop computational models that can accurately predict and explain neural responses to sensory inputs in the cortex. Recent studies attempt to leverage the representation power of deep neural networks (DNNs) to predict the brain response and suggest a correspondence between artificial and biological neural networks in their feature representations. However, typical voxel-wise encoding models tend to rely on specific networks designed for computer vision tasks, leading to suboptimal brain-wide correspondence during cognitive tasks. To address this challenge, this work proposes a novel approach that upgrades voxel-wise encoding models through multi-level integration of features from DNNs and information from brain networks. Our approach combines DNN feature-level ensemble learning and brain atlas-level model integration, resulting in significant improvements in predicting whole-brain neural activity during naturalistic video perception. Furthermore, this multi-level integration framework enables a deeper understanding of the brain's neural representation mechanism, accurately predicting the neural response to complex visual concepts. We demonstrate that neural encoding models can be optimized by leveraging a framework that integrates both data-driven approaches and theoretical insights into the functional structure of the cortical networks.
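The feature-level ensemble component can be sketched directly: fit one ridge encoding model per DNN layer and combine their predictions per voxel. Everything below, layer features, voxel responses, and the averaging rule, is an illustrative placeholder for the paper's pipeline; the atlas-level integration step is omitted.

```python
# Hedged sketch of feature-level ensemble encoding: one ridge model per DNN
# layer, predictions averaged per voxel, scored by held-out correlation.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
layers = [rng.standard_normal((300, d)) for d in (64, 256, 512)]  # per-layer features
voxels = layers[1][:, :50] * 0.5 + rng.standard_normal((300, 50)) # fMRI responses

idx_tr, idx_te = train_test_split(np.arange(300), test_size=0.25, random_state=0)
preds = []
for X in layers:
    m = Ridge(alpha=10.0).fit(X[idx_tr], voxels[idx_tr])
    preds.append(m.predict(X[idx_te]))

ensemble = np.mean(preds, axis=0)                 # feature-level ensemble
per_voxel_r = [np.corrcoef(ensemble[:, v], voxels[idx_te][:, v])[0, 1]
               for v in range(voxels.shape[1])]
print(f"median voxel correlation: {np.median(per_voxel_r):.3f}")
```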
Collapse
Affiliation(s)
- Yuanning Li: School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai 201210, China.
- Huzheng Yang: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Shi Gu: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen 518110, China.
37
Miao HY, Tong F. Convolutional neural network models applied to neuronal responses in macaque V1 reveal limited nonlinear processing. J Vis 2024; 24:1. [PMID: 38829629 PMCID: PMC11156204 DOI: 10.1167/jov.24.6.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Accepted: 04/03/2024] [Indexed: 06/05/2024] Open
Abstract
Computational models of the primary visual cortex (V1) have suggested that V1 neurons behave like Gabor filters followed by simple nonlinearities. However, recent work employing convolutional neural network (CNN) models has suggested that V1 relies on far more nonlinear computations than previously thought. Specifically, unit responses in an intermediate layer of VGG-19 were found to best predict macaque V1 responses to thousands of natural and synthetic images. Here, we evaluated the hypothesis that the poor performance of lower layer units in VGG-19 might be attributable to their small receptive field size rather than to their lack of complexity per se. We compared VGG-19 with AlexNet, which has much larger receptive fields in its lower layers. Whereas the best-performing layer of VGG-19 occurred after seven nonlinear steps, the first convolutional layer of AlexNet best predicted V1 responses. Although the predictive accuracy of VGG-19 was somewhat better than that of standard AlexNet, we found that a modified version of AlexNet could match the performance of VGG-19 after only a few nonlinear computations. Control analyses revealed that decreasing the size of the input images caused the best-performing layer of VGG-19 to shift to a lower layer, consistent with the hypothesis that the relationship between image size and receptive field size can strongly affect model performance. We conducted additional analyses using a Gabor pyramid model to test for nonlinear contributions of normalization and contrast saturation. Overall, our findings suggest that the feedforward responses of V1 neurons can be well explained by assuming only a few nonlinear processing stages.
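The layer-wise analysis style can be sketched as follows: extract activations from each convolutional stage of AlexNet, fit a linear readout to V1-like responses, and ask which depth predicts best. The untrained weights, fabricated responses, and feature subsampling below are assumptions made to keep the example self-contained; the study itself used pretrained networks and recorded macaque data.

```python
# Sketch of layer-wise neural predictivity: linear readouts fit on activations
# from each conv stage of AlexNet. Responses are synthetic and, by construction,
# a linear function of conv1 features, so conv1 should win here.
import numpy as np
import torch
from torchvision.models import alexnet
from sklearn.linear_model import Ridge

torch.manual_seed(0)
net = alexnet(weights=None).features.eval()  # untrained weights, illustration only
images = torch.randn(64, 3, 224, 224)

conv_ids = [0, 3, 6, 8, 10]  # indices of the five conv layers in .features
acts, x = {}, images
with torch.no_grad():
    for i, layer in enumerate(net):
        x = layer(x)
        if i in conv_ids:
            acts[f"conv{conv_ids.index(i) + 1}"] = x.flatten(1).numpy()

rng = np.random.default_rng(0)
sub = {k: v[:, rng.choice(v.shape[1], 500, replace=False)] for k, v in acts.items()}
responses = sub["conv1"] @ rng.normal(size=(500, 20))  # stand-in "V1 responses"

for name, X in sub.items():
    r2 = Ridge(alpha=1.0).fit(X[:48], responses[:48]).score(X[48:], responses[48:])
    print(f"{name}: held-out R^2 = {r2:.2f}")
```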
Affiliation(s)
- Hui-Yuan Miao: Department of Psychology, Vanderbilt University, Nashville, TN, USA.
- Frank Tong: Department of Psychology, Vanderbilt University, Nashville, TN, USA; Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, USA.
38
Djambazovska S, Zafer A, Ramezanpour H, Kreiman G, Kar K. The Impact of Scene Context on Visual Object Recognition: Comparing Humans, Monkeys, and Computational Models. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.27.596127. [PMID: 38854011 PMCID: PMC11160639 DOI: 10.1101/2024.05.27.596127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2024]
Abstract
During natural vision, we rarely see objects in isolation but rather embedded in rich and complex contexts. Understanding how the brain recognizes objects in natural scenes by integrating contextual information remains a key challenge. To elucidate neural mechanisms compatible with human visual processing, we need an animal model that behaves similarly to humans, so that inferred neural mechanisms can provide hypotheses relevant to the human brain. Here we assessed whether rhesus macaques could model human context-driven object recognition by quantifying visual object identification abilities across variations in the amount, quality, and congruency of contextual cues. Behavioral metrics revealed strikingly similar context-dependent patterns between humans and monkeys. However, neural responses in the inferior temporal (IT) cortex of monkeys that were never explicitly trained to discriminate objects in context, as well as current artificial neural network models, could only partially explain this cross-species correspondence. The shared behavioral variance unexplained by context-naive neural data or computational models highlights fundamental knowledge gaps. Our findings demonstrate an intriguing alignment of human and monkey visual object processing that defies full explanation by either brain activity in a key visual region or state-of-the-art models.
Affiliation(s)
- Sara Djambazovska: York University, Department of Biology and Centre for Vision Research, Toronto, Canada; Children’s Hospital, Harvard Medical School, MA, USA.
- Anaa Zafer: York University, Department of Biology and Centre for Vision Research, Toronto, Canada.
- Hamidreza Ramezanpour: York University, Department of Biology and Centre for Vision Research, Toronto, Canada.
- Kohitij Kar: York University, Department of Biology and Centre for Vision Research, Toronto, Canada.
39
Caplette L, Turk-Browne NB. Computational reconstruction of mental representations using human behavior. Nat Commun 2024; 15:4183. [PMID: 38760341 PMCID: PMC11101448 DOI: 10.1038/s41467-024-48114-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2023] [Accepted: 04/19/2024] [Indexed: 05/19/2024] Open
Abstract
Revealing how the mind represents information is a longstanding goal of cognitive science. However, there is currently no framework for reconstructing the broad range of mental representations that humans possess. Here, we ask participants to indicate what they perceive in images made of random visual features in a deep neural network. We then infer associations between the semantic features of their responses and the visual features of the images. This allows us to reconstruct the mental representations of multiple visual concepts, both those supplied by participants and other concepts extrapolated from the same semantic space. We validate these reconstructions in separate participants and further generalize our approach to predict behavior for new stimuli and in a new task. Finally, we reconstruct the mental representations of individual observers and of a neural network. This framework enables a large-scale investigation of conceptual representations.
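The underlying reverse-correlation logic can be sketched in a few lines: present images generated from random feature vectors, then estimate a concept's representation as the response-weighted average of those vectors. The simulated observer and dimensions below are illustrative assumptions, not the paper's pipeline.

```python
# Sketch of reverse correlation in a feature space: a concept template is
# recovered as the behavior-weighted average of random feature vectors.
# The "observer" is simulated; real data would use participant responses.
import numpy as np

rng = np.random.default_rng(2)
dim, n_images = 128, 5000
true_template = rng.normal(size=dim)         # the concept's hidden representation
features = rng.normal(size=(n_images, dim))  # random DNN feature vectors

# Simulated behavior: response strength grows with similarity to the template.
ratings = features @ true_template + rng.normal(scale=5.0, size=n_images)

# A weighted average of features recovers the template up to noise and scale.
estimate = ratings @ features / n_images
corr = np.corrcoef(estimate, true_template)[0, 1]
print(f"correlation between estimate and true template: {corr:.2f}")
```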
Affiliation(s)
- Nicholas B Turk-Browne: Department of Psychology, Yale University, New Haven, CT, USA; Wu Tsai Institute, Yale University, New Haven, CT, USA.
40
Fu J, Shrinivasan S, Baroni L, Ding Z, Fahey PG, Pierzchlewicz P, Ponder K, Froebe R, Ntanavara L, Muhammad T, Willeke KF, Wang E, Ding Z, Tran DT, Papadopoulos S, Patel S, Reimer J, Ecker AS, Pitkow X, Antolik J, Sinz FH, Haefner RM, Tolias AS, Franke K. Pattern completion and disruption characterize contextual modulation in the visual cortex. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.03.13.532473. [PMID: 36993321 PMCID: PMC10054952 DOI: 10.1101/2023.03.13.532473] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/20/2023]
Abstract
Vision is fundamentally context-dependent, with neuronal responses influenced not just by local features but also by surrounding contextual information. In the visual cortex, studies using simple grating stimuli indicate that congruent stimuli - where the center and surround share the same orientation - are more inhibitory than when orientations are orthogonal, potentially serving redundancy reduction and predictive coding. Understanding these center-surround interactions in relation to natural image statistics is challenging due to the high dimensionality of the stimulus space, yet crucial for deciphering the neuronal code of real-world sensory processing. Utilizing large-scale recordings from mouse V1, we trained convolutional neural networks (CNNs) to predict and synthesize surround patterns that either optimally suppressed or enhanced responses to center stimuli, confirmed by in vivo experiments. Contrary to the notion that congruent stimuli are suppressive, we found that surrounds that completed patterns based on natural image statistics were facilitatory, while disruptive surrounds were suppressive. Applying our CNN image synthesis method in macaque V1, we discovered that pattern completion within the near surround occurred more frequently with excitatory than with inhibitory surrounds, suggesting that our results in mice are conserved in macaques. Further, experiments and model analyses confirmed previous studies reporting the opposite effect with grating stimuli in both species. Using the MICrONS functional connectomics dataset, we observed that neurons with similar feature selectivity formed excitatory connections regardless of their receptive field overlap, aligning with the pattern completion phenomenon observed for excitatory surrounds. Finally, our empirical results emerged in a normative model of perception implementing Bayesian inference, where neuronal responses are modulated by prior knowledge of natural scene statistics. In summary, our findings identify a novel relationship between contextual information and natural scene statistics and provide evidence for a role of contextual modulation in hierarchical inference.
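A sketch of the surround-synthesis idea, under simplifying assumptions (an untrained stand-in for the response-predicting CNN and a square center mask): hold the center fixed and optimize only the surround pixels to maximize or minimize the model's predicted response.

```python
# Sketch of CNN-based surround synthesis: freeze the image center, optimize
# only surround pixels to maximize (excitatory) or minimize (suppressive) a
# model neuron's predicted response. The untrained convnet stands in for a
# response-predicting model fit to real recordings.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Conv2d(1, 8, 9), nn.ReLU(),
                      nn.Conv2d(8, 1, 9), nn.AdaptiveAvgPool2d(1))
center_mask = torch.zeros(1, 1, 64, 64)
center_mask[..., 24:40, 24:40] = 1.0            # the fixed center stimulus region
center = torch.randn(1, 1, 64, 64) * center_mask

for sign, label in [(+1, "excitatory"), (-1, "suppressive")]:
    surround = torch.zeros(1, 1, 64, 64, requires_grad=True)
    optimizer = torch.optim.Adam([surround], lr=0.05)
    for _ in range(200):
        image = center + surround * (1 - center_mask)  # surround lives outside
        loss = -sign * model(image).sum()              # ascend or descend
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        final = model(center + surround * (1 - center_mask)).item()
    print(f"{label} surround -> model response {final:.3f}")
```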
41
Cadena SA, Willeke KF, Restivo K, Denfield G, Sinz FH, Bethge M, Tolias AS, Ecker AS. Diverse task-driven modeling of macaque V4 reveals functional specialization towards semantic tasks. PLoS Comput Biol 2024; 20:e1012056. [PMID: 38781156 PMCID: PMC11115319 DOI: 10.1371/journal.pcbi.1012056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2022] [Accepted: 04/08/2024] [Indexed: 05/25/2024] Open
Abstract
Responses to natural stimuli in area V4, a mid-level area of the visual ventral stream, are well predicted by features from convolutional neural networks (CNNs) trained on image classification. This result has been taken as evidence for the functional role of V4 in object classification. However, we currently do not know if and to what extent V4 plays a role in solving other computational objectives. Here, we investigated normative accounts of V4 (and V1 for comparison) by predicting macaque single-neuron responses to natural images from the representations extracted by 23 CNNs trained on different computer vision tasks including semantic, geometric, 2D, and 3D types of tasks. We found that V4 was best predicted by semantic classification features and exhibited high task selectivity, while the choice of task was less consequential to V1 performance. Consistent with traditional characterizations of V4 function that show its high-dimensional tuning to various 2D and 3D stimulus directions, we found that diverse non-semantic tasks explained aspects of V4 function that are not captured by individual semantic tasks. Nevertheless, jointly considering the features of a pair of semantic classification tasks was sufficient to yield one of our top V4 models, solidifying V4's main functional role in semantic processing and suggesting that V4's selectivity to 2D or 3D stimulus properties found by electrophysiologists can result from semantic functional goals.
Affiliation(s)
- Santiago A. Cadena: Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany; Institute for Theoretical Physics and Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany; Bernstein Center for Computational Neuroscience, Tübingen, Germany; International Max Planck Research School for Intelligent Systems, Tübingen, Germany.
- Konstantin F. Willeke: Bernstein Center for Computational Neuroscience, Tübingen, Germany; International Max Planck Research School for Intelligent Systems, Tübingen, Germany; Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany.
- Kelli Restivo: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America; Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States of America.
- George Denfield: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America; Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States of America.
- Fabian H. Sinz: Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany; Bernstein Center for Computational Neuroscience, Tübingen, Germany; International Max Planck Research School for Intelligent Systems, Tübingen, Germany; Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany.
- Matthias Bethge: Institute for Theoretical Physics and Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany; Bernstein Center for Computational Neuroscience, Tübingen, Germany.
- Andreas S. Tolias: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America; Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States of America; Department of Electrical and Computer Engineering, Rice University, Houston, Texas, United States of America.
- Alexander S. Ecker: Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany; Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany.
42
Cowley BR, Calhoun AJ, Rangarajan N, Ireland E, Turner MH, Pillow JW, Murthy M. Mapping model units to visual neurons reveals population code for social behaviour. Nature 2024; 629:1100-1108. [PMID: 38778103 PMCID: PMC11136655 DOI: 10.1038/s41586-024-07451-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2022] [Accepted: 04/19/2024] [Indexed: 05/25/2024]
Abstract
The rich variety of behaviours observed in animals arises through the interplay between sensory processing and motor control. To understand these sensorimotor transformations, it is useful to build models that predict not only neural responses to sensory input1-5 but also how each neuron causally contributes to behaviour6,7. Here we demonstrate a novel modelling approach to identify a one-to-one mapping between internal units in a deep neural network and real neurons by predicting the behavioural changes that arise from systematic perturbations of more than a dozen neuronal cell types. A key ingredient that we introduce is 'knockout training', which involves perturbing the network during training to match the perturbations of the real neurons during behavioural experiments. We apply this approach to model the sensorimotor transformations of Drosophila melanogaster males during a complex, visually guided social behaviour8-11. The visual projection neurons at the interface between the optic lobe and central brain form a set of discrete channels12, and prior work indicates that each channel encodes a specific visual feature to drive a particular behaviour13,14. Our model reaches a different conclusion: combinations of visual projection neurons, including those involved in non-social behaviours, drive male interactions with the female, forming a rich population code for behaviour. Overall, our framework consolidates behavioural effects elicited from various neural perturbations into a single, unified model, providing a map from stimulus to neuronal cell type to behaviour, and enabling future incorporation of wiring diagrams of the brain15 into the model.
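A minimal sketch of the "knockout training" idea as described: on perturbed trials, silence the hidden unit mapped to a given cell type so the network must reproduce behaviour under both intact and perturbed conditions. The toy task, behavioural targets, and one-to-one unit mapping below are assumptions for illustration, not the paper's model.

```python
# Sketch of "knockout training": randomly zero the hidden unit that corresponds
# to a silenced cell type, so training matches intact and perturbed behaviour.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_units = 16                                  # one unit per modeled cell type
encoder, readout = nn.Linear(10, n_units), nn.Linear(n_units, 2)
optimizer = torch.optim.Adam([*encoder.parameters(), *readout.parameters()], lr=1e-2)

def forward(stim, knockout=None):
    h = torch.relu(encoder(stim))
    if knockout is not None:                  # mimic silencing that cell type
        mask = torch.ones(n_units)
        mask[knockout] = 0.0
        h = h * mask
    return readout(h)

true_map = torch.randn(10, 2)
for step in range(500):
    stim = torch.randn(64, 10)
    ko = (step // 2) % n_units if step % 2 == 0 else None
    # Hypothetical behavioural targets: the perturbation shifts behaviour.
    target = stim @ true_map + (0.5 if ko is not None else 0.0)
    loss = ((forward(stim, ko) - target) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(f"final training loss: {loss.item():.3f}")
```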
Affiliation(s)
- Benjamin R Cowley: Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
- Adam J Calhoun: Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA.
- Elise Ireland: Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA.
- Maxwell H Turner: Department of Neurobiology, Stanford University, Stanford, CA, USA.
- Jonathan W Pillow: Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA.
- Mala Murthy: Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA.
43
Dado T, Papale P, Lozano A, Le L, Wang F, van Gerven M, Roelfsema P, Güçlütürk Y, Güçlü U. Brain2GAN: Feature-disentangled neural encoding and decoding of visual perception in the primate brain. PLoS Comput Biol 2024; 20:e1012058. [PMID: 38709818 PMCID: PMC11098503 DOI: 10.1371/journal.pcbi.1012058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2023] [Revised: 05/16/2024] [Accepted: 04/08/2024] [Indexed: 05/08/2024] Open
Abstract
A challenging goal of neural coding is to characterize the neural representations underlying visual perception. To this end, multi-unit activity (MUA) of macaque visual cortex was recorded in a passive fixation task upon presentation of faces and natural images. We analyzed the relationship between MUA and latent representations of state-of-the-art deep generative models, including the conventional and feature-disentangled representations of generative adversarial networks (GANs) (i.e., z- and w-latents of StyleGAN, respectively) and language-contrastive representations of latent diffusion networks (i.e., CLIP-latents of Stable Diffusion). A mass univariate neural encoding analysis of the latent representations showed that feature-disentangled w representations outperform both z and CLIP representations in explaining neural responses. Further, w-latent features were found to be positioned at the higher end of the complexity gradient which indicates that they capture visual information relevant to high-level neural activity. Subsequently, a multivariate neural decoding analysis of the feature-disentangled representations resulted in state-of-the-art spatiotemporal reconstructions of visual perception. Taken together, our results not only highlight the important role of feature-disentanglement in shaping high-level neural representations underlying visual perception but also serve as an important benchmark for the future of neural coding.
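The mass univariate encoding comparison can be sketched generically: regress each recording site's response on two candidate latent spaces and compare held-out prediction accuracy per site. Below, synthetic "w" and entangled "z" latents stand in for a GAN's latent spaces; all shapes and the entanglement model are assumptions.

```python
# Sketch of encoding-model comparison between two candidate feature spaces.
# "w" is a disentangled latent; "z" is a nonlinearly entangled remix of it.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
n_stim, dim, n_sites = 600, 512, 64
w = rng.normal(size=(n_stim, dim))                           # disentangled latents
z = np.tanh(w @ rng.normal(size=(dim, dim)) / np.sqrt(dim))  # entangled mix
mua = w @ rng.normal(size=(dim, n_sites)) + rng.normal(scale=2.0, size=(n_stim, n_sites))

def mean_site_correlation(X, y, n_train=500):
    pred = Ridge(alpha=100.0).fit(X[:n_train], y[:n_train]).predict(X[n_train:])
    return np.mean([np.corrcoef(pred[:, i], y[n_train:, i])[0, 1]
                    for i in range(y.shape[1])])

print(f"w-latents: r = {mean_site_correlation(w, mua):.2f}")
print(f"z-latents: r = {mean_site_correlation(z, mua):.2f}")
```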
Affiliation(s)
- Thirza Dado: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands.
- Paolo Papale: Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands.
- Antonio Lozano: Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands.
- Lynn Le: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands.
- Feng Wang: Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands.
- Marcel van Gerven: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands.
- Pieter Roelfsema: Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands; Laboratory of Visual Brain Therapy, Sorbonne University, Paris, France; Department of Integrative Neurophysiology, VU Amsterdam, Amsterdam, Netherlands; Department of Psychiatry, Amsterdam UMC, Amsterdam, Netherlands.
- Yağmur Güçlütürk: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands.
- Umut Güçlü: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands.
44
Ren Y, Bashivan P. How well do models of visual cortex generalize to out of distribution samples? PLoS Comput Biol 2024; 20:e1011145. [PMID: 38820563 PMCID: PMC11216589 DOI: 10.1371/journal.pcbi.1011145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2023] [Revised: 07/01/2024] [Accepted: 04/29/2024] [Indexed: 06/02/2024] Open
Abstract
Unit activity in certain deep neural networks (DNNs) is remarkably similar to the neuronal population responses to static images along the primate ventral visual cortex. Linear combinations of DNN unit activities are widely used to build predictive models of neuronal activity in the visual cortex. Nevertheless, prediction performance in these models is typically evaluated on stimulus sets consisting of everyday objects in naturalistic settings. Recent work has revealed a generalization gap when such models are used to predict neuronal responses to synthetically generated out-of-distribution (OOD) stimuli. Here, we investigated how recent progress in improving DNNs' object-recognition generalization, as well as DNN design choices such as architecture, learning algorithm, and training dataset, has affected the generalization gap in neural predictivity. We came to a surprising conclusion: performance on none of the common computer vision OOD object recognition benchmarks is predictive of OOD neural predictivity performance. Furthermore, we found that adversarially robust models often yield substantially higher generalization in neural predictivity, although the degree of robustness itself was not predictive of the neural predictivity score. These results suggest that improving object recognition behavior on current benchmarks alone may not lead to more general models of neurons in the primate ventral visual cortex.
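A sketch of how such a generalization gap can be measured, assuming simulated neurons with a mild rectifying nonlinearity: fit a linear readout on in-distribution stimuli, then compare neural predictivity on held-out in-distribution versus distribution-shifted stimuli. All quantities below are synthetic stand-ins.

```python
# Sketch of the neural-predictivity generalization gap: a linear readout fit
# in-distribution is evaluated on shifted stimuli that expose the simulated
# neurons' rectifying nonlinearity.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
dim, n_neurons = 200, 40
readout = rng.normal(size=(dim, n_neurons))

def dataset(n, shift=0.0):
    X = rng.normal(loc=shift, size=(n, dim))
    y = np.maximum(X, 0) @ readout + rng.normal(size=(n, n_neurons))
    return X, y

X_train, y_train = dataset(1000)
model = Ridge(alpha=1.0).fit(X_train, y_train)

for name, shift in [("in-distribution", 0.0), ("OOD", -2.0)]:
    X_test, y_test = dataset(300, shift)
    pred = model.predict(X_test)
    r = np.mean([np.corrcoef(pred[:, i], y_test[:, i])[0, 1]
                 for i in range(n_neurons)])
    print(f"{name}: mean neural predictivity r = {r:.2f}")
```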
Affiliation(s)
- Yifei Ren: Department of Computer Science, McGill University, Montreal, Canada.
- Pouya Bashivan: Department of Computer Science, McGill University, Montreal, Canada; Department of Physiology, McGill University, Montreal, Canada; Mila, Université de Montréal, Montreal, Canada.
45
Melis JM, Siwanowicz I, Dickinson MH. Machine learning reveals the control mechanics of an insect wing hinge. Nature 2024; 628:795-803. [PMID: 38632396 DOI: 10.1038/s41586-024-07293-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2023] [Accepted: 03/11/2024] [Indexed: 04/19/2024]
Abstract
Insects constitute the most species-rich radiation of metazoa, a success that is due to the evolution of active flight. Unlike pterosaurs, birds and bats, the wings of insects did not evolve from legs1, but are novel structures that are attached to the body via a biomechanically complex hinge that transforms tiny, high-frequency oscillations of specialized power muscles into the sweeping back-and-forth motion of the wings2. The hinge consists of a system of tiny, hardened structures called sclerites that are interconnected to one another via flexible joints and regulated by the activity of specialized control muscles. Here we imaged the activity of these muscles in a fly using a genetically encoded calcium indicator, while simultaneously tracking the three-dimensional motion of the wings with high-speed cameras. Using machine learning, we created a convolutional neural network3 that accurately predicts wing motion from the activity of the steering muscles, and an encoder-decoder4 that predicts the role of the individual sclerites on wing motion. By replaying patterns of wing motion on a dynamically scaled robotic fly, we quantified the effects of steering muscle activity on aerodynamic forces. A physics-based simulation incorporating our hinge model generates flight manoeuvres that are remarkably similar to those of free-flying flies. This integrative, multi-disciplinary approach reveals the mechanical control logic of the insect wing hinge, arguably among the most sophisticated and evolutionarily important skeletal structures in the natural world.
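The muscle-to-wing-motion mapping could be sketched, very loosely, as a small 1D convolutional network regressing wing kinematic angles onto simultaneously recorded steering-muscle activity. The architecture, channel counts, and synthetic traces below are illustrative assumptions, not the paper's model.

```python
# Sketch: a tiny 1D convnet mapping muscle-activity time series to wing
# kinematic angles. All signals are random placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_muscles, n_angles, T = 12, 3, 256
muscle_activity = torch.randn(8, n_muscles, T)  # batch of calcium traces
wing_angles = torch.randn(8, n_angles, T)       # e.g., stroke/deviation/rotation

model = nn.Sequential(
    nn.Conv1d(n_muscles, 32, kernel_size=9, padding=4), nn.ReLU(),
    nn.Conv1d(32, n_angles, kernel_size=9, padding=4),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    loss = ((model(muscle_activity) - wing_angles) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(f"training loss: {loss.item():.3f}")
```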
Affiliation(s)
- Johan M Melis: Division of Biology and Bioengineering, California Institute of Technology, Pasadena, CA, USA.
- Igor Siwanowicz: Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA.
- Michael H Dickinson: Division of Biology and Bioengineering, California Institute of Technology, Pasadena, CA, USA.
46
Jain S, Vo VA, Wehbe L, Huth AG. Computational Language Modeling and the Promise of In Silico Experimentation. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2024; 5:80-106. [PMID: 38645624 PMCID: PMC11025654 DOI: 10.1162/nol_a_00101] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/28/2022] [Accepted: 01/18/2023] [Indexed: 04/23/2024]
Abstract
Language neuroscience currently relies on two major experimental paradigms: controlled experiments using carefully hand-designed stimuli, and natural stimulus experiments. These approaches have complementary advantages which allow them to address distinct aspects of the neurobiology of language, but each approach also comes with drawbacks. Here we discuss a third paradigm-in silico experimentation using deep learning-based encoding models-that has been enabled by recent advances in cognitive computational neuroscience. This paradigm promises to combine the interpretability of controlled experiments with the generalizability and broad scope of natural stimulus experiments. We show four examples of simulating language neuroscience experiments in silico and then discuss both the advantages and caveats of this approach.
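The paradigm can be sketched with a deliberately tiny stand-in encoding model: once a stimulus-to-response mapping has been learned, a controlled contrast is run entirely in simulation by comparing predicted responses to matched stimulus variants. The bag-of-words features and random weights below are assumptions; real studies use deep language-model embeddings fit to fMRI data.

```python
# Sketch of an in silico experiment: compare an encoding model's predicted
# responses to two controlled stimulus conditions, without new data collection.
import numpy as np

rng = np.random.default_rng(5)
vocab = ["dog", "cat", "ran", "sat", "quickly", "slowly"]
weights = rng.normal(size=(len(vocab), 10))  # toy word-to-voxel encoding model

def predict_response(sentence):
    counts = np.array([sentence.split().count(w) for w in vocab], dtype=float)
    return counts @ weights

# A controlled contrast, simulated instead of collected in the scanner:
cond_a = predict_response("the dog ran quickly")
cond_b = predict_response("the dog ran slowly")
print("voxelwise contrast:", np.round(cond_a - cond_b, 2))
```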
Affiliation(s)
- Shailee Jain: Department of Computer Science, University of Texas at Austin, Austin, TX, USA.
- Vy A. Vo: Brain-Inspired Computing Lab, Intel Labs, Hillsboro, OR, USA.
- Leila Wehbe: Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA.
- Alexander G. Huth: Department of Computer Science, University of Texas at Austin, Austin, TX, USA; Department of Neuroscience, University of Texas at Austin, Austin, TX, USA.
47
Deng K, Schwendeman PS, Guan Y. Predicting Single Neuron Responses of the Primary Visual Cortex with Deep Learning Model. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2305626. [PMID: 38350735 PMCID: PMC11022733 DOI: 10.1002/advs.202305626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/11/2023] [Revised: 01/03/2024] [Indexed: 02/15/2024]
Abstract
Modeling neuron responses to stimuli can shed light on next-generation technologies such as brain-chip interfaces. Furthermore, high-performing models can help formulate hypotheses and reveal the mechanisms underlying neural responses. Here, a state-of-the-art computational model is presented for predicting single-neuron responses to natural stimuli in the primary visual cortex (V1) of mice. The algorithm incorporates object positions and assembles multiple models trained on different train-validation partitions, resulting in a 15%-30% improvement over existing models in cross-subject predictions and ranking first in the SENSORIUM 2022 Challenge, which benchmarks methods for neuron-specific prediction based on thousands of images. Importantly, the model reveals evidence that the spatial organization of V1 is conserved across mice. This model will serve as an important noninvasive tool for understanding and utilizing the response patterns of primary visual cortex neurons.
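The model-assembly idea can be sketched generically: train several copies of a model on different train-validation partitions and average their held-out predictions per neuron. Ridge regressions on synthetic data stand in for the actual deep networks; the split scheme is an assumption.

```python
# Sketch of split-based ensembling: one model per train/validation partition,
# predictions averaged per neuron at test time.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 100))                       # stimulus features
y = X @ rng.normal(size=(100, 20)) + rng.normal(size=(500, 20))
X_test = rng.normal(size=(100, 100))

members = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    members.append(Ridge(alpha=1.0).fit(X[train_idx], y[train_idx]))

ensemble_pred = np.mean([m.predict(X_test) for m in members], axis=0)
print("ensemble prediction shape:", ensemble_pred.shape)
```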
Affiliation(s)
- Kaiwen Deng: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48105, USA.
- Yuanfang Guan: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48105, USA.
48
Marin Vargas A, Bisi A, Chiappa AS, Versteeg C, Miller LE, Mathis A. Task-driven neural network models predict neural dynamics of proprioception. Cell 2024; 187:1745-1761.e19. [PMID: 38518772 DOI: 10.1016/j.cell.2024.02.036] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2023] [Revised: 12/06/2023] [Accepted: 02/27/2024] [Indexed: 03/24/2024]
Abstract
Proprioception tells the brain the state of the body based on distributed sensory neurons. Yet, the principles that govern proprioceptive processing are poorly understood. Here, we employ a task-driven modeling approach to investigate the neural code of proprioceptive neurons in cuneate nucleus (CN) and somatosensory cortex area 2 (S1). We simulated muscle spindle signals through musculoskeletal modeling and generated a large-scale movement repertoire to train neural networks based on 16 hypotheses, each representing different computational goals. We found that the emerging, task-optimized internal representations generalize from synthetic data to predict neural dynamics in CN and S1 of primates. Computational tasks that aim to predict the limb position and velocity were the best at predicting the neural activity in both areas. Since task optimization develops representations that better predict neural activity during active than passive movements, we postulate that neural activity in the CN and S1 is top-down modulated during goal-directed movements.
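A compact sketch of the task-driven recipe, with synthetic signals throughout: (1) optimize a network on a candidate computational task, here predicting limb position from muscle-spindle-like inputs, (2) freeze it, and (3) linearly regress recorded neural activity on its hidden representations.

```python
# Sketch of task-driven modeling: task-optimize a network, freeze it, then ask
# how well its hidden features linearly predict (placeholder) neural recordings.
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import Ridge

torch.manual_seed(0)
spindles = torch.randn(1000, 30)              # simulated muscle-spindle inputs
limb_pos = spindles @ torch.randn(30, 2)      # candidate task: limb position

net = nn.Sequential(nn.Linear(30, 64), nn.Tanh(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(300):                          # step 1: task optimization
    loss = ((net(spindles) - limb_pos) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.no_grad():                         # step 2: frozen hidden features
    hidden = net[1](net[0](spindles)).numpy()
neural = (spindles @ torch.randn(30, 15)).numpy()  # placeholder recordings
score = Ridge().fit(hidden[:800], neural[:800]).score(hidden[800:], neural[800:])
print(f"held-out R^2 of task-driven features: {score:.2f}")  # step 3
```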
Affiliation(s)
- Alessandro Marin Vargas: Brain Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; NeuroX Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland.
- Axel Bisi: Brain Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; NeuroX Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland.
- Alberto S Chiappa: Brain Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; NeuroX Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland.
- Chris Versteeg: Department of Neuroscience, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA; Department of Physical Medicine and Rehabilitation, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA; Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL 60208, USA; Shirley Ryan AbilityLab, Chicago, IL 60611, USA.
- Lee E Miller: Department of Neuroscience, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA; Department of Physical Medicine and Rehabilitation, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA; Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL 60208, USA; Shirley Ryan AbilityLab, Chicago, IL 60611, USA.
- Alexander Mathis: Brain Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; NeuroX Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland.
49
Shen S, Sun Y, Lu J, Li C, Chen Q, Mo C, Fang F, Zhang X. Profiles of visual perceptual learning in feature space. iScience 2024; 27:109128. [PMID: 38384835 PMCID: PMC10879700 DOI: 10.1016/j.isci.2024.109128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2023] [Revised: 01/22/2024] [Accepted: 02/01/2024] [Indexed: 02/23/2024] Open
Abstract
Visual perceptual learning (VPL), the experience-induced gain in discriminating visual features, has been studied extensively and intensively for many years; its profile in feature space, however, remains unclear. Here, human subjects were trained to perform either a simple low-level feature (grating orientation) or a complex high-level object (face view) discrimination task over a long time course. During, immediately after, and one month after training, all results showed that in feature space VPL of grating orientation discrimination had a center-surround profile, whereas VPL of face view discrimination had a monotonic gradient profile. Importantly, these two profiles could be reproduced by deep convolutional neural networks, modified AlexNets consisting of 7 and 12 layers, respectively. Altogether, our study reveals for the first time a feature hierarchy-dependent profile of VPL in feature space, placing a necessary constraint on our understanding of the neural computation of VPL.
Affiliation(s)
- Shiqi Shen: Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, South China Normal University, Guangzhou, Guangdong 510631, China; School of Psychology, Center for Studies of Psychological Application, and Guangdong Provincial Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, Guangdong 510631, China.
- Yueling Sun: Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, South China Normal University, Guangzhou, Guangdong 510631, China; School of Psychology, Center for Studies of Psychological Application, and Guangdong Provincial Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, Guangdong 510631, China.
- Jiachen Lu: Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, South China Normal University, Guangzhou, Guangdong 510631, China; School of Psychology, Center for Studies of Psychological Application, and Guangdong Provincial Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, Guangdong 510631, China.
- Chu Li: Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, South China Normal University, Guangzhou, Guangdong 510631, China; School of Psychology, Center for Studies of Psychological Application, and Guangdong Provincial Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, Guangdong 510631, China.
- Qinglin Chen: Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, South China Normal University, Guangzhou, Guangdong 510631, China; School of Psychology, Center for Studies of Psychological Application, and Guangdong Provincial Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, Guangdong 510631, China.
- Ce Mo: Department of Psychology, Sun Yat-sen University, Guangzhou, Guangdong 510275, China.
- Fang Fang: School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing 100871, China; IDG/McGovern Institute for Brain Research, Peking University, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China.
- Xilin Zhang: Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, South China Normal University, Guangzhou, Guangdong 510631, China; School of Psychology, Center for Studies of Psychological Application, and Guangdong Provincial Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, Guangdong 510631, China.
50
Jang H, Tong F. Improved modeling of human vision by incorporating robustness to blur in convolutional neural networks. Nat Commun 2024; 15:1989. [PMID: 38443349 PMCID: PMC10915141 DOI: 10.1038/s41467-024-45679-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2023] [Accepted: 01/30/2024] [Indexed: 03/07/2024] Open
Abstract
Whenever a visual scene is cast onto the retina, much of it will appear degraded due to poor resolution in the periphery; moreover, optical defocus can cause blur in central vision. However, the pervasiveness of blurry or degraded input is typically overlooked in the training of convolutional neural networks (CNNs). We hypothesized that the absence of blurry training inputs may cause CNNs to rely excessively on high spatial frequency information for object recognition, thereby causing systematic deviations from biological vision. We evaluated this hypothesis by comparing standard CNNs with CNNs trained on a combination of clear and blurry images. We show that blur-trained CNNs outperform standard CNNs at predicting neural responses to objects across a variety of viewing conditions. Moreover, blur-trained CNNs acquire increased sensitivity to shape information and greater robustness to multiple forms of visual noise, leading to improved correspondence with human perception. Our results provide multi-faceted neurocomputational evidence that blurry visual experiences may be critical for conferring robustness to biological visual systems.
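The training manipulation itself is simple to sketch: with some probability, Gaussian-blur each training image so the network cannot rely exclusively on high spatial frequencies. The application probability and sigma range below are illustrative assumptions, not the paper's exact settings.

```python
# Sketch of blur augmentation: randomly Gaussian-blur training images so the
# network is exposed to a mix of clear and degraded inputs.
import torch
from torchvision import transforms

blur_augment = transforms.Compose([
    transforms.RandomApply(
        [transforms.GaussianBlur(kernel_size=15, sigma=(0.5, 4.0))], p=0.5),
    # ...followed by the usual normalization for the chosen backbone.
])

image = torch.rand(3, 224, 224)  # stand-in for one training image
augmented = blur_augment(image)
print(augmented.shape)
```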
Affiliation(s)
- Hojin Jang: Department of Psychology, Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Brain and Cognitive Engineering, Korea University, Seoul, South Korea.
- Frank Tong: Department of Psychology, Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, USA.