1
Cao R, Zhang J, Zheng J, Wang Y, Brunner P, Willie JT, Wang S. A neural computational framework for face processing in the human temporal lobe. Curr Biol 2025;35:1765-1778.e6. PMID: 40118061; PMCID: PMC12014353; DOI: 10.1016/j.cub.2025.02.063.
Abstract
A key question in cognitive neuroscience is how unified identity representations emerge from visual inputs. Here, we recorded intracranial electroencephalography (iEEG) from the human ventral temporal cortex (VTC) and medial temporal lobe (MTL), as well as single-neuron activity in the MTL, to demonstrate how dense feature-based representations in the VTC are translated into sparse identity-based representations in the MTL. First, we characterized the spatiotemporal neural dynamics of face coding in the VTC and MTL. The VTC, particularly the fusiform gyrus, exhibits robust axis-based feature coding. Remarkably, MTL neurons encode a receptive field within the VTC neural feature space, constructed using VTC neural axes, thereby bridging dense feature and sparse identity representations. We further validated our findings using recordings from a macaque. Lastly, inter-areal interactions between the VTC and MTL provide the physiological basis of this computational framework. Together, we reveal the neurophysiological underpinnings of a computational framework that explains how perceptual information is translated into face identities.
Affiliation(s)
- Runnan Cao: Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Jie Zhang: Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Jie Zheng: Department of Biomedical Engineering, University of California, Davis, Davis, CA 95618, USA
- Yue Wang: Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Peter Brunner: Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO 63110, USA
- Jon T Willie: Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO 63110, USA
- Shuo Wang: Department of Radiology and Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO 63110, USA
2
Cortinovis D, Peelen MV, Bracci S. Tool Representations in Human Visual Cortex. J Cogn Neurosci 2025;37:515-531. PMID: 39620956; DOI: 10.1162/jocn_a_02281.
Abstract
Tools such as pens, forks, and scissors play an important role in many daily-life activities, an importance underscored by the presence in visual cortex of a set of tool-selective brain regions. This review synthesizes decades of neuroimaging research that investigated the representational spaces in the visual ventral stream for objects, such as tools, that are specifically characterized by action-related properties. Overall, results reveal a dissociation between representational spaces in ventral and lateral occipito-temporal cortex (OTC). While lateral OTC encodes both visual (shape) and action-related properties of objects, distinguishing between objects acting as end-effectors (e.g., tools, hands) versus similar noneffector manipulable objects (e.g., a glass), ventral OTC primarily represents objects' visual features such as their surface properties (e.g., material and texture). These areas act in concert with regions outside of OTC to support object interaction and tool use. The parallel investigation of the dimensions underlying object representations in artificial neural networks reveals both the possibilities and the difficulties in capturing the action-related dimensions that distinguish tools from other objects. Although artificial neural networks offer promise as models of visual cortex computations, challenges persist in replicating the action-related dimensions that go beyond mere visual features. Taken together, we propose that regions in OTC support the representation of tools based on a behaviorally relevant action code and suggest future paths to generate a computational model of this object space.
3
Kanwisher N. Animal models of the human brain: Successes, limitations, and alternatives. Curr Opin Neurobiol 2025;90:102969. PMID: 39914250; DOI: 10.1016/j.conb.2024.102969.
Abstract
The last three decades of research in human cognitive neuroscience have given us an initial "parts list" for the human mind in the form of a set of cortical regions with distinct and often very specific functions. But current neuroscientific methods in humans have limited ability to reveal exactly what these regions represent and compute, the causal role of each in behavior, and the interactions among regions that produce real-world cognition. Animal models can help to answer these questions when homologues exist in other species, like the face system in macaques. When homologues do not exist in animals, for example for speech and music perception, and understanding of language or other people's thoughts, intracranial recordings in humans play a central role, along with a new alternative to animal models: artificial neural networks.
Affiliation(s)
- Nancy Kanwisher: Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, United States
4
Ramirez JG, Vanhoyland M, Ratan Murty NA, Decramer T, Van Paesschen W, Bracci S, Op de Beeck H, Kanwisher N, Janssen P, Theys T. Intracortical recordings reveal the neuronal selectivity for bodies and body parts in the human visual cortex. Proc Natl Acad Sci U S A 2024;121:e2408871121. PMID: 39652751; PMCID: PMC11665852; DOI: 10.1073/pnas.2408871121.
Abstract
Body perception plays a fundamental role in social cognition. Yet, the neural mechanisms underlying this process in humans remain elusive given the spatiotemporal constraints of functional imaging. Here, we present intracortical recordings of single- and multiunit spiking activity in two epilepsy surgery patients in or near the extrastriate body area, a critical region for body perception. Our recordings revealed a strong preference for human bodies over a large range of control stimuli. Notably, body selectivity was driven by a distinct selectivity for body parts. The observed body selectivity generalized to nonphotographic depictions of bodies including silhouettes and stick figures. Overall, our study provides unique neural data that bridge the gap between human neuroimaging and macaque electrophysiology studies, laying a solid foundation for computational models of human body processing.
Affiliation(s)
- Jesus Garcia Ramirez
- Research group Experimental Neurosurgery and Neuroanatomy, Katholieke Universiteit Leuven, and the Leuven Brain Institute, LeuvenB-3000, Belgium
- Laboratory for Neuro- and Psychophysiology, Department of Neurosciences, Katholieke Universiteit Leuven and the Leuven Brain Institute, LeuvenB-3000, Belgium
| | - Michael Vanhoyland
- Research group Experimental Neurosurgery and Neuroanatomy, Katholieke Universiteit Leuven, and the Leuven Brain Institute, LeuvenB-3000, Belgium
- Laboratory for Neuro- and Psychophysiology, Department of Neurosciences, Katholieke Universiteit Leuven and the Leuven Brain Institute, LeuvenB-3000, Belgium
- Department of Neurosurgery, Universitaire Ziekenhuizen Leuven, Katholieke Universiteit Leuven, LeuvenB-3000, Belgium
| | - N. A. Ratan Murty
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA02139
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA02139
- The Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA02139
| | - Thomas Decramer
- Research group Experimental Neurosurgery and Neuroanatomy, Katholieke Universiteit Leuven, and the Leuven Brain Institute, LeuvenB-3000, Belgium
- Department of Neurosurgery, Universitaire Ziekenhuizen Leuven, Katholieke Universiteit Leuven, LeuvenB-3000, Belgium
| | - Wim Van Paesschen
- Laboratory for Epilepsy Research, Katholieke Universiteit Leuven, LeuvenB-3000, Belgium
| | - Stefania Bracci
- Department of Psychology and Cognitive Science, University of Trento, Trento38068, Italy
| | - Hans Op de Beeck
- Laboratory for Biological Psychology, Katholieke Universiteit Leuven, LeuvenB-3000, Belgium
| | - Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA02139
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA02139
- The Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA02139
| | - Peter Janssen
- Laboratory for Neuro- and Psychophysiology, Department of Neurosciences, Katholieke Universiteit Leuven and the Leuven Brain Institute, LeuvenB-3000, Belgium
| | - Tom Theys
- Research group Experimental Neurosurgery and Neuroanatomy, Katholieke Universiteit Leuven, and the Leuven Brain Institute, LeuvenB-3000, Belgium
- Department of Neurosurgery, Universitaire Ziekenhuizen Leuven, Katholieke Universiteit Leuven, LeuvenB-3000, Belgium
| |
5
Liu TT, Granovetter MC, Maallo AMS, Robert S, Fu JZ, Patterson C, Plaut DC, Behrmann M. Cross-sectional and longitudinal changes in category-selectivity in visual cortex following pediatric cortical resection. bioRxiv 2024:2024.12.08.627367. PMID: 39713452; PMCID: PMC11661110; DOI: 10.1101/2024.12.08.627367.
Abstract
The topographic organization of category-selective responses in human ventral occipitotemporal cortex (VOTC) and its relationship to regions subserving language functions is remarkably uniform across individuals. This arrangement is thought to result from the clustering of neurons responding to similar inputs, constrained by intrinsic architecture and tuned by experience. We examined the malleability of this organization in individuals with unilateral resection of VOTC during childhood for the management of drug-resistant epilepsy. In cross-sectional and longitudinal functional imaging studies, we compared the topography and neural representations of 17 category-selective regions in individuals with a VOTC resection, a 'control patient' with resection outside VOTC, and typically developing matched controls. We demonstrated both adherence to and deviation from the standard topography and uncovered fine-grained competitive dynamics between word- and face-selectivity over time in the single, preserved VOTC. The findings elucidate the nature and extent of cortical plasticity and highlight the potential for remodeling of extrastriate architecture and function.
Affiliation(s)
- Tina T. Liu: Department of Psychology and Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Laboratory of Brain and Cognition, National Institute of Mental Health, NIH, Bethesda, MD, USA; Department of Neurology, Georgetown University Medical Center, Washington, D.C., USA
- Michael C. Granovetter: Department of Psychology and Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA; School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA; Departments of Pediatrics and Neurology, New York University, New York, NY, USA
- Anne Margarette S. Maallo: Department of Psychology and Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Sophia Robert: Department of Psychology and Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Jason Z. Fu: Laboratory of Brain and Cognition, National Institute of Mental Health, NIH, Bethesda, MD, USA
- David C. Plaut: Department of Psychology and Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Marlene Behrmann: Department of Psychology and Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Ophthalmology, University of Pittsburgh, PA, USA
6
Nakai T, Constant-Varlet C, Prado J. Encoding models for developmental cognitive computational neuroscience: Promise, challenges, and potential. Dev Cogn Neurosci 2024;70:101470. PMID: 39504850; PMCID: PMC11570778; DOI: 10.1016/j.dcn.2024.101470.
Abstract
Cognitive computational neuroscience has received broad attention in recent years as an emerging area integrating cognitive science, neuroscience, and artificial intelligence. At the heart of this field, approaches using encoding models allow for explaining brain activity from latent and high-dimensional features, including artificial neural networks. With the notable exception of temporal response function models that are applied to electroencephalography, most prior studies have focused on adult subjects, making it difficult to capture how brain representations change with learning and development. Here, we argue that future developmental cognitive neuroscience studies would benefit from approaches relying on encoding models. We provide an overview of encoding models used in adult functional magnetic resonance imaging research. This research has notably used data with a small number of subjects, but with a large number of samples per subject. Studies using encoding models also generally require task-based neuroimaging data. Though these represent challenges for developmental studies, we argue that these challenges may be overcome by using functional alignment techniques and naturalistic paradigms. These methods would facilitate encoding model analysis in developmental neuroimaging research, which may lead to important theoretical advances.
Affiliation(s)
- Tomoya Nakai: Centre de Recherche en Neurosciences de Lyon (CRNL), INSERM U1028 - CNRS UMR5292, Université de Lyon, France; Araya Inc., Tokyo, Japan
- Charlotte Constant-Varlet: Centre de Recherche en Neurosciences de Lyon (CRNL), INSERM U1028 - CNRS UMR5292, Université de Lyon, France
- Jérôme Prado: Centre de Recherche en Neurosciences de Lyon (CRNL), INSERM U1028 - CNRS UMR5292, Université de Lyon, France
7
Ahmed B, Downer JD, Malone BJ, Makin JG. Deep Neural Networks Explain Spiking Activity in Auditory Cortex. bioRxiv 2024:2024.11.12.623280. PMID: 39605715; PMCID: PMC11601425; DOI: 10.1101/2024.11.12.623280.
Abstract
For static stimuli or at gross (~1-s) time scales, artificial neural networks (ANNs) that have been trained on challenging engineering tasks, like image classification and automatic speech recognition, are now the best predictors of neural responses in primate visual and auditory cortex. It is, however, unknown whether this success can be extended to spiking activity at fine time scales, which are particularly relevant to audition. Here we address this question with ANNs trained on speech audio, and acute multi-electrode recordings from the auditory cortex of squirrel monkeys. We show that layers of trained ANNs can predict the spike counts of neurons responding to speech audio and to monkey vocalizations at bin widths of 50 ms and below. For some neurons, the ANNs explain close to all of the explainable variance, much more than traditional spectrotemporal-receptive-field models, and more than untrained networks. Non-primary neurons tend to be more predictable by deeper layers of the ANNs, but there is much variation by neuron, which would be invisible to coarser recording modalities.
Affiliation(s)
- Bilal Ahmed: Elmore School of Electrical and Computer Engineering, Purdue University
- Joshua D Downer: Otolaryngology and Head and Neck Surgery, University of California, San Francisco
- Brian J Malone: Otolaryngology and Head and Neck Surgery, University of California, San Francisco; Center for Neuroscience, UC Davis
- Joseph G Makin: Elmore School of Electrical and Computer Engineering, Purdue University
8
Conwell C, Prince JS, Kay KN, Alvarez GA, Konkle T. A large-scale examination of inductive biases shaping high-level visual representation in brains and machines. Nat Commun 2024;15:9383. PMID: 39477923; PMCID: PMC11526138; DOI: 10.1038/s41467-024-53147-y.
Abstract
The rapid release of high-performing computer vision models offers new potential to study the impact of different inductive biases on the emergent brain alignment of learned representations. Here, we perform controlled comparisons among a curated set of 224 diverse models to test the impact of specific model properties on visual brain predictivity, a process requiring over 1.8 billion regressions and 50.3 thousand representational similarity analyses. We find that models with qualitatively different architectures (e.g. CNNs versus Transformers) and task objectives (e.g. purely visual contrastive learning versus vision-language alignment) achieve near equivalent brain predictivity, when other factors are held constant. Instead, variation across visual training diets yields the largest, most consistent effect on brain predictivity. Many models achieve similarly high brain predictivity, despite clear variation in their underlying representations, suggesting that standard methods used to link models to brains may be too flexible. Broadly, these findings challenge common assumptions about the factors underlying emergent brain alignment, and outline how we can leverage controlled model comparison to probe the common computational principles underlying biological and artificial visual systems.
Affiliation(s)
- Colin Conwell: Department of Psychology, Harvard University, Cambridge, MA, USA
- Jacob S Prince: Department of Psychology, Harvard University, Cambridge, MA, USA
- Kendrick N Kay: Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, MN, USA
- George A Alvarez: Department of Psychology, Harvard University, Cambridge, MA, USA
- Talia Konkle: Department of Psychology, Harvard University, Cambridge, MA, USA; Center for Brain Science, Harvard University, Cambridge, MA, USA; Kempner Institute for Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA
9
Das S, Mangun GR, Ding M. Perceptual Expertise and Attention: An Exploration using Deep Neural Networks. bioRxiv 2024:2024.10.15.617743. PMID: 39464001; PMCID: PMC11507720; DOI: 10.1101/2024.10.15.617743.
Abstract
Perceptual expertise and attention are two important factors that enable superior object recognition and task performance. While expertise enhances knowledge and provides a holistic understanding of the environment, attention allows us to selectively focus on task-related information and suppress distraction. It has been suggested that attention operates differently in experts and in novices, but much remains unknown. This study investigates the relationship between perceptual expertise and attention using convolutional neural networks (CNNs), which have been shown to be good models of primate visual pathways. Two CNN models were trained to become experts in either face or scene recognition, and the effect of attention on performance was evaluated in tasks involving complex stimuli, such as superimposed images of faces and scenes. The goal was to explore how feature-based attention (FBA) influences recognition within and outside the domain of expertise of the models. We found that each model performed better in its area of expertise, and that FBA further enhanced task performance, but only within the domain of expertise, increasing performance by up to 35% in scene recognition and 15% in face recognition. However, attention had reduced or negative effects when applied outside the models' expertise domain. Neural unit-level analysis revealed that expertise led to stronger tuning toward category-specific features and sharper tuning curves, as reflected in greater representational dissimilarity between targets and distractors, which, in line with the biased competition model of attention, enhances performance by reducing competition. These findings highlight the critical role of neural tuning, at the single-unit as well as the network level, in distinguishing the effects of attention in experts and in novices, and demonstrate that CNNs can be used fruitfully as computational models for addressing neuroscience questions that are not practical to study with empirical methods.
Affiliation(s)
- Soukhin Das: Center for Mind and Brain, University of California, Davis; Department of Psychology, University of California, Davis
- G R Mangun: Center for Mind and Brain, University of California, Davis; Department of Psychology, University of California, Davis; Department of Neurology, University of California, Davis
- Mingzhou Ding: Department of Neurology, University of California, Davis
10
Liu X, He D, Zhu M, Li Y, Lin L, Cai Q. Hemispheric dominance in reading system alters contribution to face processing lateralization across development. Dev Cogn Neurosci 2024;69:101418. PMID: 39059053; PMCID: PMC11331717; DOI: 10.1016/j.dcn.2024.101418.
Abstract
Face processing is dominated by the right hemisphere. This lateralization can be affected by co-lateralization within the same system and by influences from different systems, such as neural competition arising from reading acquisition. Yet, how this relationship changes through development remains unknown. This study examined the lateralization of core face processing and word processing in different age groups. By comparing fMRI data from 36 school-aged children and 40 young adults, we investigated whether there are age and regional effects on lateralization, and how relationships between lateralization within and between systems change across development. Our results showed significant right hemispheric lateralization in the core face system and left hemispheric lateralization in reading-related areas for both age groups when viewing faces and texts passively. While all participants showed stronger lateralization in brain regions of higher functional hierarchy when viewing faces, only adults exhibited this lateralization when viewing texts. In both age cohorts, there was intra-system co-lateralization for face processing, whereas an inter-system relationship was found only in adults. Specifically, functional lateralization of Broca's area during reading negatively predicted functional asymmetry in the FFA during face perception. This study provides initial neuroimaging evidence for the reading-induced neural competition theory from a maturational perspective in Chinese cohorts.
Affiliation(s)
- Xinyang Liu
- Key Laboratory of Brain Functional Genomics (MOE & STCSM), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China.
| | - Danni He
- Key Laboratory of Brain Functional Genomics (MOE & STCSM), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China
| | - Miaomiao Zhu
- Key Laboratory of Brain Functional Genomics (MOE & STCSM), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China
| | - Yinghui Li
- Key Laboratory of Brain Functional Genomics (MOE & STCSM), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China
| | - Longnian Lin
- Key Laboratory of Brain Functional Genomics (MOE & STCSM), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China; Shanghai Center for Brain Science and Brain-Inspired Technology, East China Normal University, China; NYU-ECNU Institute of Brain and Cognitive Science, New York University, Shanghai, China; School of Life Science Department, East China Normal University, Shanghai 200062, China.
| | - Qing Cai
- Key Laboratory of Brain Functional Genomics (MOE & STCSM), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China; Shanghai Changning Mental Health Center, Shanghai 200335, China; Shanghai Center for Brain Science and Brain-Inspired Technology, East China Normal University, China; NYU-ECNU Institute of Brain and Cognitive Science, New York University, Shanghai, China.
| |
11
Prince JS, Alvarez GA, Konkle T. Contrastive learning explains the emergence and function of visual category-selective regions. Sci Adv 2024;10:eadl1776. PMID: 39321304; PMCID: PMC11423896; DOI: 10.1126/sciadv.adl1776.
Abstract
Modular and distributed coding theories of category selectivity along the human ventral visual stream have long existed in tension. Here, we present a reconciling framework, contrastive coding, based on a series of analyses relating category selectivity within biological and artificial neural networks. We discover that, in models trained with contrastive self-supervised objectives over a rich natural image diet, category-selective tuning naturally emerges for faces, bodies, scenes, and words. Further, lesions of these model units lead to selective, dissociable recognition deficits, highlighting their distinct functional roles in information processing. Finally, these pre-identified units can predict neural responses in all corresponding face-, scene-, body-, and word-selective regions of human visual cortex, under a highly constrained sparse positive encoding procedure. The success of this single model indicates that brain-like functional specialization can emerge without category-specific learning pressures, as the system learns to untangle rich image content. Contrastive coding, therefore, provides a unifying account of object category emergence and representation in the human brain.
Affiliation(s)
- Jacob S Prince: Department of Psychology, Harvard University, Cambridge, MA, USA
- George A Alvarez: Department of Psychology, Harvard University, Cambridge, MA, USA
- Talia Konkle: Department of Psychology, Harvard University, Cambridge, MA, USA; Center for Brain Science, Harvard University, Cambridge, MA, USA; Kempner Institute for Biological and Artificial Intelligence, Harvard University, Cambridge, MA, USA
12
Kar K, DiCarlo JJ. The Quest for an Integrated Set of Neural Mechanisms Underlying Object Recognition in Primates. Annu Rev Vis Sci 2024;10:91-121. PMID: 38950431; DOI: 10.1146/annurev-vision-112823-030616.
Abstract
Inferences made about objects via vision, such as rapid and accurate categorization, are core to primate cognition despite the algorithmic challenge posed by varying viewpoints and scenes. Until recently, the brain mechanisms that support these capabilities were deeply mysterious. However, over the past decade, this scientific mystery has been illuminated by the discovery and development of brain-inspired, image-computable, artificial neural network (ANN) systems that rival primates in these behavioral feats. Apart from fundamentally changing the landscape of artificial intelligence, modified versions of these ANN systems are the current leading scientific hypotheses of an integrated set of mechanisms in the primate ventral visual stream that support core object recognition. What separates brain-mapped versions of these systems from prior conceptual models is that they are sensory computable, mechanistic, anatomically referenced, and testable (SMART). In this article, we review and provide perspective on the brain mechanisms addressed by the current leading SMART models. We review their empirical brain and behavioral alignment successes and failures, discuss the next frontiers for an even more accurate mechanistic understanding, and outline the likely applications.
Affiliation(s)
- Kohitij Kar: Department of Biology, Centre for Vision Research, and Centre for Integrative and Applied Neuroscience, York University, Toronto, Ontario, Canada
- James J DiCarlo: Department of Brain and Cognitive Sciences, MIT Quest for Intelligence, and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
13
Tuckute G, Kanwisher N, Fedorenko E. Language in Brains, Minds, and Machines. Annu Rev Neurosci 2024;47:277-301. PMID: 38669478; DOI: 10.1146/annurev-neuro-120623-101142.
Abstract
It has long been argued that only humans could produce and understand language. But now, for the first time, artificial language models (LMs) achieve this feat. Here we survey the new purchase LMs are providing on the question of how language is implemented in the brain. We discuss why, a priori, LMs might be expected to share similarities with the human language system. We then summarize evidence that LMs represent linguistic information similarly enough to humans to enable relatively accurate brain encoding and decoding during language processing. Finally, we examine which LM properties (their architecture, task performance, or training) are critical for capturing human neural responses to language and review studies using LMs as in silico model organisms for testing hypotheses about language. These ongoing investigations bring us closer to understanding the representations and processes that underlie our ability to comprehend sentences and express thoughts in language.
Affiliation(s)
- Greta Tuckute
  - Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Nancy Kanwisher
  - Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Evelina Fedorenko
  - Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

14
Wang T, Lee TS, Yao H, Hong J, Li Y, Jiang H, Andolina IM, Tang S. Large-scale calcium imaging reveals a systematic V4 map for encoding natural scenes. Nat Commun 2024; 15:6401. [PMID: 39080309] [PMCID: PMC11289446] [DOI: 10.1038/s41467-024-50821-z]
Abstract
Biological visual systems have evolved to process natural scenes. A full understanding of visual cortical functions requires a comprehensive characterization of how neuronal populations in each visual area encode natural scenes. Here, we utilized widefield calcium imaging to record V4 cortical responses to tens of thousands of natural images in male macaques. Using this large dataset, we developed a deep-learning digital twin of V4 that allowed us to map the natural image preferences of the neural population at 100-µm scale. This detailed map revealed a diverse set of functional domains in V4, each encoding distinct natural image features. We validated these model predictions using additional widefield imaging and single-cell resolution two-photon imaging. Feature attribution analysis revealed that these domains lie along a continuum from preferring spatially localized shape features to preferring spatially dispersed surface features. These results provide insights into the organizing principles that govern natural scene encoding in V4.
Affiliation(s)
- Tianye Wang
  - Peking University School of Life Sciences, Beijing, 100871, China
  - Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China
  - IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China
  - Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
- Tai Sing Lee
  - Computer Science Department and Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Haoxuan Yao
  - Peking University School of Life Sciences, Beijing, 100871, China
  - Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China
  - IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China
  - Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
- Jiayi Hong
  - Peking University School of Life Sciences, Beijing, 100871, China
- Yang Li
  - Peking University School of Life Sciences, Beijing, 100871, China
  - Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China
  - IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China
  - Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
- Hongfei Jiang
  - Peking University School of Life Sciences, Beijing, 100871, China
  - Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China
  - IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China
  - Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
- Ian Max Andolina
  - The Center for Excellence in Brain Science and Intelligence Technology, State Key Laboratory of Neuroscience, Key Laboratory of Primate Neurobiology, Institute of Neuroscience, Chinese Academy of Sciences, Shanghai, 200031, China
- Shiming Tang
  - Peking University School of Life Sciences, Beijing, 100871, China
  - Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China
  - IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China
  - Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China

15
Dipani A, McNeal N, Ratan Murty NA. Linking faces to social cognition: The temporal pole as a potential social switch. Proc Natl Acad Sci U S A 2024; 121:e2411735121. [PMID: 39024106] [PMCID: PMC11295026] [DOI: 10.1073/pnas.2411735121]
Affiliation(s)
- Alish Dipani
  - Cognition and Brain Science, School of Psychology, Georgia Institute of Technology, Atlanta, GA 30332
  - Center of Excellence in Computational Cognition, Georgia Institute of Technology, Atlanta, GA 30332
- Nikolas McNeal
  - School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332
- N. Apurva Ratan Murty
  - Cognition and Brain Science, School of Psychology, Georgia Institute of Technology, Atlanta, GA 30332
  - Center of Excellence in Computational Cognition, Georgia Institute of Technology, Atlanta, GA 30332

16
Lahner B, Dwivedi K, Iamshchinina P, Graumann M, Lascelles A, Roig G, Gifford AT, Pan B, Jin S, Ratan Murty NA, Kay K, Oliva A, Cichy R. Modeling short visual events through the BOLD moments video fMRI dataset and metadata. Nat Commun 2024; 15:6241. [PMID: 39048577] [PMCID: PMC11269733] [DOI: 10.1038/s41467-024-50310-3]
Abstract
Studying the neural basis of human dynamic visual perception requires extensive experimental data to evaluate the large swathes of functionally diverse brain neural networks driven by perceiving visual events. Here, we introduce the BOLD Moments Dataset (BMD), a repository of whole-brain fMRI responses to over 1000 short (3 s) naturalistic video clips of visual events across ten human subjects. We use the videos' extensive metadata to show how the brain represents word- and sentence-level descriptions of visual events and identify correlates of video memorability scores extending into the parietal cortex. Furthermore, we reveal a match in hierarchical processing between cortical regions of interest and video-computable deep neural networks, and we showcase that BMD successfully captures temporal dynamics of visual events at second resolution. With its rich metadata, BMD offers new perspectives and accelerates research on the human brain basis of visual event perception.
Affiliation(s)
- Benjamin Lahner
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- Kshitij Dwivedi
  - Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
  - Department of Computer Science, Goethe University Frankfurt, Frankfurt am Main, Germany
- Polina Iamshchinina
  - Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Monika Graumann
  - Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Alex Lascelles
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- Gemma Roig
  - Department of Computer Science, Goethe University Frankfurt, Frankfurt am Main, Germany
  - The Hessian Center for AI (hessian.AI), Darmstadt, Germany
- Bowen Pan
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- SouYoung Jin
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- N Apurva Ratan Murty
  - Department of Brain and Cognitive Science, MIT, Cambridge, MA, USA
  - School of Psychology, Georgia Institute of Technology, Atlanta, GA, USA
- Kendrick Kay
  - Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, MN, USA
- Aude Oliva
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- Radoslaw Cichy
  - Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany

17
Ren Y, Bashivan P. How well do models of visual cortex generalize to out of distribution samples? PLoS Comput Biol 2024; 20:e1011145. [PMID: 38820563] [PMCID: PMC11216589] [DOI: 10.1371/journal.pcbi.1011145]
Abstract
Unit activity in particular deep neural networks (DNNs) is remarkably similar to neuronal population responses to static images along the primate ventral visual cortex. Linear combinations of DNN unit activities are widely used to build predictive models of neuronal activity in the visual cortex. Nevertheless, prediction performance in these models is typically evaluated on stimulus sets consisting of everyday objects in naturalistic settings. Recent work has revealed a generalization gap in predicting neuronal responses to synthetically generated out-of-distribution (OOD) stimuli. Here, we investigated how recent progress in improving DNNs' object recognition generalization, as well as various DNN design choices such as architecture, learning algorithm, and training dataset, have impacted this generalization gap in neural predictivity. We came to the surprising conclusion that performance on none of the common computer vision OOD object recognition benchmarks is predictive of OOD neural predictivity. Furthermore, we found that adversarially robust models often yield substantially higher generalization in neural predictivity, although the degree of robustness itself was not predictive of the neural predictivity score. These results suggest that improving object recognition behavior on current benchmarks alone may not lead to more general models of neurons in the primate ventral visual cortex.
Affiliation(s)
- Yifei Ren
  - Department of Computer Science, McGill University, Montreal, Canada
- Pouya Bashivan
  - Department of Computer Science, McGill University, Montreal, Canada
  - Department of Physiology, McGill University, Montreal, Canada
  - Mila, Université de Montréal, Montreal, Canada

18
Garlichs A, Blank H. Prediction error processing and sharpening of expected information across the face-processing hierarchy. Nat Commun 2024; 15:3407. [PMID: 38649694] [PMCID: PMC11035707] [DOI: 10.1038/s41467-024-47749-9]
Abstract
The perception and neural processing of sensory information are strongly influenced by prior expectations. The integration of prior and sensory information can manifest through distinct underlying mechanisms: focusing on unexpected input, denoted as prediction error (PE) processing, or amplifying anticipated information via sharpened representation. In this study, we employed computational modeling using deep neural networks combined with representational similarity analyses of fMRI data to investigate these two processes during face perception. Participants were cued to see face images, some generated by morphing two faces, leading to ambiguity in face identity. We show that expected faces were identified faster and perception of ambiguous faces was shifted towards priors. Multivariate analyses uncovered evidence for PE processing across and beyond the face-processing hierarchy from the occipital face area (OFA), via the fusiform face area, to the anterior temporal lobe, and suggest sharpened representations in the OFA. Our findings support the proposition that the brain represents faces grounded in prior expectations.
Affiliation(s)
- Annika Garlichs
  - Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, 20246, Hamburg, Germany
- Helen Blank
  - Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, 20246, Hamburg, Germany

19
Lahner B, Mohsenzadeh Y, Mullin C, Oliva A. Visual perception of highly memorable images is mediated by a distributed network of ventral visual regions that enable a late memorability response. PLoS Biol 2024; 22:e3002564. [PMID: 38557761] [PMCID: PMC10984539] [DOI: 10.1371/journal.pbio.3002564]
Abstract
Behavioral and neuroscience studies in humans and primates have shown that memorability is an intrinsic property of an image that predicts its strength of encoding into and retrieval from memory. While previous work has independently probed when or where this memorability effect may occur in the human brain, a description of its spatiotemporal dynamics has been missing. Here, we used representational similarity analysis (RSA) to combine functional magnetic resonance imaging (fMRI) with source-estimated magnetoencephalography (MEG) to measure simultaneously when and where the human cortex is sensitive to differences in image memorability. Results reveal that visual perception of high-memorability images, compared to low-memorability images, recruits a set of regions of interest (ROIs) distributed throughout the ventral visual cortex, with a late memorability response (from around 300 ms) in early visual cortex (EVC), inferior temporal cortex, lateral occipital cortex, fusiform gyrus, and banks of the superior temporal sulcus. The magnitude of image memorability is represented after high-level feature processing in visual regions and reflected in classical memory regions in the medial temporal lobe (MTL). Our results present, to our knowledge, the first unified spatiotemporal account of the visual memorability effect across the human cortex, further supporting the levels-of-processing theory of perception and memory.
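The RSA logic used here (compare the pairwise dissimilarity structure of two measurement modalities) can be sketched in a few lines; the response matrices and their shared-signal construction below are synthetic stand-ins, not the study's fMRI or MEG data:

```python
import numpy as np

rng = np.random.default_rng(1)

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between the
    response patterns of each pair of stimuli (upper triangle, flattened)."""
    d = 1.0 - np.corrcoef(patterns)
    iu = np.triu_indices_from(d, k=1)
    return d[iu]

def spearman(a, b):
    """Spearman rank correlation: Pearson r of the rank vectors (no ties here)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]

# Placeholder data: 20 stimuli; both "modalities" share a 5-dim signal
# plus modality-specific noise dimensions.
shared = 3.0 * rng.standard_normal((20, 5))
fmri = np.hstack([shared, rng.standard_normal((20, 20))])
meg = np.hstack([shared, rng.standard_normal((20, 30))])

rdm_fmri = rdm(fmri)
rdm_meg = rdm(meg)
rho = spearman(rdm_fmri, rdm_meg)
print(round(float(rho), 3))  # positive: shared structure yields correlated RDMs
```

Correlating one modality's RDM against the other's at each time point is what lets the fMRI-MEG fusion approach localize effects in both space and time.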
Affiliation(s)
- Benjamin Lahner
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Yalda Mohsenzadeh
  - The Brain and Mind Institute, The University of Western Ontario, London, Canada
  - Department of Computer Science, The University of Western Ontario, London, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Caitlin Mullin
  - Vision: Science to Application (VISTA), York University, Toronto, Ontario, Canada
- Aude Oliva
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America

20
Jain S, Vo VA, Wehbe L, Huth AG. Computational Language Modeling and the Promise of In Silico Experimentation. Neurobiol Lang 2024; 5:80-106. [PMID: 38645624] [PMCID: PMC11025654] [DOI: 10.1162/nol_a_00101]
Abstract
Language neuroscience currently relies on two major experimental paradigms: controlled experiments using carefully hand-designed stimuli, and natural stimulus experiments. These approaches have complementary advantages that allow them to address distinct aspects of the neurobiology of language, but each also comes with drawbacks. Here we discuss a third paradigm, in silico experimentation using deep learning-based encoding models, that has been enabled by recent advances in cognitive computational neuroscience. This paradigm promises to combine the interpretability of controlled experiments with the generalizability and broad scope of natural stimulus experiments. We show four examples of simulating language neuroscience experiments in silico and then discuss both the advantages and caveats of this approach.
Affiliation(s)
- Shailee Jain
  - Department of Computer Science, University of Texas at Austin, Austin, TX, USA
- Vy A. Vo
  - Brain-Inspired Computing Lab, Intel Labs, Hillsboro, OR, USA
- Leila Wehbe
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Alexander G. Huth
  - Department of Computer Science, University of Texas at Austin, Austin, TX, USA
  - Department of Neuroscience, University of Texas at Austin, Austin, TX, USA

21
Walbrin J, Downing PE, Sotero FD, Almeida J. Characterizing the discriminability of visual categorical information in strongly connected voxels. Neuropsychologia 2024; 195:108815. [PMID: 38311112] [DOI: 10.1016/j.neuropsychologia.2024.108815]
Abstract
Functional brain responses are strongly influenced by connectivity. Recently, we demonstrated a major example of this: category discriminability within occipitotemporal cortex (OTC) is enhanced for voxel sets that share strong functional connectivity to distal brain areas, relative to those that share lesser connectivity. That is, within OTC regions, sets of 'most-connected' voxels show improved multivoxel pattern discriminability for tool-, face-, and place stimuli relative to voxels with weaker connectivity to the wider brain. However, understanding whether these effects generalize to other domains (e.g. body perception network), and across different levels of the visual processing streams (e.g. dorsal as well as ventral stream areas) is an important extension of this work. Here, we show that this so-called connectivity-guided decoding (CGD) effect broadly generalizes across a wide range of categories (tools, faces, bodies, hands, places). This effect is robust across dorsal stream areas, but less consistent in earlier ventral stream areas. In the latter regions, category discriminability is generally very high, suggesting that extraction of category-relevant visual properties is less reliant on connectivity to downstream areas. Further, CGD effects are primarily expressed in a category-specific manner: For example, within the network of tool regions, discriminability of tool information is greater than non-tool information. The connectivity-guided decoding approach shown here provides a novel demonstration of the crucial relationship between wider brain connectivity and complex local-level functional responses at different levels of the visual processing streams. Further, this approach generates testable new hypotheses about the relationships between connectivity and local selectivity.
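The connectivity-guided decoding idea described above (rank voxels by their connectivity to distal areas, then compare category decoding in the most- vs least-connected sets) can be sketched as below; the data, connectivity scores, and nearest-class-mean decoder are illustrative assumptions, not the authors' pipeline:

```python
import numpy as np

rng = np.random.default_rng(4)

# Placeholder data: 100 voxels x 60 trials (30 per category), plus a
# per-voxel connectivity score (e.g., mean functional connectivity to
# distal areas). By construction, better-connected voxels carry more signal.
n_vox, n_trials = 100, 60
labels = np.repeat([0, 1], 30)
connectivity = rng.uniform(size=n_vox)
signal = np.outer(connectivity, np.where(labels == 0, -1.0, 1.0))
data = signal + 0.8 * rng.standard_normal((n_vox, n_trials))

def decode_accuracy(voxels):
    """Leave-one-trial-out nearest-class-mean decoding on a voxel subset."""
    X = data[voxels].T  # trials x voxels
    correct = 0
    for t in range(n_trials):
        train = np.ones(n_trials, bool)
        train[t] = False
        means = [X[train & (labels == c)].mean(axis=0) for c in (0, 1)]
        pred = int(np.linalg.norm(X[t] - means[1]) < np.linalg.norm(X[t] - means[0]))
        correct += pred == labels[t]
    return correct / n_trials

top = np.argsort(connectivity)[-20:]   # most-connected voxel set
bottom = np.argsort(connectivity)[:20]  # least-connected voxel set
print(round(decode_accuracy(top), 2), round(decode_accuracy(bottom), 2))
```

With this synthetic construction, the most-connected voxel set decodes category membership far better than the least-connected one, mirroring the qualitative effect the abstract reports.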
Affiliation(s)
- Jon Walbrin
  - Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Portugal
  - CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Portugal
- Paul E Downing
  - School of Human and Behavioural Sciences, Bangor University, Bangor, Wales
- Filipa Dourado Sotero
  - Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Portugal
  - CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Portugal
- Jorge Almeida
  - Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Portugal
  - CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Portugal

22
Liu P, Bo K, Ding M, Fang R. Emergence of Emotion Selectivity in Deep Neural Networks Trained to Recognize Visual Objects. PLoS Comput Biol 2024; 20:e1011943. [PMID: 38547053] [PMCID: PMC10977720] [DOI: 10.1371/journal.pcbi.1011943]
Abstract
Recent neuroimaging studies have shown that the visual cortex plays an important role in representing the affective significance of visual input. The origin of these affect-specific visual representations is debated: are they intrinsic to the visual system, or do they arise through reentry from frontal emotion-processing structures such as the amygdala? We examined this problem by combining convolutional neural network (CNN) models of the human ventral visual cortex, pre-trained on ImageNet, with two datasets of affective images. Our results show that all layers of the CNN models contained artificial neurons that responded consistently and selectively to neutral, pleasant, or unpleasant images; lesioning these neurons by setting their output to zero, or enhancing them by increasing their gain, led to decreased or increased emotion recognition performance, respectively. These results support the idea that the visual system may have an intrinsic ability to represent the affective significance of visual input and suggest that CNNs offer a fruitful platform for testing neuroscientific theories.
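The lesioning and enhancement manipulations described here (setting selected unit outputs to zero, or scaling them by a gain factor) reduce to simple array operations; the layer activations and the "selective" unit indices below are illustrative placeholders, not units identified in the study:

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder layer activations: batch of 8 images x 100 units (post-ReLU,
# hence non-negative).
acts = np.abs(rng.standard_normal((8, 100)))

# Suppose units 10-19 were identified as emotion selective (illustrative).
selective = np.arange(10, 20)

def lesion(activations, units):
    """Lesion: set the output of the chosen units to zero."""
    out = activations.copy()
    out[:, units] = 0.0
    return out

def enhance(activations, units, gain=2.0):
    """Enhance: multiply the chosen units' output by a gain factor."""
    out = activations.copy()
    out[:, units] *= gain
    return out

lesioned = lesion(acts, selective)
enhanced = enhance(acts, selective, gain=2.0)

print(lesioned[:, selective].sum() == 0.0)                           # True
print(np.allclose(enhanced[:, selective], 2 * acts[:, selective]))   # True
```

In the full experiment, the lesioned or enhanced activations would be passed on to the remaining layers so the change in downstream emotion-recognition performance can be measured.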
Affiliation(s)
- Peng Liu
  - J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
  - Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Ke Bo
  - Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Mingzhou Ding
  - J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
- Ruogu Fang
  - J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
  - Center for Cognitive Aging and Memory, McKnight Brain Institute, University of Florida, Gainesville, Florida, United States of America

23
Stoinski LM, Perkuhn J, Hebart MN. THINGSplus: New norms and metadata for the THINGS database of 1854 object concepts and 26,107 natural object images. Behav Res Methods 2024; 56:1583-1603. [PMID: 37095326] [PMCID: PMC10991023] [DOI: 10.3758/s13428-023-02110-8]
Abstract
To study visual and semantic object representations, the need for well-curated object concepts and images has grown significantly over the past years. To address this, we have previously developed THINGS, a large-scale database of 1854 systematically sampled object concepts with 26,107 high-quality naturalistic images of these concepts. With THINGSplus, we significantly extend THINGS by adding concept- and image-specific norms and metadata for all 1854 concepts and one copyright-free image example per concept. Concept-specific norms were collected for the properties of real-world size, manmadeness, preciousness, liveliness, heaviness, naturalness, ability to move or be moved, graspability, holdability, pleasantness, and arousal. Further, we provide 53 superordinate categories as well as typicality ratings for all their members. Image-specific metadata includes a nameability measure, based on human-generated labels of the objects depicted in the 26,107 images. Finally, we identified one new public domain image per concept. Property (M = 0.97, SD = 0.03) and typicality ratings (M = 0.97, SD = 0.01) demonstrate excellent consistency, with the subsequently collected arousal ratings as the only exception (r = 0.69). Our property (M = 0.85, SD = 0.11) and typicality (r = 0.72, 0.74, 0.88) data correlated strongly with external norms, again with the lowest validity for arousal (M = 0.41, SD = 0.08). To summarize, THINGSplus provides a large-scale, externally validated extension to existing object norms and an important extension to THINGS, allowing detailed selection of stimuli and control variables for a wide range of research interested in visual object processing, language, and semantic memory.
Affiliation(s)
- Laura M Stoinski
  - Max Planck Institute for Human Cognitive & Brain Sciences, Leipzig, Germany
- Jonas Perkuhn
  - Max Planck Institute for Human Cognitive & Brain Sciences, Leipzig, Germany
- Martin N Hebart
  - Max Planck Institute for Human Cognitive & Brain Sciences, Leipzig, Germany
  - Justus Liebig University, Gießen, Germany

24
Op de Beeck H, Bracci S. Going after the bigger picture: Using high-capacity models to understand mind and brain. Behav Brain Sci 2023; 46:e404. [PMID: 38054291] [DOI: 10.1017/s0140525x2300153x]
Abstract
Deep neural networks (DNNs) provide a unique opportunity to move towards a generic modelling framework in psychology. The high representational capacity of these models combined with the possibility for further extensions has already allowed us to investigate the forest, namely the complex landscape of representations and processes that underlie human cognition, without forgetting about the trees, which include individual psychological phenomena.
Affiliation(s)
- Stefania Bracci
  - Center for Mind/Brain Sciences, University of Trento, Rovereto, Italy

25
Tuckute G, Feather J, Boebinger D, McDermott JH. Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions. PLoS Biol 2023; 21:e3002366. [PMID: 38091351] [PMCID: PMC10718467] [DOI: 10.1371/journal.pbio.3002366]
Abstract
Models that predict brain responses to stimuli provide one measure of understanding of a sensory system and have many potential applications in science and engineering. Deep artificial neural networks have emerged as the leading such predictive models of the visual system but are less explored in audition. Prior work provided examples of audio-trained neural networks that produced good predictions of auditory cortical fMRI responses and exhibited correspondence between model stages and brain regions, but left it unclear whether these results generalize to other neural network models and, thus, how to further improve models in this domain. We evaluated model-brain correspondence for publicly available audio neural network models along with in-house models trained on 4 different tasks. Most tested models outpredicted standard spectrotemporal filter-bank models of auditory cortex and exhibited systematic model-brain correspondence: Middle stages best predicted primary auditory cortex, while deep stages best predicted non-primary cortex. However, some state-of-the-art models produced substantially worse brain predictions. Models trained to recognize speech in background noise produced better brain predictions than models trained to recognize speech in quiet, potentially because hearing in noise imposes constraints on biological auditory representations. The training task influenced the prediction quality for specific cortical tuning properties, with best overall predictions resulting from models trained on multiple tasks. The results generally support the promise of deep neural networks as models of audition, though they also indicate that current models do not explain auditory cortical responses in their entirety.
Affiliation(s)
- Greta Tuckute
  - Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
  - Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
- Jenelle Feather
  - Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
  - Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
- Dana Boebinger
  - Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
  - Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
  - Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, Massachusetts, United States of America
  - University of Rochester Medical Center, Rochester, New York, United States of America
- Josh H. McDermott
  - Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
  - Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
  - Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, Massachusetts, United States of America

26
Tuckute G, Sathe A, Srikant S, Taliaferro M, Wang M, Schrimpf M, Kay K, Fedorenko E. Driving and suppressing the human language network using large language models. bioRxiv 2023:2023.04.16.537080 [Preprint]. [PMID: 37090673] [PMCID: PMC10120732] [DOI: 10.1101/2023.04.16.537080]
Abstract
Transformer models such as GPT generate human-like language and are highly predictive of human brain responses to language. Here, using fMRI-measured brain responses to 1,000 diverse sentences, we first show that a GPT-based encoding model can predict the magnitude of brain response associated with each sentence. Then, we use the model to identify new sentences that are predicted to drive or suppress responses in the human language network. We show that these model-selected novel sentences indeed strongly drive and suppress activity of human language areas in new individuals. A systematic analysis of the model-selected sentences reveals that surprisal and well-formedness of linguistic input are key determinants of response strength in the language network. These results establish the ability of neural network models to not only mimic human language but also noninvasively control neural activity in higher-level cortical areas, like the language network.
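The closed-loop selection step described here (score candidate sentences with an encoding model, then take the extremes as "drive" and "suppress" stimuli) can be sketched as follows; the predicted responses below are random stand-ins for the study's GPT-based encoding-model outputs, and the sentence names are placeholders:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in predicted brain responses: in the study these come from a
# GPT-based encoding model fit to fMRI data; here they are random numbers.
candidates = [f"sentence_{i}" for i in range(1000)]
predicted = rng.standard_normal(1000)

k = 5
order = np.argsort(predicted)                        # ascending by prediction
suppress_set = [candidates[i] for i in order[:k]]    # lowest predicted activity
drive_set = [candidates[i] for i in order[-k:]]      # highest predicted activity

print(len(drive_set), len(suppress_set))
```

The selected extremes are then presented to new participants to test whether the model's predictions causally generalize, which is the experiment the abstract reports.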
Affiliation(s)
- Greta Tuckute
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Aalok Sathe
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Shashank Srikant
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- MIT-IBM Watson AI Lab, Cambridge, MA 02142, USA
- Maya Taliaferro
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Mingye Wang
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Martin Schrimpf
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Quest for Intelligence, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Neuro-X Institute, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Kendrick Kay
- Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN 55455 USA
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- The Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA 02138 USA
27
Jiahui G, Feilong M, Visconti di Oleggio Castello M, Nastase SA, Haxby JV, Gobbini MI. Modeling naturalistic face processing in humans with deep convolutional neural networks. Proc Natl Acad Sci U S A 2023; 120:e2304085120. [PMID: 37847731 PMCID: PMC10614847 DOI: 10.1073/pnas.2304085120]
Abstract
Deep convolutional neural networks (DCNNs) trained for face identification can rival and even exceed human-level performance. The ways in which the internal face representations in DCNNs relate to human cognitive representations and brain activity are not well understood. Nearly all previous studies focused on static face image processing with rapid display times and ignored the processing of naturalistic, dynamic information. To address this gap, we developed the largest naturalistic dynamic face stimulus set in human neuroimaging research (700+ naturalistic video clips of unfamiliar faces). We used this naturalistic dataset to compare representational geometries estimated from DCNNs, behavioral responses, and brain responses. We found that DCNN representational geometries were consistent across architectures, cognitive representational geometries were consistent across raters in a behavioral arrangement task, and neural representational geometries in face areas were consistent across brains. Representational geometries in late, fully connected DCNN layers, which are optimized for individuation, were much more weakly correlated with cognitive and neural geometries than were geometries in late-intermediate layers. The late-intermediate face-DCNN layers successfully matched cognitive representational geometries, as measured with a behavioral arrangement task that primarily reflected categorical attributes, and correlated with neural representational geometries in known face-selective topographies. Our study suggests that current DCNNs successfully capture neural cognitive processes for categorical attributes of faces but less accurately capture individuation and dynamic features.
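The geometry comparisons above rest on representational dissimilarity matrices (RDMs); a minimal sketch with random stand-in data (the actual study compared DCNN layers, behavioral arrangements, and fMRI responses):

```python
import numpy as np

rng = np.random.default_rng(1)

def rdm(responses):
    # Dissimilarity = 1 - Pearson correlation between the response
    # patterns (rows) of each pair of stimuli.
    return 1.0 - np.corrcoef(responses)

# Hypothetical stand-ins: 30 stimuli x 100 units for two "systems",
# where system B is a noisy copy of system A.
A = rng.standard_normal((30, 100))
B = A + 0.5 * rng.standard_normal((30, 100))

# Compare geometries by correlating the RDMs' upper triangles.
iu = np.triu_indices(30, k=1)
r = float(np.corrcoef(rdm(A)[iu], rdm(B)[iu])[0, 1])
```

In the study's terms, high second-order correlation r between, say, a late-intermediate DCNN layer and a face-selective region indicates matched representational geometry.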
Affiliation(s)
- Guo Jiahui
- Center for Cognitive Neuroscience, Dartmouth College, Hanover, NH 03755
- Ma Feilong
- Center for Cognitive Neuroscience, Dartmouth College, Hanover, NH 03755
- Samuel A. Nastase
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544
- James V. Haxby
- Center for Cognitive Neuroscience, Dartmouth College, Hanover, NH 03755
- M. Ida Gobbini
- Department of Medical and Surgical Sciences, University of Bologna, Bologna 40138, Italy
- Istituti di Ricovero e Cura a Carattere Scientifico, Istituto delle Scienze Neurologiche di Bologna, Bologna 40139, Italy
28
Gu Z, Jamison K, Sabuncu MR, Kuceyeski A. Human brain responses are modulated when exposed to optimized natural images or synthetically generated images. Commun Biol 2023; 6:1076. [PMID: 37872319 PMCID: PMC10593916 DOI: 10.1038/s42003-023-05440-7]
Abstract
Understanding how human brains interpret and process information is important. Here, we investigated the selectivity and inter-individual differences in human brain responses to images via functional MRI. In our first experiment, we found that images predicted to achieve maximal activations using a group-level encoding model evoke higher responses than images predicted to achieve average activations, and the activation gain is positively associated with the encoding model accuracy. Furthermore, anterior temporal lobe face area (aTLfaces) and fusiform body area 1 had higher activation in response to maximal synthetic images compared to maximal natural images. In our second experiment, we found that synthetic images derived using a personalized encoding model elicited higher responses compared to synthetic images from group-level or other subjects' encoding models. The finding of aTLfaces favoring synthetic images over natural images was also replicated. Our results indicate the possibility of using data-driven and generative approaches to modulate macro-scale brain region responses and probe inter-individual differences in, and functional specialization of, the human visual system.
Affiliation(s)
- Zijin Gu
- School of Electrical and Computer Engineering, Cornell University and Cornell Tech, New York, NY, USA
- Keith Jamison
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
- Mert R Sabuncu
- School of Electrical and Computer Engineering, Cornell University and Cornell Tech, New York, NY, USA
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
- Amy Kuceyeski
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
29
Ma G, Yan R, Tang H. Exploiting noise as a resource for computation and learning in spiking neural networks. Patterns (N Y) 2023; 4:100831. [PMID: 37876899 PMCID: PMC10591140 DOI: 10.1016/j.patter.2023.100831]
Abstract
Networks of spiking neurons underpin the extraordinary information-processing capabilities of the brain and have become pillar models in neuromorphic artificial intelligence. Despite extensive research on spiking neural networks (SNNs), most studies are established on deterministic models, overlooking the inherent non-deterministic, noisy nature of neural computations. This study introduces the noisy SNN (NSNN) and the noise-driven learning (NDL) rule by incorporating noisy neuronal dynamics to exploit the computational advantages of noisy neural processing. The NSNN provides a theoretical framework that yields scalable, flexible, and reliable computation and learning. We demonstrate that this framework leads to spiking neural models with competitive performance, improved robustness against challenging perturbations compared with deterministic SNNs, and better reproduction of probabilistic computation in neural coding. Overall, this study offers a powerful and easy-to-use tool for machine learning and neuromorphic intelligence practitioners, as well as computational neuroscience researchers.
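As a toy illustration of non-deterministic spiking (not the paper's NSNN/NDL formulation), one can make a leaky integrate-and-fire neuron emit spikes probabilistically as a function of its membrane potential:

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_lif(inputs, tau=10.0, v_th=1.0, beta=5.0):
    """Leaky integrate-and-fire with a stochastic (sigmoidal) firing rule."""
    v, spikes = 0.0, []
    for x in inputs:
        v += (x - v) / tau                            # leaky integration
        p = 1.0 / (1.0 + np.exp(-beta * (v - v_th)))  # spike probability
        s = rng.random() < p                          # noisy spike emission
        spikes.append(int(s))
        if s:
            v = 0.0                                   # reset after a spike
    return spikes

spikes = noisy_lif([2.0] * 200)   # constant suprathreshold drive
rate = sum(spikes) / len(spikes)
```

Replacing the hard threshold with a firing probability is one simple way to obtain the irregular, trial-to-trial variable spiking that deterministic SNNs lack.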
Affiliation(s)
- Gehua Ma
- College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Rui Yan
- College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China
- Huajin Tang
- College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China
30
van Dyck LE, Gruber WR. Modeling Biological Face Recognition with Deep Convolutional Neural Networks. J Cogn Neurosci 2023; 35:1521-1537. [PMID: 37584587 DOI: 10.1162/jocn_a_02040]
Abstract
Deep convolutional neural networks (DCNNs) have become the state-of-the-art computational models of biological object recognition. Their remarkable success has helped vision science break new ground, and recent efforts have started to transfer this achievement to research on biological face recognition. In this regard, face detection can be investigated by comparing face-selective biological neurons and brain areas to artificial neurons and model layers. Similarly, face identification can be examined by comparing in vivo and in silico multidimensional "face spaces." In this review, we summarize the first studies that use DCNNs to model biological face recognition. On the basis of a broad spectrum of behavioral and computational evidence, we conclude that DCNNs are useful models that closely resemble the general hierarchical organization of face recognition in the ventral visual pathway and the core face network. In two exemplary spotlights, we emphasize the unique scientific contributions of these models. First, studies on face detection in DCNNs indicate that elementary face selectivity emerges automatically through feedforward processing even in the absence of visual experience. Second, studies on face identification in DCNNs suggest that identity-specific experience and generative mechanisms facilitate this particular challenge. Taken together, as this novel modeling approach enables close control of predisposition (i.e., architecture) and experience (i.e., training data), it may be suited to inform long-standing debates on the substrates of biological face recognition.
31
Yao M, Wen B, Yang M, Guo J, Jiang H, Feng C, Cao Y, He H, Chang L. High-dimensional topographic organization of visual features in the primate temporal lobe. Nat Commun 2023; 14:5931. [PMID: 37739988 PMCID: PMC10517140 DOI: 10.1038/s41467-023-41584-0]
Abstract
The inferotemporal cortex supports our supreme object recognition ability. Numerous studies have been conducted to elucidate the functional organization of this brain area, but there are still important questions that remain unanswered, including how this organization differs between humans and non-human primates. Here, we use deep neural networks trained on object categorization to construct a 25-dimensional space of visual features, and systematically measure the spatial organization of feature preference in both male monkey brains and human brains using fMRI. These feature maps allow us to predict the selectivity of a previously unknown region in monkey brains, which is corroborated by additional fMRI and electrophysiology experiments. These maps also enable quantitative analyses of the topographic organization of the temporal lobe, demonstrating the existence of a pair of orthogonal gradients that differ in spatial scale and revealing significant differences in the functional organization of high-level visual areas between monkey and human brains.
Affiliation(s)
- Mengna Yao
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Bincheng Wen
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Mingpo Yang
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
- Jiebin Guo
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
- Haozhou Jiang
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
- Chao Feng
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
- Yilei Cao
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
- Huiguang He
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Le Chang
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
32
Ozcelik F, VanRullen R. Natural scene reconstruction from fMRI signals using generative latent diffusion. Sci Rep 2023; 13:15666. [PMID: 37731047 PMCID: PMC10511448 DOI: 10.1038/s41598-023-42891-8]
Abstract
In neural decoding research, one of the most intriguing topics is the reconstruction of perceived natural images based on fMRI signals. Previous studies have succeeded in re-creating different aspects of the visuals, such as low-level properties (shape, texture, layout) or high-level features (category of objects, descriptive semantics of scenes) but have typically failed to reconstruct these properties together for complex scene images. Generative AI has recently made a leap forward with latent diffusion models capable of generating high-complexity images. Here, we investigate how to take advantage of this innovative technology for brain decoding. We present a two-stage scene reconstruction framework called "Brain-Diffuser". In the first stage, starting from fMRI signals, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Variational Autoencoder) model. In the second stage, we use the image-to-image framework of a latent diffusion model (Versatile Diffusion) conditioned on predicted multimodal (text and visual) features, to generate final reconstructed images. On the publicly available Natural Scenes Dataset benchmark, our method outperforms previous models both qualitatively and quantitatively. When applied to synthetic fMRI patterns generated from individual ROI (region-of-interest) masks, our trained model creates compelling "ROI-optimal" scenes consistent with neuroscientific knowledge. Thus, the proposed methodology can have an impact on both applied (e.g. brain-computer interface) and fundamental neuroscience.
Affiliation(s)
- Furkan Ozcelik
- CerCo, CNRS UMR5549, Toulouse, France
- Université de Toulouse, Toulouse, France
- Rufin VanRullen
- CerCo, CNRS UMR5549, Toulouse, France
- Université de Toulouse, Toulouse, France
- ANITI, Toulouse, France
33
Vinken K, Prince JS, Konkle T, Livingstone MS. The neural code for "face cells" is not face-specific. Sci Adv 2023; 9:eadg1736. [PMID: 37647400 PMCID: PMC10468123 DOI: 10.1126/sciadv.adg1736]
Abstract
Face cells are neurons that respond more to faces than to non-face objects. They are found in clusters in the inferotemporal cortex, are thought to process faces specifically, and have therefore been studied almost exclusively using faces. Analyzing neural responses in and around macaque face patches to hundreds of objects, we found graded response profiles for non-face objects that predicted the degree of face selectivity and provided information on face-cell tuning beyond that from actual faces. This relationship between non-face and face responses was not predicted by color and simple shape properties but by information encoded in deep neural networks trained on general object classification rather than face classification. These findings contradict the long-standing assumption that face versus non-face selectivity emerges from face-specific features and challenge the practice of focusing on only the most effective stimulus. They provide evidence instead that category-selective neurons are best understood by their tuning directions in a domain-general object space.
Affiliation(s)
- Kasper Vinken
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Jacob S. Prince
- Department of Psychology, Harvard University, Cambridge, MA 02478, USA
- Talia Konkle
- Department of Psychology, Harvard University, Cambridge, MA 02478, USA
34
Yang Y, Liu X, Li W, Li C, Ma G, Yang G, Ren J, Ge S. Detection of Hindwing Landmarks Using Transfer Learning and High-Resolution Networks. Biology 2023; 12:1006. [PMID: 37508435 PMCID: PMC10376506 DOI: 10.3390/biology12071006]
Abstract
Hindwing venation is one of the most important morphological features for the functional and evolutionary analysis of beetles, as it is one of the key features used for the analysis of beetle flight performance and the design of beetle-like flapping wing micro aerial vehicles. However, manual landmark annotation for hindwing morphological analysis is a time-consuming process hindering the development of wing morphology research. In this paper, we present a novel approach for the detection of landmarks on the hindwings of leaf beetles (Coleoptera, Chrysomelidae) using a limited number of samples. The proposed method entails the transfer of a pre-existing model, trained on a large natural image dataset, to the specific domain of leaf beetle hindwings. This is achieved by using a deep high-resolution network as the backbone. The low-stage network parameters are frozen, while the high-stage parameters are re-trained to construct a leaf beetle hindwing landmark detection model. A leaf beetle hindwing landmark dataset was constructed, and the network was trained on varying numbers of randomly selected hindwing samples. The results demonstrate that the average detection normalized mean error for specific landmarks of leaf beetle hindwings (100 samples) remains below 0.02 and reaches only 0.045 when a mere three samples are used for training. Comparative analyses reveal that the proposed approach outperforms a prevalently used method (i.e., a deep residual network). This study showcases the practicability of employing natural images (specifically, those in ImageNet) for pre-training leaf beetle hindwing landmark detection models, providing a promising approach for insect wing venation digitization.
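The freeze-and-retrain recipe can be illustrated generically (a hypothetical numpy stand-in, not the paper's HRNet): a fixed "pretrained" low-stage feature extractor feeds a landmark-regression head, and only the head is retrained on a small sample:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical "pretrained" low-stage weights: frozen during fine-tuning.
W_low = rng.standard_normal((16, 8)) * 0.25

def features(X):
    # Frozen low-stage feature extractor (fixed weights + ReLU).
    return np.maximum(X @ W_low, 0.0)

# A tiny fine-tuning set: 30 "wing images" (16-d inputs) with 4 landmark
# coordinates each (all values are random stand-ins).
X = rng.standard_normal((30, 16))
Y = rng.standard_normal((30, 4))

# Retrain only the high-stage head by gradient descent on squared error.
W_head = np.zeros((8, 4))
mse0 = float(np.mean((features(X) @ W_head - Y) ** 2))  # error before training
for _ in range(500):
    H = features(X)
    W_head -= 0.01 * H.T @ (H @ W_head - Y) / len(X)
mse = float(np.mean((features(X) @ W_head - Y) ** 2))   # error after training
```

Freezing the low-stage parameters preserves the generic features learned from the large dataset, which is what lets the head fit from as few as three real samples.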
Affiliation(s)
- Yi Yang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaokun Liu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Wenjie Li
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Congqiao Li
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Ge Ma
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Guangqin Yang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Jing Ren
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Siqin Ge
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
35
Huber LS, Geirhos R, Wichmann FA. The developmental trajectory of object recognition robustness: Children are like small adults but unlike big deep neural networks. J Vis 2023; 23:4. [PMID: 37410494 PMCID: PMC10337805 DOI: 10.1167/jov.23.7.4]
Abstract
In laboratory object recognition tasks based on undistorted photographs, both adult humans and deep neural networks (DNNs) perform close to ceiling. Unlike adults, whose object recognition performance is robust against a wide range of image distortions, DNNs trained on standard ImageNet (1.3M images) perform poorly on distorted images. However, the last 2 years have seen impressive gains in DNN distortion robustness, predominantly achieved through ever-larger datasets, orders of magnitude larger than ImageNet. Although this simple brute-force approach is very effective in achieving human-level robustness in DNNs, it raises the question of whether human robustness, too, is simply due to extensive experience with (distorted) visual input during childhood and beyond. Here we investigate this question by comparing the core object recognition performance of 146 children (aged 4-15 years) against adults and against DNNs. We find, first, that already 4- to 6-year-olds show remarkable robustness to image distortions and outperform DNNs trained on ImageNet. Second, we estimated the number of images children had been exposed to during their lifetime. Compared with various DNNs, children's high robustness requires relatively little data. Third, when recognizing objects, children (like adults but unlike DNNs) rely heavily on shape but not on texture cues. Together our results suggest that the remarkable robustness to distortions emerges early in the developmental trajectory of human object recognition and is unlikely the result of a mere accumulation of experience with distorted visual input. Even though current DNNs match human performance regarding robustness, they seem to rely on different and more data-hungry strategies to do so.
Affiliation(s)
- Lukas S Huber
- Department of Psychology, University of Bern, Bern, Switzerland
- Neural Information Processing Group, University of Tübingen, Tübingen, Germany
- https://orcid.org/0000-0002-7755-6926
- Robert Geirhos
- Neural Information Processing Group, University of Tübingen, Tübingen, Germany
- https://orcid.org/0000-0001-7698-3187
- Felix A Wichmann
- Neural Information Processing Group, University of Tübingen, Tübingen, Germany
- https://orcid.org/0000-0002-2592-634X
36
Doshi FR, Konkle T. Cortical topographic motifs emerge in a self-organized map of object space. Sci Adv 2023; 9:eade8187. [PMID: 37343093 PMCID: PMC10284546 DOI: 10.1126/sciadv.ade8187]
Abstract
The human ventral visual stream has a highly systematic organization of object information, but the causal pressures driving these topographic motifs are highly debated. Here, we use self-organizing principles to learn a topographic representation of the data manifold of a deep neural network representational space. We find that a smooth mapping of this representational space showed many brain-like motifs, with a large-scale organization by animacy and real-world object size, supported by mid-level feature tuning, with naturally emerging face- and scene-selective regions. While some theories of the object-selective cortex posit that these differently tuned regions of the brain reflect a collection of distinctly specified functional modules, the present work provides computational support for an alternate hypothesis that the tuning and topography of the object-selective cortex reflect a smooth mapping of a unified representational space.
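The self-organizing principle can be sketched in miniature (a hypothetical 1-D Kohonen map over a 2-D stand-in "representational space"; the study mapped a deep network's much higher-dimensional feature space onto a 2-D sheet):

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in representational space: 500 points in 2-D.
data = rng.standard_normal((500, 2))

n_units, n_steps = 20, 2000
W = rng.standard_normal((n_units, 2)) * 0.1        # map unit weights
for t in range(n_steps):
    x = data[rng.integers(len(data))]
    bmu = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # best-matching unit
    frac = 1.0 - t / n_steps
    lr = 0.5 * frac                                 # decaying learning rate
    sigma = 0.5 + 3.0 * frac                        # shrinking neighborhood
    d = np.abs(np.arange(n_units) - bmu)            # distance on the 1-D grid
    h = np.exp(-d**2 / (2 * sigma**2))              # neighborhood function
    W += lr * h[:, None] * (x - W)                  # pull neighbors toward x

# Topography: adjacent map units should end up closer in feature space
# than arbitrary unit pairs.
adj = np.linalg.norm(W[1:] - W[:-1], axis=1).mean()
pairs = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=-1)
allpair = pairs[np.triu_indices(n_units, k=1)].mean()
```

The neighborhood update is what produces a smooth mapping: nearby units are pulled toward the same inputs, so similarly tuned units cluster, which is the mechanism the paper proposes for cortical motifs such as face- and scene-selective patches.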
Affiliation(s)
- Fenil R. Doshi
- Department of Psychology and Center for Brain Sciences, Harvard University, Cambridge, MA, USA
37
Marrazzo G, De Martino F, Lage-Castellanos A, Vaessen MJ, de Gelder B. Voxelwise encoding models of body stimuli reveal a representational gradient from low-level visual features to postural features in occipitotemporal cortex. Neuroimage 2023:120240. [PMID: 37348622 DOI: 10.1016/j.neuroimage.2023.120240]
Abstract
Research on body representation in the brain has focused on category-specific representation, using fMRI to investigate the response pattern to body stimuli in occipitotemporal cortex, but has so far not addressed the specific computations performed in body-selective regions, which are defined only by higher-order category selectivity. This study used ultra-high field fMRI and banded ridge regression to investigate the coding of body images, by comparing the performance of three encoding models in predicting brain activity in occipitotemporal cortex and specifically the extrastriate body area (EBA). Our results suggest that bodies are encoded in occipitotemporal cortex and in the EBA according to a combination of low-level visual features and postural features.
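Banded ridge regression assigns each feature space its own penalty; a minimal sketch on synthetic data (two hypothetical bands standing in for low-level visual and postural features, with only the first band actually driving the simulated voxel):

```python
import numpy as np

rng = np.random.default_rng(5)

# Two feature "bands" for 200 stimuli: band 1 (10 low-level features)
# drives the voxel; band 2 (5 postural features) does not.
X1 = rng.standard_normal((200, 10))
X2 = rng.standard_normal((200, 5))
y = X1 @ rng.standard_normal(10) + 0.1 * rng.standard_normal(200)

# Banded ridge: one penalty per band (a block-diagonal regularizer),
# rather than a single ridge penalty shared across all features.
lam1, lam2 = 1.0, 100.0
X = np.hstack([X1, X2])
P = np.diag([lam1] * 10 + [lam2] * 5)
w = np.linalg.solve(X.T @ X + P, X.T @ y)
w1, w2 = w[:10], w[10:]   # per-band weight estimates
```

Fitting the per-band penalties (here fixed for illustration; in practice chosen by cross-validation) lets the model apportion variance between feature spaces of very different scales and informativeness, which is how the study compares low-level and postural contributions per voxel.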
Affiliation(s)
- Giuseppe Marrazzo
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Limburg 6200 MD, Maastricht, The Netherlands
- Federico De Martino
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Limburg 6200 MD, Maastricht, The Netherlands; Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, MN 55455, United States and Department of NeuroInformatics
- Agustin Lage-Castellanos
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Limburg 6200 MD, Maastricht, The Netherlands; Cuban Center for Neuroscience, Street 190 e/25 and 27 Cubanacán Playa Havana, CP 11600, Cuba
- Maarten J Vaessen
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Limburg 6200 MD, Maastricht, The Netherlands
- Beatrice de Gelder
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Limburg 6200 MD, Maastricht, The Netherlands
38
Schwartz E, Alreja A, Richardson RM, Ghuman A, Anzellotti S. Intracranial Electroencephalography and Deep Neural Networks Reveal Shared Substrates for Representations of Face Identity and Expressions. J Neurosci 2023; 43:4291-4303. [PMID: 37142430 PMCID: PMC10255163 DOI: 10.1523/jneurosci.1277-22.2023]
Abstract
According to a classical view of face perception (Bruce and Young, 1986; Haxby et al., 2000), face identity and facial expression recognition are performed by separate neural substrates (ventral and lateral temporal face-selective regions, respectively). However, recent studies challenge this view, showing that expression valence can also be decoded from ventral regions (Skerry and Saxe, 2014; Li et al., 2019), and identity from lateral regions (Anzellotti and Caramazza, 2017). These findings could be reconciled with the classical view if regions specialized for one task (either identity or expression) contain a small amount of information for the other task (that enables above-chance decoding). In this case, we would expect representations in lateral regions to be more similar to representations in deep convolutional neural networks (DCNNs) trained to recognize facial expression than to representations in DCNNs trained to recognize face identity (the converse should hold for ventral regions). We tested this hypothesis by analyzing neural responses to faces varying in identity and expression. Representational dissimilarity matrices (RDMs) computed from human intracranial recordings (n = 11 adults; 7 females) were compared with RDMs from DCNNs trained to label either identity or expression. We found that RDMs from DCNNs trained to recognize identity correlated with intracranial recordings more strongly in all regions tested-even in regions classically hypothesized to be specialized for expression. These results deviate from the classical view, suggesting that face-selective ventral and lateral regions contribute to the representation of both identity and expression.SIGNIFICANCE STATEMENT Previous work proposed that separate brain regions are specialized for the recognition of face identity and facial expression. However, identity and expression recognition mechanisms might share common brain regions instead. 
We tested these alternatives using deep neural networks and intracranial recordings from face-selective brain regions. Deep neural networks trained to recognize identity and networks trained to recognize expression learned representations that correlate with neural recordings. Identity-trained representations correlated with intracranial recordings more strongly in all regions tested, including regions hypothesized to be expression specialized in the classical hypothesis. These findings support the view that identity and expression recognition rely on common brain regions. This discovery may require reevaluation of the roles that the ventral and lateral neural pathways play in processing socially relevant stimuli.
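The RDM comparison at the core of this study can be sketched in a few lines. The data below are synthetic stand-ins; the array shapes, noise level, and variable names are illustrative, not the paper's:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses):
    """Condensed RDM: one correlation distance per stimulus pair.
    responses: (n_stimuli, n_features), one row per face image."""
    return pdist(responses, metric="correlation")

rng = np.random.default_rng(0)
n_stim = 40
neural = rng.standard_normal((n_stim, 64))                # stand-in for iEEG responses
model = neural + 0.5 * rng.standard_normal((n_stim, 64))  # stand-in for DCNN features

# Rank-correlate the two representational geometries over stimulus pairs
rho, p = spearmanr(rdm(neural), rdm(model))
print(f"model-brain RDM correlation: rho = {rho:.2f}")
```

In the study, the neural matrix would hold intracranial responses per face and the model matrix DCNN activations; the rank correlation of the two condensed RDMs is the model-brain similarity being compared across regions.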
Affiliation(s)
- Emily Schwartz
  - Department of Psychology and Neuroscience, Boston College, Chestnut Hill, Massachusetts 02467
- Arish Alreja
  - Center for the Neural Basis of Cognition, Carnegie Mellon University/University of Pittsburgh, Pittsburgh, Pennsylvania 15213
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
  - Department of Neurological Surgery, University of Pittsburgh Medical Center Presbyterian, Pittsburgh, Pennsylvania 15213
- R Mark Richardson
  - Department of Neurosurgery, Massachusetts General Hospital, Boston, Massachusetts 02114
  - Harvard Medical School, Boston, Massachusetts 02115
- Avniel Ghuman
  - Center for the Neural Basis of Cognition, Carnegie Mellon University/University of Pittsburgh, Pittsburgh, Pennsylvania 15213
  - Department of Neurological Surgery, University of Pittsburgh Medical Center Presbyterian, Pittsburgh, Pennsylvania 15213
  - Center for Neuroscience, University of Pittsburgh, Pittsburgh, Pennsylvania 15260
- Stefano Anzellotti
  - Department of Psychology and Neuroscience, Boston College, Chestnut Hill, Massachusetts 02467

39
Doerig A, Sommers RP, Seeliger K, Richards B, Ismael J, Lindsay GW, Kording KP, Konkle T, van Gerven MAJ, Kriegeskorte N, Kietzmann TC. The neuroconnectionist research programme. Nat Rev Neurosci 2023. [PMID: 37253949 DOI: 10.1038/s41583-023-00705-w]
Abstract
Artificial neural networks (ANNs) inspired by biology are beginning to be widely used to model behavioural and neural data, an approach we call 'neuroconnectionism'. ANNs have been not only lauded as the current best models of information processing in the brain but also criticized for failing to account for basic cognitive functions. In this Perspective article, we propose that arguing about the successes and failures of a restricted set of current ANNs is the wrong approach to assess the promise of neuroconnectionism for brain science. Instead, we take inspiration from the philosophy of science, and in particular from Lakatos, who showed that the core of a scientific research programme is often not directly falsifiable but should be assessed by its capacity to generate novel insights. Following this view, we present neuroconnectionism as a general research programme centred around ANNs as a computational language for expressing falsifiable theories about brain computation. We describe the core of the programme, the underlying computational framework and its tools for testing specific neuroscientific hypotheses and deriving novel understanding. Taking a longitudinal view, we review past and present neuroconnectionist projects and their responses to challenges and argue that the research programme is highly progressive, generating new and otherwise unreachable insights into the workings of the brain.
Affiliation(s)
- Adrien Doerig
  - Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
  - Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
- Rowan P Sommers
  - Department of Neurobiology of Language, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Katja Seeliger
  - Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Blake Richards
  - Department of Neurology and Neurosurgery, McGill University, Montréal, QC, Canada
  - School of Computer Science, McGill University, Montréal, QC, Canada
  - Mila, Montréal, QC, Canada
  - Montréal Neurological Institute, Montréal, QC, Canada
  - Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Konrad P Kording
  - Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
  - Bioengineering, Neuroscience, University of Pennsylvania, Philadelphia, PA, USA
- Tim C Kietzmann
  - Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany

40
Bracci S, Mraz J, Zeman A, Leys G, Op de Beeck H. The representational hierarchy in human and artificial visual systems in the presence of object-scene regularities. PLoS Comput Biol 2023; 19:e1011086. [PMID: 37115763 PMCID: PMC10171658 DOI: 10.1371/journal.pcbi.1011086]
Abstract
Human vision is still largely unexplained. Computer vision has made impressive progress on this front, but it is still unclear to what extent artificial neural networks approximate human object vision at the behavioral and neural levels. Here, we investigated whether machine object vision mimics the representational hierarchy of human object vision with an experimental design that allows testing within-domain representations for animals and scenes, as well as across-domain representations reflecting their real-world contextual regularities, such as animal-scene pairs that often co-occur in the visual environment. We found that DCNNs trained on object recognition acquire representations, in their late processing stage, that closely capture human conceptual judgements about the co-occurrence of animals and their typical scenes. Likewise, the DCNNs' representational hierarchy shows surprising similarities with the representational transformations emerging in domain-specific ventrotemporal areas up to domain-general frontoparietal areas. Despite these remarkable similarities, the underlying information processing differs. The ability of neural networks to learn a human-like high-level conceptual representation of object-scene co-occurrence depends upon the amount of object-scene co-occurrence present in the image set, highlighting the fundamental role of training history. Further, although mid/high-level DCNN layers represent the category division for animals and scenes as observed in VTC, their information content shows reduced domain-specific representational richness. To conclude, by testing within- and between-domain selectivity while manipulating contextual regularities, we reveal unknown similarities and differences in the information processing strategies employed by human and artificial visual systems.
Affiliation(s)
- Stefania Bracci
  - Center for Mind/Brain Sciences-CIMeC, University of Trento, Rovereto, Italy
  - KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Jakob Mraz
  - KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Astrid Zeman
  - KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Gaëlle Leys
  - KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Hans Op de Beeck
  - KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium

41
Nakai T, Nishimoto S. Artificial neural network modelling of the neural population code underlying mathematical operations. Neuroimage 2023; 270:119980. [PMID: 36848969 DOI: 10.1016/j.neuroimage.2023.119980]
Abstract
Mathematical operations have long been regarded as a sparse, symbolic process in neuroimaging studies. In contrast, advances in artificial neural networks (ANN) have enabled extracting distributed representations of mathematical operations. Recent neuroimaging studies have compared distributed representations of the visual, auditory and language domains in ANNs and biological neural networks (BNNs). However, such a relationship has not yet been examined in mathematics. Here we hypothesise that ANN-based distributed representations can explain brain activity patterns of symbolic mathematical operations. We used the fMRI data of a series of mathematical problems with nine different combinations of operators to construct voxel-wise encoding/decoding models using both sparse operator and latent ANN features. Representational similarity analysis demonstrated shared representations between ANN and BNN, an effect particularly evident in the intraparietal sulcus. Feature-brain similarity (FBS) analysis served to reconstruct a sparse representation of mathematical operations based on distributed ANN features in each cortical voxel. Such reconstruction was more efficient when using features from deeper ANN layers. Moreover, latent ANN features allowed the decoding of novel operators not used during model training from brain activity. The current study provides novel insights into the neural code underlying mathematical thought.
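A voxel-wise encoding model of the kind described here reduces, in its simplest form, to regularized linear regression from ANN features to voxel responses. This is a generic sketch on synthetic data; all shapes, the noise level, and the ridge penalty are hypothetical:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_train, n_test, n_feat, n_vox = 200, 50, 30, 100

X = rng.standard_normal((n_train + n_test, n_feat))   # latent ANN features per problem
true_W = rng.standard_normal((n_feat, n_vox))         # hidden feature-to-voxel weights
Y = X @ true_W + 0.1 * rng.standard_normal((n_train + n_test, n_vox))

# Fit one regularized linear map from features to all voxels at once
enc = Ridge(alpha=1.0).fit(X[:n_train], Y[:n_train])
pred = enc.predict(X[n_train:])

def voxelwise_r(a, b):
    """Pearson r between matching columns of two (n_samples, n_vox) arrays."""
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)

r = voxelwise_r(pred, Y[n_train:])
print(f"median held-out voxel r = {np.median(r):.2f}")
```

The per-voxel held-out correlations play the role of encoding-model accuracy; the decoding direction simply swaps the roles of features and responses.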
Affiliation(s)
- Tomoya Nakai
  - Center for Information and Neural Networks, National Institute of Information and Communications Technology, Suita, Japan
  - Lyon Neuroscience Research Center (CRNL), INSERM U1028 - CNRS UMR5292, University of Lyon, Bron, France
- Shinji Nishimoto
  - Center for Information and Neural Networks, National Institute of Information and Communications Technology, Suita, Japan
  - Graduate School of Frontier Biosciences, Osaka University, Suita, Japan
  - Graduate School of Medicine, Osaka University, Suita, Japan

42
Bracci S, Op de Beeck HP. Understanding Human Object Vision: A Picture Is Worth a Thousand Representations. Annu Rev Psychol 2023; 74:113-135. [PMID: 36378917 DOI: 10.1146/annurev-psych-032720-041031]
Abstract
Objects are the core meaningful elements in our visual environment. Classic theories of object vision focus upon object recognition and are elegant and simple. Some of their proposals still stand, yet the simplicity is gone. Recent evolutions in behavioral paradigms, neuroscientific methods, and computational modeling have allowed vision scientists to uncover the complexity of the multidimensional representational space that underlies object vision. We review these findings and propose that the key to understanding this complexity is to relate object vision to the full repertoire of behavioral goals that underlie human behavior, running far beyond object recognition. There might be no such thing as core object recognition, and if it exists, then its importance is more limited than traditionally thought.
Affiliation(s)
- Stefania Bracci
  - Center for Mind/Brain Sciences, University of Trento, Rovereto, Italy
- Hans P Op de Beeck
  - Leuven Brain Institute, Research Unit Brain & Cognition, KU Leuven, Leuven, Belgium

43
Kanwisher N, Gupta P, Dobs K. CNNs reveal the computational implausibility of the expertise hypothesis. iScience 2023; 26:105976. [PMID: 36794151 PMCID: PMC9923184 DOI: 10.1016/j.isci.2023.105976]
Abstract
Face perception has long served as a classic example of domain specificity of mind and brain. But an alternative "expertise" hypothesis holds that putatively face-specific mechanisms are actually domain-general, and can be recruited for the perception of other objects of expertise (e.g., cars for car experts). Here, we demonstrate the computational implausibility of this hypothesis: Neural network models optimized for generic object categorization provide a better foundation for expert fine-grained discrimination than do models optimized for face recognition.
Affiliation(s)
- Nancy Kanwisher
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Pranjul Gupta
  - Department of Psychology, Justus-Liebig University Giessen, 35394 Giessen, Germany
- Katharina Dobs (corresponding author)
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Psychology, Justus-Liebig University Giessen, 35394 Giessen, Germany
  - Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus-Liebig University, 35032 Marburg, Germany

44
Gu Z, Jamison K, Sabuncu M, Kuceyeski A. Personalized visual encoding model construction with small data. Commun Biol 2022; 5:1382. [PMID: 36528715 PMCID: PMC9759560 DOI: 10.1038/s42003-022-04347-z]
Abstract
Quantifying population heterogeneity in brain stimulus-response mapping may offer insight into variability in bottom-up neural systems that can in turn be related to an individual's behavior or pathological state. Encoding models that predict brain responses to stimuli are one way to capture this relationship. However, they generally need a large amount of fMRI data to achieve optimal accuracy. Here, we propose an ensemble approach to create encoding models for novel individuals with relatively little data by modeling each subject's predicted response vector as a linear combination of the other subjects' predicted response vectors. We show that these ensemble encoding models, trained with hundreds of image-response pairs, achieve accuracy not different from models trained on 20,000 image-response pairs. Importantly, the ensemble encoding models preserve patterns of inter-individual differences in the image-response relationship. We also show the proposed approach is robust against domain shift by validating on data from a different scanner and experimental setup. Additionally, we show that the ensemble encoding models are able to discover inter-individual differences in various face areas' responses to images of animal vs. human faces using the recently developed NeuroGen framework. Our approach shows the potential to use existing densely sampled data, i.e., large amounts of data collected from a single individual, to efficiently create accurate, personalized encoding models and, subsequently, personalized optimal synthetic images for new individuals scanned under different experimental conditions.
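The ensemble idea, modeling a new subject's response as a linear combination of other subjects' predicted responses, can be illustrated with synthetic data. The subject count, mixing weights, and noise level below are invented for the sketch:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n_imgs, n_base = 300, 5

# Hypothetical predicted responses of 5 densely sampled "base" subjects
base_preds = rng.standard_normal((n_imgs, n_base))

# New subject's responses: an (unknown) mixture of base subjects plus noise
true_w = np.array([0.5, 0.2, 0.1, 0.1, 0.1])
new_subj = base_preds @ true_w + 0.05 * rng.standard_normal(n_imgs)

# Fit ensemble weights from only a handful of image-response pairs
n_small = 40
ens = LinearRegression().fit(base_preds[:n_small], new_subj[:n_small])
pred = ens.predict(base_preds[n_small:])
r = np.corrcoef(pred, new_subj[n_small:])[0, 1]
print(f"held-out prediction r = {r:.2f}")
```

Because only the handful of mixing weights is estimated from the new subject's data, far fewer image-response pairs are needed than for fitting a full encoding model from scratch.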
Affiliation(s)
- Zijin Gu
  - School of Electrical and Computer Engineering, Cornell University, Ithaca, NY, USA
- Keith Jamison
  - Department of Radiology, Weill Cornell Medicine, New York, NY, USA
- Mert Sabuncu
  - School of Electrical and Computer Engineering, Cornell University, Ithaca, NY, USA
  - Department of Radiology, Weill Cornell Medicine, New York, NY, USA
- Amy Kuceyeski
  - Department of Radiology, Weill Cornell Medicine, New York, NY, USA

45
Dissociation and hierarchy of human visual pathways for simultaneously coding facial identity and expression. Neuroimage 2022; 264:119769. [PMID: 36435341 DOI: 10.1016/j.neuroimage.2022.119769]
Abstract
Humans have an extraordinary ability to recognize facial expression and identity from a single face simultaneously and effortlessly; however, the underlying neural computation is not well understood. Here, we optimized a multi-task deep neural network to classify facial expression and identity simultaneously. Under various optimization training strategies, the best-performing model consistently showed a 'share-separate' organization. The two separate branches of the best-performing model also exhibited distinct abilities to categorize facial expression and identity, and these abilities increased along the facial expression or identity branches toward higher layers. By comparing the representational similarities between the best-performing model and functional magnetic resonance imaging (fMRI) responses in the human visual cortex to the same face stimuli, we found that the face-selective posterior superior temporal sulcus (pSTS) in the dorsal visual cortex was significantly correlated with layers in the expression branch of the model, and that the anterior inferotemporal cortex (aIT) and anterior fusiform face area (aFFA) in the ventral visual cortex were significantly correlated with layers in the identity branch of the model. Moreover, the aFFA and aIT better matched the high layers of the model, while the posterior FFA (pFFA) and occipital face area (OFA) better matched the middle and early layers, respectively. Overall, our study provides a task-optimized computational model to better understand the neural mechanisms underlying face recognition, suggesting that, like the best-performing model, the human visual system exhibits both dissociated and hierarchical neuroanatomical organization when simultaneously coding facial identity and expression.
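The 'share-separate' organization can be sketched as a forward pass through one shared trunk feeding two task branches. Layer sizes and class counts below are hypothetical, and no training is shown:

```python
import numpy as np

rng = np.random.default_rng(3)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical sizes: 128-d face features, 64-d shared layer,
# 40 identities and 7 expression classes (all invented for the sketch)
W_shared = 0.1 * rng.standard_normal((128, 64))
W_id = 0.1 * rng.standard_normal((64, 40))      # identity branch
W_expr = 0.1 * rng.standard_normal((64, 7))     # expression branch

def forward(face):
    """Forward pass: shared trunk, then two task-specific branches."""
    h = relu(face @ W_shared)                   # shared representation
    return h @ W_id, h @ W_expr                 # per-branch logits

id_logits, expr_logits = forward(rng.standard_normal(128))
print(id_logits.shape, expr_logits.shape)
```

In the multi-task setting described above, the two branch outputs would each feed a classification loss, and the shared trunk is updated by both.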
46
Bowers JS, Malhotra G, Dujmović M, Llera Montero M, Tsvetkov C, Biscione V, Puebla G, Adolfi F, Hummel JE, Heaton RF, Evans BD, Mitchell J, Blything R. Deep problems with neural network models of human vision. Behav Brain Sci 2022; 46:e385. [PMID: 36453586 DOI: 10.1017/s0140525x22002813]
Abstract
Deep neural networks (DNNs) have had extraordinary successes in classifying photographic images of objects and are often described as the best models of biological vision. This conclusion is largely based on three sets of findings: (1) DNNs are more accurate than any other model in classifying images taken from various datasets, (2) DNNs do the best job in predicting the pattern of human errors in classifying objects taken from various behavioral datasets, and (3) DNNs do the best job in predicting brain signals in response to images taken from various brain datasets (e.g., single cell responses or fMRI data). However, these behavioral and brain datasets do not test hypotheses regarding what features are contributing to good predictions and we show that the predictions may be mediated by DNNs that share little overlap with biological vision. More problematically, we show that DNNs account for almost no results from psychological research. This contradicts the common claim that DNNs are good, let alone the best, models of human object recognition. We argue that theorists interested in developing biologically plausible models of human vision need to direct their attention to explaining psychological findings. More generally, theorists need to build models that explain the results of experiments that manipulate independent variables designed to test hypotheses rather than compete on making the best predictions. We conclude by briefly summarizing various promising modeling approaches that focus on psychological data.
Affiliation(s)
- Jeffrey S Bowers
  - School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Gaurav Malhotra
  - School of Psychological Science, University of Bristol, Bristol, UK
- Marin Dujmović
  - School of Psychological Science, University of Bristol, Bristol, UK
- Milton Llera Montero
  - School of Psychological Science, University of Bristol, Bristol, UK
- Christian Tsvetkov
  - School of Psychological Science, University of Bristol, Bristol, UK
- Valerio Biscione
  - School of Psychological Science, University of Bristol, Bristol, UK
- Guillermo Puebla
  - School of Psychological Science, University of Bristol, Bristol, UK
- Federico Adolfi
  - School of Psychological Science, University of Bristol, Bristol, UK
  - Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
- John E Hummel
  - Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Rachel F Heaton
  - Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Benjamin D Evans
  - Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Jeffrey Mitchell
  - Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Ryan Blything
  - School of Psychology, Aston University, Birmingham, UK

47
Ayzenberg V, Behrmann M. Does the brain's ventral visual pathway compute object shape? Trends Cogn Sci 2022; 26:1119-1132. [PMID: 36272937 PMCID: PMC11669366 DOI: 10.1016/j.tics.2022.09.019]
Abstract
A rich behavioral literature has shown that human object recognition is supported by a representation of shape that is tolerant to variations in an object's appearance. Such 'global' shape representations are achieved by describing objects via the spatial arrangement of their local features, or structure, rather than by the appearance of the features themselves. However, accumulating evidence suggests that the ventral visual pathway - the primary substrate underlying object recognition - may not represent global shape. Instead, ventral representations may be better described as a basis set of local image features. We suggest that this evidence forces a reevaluation of the role of the ventral pathway in object perception and posits a broader network for shape perception that encompasses contributions from the dorsal pathway.
Affiliation(s)
- Vladislav Ayzenberg
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Psychology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Marlene Behrmann
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Psychology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Department of Ophthalmology, University of Pittsburgh, Pittsburgh, PA 15260, USA

48
Khosla M, Ratan Murty NA, Kanwisher N. A highly selective response to food in human visual cortex revealed by hypothesis-free voxel decomposition. Curr Biol 2022; 32:4159-4171.e9. [PMID: 36027910 PMCID: PMC9561032 DOI: 10.1016/j.cub.2022.08.009]
Abstract
Prior work has identified cortical regions selectively responsive to specific categories of visual stimuli. However, this hypothesis-driven work cannot reveal how prominent these category selectivities are in the overall functional organization of the visual cortex, or what others might exist that scientists have not thought to look for. Furthermore, standard voxel-wise tests cannot detect distinct neural selectivities that coexist within voxels. To overcome these limitations, we used data-driven voxel decomposition methods to identify the main components underlying fMRI responses to thousands of complex photographic images. Our hypothesis-neutral analysis rediscovered components selective for faces, places, bodies, and words, validating our method and showing that these selectivities are dominant features of the ventral visual pathway. The analysis also revealed an unexpected component with a distinct anatomical distribution that responded highly selectively to images of food. Alternative accounts based on low- to mid-level visual features, such as color, shape, or texture, failed to account for the food selectivity of this component. High-throughput testing and control experiments with matched stimuli on a highly accurate computational model of this component confirm its selectivity for food. We registered our methods and hypotheses before replicating them on held-out participants and in a novel dataset. These findings demonstrate the power of data-driven methods and show that the dominant neural responses of the ventral visual pathway include not only selectivities for faces, scenes, bodies, and words but also the visually heterogeneous category of food, thus constraining accounts of when and why functional specialization arises in the cortex.
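Data-driven voxel decomposition of this flavor can be approximated with non-negative matrix factorization over a voxel-by-image response matrix. The authors' exact method may differ; this is a generic NMF sketch on synthetic data with invented dimensions:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(4)
n_vox, n_imgs, n_comp = 200, 100, 4

# Synthetic data: each voxel's response mixes a few non-negative components
true_W = rng.random((n_vox, n_comp)) ** 2          # voxel weights per component
true_H = rng.random((n_comp, n_imgs))              # component response profiles
responses = true_W @ true_H + 0.01 * rng.random((n_vox, n_imgs))

model = NMF(n_components=n_comp, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(responses)   # (n_vox, n_comp) voxel loadings
H = model.components_                # (n_comp, n_imgs) per-image profiles

rel_err = np.linalg.norm(responses - W @ H) / np.linalg.norm(responses)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Because a voxel can load on several components, a decomposition like this can recover distinct selectivities (e.g., faces and food) that coexist within single voxels, which standard voxel-wise contrasts cannot.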
Affiliation(s)
- Meenakshi Khosla
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- N Apurva Ratan Murty
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Nancy Kanwisher
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA

49
Arun SP. Trailblazers in Neuroscience: Using compositionality to understand how parts combine in whole objects. Eur J Neurosci 2022; 56:4378-4392. [PMID: 35760552 PMCID: PMC10084036 DOI: 10.1111/ejn.15746]
Abstract
A fundamental question for any visual system is whether its image representation can be understood in terms of its components. Decomposing any image into components is challenging because there are many possible decompositions with no common dictionary, and enumerating them leads to a combinatorial explosion. Even in perception, many objects are readily seen as containing parts, but there are many exceptions. These exceptions include objects that are not perceived as containing parts, properties like symmetry that cannot be localized to any single part, and also special categories like words and faces whose perception is widely believed to be holistic. Here, I describe a novel approach we have used to address these issues and evaluate compositionality at the behavioral and neural levels. The key design principle is to create a large number of objects by combining a small number of pre-defined components in all possible ways. This allows for building component-based models that explain whole objects using a combination of these components. Importantly, any systematic error in model fits can be used to detect the presence of emergent or holistic properties. Using this approach, we have found that whole object representations are surprisingly predictable from their components, that some components are preferred to others in perception, and that emergent properties can be discovered or explained using compositional models. Thus, compositionality is a powerful approach for understanding how whole objects relate to their parts.
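The design principle, predicting responses to whole objects from their parts and using systematic misfit to flag emergent properties, can be sketched with a linear part-sum model. The data here are synthetic and purely additive, so the compositional model fits perfectly; real neural data would show deviations wherever holistic properties arise:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n_parts = 6
part_resp = rng.standard_normal(n_parts)   # hypothetical response to each part alone

# Build every two-part object as a binary part-indicator vector
X, y = [], []
for i in range(n_parts):
    for j in range(i + 1, n_parts):
        parts = np.zeros(n_parts)
        parts[[i, j]] = 1.0
        X.append(parts)
        y.append(part_resp[i] + part_resp[j])   # purely additive whole-object response
X, y = np.array(X), np.array(y)

# Compositional model: whole-object response as a weighted sum of part terms;
# systematic residuals would indicate emergent or holistic properties
fit = LinearRegression().fit(X, y)
print(f"R^2 = {fit.score(X, y):.2f}")
```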
Affiliation(s)
- Arun SP
  - Centre for Neuroscience, Indian Institute of Science Bangalore

50
Kamps FS, Richardson H, Murty NAR, Kanwisher N, Saxe R. Using child-friendly movie stimuli to study the development of face, place, and object regions from age 3 to 12 years. Hum Brain Mapp 2022; 43:2782-2800. [PMID: 35274789 PMCID: PMC9120553 DOI: 10.1002/hbm.25815]
Abstract
Scanning young children while they watch short, engaging, commercially produced movies has emerged as a promising approach for increasing data retention and quality. Movie stimuli also evoke a richer variety of cognitive processes than traditional experiments, allowing the study of multiple aspects of brain development simultaneously. However, because these stimuli are uncontrolled, it is unclear how effectively distinct profiles of brain activity can be distinguished from the resulting data. Here we develop an approach for identifying multiple distinct subject-specific regions of interest (ssROIs) using fMRI data collected during movie viewing. We focused on the test case of higher-level visual regions selective for faces, scenes, and objects. Adults (N = 13) were scanned while viewing a 5.6-min child-friendly movie, as well as a traditional localizer experiment with blocks of faces, scenes, and objects. We found that just 2.7 min of movie data could identify subject-specific face, scene, and object regions. While successful, movie-defined ssROIs still showed weaker domain selectivity than traditional ssROIs. Having validated our approach in adults, we then used the same methods on movie data collected from 3- to 12-year-old children (N = 122). Movie response timecourses in 3-year-old children's face, scene, and object regions were already significantly and specifically predicted by timecourses from the corresponding regions in adults. We also found evidence of continued developmental change, particularly in the face-selective posterior superior temporal sulcus. Taken together, our results reveal both early maturity and functional change in face, scene, and object regions, and more broadly highlight the promise of short, child-friendly movies for developmental cognitive neuroscience.
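One simple way to define a movie-based ssROI is to keep voxels whose movie timecourse correlates with a reference (e.g., an adult group-average) timecourse above some threshold. This sketch uses synthetic timecourses; the threshold, voxel counts, and planted signal are arbitrary and not the paper's procedure in detail:

```python
import numpy as np

rng = np.random.default_rng(6)
n_t, n_vox = 168, 500                 # TRs and candidate voxels, both hypothetical

ref_tc = rng.standard_normal(n_t)     # reference (e.g., adult group-average) timecourse
voxel_tc = rng.standard_normal((n_t, n_vox))
voxel_tc[:, :20] += ref_tc[:, None]   # plant 20 voxels that share the reference profile

# Keep voxels whose movie timecourse tracks the reference above a set threshold
r = np.array([np.corrcoef(ref_tc, voxel_tc[:, v])[0, 1] for v in range(n_vox)])
ssroi = np.flatnonzero(r > 0.5)
print(f"{ssroi.size} voxels selected")
```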
Affiliation(s)
- Frederik S. Kamps
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Hilary Richardson
  - School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Edinburgh, UK
- N. Apurva Ratan Murty
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Nancy Kanwisher
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Rebecca Saxe
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA