1
Bognár A, Nejad GG, Rens G, Raman R, Vogels R. Expanding the stimulus domain: Co-occurrence of motion and body-category selectivity in the macaque ventral STS. Prog Neurobiol 2025; 249:102769. PMID: 40254177; PMCID: PMC12095119; DOI: 10.1016/j.pneurobio.2025.102769.
Abstract
The primate Superior Temporal Sulcus (STS) plays a pivotal role in the recognition of bodies and their actions, which is essential for survival and social interaction with conspecifics. Here, we show that, surprisingly, a sizable proportion of macaque middle ventral STS units are selective for both static bodies and random dot motion. These units represent random dot motion direction faithfully, with motion directions differing by 180 degrees represented distinctly, although they respond more strongly to complex optic flow patterns. This aligns with an fMRI experiment in which we show that the mid-STS body patch, defined by greater activation to static bodies compared to faces and objects, is also more strongly activated by moving random dot patterns than by static ones, especially when complex optic flow patterns are included. Body-selective units in the more anterior ventral STS demonstrate less pronounced random dot motion selectivity, mainly for complex optic flow patterns. Moreover, middle STS units, but rarely those of the anterior STS, respond selectively to dynamic dot patterns in which body parts are visible solely through motion, and their preferences correlate with those for videos of acting monkeys. Overall, these findings highlight an association between body and motion processing in the macaque ventral STS, which might result from the co-occurrence of body features and motion during the observation of bodily actions.
Affiliation(s)
- Anna Bognár
- Department of Neurosciences, KU Leuven, Leuven, Belgium; Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Ghazaleh Ghamkhari Nejad
- Department of Neurosciences, KU Leuven, Leuven, Belgium; Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Guy Rens
- Department of Neurosciences, KU Leuven, Leuven, Belgium; Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Rajani Raman
- Department of Neurosciences, KU Leuven, Leuven, Belgium; Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Rufin Vogels
- Department of Neurosciences, KU Leuven, Leuven, Belgium; Leuven Brain Institute, KU Leuven, Leuven, Belgium.
2
Hernández-Cámara P, Vila-Tomás J, Laparra V, Malo J. Dissecting the effectiveness of deep features as metric of perceptual image quality. Neural Netw 2025; 185:107189. PMID: 39874824; DOI: 10.1016/j.neunet.2025.107189.
Abstract
There is an open debate about the role of artificial networks in understanding the visual brain. Internal representations of images in artificial networks develop human-like properties. In particular, evaluating distortions using differences between internal features correlates with human perception of distortion. However, the origins of this correlation are not well understood. Here, we dissect the different factors involved in the emergence of human-like behavior: function, architecture, and environment. To do so, we evaluate the aforementioned human-network correlation at different depths of 46 pre-trained model configurations that include no psycho-visual information. The results show that most of the models correlate better with human opinion than SSIM (a de facto standard in subjective image quality). Moreover, some models are better than state-of-the-art networks specifically tuned for the application (LPIPS, DISTS). Regarding function, supervised classification leads to nets that correlate better with humans than the explored models for self-supervised and unsupervised tasks. However, we found that better performance on the task does not imply more human-like behavior. Regarding architecture, simpler models correlate better with humans than very deep nets, and the highest correlation is generally not achieved in the last layer. Finally, regarding environment, training with large natural datasets leads to higher correlations than training on smaller databases with restricted content, as expected. We also found that the best classification models are not the best for predicting human distances. In the general debate about understanding human vision, our empirical findings imply that explanations should not focus on a single abstraction level: function, architecture, and environment are all relevant.
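The core measurement here, correlating deep-feature distances with human opinion, can be sketched compactly. This is a minimal illustration rather than the authors' code: the choice of VGG-16, the truncation depth, and the Euclidean feature distance are illustrative assumptions, and refs, dists, and mos are hypothetical variables.

```python
import torch
import torchvision.models as models
from scipy.stats import spearmanr

# Truncate a pretrained network at an intermediate depth (both choices assumed).
net = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features[:16].eval()

def deep_distance(ref, dist):
    """Euclidean distance between internal features of two image batches."""
    with torch.no_grad():
        f_ref, f_dist = net(ref), net(dist)
    return torch.linalg.vector_norm(f_ref - f_dist, dim=(1, 2, 3)).numpy()

# refs, dists: (N, 3, H, W) image tensors; mos: human mean-opinion scores.
# The human-network correlation studied in the paper is then, e.g.:
# rho, _ = spearmanr(deep_distance(refs, dists), mos)
```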
Affiliation(s)
- Jorge Vila-Tomás
- Image Processing Lab., Universitat de València, 46980 Paterna, Spain.
- Valero Laparra
- Image Processing Lab., Universitat de València, 46980 Paterna, Spain.
- Jesús Malo
- Image Processing Lab., Universitat de València, 46980 Paterna, Spain.
3
Chou CN, Kim R, Arend LA, Yang YY, Mensh BD, Shim WM, Perich MG, Chung S. Geometry Linked to Untangling Efficiency Reveals Structure and Computation in Neural Populations. bioRxiv 2025:2024.02.26.582157. PMID: 40236228; PMCID: PMC11996410; DOI: 10.1101/2024.02.26.582157.
Abstract
From an eagle spotting a fish in shimmering water to a scientist extracting patterns from noisy data, many cognitive tasks require untangling overlapping signals. Neural circuits achieve this by transforming complex sensory inputs into distinct, separable representations that guide behavior. Data-visualization techniques convey the geometry of these transformations, and decoding approaches quantify performance efficiency. However, we lack a framework for linking these two key aspects. Here we address this gap by introducing a data-driven analysis framework, Geometry Linked to Untangling Efficiency (GLUE), based on manifold capacity theory, which links changes in the geometrical properties of neural activity patterns to representational untangling at the computational level. We applied GLUE to over seven neuroscience datasets, spanning multiple organisms, tasks, and recording techniques, and found that task-relevant representations untangle in many domains, including along the cortical hierarchy, through learning, and over the course of intrinsic neural dynamics. Furthermore, GLUE can characterize the underlying geometric mechanisms of representational untangling, and explain how it facilitates efficient and robust computation. Beyond neuroscience, GLUE provides a powerful framework for quantifying information organization in data-intensive fields such as structural genomics and interpretable AI, where analyzing high-dimensional representations remains a fundamental challenge.
4
Failor SW, Carandini M, Harris KD. Visual experience orthogonalizes visual cortical stimulus responses via population code transformation. Cell Rep 2025; 44:115235. PMID: 39888718; DOI: 10.1016/j.celrep.2025.115235.
Abstract
Sensory and behavioral experience can alter visual cortical stimulus coding, but the precise form of this plasticity is unclear. We measured orientation tuning in 4,000-neuron populations of mouse V1 before and after training on a visuomotor task. Changes to single-cell tuning curves appeared complex, including development of asymmetries and of multiple peaks. Nevertheless, these complex tuning curve transformations can be explained by a simple equation: a convex transformation suppressing responses to task stimuli specifically in cells responding at intermediate levels. The strength of the transformation varies across trials, suggesting a dynamic circuit mechanism rather than static synaptic plasticity. The transformation results in sparsening and orthogonalization of population codes for task stimuli. It cannot improve the performance of an optimal stimulus decoder, which is already perfect even for naive codes, but it improves the performance of a suboptimal decoder model with inductive bias as might be found in downstream readout circuits.
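The effect of a convex response transformation can be illustrated on synthetic data. A minimal sketch, assuming a simple power-law convexity and toy gamma-distributed population responses; neither is the paper's fitted equation:

```python
import numpy as np

rng = np.random.default_rng(0)
r_a = rng.gamma(2.0, 1.0, size=4000)              # toy responses of 4,000 cells to stimulus A
r_b = 0.6 * r_a + rng.gamma(2.0, 0.5, size=4000)  # overlapping code for stimulus B

def convex_transform(r, p=3.0):
    """Normalize, then raise to a power > 1: a convex transformation that
    suppresses intermediate-level responses most strongly."""
    r = r / r.max()
    return r ** p

def cos_sim(x, y):
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

# The transformed codes are sparser and typically less overlapping (more orthogonal).
print("overlap before:", cos_sim(r_a, r_b))
print("overlap after: ", cos_sim(convex_transform(r_a), convex_transform(r_b)))
```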
Affiliation(s)
- Samuel W Failor
- UCL Queen Square Institute of Neurology, University College London, London WC1N 3BG, UK.
- Matteo Carandini
- UCL Institute of Ophthalmology, University College London, London EC1V 9EL, UK
- Kenneth D Harris
- UCL Queen Square Institute of Neurology, University College London, London WC1N 3BG, UK.
5
Srinath R, Ni AM, Marucci C, Cohen MR, Brainard DH. Orthogonal neural representations support perceptual judgments of natural stimuli. Sci Rep 2025; 15:5316. PMID: 39939679; PMCID: PMC11821992; DOI: 10.1038/s41598-025-88910-8.
Abstract
In natural visually guided behavior, observers must separate relevant information from a barrage of irrelevant information. Many studies have investigated the neural underpinnings of this ability using artificial stimuli presented on blank backgrounds. Natural images, however, contain task-irrelevant background elements that might interfere with the perception of object features. Recent studies suggest that visual feature estimation can be modeled through the linear decoding of task-relevant information from visual cortex. So, if the representations of task-relevant and irrelevant features are not orthogonal in the neural population, then variation in the task-irrelevant features would impair task performance. We tested this hypothesis using human psychophysics and monkey neurophysiology combined with parametrically variable naturalistic stimuli. We demonstrate that (1) the neural representation of one feature (the position of an object) in visual area V4 is orthogonal to those of several background features, (2) the ability of human observers to precisely judge object position was largely unaffected by those background features, and (3) many features of the object and the background (and of objects from a separate stimulus set) are orthogonally represented in V4 neural population responses. Our observations are consistent with the hypothesis that orthogonal neural representations can support stable perception of object features despite the richness of natural visual scenes.
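The orthogonality logic can be sketched as comparing linear decoding axes on synthetic data. The random coding axes and ordinary least-squares decoders below are illustrative assumptions, not the authors' pipeline:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_trials, n_neurons = 500, 100
position = rng.uniform(-1, 1, n_trials)      # task-relevant feature per trial
background = rng.uniform(-1, 1, n_trials)    # task-irrelevant feature per trial

# Hypothetical population: each feature drives its own random coding axis.
axis_pos, axis_bg = rng.normal(size=(2, n_neurons))
responses = (np.outer(position, axis_pos) + np.outer(background, axis_bg)
             + rng.normal(scale=0.5, size=(n_trials, n_neurons)))

w_pos = LinearRegression().fit(responses, position).coef_
w_bg = LinearRegression().fit(responses, background).coef_
cosine = w_pos @ w_bg / (np.linalg.norm(w_pos) * np.linalg.norm(w_bg))
print(f"cosine between decoding axes: {cosine:.3f}")  # near 0 = orthogonal
```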
Affiliation(s)
- Ramanujan Srinath
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL, 60637, USA
- Amy M Ni
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL, 60637, USA
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Claire Marucci
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Marlene R Cohen
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL, 60637, USA
- David H Brainard
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, 19104, USA.
6
Mukherjee S, Babadi B, Shamma S. Sparse high-dimensional decomposition of non-primary auditory cortical receptive fields. PLoS Comput Biol 2025; 21:e1012721. PMID: 39746112; PMCID: PMC11774495; DOI: 10.1371/journal.pcbi.1012721.
Abstract
Characterizing neuronal responses to natural stimuli remains a central goal in sensory neuroscience. In auditory cortical neurons, the stimulus selectivity of elicited spiking activity is summarized by a spectrotemporal receptive field (STRF) that relates neuronal responses to the stimulus spectrogram. Though effective in characterizing primary auditory cortical responses, STRFs of non-primary auditory neurons can be quite intricate, reflecting their mixed selectivity. The complexity of non-primary STRFs hence impedes understanding how acoustic stimulus representations are transformed along the auditory pathway. Here, we focus on the relationship between ferret primary auditory cortex (A1) and a secondary region, the dorsal posterior ectosylvian gyrus (PEG). We propose estimating receptive fields in PEG with respect to a well-established high-dimensional computational model of primary-cortical stimulus representations. These "cortical receptive fields" (CortRFs) are estimated greedily to identify the salient primary-cortical features modulating spiking responses, which are in turn related to corresponding spectrotemporal features. Hence, they provide biologically plausible hierarchical decompositions of STRFs in PEG. Such CortRF analysis was applied to PEG neuronal responses to speech and temporally orthogonal ripple combination (TORC) stimuli and, for comparison, to A1 neuronal responses. CortRFs of PEG neurons captured their selectivity to more complex spectrotemporal features than A1 neurons; moreover, CortRF models were more predictive of PEG (but not A1) responses to speech. Our results thus suggest that secondary-cortical stimulus representations can be computed as sparse combinations of primary-cortical features that facilitate encoding natural stimuli. By adding the primary-cortical representation, we can account for PEG single-unit responses to natural sounds better than by bypassing it and taking the auditory spectrogram as input. These results confirm, in explicit detail, the presumed hierarchical organization of the auditory cortex.
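For context, the conventional STRF that CortRFs decompose can be estimated by ridge regression of binned spikes on a time-lagged spectrogram. A minimal sketch; the lag count and ridge penalty are illustrative placeholders:

```python
import numpy as np

def estimate_strf(spectrogram, spikes, n_lags=20, lam=1.0):
    """spectrogram: (T, F) stimulus; spikes: (T,) binned spike counts.
    Returns an (n_lags, F) spectrotemporal receptive field."""
    T, F = spectrogram.shape
    X = np.zeros((T, n_lags * F))           # design matrix of lagged slices
    for lag in range(n_lags):
        X[lag:, lag * F:(lag + 1) * F] = spectrogram[:T - lag]
    # Ridge-regularized least squares.
    w = np.linalg.solve(X.T @ X + lam * np.eye(n_lags * F), X.T @ spikes)
    return w.reshape(n_lags, F)
```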
Affiliation(s)
- Shoutik Mukherjee
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Behtash Babadi
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Shihab Shamma
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Paris, France
7
Mukherjee K, Rogers TT. Using drawings and deep neural networks to characterize the building blocks of human visual similarity. Mem Cognit 2025; 53:219-241. PMID: 38814385; DOI: 10.3758/s13421-024-01580-1.
Abstract
Early in life and without special training, human beings discern resemblance between abstract visual stimuli, such as drawings, and the real-world objects they represent. We used this capacity for visual abstraction as a tool for evaluating deep neural networks (DNNs) as models of human visual perception. Contrasting five contemporary DNNs, we evaluated how well each explains human similarity judgments among line drawings of recognizable and novel objects. For object sketches, human judgments were dominated by semantic category information; DNN representations contributed little additional information. In contrast, DNN features explained significant unique variance in the perceived similarity of abstract drawings. In both cases, a vision transformer trained to blend representations of images and their natural language descriptions showed the greatest ability to explain human perceptual similarity, an observation consistent with contemporary views of semantic representation and processing in the human mind and brain. Together, the results suggest that the building blocks of visual similarity may arise within systems that learn to use visual information, not for specific classification, but in service of generating semantic representations of objects.
Affiliation(s)
- Kushin Mukherjee
- Department of Psychology & Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA.
- Timothy T Rogers
- Department of Psychology & Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
8
Pandey L, Lee D, Wood SMW, Wood JN. Parallel development of object recognition in newborn chicks and deep neural networks. PLoS Comput Biol 2024; 20:e1012600. PMID: 39621774; DOI: 10.1371/journal.pcbi.1012600.
Abstract
How do newborns learn to see? We propose that visual systems are space-time fitters, meaning visual development can be understood as a blind fitting process (akin to evolution) in which visual systems gradually adapt to the spatiotemporal data distributions in the newborn's environment. To test whether space-time fitting is a viable theory for learning how to see, we performed parallel controlled-rearing experiments on newborn chicks and deep neural networks (DNNs), including CNNs and transformers. First, we raised newborn chicks in impoverished environments containing a single object, then simulated those environments in a video game engine. Second, we recorded first-person images from agents moving through the virtual animal chambers and used those images to train DNNs. Third, we compared the viewpoint-invariant object recognition performance of the chicks and DNNs. When DNNs received the same visual diet (training data) as chicks, the models developed object recognition skills similar to those of the chicks. DNNs that used time as a teaching signal (space-time fitters) also showed patterns of successes and failures across the test viewpoints similar to those of the chicks. Thus, DNNs can learn object recognition in the same impoverished environments as newborn animals. We argue that space-time fitters can serve as formal scientific models of newborn visual systems, providing image-computable models for studying how newborns learn to see from raw visual experiences.
Affiliation(s)
- Lalit Pandey
- Informatics Department, Indiana University, Bloomington, Indiana, United States of America
- Donsuk Lee
- Informatics Department, Indiana University, Bloomington, Indiana, United States of America
- Samantha M W Wood
- Informatics Department, Indiana University, Bloomington, Indiana, United States of America
- Cognitive Science Program, Indiana University, Bloomington, Indiana, United States of America
- Department of Neuroscience, Indiana University, Bloomington, Indiana, United States of America
- Justin N Wood
- Informatics Department, Indiana University, Bloomington, Indiana, United States of America
- Cognitive Science Program, Indiana University, Bloomington, Indiana, United States of America
- Department of Neuroscience, Indiana University, Bloomington, Indiana, United States of America
- Center for the Integrated Study of Animal Behavior, Indiana University, Bloomington, Indiana, United States of America
9
Han Z, Sereno AB. Understanding Cortical Streams from a Computational Perspective. J Cogn Neurosci 2024; 36:2618-2626. PMID: 38319677; PMCID: PMC11602005; DOI: 10.1162/jocn_a_02121.
Abstract
The two visual cortical streams hypothesis, which suggests object properties (what) are processed separately from spatial properties (where), has a longstanding history, and much evidence has accumulated to support its conjectures. Nevertheless, in the last few decades, conflicting evidence has mounted that demands some explanation and modification. For example, there is evidence of (1) shape activations (fMRI) or shape selectivities (physiology) in the dorsal stream, similar to the ventral stream, and likewise spatial activations (fMRI) or spatial selectivities (physiology) in the ventral stream, similar to the dorsal stream; and (2) multiple segregated subpathways within a stream. In addition, the idea of segregation of various aspects of multiple objects in a scene raises questions about how these properties of multiple objects are then properly re-associated or bound back together to accurately perceive, remember, or make decisions. We will briefly review the history of the two-stream hypothesis, discuss competing accounts that challenge current thinking, and propose ideas on why the brain has segregated pathways. We will present ideas based on our own data using artificial neural networks (1) to reveal encoding differences for what and where that arise in a two-pathway neural network, (2) to show how these encoding differences can clarify previous conflicting findings, and (3) to elucidate the computational advantages of segregated pathways. Furthermore, we will discuss whether neural networks need to have multiple subpathways for different visual attributes. We will also discuss the binding problem (how to correctly associate the different attributes of each object together when there are multiple objects, each with multiple attributes, in a scene) and possible solutions to it. Finally, we will briefly discuss problems and limitations with existing models and potential fruitful future directions.
Affiliation(s)
- Anne B Sereno
- Purdue University
- Indiana University School of Medicine
10
St-Yves G, Kay K, Naselaris T. Variation in the geometry of concept manifolds across human visual cortex. bioRxiv 2024:2024.11.26.625280. PMID: 39651255; PMCID: PMC11623644; DOI: 10.1101/2024.11.26.625280.
Abstract
Brain activity patterns in high-level visual cortex support accurate linear classification of visual concepts (e.g., objects or scenes). It has long been appreciated that the accuracy of linear classification in any brain area depends on the geometry of its concept manifolds: sets of brain activity patterns that encode images of a concept. However, it is unclear how the geometry of concept manifolds differs between regions of visual cortex that support accurate classification and those that do not, or how it differs between visual cortex and deep neural networks (DNNs). We estimated geometric properties of concept manifolds that, per a recent theory, directly determine the accuracy of simple "few-shot" linear classifiers. Using a large fMRI dataset, we show that variation in classification accuracy across human visual cortex is driven by variation in a single geometric property: the distance between manifold centers ("geometric Signal"). In contrast, variation in classification accuracy across most DNN layers is driven by an increase in the effective number of manifold dimensions ("Dimensionality"). Despite this difference in the geometric properties that affect few-shot classification performance in the brain and DNNs, we find that Signal and Dimensionality are strongly, negatively correlated: when Signal increases across brain regions or DNN layers, Dimensionality decreases, and vice versa. We conclude that visual cortex and DNNs deploy different geometric strategies for accurate linear classification of concepts, even though both are subject to the same constraint.
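The two geometric quantities contrasted above can be written compactly for manifolds given as samples-by-features arrays. A simplified sketch; the exact normalizations in manifold capacity theory differ:

```python
import numpy as np

def signal(manifold_a, manifold_b):
    """Distance between manifold centers, normalized by their mean radius."""
    ca, cb = manifold_a.mean(0), manifold_b.mean(0)
    radius = 0.5 * (np.linalg.norm(manifold_a - ca, axis=1).mean()
                    + np.linalg.norm(manifold_b - cb, axis=1).mean())
    return np.linalg.norm(ca - cb) / radius

def dimensionality(manifold):
    """Participation ratio: effective number of manifold dimensions."""
    evals = np.linalg.eigvalsh(np.cov(manifold.T))
    return evals.sum() ** 2 / (evals ** 2).sum()
```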
11
Peng Y, Gong X, Lu H, Fang F. Human Visual Pathways for Action Recognition versus Deep Convolutional Neural Networks: Representation Correspondence in Late but Not Early Layers. J Cogn Neurosci 2024; 36:2458-2480. PMID: 39106158; DOI: 10.1162/jocn_a_02233.
Abstract
Deep convolutional neural networks (DCNNs) have attained human-level performance for object categorization and exhibited representation alignment between network layers and brain regions. Does such representation alignment naturally extend to other visual tasks beyond recognizing objects in static images? In this study, we expanded the exploration to the recognition of human actions from videos and assessed the representation capabilities and alignment of two-stream DCNNs in comparison with brain regions situated along the ventral and dorsal pathways. Using decoding analysis and representational similarity analysis, we show that DCNN models do not show hierarchical representation alignment to the human brain across visual regions when processing action videos. Instead, later layers of DCNN models demonstrate greater representation similarities to the human visual cortex. These findings held for two display formats: photorealistic avatars with full-body information and simplified stimuli in the point-light display. The discrepancies in representation alignment suggest fundamental differences in how DCNNs and the human brain represent dynamic visual information related to actions.
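The layer-to-region comparison rests on representational similarity analysis (RSA). A minimal sketch of the standard RSA recipe; the variable names are assumptions rather than the authors' code:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(patterns):
    """patterns: (n_conditions, n_features). Returns the condensed
    representational dissimilarity matrix (1 - Pearson r)."""
    return pdist(patterns, metric="correlation")

# layer_acts: (n_videos, n_units) DCNN layer activations (hypothetical)
# brain_resp: (n_videos, n_voxels) responses from one visual region (hypothetical)
# alignment, _ = spearmanr(rdm(layer_acts), rdm(brain_resp))
```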
Affiliation(s)
- Yujia Peng
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, People's Republic of China
- Institute for Artificial Intelligence, Peking University, Beijing, People's Republic of China
- National Key Laboratory of General Artificial Intelligence, Beijing Institute for General Artificial Intelligence, Beijing, China
- Department of Psychology, University of California, Los Angeles
- Xizi Gong
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, People's Republic of China
- Hongjing Lu
- Department of Psychology, University of California, Los Angeles
- Department of Statistics, University of California, Los Angeles
- Fang Fang
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, People's Republic of China
- IDG/McGovern Institute for Brain Research, Peking University, Beijing, People's Republic of China
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, People's Republic of China
- Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, People's Republic of China
12
Li W, Cao D, Li J, Jiang T. Face-Specific Activity in the Ventral Stream Visual Cortex Linked to Conscious Face Perception. Neurosci Bull 2024; 40:1434-1444. PMID: 38457111; PMCID: PMC11422301; DOI: 10.1007/s12264-024-01185-3.
Abstract
When presented with visual stimuli of face images, the ventral stream visual cortex of the human brain exhibits face-specific activity that is modulated by the physical properties of the input images. However, it is still unclear whether this activity relates to conscious face perception. We explored this issue by using the human intracranial electroencephalography technique. Our results showed that face-specific activity in the ventral stream visual cortex was significantly higher when the subjects subjectively saw faces than when they did not, even when face stimuli were presented in both conditions. In addition, the face-specific neural activity exhibited a more reliable neural response and increased posterior-to-anterior information transfer in the "seen" condition than in the "unseen" condition. Furthermore, the face-specific neural activity was significantly correlated with performance. These findings support the view that face-specific activity in the ventral stream visual cortex is linked to conscious face perception.
Affiliation(s)
- Wenlu Li
- Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
- Dan Cao
- Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Jin Li
- School of Psychology, Capital Normal University, Beijing, 100048, China.
- Tianzi Jiang
- Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China.
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China.
- Research Center for Augmented Intelligence, Zhejiang Lab, Hangzhou, 311100, China.
- Xiaoxiang Institute for Brain Health and Yongzhou Central Hospital, Yongzhou, 425000, China.
13
Mininni CJ, Zanutto BS. Constructing neural networks with pre-specified dynamics. Sci Rep 2024; 14:18860. PMID: 39143351; PMCID: PMC11324765; DOI: 10.1038/s41598-024-69747-z.
Abstract
A main goal in neuroscience is to understand the computations carried out by neural populations that give animals their cognitive skills. Neural network models make it possible to formulate explicit hypotheses regarding the algorithms instantiated in the dynamics of a neural population, its firing statistics, and the underlying connectivity. Neural networks can be defined by a small set of parameters, carefully chosen to procure specific capabilities, or by a large set of free parameters, fitted with optimization algorithms that minimize a given loss function. In this work we propose an alternative: a method for making a detailed adjustment of the network dynamics and firing statistics, to better answer questions that link dynamics, structure, and function. Our algorithm, termed generalised Firing-to-Parameter (gFTP), provides a way to construct binary recurrent neural networks whose dynamics strictly follow a user pre-specified transition graph that details the transitions between population firing states triggered by stimulus presentations. Our main contribution is a procedure that detects when a transition graph is not realisable in terms of a neural network, and makes the necessary modifications in order to obtain a new transition graph that is realisable and preserves all the information encoded in the transitions of the original graph. With a realisable transition graph, gFTP assigns values to the network firing states associated with each node in the graph, and finds the synaptic weight matrices by solving a set of linear separation problems. We test gFTP performance by constructing networks with random dynamics, continuous attractor-like dynamics that encode position in 2-dimensional space, and discrete attractor dynamics. We then show how gFTP can be employed as a tool to explore the link between structure, function, and the algorithms instantiated in the network dynamics.
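The last step described above, solving one linear separation problem per neuron, can be sketched directly. This is not the published gFTP code: the perceptron solver and 0/1 state encoding are illustrative assumptions, and the sketch presumes the transition graph has already been made realisable:

```python
import numpy as np
from sklearn.linear_model import Perceptron

def fit_weights(states, next_states):
    """states, next_states: (n_transitions, n_neurons) binary arrays listing
    each transition of the (realisable) graph. Returns weights W and biases b
    such that thresholding W @ s + b reproduces each transition."""
    n_neurons = states.shape[1]
    W = np.zeros((n_neurons, n_neurons))
    b = np.zeros(n_neurons)
    for i in range(n_neurons):               # one linear separation per neuron
        y = next_states[:, i]
        if y.min() == y.max():               # constant target: bias alone suffices
            b[i] = 1.0 if y[0] else -1.0
            continue
        clf = Perceptron(max_iter=1000).fit(states, y)
        W[i], b[i] = clf.coef_, clf.intercept_
    return W, b
```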
Affiliation(s)
- Camilo J Mininni
- Instituto de Biología y Medicina Experimental, Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina.
- B Silvano Zanutto
- Instituto de Biología y Medicina Experimental, Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Instituto de Ingeniería Biomédica, Universidad de Buenos Aires, Buenos Aires, Argentina
14
Beck DW, Heaton CN, Davila LD, Rakocevic LI, Drammis SM, Tyulmankov D, Vara P, Giri A, Umashankar Beck S, Zhang Q, Pokojovy M, Negishi K, Batson SA, Salcido AA, Reyes NF, Macias AY, Ibanez-Alcala RJ, Hossain SB, Waller GL, O'Dell LE, Moschak TM, Goosens KA, Friedman A. Model of a striatal circuit exploring biological mechanisms underlying decision-making during normal and disordered states. bioRxiv 2024:2024.07.29.605535. PMID: 39211231; PMCID: PMC11361035; DOI: 10.1101/2024.07.29.605535.
Abstract
Decision-making requires continuous adaptation to internal and external contexts. Changes in decision-making are reliable transdiagnostic symptoms of neuropsychiatric disorders. We created a computational model demonstrating how the striosome compartment of the striatum constructs a mathematical space for decision-making computations depending on context, and how the matrix compartment defines action value depending on the space. The model explains multiple experimental results and unifies other theories, like reward prediction error, roles of the direct versus indirect pathways, and roles of the striosome versus matrix, under one framework. We also found, through new analyses, that striosome and matrix neurons increase their synchrony during difficult tasks, caused by a necessary increase in dimensionality of the space. The model makes testable predictions about individual differences in disorder susceptibility, decision-making symptoms shared among neuropsychiatric disorders, and differences in neuropsychiatric disorder symptom presentation. The model reframes the role of the striosomal circuit in neuroeconomic and disorder-affected decision-making.
Highlights:
- Striosomes prioritize decision-related data used by matrix to set action values.
- Striosomes and matrix have different roles in the direct and indirect pathways.
- Abnormal information organization/valuation alters disorder presentation.
- Variance in data prioritization may explain individual differences in disorders.
eTOC: Beck et al. developed a computational model of how a striatal circuit functions during decision-making. The model unifies and extends theories about the direct versus indirect pathways. It further suggests how aberrant circuit function underlies decision-making phenomena observed in neuropsychiatric disorders.
15
Jurewicz K, Sleezer BJ, Mehta PS, Hayden BY, Ebitz RB. Irrational choices via a curvilinear representational geometry for value. Nat Commun 2024; 15:6424. PMID: 39080250; PMCID: PMC11289086; DOI: 10.1038/s41467-024-49568-4.
Abstract
We make decisions by comparing values, but it is not yet clear how value is represented in the brain. Many models assume, if only implicitly, that the representational geometry of value is linear. However, in part due to a historical focus on noisy single neurons rather than neuronal populations, this hypothesis has not been rigorously tested. Here, we examine the representational geometry of value in the ventromedial prefrontal cortex (vmPFC), a part of the brain linked to economic decision-making, in two male rhesus macaques. We find that values are encoded along a curved manifold in vmPFC. This curvilinear geometry predicts a specific pattern of irrational decision-making: that decision-makers will make worse choices when an irrelevant, decoy option is worse in value, compared to when it is better. We observe this type of irrational choice in behavior. Together, these results not only suggest that the representational geometry of value is nonlinear but also that this nonlinearity could impose bounds on rational decision-making.
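One way to probe for a curved rather than linear value geometry is to ask whether curvature terms improve a fit of population activity to value. A minimal sketch on synthetic data; the quadratic basis and toy population are assumptions, not the paper's manifold analysis:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
value = rng.uniform(0, 1, 400)
# Hypothetical population whose value code bends along a second axis.
axes = rng.normal(size=(2, 60))
activity = (np.outer(value, axes[0]) + np.outer(value ** 2, axes[1])
            + rng.normal(scale=0.3, size=(400, 60)))

X_lin = value[:, None]
X_quad = np.column_stack([value, value ** 2])
r2_lin = LinearRegression().fit(X_lin, activity).score(X_lin, activity)
r2_quad = LinearRegression().fit(X_quad, activity).score(X_quad, activity)
print(f"linear R^2: {r2_lin:.3f}, with curvature: {r2_quad:.3f}")
```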
Affiliation(s)
- Katarzyna Jurewicz
- Department of Neurosciences, Faculté de médecine, and Centre interdisciplinaire de recherche sur le cerveau et l'apprentissage, Université de Montréal, Montréal, QC, Canada
- Department of Physiology, Faculty of Medicine and Health Sciences, McGill University, Montréal, QC, Canada
- Brianna J Sleezer
- Department of Neuroscience, Center for Magnetic Resonance Research, and Center for Neuroengineering, University of Minnesota, Minneapolis, MN, USA
- Priyanka S Mehta
- Department of Neuroscience, Center for Magnetic Resonance Research, and Center for Neuroengineering, University of Minnesota, Minneapolis, MN, USA
- Psychology Program, Department of Human Behavior, Justice, and Diversity, University of Wisconsin, Superior, Superior, WI, USA
- Benjamin Y Hayden
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- R Becket Ebitz
- Department of Neurosciences, Faculté de médecine, and Centre interdisciplinaire de recherche sur le cerveau et l'apprentissage, Université de Montréal, Montréal, QC, Canada.
16
Quaia C, Krauzlis RJ. Object recognition in primates: what can early visual areas contribute? Front Behav Neurosci 2024; 18:1425496. PMID: 39070778; PMCID: PMC11272660; DOI: 10.3389/fnbeh.2024.1425496.
Abstract
Introduction: If neuroscientists were asked which brain area is responsible for object recognition in primates, most would probably answer infero-temporal (IT) cortex. While IT is likely responsible for fine discriminations, and is accordingly dominated by foveal visual inputs, there is more to object recognition than fine discrimination. Importantly, foveation of an object of interest usually requires recognizing, with reasonable confidence, its presence in the periphery. Arguably, IT plays a secondary role in such peripheral recognition, and other visual areas might instead be more critical.
Methods: To investigate how signals carried by early visual processing areas (such as LGN and V1) could be used for object recognition in the periphery, we focused here on the task of distinguishing faces from non-faces. We tested how sensitive various models were to nuisance parameters, such as changes in scale and orientation of the image, and the type of image background.
Results: We found that a model of V1 simple or complex cells could provide quite reliable information, resulting in performance better than 80% in realistic scenarios. An LGN model performed considerably worse.
Discussion: Because peripheral recognition is both crucial to enable fine recognition (by bringing an object of interest onto the fovea) and probably sufficient to account for a considerable fraction of our daily recognition-guided behavior, we think that the current focus on area IT and foveal processing is too narrow. We propose that rather than a hierarchical system with IT-like properties as its primary aim, object recognition should be seen as a parallel process, with high-accuracy foveal modules operating in parallel with lower-accuracy and faster modules that can operate across the visual field.
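The V1 complex-cell model referred to in the Results can be sketched as an oriented Gabor energy model feeding a linear read-out. All filter parameters below are illustrative assumptions, not the paper's values:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor(size=21, theta=0.0, freq=0.15, sigma=4.0, phase=0.0):
    """One oriented Gabor filter (cosine carrier, Gaussian envelope)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * freq * xr + phase)

def complex_cell_energy(img, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Sum of squared quadrature-pair responses per orientation (energy model)."""
    feats = []
    for th in thetas:
        even = fftconvolve(img, gabor(theta=th, phase=0.0), mode="valid")
        odd = fftconvolve(img, gabor(theta=th, phase=np.pi / 2), mode="valid")
        feats.append((even ** 2 + odd ** 2).mean())  # pooled orientation energy
    return np.array(feats)

# Energies for a set of face/non-face images can then be fed to, e.g.,
# sklearn.linear_model.LogisticRegression to test their separability.
```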
Affiliation(s)
- Christian Quaia
- Laboratory of Sensorimotor Research, National Eye Institute, NIH, Bethesda, MD, United States
17
Lindsey JW, Issa EB. Factorized visual representations in the primate visual system and deep neural networks. eLife 2024; 13:RP91685. PMID: 38968311; PMCID: PMC11226229; DOI: 10.7554/eLife.91685.
Abstract
Object classification has been proposed as a principal objective of the primate ventral visual stream and has been used as an optimization target for deep neural network models (DNNs) of the visual system. However, visual brain areas represent many different types of information, and optimizing for classification of object identity alone does not constrain how other information may be encoded in visual representations. Information about different scene parameters may be discarded altogether ('invariance'), represented in non-interfering subspaces of population activity ('factorization') or encoded in an entangled fashion. In this work, we provide evidence that factorization is a normative principle of biological visual representations. In the monkey ventral visual hierarchy, we found that factorization of object pose and background information from object identity increased in higher-level regions and strongly contributed to improving object identity decoding performance. We then conducted a large-scale analysis of factorization of individual scene parameters - lighting, background, camera viewpoint, and object pose - in a diverse library of DNN models of the visual system. Models which best matched neural, fMRI, and behavioral data from both monkeys and humans across 12 datasets tended to be those which factorized scene parameters most strongly. Notably, invariance to these parameters was not as consistently associated with matches to neural and behavioral data, suggesting that maintaining non-class information in factorized activity subspaces is often preferred to dropping it altogether. Thus, we propose that factorization of visual scene information is a widely used strategy in brains and DNN models thereof.
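A factorization score in the spirit of this paper can be sketched as the fraction of nuisance-driven variance that lies outside the identity-coding subspace. The SVD-based subspace estimate and the choice of k are simplifying assumptions:

```python
import numpy as np

def factorization(identity_resp, nuisance_resp, k=10):
    """identity_resp: (n_identities, n_units) mean response per object identity.
    nuisance_resp: (n_conditions, n_units) responses as one scene parameter
    (e.g., pose or background) varies. Requires k <= n_identities. Returns the
    fraction of nuisance variance orthogonal to the top-k identity subspace
    (1 = fully factorized, 0 = fully entangled)."""
    _, _, Vt = np.linalg.svd(identity_resp - identity_resp.mean(0),
                             full_matrices=False)
    basis = Vt[:k].T                          # identity-coding subspace
    centered = nuisance_resp - nuisance_resp.mean(0)
    var_total = (centered ** 2).sum()
    var_in_identity = ((centered @ basis) ** 2).sum()
    return 1.0 - var_in_identity / var_total
```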
Affiliation(s)
- Jack W Lindsey
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
- Department of Neuroscience, Columbia University, New York, United States
- Elias B Issa
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
- Department of Neuroscience, Columbia University, New York, United States
18
Ostojic S, Fusi S. Computational role of structure in neural activity and connectivity. Trends Cogn Sci 2024; 28:677-690. PMID: 38553340; DOI: 10.1016/j.tics.2024.03.003.
Abstract
One major challenge of neuroscience is identifying structure in seemingly disorganized neural activity. Different types of structure have different computational implications that can help neuroscientists understand the functional role of a particular brain area. Here, we outline a unified approach to characterize structure by inspecting the representational geometry and the modularity properties of the recorded activity and show that a similar approach can also reveal structure in connectivity. We start by setting up a general framework for determining geometry and modularity in activity and connectivity and relating these properties with computations performed by the network. We then use this framework to review the types of structure found in recent studies of model networks performing three classes of computations.
Affiliation(s)
- Srdjan Ostojic
- Laboratoire de Neurosciences Cognitives et Computationnelles, INSERM U960, École Normale Supérieure - PSL Research University, 75005 Paris, France.
- Stefano Fusi
- Center for Theoretical Neuroscience, Columbia University, New York, NY, USA; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Department of Neuroscience, Columbia University, New York, NY, USA; Kavli Institute for Brain Science, Columbia University, New York, NY, USA
19
Djambazovska S, Zafer A, Ramezanpour H, Kreiman G, Kar K. The Impact of Scene Context on Visual Object Recognition: Comparing Humans, Monkeys, and Computational Models. bioRxiv 2024:2024.05.27.596127. PMID: 38854011; PMCID: PMC11160639; DOI: 10.1101/2024.05.27.596127.
Abstract
During natural vision, we rarely see objects in isolation but rather embedded in rich and complex contexts. Understanding how the brain recognizes objects in natural scenes by integrating contextual information remains a key challenge. To elucidate neural mechanisms compatible with human visual processing, we need an animal model that behaves similarly to humans, so that inferred neural mechanisms can provide hypotheses relevant to the human brain. Here we assessed whether rhesus macaques could model human context-driven object recognition by quantifying visual object identification abilities across variations in the amount, quality, and congruency of contextual cues. Behavioral metrics revealed strikingly similar context-dependent patterns between humans and monkeys. However, neural responses in the inferior temporal (IT) cortex of monkeys that were never explicitly trained to discriminate objects in context, as well as current artificial neural network models, could only partially explain this cross-species correspondence. The shared behavioral variance unexplained by context-naive neural data or computational models highlights fundamental knowledge gaps. Our findings demonstrate an intriguing alignment of human and monkey visual object processing that defies full explanation by either brain activity in a key visual region or state-of-the-art models.
Affiliation(s)
- Sara Djambazovska
- York University, Department of Biology and Centre for Vision Research, Toronto, Canada
- Children’s Hospital, Harvard Medical School, MA, USA
- Anaa Zafer
- York University, Department of Biology and Centre for Vision Research, Toronto, Canada
- Hamidreza Ramezanpour
- York University, Department of Biology and Centre for Vision Research, Toronto, Canada
- Kohitij Kar
- York University, Department of Biology and Centre for Vision Research, Toronto, Canada
20
Rolls ET. Two what, two where, visual cortical streams in humans. Neurosci Biobehav Rev 2024; 160:105650. PMID: 38574782; DOI: 10.1016/j.neubiorev.2024.105650.
Abstract
Recent cortical connectivity investigations lead to new concepts about 'What' and 'Where' visual cortical streams in humans, and how they connect to other cortical systems. A ventrolateral 'What' visual stream leads to the inferior temporal visual cortex for object and face identity, and provides 'What' information to the hippocampal episodic memory system, the anterior temporal lobe semantic system, and the orbitofrontal cortex emotion system. A superior temporal sulcus (STS) 'What' visual stream utilising connectivity from the temporal and parietal visual cortex responds to moving objects and faces, and face expression, and connects to the orbitofrontal cortex for emotion and social behaviour. A ventromedial 'Where' visual stream builds feature combinations for scenes, and provides 'Where' inputs via the parahippocampal scene area to the hippocampal episodic memory system that are also useful for landmark-based navigation. The dorsal 'Where' visual pathway to the parietal cortex provides for actions in space, but also provides coordinate transforms to provide inputs to the parahippocampal scene area for self-motion update of locations in scenes in the dark or when the view is obscured.
Affiliation(s)
- Edmund T Rolls
- Oxford Centre for Computational Neuroscience, Oxford, UK; Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK; Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai 200403, China.
21
Campbell A, Tanaka JW. Fast saccades to faces during the feedforward sweep. J Vis 2024; 24:16. PMID: 38630459; PMCID: PMC11037494; DOI: 10.1167/jov.24.4.16.
Abstract
Saccadic choice tasks use eye movements as a response method, typically asking observers to saccade as quickly as possible to an image of a prespecified target category. Using this approach, face-selective saccades have been observed within 100 ms poststimulus. When taking into account oculomotor processing, this suggests that faces can be detected in as little as 70 to 80 ms. It has therefore been suggested that face detection must occur during the initial feedforward sweep, since this latency leaves little time for feedback processing. In the current experiment, we tested this hypothesis using backward masking, a technique shown to primarily disrupt feedback processing while leaving feedforward activation relatively intact. Based on minimum saccadic reaction time (SRT), we found that face detection benefited from ultra-fast, accurate saccades within 110 to 160 ms and that these eye movements are obtainable even under extreme masking conditions that limit perceptual awareness. However, masking did significantly increase the median SRT for faces. In the manual responses, we found remarkable detection accuracy for faces and houses, even when participants indicated having no visual experience of the test images. These results provide evidence for the view that the saccadic bias to faces is initiated by coarse information used to categorize faces in the feedforward sweep but that, in most cases, additional processing is required to quickly reach the threshold for saccade initiation.
Affiliation(s)
- Alison Campbell
- Department of Psychology, University of Victoria, Victoria, BC, Canada
- https://orcid.org/0000-0001-6891-8609
- James W Tanaka
- Department of Psychology, University of Victoria, Victoria, BC, Canada
- https://orcid.org/0000-0001-6559-0388
22
Bi Z, Li H, Tian L. Top-down generation of low-resolution representations improves visual perception and imagination. Neural Netw 2024; 171:440-456. PMID: 38150870; DOI: 10.1016/j.neunet.2023.12.030.
Abstract
Perception or imagination requires top-down signals from high-level cortex to primary visual cortex (V1) to reconstruct or simulate the representations that seen images stimulate bottom-up. Interestingly, top-down signals in V1 have lower spatial resolution than bottom-up representations. It is unclear why the brain uses low-resolution signals to reconstruct or simulate high-resolution representations. By modeling the top-down pathway of the visual system using the decoder of a variational auto-encoder (VAE), we reveal that low-resolution top-down signals can better reconstruct or simulate the information contained in the sparse activities of V1 simple cells, which facilitates perception and imagination. This advantage of low-resolution generation is related to how it facilitates high-level cortex in forming the geometry-respecting representations observed in experiments. Furthermore, we present two findings regarding this phenomenon in the context of AI-generated sketches, a style of drawings made of lines. First, we found that the quality of the generated sketches critically depends on the thickness of the lines in the sketches: thin-line sketches are harder to generate than thick-line sketches. Second, we propose a technique to generate high-quality thin-line sketches: instead of directly using original thin-line sketches, we use blurred sketches to train a VAE or GAN (generative adversarial network), and then infer the thin-line sketches from the VAE- or GAN-generated blurred sketches. Collectively, our work suggests that low-resolution top-down generation is a strategy the brain uses to improve visual perception and imagination, which inspires new sketch-generation AI techniques.
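The blur-then-recover recipe for thin-line sketches can be sketched as a two-stage pipeline. Below, the recovery step is a crude unsharp-mask stand-in for the trained inference the paper would use; sigma, gain, and threshold are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_dataset(sketches, sigma=2.0):
    """Blur binary thin-line sketches (N, H, W) before VAE/GAN training."""
    return np.stack([gaussian_filter(s.astype(float), sigma) for s in sketches])

def thin_lines_from_blurred(blurred, sigma=2.0, gain=1.5, thresh=0.2):
    """Crude inverse step: unsharp-mask then threshold one generated sketch."""
    sharp = blurred + gain * (blurred - gaussian_filter(blurred, sigma))
    return (sharp > thresh).astype(float)
```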
Affiliation(s)
- Zedong Bi
- Lingang Laboratory, Shanghai 200031, China.
- Haoran Li
- Department of Physics, Hong Kong Baptist University, Hong Kong, China
- Liang Tian
- Department of Physics, Hong Kong Baptist University, Hong Kong, China; Institute of Computational and Theoretical Studies, Hong Kong Baptist University, Hong Kong, China; Institute of Systems Medicine and Health Sciences, Hong Kong Baptist University, Hong Kong, China; State Key Laboratory of Environmental and Biological Analysis, Hong Kong Baptist University, Hong Kong, China.
23
Monosov IE. Curiosity: primate neural circuits for novelty and information seeking. Nat Rev Neurosci 2024; 25:195-208. PMID: 38263217; DOI: 10.1038/s41583-023-00784-9.
Abstract
For many years, neuroscientists have investigated the behavioural, computational and neurobiological mechanisms that support value-based decisions, revealing how humans and animals make choices to obtain rewards. However, many decisions are influenced by factors other than the value of physical rewards or second-order reinforcers (such as money). For instance, animals (including humans) frequently explore novel objects that have no intrinsic value solely because they are novel and they exhibit the desire to gain information to reduce their uncertainties about the future, even if this information cannot lead to reward or assist them in accomplishing upcoming tasks. In this Review, I discuss how circuits in the primate brain responsible for detecting, predicting and assessing novelty and uncertainty regulate behaviour and give rise to these behavioural components of curiosity. I also briefly discuss how curiosity-related behaviours arise during postnatal development and point out some important reasons for the persistence of curiosity across generations.
Affiliation(s)
- Ilya E Monosov
- Department of Neuroscience, Washington University School of Medicine, St. Louis, MO, USA.
- Department of Electrical Engineering, Washington University, St. Louis, MO, USA.
- Department of Biomedical Engineering, Washington University, St. Louis, MO, USA.
- Department of Neurosurgery, Washington University, St. Louis, MO, USA.
- Pain Center, Washington University, St. Louis, MO, USA.
24
Machida I, Shishikura M, Yamane Y, Sakai K. Representation of Natural Contours by a Neural Population in Monkey V4. eNeuro 2024; 11:ENEURO.0445-23.2024. PMID: 38423791; PMCID: PMC10946029; DOI: 10.1523/ENEURO.0445-23.2024.
Abstract
The cortical visual area V4 has been considered to code contours that contribute to the intermediate-level representation of objects. Neural responses to the complex contour features intrinsic to natural contours are expected to clarify the essence of this representation. To approach the cortical coding of natural contours, we investigated the simultaneous coding of multiple contour features in monkey (Macaca fuscata) V4 neurons and their population-level representation. A substantial number of neurons showed significant tuning for two or more features, such as curvature and closure, indicating that many V4 neurons simultaneously code multiple contour features. A large portion of the neurons responded vigorously to acutely curved contours that surrounded the center of the classical receptive field, suggesting that V4 neurons tend to code prominent features of object contours. An analysis of mutual information (MI) between the neural responses and each contour feature showed that most neurons exhibited similar magnitudes for each type of MI, indicating that the responses of many neurons depended on multiple contour features. We next examined the population-level representation using multidimensional scaling analysis. Neural preferences for the multiple contour features and for natural over silhouette stimuli increased along the primary and secondary axes, respectively, indicating the contribution of multiple contour features and surface textures to the population responses. Our analyses suggest that V4 neurons simultaneously code multiple contour features in natural images and represent contour and surface properties at the population level.
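The two analyses named above, per-feature mutual information and multidimensional scaling of population responses, can be sketched in a few lines of Python (simulated data in place of the study's V4 recordings; the feature coding is hypothetical):

import numpy as np
from sklearn.metrics import mutual_info_score
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
n_stim = 200
# Hypothetical binary contour features per stimulus.
curvature = rng.integers(0, 2, n_stim)
closure = rng.integers(0, 2, n_stim)
# A simulated neuron driven by both features, plus noise.
rate = 2.0 * curvature + 1.5 * closure + rng.normal(0, 1, n_stim)
binned = np.digitize(rate, np.quantile(rate, [0.25, 0.5, 0.75]))

# MI between the binned response and each contour feature.
print("MI curvature:", mutual_info_score(binned, curvature))
print("MI closure:  ", mutual_info_score(binned, closure))

# Population-level geometry: embed stimuli by their (simulated)
# population response with multidimensional scaling.
pop = np.column_stack([2.0 * curvature, 1.5 * closure]) \
      + rng.normal(0, 0.3, (n_stim, 2))
emb = MDS(n_components=2, random_state=0).fit_transform(pop)
print("MDS embedding shape:", emb.shape)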
Collapse
Affiliation(s)
- Itsuki Machida
- Department of Computer Science, University of Tsukuba, Tsukuba 305-8573, Japan
| | - Motofumi Shishikura
- Department of Computer Science, University of Tsukuba, Tsukuba 305-8573, Japan
| | - Yukako Yamane
- Neural Computation Unit, Okinawa Institute of Science and Technology, Okinawa 904-0495, Japan
| | - Ko Sakai
- Department of Computer Science, University of Tsukuba, Tsukuba 305-8573, Japan
| |
Collapse
|
25
|
Yildirim I, Siegel MH, Soltani AA, Ray Chaudhuri S, Tenenbaum JB. Perception of 3D shape integrates intuitive physics and analysis-by-synthesis. Nat Hum Behav 2024; 8:320-335. [PMID: 37996497 DOI: 10.1038/s41562-023-01759-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2023] [Accepted: 10/12/2023] [Indexed: 11/25/2023]
Abstract
Many surface cues support three-dimensional shape perception, but humans can sometimes still see shape when these features are missing, such as when an object is covered with a draped cloth. Here we propose a framework for three-dimensional shape perception that explains perception in both typical and atypical cases as analysis-by-synthesis, or inference in a generative model of image formation. The model integrates intuitive physics to explain how shape can be inferred from the deformations it causes to other objects, as in cloth draping. Behavioural and computational studies comparing this account with several alternatives show that it best matches human observers (total n = 174) in both accuracy and response times, and is the only model that correlates significantly with human performance on difficult discriminations. We suggest that bottom-up deep neural network models are not fully adequate accounts of human shape perception, and point to how machine vision systems might achieve more human-like robustness.
Collapse
Affiliation(s)
- Ilker Yildirim
- Department of Psychology, Yale University, New Haven, CT, USA.
- Department of Statistics & Data Science, Yale University, New Haven, CT, USA.
- Wu-Tsai Institute, Yale University, New Haven, CT, USA.
| | - Max H Siegel
- Department of Brain & Cognitive Sciences, MIT, Cambridge, MA, USA.
- The Center for Brains, Minds, and Machines, MIT, Cambridge, MA, USA.
| | - Amir A Soltani
- Department of Brain & Cognitive Sciences, MIT, Cambridge, MA, USA
- The Center for Brains, Minds, and Machines, MIT, Cambridge, MA, USA
| | | | - Joshua B Tenenbaum
- Department of Brain & Cognitive Sciences, MIT, Cambridge, MA, USA.
- The Center for Brains, Minds, and Machines, MIT, Cambridge, MA, USA.
| |
Collapse
|
26
|
Bi Z. Cognition of Time and Thinking Beyond. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2024; 1455:171-195. [PMID: 38918352 DOI: 10.1007/978-3-031-60183-5_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/27/2024]
Abstract
A common research protocol in cognitive neuroscience is to train subjects to perform deliberately designed experiments while recording brain activity, with the aim of understanding the brain mechanisms underlying cognition. However, how the results of this research protocol can be applied in technology is seldom discussed. Here, I review studies of time processing in the brain as examples of this protocol, as well as two main application areas of neuroscience (neuroengineering and brain-inspired artificial intelligence). Time processing is a fundamental dimension of cognition, and time is also an indispensable dimension of any real-world signal to be processed in technology. One might therefore expect studies of time processing in cognition to profoundly influence brain-related technology. Surprisingly, I found that the results of cognitive studies on time processing are hardly helpful in solving practical problems. This awkward situation may be due to the lack of generalizability of the results of cognitive studies, obtained under well-controlled laboratory conditions, to real-life situations. This lack of generalizability may be rooted in the fundamental unknowability of the world (including cognition). Overall, this paper questions and criticizes the usefulness and prospects of the abovementioned research protocol of cognitive neuroscience. I then give three suggestions for future research. First, to improve generalizability, it is better to study brain activity under real-life conditions instead of in well-controlled laboratory experiments. Second, to overcome the unknowability of the world, we can engineer an easily accessible surrogate of the object under investigation, so that we can predict the behavior of the object by experimenting on the surrogate. Third, the paper calls for technology-oriented research, with the aim of technology creation instead of knowledge discovery.
Collapse
Affiliation(s)
- Zedong Bi
- Lingang Laboratory, Shanghai, China.
- Institute for Future, Qingdao University, Qingdao, China.
- School of Automation, Shandong Key Laboratory of Industrial Control Technology, Qingdao University, Qingdao, China.
| |
Collapse
|
27
|
Feather J, Leclerc G, Mądry A, McDermott JH. Model metamers reveal divergent invariances between biological and artificial neural networks. Nat Neurosci 2023; 26:2017-2034. [PMID: 37845543 PMCID: PMC10620097 DOI: 10.1038/s41593-023-01442-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2022] [Accepted: 08/29/2023] [Indexed: 10/18/2023]
Abstract
Deep neural network models of sensory systems are often proposed to learn representational transformations with invariances like those in the brain. To reveal these invariances, we generated 'model metamers', stimuli whose activations within a model stage are matched to those of a natural stimulus. Metamers for state-of-the-art supervised and unsupervised neural network models of vision and audition were often completely unrecognizable to humans when generated from late model stages, suggesting differences between model and human invariances. Targeted model changes improved human recognizability of model metamers but did not eliminate the overall human-model discrepancy. The human recognizability of a model's metamers was well predicted by their recognizability by other models, suggesting that models contain idiosyncratic invariances in addition to those required by the task. Metamer recognizability dissociated from both traditional brain-based benchmarks and adversarial vulnerability, revealing a distinct failure mode of existing sensory models and providing a complementary benchmark for model assessment.
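The core metamer-generation procedure is gradient-based: optimize an input image until its activations at a chosen model stage match those of a natural stimulus. A minimal PyTorch sketch, assuming torchvision with downloadable VGG-16 weights (the layer cut, optimizer, and step count are illustrative, not the paper's settings):

import torch
import torchvision.models as models

# Pretrained model, truncated at a chosen "model stage".
net = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in net.parameters():
    p.requires_grad_(False)

img = torch.rand(1, 3, 224, 224)           # stand-in natural image
target = net(img)                          # activations to match
metamer = torch.rand(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([metamer], lr=0.01)

for step in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(net(metamer), target)
    loss.backward()
    opt.step()

# 'metamer' now elicits near-identical stage activations, yet its pixels
# are otherwise unconstrained; the model treats it as equivalent to the
# natural image even where a human observer might not.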
Collapse
Affiliation(s)
- Jenelle Feather
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Center for Computational Neuroscience, Flatiron Institute, Cambridge, MA, USA.
| | - Guillaume Leclerc
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Aleksander Mądry
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Josh H McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA.
| |
Collapse
|
28
|
Nayebi A, Kong NCL, Zhuang C, Gardner JL, Norcia AM, Yamins DLK. Mouse visual cortex as a limited resource system that self-learns an ecologically-general representation. PLoS Comput Biol 2023; 19:e1011506. [PMID: 37782673 PMCID: PMC10569538 DOI: 10.1371/journal.pcbi.1011506] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2023] [Revised: 10/12/2023] [Accepted: 09/11/2023] [Indexed: 10/04/2023] Open
Abstract
Studies of the mouse visual system have revealed a variety of visual brain areas that are thought to support a multitude of behavioral capacities, ranging from stimulus-reward associations, to goal-directed navigation, and object-centric discriminations. However, an overall understanding of the mouse's visual cortex, and how it supports this range of behaviors, is still lacking. Here, we take a computational approach to help address these questions, providing a high-fidelity quantitative model of mouse visual cortex and identifying key structural and functional principles underlying that model's success. Structurally, we find that a comparatively shallow network structure with a low-resolution input is optimal for modeling mouse visual cortex. Our main finding is functional: models trained with task-agnostic, self-supervised objective functions based on the concept of contrastive embeddings are much better matches to mouse cortex than models trained on supervised objectives or alternative self-supervised methods. This result is quite unlike that in primates, where prior work showed the two to be roughly equivalent, and it naturally leads to the question of why these self-supervised objectives are better matches than supervised ones in mouse. To this end, we show that the self-supervised, contrastive objective builds a general-purpose visual representation that enables the system to achieve better transfer on out-of-distribution visual scene understanding and reward-based navigation tasks. Our results suggest that mouse visual cortex is a low-resolution, shallow network that makes the best use of the mouse's limited resources to create a light-weight, general-purpose visual system, in contrast to the deep, high-resolution, and more categorization-dominated visual system of primates.
Collapse
Affiliation(s)
- Aran Nayebi
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
- Neurosciences Ph.D. Program, Stanford University, Stanford, California, United States of America
- McGovern Institute for Brain Research, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, United States of America
| | - Nathan C. L. Kong
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
- Department of Psychology, Stanford University, Stanford, California, United States of America
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, United States of America
| | - Chengxu Zhuang
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
- McGovern Institute for Brain Research, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, United States of America
- Department of Psychology, Stanford University, Stanford, California, United States of America
| | - Justin L. Gardner
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
- Department of Psychology, Stanford University, Stanford, California, United States of America
| | - Anthony M. Norcia
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
- Department of Psychology, Stanford University, Stanford, California, United States of America
| | - Daniel L. K. Yamins
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
- Department of Psychology, Stanford University, Stanford, California, United States of America
- Department of Computer Science, Stanford University, Stanford, California, United States of America
| |
Collapse
|
29
|
Vinken K, Prince JS, Konkle T, Livingstone MS. The neural code for "face cells" is not face-specific. SCIENCE ADVANCES 2023; 9:eadg1736. [PMID: 37647400 PMCID: PMC10468123 DOI: 10.1126/sciadv.adg1736] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/07/2022] [Accepted: 07/27/2023] [Indexed: 09/01/2023]
Abstract
Face cells are neurons that respond more to faces than to non-face objects. They are found in clusters in the inferotemporal cortex, which are thought to process faces specifically and, hence, have been studied almost exclusively using faces. Analyzing neural responses in and around macaque face patches to hundreds of objects, we found graded response profiles for non-face objects that predicted the degree of face selectivity and provided information on face-cell tuning beyond that from actual faces. This relationship between non-face and face responses was not predicted by color and simple shape properties, but by information encoded in deep neural networks trained on general object classification rather than face classification. These findings contradict the long-standing assumption that face versus non-face selectivity emerges from face-specific features and challenge the practice of focusing on only the most effective stimulus. They provide evidence instead that category-selective neurons are best understood by their tuning directions in a domain-general object space.
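The logic of predicting face responses from non-face tuning can be illustrated with a regression sketch in a shared feature space (simulated data; the feature dimensionality and noise levels are arbitrary stand-ins for the study's DNN-derived object space):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_nonface, n_face, n_feat = 300, 50, 128
# Hypothetical DNN features for non-face and face images.
X_nonface = rng.normal(size=(n_nonface, n_feat))
X_face = rng.normal(size=(n_face, n_feat)) + 0.5   # faces occupy a shifted region
# A simulated "face cell": one tuning direction in the same object space.
w_true = rng.normal(size=n_feat)
y_nonface = X_nonface @ w_true + rng.normal(0, 1, n_nonface)
y_face = X_face @ w_true + rng.normal(0, 1, n_face)

# Fit the cell's tuning from non-face responses only...
model = Ridge(alpha=10.0).fit(X_nonface, y_nonface)
# ...then predict its never-fitted face responses.
r = np.corrcoef(model.predict(X_face), y_face)[0, 1]
print(f"face responses predicted from non-face tuning: r = {r:.2f}")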
Collapse
Affiliation(s)
- Kasper Vinken
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
| | - Jacob S. Prince
- Department of Psychology, Harvard University, Cambridge, MA 02478, USA
| | - Talia Konkle
- Department of Psychology, Harvard University, Cambridge, MA 02478, USA
| | | |
Collapse
|
30
|
Johnston WJ, Freedman DJ. Redundant representations are required to disambiguate simultaneously presented complex stimuli. PLoS Comput Biol 2023; 19:e1011327. [PMID: 37556470 PMCID: PMC10442167 DOI: 10.1371/journal.pcbi.1011327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2023] [Revised: 08/21/2023] [Accepted: 07/04/2023] [Indexed: 08/11/2023] Open
Abstract
A pedestrian crossing a street during rush hour often looks and listens for potential danger. When they hear several different horns, they localize the cars that are honking and decide whether or not they need to modify their motor plan. How does the pedestrian use this auditory information to pick out the corresponding cars in visual space? The integration of distributed representations like these is called the assignment problem, and it must be solved to integrate distinct representations not only across but also within sensory modalities. Here, we identify and analyze a solution to the assignment problem: the representation of one or more common stimulus features in pairs of relevant brain regions; for example, estimates of the spatial position of cars are represented in both the visual and auditory systems. We characterize how the reliability of this solution depends on different features of the stimulus set (e.g., the size of the set and the complexity of the stimuli) and on the details of the split representations (e.g., the precision of each stimulus representation and the amount of overlapping information). Next, we implement this solution in a biologically plausible receptive field code and show how constraints on the number of neurons and spikes used by the code force the brain to navigate a tradeoff between local and catastrophic errors. We show that, when many spikes and neurons are available, representing stimuli from a single sensory modality can be done more reliably across multiple brain regions, despite the risk of assignment errors. Finally, we show that a feedforward neural network can learn the optimal solution to the assignment problem, even when it receives inputs in two distinct representational formats. We also discuss relevant results on assignment errors from the human working memory literature and show that several key predictions of our theory already have support.
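The proposed solution can be simulated directly: give two "regions" independently noisy estimates of a shared feature (position), match the estimates, and count mis-assignments. A short sketch using an optimal matcher (the noise levels and set size are illustrative):

import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)
n_stim = 5
for noise in (0.05, 0.2, 0.5):
    errors = 0
    for _ in range(1000):
        pos = rng.uniform(0, 1, n_stim)            # true positions
        visual = pos + rng.normal(0, noise, n_stim)
        auditory = pos + rng.normal(0, noise, n_stim)
        # Match each auditory estimate to a visual one by position.
        cost = np.abs(visual[:, None] - auditory[None, :])
        row, col = linear_sum_assignment(cost)
        errors += int(np.any(col != row))
    print(f"noise={noise}: assignment error rate = {errors / 1000:.3f}")

As the abstract argues, assignment reliability degrades as the shared-feature estimates get noisier or as the stimulus set grows.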
Collapse
Affiliation(s)
- W. Jeffrey Johnston
- Graduate Program in Computational Neuroscience and the Department of Neurobiology, The University of Chicago, Chicago, Illinois, United States of America
- Center for Theoretical Neuroscience and Mortimer B. Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, New York, United States of America
| | - David J. Freedman
- Graduate Program in Computational Neuroscience and the Department of Neurobiology, The University of Chicago, Chicago, Illinois, United States of America
- Neuroscience Institute, The University of Chicago, Chicago, Illinois, United States of America
| |
Collapse
|
31
|
Li D, Chang L. Representational geometry of incomplete faces in macaque face patches. Cell Rep 2023; 42:112673. [PMID: 37342911 DOI: 10.1016/j.celrep.2023.112673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2022] [Revised: 04/23/2023] [Accepted: 06/06/2023] [Indexed: 06/23/2023] Open
Abstract
The neural code of faces has been intensively studied in the macaque face patch system. Although the majority of previous studies used complete faces as stimuli, faces are often seen partially in daily life. Here, we investigated how face-selective cells represent two types of incomplete faces: face fragments and occluded faces, with the location of the fragment/occluder and the facial features systematically varied. Contrary to popular belief, we found that the preferred face regions identified with two stimulus types are dissociated in many face cells. This dissociation can be explained by the nonlinear integration of information from different face parts and is closely related to a curved representation of face completeness in the state space, which allows a clear discrimination between different stimulus types. Furthermore, identity-related facial features are represented in a subspace orthogonal to the nonlinear dimension of face completeness, supporting a condition-general code of facial identity.
Collapse
Affiliation(s)
- Dongyuan Li
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Le Chang
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China; University of Chinese Academy of Sciences, Beijing 100049, China.
| |
Collapse
|
32
|
Schwartz E, Alreja A, Richardson RM, Ghuman A, Anzellotti S. Intracranial Electroencephalography and Deep Neural Networks Reveal Shared Substrates for Representations of Face Identity and Expressions. J Neurosci 2023; 43:4291-4303. [PMID: 37142430 PMCID: PMC10255163 DOI: 10.1523/jneurosci.1277-22.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2022] [Revised: 03/25/2023] [Accepted: 04/17/2023] [Indexed: 05/06/2023] Open
Abstract
According to a classical view of face perception (Bruce and Young, 1986; Haxby et al., 2000), face identity and facial expression recognition are performed by separate neural substrates (ventral and lateral temporal face-selective regions, respectively). However, recent studies challenge this view, showing that expression valence can also be decoded from ventral regions (Skerry and Saxe, 2014; Li et al., 2019), and identity from lateral regions (Anzellotti and Caramazza, 2017). These findings could be reconciled with the classical view if regions specialized for one task (either identity or expression) contain a small amount of information for the other task (enabling above-chance decoding). In this case, we would expect representations in lateral regions to be more similar to representations in deep convolutional neural networks (DCNNs) trained to recognize facial expression than to representations in DCNNs trained to recognize face identity (the converse should hold for ventral regions). We tested this hypothesis by analyzing neural responses to faces varying in identity and expression. Representational dissimilarity matrices (RDMs) computed from human intracranial recordings (n = 11 adults; 7 females) were compared with RDMs from DCNNs trained to label either identity or expression. We found that RDMs from DCNNs trained to recognize identity correlated with intracranial recordings more strongly in all regions tested, even in regions classically hypothesized to be specialized for expression. These results deviate from the classical view, suggesting that face-selective ventral and lateral regions contribute to the representation of both identity and expression. SIGNIFICANCE STATEMENT: Previous work proposed that separate brain regions are specialized for the recognition of face identity and facial expression. However, identity and expression recognition mechanisms might share common brain regions instead. We tested these alternatives using deep neural networks and intracranial recordings from face-selective brain regions. Deep neural networks trained to recognize identity and networks trained to recognize expression learned representations that correlate with neural recordings. Identity-trained representations correlated with intracranial recordings more strongly in all regions tested, including regions hypothesized to be expression-specialized in the classical hypothesis. These findings support the view that identity and expression recognition rely on common brain regions. This discovery may require reevaluation of the roles that the ventral and lateral neural pathways play in processing socially relevant stimuli.
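The RDM comparison at the heart of this analysis reduces to correlating pairwise-distance structures. A compact sketch with simulated response matrices in place of the intracranial recordings and DCNN activations:

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_stim = 40
neural = rng.normal(size=(n_stim, 60))     # stimuli x recording channels
dnn = neural @ rng.normal(size=(60, 100))  # a layer partly sharing geometry
dnn += rng.normal(0, 5, dnn.shape)

# RDMs: condensed pairwise (correlation) distances between stimulus patterns.
rdm_neural = pdist(neural, metric="correlation")
rdm_dnn = pdist(dnn, metric="correlation")
rho, p = spearmanr(rdm_neural, rdm_dnn)
print(f"RDM similarity: Spearman rho = {rho:.2f} (p = {p:.1e})")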
Collapse
Affiliation(s)
- Emily Schwartz
- Department of Psychology and Neuroscience, Boston College, Chestnut Hill, Massachusetts 02467
| | - Arish Alreja
- Center for the Neural Basis of Cognition, Carnegie Mellon University/University of Pittsburgh, Pittsburgh, Pennsylvania 15213
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Department of Neurological Surgery, University of Pittsburgh Medical Center Presbyterian, Pittsburgh, Pennsylvania 15213
| | - R Mark Richardson
- Department of Neurosurgery, Massachusetts General Hospital, Boston, Massachusetts 02114
- Harvard Medical School, Boston, Massachusetts 02115
| | - Avniel Ghuman
- Center for the Neural Basis of Cognition, Carnegie Mellon University/University of Pittsburgh, Pittsburgh, Pennsylvania 15213
- Department of Neurological Surgery, University of Pittsburgh Medical Center Presbyterian, Pittsburgh, Pennsylvania 15213
- Center for Neuroscience, University of Pittsburgh, Pittsburgh, Pennsylvania 15260
| | - Stefano Anzellotti
- Department of Psychology and Neuroscience, Boston College, Chestnut Hill, Massachusetts 02467
| |
Collapse
|
33
|
Watanabe N, Miyoshi K, Jimura K, Shimane D, Keerativittayayut R, Nakahara K, Takeda M. Multimodal deep neural decoding reveals highly resolved spatiotemporal profile of visual object representation in humans. Neuroimage 2023; 275:120164. [PMID: 37169115 DOI: 10.1016/j.neuroimage.2023.120164] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2022] [Revised: 05/02/2023] [Accepted: 05/09/2023] [Indexed: 05/13/2023] Open
Abstract
Perception and categorization of objects in a visual scene are essential to grasp the surrounding situation. Recently, neural decoding schemes, such as machine learning applied to functional magnetic resonance imaging (fMRI), have been employed to elucidate the underlying neural mechanisms. However, it remains unclear how spatially distributed brain regions temporally represent visual object categories and sub-categories. One promising strategy to address this issue is neural decoding with concurrently obtained neural response data of high spatial and temporal resolution. In this study, we explored the spatial and temporal organization of visual object representations using concurrent fMRI and electroencephalography (EEG), combined with neural decoding using deep neural networks (DNNs). We hypothesized that neural decoding from multimodal neural data with DNNs would show high classification performance in visual object categorization (faces or non-face objects) and sub-categorization within faces and objects. Visualization of the fMRI DNN was more sensitive than the univariate approach and revealed that visual categorization occurred in brain-wide regions. Interestingly, the EEG DNN valued the earlier phase of neural responses for categorization and the later phase for sub-categorization. Combining the two DNNs improved classification performance for both categorization and sub-categorization compared with the fMRI DNN or EEG DNN alone. These deep learning-based results demonstrate a categorization principle in which visual objects are represented in a spatially organized, coarse-to-fine manner, and they provide strong evidence of the ability of multimodal deep learning to uncover spatiotemporal neural machinery in sensory processing.
Collapse
Affiliation(s)
- Noriya Watanabe
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan
| | - Kosuke Miyoshi
- Narrative Nights, Inc., Yokohama, Kanagawa, 236-0011, Japan
| | - Koji Jimura
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan; Department of Informatics, Gunma University, Maebashi, Gunma, 371-8510, Japan
| | - Daisuke Shimane
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan
| | - Ruedeerat Keerativittayayut
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan; Chulabhorn Royal Academy, Bangkok, 10210, Thailand
| | - Kiyoshi Nakahara
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan
| | - Masaki Takeda
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan.
| |
Collapse
|
34
|
Graumann M, Wallenwein LA, Cichy RM. Independent spatiotemporal effects of spatial attention and background clutter on human object location representations. Neuroimage 2023; 272:120053. [PMID: 36966853 PMCID: PMC10112276 DOI: 10.1016/j.neuroimage.2023.120053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2022] [Revised: 03/21/2023] [Accepted: 03/23/2023] [Indexed: 04/04/2023] Open
Abstract
Spatial attention helps us to efficiently localize objects in cluttered environments. However, the processing stage at which spatial attention modulates object location representations remains unclear. Here we investigated this question by identifying processing stages in time and space in an EEG and an fMRI experiment, respectively. As both object location representations and attentional effects have been shown to depend on the background on which objects appear, we included object background as an experimental factor. During the experiments, human participants viewed images of objects appearing in different locations on blank or cluttered backgrounds while performing a task either at fixation or in the periphery, directing their covert spatial attention away from or towards the objects. We used multivariate classification to assess object location information. Consistently across the EEG and fMRI experiments, we show that spatial attention modulated location representations during late processing stages (>150 ms, in middle and high ventral visual stream areas), independent of background condition. Our results clarify the processing stage at which attention modulates object location representations in the ventral visual stream and show that attentional modulation is a cognitive process separate from the recurrent processes related to the processing of objects on cluttered backgrounds.
Collapse
Affiliation(s)
- Monika Graumann
- Department of Education and Psychology, Freie Universität Berlin, 14195 Berlin, Germany; Berlin School of Mind and Brain, Faculty of Philosophy, Humboldt-Universität zu Berlin, 10117 Berlin, Germany.
| | - Lara A Wallenwein
- Department of Psychology, Universität Konstanz, 78457 Konstanz, Germany
| | - Radoslaw M Cichy
- Department of Education and Psychology, Freie Universität Berlin, 14195 Berlin, Germany; Berlin School of Mind and Brain, Faculty of Philosophy, Humboldt-Universität zu Berlin, 10117 Berlin, Germany; Bernstein Center for Computational Neuroscience Berlin, 10115 Berlin, Germany
| |
Collapse
|
35
|
Taylor J, Xu Y. Comparing the Dominance of Color and Form Information across the Human Ventral Visual Pathway and Convolutional Neural Networks. J Cogn Neurosci 2023; 35:816-840. [PMID: 36877074 PMCID: PMC11283826 DOI: 10.1162/jocn_a_01979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/07/2023]
Abstract
Color and form information can be decoded in every region of the human ventral visual hierarchy, and at every layer of many convolutional neural networks (CNNs) trained to recognize objects, but how does the coding strength of these features vary over processing? Here, we characterize for these features both their absolute coding strength (how strongly each feature is represented independent of the other feature) and their relative coding strength (how strongly each feature is encoded relative to the other), which could constrain how well a feature can be read out by downstream regions across variation in the other feature. To quantify relative coding strength, we define a measure called the form dominance index that compares the relative influence of color and form on the representational geometry at each processing stage. We analyze brain and CNN responses to stimuli varying based on color and either a simple form feature, orientation, or a more complex form feature, curvature. We find that while the brain and CNNs largely differ in how the absolute coding strength of color and form vary over processing, comparing them in terms of their relative emphasis of these features reveals a striking similarity: for both the brain and for CNNs trained for object recognition (but not for untrained CNNs), orientation information is increasingly de-emphasized, and curvature information increasingly emphasized, relative to color information over processing, with corresponding processing stages showing largely similar values of the form dominance index.
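The paper's exact definition of the form dominance index is not reproduced here, but one plausible formalization regresses a stage's RDM on single-feature model RDMs and contrasts the weights; a sketch under that assumption:

import numpy as np
from scipy.spatial.distance import pdist
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n_stim = 48
color = rng.integers(0, 4, n_stim).astype(float)  # 4 hue levels
form = rng.uniform(0, np.pi, n_stim)              # e.g., orientation

# Model RDMs for each feature alone (condensed pairwise distances).
rdm_color = pdist(color[:, None])
rdm_form = pdist(form[:, None])

# Toy "stage" RDM in which form shapes the geometry twice as much.
rdm_stage = 2.0 * rdm_form + 1.0 * rdm_color \
            + rng.normal(0, 0.1, rdm_form.size)

X = np.column_stack([rdm_form, rdm_color])
b_form, b_color = LinearRegression().fit(X, rdm_stage).coef_
dominance = (b_form - b_color) / (b_form + b_color)
print(f"form dominance index = {dominance:.2f} (positive = form-dominated)")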
Collapse
|
36
|
He BJ. Towards a pluralistic neurobiological understanding of consciousness. Trends Cogn Sci 2023; 27:420-432. [PMID: 36842851 PMCID: PMC10101889 DOI: 10.1016/j.tics.2023.02.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2022] [Revised: 02/02/2023] [Accepted: 02/03/2023] [Indexed: 02/27/2023]
Abstract
Theories of consciousness are often based on the assumption that a single, unified neurobiological account will explain different types of conscious awareness. However, recent findings show that, even within a single modality such as conscious visual perception, the anatomical location, timing, and information flow of neural activity related to conscious awareness vary depending on both external and internal factors. This suggests that the search for generic neural correlates of consciousness may not be fruitful. I argue that consciousness science requires a more pluralistic approach and propose a new framework: joint determinant theory (JDT). This theory may be capable of accommodating different brain circuit mechanisms for conscious contents as varied as percepts, wills, memories, emotions, and thoughts, as well as their integrated experience.
Collapse
Affiliation(s)
- Biyu J He
- Neuroscience Institute, New York University Grossman School of Medicine, New York, NY 10016, USA; Departments of Neurology, Neuroscience and Physiology, Radiology, New York University Grossman School of Medicine, New York, NY 10016.
| |
Collapse
|
37
|
Bracci S, Mraz J, Zeman A, Leys G, Op de Beeck H. The representational hierarchy in human and artificial visual systems in the presence of object-scene regularities. PLoS Comput Biol 2023; 19:e1011086. [PMID: 37115763 PMCID: PMC10171658 DOI: 10.1371/journal.pcbi.1011086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2022] [Revised: 05/10/2023] [Accepted: 04/09/2023] [Indexed: 04/29/2023] Open
Abstract
Human vision is still largely unexplained. Computer vision has made impressive progress on this front, but it is still unclear to what extent artificial neural networks approximate human object vision at the behavioral and neural levels. Here, we investigated whether machine object vision mimics the representational hierarchy of human object vision with an experimental design that allows testing within-domain representations for animals and scenes, as well as across-domain representations reflecting their real-world contextual regularities, such as animal-scene pairs that often co-occur in the visual environment. We found that DCNNs trained on object recognition acquire representations, in their late processing stage, that closely capture human conceptual judgements about the co-occurrence of animals and their typical scenes. Likewise, the DCNNs' representational hierarchy shows surprising similarities with the representational transformations emerging in domain-specific ventrotemporal areas up to domain-general frontoparietal areas. Despite these remarkable similarities, the underlying information processing differs. The ability of neural networks to learn a human-like high-level conceptual representation of object-scene co-occurrence depends upon the amount of object-scene co-occurrence present in the image set, highlighting the fundamental role of training history. Further, although mid/high-level DCNN layers represent the category division between animals and scenes as observed in VTC, their information content shows reduced domain-specific representational richness. To conclude, by testing within- and between-domain selectivity while manipulating contextual regularities, we reveal unknown similarities and differences in the information-processing strategies employed by human and artificial visual systems.
Collapse
Affiliation(s)
- Stefania Bracci
- Center for Mind/Brain Sciences-CIMeC, University of Trento, Rovereto, Italy
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
| | - Jakob Mraz
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
| | - Astrid Zeman
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
| | - Gaëlle Leys
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
| | - Hans Op de Beeck
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
| |
Collapse
|
38
|
Yargholi E, Op de Beeck H. Category Trumps Shape as an Organizational Principle of Object Space in the Human Occipitotemporal Cortex. J Neurosci 2023; 43:2960-2972. [PMID: 36922027 PMCID: PMC10124953 DOI: 10.1523/jneurosci.2179-22.2023] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2022] [Revised: 02/22/2023] [Accepted: 03/03/2023] [Indexed: 03/17/2023] Open
Abstract
The organizational principles of the object space represented in the human ventral visual cortex are debated. Here we contrast two prominent proposals that, in addition to an organization in terms of animacy, propose either a representation related to aspect ratio (stubby-spiky) or to the distinction between faces and bodies. We designed a critical test that dissociates the latter two categories from aspect ratio and investigated fMRI responses from human participants (of either sex) and deep neural networks (BigBiGAN). Representational similarity and decoding analyses showed that the object space in the occipitotemporal cortex and BigBiGAN was partially explained by animacy but not by aspect ratio. Data-driven approaches showed clusters for face and body stimuli and an animate-inanimate separation in the representational space of the occipitotemporal cortex and BigBiGAN, but no arrangement related to aspect ratio. In sum, the findings favor a model in terms of an animacy representation combined with strong selectivity for faces and bodies. SIGNIFICANCE STATEMENT: We contrasted animacy, aspect ratio, and face-body as principal dimensions characterizing object space in the occipitotemporal cortex. This is difficult to test, as faces and bodies typically differ in aspect ratio (faces are mostly stubby and bodies are mostly spiky). To dissociate the face-body distinction from the difference in aspect ratio, we created a new stimulus set in which faces and bodies have a similar and very wide distribution of values along the shape dimension of aspect ratio. Brain imaging (fMRI) with this new stimulus set showed that, in addition to animacy, the object space is mainly organized by the face-body distinction, and selectivity for aspect ratio is minor (despite its wide distribution).
Collapse
Affiliation(s)
- Elahe' Yargholi
- Department of Brain and Cognition, Leuven Brain Institute, Faculty of Psychology & Educational Sciences, KU Leuven, 3000 Leuven, Belgium
| | - Hans Op de Beeck
- Department of Brain and Cognition, Leuven Brain Institute, Faculty of Psychology & Educational Sciences, KU Leuven, 3000 Leuven, Belgium
| |
Collapse
|
39
|
Schwartz E, O’Nell K, Saxe R, Anzellotti S. Challenging the Classical View: Recognition of Identity and Expression as Integrated Processes. Brain Sci 2023; 13:296. [PMID: 36831839 PMCID: PMC9954353 DOI: 10.3390/brainsci13020296] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2022] [Revised: 02/01/2023] [Accepted: 02/02/2023] [Indexed: 02/12/2023] Open
Abstract
Recent neuroimaging evidence challenges the classical view that face identity and facial expression are processed by segregated neural pathways, showing that information about identity and expression are encoded within common brain regions. This article tests the hypothesis that integrated representations of identity and expression arise spontaneously within deep neural networks. A subset of the CelebA dataset is used to train a deep convolutional neural network (DCNN) to label face identity (chance = 0.06%, accuracy = 26.5%), and the FER2013 dataset is used to train a DCNN to label facial expression (chance = 14.2%, accuracy = 63.5%). The identity-trained and expression-trained networks each successfully transfer to labeling both face identity and facial expression on the Karolinska Directed Emotional Faces dataset. This study demonstrates that DCNNs trained to recognize face identity and DCNNs trained to recognize facial expression spontaneously develop representations of facial expression and face identity, respectively. Furthermore, a congruence coefficient analysis reveals that features distinguishing between identities and features distinguishing between expressions become increasingly orthogonal from layer to layer, suggesting that deep neural networks disentangle representational subspaces corresponding to different sources.
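The congruence coefficient used to quantify the growing orthogonality of identity and expression features is Tucker's phi, the cosine-like ratio sum(xy) / sqrt(sum(x^2) * sum(y^2)). A small sketch with synthetic feature axes whose overlap shrinks, mimicking the layer-to-layer disentangling described above:

import numpy as np

def congruence(x, y):
    # Tucker's congruence coefficient between two feature vectors.
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

rng = np.random.default_rng(5)
d = 256
identity_axis = rng.normal(size=d)
# Expression axes sharing progressively less with the identity axis,
# mimicking increasing orthogonality across layers.
for overlap in (0.8, 0.4, 0.0):
    expr_axis = overlap * identity_axis + (1 - overlap) * rng.normal(size=d)
    print(f"overlap={overlap}: congruence = "
          f"{congruence(identity_axis, expr_axis):.2f}")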
Collapse
Affiliation(s)
- Emily Schwartz
- Department of Psychology and Neuroscience, Boston College, Boston, MA 02467, USA
| | - Kathryn O’Nell
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH 03755, USA
| | - Rebecca Saxe
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Stefano Anzellotti
- Department of Psychology and Neuroscience, Boston College, Boston, MA 02467, USA
| |
Collapse
|
40
|
Yang C, Chen H, Naya Y. Allocentric information represented by self-referenced spatial coding in the primate medial temporal lobe. Hippocampus 2023; 33:522-532. [PMID: 36728411 DOI: 10.1002/hipo.23501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2022] [Revised: 12/16/2022] [Accepted: 12/30/2022] [Indexed: 02/03/2023]
Abstract
For living organisms, the ability to acquire information regarding the external space around them is critical for future actions. While the information must be stored in an allocentric frame to facilitate its use in various spatial contexts, each case of use requires the information to be represented in a particular self-referenced frame. Previous studies have explored neural substrates responsible for the linkage between self-referenced and allocentric spatial representations based on findings in rodents. However, the behaviors of rodents are different from those of primates in several aspects; for example, rodents mainly explore their environments through locomotion, while primates use eye movements. In this review, we discuss the brain mechanisms responsible for the linkage in nonhuman primates. Based on recent physiological studies, we propose that two types of neural substrates link the first-person perspective with allocentric coding. The first is the view-center background signal, which represents an image of the background surrounding the current position of fixation on the retina. This perceptual signal is transmitted from the ventral visual pathway to the hippocampus (HPC) via the perirhinal cortex and parahippocampal cortex. Because images that share the same objective-position in the environment tend to appear similar when seen from different self-positions, the view-center background signals are easily associated with one another in the formation of allocentric position coding and storage. The second type of neural substrate is the HPC neurons' dynamic activity that translates the stored location memory to the first-person perspective depending on the current spatial context.
Collapse
Affiliation(s)
- Cen Yang
- School of Psychological and Cognitive Sciences, Peking University, Beijing, China
| | - He Chen
- School of Psychological and Cognitive Sciences, Peking University, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
| | - Yuji Naya
- School of Psychological and Cognitive Sciences, Peking University, Beijing, China; PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing, China; Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
| |
Collapse
|
41
|
Lieber JD, Lee GM, Majaj NJ, Movshon JA. Sensitivity to naturalistic texture relies primarily on high spatial frequencies. J Vis 2023; 23:4. [PMID: 36745452 PMCID: PMC9910384 DOI: 10.1167/jov.23.2.4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2022] [Accepted: 11/19/2022] [Indexed: 02/07/2023] Open
Abstract
Natural images contain information at multiple spatial scales. Though we understand how early visual mechanisms split multiscale images into distinct spatial frequency channels, we do not know how the outputs of these channels are processed further by mid-level visual mechanisms. We have recently developed a texture discrimination task that uses synthetic, multi-scale, "naturalistic" textures to isolate these mid-level mechanisms. Here, we use three experimental manipulations (image blur, image rescaling, and eccentric viewing) to show that perceptual sensitivity to naturalistic structure is strongly dependent on features at high object spatial frequencies (measured in cycles/image). As a result, sensitivity depends on a texture acuity limit, a property of the visual system that sets the highest retinal spatial frequency (measured in cycles/degree) at which observers can detect naturalistic features. Analysis of the texture images using a model observer analysis shows that naturalistic image features at high object spatial frequencies carry more task-relevant information than those at low object spatial frequencies. That is, the dependence of sensitivity on high object spatial frequencies is a property of the texture images, rather than a property of the visual system. Accordingly, we find human observers' ability to extract naturalistic information (their efficiency) is similar for all object spatial frequencies. We conclude that the mid-level mechanisms that underlie perceptual sensitivity effectively extract information from all image features below the texture acuity limit, regardless of their retinal and object spatial frequency.
Collapse
Affiliation(s)
- Justin D Lieber
- Center for Neural Science, New York University, New York, NY, USA
| | - Gerick M Lee
- Center for Neural Science, New York University, New York, NY, USA
| | - Najib J Majaj
- Center for Neural Science, New York University, New York, NY, USA
| | | |
Collapse
|
42
|
Neural mechanisms underlying the hierarchical construction of perceived aesthetic value. Nat Commun 2023; 14:127. [PMID: 36693833 PMCID: PMC9873760 DOI: 10.1038/s41467-022-35654-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2021] [Accepted: 12/15/2022] [Indexed: 01/26/2023] Open
Abstract
Little is known about how the brain computes the perceived aesthetic value of complex stimuli such as visual art. Here, we used computational methods in combination with functional neuroimaging to provide evidence that the aesthetic value of a visual stimulus is computed in a hierarchical manner via a weighted integration over both low- and high-level stimulus features contained in early and late visual cortex, extending into parietal and lateral prefrontal cortices. Feature representations in parietal and lateral prefrontal cortex may in turn be utilized to produce an overall aesthetic value in the medial prefrontal cortex. Such brain-wide computations are not only consistent with a feature-based mechanism for value construction, but also resemble computations performed by a deep convolutional neural network. Our findings thus shed light on the existence of a general neurocomputational mechanism for rapidly and flexibly producing value judgements across an array of complex novel stimuli and situations.
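The "weighted integration over features" account is, computationally, a regularized linear read-out from stimulus features to ratings. A sketch with random features standing in for early- and late-layer DCNN activations (the dimensions and noise are arbitrary):

import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
n_img = 500
low = rng.normal(size=(n_img, 32))    # e.g., early-layer DCNN features
high = rng.normal(size=(n_img, 64))   # e.g., late-layer DCNN features
X = np.hstack([low, high])
# Simulated ratings: a weighted integration over both feature levels.
w = rng.normal(size=X.shape[1])
ratings = X @ w + rng.normal(0, 2.0, n_img)

model = RidgeCV(alphas=np.logspace(-2, 3, 12))
score = cross_val_score(model, X, ratings, cv=5, scoring="r2").mean()
print(f"cross-validated R^2 of the feature-integration model: {score:.2f}")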
Collapse
|
43
|
Kell AJ, Bokor SL, Jeon YN, Toosi T, Issa EB. Marmoset core visual object recognition behavior is comparable to that of macaques and humans. iScience 2023; 26:105788. [PMID: 36594035 PMCID: PMC9804140 DOI: 10.1016/j.isci.2022.105788] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2022] [Revised: 10/13/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022] Open
Abstract
Among the smallest simian primates, the common marmoset offers promise as an experimentally tractable primate model for neuroscience with translational potential to humans. However, given its exceedingly small brain and body, the gap in perceptual and cognitive abilities between marmosets and humans requires study. Here, we performed a comparison of marmoset behavior to that of three other species in the domain of high-level vision. We first found that marmosets outperformed rats, a marmoset-sized rodent, on a simple recognition task, with marmosets robustly recognizing objects across views. On a more challenging invariant object recognition task used previously in humans, marmosets also achieved high performance. Notably, across hundreds of images, marmosets' image-by-image behavior was highly similar to that of humans, nearly as human-like as macaque behavior. Thus, core aspects of visual perception are conserved across monkeys and humans, and marmosets present salient behavioral advantages over other small model organisms for visual neuroscience.
Collapse
Affiliation(s)
- Alexander J.E. Kell
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
- Department of Neuroscience, Columbia University, New York, NY 10027, USA
| | - Sophie L. Bokor
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
- Department of Neuroscience, Columbia University, New York, NY 10027, USA
| | - You-Nah Jeon
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
- Department of Neuroscience, Columbia University, New York, NY 10027, USA
| | - Tahereh Toosi
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
- Department of Neuroscience, Columbia University, New York, NY 10027, USA
| | - Elias B. Issa
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
- Department of Neuroscience, Columbia University, New York, NY 10027, USA
| |
Collapse
|
44
|
Mokari-Mahallati M, Ebrahimpour R, Bagheri N, Karimi-Rouzbahani H. Deeper neural network models better reflect how humans cope with contrast variation in object recognition. Neurosci Res 2023:S0168-0102(23)00007-X. [PMID: 36681154 DOI: 10.1016/j.neures.2023.01.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2022] [Revised: 11/27/2022] [Accepted: 01/17/2023] [Indexed: 01/20/2023]
Abstract
Visual inputs are far from ideal in everyday situations, such as in fog, where the contrast of input stimuli is low. However, human perception remains relatively robust to contrast variations. To provide insight into the underlying mechanisms of contrast invariance, we addressed two questions. Do contrast effects disappear along the visual hierarchy? Do later stages of the visual hierarchy contribute to contrast invariance? We ran a behavioral experiment in which we manipulated the level of stimulus contrast and the involvement of higher-level visual areas through immediate and delayed backward masking of the stimulus. Backward masking led to a significant drop in performance in our visual categorization task, supporting a role for higher-level visual areas in contrast invariance. To obtain mechanistic insights, we ran the same categorization task on three state-of-the-art computational models of human vision, each with a different depth of visual hierarchy. We found contrast effects all along the visual hierarchy, no matter how deep. Moreover, the final layers of deeper hierarchical models, which have been shown to be the best models of the final stages of the visual system, coped with contrast effects more effectively. These results suggest that, while contrast effects reach the final stages of the hierarchy, those stages play a significant role in compensating for contrast variations in the visual system.
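The contrast manipulation itself is simple to reproduce for model experiments: rescale pixel values around the image mean and track how a network's outputs change. A PyTorch sketch, assuming torchvision with downloadable ResNet-18 weights (this is not one of the study's three models, and random images stand in for the stimuli):

import torch
import torchvision.models as models

def set_contrast(img, c):
    # Rescale contrast around the per-channel mean (c=1: unchanged).
    m = img.mean(dim=(-2, -1), keepdim=True)
    return (m + c * (img - m)).clamp(0, 1)

net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
img = torch.rand(8, 3, 224, 224)  # stand-in stimuli
with torch.no_grad():
    base = net(set_contrast(img, 1.0)).argmax(1)
    for c in (0.5, 0.2, 0.05):
        pred = net(set_contrast(img, c)).argmax(1)
        agree = (pred == base).float().mean().item()
        print(f"contrast {c}: label agreement with full contrast = {agree:.2f}")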
Collapse
Affiliation(s)
- Masoumeh Mokari-Mahallati
- Department of Electrical Engineering, Shahid Rajaee Teacher Training University, Tehran, Islamic Republic of Iran
| | - Reza Ebrahimpour
- Center for Cognitive Science, Institute for Convergence Science and Technology (ICST), Sharif University of Technology, Tehran P.O.Box:11155-1639, Islamic Republic of Iran; Department of Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Islamic Republic of Iran; School of Cognitive Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Islamic Republic of Iran.
| | - Nasour Bagheri
- Department of Electrical Engineering, Shahid Rajaee Teacher Training University, Tehran, Islamic Republic of Iran
| | - Hamid Karimi-Rouzbahani
- MRC Cognition & Brain Sciences Unit, University of Cambridge, UK; Mater Research Institute, Faculty of Medicine, University of Queensland, Australia
| |
Collapse
|
45
|
Bracci S, Op de Beeck HP. Understanding Human Object Vision: A Picture Is Worth a Thousand Representations. Annu Rev Psychol 2023; 74:113-135. [PMID: 36378917 DOI: 10.1146/annurev-psych-032720-041031] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Objects are the core meaningful elements in our visual environment. Classic theories of object vision focus upon object recognition and are elegant and simple. Some of their proposals still stand, yet the simplicity is gone. Recent evolutions in behavioral paradigms, neuroscientific methods, and computational modeling have allowed vision scientists to uncover the complexity of the multidimensional representational space that underlies object vision. We review these findings and propose that the key to understanding this complexity is to relate object vision to the full repertoire of behavioral goals that underlie human behavior, running far beyond object recognition. There might be no such thing as core object recognition, and if it exists, then its importance is more limited than traditionally thought.
Collapse
Affiliation(s)
- Stefania Bracci
- Center for Mind/Brain Sciences, University of Trento, Rovereto, Italy;
| | - Hans P Op de Beeck
- Leuven Brain Institute, Research Unit Brain & Cognition, KU Leuven, Leuven, Belgium;
| |
Collapse
|
46
|
Chae H, Banerjee A, Dussauze M, Albeanu DF. Long-range functional loops in the mouse olfactory system and their roles in computing odor identity. Neuron 2022; 110:3970-3985.e7. [PMID: 36174573 PMCID: PMC9742324 DOI: 10.1016/j.neuron.2022.09.005] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2021] [Revised: 07/12/2022] [Accepted: 09/02/2022] [Indexed: 12/15/2022]
Abstract
Elucidating the neural circuits supporting odor identification remains an open challenge. Here, we analyze the contributions of the two output cell types of the mouse olfactory bulb (mitral and tufted cells) to decoding odor identity and concentration, and their dependence on top-down feedback from the cells' respective major cortical targets: the piriform cortex versus the anterior olfactory nucleus. We find that tufted cells substantially outperform mitral cells in decoding both odor identity and intensity. Cortical feedback selectively regulates the activity of its dominant bulb projection cell type and implements different computations. Piriform feedback specifically restructures mitral responses, whereas feedback from the anterior olfactory nucleus preferentially controls the gain of tufted representations without altering their odor tuning. Our results identify distinct functional loops involving the mitral and tufted cells and their cortical targets. We suggest that, in addition to the canonical mitral-to-piriform pathway, tufted cells and their target regions are ideally positioned to compute odor identity.
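The mitral-versus-tufted decoding comparison lends itself to a minimal population-decoding illustration. The sketch below is generic rather than the authors' pipeline: the synthetic response matrices, the tuning-strength difference between the two simulated populations, and the cross-validated logistic-regression decoder are all assumptions.

```python
# Sketch: cross-validated linear decoding of odor identity from population
# responses, in the spirit of the mitral- vs tufted-cell comparison.
# All data are synthetic; the decoder choice is an assumption.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_odors, n_trials, n_cells = 8, 20, 100
labels = np.repeat(np.arange(n_odors), n_trials)

def simulate_population(tuning_strength):
    """Trials x cells response matrix: odor-specific tuning plus noise."""
    tuning = rng.normal(0, tuning_strength, (n_odors, n_cells))
    return tuning[labels] + rng.normal(0, 1.0, (labels.size, n_cells))

# Hypothetically give 'tufted' cells stronger odor tuning than 'mitral' cells,
# so the decoding gap reported in the abstract can be reproduced in miniature.
for name, strength in (("mitral", 0.3), ("tufted", 0.6)):
    X = simulate_population(strength)
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5)
    print(f"{name}: decoding accuracy = {acc.mean():.2f} (chance = {1/n_odors:.2f})")
```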
Collapse
Affiliation(s)
- Honggoo Chae
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Arkarup Banerjee
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Cold Spring Harbor Laboratory School for Biological Sciences, Cold Spring Harbor, NY, USA
| | - Marie Dussauze
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Cold Spring Harbor Laboratory School for Biological Sciences, Cold Spring Harbor, NY, USA
| | - Dinu F Albeanu
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Cold Spring Harbor Laboratory School for Biological Sciences, Cold Spring Harbor, NY, USA.
| |
Collapse
|
47
|
Zafirova Y, Cui D, Raman R, Vogels R. Keep the head in the right place: Face-body interactions in inferior temporal cortex. Neuroimage 2022; 264:119676. [PMID: 36216293 DOI: 10.1016/j.neuroimage.2022.119676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2022] [Revised: 09/23/2022] [Accepted: 10/06/2022] [Indexed: 11/05/2022] Open
Abstract
In primates, faces and bodies activate distinct regions in the inferior temporal (IT) cortex and are typically studied separately. Yet, primates interact with whole agents, not with random concatenations of faces and bodies. Despite its social importance, how faces and bodies interact in IT is still poorly understood. Here, we addressed this gap by measuring fMRI activations to whole agents and to unnatural face-body configurations in which the head was mislocated with respect to the body, and by examining how these relate to the sum of the activations to the corresponding faces and bodies. First, we mapped patches in the IT of awake macaques that were activated more by images of whole monkeys than by objects and found that these mostly overlapped with body and face patches. In a second fMRI experiment, we obtained no evidence for superadditive responses in these "monkey patches", with the activation to the monkeys being less than or equal to the summed face-body activations. However, monkey patches in the anterior IT were activated more by natural than by unnatural configurations. The stronger activations to natural configurations could not be explained by the summed face-body activations. These univariate results were supported by regression analyses in which we modeled the activations to both configurations as a weighted linear combination of the activations to the faces and bodies, showing higher regression coefficients for the natural than for the unnatural configurations. Deeper layers of trained convolutional neural networks also contained units that responded more to natural than to unnatural monkey configurations. Unlike the monkey fMRI patches, these units showed substantial superadditive responses to the natural configurations. Our monkey fMRI data suggest configuration-sensitive face-body interactions in anterior IT, adding to the evidence for integrated face-body processing in the primate ventral visual stream, and open the way for mechanistic studies using single-unit recordings in these patches.
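The regression analysis summarized above has a simple algebraic form that a short sketch can make explicit. Everything below is synthetic and hypothetical except the modeling logic itself: the activation to each face-body configuration is expressed as a weighted linear combination of the activations to its constituent face and body, and the fitted weights are compared between configuration types.

```python
# Sketch: modeling whole-agent activations as a weighted linear combination of
# face and body activations, separately per configuration type.
# Synthetic data throughout; only the modeling logic follows the abstract.
import numpy as np

rng = np.random.default_rng(1)
n_stimuli = 40
face = rng.normal(1.0, 0.2, n_stimuli)  # activation to the face alone
body = rng.normal(1.0, 0.2, n_stimuli)  # activation to the body alone

def fit_weights(agent):
    """Least-squares weights in: agent ~ w_face * face + w_body * body."""
    X = np.column_stack([face, body])
    w, *_ = np.linalg.lstsq(X, agent, rcond=None)
    return w

# Hypothetical pattern: natural configurations draw on both parts more strongly,
# echoing the higher regression coefficients reported for natural configurations.
natural = 0.8 * face + 0.7 * body + rng.normal(0, 0.1, n_stimuli)
unnatural = 0.5 * face + 0.4 * body + rng.normal(0, 0.1, n_stimuli)

for label, agent in (("natural", natural), ("unnatural", unnatural)):
    w_face, w_body = fit_weights(agent)
    print(f"{label}: w_face = {w_face:.2f}, w_body = {w_body:.2f}")
```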
Collapse
Affiliation(s)
- Yordanka Zafirova
- Laboratorium voor Neuro- en Psychofysiologie, Department of Neurosciences, KU Leuven, Belgium; Leuven Brain Institute, KU Leuven, Belgium
| | - Ding Cui
- Laboratorium voor Neuro- en Psychofysiologie, Department of Neurosciences, KU Leuven, Belgium; Leuven Brain Institute, KU Leuven, Belgium
| | - Rajani Raman
- Laboratorium voor Neuro- en Psychofysiologie, Department of Neurosciences, KU Leuven, Belgium; Leuven Brain Institute, KU Leuven, Belgium
| | - Rufin Vogels
- Laboratorium voor Neuro- en Psychofysiologie, Department of Neurosciences, KU Leuven, Belgium; Leuven Brain Institute, KU Leuven, Belgium.
| |
Collapse
|
48
|
Ayzenberg V, Behrmann M. Does the brain's ventral visual pathway compute object shape? Trends Cogn Sci 2022; 26:1119-1132. [PMID: 36272937 PMCID: PMC11669366 DOI: 10.1016/j.tics.2022.09.019] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2022] [Revised: 09/22/2022] [Accepted: 09/26/2022] [Indexed: 11/11/2022]
Abstract
A rich behavioral literature has shown that human object recognition is supported by a representation of shape that is tolerant to variations in an object's appearance. Such 'global' shape representations are achieved by describing objects via the spatial arrangement of their local features, or structure, rather than by the appearance of the features themselves. However, accumulating evidence suggests that the ventral visual pathway - the primary substrate underlying object recognition - may not represent global shape. Instead, ventral representations may be better described as a basis set of local image features. We suggest that this evidence forces a reevaluation of the role of the ventral pathway in object perception and points to a broader network for shape perception, one that encompasses contributions from the dorsal pathway.
Collapse
Affiliation(s)
- Vladislav Ayzenberg
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Psychology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
| | - Marlene Behrmann
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Psychology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA; The Department of Ophthalmology, University of Pittsburgh, Pittsburgh, PA 15260, USA.
| |
Collapse
|
49
|
Mocz V, Vaziri-Pashkam M, Chun M, Xu Y. Predicting Identity-Preserving Object Transformations in Human Posterior Parietal Cortex and Convolutional Neural Networks. J Cogn Neurosci 2022; 34:2406-2435. [PMID: 36122358 PMCID: PMC9988239 DOI: 10.1162/jocn_a_01916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Previous research shows that, within human occipito-temporal cortex (OTC), a general linear mapping function can link visual object responses across nonidentity feature changes, including Euclidean features (e.g., position and size) and non-Euclidean features (e.g., image statistics and spatial frequency). Although the learned mapping is capable of predicting responses to objects not included in training, these predictions are better for categories included in training than for those that are not. These findings demonstrate a near-orthogonal representation of object identity and nonidentity features throughout human OTC. Here, we extended these findings to examine the mapping across both Euclidean and non-Euclidean feature changes in human posterior parietal cortex (PPC), including functionally defined regions in the inferior and superior intraparietal sulcus. We additionally examined responses in five convolutional neural networks (CNNs) pretrained on object classification, as CNNs are considered the current best models of the primate ventral visual system. We separately compared results from PPC and CNNs with those of OTC. We found that a linear mapping function could successfully link object responses in different states of nonidentity transformations in human PPC and CNNs for both Euclidean and non-Euclidean features. Overall, we found that object identity and nonidentity features are represented in a near-orthogonal, rather than completely orthogonal, manner in PPC and CNNs, just as they are in OTC. Nevertheless, some differences existed among OTC, PPC, and CNNs. These results demonstrate the similarities and differences in how visual object information across an identity-preserving image transformation may be represented in OTC, PPC, and CNNs.
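The general linear mapping function at the center of this study can be written down compactly. In the sketch below, the synthetic response matrices and the ridge-regression fit are assumptions (the abstract does not specify the estimator); the point is the logic of learning a linear map between responses to the same objects in two states of a transformation and testing it on held-out object categories.

```python
# Sketch: fitting a linear mapping between neural responses to the same objects
# in two states of an identity-preserving transformation, then testing
# generalization to held-out categories. Data are synthetic; the ridge fit is assumed.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
n_objects, n_units = 60, 200
resp_state_a = rng.normal(0, 1, (n_objects, n_units))

# Simulate the transformed state as a fixed linear change plus noise,
# so that a linear mapping is in principle recoverable.
true_map = rng.normal(0, 1 / np.sqrt(n_units), (n_units, n_units))
resp_state_b = resp_state_a @ true_map + rng.normal(0, 0.1, (n_objects, n_units))

train = np.arange(40)      # objects from 'trained' categories
test = np.arange(40, 60)   # objects from held-out categories

mapper = Ridge(alpha=1.0).fit(resp_state_a[train], resp_state_b[train])
pred = mapper.predict(resp_state_a[test])

# Score each held-out object by the correlation between its predicted and
# actual transformed-state response pattern.
r = [np.corrcoef(pred[i], resp_state_b[test][i])[0, 1] for i in range(len(test))]
print(f"mean predicted-actual correlation on held-out objects: {np.mean(r):.2f}")
```

Under the abstract's near-orthogonality claim, such a mapper should generalize above chance to untrained categories while still scoring higher on trained ones.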
Collapse
|
50
|
Rothmaler K, Berger P, Wiesmann CG. Timing matters: disentangling the neurocognitive sequence of mentalizing. Trends Cogn Sci 2022; 26:906-908. [PMID: 36114127 DOI: 10.1016/j.tics.2022.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2022] [Accepted: 09/01/2022] [Indexed: 01/12/2023]
Abstract
A recent electrocorticographic study by Tan et al. makes an important contribution to understanding the processes involved in mentalizing by adding a temporal dimension to its underlying brain network. Combined with multivariate methods, this approach has the potential to unveil the precise representations underlying mentalizing and their functional interplay.
Collapse
Affiliation(s)
- Katrin Rothmaler
- Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1A, 04103 Leipzig, Germany.
| | - Philipp Berger
- Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1A, 04103 Leipzig, Germany
| | - Charlotte Grosse Wiesmann
- Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1A, 04103 Leipzig, Germany
| |
Collapse
|