1
Lu Z, Wang Y. Teaching CORnet human fMRI representations for enhanced model-brain alignment. Cogn Neurodyn 2025; 19:61. [PMID: 40242427 PMCID: PMC11999921 DOI: 10.1007/s11571-025-10252-y]
Abstract
Deep convolutional neural networks (DCNNs) have demonstrated excellent performance in object recognition and have been found to share some similarities with visual processing in the brain. However, a substantial gap between DCNNs and human visual perception remains. Functional magnetic resonance imaging (fMRI), a widely used technique in cognitive neuroscience, can record neural activation in the human visual cortex during visual perception. Can we teach DCNNs from human fMRI signals to achieve a more brain-like model? To answer this question, this study proposed ReAlnet-fMRI, a model based on the state-of-the-art vision model CORnet but optimized using human fMRI data through a multi-layer encoding-based alignment framework. This framework effectively enables the model to learn human brain representations. The fMRI-optimized ReAlnet-fMRI exhibited higher similarity to the human brain than both CORnet and a control model in within- and across-subject as well as within- and across-modality (fMRI and EEG) model-brain alignment evaluations. Additionally, we conducted an in-depth analysis of how the internal representations of ReAlnet-fMRI differ from those of CORnet in encoding various object dimensions. These findings demonstrate the possibility of enhancing the brain-likeness of visual models by integrating human neural data, helping to bridge the gap between computer vision and visual neuroscience.
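The alignment framework itself is the paper's contribution; purely as an illustration of the general recipe (all names and shapes below are hypothetical, not the authors' implementation), a multi-layer encoding-based alignment can be sketched as a linear encoder from each model layer to fMRI responses, whose prediction error is added to the task loss:

```python
# Sketch of an encoding-based alignment term (hypothetical, PyTorch).
import torch
import torch.nn as nn

class EncodingAlignment(nn.Module):
    """Linear encoder from one model layer's activations to fMRI voxels;
    its prediction error serves as an alignment loss for that layer."""
    def __init__(self, feat_dim: int, n_voxels: int):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, n_voxels)

    def forward(self, layer_acts: torch.Tensor, voxels: torch.Tensor):
        pred = self.encoder(layer_acts.flatten(1))
        return nn.functional.mse_loss(pred, voxels)

# Training would combine the task loss with one alignment term per layer:
# loss = task_loss + sum(w[l] * align[l](acts[l], fmri) for l in layers)
```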
Affiliation(s)
- Zitong Lu
- Department of Psychology, The Ohio State University, Columbus, OH 43210, USA
- Yile Wang
- Department of Neuroscience, The University of Texas at Dallas, Richardson, USA
2
Zhang J, Li G, Su Q, Cao L, Tian Y, Xu B. Enabling scale and rotation invariance in convolutional neural networks with retina like transformation. Neural Netw 2025; 187:107395. [PMID: 40121784 DOI: 10.1016/j.neunet.2025.107395]
Abstract
Traditional convolutional neural networks (CNNs) struggle with scale and rotation transformations, resulting in reduced performance on transformed images. Previous research focused on designing specific CNN modules to extract transformation-invariant features. However, these methods lack versatility and are not adaptable to a wide range of scenarios. Drawing inspiration from human visual invariance, we propose a novel brain-inspired approach to tackle the invariance problem in CNNs. If we consider a CNN as the visual cortex, we have the potential to design an "eye" that exhibits transformation invariance, allowing CNNs to perceive the world consistently. Therefore, we propose a retina module and then integrate it into CNNs to create transformation-invariant CNNs (TICNN), achieving scale and rotation invariance. The retina module comprises a retina-like transformation and a transformation-aware neural network (TANN). The retina-like transformation supports flexible image transformations, while the TANN regulates these transformations for scaling and rotation. Specifically, we propose a reference-based training method (RBTM) where the retina module learns to align input images with a reference scale and rotation, thereby achieving invariance. Furthermore, we provide mathematical substantiation for the retina module to confirm its feasibility. Experimental results also demonstrate that our method outperforms existing methods in recognizing images with scale and rotation variations. The code will be released at https://github.com/JiaHongZ/TICNN.
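The abstract does not spell the retina-like transformation out; a classical transformation with the property it needs is log-polar resampling around the fixation point, which turns scaling and rotation into translations that a standard CNN already tolerates. A minimal sketch with OpenCV (the paper's transformation is learned and regulated by the TANN, so this shows only the underlying geometric idea):

```python
import cv2

img = cv2.imread("object.png")           # hypothetical input image
h, w = img.shape[:2]
center = (w / 2, h / 2)

# Log-polar remap: rotation about the center becomes a vertical shift and
# scaling becomes a horizontal shift, i.e. transformations a translation-
# invariant CNN can absorb.
logpolar = cv2.warpPolar(img, (w, h), center, min(center),
                         cv2.WARP_POLAR_LOG)
```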
Affiliation(s)
- Jiahong Zhang
- Institute of Automation, Chinese Academy of Sciences, Beijing 100045, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China.
- Guoqi Li
- Institute of Automation, Chinese Academy of Sciences, Beijing 100045, China; Peng Cheng Laboratory, Shenzhen, Guangdong 518066, China.
- Qiaoyi Su
- Institute of Automation, Chinese Academy of Sciences, Beijing 100045, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China.
- Lihong Cao
- State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China.
- Yonghong Tian
- Peng Cheng Laboratory, Shenzhen, Guangdong 518066, China; Institute for Artificial Intelligence, Peking University, Beijing 100871, China.
- Bo Xu
- Institute of Automation, Chinese Academy of Sciences, Beijing 100045, China.
3
Takeda K, Sasaki M, Abe K, Oizumi M. Unsupervised alignment in neuroscience: Introducing a toolbox for Gromov-Wasserstein optimal transport. J Neurosci Methods 2025; 419:110443. [PMID: 40239770 DOI: 10.1016/j.jneumeth.2025.110443]
Abstract
BACKGROUND: Understanding how sensory stimuli are represented across different brains, species, and artificial neural networks is a critical topic in neuroscience. Traditional methods for comparing these representations typically rely on supervised alignment, which assumes direct correspondence between stimulus representations across brains or models. However, this approach has limitations when the assumption is not valid, or when validating the assumption itself is the goal of the research.
NEW METHOD: To address the limitations of supervised alignment, we propose an unsupervised alignment method based on Gromov-Wasserstein optimal transport (GWOT). GWOT optimally identifies correspondences between representations by leveraging internal relationships without external labels, revealing intricate structural correspondences such as one-to-one, group-to-group, and shifted mappings.
RESULTS: We provide a comprehensive methodological guide and introduce a toolbox called GWTune for using GWOT in neuroscience. Our results show that GWOT can reveal detailed structural distinctions that supervised methods may overlook. We also demonstrate successful unsupervised alignment in key data domains, including behavioral data, neural activity recordings, and artificial neural network models, demonstrating its flexibility and broad applicability.
COMPARISON WITH EXISTING METHODS: Unlike traditional supervised alignment methods such as Representational Similarity Analysis, which assume direct correspondence between stimuli, GWOT provides a nuanced approach that can handle different types of structural correspondence, including fine-grained and coarse correspondences. Our method can therefore provide richer insights into the similarity or difference of representations by revealing finer structural differences.
CONCLUSION: We anticipate that our work will significantly broaden the accessibility and application of unsupervised alignment in neuroscience, offering novel perspectives on complex representational structures. By providing a user-friendly toolbox and a detailed tutorial, we aim to facilitate the adoption of unsupervised alignment techniques, enabling researchers to achieve a deeper understanding of cross-brain and cross-species representation analysis.
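The toolbox is the GWTune package named above; independent of it, the core optimization is available in the POT library. A minimal sketch, assuming each representation is summarized by a within-domain dissimilarity matrix over the same 50 stimuli (synthetic data here):

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))   # stimulus embeddings, system A
Y = rng.normal(size=(50, 30))   # same stimuli, system B

C1, C2 = ot.dist(X, X), ot.dist(Y, Y)   # within-domain dissimilarities
p, q = ot.unif(len(X)), ot.unif(len(Y))

# Gromov-Wasserstein optimal transport: matches the two similarity
# structures without using any cross-domain stimulus labels.
T = ot.gromov.gromov_wasserstein(C1, C2, p, q, loss_fun="square_loss")
matching = T.argmax(axis=1)     # unsupervised stimulus correspondences
```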
Affiliation(s)
- Ken Takeda
- Graduate School of Arts and Science, The University of Tokyo, Meguro-ku, Tokyo, Japan
- Masaru Sasaki
- Graduate School of Arts and Science, The University of Tokyo, Meguro-ku, Tokyo, Japan
- Kota Abe
- Graduate School of Arts and Science, The University of Tokyo, Meguro-ku, Tokyo, Japan
- Masafumi Oizumi
- Graduate School of Arts and Science, The University of Tokyo, Meguro-ku, Tokyo, Japan.
4
Gamal M, Eldawlatly S. High-level visual processing in the lateral geniculate nucleus revealed using goal-driven deep learning. J Neurosci Methods 2025; 418:110429. [PMID: 40122470 DOI: 10.1016/j.jneumeth.2025.110429]
Abstract
BACKGROUND: The Lateral Geniculate Nucleus (LGN) is an essential contributor to high-level visual processing despite being an early subcortical area in the visual system. Current LGN computational models focus on its basic properties, with less emphasis on its role in high-level vision.
NEW METHOD: We propose a high-level approach for encoding mouse LGN neural responses to natural scenes. This approach employs two deep neural networks (DNNs), namely VGG16 and ResNet50, as goal-driven models. We use these models as tools to better understand the visual features encoded in the LGN.
RESULTS: Early layers of the DNNs represent the best LGN models. We also demonstrate that numerosity, as a high-level visual feature, is encoded, along with other visual features, in LGN neural activity. Intermediate layers are better at representing numerosity and other complex features, whereas early layers are better at predicting simple visual features. Finally, we show that an ensemble model combining an early and an intermediate layer achieves both high neural prediction accuracy and strong numerosity representation.
COMPARISON WITH EXISTING METHODS: Our approach emphasizes analyzing the inner workings of DNNs to demonstrate the representation of a high-level feature such as numerosity in the LGN, as opposed to the common belief in the simplicity of the LGN.
CONCLUSIONS: We demonstrate that goal-driven DNNs can be used as high-level vision models of the LGN for neural prediction and as an exploration tool to better understand the role of the LGN.
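A goal-driven encoding analysis of this kind is typically a regression from layer activations to recorded responses, repeated across depths; a schematic sketch (layer indexing and data shapes are illustrative, not the paper's exact pipeline):

```python
import numpy as np
import torch
from torchvision.models import vgg16, VGG16_Weights
from sklearn.linear_model import RidgeCV

backbone = vgg16(weights=VGG16_Weights.DEFAULT).features.eval()

def layer_features(images: torch.Tensor, depth: int) -> np.ndarray:
    """Flattened activations after the first `depth` feature layers."""
    with torch.no_grad():
        return backbone[:depth](images).flatten(1).numpy()

def encoding_score(images, responses, depth, train, test):
    """Mean held-out correlation of a ridge model fit at one depth;
    comparing depths identifies which layer best models the LGN."""
    X = layer_features(images, depth)
    reg = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X[train], responses[train])
    pred = reg.predict(X[test])
    return np.mean([np.corrcoef(pred[:, i], responses[test][:, i])[0, 1]
                    for i in range(responses.shape[1])])
```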
Affiliation(s)
- Mai Gamal
- Computer Science and Engineering Department, German University in Cairo, Cairo 11835, Egypt.
- Seif Eldawlatly
- Computer and Systems Engineering Department, Ain Shams University, Cairo 11517, Egypt; Computer Science and Engineering Department, The American University in Cairo, Cairo 11835, Egypt.
5
Takeda K, Abe K, Kitazono J, Oizumi M. Unsupervised alignment reveals structural commonalities and differences in neural representations of natural scenes across individuals and brain areas. iScience 2025; 28:112427. [PMID: 40343275 PMCID: PMC12059663 DOI: 10.1016/j.isci.2025.112427]
Abstract
Neuroscience research aims to identify universal neural mechanisms underlying sensory information encoding by comparing neural representations across individuals, typically using Representational Similarity Analysis. However, traditional methods assume direct stimulus correspondence across individuals, limiting the exploration of other possibilities. To address this, we propose an unsupervised alignment framework based on Gromov-Wasserstein Optimal Transport, which identifies correspondences between neural representations solely from internal similarity structures, without relying on stimulus labels. Applying this method to Neuropixels recordings in mice and fMRI data in humans viewing natural scenes, we found that the neural representations in the same visual cortical areas can be well aligned across individuals in an unsupervised manner. Furthermore, alignment across different brain areas is influenced by factors beyond the visual hierarchy, with higher-order visual areas aligning well with each other, but not with lower-order areas. This unsupervised approach reveals more nuanced structural commonalities and differences in neural representations than conventional methods.
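For contrast, the conventional supervised comparison the abstract refers to fixes the stimulus correspondence and correlates representational dissimilarity matrices; a minimal sketch of that baseline (GWOT instead searches over the correspondence itself):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_similarity(resp_a: np.ndarray, resp_b: np.ndarray) -> float:
    """Supervised RSA: assumes row i of both (stimuli x units) matrices
    is the same stimulus, which is exactly the assumption GWOT drops."""
    rdm_a = pdist(resp_a, metric="correlation")
    rdm_b = pdist(resp_b, metric="correlation")
    return spearmanr(rdm_a, rdm_b).correlation
```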
Affiliation(s)
- Ken Takeda
- Graduate School of Arts and Science, The University of Tokyo, Tokyo, Japan
- Kota Abe
- Graduate School of Arts and Science, The University of Tokyo, Tokyo, Japan
- Jun Kitazono
- Graduate School of Data Science, Yokohama City University, Kanagawa, Japan
- Masafumi Oizumi
- Graduate School of Arts and Science, The University of Tokyo, Tokyo, Japan
6
Zhu J, Han Y, Huang X, Feng Y, Ruan X, Lin W, Zhou J, Hou F. Linking behavioral deficits with underlying neural property changes in amblyopia. Neuropsychologia 2025; 214:109156. [PMID: 40324681 DOI: 10.1016/j.neuropsychologia.2025.109156]
Abstract
While it is widely accepted that abnormal visual experience during the critical period can lead to significant functional deficits and altered neural properties, the quantitative link between behavioral visual losses and the underlying neural changes remains elusive. To address this gap, we systematically varied stimulus orientation and contrast to measure 2D psychometric functions of amblyopic and normally sighted participants at two different spatial frequencies. A biologically interpretable neural population model, explicitly incorporating a neural contrast response function (CRF) and orientation tuning, accounted for the complex performance data of both groups. Our results revealed that the poor performance of the amblyopic group was well explained by a rightward-shifted CRF at the higher spatial frequency and reduced population Fisher information for coding orientation. Moreover, regression analysis revealed that behavioral contrast thresholds from an independent measurement depended significantly on the neural properties estimated by the model. This study demonstrates the potential of biologically interpretable models to quantitatively bridge the gap between behavioral deficits and underlying neural changes, offering a promising tool for understanding normal and abnormal visual systems.
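A standard parameterization of a neural CRF is the Naka-Rushton function; assuming that form (the paper's full model has additional components), the reported rightward shift corresponds to a larger semi-saturation contrast c50:

```python
import numpy as np

def naka_rushton(c, r_max=1.0, c50=0.2, n=2.0, baseline=0.0):
    """CRF: R(c) = baseline + r_max * c^n / (c^n + c50^n)."""
    c = np.asarray(c, dtype=float)
    return baseline + r_max * c**n / (c**n + c50**n)

contrast = np.logspace(-2, 0, 50)
normal = naka_rushton(contrast, c50=0.15)
# Rightward-shifted CRF: a higher c50 means more contrast is needed to
# reach the same response, as estimated for amblyopes at the higher
# spatial frequency (the c50 values here are illustrative only).
amblyopic = naka_rushton(contrast, c50=0.40)
```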
Affiliation(s)
- Jinli Zhu
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- Yijin Han
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China; Xi'an People's Hospital, Xi'an, 710004, Shaanxi, China
- Xiaolin Huang
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- Yufan Feng
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- Xiaowei Ruan
- State Key Laboratory of Eye Health, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- Wenman Lin
- School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- Jiawei Zhou
- State Key Laboratory of Eye Health, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China.
- Fang Hou
- State Key Laboratory of Eye Health, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China.
7
Hernández-Cámara P, Vila-Tomás J, Laparra V, Malo J. Dissecting the effectiveness of deep features as metric of perceptual image quality. Neural Netw 2025; 185:107189. [PMID: 39874824 DOI: 10.1016/j.neunet.2025.107189]
Abstract
There is an open debate on the role of artificial networks in understanding the visual brain. Internal representations of images in artificial networks develop human-like properties. In particular, evaluating distortions using differences between internal features correlates with human perception of distortion. However, the origins of this correlation are not well understood. Here, we dissect the different factors involved in the emergence of human-like behavior: function, architecture, and environment. To do so, we evaluate the aforementioned human-network correlation at different depths of 46 pre-trained model configurations that include no psycho-visual information. The results show that most of the models correlate better with human opinion than SSIM (a de facto standard in subjective image quality). Moreover, some models are better than state-of-the-art networks specifically tuned for the application (LPIPS, DISTS). Regarding function, supervised classification leads to networks that correlate better with humans than the explored self-supervised and unsupervised models; however, better performance on the task does not imply more human-like behavior. Regarding architecture, simpler models correlate better with humans than very deep networks, and the highest correlation is generally not achieved in the last layer. Finally, regarding environment, training on large natural datasets leads to higher correlations than training on smaller databases with restricted content, as expected. We also found that the best classification models are not the best at predicting human distances. In the general debate about understanding human vision, our empirical findings imply that explanations should not focus on a single abstraction level: function, architecture, and environment are all relevant.
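The quantity under study is the distance between internal activations for a reference and a distorted image, later correlated with human opinion scores; a bare-bones version of such a deep-feature metric (network and depth chosen only for illustration):

```python
import torch
from torchvision.models import alexnet, AlexNet_Weights

net = alexnet(weights=AlexNet_Weights.DEFAULT).features.eval()

def deep_feature_distance(ref: torch.Tensor, distorted: torch.Tensor,
                          depth: int = 6) -> float:
    """Euclidean distance between activations at one depth; correlating
    this with human ratings gives the human-network correlation."""
    with torch.no_grad():
        return torch.norm(net[:depth](ref) - net[:depth](distorted)).item()
```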
Affiliation(s)
- Jorge Vila-Tomás
- Image Processing Lab., Universitat de València, 46980 Paterna, Spain.
- Valero Laparra
- Image Processing Lab., Universitat de València, 46980 Paterna, Spain.
- Jesús Malo
- Image Processing Lab., Universitat de València, 46980 Paterna, Spain.
8
Carboni L, Nwaigwe D, Mainsant M, Bayle R, Reyboz M, Mermillod M, Dojat M, Achard S. Exploring continual learning strategies in artificial neural networks through graph-based analysis of connectivity: Insights from a brain-inspired perspective. Neural Netw 2025; 185:107125. [PMID: 39847940 DOI: 10.1016/j.neunet.2025.107125]
Abstract
Artificial Neural Networks (ANNs) aim at mimicking information processing in biological networks. In cognitive neuroscience, graph modeling is a powerful framework widely used to study brain structural and functional connectivity. Yet the extension of graph modeling to ANNs has been poorly explored, especially in terms of functional connectivity (i.e., context-dependent changes in unit activity across the network). With the aim of designing more robust and interpretable ANNs, we study how a brain-inspired, graph-based approach can be extended and used to investigate ANN properties and behaviors. We focus our study on different continual learning strategies inspired by biological mechanisms and modeled with ANNs. We show that graph modeling offers a simple and elegant framework to deeply investigate ANNs, compare their performances, and explore deleterious behaviors such as catastrophic forgetting.
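A functional connectivity graph of this kind can be built by correlating unit activations across a stimulus set and thresholding; a simplified sketch with networkx (the threshold and the use of a single layer are arbitrary choices here, not the paper's):

```python
import numpy as np
import networkx as nx

def functional_graph(acts: np.ndarray, threshold: float = 0.5) -> nx.Graph:
    """acts: (n_inputs, n_units) activations from one layer. Edges link
    units whose activity co-varies strongly across inputs."""
    corr = np.corrcoef(acts.T)
    adj = np.abs(corr) > threshold
    np.fill_diagonal(adj, False)
    return nx.from_numpy_array(adj.astype(int))

# Graph statistics (degree, clustering, efficiency, ...) can then be
# tracked across continual-learning strategies to quantify forgetting.
```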
Affiliation(s)
- Lucrezia Carboni
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France; Univ. Grenoble Alpes, Inserm, U1216, Grenoble Institut Neurosciences, GIN, 38000 Grenoble, France
- Dwight Nwaigwe
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France; Univ. Grenoble Alpes, Inserm, U1216, Grenoble Institut Neurosciences, GIN, 38000 Grenoble, France
- Marion Mainsant
- Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France; Univ. Grenoble Alpes, CEA, LIST, 38000 Grenoble, France
- Raphael Bayle
- Univ. Grenoble Alpes, CEA, LIST, 38000 Grenoble, France
- Marina Reyboz
- Univ. Grenoble Alpes, CEA, LIST, 38000 Grenoble, France
- Martial Mermillod
- Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France
- Michel Dojat
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France; Univ. Grenoble Alpes, Inserm, U1216, Grenoble Institut Neurosciences, GIN, 38000 Grenoble, France.
- Sophie Achard
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France
9
Shimazaki H. Neural coding: Foundational concepts, statistical formulations, and recent advances. Neurosci Res 2025; 214:75-80. [PMID: 40107457 DOI: 10.1016/j.neures.2025.03.001]
Abstract
Neural coding refers to the processes by which external stimuli are translated into neural activity and represented in a manner that drives behavior. Research in this field aims to elucidate these processes by identifying the neural activity and mechanisms responsible for stimulus recognition and behavioral execution. This article provides a concise review of foundational studies and key concepts in neural coding, along with statistical formulations and recent advances in population coding research enabled by large-scale recordings.
10
Schmid D, Neumann H. A model of thalamo-cortical interaction for incremental binding in mental contour-tracing. PLoS Comput Biol 2025; 21:e1012835. [PMID: 40338986 PMCID: PMC12061125 DOI: 10.1371/journal.pcbi.1012835]
Abstract
Object-based visual attention marks a key process of mammalian perception. The mechanisms by which this process is implemented, and how it can be influenced by attentional control, are not yet completely understood. Incremental binding is a mechanism required in demanding scenarios of object-based attention and is experimentally well investigated. Attention spreads across a representation of the visual object and labels bound elements by constant up-modulation of neural activity. The speed of incremental binding was found to depend on the spatial arrangement of distracting elements in the scene and to be scale invariant, giving rise to the growth-cone hypothesis. In this work, we propose a neural dynamical model of incremental binding that provides a mechanistic account of these findings. Through simulations, we investigate the model's properties and demonstrate how an attentional spreading mechanism tags neurons that participate in the object binding process. They utilize Gestalt properties and eventually show growth-cone characteristics, labeling perceptual items by delayed activity enhancement of neuronal firing rates. We discuss the algorithmic process underlying incremental binding and relate it to our model's computations. This theoretical investigation encompasses complexity considerations and finds the model not only to be of explanatory value in terms of neurophysiological evidence, but also to be an efficient implementation of incremental binding, striving to establish a normative account. By relating the connectivity motifs of the model to neuroanatomical evidence, we suggest thalamo-cortical interactions to be a likely candidate for the flexible and efficient realization suggested by the model. There, pyramidal cells are proposed to serve as the processors of incremental grouping information. Local bottom-up evidence about stimulus features is integrated via basal dendritic sites and combined with an apical signal consisting of contextual grouping information, which is gated by attentional task-relevance selection mediated via higher-order thalamic representations.
Affiliation(s)
- Daniel Schmid
- Institute for Neural Information Processing, Ulm University, Ulm, Baden-Württemberg, Germany
- Heiko Neumann
- Institute for Neural Information Processing, Ulm University, Ulm, Baden-Württemberg, Germany
11
Rhee JY, Echavarría C, Soucy E, Greenwood J, Masís JA, Cox DD. Neural correlates of visual object recognition in rats. Cell Rep 2025; 44:115461. [PMID: 40153435 DOI: 10.1016/j.celrep.2025.115461]
Abstract
Invariant object recognition-the ability to recognize objects across size, rotation, or context-is fundamental for making sense of a dynamic visual world. Though traditionally studied in primates, emerging evidence suggests rodents recognize objects across a range of identity-preserving transformations. We demonstrate that rats robustly perform visual object recognition and explore a neural pathway that may underlie this capacity by developing a pipeline from high-throughput behavior training to cellular resolution imaging in awake, head-fixed animals. Leveraging our optical approach, we systematically profile neurons in primary and higher-order visual areas and their spatial organization. We find that rat visual cortex exhibits several features similar to those observed in the primate ventral stream but also marked deviations, suggesting species-specific differences in how brains solve visual object recognition. This work reinforces the sophisticated visual abilities of rats and offers the technical foundation to use them as a powerful model for mechanistic perception.
Affiliation(s)
- Juliana Y Rhee
- The Rockefeller University, New York, NY 10065, USA; Center for Brain Science, Harvard University, Cambridge, MA 02138, USA.
- César Echavarría
- Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Edward Soucy
- Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Joel Greenwood
- Center for Brain Science, Harvard University, Cambridge, MA 02138, USA; Kavli Center for Neurotechnology, Yale University, New Haven, CT 06510, USA
- Javier A Masís
- Center for Brain Science, Harvard University, Cambridge, MA 02138, USA; Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- David D Cox
- Center for Brain Science, Harvard University, Cambridge, MA 02138, USA; IBM Research, Cambridge, MA 02142, USA
12
Cao R, Zhang J, Zheng J, Wang Y, Brunner P, Willie JT, Wang S. A neural computational framework for face processing in the human temporal lobe. Curr Biol 2025; 35:1765-1778.e6. [PMID: 40118061 PMCID: PMC12014353 DOI: 10.1016/j.cub.2025.02.063]
Abstract
A key question in cognitive neuroscience is how unified identity representations emerge from visual inputs. Here, we recorded intracranial electroencephalography (iEEG) from the human ventral temporal cortex (VTC) and medial temporal lobe (MTL), as well as single-neuron activity in the MTL, to demonstrate how dense feature-based representations in the VTC are translated into sparse identity-based representations in the MTL. First, we characterized the spatiotemporal neural dynamics of face coding in the VTC and MTL. The VTC, particularly the fusiform gyrus, exhibits robust axis-based feature coding. Remarkably, MTL neurons encode a receptive field within the VTC neural feature space, constructed using VTC neural axes, thereby bridging dense feature and sparse identity representations. We further validated our findings using recordings from a macaque. Lastly, inter-areal interactions between the VTC and MTL provide the physiological basis of this computational framework. Together, we reveal the neurophysiological underpinnings of a computational framework that explains how perceptual information is translated into face identities.
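The proposed dense-to-sparse computation can be caricatured in a few lines: stimulus features are projected onto VTC-like axes, and an MTL-like unit responds only within a localized region of that axis space (all numbers hypothetical; the paper estimates axes and receptive fields from recordings):

```python
import numpy as np

rng = np.random.default_rng(1)
feat = rng.normal(size=(100, 512))   # dense VTC-like face features
axes = rng.normal(size=(512, 2))     # two "neural axes" spanning the space

proj = feat @ axes                   # axis-based (dense) feature coding
center = np.array([1.0, -0.5])       # unit's preferred region in axis space

# Sparse, identity-like response: large only when the projection falls
# inside the unit's receptive field within the feature space.
resp = np.exp(-np.sum((proj - center) ** 2, axis=1) / 2.0)
```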
Affiliation(s)
- Runnan Cao
- Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA.
- Jie Zhang
- Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Jie Zheng
- Department of Biomedical Engineering, University of California, Davis, Davis, CA 95618, USA
- Yue Wang
- Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Peter Brunner
- Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO 63110, USA
- Jon T Willie
- Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO 63110, USA
- Shuo Wang
- Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA; Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO 63110, USA.
13
Ye Z, Wessel R, Franken TP. Brain-like border ownership signals support prediction of natural videos. iScience 2025; 28:112199. [PMID: 40224014 PMCID: PMC11986989 DOI: 10.1016/j.isci.2025.112199]
Abstract
To make sense of visual scenes, the brain must segment foreground from background. This is thought to be facilitated by neurons that signal border ownership (BOS), which indicate which side of a border in their receptive field is owned by an object. How these signals emerge without a teaching signal of what is foreground remains unclear. Here we find that many units in PredNet, a self-supervised deep neural network trained to predict future frames in natural videos, are selective for BOS. They share key properties with BOS neurons in the brain, including robustness to object transformations and hysteresis. Ablation revealed that BOS units contribute more to prediction than other units for videos with moving objects. Our findings suggest that BOS neurons might emerge due to an evolutionary or developmental pressure to predict future input in natural, complex dynamic environments, even without an explicit requirement to segment foreground from background.
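The ablation logic is straightforward to state in code: silence a chosen set of units and measure how much next-frame prediction degrades. A generic sketch (it assumes a model mapping a clip to a predicted last frame and channel-indexed units; PredNet's interface differs in detail):

```python
import torch
import torch.nn.functional as F

def ablation_impact(model, frames, layer, unit_mask) -> float:
    """Increase in next-frame prediction error when the channels in
    `unit_mask` of `layer` are zeroed; larger = bigger contribution."""
    def silence(_module, _inputs, output):
        output[:, unit_mask] = 0.0
        return output

    inputs, target = frames[:, :-1], frames[:, -1]
    with torch.no_grad():
        base = F.mse_loss(model(inputs), target)
        handle = layer.register_forward_hook(silence)
        ablated = F.mse_loss(model(inputs), target)
        handle.remove()
    return (ablated - base).item()
```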
Affiliation(s)
- Zeyuan Ye
- Department of Physics, Washington University in St. Louis, St. Louis, MO 63130, USA
- Ralf Wessel
- Department of Physics, Washington University in St. Louis, St. Louis, MO 63130, USA
- Tom P. Franken
- Department of Neuroscience, Washington University in St. Louis, St. Louis, MO 63110, USA
14
Scholte HS, de Haan EHF. Beyond binding: from modular to natural vision. Trends Cogn Sci 2025:S1364-6613(25)00074-9. [PMID: 40234139 DOI: 10.1016/j.tics.2025.03.002]
Abstract
The classical view of visual cortex organization as a collection of specialized modules processing distinct features like color and motion has profoundly influenced neuroscience for decades. This framework, rooted in historical philosophical distinctions between qualities, gave rise to the 'binding problem': how the brain integrates these separately processed features into coherent percepts. We present converging evidence from electrophysiology, neuroimaging, and lesion studies that challenges this framework. We argue that the binding problem may be an artifact of theoretical assumptions rather than a real computational challenge for the brain. Drawing insights from deep neural networks (DNNs) and recent empirical findings, we propose a framework where the visual cortex represents naturally co-occurring patterns of information rather than processing isolated features that need binding.
Affiliation(s)
- H Steven Scholte
- Psychology Department, University of Amsterdam, 1001NK, Amsterdam, The Netherlands.
- Edward H F de Haan
- Psychology Department, University of Amsterdam, 1001NK, Amsterdam, The Netherlands; Donders Institute for Brain, Cognition and Behavior, Radboud University, 6525GD, Nijmegen, The Netherlands; St Hugh's College, Oxford University, Oxford OX2 6LE, UK; Psychology Department, Nottingham University, Nottingham NG7 2RD, UK.
15
Schmitt O. Relationships and representations of brain structures, connectivity, dynamics and functions. Prog Neuropsychopharmacol Biol Psychiatry 2025; 138:111332. [PMID: 40147809 DOI: 10.1016/j.pnpbp.2025.111332]
Abstract
The review explores the complex interplay between brain structures and their associated functions, presenting a diversity of hierarchical models that enhances our understanding of these relationships. Central to this approach are structure-function flow diagrams, which offer a visual representation of how specific neuroanatomical structures are linked to their functional roles. These diagrams are instrumental in mapping the intricate connections between different brain regions, providing a clearer understanding of how functions emerge from the underlying neural architecture. The study details innovative attempts to develop new functional hierarchies that integrate structural and functional data. These efforts leverage recent advancements in neuroimaging techniques such as fMRI, EEG, MEG, and PET, as well as computational models that simulate neural dynamics. By combining these approaches, the study seeks to create a more refined and dynamic hierarchy that can accommodate the brain's complexity, including its capacity for plasticity and adaptation. A significant focus is placed on the overlap of structures and functions within the brain. The manuscript acknowledges that many brain regions are multifunctional, contributing to different cognitive and behavioral processes depending on the context. This overlap highlights the need for a flexible, non-linear hierarchy that can capture the brain's intricate functional landscape. Moreover, the study examines the interdependence of these functions, emphasizing how the loss or impairment of one function can impact others. Another crucial aspect discussed is the brain's ability to compensate for functional deficits following neurological diseases or injuries. The investigation explores how the brain reorganizes itself, often through the recruitment of alternative neural pathways or the enhancement of existing ones, to maintain functionality despite structural damage. This compensatory mechanism underscores the brain's remarkable plasticity, demonstrating its ability to adapt and reconfigure itself in response to injury, thereby ensuring the continuation of essential functions. In conclusion, the study presents a system of brain functions that integrates structural, functional, and dynamic perspectives. It offers a robust framework for understanding how the brain's complex network of structures supports a wide range of cognitive and behavioral functions, with significant implications for both basic neuroscience and clinical applications.
Affiliation(s)
- Oliver Schmitt
- Medical School Hamburg - University of Applied Sciences and Medical University, Institute for Systems Medicine, Am Kaiserkai 1, 20457 Hamburg, Germany; Department of Anatomy, University of Rostock, Gertrudenstr. 9, 18055 Rostock, Germany.
16
Petilli MA, Rodio FM, Günther F, Marelli M. Visual search and real-image similarity: An empirical assessment through the lens of deep learning. Psychon Bull Rev 2025; 32:822-838. [PMID: 39327401 PMCID: PMC12000204 DOI: 10.3758/s13423-024-02583-4]
Abstract
The ability to predict how efficiently a person finds an object in the environment is a crucial goal of attention research. Central to this issue are the similarity principles initially proposed by Duncan and Humphreys, which outline how the similarity between target and distractor objects (TD) and between the distractor objects themselves (DD) affects search efficiency. However, the search principles lack direct quantitative support from an ecological perspective, being a summary approximation of a wide range of lab-based results that generalize poorly to real-world scenarios. This study exploits deep convolutional neural networks to predict human search efficiency from computational estimates of similarity between the objects populating, potentially, any visual scene. Our results provide ecological evidence supporting the similarity principles: search performance varies continuously across tasks and conditions and improves with decreasing TD similarity and increasing DD similarity. Furthermore, our results reveal a crucial dissociation: TD and DD similarities mainly operate at two distinct layers of the network, DD similarity at the intermediate layers of coarse object features and TD similarity at the final layers of complex features used for classification. This suggests that these different similarities exert their major effects at two distinct perceptual levels and demonstrates our methodology's potential to offer insights into the depth of visual processing on which search relies. By combining computational techniques with visual search principles, this approach aligns with modern trends in other research areas and fulfils longstanding demands for more ecologically valid research in the field of visual search.
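Both similarity terms can be estimated directly from network activations: cosine similarity between the target and each distractor (TD) and among the distractors themselves (DD), read out at a chosen layer. A compact sketch (which layer to read out is exactly the question the paper probes):

```python
import numpy as np
from itertools import combinations

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def td_dd_similarity(target_feat, distractor_feats):
    """target_feat: (d,); distractor_feats: (n, d), one network layer.
    Search is predicted to slow as TD rises and speed up as DD rises."""
    td = np.mean([cosine(target_feat, d) for d in distractor_feats])
    dd = np.mean([cosine(a, b) for a, b in combinations(distractor_feats, 2)])
    return td, dd
```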
Affiliation(s)
- Marco A Petilli
- Department of Psychology, University of Milano-Bicocca, Milano, Italy.
- Francesca M Rodio
- Institute for Advanced Studies, IUSS, Pavia, Italy
- Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
- Fritz Günther
- Department of Psychology, Humboldt University at Berlin, Berlin, Germany
- Marco Marelli
- Department of Psychology, University of Milano-Bicocca, Milano, Italy
- NeuroMI, Milan Center for Neuroscience, Milan, Italy
17
Wang EY, Fahey PG, Ding Z, Papadopoulos S, Ponder K, Weis MA, Chang A, Muhammad T, Patel S, Ding Z, Tran D, Fu J, Schneider-Mizell CM, Reid RC, Collman F, da Costa NM, Franke K, Ecker AS, Reimer J, Pitkow X, Sinz FH, Tolias AS. Foundation model of neural activity predicts response to new stimulus types. Nature 2025; 640:470-477. [PMID: 40205215 PMCID: PMC11981942 DOI: 10.1038/s41586-025-08829-y]
Abstract
The complexity of neural circuits makes it challenging to decipher the brain's algorithms of intelligence. Recent breakthroughs in deep learning have produced models that accurately simulate brain activity, enhancing our understanding of the brain's computational objectives and neural coding. However, it is difficult for such models to generalize beyond their training distribution, limiting their utility. The emergence of foundation models trained on vast datasets has introduced a new artificial intelligence paradigm with remarkable generalization capabilities. Here we collected large amounts of neural activity from visual cortices of multiple mice and trained a foundation model to accurately predict neuronal responses to arbitrary natural videos. This model generalized to new mice with minimal training and successfully predicted responses across various new stimulus domains, such as coherent motion and noise patterns. Beyond neural response prediction, the model also accurately predicted anatomical cell types, dendritic features and neuronal connectivity within the MICrONS functional connectomics dataset. Our work is a crucial step towards building foundation models of the brain. As neuroscience accumulates larger, multimodal datasets, foundation models will reveal statistical regularities, enable rapid adaptation to new tasks and accelerate research.
Affiliation(s)
- Eric Y Wang
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Paul G Fahey
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Bio-X, Stanford University, Stanford, CA, USA
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Zhuokun Ding
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Bio-X, Stanford University, Stanford, CA, USA
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Stelios Papadopoulos
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Bio-X, Stanford University, Stanford, CA, USA
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Kayla Ponder
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Marissa A Weis
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany
- Andersen Chang
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Taliah Muhammad
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Saumil Patel
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Bio-X, Stanford University, Stanford, CA, USA
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Zhiwei Ding
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Dat Tran
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Jiakun Fu
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- R Clay Reid
- Allen Institute for Brain Science, Seattle, WA, USA
- Katrin Franke
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Bio-X, Stanford University, Stanford, CA, USA
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Alexander S Ecker
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany
- Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
- Jacob Reimer
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Xaq Pitkow
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Fabian H Sinz
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Andreas S Tolias
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA.
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA.
- Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA.
- Stanford Bio-X, Stanford University, Stanford, CA, USA.
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA.
- Department of Electrical Engineering, Stanford University, Stanford, CA, USA.
18
Casile A, Cordier A, Kim JG, Cometa A, Madsen JR, Stone S, Ben-Yosef G, Ullman S, Anderson W, Kreiman G. Neural correlates of minimal recognizable configurations in the human brain. Cell Rep 2025; 44:115429. [PMID: 40096088 PMCID: PMC12045337 DOI: 10.1016/j.celrep.2025.115429]
Abstract
Inferring object identity from incomplete information is a ubiquitous challenge for the visual system. Here, we study the neural mechanisms underlying processing of minimally recognizable configurations (MIRCs) and their subparts, which are unrecognizable (sub-MIRCs). MIRCs and sub-MIRCs are very similar at the pixel level, yet they lead to a dramatic gap in recognition performance. To evaluate how the brain processes such images, we invasively record human neurophysiological responses. Correct identification of MIRCs is associated with a dynamic interplay of feedback and feedforward mechanisms between frontal and temporal areas. Interpretation of sub-MIRC images improves dramatically after exposure to the corresponding full objects. This rapid and unsupervised learning is accompanied by changes in neural responses in the temporal cortex. These results are at odds with purely feedforward models of object recognition and suggest a role for the frontal lobe in providing top-down signals related to object identity in difficult visual tasks.
Affiliation(s)
- Antonino Casile
- Department of Biomedical and Dental Sciences and Morphofunctional Imaging, University of Messina, 98122 Messina, Italy
- Aurelie Cordier
- Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Jiye G Kim
- Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Andrea Cometa
- MoMiLab, IMT School for Advanced Studies, 55100 Lucca, Italy
- Joseph R Madsen
- Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Scellig Stone
- Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Shimon Ullman
- Weizmann Institute, Rehovot, Israel; Center for Brains, Minds and Machines, Cambridge, MA 02142, USA
- William Anderson
- Department of Neurosurgery, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Gabriel Kreiman
- Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Brains, Minds and Machines, Cambridge, MA 02142, USA.
19
Zhang J, Cao R, Zhu X, Zhou H, Wang S. Distinct attentional characteristics of neurons with visual feature coding in the primate brain. Sci Adv 2025; 11:eadq0332. [PMID: 40117351 PMCID: PMC11927616 DOI: 10.1126/sciadv.adq0332]
Abstract
Visual attention and object recognition are two critical cognitive functions that shape our perception of the world. While these neural processes converge in the temporal cortex, the nature of their interactions remains largely unclear. Here, we systematically investigated the interplay between visual attention and stimulus feature coding by training macaques to perform a free-gaze visual search task with natural stimuli. Recording from a large number of units across multiple brain areas, we found that units exhibiting visual feature coding showed stronger attentional modulation of responses and spike-local field potential coherence than units without feature coding. Across brain areas, attention directed toward search targets enhanced the neuronal pattern separation of stimuli, with this enhancement more pronounced for units encoding visual features. Together, our results suggest a complex interplay between visual feature and attention coding in the primate brain, likely driven by interactions between brain areas engaged in these processes.
Affiliation(s)
- Jie Zhang
- Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Peng Cheng Laboratory, Shenzhen 518000, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Runnan Cao
- Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Xiaocang Zhu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Huihui Zhou
- Peng Cheng Laboratory, Shenzhen 518000, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Shuo Wang
- Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA
20
Kim JE, Soh K, Hwang SI, Yang DY, Yoon JH. Memristive neuromorphic interfaces: integrating sensory modalities with artificial neural networks. Mater Horiz 2025. [PMID: 40104909 DOI: 10.1039/d5mh00038f]
Abstract
The advent of the Internet of Things (IoT) has led to exponential growth in data generated from sensors, requiring efficient methods to process complex and unstructured external information. Unlike conventional von Neumann sensory systems with separate data collection and processing units, biological sensory systems integrate sensing, memory, and computing to process environmental information in real time with high efficiency. Memristive neuromorphic sensory systems using memristors as their basic components have emerged as promising alternatives to CMOS-based systems. Memristors can closely replicate the key characteristics of biological receptors, neurons, and synapses by integrating the threshold and adaptation properties of receptors, the action potential firing in neurons, and the synaptic plasticity of synapses. Furthermore, through careful engineering of their switching dynamics, the electrical properties of memristors can be tailored to emulate specific functions, while benefiting from high operational speed, low power consumption, and exceptional scalability. Consequently, their integration with high-performance sensors offers a promising pathway toward realizing fully integrated artificial sensory systems that can efficiently process and respond to diverse environmental stimuli in real time. In this review, we first introduce the fundamental principles of memristive neuromorphic technologies for artificial sensory systems, explaining how each component is structured and what functions it performs. We then discuss how these principles can be applied to replicate the four traditional senses, highlighting the underlying mechanisms and recent advances in mimicking biological sensory functions. Finally, we address the remaining challenges and provide prospects for the continued development of memristor-based artificial sensory systems.
Affiliation(s)
- Ji Eun Kim
- Electronic Materials Research Center, Korea Institute of Science and Technology (KIST), Seoul 02791, Republic of Korea
- Department of Materials Science and Engineering, Korea University, Seoul 02841, Republic of Korea
- Keunho Soh
- School of Advanced Materials and Engineering, Sungkyunkwan University (SKKU), Suwon 16419, Republic of Korea.
- Su In Hwang
- School of Advanced Materials and Engineering, Sungkyunkwan University (SKKU), Suwon 16419, Republic of Korea.
- Do Young Yang
- School of Advanced Materials and Engineering, Sungkyunkwan University (SKKU), Suwon 16419, Republic of Korea.
- Jung Ho Yoon
- School of Advanced Materials and Engineering, Sungkyunkwan University (SKKU), Suwon 16419, Republic of Korea.
21
Marino R, Buffoni L, Chicchi L, Patti FD, Febbe D, Giambagli L, Fanelli D. Learning in Wilson-Cowan Model for Metapopulation. Neural Comput 2025; 37:701-741. [PMID: 40030137 DOI: 10.1162/neco_a_01744]
Abstract
The Wilson-Cowan model for metapopulation, a neural mass network model, treats different subcortical regions of the brain as connected nodes, with connections representing various types of structural, functional, or effective neuronal connectivity between these regions. Each region comprises interacting populations of excitatory and inhibitory cells, consistent with the standard Wilson-Cowan model. In this article, we show how to incorporate stable attractors into such a metapopulation model's dynamics. By doing so, we transform the neural mass network model into a biologically inspired learning algorithm capable of solving different classification tasks. We test it on MNIST and Fashion MNIST in combination with convolutional neural networks, as well as on CIFAR-10 and TF-FLOWERS, and in combination with a transformer architecture (BERT) on IMDB, consistently achieving high classification accuracy.
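The per-node dynamics follow the standard Wilson-Cowan equations; below is a minimal Euler integration of a single excitatory-inhibitory pair with the classic parameter set (in the metapopulation model, nodes are additionally coupled through the connectivity matrix, and the attractor-based learning machinery is the paper's addition):

```python
import numpy as np

def sigmoid(x, a=1.0, theta=4.0):
    return 1.0 / (1.0 + np.exp(-a * (x - theta)))

def wilson_cowan_step(E, I, ext, dt=0.1, tau=1.0,
                      w_ee=16.0, w_ei=12.0, w_ie=15.0, w_ii=3.0):
    """One Euler step for one excitatory (E) / inhibitory (I) pair."""
    dE = (-E + (1.0 - E) * sigmoid(w_ee * E - w_ei * I + ext)) / tau
    dI = (-I + (1.0 - I) * sigmoid(w_ie * E - w_ii * I)) / tau
    return E + dt * dE, I + dt * dI

E, I = 0.1, 0.1
for _ in range(1000):           # relaxes toward an attractor or limit cycle
    E, I = wilson_cowan_step(E, I, ext=1.5)
```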
Affiliation(s)
- Raffaele Marino
- Department of Physics and Astronomy, University of Florence, 50019 Sesto Fiorentino, Florence, Italy
| | - Lorenzo Buffoni
- Department of Physics and Astronomy, University of Florence, 50019 Sesto Fiorentino, Florence, Italy
| | - Lorenzo Chicchi
- Department of Physics and Astronomy, University of Florence, 50019 Sesto Fiorentino, Florence, Italy
| | - Francesca Di Patti
- Department of Mathematics and Computer Science, University of Florence, 50134 Florence, Italy
| | - Diego Febbe
- Department of Physics and Astronomy, University of Florence, 50019 Sesto Fiorentino, Florence, Italy
| | - Lorenzo Giambagli
- Department of Physics and Astronomy, University of Florence, 50019 Sesto Fiorentino, Florence, Italy
| | - Duccio Fanelli
- Department of Physics and Astronomy, University of Florence, 50019 Sesto Fiorentino, Florence, Italy
22
May L, Dauphin A, Gjorgjieva J. Pre-training artificial neural networks with spontaneous retinal activity improves motion prediction in natural scenes. PLoS Comput Biol 2025; 21:e1012830. [PMID: 40096645 DOI: 10.1371/journal.pcbi.1012830]
Abstract
The ability to process visual stimuli rich with motion represents an essential skill for animal survival and is largely already present at the onset of vision. Although the exact mechanisms underlying its maturation remain elusive, spontaneous activity patterns in the retina, known as retinal waves, have been shown to contribute to this developmental process. Retinal waves exhibit complex spatio-temporal statistics and contribute to the establishment of circuit connectivity and function in the visual system, including the formation of retinotopic maps and the refinement of receptive fields in downstream areas such as the thalamus and visual cortex. Recent work in mice has shown that retinal waves have statistical features matching those of natural visual stimuli, such as optic flow, suggesting that they could prime the visual system for motion processing upon vision onset. Motivated by these findings, we examined whether artificial neural network (ANN) models trained on natural movies show improved performance if pre-trained with retinal waves. We employed the spatio-temporally complex task of next-frame prediction, in which the ANN was trained to predict the next frame based on preceding input frames of a movie. We found that pre-training ANNs with retinal waves enhances the processing of real-world visual stimuli and accelerates learning. Strikingly, when we merely replaced the initial training epochs on naturalistic stimuli with retinal waves, keeping the total training time the same, we still found that an ANN trained on retinal waves temporarily outperforms one trained solely on natural movies. Similar to observations made in biological systems, we also found that pre-training with spontaneous activity refines the receptive field of ANN neurons. Overall, our work sheds light on the functional role of spatio-temporally patterned spontaneous activity in the processing of motion in natural scenes, suggesting it acts as a training signal to prepare the developing visual system for adult visual processing.
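A minimal sketch of the two-stage training scheme described here: pre-train a next-frame predictor on wave-like stimuli, then fine-tune on natural movies. The toy convolutional network and the random tensors standing in for retinal-wave and natural-movie clips are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Toy next-frame predictor: given k preceding frames (stacked as channels),
# predict the next frame. Architecture and data are stand-ins.
k, H, W = 4, 32, 32
model = nn.Sequential(
    nn.Conv2d(k, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train(clips, epochs):
    # clips: (batch, k+1, H, W); first k frames are input, frame k+1 is target
    for _ in range(epochs):
        x, y = clips[:, :k], clips[:, k:]
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

waves = torch.rand(8, k + 1, H, W)    # placeholder retinal-wave clips
movies = torch.rand(8, k + 1, H, W)   # placeholder natural-movie clips
train(waves, epochs=20)               # stage 1: pre-train on waves
final = train(movies, epochs=20)      # stage 2: fine-tune on natural movies
print(f"natural-movie prediction loss after fine-tuning: {final:.4f}")
```

The paper's key manipulation, replacing only the initial epochs with retinal waves while holding total training time fixed, corresponds to splitting the epoch budget between the two `train` calls.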
Affiliation(s)
- Lilly May
- School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Alice Dauphin
- School of Life Sciences, Technical University of Munich, Freising, Germany
- Institute of Machine Learning and Neural Computation, Graz University of Technology, Graz, Austria
23
Jang H, Sinha P, Boix X. Configural processing as an optimized strategy for robust object recognition in neural networks. Commun Biol 2025; 8:386. [PMID: 40055492 PMCID: PMC11889204 DOI: 10.1038/s42003-025-07672-1]
Abstract
Configural processing, the perception of spatial relationships among an object's components, is crucial for object recognition, yet its teleology and underlying mechanisms remain unclear. We hypothesize that configural processing drives robust recognition under varying conditions. Using identification tasks with composite letter stimuli, we compare neural network models trained with either configural or local cues. We find that configural cues support robust generalization across geometric transformations (e.g., rotation, scaling) and novel feature sets. When both cues are available, configural cues dominate local features. Layerwise analysis reveals that sensitivity to configural cues emerges later in processing, likely enhancing robustness to pixel-level transformations. Notably, this occurs in a purely feedforward manner without recurrent computations. These findings with letter stimuli successfully extend to naturalistic face images. Our results demonstrate that configural processing emerges in a naïve network based on task contingencies, and is beneficial for robust object processing under varying viewing conditions.
Affiliation(s)
- Hojin Jang
- Department of Brain and Cognitive Engineering, Korea University, Seoul, South Korea.
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
| | - Pawan Sinha
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Xavier Boix
- Artificial Intelligence Laboratory, Fujitsu Research of America, Silicon Valley, CA, USA.
24
Darjani N, Noroozi J, Dehaqani MRA. Unveiling the content of frontal feedback in challenging object recognition. Neuroimage 2025; 308:121058. [PMID: 39884415 DOI: 10.1016/j.neuroimage.2025.121058]
Abstract
Object recognition under challenging real-world conditions, including partial occlusion, remains an enduring focus of investigation in cognitive visual neuroscience. This study addresses the insufficiently elucidated neural mechanisms and temporal dynamics involved in this complex process, concentrating on the persistent challenge of recognizing objects obscured by occlusion. Through the analysis of human EEG data, we decode feedback characteristics within frontotemporal networks, uncovering intricate neural mechanisms during occlusion coding, with a specific emphasis on processing complex stimuli such as occluded faces. Our findings elucidate the critical role of frontal feedback in the late processing stage of occluded face recognition, contributing to enhanced accuracy in identification. Temporal dynamics reveal distinct characteristics in both early and late processing stages, allowing the discernment of two unique types of occlusion processing that go beyond visual features, incorporating higher-order associations. The increased synchronized activity between frontal and temporal areas during the processing of occluded stimuli underscores the importance of frontotemporal coordination in challenging real-world conditions. A comparative analysis with macaque IT cortex recordings validates the contribution of the frontal cortex in the late stage of occluded face processing. Notably, the observed disparity between human EEG and two deep computational models, with and without feedback connections, emphasizes the need to expand models to accurately simulate frontal feedback.
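The abstract reports increased synchronized activity between areas; one standard way to quantify such synchrony is the phase-locking value (PLV), sketched below on synthetic signals. The paper's exact connectivity measure may differ, so treat this as a generic illustration.

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    """PLV between two equal-length signals: 1 = perfectly phase-locked."""
    phase_x = np.angle(hilbert(x))  # instantaneous phase via Hilbert transform
    phase_y = np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

# Synthetic example: a shared 10 Hz rhythm with independent noise in a
# "frontal" and a "temporal" channel.
fs = 500
t = np.arange(0, 2, 1 / fs)
shared = np.sin(2 * np.pi * 10 * t)
frontal = shared + 0.5 * np.random.randn(t.size)
temporal = shared + 0.5 * np.random.randn(t.size)
print(f"PLV = {phase_locking_value(frontal, temporal):.2f}")
```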
Affiliation(s)
- Nastaran Darjani
- School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
| | - Jalaledin Noroozi
- School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran
| | - Mohammad-Reza A Dehaqani
- School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran; School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran.
25
Blauch NM, Plaut DC, Vin R, Behrmann M. Individual variation in the functional lateralization of human ventral temporal cortex: Local competition and long-range coupling. Imaging Neurosci (Camb) 2025; 3:imag_a_00488. [PMID: 40078535 PMCID: PMC11894816 DOI: 10.1162/imag_a_00488]
Abstract
The ventral temporal cortex (VTC) of the human cerebrum is critically engaged in high-level vision. One intriguing aspect of this region is its functional lateralization, with neural responses to words being stronger in the left hemisphere, and neural responses to faces being stronger in the right hemisphere; such patterns can be summarized with a signed laterality index (LI), positive for leftward laterality. Converging evidence has suggested that word laterality emerges to couple efficiently with left-lateralized frontotemporal language regions, but evidence is more mixed regarding the sources of the right lateralization for face perception. Here, we use individual differences as a tool to test three theories of VTC organization arising from (1) local competition between words and faces driven by long-range coupling between words and language processes, (2) local competition between faces and other categories, and (3) long-range coupling with VTC and temporal areas exhibiting local competition between language and social processing. First, in an in-house functional MRI experiment, we did not obtain a negative correlation in the LIs of word and face selectivity relative to object responses, but did find a positive correlation when using selectivity relative to a fixation baseline, challenging ideas of local competition between words and faces driving rightward face lateralization. We next examined broader local LI interactions with faces using the large-scale Human Connectome Project (HCP) dataset. Face and tool LIs were significantly anti-correlated, while face and body LIs were positively correlated, consistent with the idea that generic local representational competition and cooperation may shape face lateralization. Last, we assessed the role of long-range coupling in the development of VTC lateralization. Within our in-house experiment, substantial positive correlation was evident between VTC text LI and that of several other nodes of a distributed text-processing circuit. In the HCP data, VTC face LI was both negatively correlated with language LI and positively correlated with social processing in different subregions of the posterior temporal lobe (PSL and STSp, respectively). In summary, we find no evidence of local face-word competition in VTC; instead, more generic local interactions shape multiple lateralities within VTC, including face laterality. Moreover, face laterality is also influenced by long-range coupling with social processing in the posterior temporal lobe, where social processing may become right lateralized due to local competition with language.
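A note on the laterality index: the common normalized-difference definition is LI = (L - R) / (L + R), positive for leftward laterality. The sketch below assumes that definition; the paper's precise computation (for example, which selectivity contrast enters L and R) may differ.

```python
import numpy as np

def laterality_index(left, right, eps=1e-9):
    """Signed LI in [-1, 1]; positive values indicate leftward laterality."""
    left, right = np.asarray(left, float), np.asarray(right, float)
    return (left - right) / (left + right + eps)

# Example: word selectivity stronger on the left, face selectivity on the right.
print(laterality_index(left=[12.0, 3.0], right=[4.0, 9.0]))  # [ 0.5 -0.5]
```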
Affiliation(s)
- Nicholas M. Blauch
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- Department of Psychology, Harvard University, Cambridge, MA, United States
| | - David C. Plaut
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, United States
| | - Raina Vin
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- Neurosciences Graduate Program, Yale University, New Haven, CT, United States
| | - Marlene Behrmann
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, United States
- Department of Ophthalmology, University of Pittsburgh, Pittsburgh, PA, United States
26
Jaffe PI, Santiago-Reyes GX, Schafer RJ, Bissett PG, Poldrack RA. An image-computable model of speeded decision-making. eLife 2025; 13:RP98351. [PMID: 40019474 PMCID: PMC11870652 DOI: 10.7554/elife.98351]
Abstract
Evidence accumulation models (EAMs) are the dominant framework for modeling response time (RT) data from speeded decision-making tasks. While providing a good quantitative description of RT data in terms of abstract perceptual representations, EAMs do not explain how the visual system extracts these representations in the first place. To address this limitation, we introduce the visual accumulator model (VAM), in which convolutional neural network models of visual processing and traditional EAMs are jointly fitted to trial-level RTs and raw (pixel-space) visual stimuli from individual subjects in a unified Bayesian framework. Models fitted to large-scale cognitive training data from a stylized flanker task captured individual differences in congruency effects, RTs, and accuracy. We find evidence that the selection of task-relevant information occurs through the orthogonalization of relevant and irrelevant representations, demonstrating how our framework can be used to relate visual representations to behavioral outputs. Together, our work provides a probabilistic framework for both constraining neural network models of vision with behavioral data and studying how the visual system extracts representations that guide decisions.
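To make the EAM side concrete, here is a minimal drift-diffusion simulation. In a VAM-style pipeline the drift rate would be read out from CNN activations evoked by the stimulus rather than fixed by hand; all parameters below are illustrative assumptions.

```python
import numpy as np

def ddm_trial(drift, bound=1.0, noise=1.0, dt=1e-3, t_max=5.0, rng=None):
    """Simulate one diffusion trial; returns (choice, response time)."""
    rng = rng or np.random.default_rng()
    x, t = 0.0, 0.0
    while abs(x) < bound and t < t_max:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (1 if x > 0 else 0), t

# A fixed positive drift stands in for the CNN-derived evidence signal.
rng = np.random.default_rng(1)
trials = [ddm_trial(drift=0.8, rng=rng) for _ in range(200)]
acc = np.mean([c for c, _ in trials])
mean_rt = np.mean([rt for _, rt in trials])
print(f"accuracy {acc:.2f}, mean RT {mean_rt:.2f} s")
```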
Affiliation(s)
- Paul I Jaffe
- Department of Psychology, Stanford University, Stanford, United States
27
Mao H, Hasse BA, Schwartz AB. Hybrid Neural Network Models Explain Cortical Neuronal Activity During Volitional Movement. bioRxiv 2025:2025.02.20.636945. [PMID: 40027649 PMCID: PMC11870545 DOI: 10.1101/2025.02.20.636945]
Abstract
Massive interconnectivity in large-scale neural networks is the key feature underlying their powerful and complex functionality. We have developed hybrid neural network (HNN) models that allow us to find statistical structure in this connectivity. Describing this structure is critical for understanding biological and artificial neural networks. The HNNs are composed of artificial neurons, a subset of which are trained to reproduce the responses of individual neurons recorded experimentally. The experimentally observed firing rates came from populations of neurons recorded in the motor cortices of monkeys performing a reaching task. After training, these networks (recurrent and spiking) underwent the same state transitions as those observed in the empirical data, a result that helps resolve a long-standing question of prescribed vs ongoing control of volitional movement. Because all aspects of the models are exposed, we were able to analyze the dynamic statistics of the connections between neurons. Our results show that the dynamics of extrinsic input to the network changed this connectivity to cause the state transitions. Two processes at the synaptic level were recognized: one in which many different neurons contributed to a buildup of membrane potential and another in which more specific neurons triggered an action potential. HNNs facilitate modeling of realistic neuron-neuron connectivity and provide foundational descriptions of large-scale network functionality.
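A sketch of the hybrid idea, under the assumption that it can be approximated by adding a penalty pinning a subset of RNN units to recorded firing rates alongside the task loss. Dimensions, data, and the loss weighting below are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

# An RNN performs a task while its first n_recorded hidden units are
# additionally trained to reproduce (placeholder) recorded firing rates.
n_in, n_hidden, n_out, n_recorded, T = 3, 64, 2, 10, 100
rnn = nn.RNN(n_in, n_hidden, batch_first=True)
readout = nn.Linear(n_hidden, n_out)
opt = torch.optim.Adam([*rnn.parameters(), *readout.parameters()], lr=1e-3)

inputs = torch.randn(16, T, n_in)         # task inputs (placeholder)
targets = torch.randn(16, T, n_out)       # task targets (placeholder)
recorded = torch.rand(16, T, n_recorded)  # recorded rates in [0,1] (placeholder)

for step in range(200):
    h, _ = rnn(inputs)                    # (batch, T, n_hidden)
    task_loss = ((readout(h) - targets) ** 2).mean()
    # Pin a subset of hidden units to the empirical rates:
    neural_loss = ((torch.sigmoid(h[..., :n_recorded]) - recorded) ** 2).mean()
    loss = task_loss + 1.0 * neural_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final combined loss: {loss.item():.4f}")
```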
Affiliation(s)
- Hongwei Mao
- Department of Neurobiology, University of Pittsburgh School of Medicine
| | - Brady A. Hasse
- Department of Neurobiology, University of Pittsburgh School of Medicine
| | - Andrew B. Schwartz
- Department of Neurobiology, University of Pittsburgh School of Medicine
- Systems Neuroscience Center, University of Pittsburgh School of Medicine
- Department of Bioengineering, University of Pittsburgh School of Engineering
28
Moore JA, Kang C, Vigneshwaran V, Stanley EAM, Memon A, Wilms M, Forkert ND. Towards realistic simulation of disease progression in the visual cortex with CNNs. Sci Rep 2025; 15:6099. [PMID: 39972104 PMCID: PMC11839997 DOI: 10.1038/s41598-025-89738-y]
Abstract
Convolutional neural networks (CNNs) and mammalian visual systems share architectural and information processing similarities. We leverage these parallels to develop an in-silico CNN model simulating diseases affecting the visual system. This model aims to replicate neural complexities in an experimentally controlled environment. Therefore, we examine object recognition and internal representations of a CNN under neurodegeneration and neuroplasticity conditions simulated through synaptic weight decay and retraining. This approach can model neurodegeneration from events like tau accumulation, reflecting cognitive decline in diseases such as posterior cortical atrophy, a condition that can accompany Alzheimer's disease and primarily affects the visual system. After each degeneration iteration, we retrain unaffected synapses to simulate ongoing neuroplasticity. Our results show that with significant synaptic decay and limited retraining, the model's representational similarity decreases compared to a healthy model. Early CNN layers retain high similarity to the healthy model, while later layers are more prone to degradation. The results of this study reveal a progressive decline in object recognition proficiency, mirroring posterior cortical atrophy progression. In-silico modeling of neurodegenerative diseases can enhance our understanding of disease progression and aid in developing targeted rehabilitation and treatments.
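A minimal sketch of the degeneration-plus-retraining loop described here: each round silences a fraction of surviving synapses, then retrains only the intact ones by masking gradients. The model, data, decay fraction, and schedule are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(100, 10)
mask = torch.ones_like(layer.weight)  # 1 = intact synapse, 0 = degenerated
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(256, 100), torch.randint(0, 10, (256,))

for round_ in range(5):
    # Degeneration: silence 20% of the still-intact synapses.
    kill = (torch.rand_like(mask) < 0.2) & (mask == 1)
    mask[kill] = 0.0
    with torch.no_grad():
        layer.weight *= mask
    # Plasticity: retrain, letting gradients flow only through intact synapses.
    for _ in range(50):
        loss = nn.functional.cross_entropy(layer(x), y)
        opt.zero_grad()
        loss.backward()
        layer.weight.grad *= mask
        opt.step()
    print(f"round {round_}: {int(mask.sum())} synapses left, loss {loss.item():.3f}")
```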
Affiliation(s)
- Jasmine A Moore
- Department of Radiology, University of Calgary, Calgary, Canada.
- Hotchkiss Brain Institute, University of Calgary, Calgary, Canada.
- Biomedical Engineering Graduate Program, University of Calgary, Calgary, Canada.
| | - Chris Kang
- Department of Radiology, University of Calgary, Calgary, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, Canada
| | - Vibujithan Vigneshwaran
- Department of Radiology, University of Calgary, Calgary, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, Canada
| | - Emma A M Stanley
- Department of Radiology, University of Calgary, Calgary, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, Canada
- Biomedical Engineering Graduate Program, University of Calgary, Calgary, Canada
| | - Ashar Memon
- Cumming School of Medicine, University of Calgary, Calgary, Canada
| | - Matthias Wilms
- Department of Radiology, University of Calgary, Calgary, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, Canada
- Department of Pediatrics, University of Calgary, Calgary, Canada
- Alberta Children's Hospital Research Institute, Calgary, Canada
| | - Nils D Forkert
- Department of Radiology, University of Calgary, Calgary, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, Canada
- Alberta Children's Hospital Research Institute, Calgary, Canada
- Department of Clinical Neurosciences, University of Calgary, Calgary, Canada
29
O’Connell TP, Bonnen T, Friedman Y, Tewari A, Sitzmann V, Tenenbaum JB, Kanwisher N. Approximating Human-Level 3D Visual Inferences With Deep Neural Networks. Open Mind (Camb) 2025; 9:305-324. [PMID: 40013087 PMCID: PMC11864798 DOI: 10.1162/opmi_a_00189]
Abstract
Humans make rich inferences about the geometry of the visual world. While deep neural networks (DNNs) achieve human-level performance on some psychophysical tasks (e.g., rapid classification of object or scene categories), they often fail in tasks requiring inferences about the underlying shape of objects or scenes. Here, we ask whether and how this gap in 3D shape representation between DNNs and humans can be closed. First, we define the problem space: after generating a stimulus set to evaluate 3D shape inferences using a match-to-sample task, we confirm that standard DNNs are unable to reach human performance. Next, we construct a set of candidate 3D-aware DNNs including 3D neural field (Light Field Network), autoencoder, and convolutional architectures. We investigate the role of the learning objective and dataset by training single-view (the model only sees one viewpoint of an object per training trial) and multi-view (the model is trained to associate multiple viewpoints of each object per training trial) versions of each architecture. When the same object categories appear in the model training and match-to-sample test sets, multi-view DNNs approach human-level performance for 3D shape matching, highlighting the importance of a learning objective that enforces a common representation across viewpoints of the same object. Furthermore, the 3D Light Field Network was the model most similar to humans across all tests, suggesting that building in 3D inductive biases increases human-model alignment. Finally, we explore the generalization performance of multi-view DNNs to out-of-distribution object categories not seen during training. Overall, our work shows that multi-view learning objectives for DNNs are necessary but not sufficient to make similar 3D shape inferences as humans and reveals limitations in capturing human-like shape inferences that may be inherent to DNN modeling approaches. We provide a methodology for understanding human 3D shape perception within a deep learning framework and highlight out-of-domain generalization as the next challenge for learning human-like 3D representations with DNNs.
Affiliation(s)
| | - Tyler Bonnen
- EECS, University of California, Berkeley, Berkeley, CA, USA
30
Kleinman M, Wang T, Xiao D, Feghhi E, Lee K, Carr N, Li Y, Hadidi N, Chandrasekaran C, Kao JC. The information bottleneck as a principle underlying multi-area cortical representations during decision-making. bioRxiv 2025:2023.07.12.548742. [PMID: 37502862 PMCID: PMC10369960 DOI: 10.1101/2023.07.12.548742]
Abstract
Decision-making emerges from distributed computations across multiple brain areas, but it is unclear why the brain distributes the computation. In deep learning, artificial neural networks use multiple areas (or layers) and form optimal representations of task inputs. These optimal representations are sufficient to perform the task well, but minimal, so they are invariant to other irrelevant variables. We recorded single neurons and multiunits in dorsolateral prefrontal cortex (DLPFC) and dorsal premotor cortex (PMd) in monkeys during a perceptual decision-making task. We found that while DLPFC represents task-related inputs required to compute the choice, the downstream PMd contains a minimal sufficient, or optimal, representation of the choice. To identify a mechanism for how cortex may form these optimal representations, we trained a multi-area recurrent neural network (RNN) to perform the task. Remarkably, DLPFC- and PMd-resembling representations emerged in the early and late areas of the multi-area RNN, respectively. The DLPFC-resembling area partially orthogonalized choice information and task inputs and this choice information was preferentially propagated to downstream areas through selective alignment with inter-area connections, while remaining task information was not. Our results suggest that cortex uses multi-area computation to form minimal sufficient representations by preferential propagation of relevant information between areas.
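The principle the title invokes is the information bottleneck (Tishby and colleagues): a representation Z of input X should be maximally compressed while remaining informative about the variable of interest Y. In the standard Lagrangian form (the paper's precise usage may differ):

```latex
% Information bottleneck objective: minimize over encoders p(z|x).
% I(.;.) is mutual information; beta > 0 trades compression against sufficiency.
\min_{p(z \mid x)} \; \mathcal{L}_{\mathrm{IB}} = I(X;Z) - \beta\, I(Z;Y)
```

In the abstract's terms, PMd holds a representation with low I(X;Z) (irrelevant inputs discarded) but high I(Z;Y) for the choice Y.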
Affiliation(s)
- Michael Kleinman
- Department of Electrical and Computer Engineering, University of California, Los Angeles, CA, USA
| | - Tian Wang
- Department of Biomedical Engineering, Boston University, Boston, MA, USA
| | - Derek Xiao
- Department of Electrical and Computer Engineering, University of California, Los Angeles, CA, USA
| | - Ebrahim Feghhi
- Neurosciences Program, University of California, Los Angeles, CA, USA
| | - Kenji Lee
- Department of Psychological and Brain Sciences, Boston University, Boston, MA, USA
| | - Nicole Carr
- Department of Biomedical Engineering, Boston University, Boston, MA, USA
| | - Yuke Li
- Department of Biomedical Engineering, Boston University, Boston, MA, USA
| | - Nima Hadidi
- Neurosciences Program, University of California, Los Angeles, CA, USA
| | - Chandramouli Chandrasekaran
- Department of Anatomy & Neurobiology, Boston University School of Medicine, Boston, MA, USA
- Department of Psychological and Brain Sciences, Boston University, Boston, MA, USA
- Center for Systems Neuroscience, Boston University, Boston, MA, USA
- Department of Biomedical Engineering, Boston University, Boston, MA, USA
| | - Jonathan C Kao
- Department of Electrical and Computer Engineering, University of California, Los Angeles, CA, USA
- Department of Computer Science, University of California, Los Angeles, CA, USA
- Neurosciences Program, University of California, Los Angeles, CA, USA
31
Luo X, Mok RM, Roads BD, Love BC. Coordinating multiple mental faculties during learning. Sci Rep 2025; 15:5319. [PMID: 39939457 PMCID: PMC11822098 DOI: 10.1038/s41598-025-89732-4]
Abstract
Complex behavior is supported by the coordination of multiple brain regions. How do brain regions coordinate absent a homunculus? We propose coordination is achieved by a controller-peripheral architecture in which peripherals (e.g., the ventral visual stream) aim to supply needed inputs to their controllers (e.g., the hippocampus and prefrontal cortex) while expending minimal resources. We developed a formal model within this framework to address how multiple brain regions coordinate to support rapid learning from a few example images. The model captured how higher-level activity in the controller shaped lower-level visual representations, affecting their precision and sparsity in a manner that paralleled brain measures. In particular, the peripheral encoded visual information to the extent needed to support the smooth operation of the controller. Alternative models optimized by gradient descent irrespective of architectural constraints could not account for human behavior or brain responses, and, typical of standard deep learning approaches, were unstable trial-by-trial learners. While previous work offered accounts of specific faculties, such as perception, attention, and learning, the controller-peripheral approach is a step toward addressing next generation questions concerning how multiple faculties coordinate.
Affiliation(s)
- Xiaoliang Luo
- Department of Experimental Psychology, University College London, 26 Bedford Way, London, WC1H 0AP, UK.
| | - Robert M Mok
- MRC Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Rd, Cambridge, CB2 7EF, UK
- Department of Psychology, Royal Holloway, University of London, Egham, TW20 0EX, UK
| | - Brett D Roads
- Department of Experimental Psychology, University College London, 26 Bedford Way, London, WC1H 0AP, UK
| | - Bradley C Love
- Department of Experimental Psychology, University College London, 26 Bedford Way, London, WC1H 0AP, UK
- The Alan Turing Institute, 96 Euston Rd, London, NW1 2DB, UK
32
Srinath R, Ni AM, Marucci C, Cohen MR, Brainard DH. Orthogonal neural representations support perceptual judgments of natural stimuli. Sci Rep 2025; 15:5316. [PMID: 39939679 PMCID: PMC11821992 DOI: 10.1038/s41598-025-88910-8]
Abstract
In natural visually guided behavior, observers must separate relevant information from a barrage of irrelevant information. Many studies have investigated the neural underpinnings of this ability using artificial stimuli presented on blank backgrounds. Natural images, however, contain task-irrelevant background elements that might interfere with the perception of object features. Recent studies suggest that visual feature estimation can be modeled through the linear decoding of task-relevant information from visual cortex. So, if the representations of task-relevant and irrelevant features are not orthogonal in the neural population, then variation in the task-irrelevant features would impair task performance. We tested this hypothesis using human psychophysics and monkey neurophysiology combined with parametrically variable naturalistic stimuli. We demonstrate that (1) the neural representation of one feature (the position of an object) in visual area V4 is orthogonal to those of several background features, (2) the ability of human observers to precisely judge object position was largely unaffected by those background features, and (3) many features of the object and the background (and of objects from a separate stimulus set) are orthogonally represented in V4 neural population responses. Our observations are consistent with the hypothesis that orthogonal neural representations can support stable perception of object features despite the richness of natural visual scenes.
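A sketch of the orthogonality logic: fit linear decoding axes for a task-relevant and a task-irrelevant feature from the same population responses, then measure the angle between them. The data below are simulated; the paper's analyses of V4 recordings are of course richer.

```python
import numpy as np

# Simulate a population whose responses mix a task-relevant feature
# (object position) and a task-irrelevant background feature.
rng = np.random.default_rng(0)
n_trials, n_neurons = 2000, 50
position = rng.standard_normal(n_trials)    # task-relevant feature
background = rng.standard_normal(n_trials)  # task-irrelevant feature
w_pos, w_bg = rng.standard_normal((2, n_neurons))
responses = (np.outer(position, w_pos) + np.outer(background, w_bg)
             + 0.5 * rng.standard_normal((n_trials, n_neurons)))

def decoding_axis(R, feature):
    w, *_ = np.linalg.lstsq(R, feature, rcond=None)  # least-squares decoder
    return w / np.linalg.norm(w)

a = decoding_axis(responses, position)
b = decoding_axis(responses, background)
print(f"cosine between decoding axes: {float(a @ b):.3f} (near 0 = orthogonal)")
```

When the two axes are orthogonal, variation along the background axis leaves the position readout unchanged, which is the stability property the abstract argues for.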
Affiliation(s)
- Ramanujan Srinath
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL, 60637, USA
| | - Amy M Ni
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL, 60637, USA
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Claire Marucci
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Marlene R Cohen
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL, 60637, USA
| | - David H Brainard
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, 19104, USA.
33
Cao R, Brunner P, Chakravarthula PN, Wahlstrom KL, Inman C, Smith EH, Li X, Mamelak AN, Brandmeir NJ, Rutishauser U, Willie JT, Wang S. A neuronal code for object representation and memory in the human amygdala and hippocampus. Nat Commun 2025; 16:1510. [PMID: 39929825 PMCID: PMC11811184 DOI: 10.1038/s41467-025-56793-y]
Abstract
How the brain encodes, recognizes, and memorizes general visual objects is a fundamental question in neuroscience. Here, we investigated the neural processes underlying visual object perception and memory by recording from 3173 single neurons in the human amygdala and hippocampus across four experiments. We employed both passive-viewing and recognition memory tasks involving a diverse range of naturalistic object stimuli. Our findings reveal a region-based feature code for general objects, where neurons exhibit receptive fields in the high-level visual feature space. This code can be validated by independent new stimuli and replicated across all experiments, including fixation-based analyses with large natural scenes. This region code explains the long-standing visual category selectivity, preferentially enhances memory of encoded stimuli, predicts memory performance, encodes image memorability, and exhibits intricate interplay with memory contexts. Together, region-based feature coding provides an important mechanism for visual object processing in the human brain.
Affiliation(s)
- Runnan Cao
- Department of Radiology, Washington University in St. Louis, St. Louis, MO, USA.
| | - Peter Brunner
- Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO, USA
| | | | | | - Cory Inman
- Department of Psychology, University of Utah, Salt Lake City, UT, USA
| | - Elliot H Smith
- Department of Neurosurgery, University of Utah, Salt Lake City, UT, USA
| | - Xin Li
- Department of Computer Science, University at Albany, Albany, NY, USA
| | - Adam N Mamelak
- Department of Neurosurgery, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | | | - Ueli Rutishauser
- Department of Neurosurgery, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Jon T Willie
- Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO, USA.
| | - Shuo Wang
- Department of Radiology, Washington University in St. Louis, St. Louis, MO, USA.
- Department of Neurosurgery, Washington University in St. Louis, St. Louis, MO, USA.
34
Elder JR, Zheng J, Shimelis LB, Rutishauser U, Lin MM. An invariant schema emerges within a neural network during hierarchical learning of visual boundaries. bioRxiv 2025:2025.01.30.635821. [PMID: 39975149 PMCID: PMC11838474 DOI: 10.1101/2025.01.30.635821]
Abstract
Neural circuits must balance plasticity and stability to enable continual learning without catastrophic forgetting, a failure mode pervasive in artificial neural networks trained end-to-end (e.g., by backpropagation). Here, we apply an alternative, hierarchical learning algorithm to the cognitive task of boundary detection in video clips. In contrast to backpropagation, hierarchical training converges to a network executing a fixed schema and generates firing statistics consistent with single-neuron recordings from human subjects performing the same task. The hierarchically trained network's schema circuit remains invariant following training on sparse data, with additional data serving to refine the upstream representation.
Affiliation(s)
- James R. Elder
- Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Molecular Biophysics Program, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Jie Zheng
- Department of Biomedical Engineering, University of California at Davis, Davis, CA, USA
- Department of Neurological Surgery, UC Davis Health, Davis, CA, USA
| | - Lydia B. Shimelis
- Biomedical Engineering and Neuroscience, Harvard University, Cambridge, MA, USA
| | - Ueli Rutishauser
- Department of Neurosurgery, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Department of Neurology, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Center for Neural Science and Medicine, Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Milo M. Lin
- Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Center for Alzheimer’s and Neurodegenerative Diseases, University of Texas Southwestern Medical Center, Dallas, TX, USA
35
Jiang Y, Dale R. Mapping the learning curves of deep learning networks. PLoS Comput Biol 2025; 21:e1012286. [PMID: 39928655 PMCID: PMC11841907 DOI: 10.1371/journal.pcbi.1012286]
Abstract
There is an important challenge in systematically interpreting the internal representations of deep neural networks (DNNs). Existing techniques are often less effective for non-tabular tasks, or they primarily focus on qualitative, ad-hoc interpretations of models. In response, this study introduces a cognitive science-inspired, multi-dimensional quantification and visualization approach that captures two temporal dimensions of model learning: the "information-processing trajectory" and the "developmental trajectory." The former represents the influence of incoming signals on an agent's decision-making, while the latter conceptualizes the gradual improvement in an agent's performance throughout its lifespan. Tracking the learning curves of DNNs enables researchers to explicitly identify the model appropriateness of a given task, examine the properties of the underlying input signals, and assess the model's alignment (or lack thereof) with human learning experiences. To illustrate this method, we conducted 750 runs of simulations on two temporal tasks: gesture detection and sentence classification, showcasing its applicability across different types of deep learning tasks. Using four descriptive metrics to quantify the mapped learning curves (start, end - start, max, tmax), we identified significant differences in learning patterns based on data sources and class distinctions (all p's < .0001), the prominent role of spatial semantics in gesture learning, and larger information gains in language learning. We highlight three key insights gained from mapping learning curves: non-monotonic progress, pairwise comparisons, and domain distinctions. We reflect on the theoretical implications of this method for cognitive processing, language models and representations from multiple modalities.
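The four curve descriptors named in the abstract are straightforward to compute from a per-epoch performance trace; a minimal sketch (metric names follow the abstract, the implementation details are assumptions):

```python
import numpy as np

def curve_metrics(curve):
    """The four descriptors named in the abstract, computed from one
    learning curve (performance per epoch): start, end - start, max, tmax."""
    curve = np.asarray(curve, float)
    return {
        "start": curve[0],                        # initial performance
        "end_minus_start": curve[-1] - curve[0],  # total improvement
        "max": curve.max(),                       # best performance reached
        "tmax": int(curve.argmax()),              # epoch at which the max occurs
    }

# Example: a non-monotonic curve whose peak precedes its final value.
print(curve_metrics([0.52, 0.61, 0.74, 0.79, 0.77, 0.78]))
```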
Affiliation(s)
- Yanru Jiang
- Department of Communication, University of California, Los Angeles, Los Angeles, California, United States of America
| | - Rick Dale
- Department of Communication, University of California, Los Angeles, Los Angeles, California, United States of America
36
Arias M, Behrendt L, Dreßler L, Raka A, Perrier C, Elias M, Gomez D, Renoult JP, Tedore C. Testing the equivalency of human "predators" and deep neural networks in the detection of cryptic moths. J Evol Biol 2025; 38:214-224. [PMID: 39589804 DOI: 10.1093/jeb/voae146]
Abstract
Researchers have shown growing interest in using deep neural networks (DNNs) to efficiently test the effects of perceptual processes on the evolution of colour patterns and morphologies. Whether this is a valid approach remains unclear, as it is unknown whether the relative detectability of ecologically relevant stimuli to DNNs actually matches that of biological neural networks. To test this, we compare image classification performance by humans and 6 DNNs (AlexNet, VGG-16, VGG-19, ResNet-18, SqueezeNet, and GoogLeNet) trained to detect artificial moths on tree trunks. Moths varied in their degree of crypsis, conferred by different sizes and spatial configurations of transparent wing elements. Like humans, four of six DNN architectures found moths with larger transparent elements harder to detect. However, humans and only one DNN architecture (GoogLeNet) found moths with transparent elements touching one side of the moth's outline harder to detect than moths with untouched outlines. When moths took up a smaller proportion of the image (i.e., were viewed from further away), the camouflaging effect of transparent elements touching the moth's outline was reduced for DNNs but enhanced for humans. Viewing distance can thus interact with camouflage type in opposing directions in humans and DNNs, which warrants a deeper investigation of viewing distance/size interactions with a broader range of stimuli. Overall, our results suggest that human and DNN responses had some similarities, but not enough to justify widespread use of DNNs for studies of camouflage.
Affiliation(s)
- Mónica Arias
- CIRAD, UMR PHIM, F-34398, Montpellier, France
- PHIM, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
| | - Lis Behrendt
- Faculty of Mathematics, Informatics and Natural Sciences, Institute of Cell and Systems Biology of Animals, University of Hamburg, Hamburg, Germany
| | - Lyn Dreßler
- Faculty of Mathematics, Informatics and Natural Sciences, Institute of Cell and Systems Biology of Animals, University of Hamburg, Hamburg, Germany
| | - Adelina Raka
- Faculty of Mathematics, Informatics and Natural Sciences, Institute of Cell and Systems Biology of Animals, University of Hamburg, Hamburg, Germany
| | - Charles Perrier
- CBGP, INRAE, CIRAD, IRD, Institut Agro, Univ Montpellier, Montpellier, France
| | - Marianne Elias
- ISYEB, Department Origins and Evolution, CNRS, MNHN, Sorbonne Université, EPHE, Université des Antilles, Paris, France
- Smithsonian Tropical Research Institute, Gamboa, Panama
| | - Doris Gomez
- CEFE, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
| | | | - Cynthia Tedore
- Faculty of Mathematics, Informatics and Natural Sciences, Institute of Cell and Systems Biology of Animals, University of Hamburg, Hamburg, Germany
- CEFE, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
37
Vinck M, Uran C, Dowdall JR, Rummell B, Canales-Johnson A. Large-scale interactions in predictive processing: oscillatory versus transient dynamics. Trends Cogn Sci 2025; 29:133-148. [PMID: 39424521 PMCID: PMC7616854 DOI: 10.1016/j.tics.2024.09.013]
Abstract
How do the two main types of neural dynamics, aperiodic transients and oscillations, contribute to the interactions between feedforward (FF) and feedback (FB) pathways in sensory inference and predictive processing? We discuss three theoretical perspectives. First, we critically evaluate the theory that gamma and alpha/beta rhythms play a role in classic hierarchical predictive coding (HPC) by mediating FF and FB communication, respectively. Second, we outline an alternative functional model in which rapid sensory inference is mediated by aperiodic transients, whereas oscillations contribute to the stabilization of neural representations over time and plasticity processes. Third, we propose that the strong dependence of oscillations on predictability can be explained based on a biologically plausible alternative to classic HPC, namely dendritic HPC.
Affiliation(s)
- Martin Vinck
- Ernst Strüngmann Institute (ESI) for Neuroscience, in Cooperation with the Max Planck Society, 60528 Frankfurt am Main, Germany; Donders Centre for Neuroscience, Department of Neurophysics, Radboud University, 6525 Nijmegen, The Netherlands.
| | - Cem Uran
- Ernst Strüngmann Institute (ESI) for Neuroscience, in Cooperation with the Max Planck Society, 60528 Frankfurt am Main, Germany; Donders Centre for Neuroscience, Department of Neurophysics, Radboud University, 6525 Nijmegen, The Netherlands.
| | - Jarrod R Dowdall
- Robarts Research Institute, Western University, London, ON, Canada
| | - Brian Rummell
- Ernst Strüngmann Institute (ESI) for Neuroscience, in Cooperation with the Max Planck Society, 60528 Frankfurt am Main, Germany
| | - Andres Canales-Johnson
- Facultad de Ciencias de la Salud, Universidad Catolica del Maule, 3480122 Talca, Chile; Department of Psychology, University of Cambridge, Cambridge CB2 3EB, UK.
38
Greco A, Rastelli C, Ubaldi A, Riva G. Immersive exposure to simulated visual hallucinations modulates high-level human cognition. Conscious Cogn 2025; 128:103808. [PMID: 39862735 DOI: 10.1016/j.concog.2025.103808]
Abstract
Psychedelic drugs offer valuable insights into consciousness, but disentangling their causal effects on perceptual and high-level cognition is nontrivial. Technological advances in virtual reality (VR) and machine learning have enabled the immersive simulation of visual hallucinations. However, comprehensive experimental data on how these simulated hallucinations affect high-level human cognition is lacking. Here, we exposed human participants to VR panoramic videos and their psychedelic counterparts generated by the DeepDream algorithm. Participants exhibited reduced task-switching costs after simulated psychedelic exposure compared to naturalistic exposure, consistent with increased cognitive flexibility. No significant differences were observed between naturalistic and simulated psychedelic exposure in linguistic automatic association tasks at word and sentence levels. Crucially, visually grounded high-level cognitive processes were modulated by exposure to simulated hallucinations. Our results provide insights into the interdependence of bottom-up and top-down cognitive processes and altered states of consciousness without pharmacological intervention, potentially informing both basic neuroscience and clinical applications.
Affiliation(s)
- Antonino Greco
- Department of Neural Dynamics and Magnetoencephalography, Hertie Institute for Clinical Brain Research, University of Tübingen, Germany; Centre for Integrative Neuroscience, University of Tübingen, Germany; MEG Center, University of Tübingen, Germany.
| | - Clara Rastelli
- Department of Neural Dynamics and Magnetoencephalography, Hertie Institute for Clinical Brain Research, University of Tübingen, Germany; MEG Center, University of Tübingen, Germany; Department of Psychology and Cognitive Science (DiPSCo), University of Trento, Italy
| | - Andrea Ubaldi
- Humane Technology Lab, Catholic University of Sacred Heart, Milan, Italy; Applied Technology for Neuro-Psychology Lab., Istituto Auxologico Italiano IRCCS, Milan, Italy
| | - Giuseppe Riva
- Humane Technology Lab, Catholic University of Sacred Heart, Milan, Italy; Applied Technology for Neuro-Psychology Lab., Istituto Auxologico Italiano IRCCS, Milan, Italy.
39
Greco A, Siegel M. A spatiotemporal style transfer algorithm for dynamic visual stimulus generation. Nat Comput Sci 2025; 5:155-169. [PMID: 39706876 PMCID: PMC11860245 DOI: 10.1038/s43588-024-00746-w]
Abstract
Understanding how visual information is encoded in biological and artificial systems often requires the generation of appropriate stimuli to test specific hypotheses, but available methods for video generation are scarce. Here we introduce the spatiotemporal style transfer (STST) algorithm, a dynamic visual stimulus generation framework that allows the manipulation and synthesis of video stimuli for vision research. We show how stimuli can be generated that match the low-level spatiotemporal features of their natural counterparts, but lack their high-level semantic features, providing a useful tool to study object recognition. We used these stimuli to probe PredNet, a predictive coding deep network, and found that its next-frame predictions were not disrupted by the omission of high-level information, with human observers also confirming the preservation of low-level features and lack of high-level information in the generated stimuli. We also introduce a procedure for the independent spatiotemporal factorization of dynamic stimuli. Testing such factorized stimuli on humans and deep vision models suggests a spatial bias in how humans and deep vision models encode dynamic visual information. These results showcase potential applications of the STST algorithm as a versatile tool for dynamic stimulus generation in vision science.
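A simplified stand-in for the kind of feature matching such an algorithm requires: the classic Gram-matrix style statistic, applied frame by frame to two feature videos. This is not the STST algorithm itself, whose spatiotemporal losses are specified in the paper; the random tensors below stand in for activations from a fixed pretrained CNN.

```python
import torch

def gram(feats):
    """Gram matrix of a (channels, height, width) feature map."""
    c, h, w = feats.shape
    f = feats.reshape(c, h * w)
    return (f @ f.T) / (h * w)

def spatiotemporal_style_loss(feats_a, feats_b):
    """Mean Gram-matrix mismatch across the frames of two feature videos,
    each shaped (time, channels, height, width)."""
    return torch.stack([((gram(fa) - gram(fb)) ** 2).mean()
                        for fa, fb in zip(feats_a, feats_b)]).mean()

# In practice feats_* would come from a CNN applied to each frame.
a, b = torch.randn(2, 8, 16, 32, 32)
print(f"style loss: {spatiotemporal_style_loss(a, b).item():.4f}")
```

Minimizing a loss of this kind with respect to the synthesized video preserves low-level texture statistics while leaving high-level semantic content unconstrained, which is the dissociation the abstract exploits.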
Affiliation(s)
- Antonino Greco
- Department of Neural Dynamics and Magnetoencephalography, Hertie Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany.
- Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany.
- MEG Center, University of Tübingen, Tübingen, Germany.
| | - Markus Siegel
- Department of Neural Dynamics and Magnetoencephalography, Hertie Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany.
- Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany.
- MEG Center, University of Tübingen, Tübingen, Germany.
- German Center for Mental Health (DZPG), Tübingen, Germany.
40
Balwani AH, Wang AQ, Najafi F, Choi H. Constructing biologically constrained RNNs via Dale's backprop and topologically-informed pruning. bioRxiv 2025:2025.01.09.632231. [PMID: 39868098 PMCID: PMC11760306 DOI: 10.1101/2025.01.09.632231]
Abstract
Recurrent neural networks (RNNs) have emerged as a prominent tool for modeling cortical function, and yet their conventional architecture is lacking in physiological and anatomical fidelity. In particular, these models often fail to incorporate two crucial biological constraints: i) Dale's law, i.e., sign constraints that preserve the "type" of projections from individual neurons, and ii) Structured connectivity motifs, i.e., highly sparse yet defined connections amongst various neuronal populations. Both constraints are known to impair learning performance in artificial neural networks, especially when trained to perform complicated tasks; but as modern experimental methodologies allow us to record from diverse neuronal populations spanning multiple brain regions, using RNN models to study neuronal interactions without incorporating these fundamental biological properties raises questions regarding the validity of the insights gleaned from them. To address these concerns, our work develops methods that let us train RNNs which respect Dale's law whilst simultaneously maintaining a specific sparse connectivity pattern across the entire network. We provide mathematical grounding and guarantees for our approaches incorporating both types of constraints, and show empirically that our models match the performance of RNNs trained without any constraints. Finally, we demonstrate the utility of our methods for inferring multi-regional interactions by training RNN models of the cortical network to reconstruct 2-photon calcium imaging data during visual behaviour in mice, whilst enforcing data-driven, cell-type specific connectivity constraints between various neuronal populations spread across multiple cortical layers and brain areas. In doing so, we find that the interactions inferred by our model corroborate experimental findings in agreement with the theory of predictive coding, thus validating the applicability of our methods.
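A sketch of the two constraints named in the abstract, enforced in the simplest common way: trainable magnitudes combined with a fixed per-neuron sign vector (Dale's law) and a fixed binary sparsity mask (structured connectivity). The paper's "Dale's backprop" and pruning procedures are more involved; everything below is an illustrative assumption.

```python
import torch
import torch.nn as nn

class DaleRecurrentCell(nn.Module):
    """Recurrent cell whose effective weights respect Dale's law (each
    presynaptic unit is all-excitatory or all-inhibitory) and a fixed
    sparse connectivity mask."""
    def __init__(self, n, frac_inhibitory=0.2, density=0.1):
        super().__init__()
        self.w_raw = nn.Parameter(torch.randn(n, n) * 0.1)
        sign = torch.ones(n)
        sign[: int(n * frac_inhibitory)] = -1.0  # inhibitory population
        self.register_buffer("sign", sign)
        self.register_buffer("mask", (torch.rand(n, n) < density).float())

    def effective_weight(self):
        # Magnitudes are trainable; the per-presynaptic-unit sign and the
        # sparsity pattern are hard constraints applied at every forward pass.
        return torch.relu(self.w_raw) * self.mask * self.sign.unsqueeze(0)

    def forward(self, h):
        return torch.tanh(h @ self.effective_weight().T)

cell = DaleRecurrentCell(64)
h = cell(torch.randn(5, 64))
print(h.shape)  # torch.Size([5, 64])
```

Because the sign and mask are buffers rather than parameters, ordinary gradient descent on `w_raw` can never flip a neuron's type or create a forbidden connection, which is the guarantee such constrained training aims for.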
Affiliation(s)
| | - Alex Q. Wang
- Computational Science and Engineering Program, Georgia Institute of Technology
| | - Farzaneh Najafi
- School of Biological Sciences, Georgia Institute of Technology
| | - Hannah Choi
- School of Mathematics, Georgia Institute of Technology
41
Orouji S, Taschereau-Dumouchel V, Cortese A, Odegaard B, Cushing C, Cherkaoui M, Kawato M, Lau H, Peters MAK. Task relevant autoencoding enhances machine learning for human neuroscience. Sci Rep 2025; 15:1365. [PMID: 39779744 PMCID: PMC11711280 DOI: 10.1038/s41598-024-83867-6]
Abstract
In human neuroscience, machine learning can help reveal lower-dimensional neural representations relevant to subjects' behavior. However, state-of-the-art models typically require large datasets to train, and so are prone to overfitting on human neuroimaging data that often possess few samples but many input dimensions. Here, we capitalized on the fact that the features we seek in human neuroscience are precisely those relevant to subjects' behavior rather than noise or other irrelevant factors. We thus developed a Task-Relevant Autoencoder via Classifier Enhancement (TRACE) designed to identify behaviorally-relevant target neural patterns. We benchmarked TRACE against a standard autoencoder and other models for two severely truncated machine learning datasets (to match the data typically available in functional magnetic resonance imaging [fMRI] data for an individual subject), then evaluated all models on fMRI data from 59 subjects who observed animals and objects. TRACE outperformed alternative models nearly unilaterally, showing up to 12% increased classification accuracy and up to 56% improvement in discovering "cleaner", task-relevant representations. These results showcase TRACE's potential for a wide variety of data related to human behavior.
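A sketch of the TRACE-style objective as described: an autoencoder whose latent code also feeds a classifier, so reconstruction and behavioral relevance are optimized jointly. Architecture sizes, the loss weighting, and the placeholder data are assumptions, not the published model.

```python
import torch
import torch.nn as nn

n_voxels, n_latent, n_classes = 500, 16, 2
encoder = nn.Sequential(nn.Linear(n_voxels, 64), nn.ReLU(), nn.Linear(64, n_latent))
decoder = nn.Sequential(nn.Linear(n_latent, 64), nn.ReLU(), nn.Linear(64, n_voxels))
classifier = nn.Linear(n_latent, n_classes)
params = [*encoder.parameters(), *decoder.parameters(), *classifier.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

x = torch.randn(128, n_voxels)           # placeholder fMRI patterns
y = torch.randint(0, n_classes, (128,))  # placeholder behavioral labels

for step in range(300):
    z = encoder(x)
    recon_loss = ((decoder(z) - x) ** 2).mean()        # keep the data
    class_loss = nn.functional.cross_entropy(classifier(z), y)  # keep the task
    loss = recon_loss + 1.0 * class_loss               # joint objective
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"reconstruction {recon_loss.item():.3f}, classification {class_loss.item():.3f}")
```

The classification term is what biases the bottleneck toward behaviorally relevant structure rather than whatever directions dominate the raw variance.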
Affiliation(s)
- Seyedmehdi Orouji: Department of Cognitive Sciences, University of California, 2201 Social & Behavioral Sciences Gateway, Irvine, CA, 92697, USA
- Vincent Taschereau-Dumouchel: Department of Psychiatry and Addictology, Université de Montréal, Montreal, H3C 3J7, Canada; Centre de Recherche de l'Institut Universitaire en Santé Mentale de Montréal, Montréal, Canada
- Aurelio Cortese: ATR Computational Neuroscience Laboratories, Kyoto, 619-0288, Japan
- Brian Odegaard: Department of Psychology, University of Florida, Gainesville, FL, 32603, USA
- Cody Cushing: Department of Psychology, University of California Los Angeles, Los Angeles, 90095, USA
- Mouslim Cherkaoui: Department of Psychology, University of California Los Angeles, Los Angeles, 90095, USA
- Mitsuo Kawato: ATR Computational Neuroscience Laboratories, Kyoto, 619-0288, Japan
- Hakwan Lau: RIKEN Center for Brain Science, Tokyo, Japan
- Megan A K Peters: Department of Cognitive Sciences, University of California, 2201 Social & Behavioral Sciences Gateway, Irvine, CA, 92697, USA; Center for the Neurobiology of Learning and Memory, University of California, Irvine, Irvine, CA, 92697, USA
42
Blauch NM, Plaut DC, Vin R, Behrmann M. Individual variation in the functional lateralization of human ventral temporal cortex: Local competition and long-range coupling. bioRxiv 2025:2024.10.15.618268. [PMID: 39464049 PMCID: PMC11507683 DOI: 10.1101/2024.10.15.618268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 10/29/2024]
Abstract
The ventral temporal cortex (VTC) of the human cerebrum is critically engaged in high-level vision. One intriguing aspect of this region is its functional lateralization, with neural responses to words being stronger in the left hemisphere and neural responses to faces being stronger in the right hemisphere; such patterns can be summarized with a signed laterality index (LI), positive for leftward laterality. Converging evidence has suggested that word laterality emerges to couple efficiently with left-lateralized frontotemporal language regions, but evidence is more mixed regarding the sources of the right-lateralization for face perception. Here, we use individual differences as a tool to test three theories of VTC organization, arising from: 1) local competition between words and faces, driven by long-range coupling between words and language processes; 2) local competition between faces and other categories; and 3) long-range coupling between VTC and temporal areas that exhibit local competition between language and social processing. First, in an in-house functional MRI experiment, we did not obtain a negative correlation between the LIs of word and face selectivity relative to object responses, but did find a positive correlation when using selectivity relative to a fixation baseline, challenging the idea that local competition between words and faces drives rightward face lateralization. We next examined broader local LI interactions with faces using the large-scale Human Connectome Project (HCP) dataset. Face and tool LIs were significantly anti-correlated, while face and body LIs were positively correlated, consistent with the idea that generic local representational competition and cooperation may shape face lateralization. Last, we assessed the role of long-range coupling in the development of VTC lateralization. Within our in-house experiment, a substantial positive correlation was evident between VTC text LI and that of several other nodes of a distributed text-processing circuit. In the HCP data, VTC face LI was both negatively correlated with language LI and positively correlated with social-processing LI in different subregions of the posterior temporal lobe (PSL and STSp, respectively). In summary, we find no evidence of local face-word competition in VTC; instead, more generic local interactions shape multiple lateralities within VTC, including face laterality. Moreover, face laterality is also influenced by long-range coupling with social processing in the posterior temporal lobe, where social processing may become right-lateralized due to local competition with language.
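A signed laterality index of the common (left - right) / (left + right) form, as described above, can be computed per subject and then correlated across individuals; the selectivity inputs and gamma-distributed toy draws in this sketch are placeholders for the paper's contrast-specific selectivity measures.

```python
# A small sketch of a signed laterality index (LI): (left - right) divided by
# (left + right), positive for leftward laterality. Toy data only.
import numpy as np

def laterality_index(sel_left, sel_right):
    """Per-subject signed LI; +1 is fully left-lateralized, -1 fully right."""
    denom = np.abs(sel_left) + np.abs(sel_right)
    return (sel_left - sel_right) / np.where(denom == 0, np.nan, denom)

rng = np.random.default_rng(0)
# Hypothetical per-subject selectivity magnitudes in each hemisphere.
word_li = laterality_index(rng.gamma(2.0, 1.0, 100), rng.gamma(1.5, 1.0, 100))
face_li = laterality_index(rng.gamma(1.5, 1.0, 100), rng.gamma(2.0, 1.0, 100))
# Individual-differences test: correlate the two LIs across subjects.
r = np.corrcoef(word_li, face_li)[0, 1]
print(f"word-face LI correlation across subjects: r = {r:.2f}")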
Affiliation(s)
- Nicholas M Blauch: Program in Neural Computation, Carnegie Mellon University; Neuroscience Institute, Carnegie Mellon University; Department of Psychology, Harvard University
- David C Plaut: Department of Psychology, Carnegie Mellon University; Neuroscience Institute, Carnegie Mellon University
- Raina Vin: Department of Psychology, Carnegie Mellon University; Neurosciences Graduate Program, Yale University
- Marlene Behrmann: Department of Psychology, Carnegie Mellon University; Neuroscience Institute, Carnegie Mellon University; Department of Ophthalmology, University of Pittsburgh
43
Subramanian A, Price S, Kumbhar O, Sizikova E, Majaj NJ, Pelli DG. Benchmarking the speed-accuracy tradeoff in object recognition by humans and neural networks. J Vis 2025; 25:4. [PMID: 39752176 PMCID: PMC11706240 DOI: 10.1167/jov.25.1.4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2024] [Accepted: 10/25/2024] [Indexed: 01/04/2025] Open
Abstract
Active object recognition, fundamental to tasks like reading and driving, relies on the ability to make time-sensitive decisions. People exhibit a flexible tradeoff between speed and accuracy, a crucial human skill, yet current computational models struggle to incorporate time. To address this gap, we present the first dataset (148 observers) exploring the speed-accuracy tradeoff (SAT) in ImageNet object recognition. Participants performed a 16-way ImageNet categorization task in which responses counted only if they occurred near the time of a fixed-delay beep; each block of trials thus enforced a single reaction time. As expected, human accuracy increases with reaction time. We compare human performance with that of dynamic neural networks that adapt their computation to the available inference time. Time is a scarce resource for human object recognition, and finding an appropriate analog in neural networks is challenging. Networks can repeat operations by using layers, recurrent cycles, or early exits, so we use the repetition count as a network's analog for time. In our analysis, the numbers of layers, recurrent cycles, and early exits all correlate strongly with floating-point operations, making them suitable time analogs. Comparing networks and humans on SAT-fit error, category-wise correlation, and SAT-curve steepness, we find cascaded dynamic neural networks most promising for modeling human speed and accuracy. Surprisingly, convolutional recurrent networks, typically favored in human object recognition modeling, perform the worst on our benchmark.
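One conventional way to quantify an SAT curve, assumed here for illustration rather than taken from the paper, is an exponential approach from chance accuracy to an asymptote, whose steepness and onset parameters can be fit per observer or per network:

```python
# Hedged sketch: fitting a speed-accuracy tradeoff (SAT) curve in which
# accuracy rises from chance after an onset time t0 and saturates at an
# asymptote lam with steepness beta. The exact functional form used in the
# paper may differ; this exponential-approach form is a common choice.
import numpy as np
from scipy.optimize import curve_fit

def sat_curve(t, lam, beta, t0, chance=1.0 / 16.0):  # 16-way categorization
    return chance + (lam - chance) * (1.0 - np.exp(-beta * np.clip(t - t0, 0, None)))

rng = np.random.default_rng(1)
rt = np.linspace(0.2, 1.5, 8)   # reaction times (s), one per block of trials
acc = sat_curve(rt, 0.85, 4.0, 0.25) + rng.normal(0, 0.02, rt.size)  # toy data

(lam, beta, t0), _ = curve_fit(sat_curve, rt, acc, p0=[0.8, 3.0, 0.2],
                               bounds=([1 / 16, 0.1, 0.0], [1.0, 50.0, 1.0]))
print(f"asymptote={lam:.2f}, steepness={beta:.1f}, onset={t0:.2f}s")
```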
Affiliation(s)
- Ajay Subramanian: Department of Psychology, New York University, New York, NY, USA
- Sara Price: Center for Data Science, New York University, New York, NY, USA
- Omkar Kumbhar: Computer Science Department, New York University, New York, NY, USA
- Elena Sizikova: Center for Data Science, New York University, New York, NY, USA
- Najib J Majaj: Center for Neural Science, New York University, New York, NY, USA
- Denis G Pelli: Department of Psychology, New York University, New York, NY, USA; Center for Neural Science, New York University, New York, NY, USA
44
Mukherjee S, Babadi B, Shamma S. Sparse high-dimensional decomposition of non-primary auditory cortical receptive fields. PLoS Comput Biol 2025; 21:e1012721. [PMID: 39746112 PMCID: PMC11774495 DOI: 10.1371/journal.pcbi.1012721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2024] [Revised: 01/28/2025] [Accepted: 12/16/2024] [Indexed: 01/04/2025] Open
Abstract
Characterizing neuronal responses to natural stimuli remains a central goal in sensory neuroscience. In auditory cortical neurons, the stimulus selectivity of elicited spiking activity is summarized by a spectrotemporal receptive field (STRF) that relates neuronal responses to the stimulus spectrogram. Though effective in characterizing primary auditory cortical responses, STRFs of non-primary auditory neurons can be quite intricate, reflecting their mixed selectivity; this complexity impedes understanding of how acoustic stimulus representations are transformed along the auditory pathway. Here, we focus on the relationship between ferret primary auditory cortex (A1) and a secondary region, the dorsal posterior ectosylvian gyrus (PEG). We propose estimating receptive fields in PEG with respect to a well-established high-dimensional computational model of primary-cortical stimulus representations. These "cortical receptive fields" (CortRFs) are estimated greedily to identify the salient primary-cortical features modulating spiking responses, which are in turn related to corresponding spectrotemporal features; hence, they provide biologically plausible hierarchical decompositions of STRFs in PEG. CortRF analysis was applied to PEG neuronal responses to speech and temporally orthogonal ripple combination (TORC) stimuli and, for comparison, to A1 neuronal responses. CortRFs of PEG neurons captured selectivity to more complex spectrotemporal features than those of A1 neurons; moreover, CortRF models were more predictive of PEG (but not A1) responses to speech. Our results thus suggest that secondary-cortical stimulus representations can be computed as sparse combinations of primary-cortical features that facilitate encoding of natural stimuli: adding the primary-cortical representation accounts for PEG single-unit responses to natural sounds better than bypassing it and taking the auditory spectrogram as input. These results add explicit detail to the presumed hierarchical organization of the auditory cortex.
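A greedy, matching-pursuit-style estimator of the kind the abstract describes might look like the following sketch; the feature bank `F` is random toy data standing in for the actual primary-cortical model outputs, and the selection rule is a generic assumption rather than the authors' exact algorithm.

```python
# Hypothetical sketch of greedy "cortical receptive field" estimation:
# regress a neuron's response on a large bank of primary-cortical model
# features, selecting one salient feature at a time (matching-pursuit style)
# and refitting the weights after each selection.
import numpy as np

def greedy_cortrf(F, y, k=5):
    """F: (n_samples, n_features); y: (n_samples,) response. Returns indices, weights."""
    selected = []
    residual = y - y.mean()
    for _ in range(k):
        scores = np.abs(F.T @ residual)   # inner products with the residual
        scores[selected] = -np.inf        # never reselect a feature
        selected.append(int(np.argmax(scores)))
        w, *_ = np.linalg.lstsq(F[:, selected], y, rcond=None)  # refit chosen set
        residual = y - F[:, selected] @ w
    return selected, w

rng = np.random.default_rng(2)
F = rng.standard_normal((1000, 300))      # toy stand-in for model features
true_w = np.zeros(300)
true_w[[10, 42, 77]] = [1.5, -1.0, 0.8]   # a sparse ground-truth combination
y = F @ true_w + 0.1 * rng.standard_normal(1000)
idx, w = greedy_cortrf(F, y, k=3)
print(sorted(idx))   # typically recovers [10, 42, 77]
```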
Affiliation(s)
- Shoutik Mukherjee: Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America; Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Behtash Babadi: Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America; Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Shihab Shamma: Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America; Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America; Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Paris, France
45
Akbarinia A. Exploring the categorical nature of colour perception: Insights from artificial networks. Neural Netw 2025; 181:106758. [PMID: 39368278 DOI: 10.1016/j.neunet.2024.106758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2024] [Revised: 08/26/2024] [Accepted: 09/23/2024] [Indexed: 10/07/2024]
Abstract
The electromagnetic spectrum of light from a rainbow is a continuous signal, yet we perceive it vividly in several distinct colour categories. The origins and underlying mechanisms of this phenomenon remain partly unexplained. We investigate categorical colour perception in artificial neural networks (ANNs) using the odd-one-out paradigm. In the first experiment, we compared unimodal vision networks (e.g., ImageNet object recognition) to multimodal vision-language models (e.g., CLIP text-image matching). Our results show that vision networks predict a significant portion of human data (approximately 80%), while vision-language models account for the remaining unexplained data, even in non-linguistic experiments. These findings suggest that categorical colour perception is a language-independent representation, though it is partly shaped by linguistic colour terms during its development. In the second experiment, we explored how the visual task influences the colour categories of an ANN by examining twenty-four Taskonomy networks. Our results indicate that human-like colour categories are task-dependent, predominantly emerging in semantic and 3D tasks, with a notable absence in low-level tasks. To explain this difference, we analysed kernel responses before the winner-takes-all stage, observing that networks with mismatching colour categories may still align in underlying continuous representations. Our findings quantify the dual influence of visual signals and linguistic factors in categorical colour perception and demonstrate the task-dependent nature of this phenomenon, suggesting that categorical colour perception emerges to facilitate certain visual tasks.
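The odd-one-out paradigm used here has a simple embedding-space analog, sketched below under the assumption that the network's choice is the item excluded from the most similar pair; the embedding source is a placeholder, not one of the networks tested in the paper.

```python
# A minimal sketch, not the paper's code, of simulating the odd-one-out
# paradigm with network embeddings: keep the pair with the highest
# representational similarity, and report the remaining item as "odd".
import numpy as np

def odd_one_out(emb: np.ndarray) -> int:
    """emb: (3, d) embeddings of the three stimuli; returns index of the odd item."""
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = e @ e.T                              # cosine similarities
    pairs = [(0, 1), (0, 2), (1, 2)]
    best = max(pairs, key=lambda p: sim[p])    # most similar pair
    return ({0, 1, 2} - set(best)).pop()

rng = np.random.default_rng(3)
triplet = rng.standard_normal((3, 64))
triplet[1] = triplet[0] + 0.05 * rng.standard_normal(64)  # items 0 and 1 near-identical
print(odd_one_out(triplet))  # -> 2
```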
Affiliation(s)
- Arash Akbarinia: Department of Experimental Psychology, University of Giessen, Germany
46
Mukherjee K, Rogers TT. Using drawings and deep neural networks to characterize the building blocks of human visual similarity. Mem Cognit 2025; 53:219-241. [PMID: 38814385 DOI: 10.3758/s13421-024-01580-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/22/2024] [Indexed: 05/31/2024]
Abstract
Early in life and without special training, human beings discern resemblance between abstract visual stimuli, such as drawings, and the real-world objects they represent. We used this capacity for visual abstraction as a tool for evaluating deep neural networks (DNNs) as models of human visual perception. Contrasting five contemporary DNNs, we evaluated how well each explains human similarity judgments among line drawings of recognizable and novel objects. For object sketches, human judgments were dominated by semantic category information, and DNN representations contributed little additional information. In contrast, DNN features explained significant unique variance in the perceived similarity of abstract drawings. In both cases, a vision transformer trained to blend representations of images and their natural language descriptions showed the greatest ability to explain human perceptual similarity, an observation consistent with contemporary views of semantic representation and processing in the human mind and brain. Together, the results suggest that the building blocks of visual similarity may arise within systems that learn to use visual information, not for specific classification, but in service of generating semantic representations of objects.
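The unique-variance logic can be illustrated with a small hierarchical regression: compare the fit of a category-only model with a model that adds DNN feature similarity. The toy predictors below are assumptions for demonstration, not the paper's stimuli or features.

```python
# Sketch (assumed analysis, simplified): estimate the unique variance in
# human similarity judgments explained by DNN feature similarity over and
# above shared semantic-category membership, via incremental R-squared.
import numpy as np

def r2(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])          # add intercept
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

rng = np.random.default_rng(4)
n_pairs = 500
same_category = rng.integers(0, 2, n_pairs).astype(float)  # pairwise predictor
dnn_similarity = rng.normal(0, 1, n_pairs)                 # toy DNN similarity
human_similarity = (1.2 * same_category + 0.4 * dnn_similarity
                    + rng.normal(0, 1, n_pairs))           # simulated judgments

r2_cat = r2(same_category[:, None], human_similarity)
r2_full = r2(np.column_stack([same_category, dnn_similarity]), human_similarity)
print(f"unique DNN variance: {r2_full - r2_cat:.3f}")
```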
Affiliation(s)
- Kushin Mukherjee: Department of Psychology & Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Timothy T Rogers: Department of Psychology & Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
47
Gupta P, Dobs K. Human-like face pareidolia emerges in deep neural networks optimized for face and object recognition. PLoS Comput Biol 2025; 21:e1012751. [PMID: 39869654 PMCID: PMC11790231 DOI: 10.1371/journal.pcbi.1012751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2023] [Revised: 02/03/2025] [Accepted: 12/24/2024] [Indexed: 01/29/2025] Open
Abstract
The human visual system possesses a remarkable ability to detect and process faces across diverse contexts, including the phenomenon of face pareidolia, i.e., seeing faces in inanimate objects. Despite extensive research, it remains unclear why the visual system employs such broadly tuned face detection capabilities. We hypothesized that face pareidolia results from the visual system's optimization for recognizing both faces and objects. To test this hypothesis, we used task-optimized deep convolutional neural networks (CNNs) and evaluated their alignment with human behavioral signatures and neural responses, measured via magnetoencephalography (MEG), related to pareidolia processing. Specifically, we trained CNNs on tasks involving combinations of face identification, face detection, object categorization, and object detection. Using representational similarity analysis, we found that CNNs that included object categorization in their training tasks represented pareidolia faces, real faces, and matched objects more similarly to neural responses than those that did not. Although these CNNs showed similar overall alignment with neural data, a closer examination of their internal representations revealed that specific training tasks had distinct effects on how pareidolia faces were represented across layers. Finally, interpretability methods revealed that only a CNN trained for both face identification and object categorization relied on face-like features, such as 'eyes', to classify pareidolia stimuli as faces, mirroring findings in human perception. Our results suggest that human-like face pareidolia may emerge from the visual system's optimization for face identification within the context of generalized object categorization.
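For readers unfamiliar with representational similarity analysis (RSA), the comparison step reduces to building representational dissimilarity matrices (RDMs) for model and brain over the same stimuli and correlating them; the data shapes below are placeholders, not the paper's stimuli or MEG preprocessing.

```python
# A compact sketch of the RSA step: build RDMs for model activations and
# MEG responses over the same stimuli, then correlate their (condensed)
# upper triangles with a rank correlation. Toy data only.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(5)
n_stimuli = 30  # e.g., pareidolia faces, real faces, matched objects
model_acts = rng.standard_normal((n_stimuli, 512))    # one CNN layer
meg_patterns = rng.standard_normal((n_stimuli, 100))  # sensor pattern per stimulus

model_rdm = pdist(model_acts, metric="correlation")   # condensed upper triangle
meg_rdm = pdist(meg_patterns, metric="correlation")
rho, p = spearmanr(model_rdm, meg_rdm)
print(f"model-MEG RSA: Spearman rho = {rho:.2f} (p = {p:.2g})")
```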
Affiliation(s)
- Pranjul Gupta: Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
- Katharina Dobs: Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany; Center for Mind, Brain, and Behavior, Universities of Marburg, Giessen and Darmstadt, Marburg, Germany
48
Duyck S, Costantino AI, Bracci S, Op de Beeck H. A computational deep learning investigation of animacy perception in the human brain. Commun Biol 2024; 7:1718. [PMID: 39741161 DOI: 10.1038/s42003-024-07415-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2023] [Accepted: 12/18/2024] [Indexed: 01/02/2025] Open
Abstract
The functional organization of the human object vision pathway distinguishes between animate and inanimate objects. To understand animacy perception, we explore the case of zoomorphic objects resembling animals. While perceiving these objects as animal-like seems obvious to humans, this "Animal bias" marks a striking discrepancy between the human brain and deep neural networks (DNNs). We computationally investigated the potential origins of this bias and successfully induced it in DNNs trained explicitly with zoomorphic objects; alternative training schedules failed to induce an Animal bias. We considered the superordinate distinction between animate and inanimate classes, sensitivity for faces and bodies, the bias for shape over texture, the role of ecologically valid categories, recurrent connections, and language-informed visual processing. These findings provide computational support that the Animal bias for zoomorphic objects is a unique property of human perception, yet one that can be explained by human learning history.
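The training-schedule manipulation at issue can be made concrete with a toy label mapping in which zoomorphic objects are explicitly assigned the animate superordinate class during training; the class names and mapping below are illustrative only, not the paper's stimulus set.

```python
# Hypothetical sketch of an "explicit zoomorphic" training schedule: each
# fine-grained class is mapped to a superordinate training target, with
# zoomorphic objects deliberately assigned to the animate class.
from typing import Dict

superordinate: Dict[str, str] = {
    "dog": "animate",
    "bird": "animate",
    "teapot": "inanimate",
    "chair": "inanimate",
    "cow-shaped mug": "animate",     # zoomorphic object relabelled as animal-like
    "duck-shaped kettle": "animate", # zoomorphic object relabelled as animal-like
}

def training_target(fine_label: str) -> str:
    """Target used during training under the zoomorphic-explicit schedule."""
    return superordinate[fine_label]

assert training_target("cow-shaped mug") == "animate"
```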
Affiliation(s)
- Stefanie Duyck: Brain and Cognition, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
- Andrea I Costantino: Brain and Cognition, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
- Stefania Bracci: Center for Mind/Brain Sciences (CIMeC), University of Trento, Trento, Italy
- Hans Op de Beeck: Brain and Cognition, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
49
Hosseini E, Casto C, Zaslavsky N, Conwell C, Richardson M, Fedorenko E. Universality of representation in biological and artificial neural networks. bioRxiv 2024:2024.12.26.629294. [PMID: 39764030 PMCID: PMC11703180 DOI: 10.1101/2024.12.26.629294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/12/2025]
Abstract
Many artificial neural networks (ANNs) trained with ecologically plausible objectives on naturalistic data align with behavior and neural representations in biological systems. Here, we show that this alignment is a consequence of convergence onto the same representations by high-performing ANNs and by brains. We developed a method to identify stimuli that systematically vary the degree of inter-model representation agreement. Across language and vision, we then showed that stimuli from high- and low-agreement sets predictably modulated model-to-brain alignment. We also examined which stimulus features distinguish high- from low-agreement sentences and images. Our results establish representation universality as a core component of model-to-brain alignment and provide a new approach for using ANNs to uncover the structure of biological representations and computations.
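One plausible way to score per-stimulus inter-model agreement, sketched here as an assumption rather than the paper's exact method, is to correlate each stimulus's RDM row (its dissimilarity profile against all other stimuli) across pairs of models and average:

```python
# Sketch of per-stimulus inter-model agreement scoring (assumed analysis):
# for each stimulus, correlate its dissimilarity profile across model pairs
# and average; the score extremes give candidate high/low-agreement sets.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def agreement_scores(embeddings):
    """embeddings: list of (n_stimuli, d) arrays, one per model (d may differ)."""
    rdms = [squareform(pdist(e, metric="correlation")) for e in embeddings]
    n = rdms[0].shape[0]
    scores = np.zeros(n)
    for s in range(n):
        profiles = [np.delete(r[s], s) for r in rdms]  # stimulus s vs. all others
        rhos = [spearmanr(profiles[i], profiles[j])[0]
                for i in range(len(profiles))
                for j in range(i + 1, len(profiles))]
        scores[s] = np.mean(rhos)                      # mean pairwise agreement
    return scores

rng = np.random.default_rng(6)
models = [rng.standard_normal((40, 128)) for _ in range(3)]  # toy "model embeddings"
scores = agreement_scores(models)
high_set = np.argsort(scores)[-5:]  # candidate high-agreement stimuli
low_set = np.argsort(scores)[:5]    # candidate low-agreement stimuli
print(high_set, low_set)
```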
Affiliation(s)
- Eghbal Hosseini: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Colton Casto: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Program in Speech and Hearing Bioscience and Technology (SHBT), Harvard University, Boston, MA, USA; Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA, USA
- Noga Zaslavsky: McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Psychology, New York University, New York, NY, USA
- Colin Conwell: Department of Psychology, Harvard University, Cambridge, MA, USA
- Mark Richardson: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; Program in Speech and Hearing Bioscience and Technology (SHBT), Harvard University, Boston, MA, USA; Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Evelina Fedorenko: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Program in Speech and Hearing Bioscience and Technology (SHBT), Harvard University, Boston, MA, USA
50
Marczak-Czajka A, Redgrave T, Mitcheff M, Villano M, Czajka A. Assessment of human emotional reactions to visual stimuli "deep-dreamed" by artificial neural networks. Front Psychol 2024; 15:1509392. [PMID: 39776961 PMCID: PMC11703666 DOI: 10.3389/fpsyg.2024.1509392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2024] [Accepted: 11/26/2024] [Indexed: 01/11/2025] Open
Abstract
Introduction: While it is documented that visual stimuli synthesized by artificial neural networks (ANNs) can evoke emotional reactions, the precise mechanisms connecting the strength and type of such reactions to how ANNs synthesize the stimuli are yet to be discovered. Understanding these mechanisms would allow the design of methods that synthesize images attenuating or enhancing selected emotional states, which may provide unobtrusive and widely applicable treatment of mental dysfunctions and disorders.
Methods: A convolutional neural network (CNN), a type of ANN used in computer vision tasks that models how humans solve visual tasks, was applied to synthesize ("dream" or "hallucinate") images with no semantic content that maximize the activations of neurons in precisely selected CNN layers. The evoked emotions of 150 human subjects observing these images were self-reported on a two-dimensional arousal-valence scale using self-assessment manikin (SAM) figures. Correlations were calculated between the arousal and valence ratings and both the images' visual properties (e.g., color, brightness, clutter feature congestion, and clutter sub-band entropy) and the position of the CNN layers stimulated to obtain a given image.
Results: Synthesized images that maximized the activations of some CNN layers led to significantly higher or lower arousal and valence levels than the average subject's reactions. Multiple linear regression analysis found that a small set of global visual features (hue, feature congestion, and sub-band entropy) significantly predicts the measured arousal; no statistically significant dependencies were found between global visual features and the measured valence.
Conclusion: This study demonstrates that synthesizing images by maximizing the activations of small, precisely selected parts of a CNN can produce visual stimuli that enhance or attenuate emotional reactions. The method paves the way for tools that, in a non-invasive way, support wellbeing (managing stress, enhancing mood) and assist patients with certain mental conditions, complementing traditional therapeutic interventions.
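The "deep-dream" synthesis step described in the Methods corresponds to gradient ascent on the input image to maximize activations of a chosen layer. The sketch below uses a torchvision VGG-16 as a stand-in; the paper's actual network, layer selection, and regularization are not reproduced here.

```python
# A minimal sketch, under stated assumptions, of deep-dream-style synthesis:
# gradient ascent on the input image to maximize the mean activation of one
# chosen layer of a pretrained CNN (here, torchvision's VGG-16 as a stand-in).
import torch
from torchvision.models import vgg16, VGG16_Weights

model = vgg16(weights=VGG16_Weights.DEFAULT).features.eval()
layer_idx = 10                      # which layer to "stimulate" (assumed choice)

img = torch.rand(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.05)

for _ in range(50):
    opt.zero_grad()
    x = img
    for i, layer in enumerate(model):
        x = layer(x)
        if i == layer_idx:
            break                   # stop at the target layer
    loss = -x.mean()                # ascend on the layer's mean activation
    loss.backward()
    opt.step()
    with torch.no_grad():
        img.clamp_(0, 1)            # keep a displayable image
```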
Affiliation(s)
- Agnieszka Marczak-Czajka: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States
- Timothy Redgrave: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States
- Mahsa Mitcheff: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States
- Michael Villano: Department of Psychology, University of Notre Dame, Notre Dame, IN, United States
- Adam Czajka: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States