1. Delhaye E, Besson G, Bahri MA, Bastin C. Object fine-grained discrimination as a sensitive cognitive marker of transentorhinal integrity. Commun Biol 2025; 8:800. PMID: 40415135. DOI: 10.1038/s42003-025-08201-w.
Abstract
The transentorhinal cortex (tErC) is one of the first regions affected by Alzheimer's disease (AD), often showing changes before clinical symptoms appear. Understanding its role in cognition is key to detecting early cognitive impairments in AD. This study tested the hypothesis that the tErC supports fine-grained representations of unique individual objects, with sensitivity to the granularity of the required discrimination, influencing both perceptual and mnemonic functions. We examined the tErC's role in object versus scene discrimination, using objective (based on a pretrained convolutional neural network, CNN) and subjective (human-rated) measures of visual similarity. Our results show that the structural integrity of the tErC is specifically related to sensitivity to visual similarity for objects, but not for scenes. Importantly, this relationship depends on how visual similarity is measured: it appears only when using CNN similarity measures in perceptual discrimination, and only when using subjective similarity ratings in mnemonic discrimination. Furthermore, in mnemonic discrimination, object sensitivity to visual similarity was associated with the integrity of tErC-BA36 connectivity only when similarity was computed from subjective ratings. Altogether, these findings suggest that discrimination sensitivity to object visual similarity may represent a specific marker of tErC integrity, provided the type of similarity measure is taken into account.
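A minimal sketch of how such an objective, CNN-based similarity measure can be computed (the network, layer, and metric below are illustrative assumptions, not the authors' exact pipeline): cosine similarity between penultimate-layer activations of a pretrained ResNet-50.
```python
# Illustrative sketch (assumed pipeline, not necessarily the paper's):
# visual similarity = cosine similarity between penultimate-layer
# activations of a pretrained ResNet-50.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = torch.nn.Identity()  # drop the classifier: output = 2048-d features
model.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def cnn_similarity(path_a, path_b):
    """Cosine similarity between CNN features of two object images."""
    with torch.no_grad():
        feats = [model(preprocess(Image.open(p).convert("RGB")).unsqueeze(0))
                 for p in (path_a, path_b)]
    return torch.nn.functional.cosine_similarity(feats[0], feats[1]).item()

# Usage (hypothetical file names): cnn_similarity("obj_a.png", "obj_b.png")
```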
Affiliation(s)
- Emma Delhaye
- GIGA Research, CRC Human Imaging, University of Liège, Liège, Belgium.
- PsyNCog Research Unit, Faculty of Psychology, University of Liège, Liège, Belgium.
- CICPSI, Faculty of Psychology, University of Lisbon, Lisbon, Portugal.
- Gabriel Besson
- Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- Mohamed Ali Bahri
- GIGA Research, CRC Human Imaging, University of Liège, Liège, Belgium
- Christine Bastin
- GIGA Research, CRC Human Imaging, University of Liège, Liège, Belgium
- PsyNCog Research Unit, Faculty of Psychology, University of Liège, Liège, Belgium
2. Verosky NJ, Morgan E. Temporal dependencies in event onsets and event content contain redundant information about musical meter. Cognition 2025; 263:106179. PMID: 40414145. DOI: 10.1016/j.cognition.2025.106179.
Abstract
Musical stimuli present listeners with complex temporal information and rich periodic structure. Periodic patterns in music typically involve multiple hierarchical levels: a basic-level repeating pulse known as the "beat," and a higher-order grouping of beats into the "meter." Previous work has found that a musical stimulus's meter is predicted by recurring temporal patterns of note event onsets, measured by profiles of autocorrelation over time lags. Traditionally, that work has emphasized periodic structure in the timing of event onsets (i.e., repeating rhythms). Here, we suggest that musical meter is in fact a more general perceptual phenomenon, instantiating complex profiles of temporal dependencies across both event onsets and multiple feature dimensions in the actual content of events. We use classification techniques to test whether profiles of temporal dependencies in event onsets and in multiple types of event content predict musical meter. Applying random forest models to three musical corpora, we reproduce findings that profiles of temporal dependencies in note event onsets contain information about meter, but we find that profiles of temporal dependencies in pitch height, interval size, and tonal expectancy also contain such information, with high redundancy among temporal dependencies in event onsets and event content as predictors of meter. Moreover, information about meter is distributed across temporal dependencies at multiple time lags, as indicated by the baseline performance of an unsupervised classifier that selects the single time lag with maximum autocorrelation. Redundant profiles of temporal dependencies across multiple stimulus features may provide strong constraints on musical structure that inform listeners' predictive processes.
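A minimal sketch of this style of analysis, under assumed feature construction (the corpus, feature encoding, and model settings below are placeholders): compute an autocorrelation profile over time lags for each piece's per-beat feature series, then train a random forest to predict meter from those profiles.
```python
# Sketch of the general analysis (details assumed): autocorrelation
# profiles over time lags as features for a random-forest meter classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def autocorr_profile(series, max_lag=16):
    """Autocorrelation of a per-beat feature series at lags 1..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x) or 1.0
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

# Hypothetical corpus: each piece is a per-beat feature series (e.g., onset
# counts or mean pitch height per beat), labeled duple (0) vs. triple (1).
rng = np.random.default_rng(0)
pieces = [rng.poisson(1.0, 64) + (np.arange(64) % (2 if y == 0 else 3) == 0)
          for y in (labels := rng.integers(0, 2, 200))]
X = np.stack([autocorr_profile(p) for p in pieces])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, labels, cv=5).mean())  # classification accuracy
```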
Affiliation(s)
- Niels J Verosky
- Department of Psychology, Yale University, 100 College St., New Haven, CT 06510, United States.
- Emily Morgan
- Department of Linguistics, University of California, Davis, United States
3. Huang S, Howard CM, Bogdan PC, Morales-Torres R, Slayton M, Cabeza R, Davis SW. Trial-level Representational Similarity Analysis. bioRxiv [Preprint] 2025. PMID: 40236023. PMCID: PMC11996353. DOI: 10.1101/2025.03.27.645646.
Abstract
Neural representation refers to the brain activity that stands in for one's cognitive experience, and in cognitive neuroscience the principal method for studying neural representations is representational similarity analysis (RSA). The classic RSA (cRSA) approach examines the overall quality of representations across numerous items by assessing the correspondence between two representational similarity matrices (RSMs): one based on a theoretical model of stimulus similarity and the other based on similarity in measured neural data. However, because cRSA cannot model representation at the level of individual trials, it is fundamentally limited in its ability to assess subject-, stimulus-, and trial-level variances that all influence representation. Here, we formally introduce trial-level RSA (tRSA), an analytical framework that estimates the strength of neural representation for individual experimental trials and evaluates hypotheses using multi-level models. First, we verified the correspondence between tRSA and cRSA in quantifying overall representation strength across all trials. Second, we compared the statistical inferences drawn from both approaches using simulated data reflecting a wide range of scenarios. Compared to cRSA, the multi-level framework of tRSA was both more theoretically appropriate and more sensitive to true effects. Third, using real fMRI datasets, we further demonstrated several issues with cRSA to which tRSA was more robust. Finally, we presented novel findings about neural representations that could be assessed only with tRSA, not cRSA. In summary, tRSA proves to be a robust and versatile analytical approach for cognitive neuroscience and beyond.
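A rough sketch of the trial-level idea (the estimator details here are assumptions; see the preprint for the authors' actual formulation): score each trial by correlating its row of the neural RSM with the corresponding row of the model RSM, then analyze the trial-level scores with a multi-level model.
```python
# Rough sketch of the trial-level idea (estimator details assumed): each
# trial's representation strength = Spearman correlation between that trial's
# row of the neural RSM and the same row of the model RSM, off-diagonal only.
import numpy as np
from scipy.stats import spearmanr

def trial_level_strength(neural_rsm, model_rsm):
    """Per-trial representation strength from two (n_trials x n_trials) RSMs."""
    n = neural_rsm.shape[0]
    strengths = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i  # drop self-similarity on the diagonal
        strengths[i] = spearmanr(neural_rsm[i, mask], model_rsm[i, mask]).statistic
    return strengths

# The per-trial scores can then enter a mixed-effects model, e.g. with
# statsmodels' mixedlm, using subject and stimulus as grouping factors.
```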
4. Cortinovis D, Peelen MV, Bracci S. Tool Representations in Human Visual Cortex. J Cogn Neurosci 2025; 37:515-531. PMID: 39620956. DOI: 10.1162/jocn_a_02281.
Abstract
Tools such as pens, forks, and scissors play an important role in many daily-life activities, an importance underscored by the presence in visual cortex of a set of tool-selective brain regions. This review synthesizes decades of neuroimaging research on the representational spaces in the visual ventral stream for objects, such as tools, that are specifically characterized by action-related properties. Overall, results reveal a dissociation between representational spaces in ventral and lateral occipito-temporal cortex (OTC). Lateral OTC encodes both visual (shape) and action-related properties of objects, distinguishing objects that act as end-effectors (e.g., tools, hands) from similar manipulable objects that do not (e.g., a glass), whereas ventral OTC primarily represents objects' visual features, such as their surface properties (e.g., material and texture). These areas act in concert with regions outside of OTC to support object interaction and tool use. Parallel investigation of the dimensions underlying object representations in artificial neural networks shows that, although such networks offer promise as models of visual cortex computations, they still struggle to capture the action-related dimensions that distinguish tools from other objects and go beyond mere visual features. Taken together, we propose that regions in OTC represent tools according to a behaviorally relevant action code, and we suggest future paths toward a computational model of this object space.
5. Duyck S, Costantino AI, Bracci S, Op de Beeck H. A computational deep learning investigation of animacy perception in the human brain. Commun Biol 2024; 7:1718. PMID: 39741161. DOI: 10.1038/s42003-024-07415-8.
Abstract
The functional organization of the human object vision pathway distinguishes between animate and inanimate objects. To understand animacy perception, we explore the case of zoomorphic objects resembling animals. While humans readily perceive these objects as animal-like, this "Animal bias" reveals a striking discrepancy between the human brain and deep neural networks (DNNs). We computationally investigated the potential origins of this bias. We successfully induced it in DNNs trained explicitly with zoomorphic objects, whereas alternative training schedules failed to produce an Animal bias. The alternatives we considered included the superordinate distinction between animate and inanimate classes, sensitivity to faces and bodies, a bias for shape over texture, ecologically valid categories, recurrent connections, and language-informed visual processing. These findings provide computational support that the Animal bias for zoomorphic objects is a unique property of human perception, yet one that can be explained by human learning history.
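One way such a bias could be quantified (a sketch under assumed details; the coarse animal/non-animal split of ImageNet classes below is an approximation, and the paper's protocol may differ): the fraction of zoomorphic-object images that an ImageNet-pretrained classifier assigns to animal classes.
```python
# Hedged sketch: quantify an "Animal bias" in a DNN as the fraction of
# zoomorphic-object images whose top-1 ImageNet class is an animal.
import torch
import torchvision.models as models

# ImageNet-1k classes 0-397 are (roughly) animals; this coarse split is an
# assumption made for illustration.
ANIMAL_CLASSES = set(range(398))

def animal_bias(model, images):
    """Proportion of images classified as animals.

    `images`: (N, 3, 224, 224) tensor of preprocessed zoomorphic objects.
    """
    model.eval()
    with torch.no_grad():
        top1 = model(images).argmax(dim=1)
    return sum(int(c) in ANIMAL_CLASSES for c in top1) / len(top1)

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
```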
Affiliation(s)
- Stefanie Duyck
- Brain and Cognition, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
- Andrea I Costantino
- Brain and Cognition, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium.
- Stefania Bracci
- Center for Mind/Brain Sciences (CIMeC), University of Trento, Trento, Italy
- Hans Op de Beeck
- Brain and Cognition, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
6. Xiong S, Tan Y, Wang G, Yan P, Xiang X. Learning feature relationships in CNN model via relational embedding convolution layer. Neural Netw 2024; 179:106510. PMID: 39024707. DOI: 10.1016/j.neunet.2024.106510.
Abstract
Establishing relationships among the hierarchical visual attributes of objects is crucial for human cognition. Classic convolutional neural networks (CNNs) successfully extract hierarchical features but ignore the relationships among them, leaving them short of human performance in areas such as interpretability and domain generalization. Recent algorithms have introduced feature relationships via external prior knowledge and special auxiliary modules, which has been shown to bring improvements in many computer vision tasks. However, prior knowledge is often difficult to obtain, and auxiliary modules add computing and storage costs, limiting the flexibility and practicality of such algorithms. In this paper, we aim to drive a CNN model to learn the relationships among hierarchical deep features without prior knowledge or added resource consumption, while enhancing fundamental performance in several respects. First, we define the task of learning relationships among hierarchical features in a CNN and identify three key problems: the quantitative metric of connection intensity, the threshold for discarding useless connections, and the update strategy for the relation graph. Second, we propose the Relational Embedding Convolution (RE-Conv) layer for representing feature relationships within a convolution layer, together with a "use & disuse" strategy that addresses these three problems. Finally, numerous experiments demonstrate the improvements brought by the proposed feature relation learning scheme, spanning interpretability, domain generalization, noise robustness, and inference efficiency. In particular, the scheme outperforms many state-of-the-art methods in the domain generalization community and can be seamlessly integrated with existing methods for further gains, while maintaining precision comparable to the original CNN model and reducing floating-point operations (FLOPs) by approximately 50%.
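The RE-Conv layer itself is the paper's contribution; the toy layer below is only a speculative illustration of the general idea (a learnable relation graph over feature channels inside a convolution layer, with weak connections pruned by a threshold), not the authors' design.
```python
# Speculative toy illustration (NOT the paper's RE-Conv design): learn a
# relation graph over output channels inside a conv layer and zero out
# connections whose learned intensity falls below a threshold.
import torch
import torch.nn as nn

class ToyRelationalConv(nn.Module):
    def __init__(self, in_ch, out_ch, thresh=0.05):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # Learnable channel-to-channel relation graph (connection intensities).
        self.relation = nn.Parameter(torch.eye(out_ch))
        self.thresh = thresh

    def forward(self, x):
        feats = self.conv(x)  # (B, C, H, W)
        graph = torch.where(self.relation.abs() >= self.thresh,
                            self.relation,
                            torch.zeros_like(self.relation))  # prune weak links
        # Mix each channel with its related channels via the pruned graph.
        return torch.einsum("cd,bdhw->bchw", graph, feats)

x = torch.randn(2, 16, 32, 32)
layer = ToyRelationalConv(16, 32)
print(layer(x).shape)  # torch.Size([2, 32, 32, 32])
```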
Affiliation(s)
- Shengzhou Xiong
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China; National Key Laboratory of Multispectral Information Intelligent Processing Technology, Wuhan, 430074, China.
- Yihua Tan
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China; National Key Laboratory of Multispectral Information Intelligent Processing Technology, Wuhan, 430074, China.
- Guoyou Wang
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China; National Key Laboratory of Multispectral Information Intelligent Processing Technology, Wuhan, 430074, China.
- Pei Yan
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China; National Key Laboratory of Multispectral Information Intelligent Processing Technology, Wuhan, 430074, China.
- Xuanyu Xiang
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China; National Key Laboratory of Multispectral Information Intelligent Processing Technology, Wuhan, 430074, China.
7. Kallmayer A, Võ MLH. Anchor objects drive realism while diagnostic objects drive categorization in GAN generated scenes. Commun Psychol 2024; 2:68. PMID: 39242968. PMCID: PMC11332195. DOI: 10.1038/s44271-024-00119-z.
Abstract
Our visual surroundings are highly complex. Despite this, we understand and navigate them effortlessly. This requires transforming incoming sensory information into representations that not only span low- to high-level visual features (e.g., edges, object parts, objects), but likely also reflect co-occurrence statistics of objects in real-world scenes. Here, so-called anchor objects, derived from object clustering statistics in real-world scenes, are defined as being highly predictive of the location and identity of frequently co-occurring (usually smaller) objects, while so-called diagnostic objects are predictive of the larger semantic context (i.e., scene category). Across two studies (N1 = 50, N2 = 44), we investigate which of these properties underlie scene understanding across two dimensions - realism and categorisation - using scenes generated from Generative Adversarial Networks (GANs), which naturally vary along these dimensions. We show that anchor objects, and mainly high-level features extracted from a range of pre-trained deep neural networks (DNNs), drove realism both at first glance and after initial processing. Categorisation performance was mainly determined by diagnostic objects, regardless of realism, at first glance and after initial processing. Our results are a testament to the visual system's ability to pick up on reliable, category-specific sources of information that are robust to disturbances across the visual feature hierarchy.
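A minimal sketch of this kind of analysis (the feature extractor, layers, and regression model are assumptions, not the authors' pipeline): regress human realism ratings on DNN activations from different depths and compare cross-validated fit across layers.
```python
# Minimal sketch (network, layers, and regression model assumed): predict
# realism ratings of GAN scenes from DNN activations at different depths.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_scenes = 200
# Hypothetical precomputed activations: {layer_name: (n_scenes, n_units)}.
layer_feats = {"early": rng.normal(size=(n_scenes, 512)),
               "late": rng.normal(size=(n_scenes, 512))}
realism = rng.normal(size=n_scenes)  # placeholder human ratings

for layer, X in layer_feats.items():
    r2 = cross_val_score(RidgeCV(alphas=np.logspace(-2, 4, 13)), X, realism,
                         cv=5, scoring="r2").mean()
    print(f"{layer}: cross-validated R^2 = {r2:.3f}")
```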
Affiliation(s)
- Aylin Kallmayer
- Goethe University Frankfurt, Department of Psychology, Frankfurt am Main, Germany.
- Melissa L-H Võ
- Goethe University Frankfurt, Department of Psychology, Frankfurt am Main, Germany
8. Morales-Torres R, Wing EA, Deng L, Davis SW, Cabeza R. Visual Recognition Memory of Scenes Is Driven by Categorical, Not Sensory, Visual Representations. J Neurosci 2024; 44:e1479232024. PMID: 38569925. PMCID: PMC11112637. DOI: 10.1523/jneurosci.1479-23.2024.
Abstract
When we perceive a scene, our brain simultaneously processes various types of visual information, ranging from sensory features, such as line orientations and colors, to categorical features, such as objects and their arrangements. Whereas the role of sensory and categorical visual representations in predicting subsequent memory has been studied using isolated objects, their impact on memory for complex scenes remains largely unknown. To address this gap, we conducted an fMRI study in which female and male participants encoded pictures of familiar scenes (e.g., an airport picture) and later recalled them, while rating the vividness of their visual recall. Outside the scanner, participants had to distinguish each seen scene from three similar lures (e.g., three airport pictures). We modeled the sensory and categorical visual features of multiple scenes using early and late layers of a deep convolutional neural network, respectively. We then applied representational similarity analysis to determine which brain regions represented stimuli in accordance with the sensory and categorical models. We found that categorical, but not sensory, representations predicted subsequent memory. Consistent with this result, only under the categorical model did each scene's average recognition performance correlate positively with its average visual dissimilarity from its lures. These results strongly suggest that even in memory tests that ostensibly rely solely on visual cues (such as forced-choice visual recognition with similar distractors), memory decisions for scenes may be driven primarily by categorical rather than sensory representations.
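A sketch of the model-comparison step (layer choices and the comparison statistic are assumptions): build a "sensory" RDM from an early CNN layer and a "categorical" RDM from a late layer, then ask which better matches a region's neural RDM.
```python
# Sketch of the model-comparison step (layer choice and statistic assumed):
# compare early-layer ("sensory") and late-layer ("categorical") CNN RDMs
# against a region's neural RDM.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(features):
    """Condensed representational dissimilarity matrix (1 - Pearson r)."""
    return pdist(features, metric="correlation")

rng = np.random.default_rng(2)
n_scenes = 60
early_feats = rng.normal(size=(n_scenes, 4096))  # early CNN layer activations
late_feats = rng.normal(size=(n_scenes, 4096))   # late CNN layer activations
neural = rng.normal(size=(n_scenes, 300))        # ROI voxel patterns

neural_rdm = rdm(neural)
for name, feats in [("sensory (early layer)", early_feats),
                    ("categorical (late layer)", late_feats)]:
    rho = spearmanr(rdm(feats), neural_rdm).statistic
    print(f"{name}: Spearman rho = {rho:.3f}")
```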
Affiliation(s)
| | - Erik A Wing
- Rotman Research Institute, Baycrest Health Sciences, Toronto, Ontario M6A 2E1, Canada
- Lifu Deng
- Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
- Simon W Davis
- Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
- Department of Neurology, Duke University School of Medicine, Durham, North Carolina 27708
- Roberto Cabeza
- Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
9. Lu Z, Golomb JD. Human EEG and artificial neural networks reveal disentangled representations of object real-world size in natural images. bioRxiv [Preprint] 2024. PMID: 37662197. PMCID: PMC10473678. DOI: 10.1101/2023.08.19.553999.
Abstract
Remarkably, human brains accurately perceive and process the real-world size of objects despite vast differences in viewing distance and perspective. While previous studies have examined this phenomenon, distinguishing this ability from other visual percepts, such as depth, has been challenging. Using the THINGS EEG2 dataset, with its high time-resolution human brain recordings and ecologically valid naturalistic stimuli, we disentangle neural representations of object real-world size from retinal size and perceived real-world depth in a way that was not previously possible. Our EEG representational similarity results reveal a pure representation of object real-world size in human brains, and we report a representational timeline of visual object processing: object real-world depth appeared first, then retinal size, and finally real-world size. Additionally, we input both these naturalistic images and object-only images without natural background into artificial neural networks. Consistent with the human EEG findings, we successfully disentangled representations of object real-world size from retinal size and real-world depth in all three types of networks tested (visual-only ResNet, visual-language CLIP, and language-only Word2Vec). Moreover, our multi-modal representational comparison framework across human EEG and artificial neural networks reveals real-world size as a stable, higher-level dimension in object space that incorporates both visual and semantic information. Our research provides a detailed characterization of object processing and offers further insights into object space and the construction of more brain-like visual models.
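A sketch of the disentangling logic (the specific statistic is an assumption, not necessarily the authors' exact analysis): at each EEG timepoint, compute the partial correlation between the neural RDM and the real-world-size model RDM while controlling for retinal-size and perceived-depth RDMs.
```python
# Sketch of the disentangling logic (statistic assumed): per-timepoint
# partial Spearman correlation between the EEG RDM and the real-world-size
# RDM, controlling for retinal-size and perceived-depth RDMs.
import numpy as np
from scipy.stats import rankdata, pearsonr

def partial_corr(x, y, covars):
    """Partial correlation of x and y, controlling for the arrays in covars."""
    Z = np.column_stack([np.ones(len(x))] + list(covars))
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # residualize x
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]  # residualize y
    return pearsonr(rx, ry).statistic

def size_timecourse(eeg_rdms, size_rdm, retinal_rdm, depth_rdm):
    """eeg_rdms: (n_times, n_pairs) condensed RDMs; model RDMs: (n_pairs,)."""
    ranked = lambda v: rankdata(v).astype(float)  # rank for a Spearman variant
    covs = [ranked(retinal_rdm), ranked(depth_rdm)]
    return np.array([partial_corr(ranked(t), ranked(size_rdm), covs)
                     for t in eeg_rdms])
```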
10. Li AY, Mur M. Neural networks need real-world behavior. Behav Brain Sci 2023; 46:e398. PMID: 38054287. DOI: 10.1017/s0140525x23001504.
Abstract
Bowers et al. propose to use controlled behavioral experiments when evaluating deep neural networks as models of biological vision. We agree with the sentiment and draw parallels to the notion that "neuroscience needs behavior." As a promising path forward, we suggest complementing image recognition tasks with increasingly realistic and well-controlled task environments that engage real-world object recognition behavior.
Affiliation(s)
- Aedan Y Li
- Department of Psychology, Western University, London, ON, Canada. www.aedanyueli.com
- Marieke Mur
- Department of Psychology, Western University, London, ON, Canada
- Department of Computer Science, Western University, London, ON, Canada
11. von Seth J, Nicholls VI, Tyler LK, Clarke A. Recurrent connectivity supports higher-level visual and semantic object representations in the brain. Commun Biol 2023; 6:1207. PMID: 38012301. PMCID: PMC10682037. DOI: 10.1038/s42003-023-05565-9.
Abstract
Visual object recognition has traditionally been conceptualised as a predominantly feedforward process through the ventral visual pathway. While feedforward artificial neural networks (ANNs) can achieve human-level classification on some image-labelling tasks, it is unclear whether such computational models of vision alone can accurately capture the evolving spatiotemporal neural dynamics. Here, we probe these dynamics using a combination of representational similarity and connectivity analyses of fMRI and MEG data recorded during the recognition of familiar, unambiguous objects. Modelling the visual and semantic properties of our stimuli using an artificial neural network and a semantic feature model, we find that distinct aspects of the neural architecture and connectivity dynamics relate to visual and semantic object properties. Critically, we show that recurrent processing between anterior and posterior ventral temporal cortex relates to higher-level visual properties before semantic object properties, in addition to semantic-related feedback from the frontal lobe to the ventral temporal lobe between 250 and 500 ms after stimulus onset. These results demonstrate the distinct contribution of semantic object properties in explaining neural activity and connectivity, highlighting them as a core part of object recognition not fully accounted for by current biologically inspired neural networks.
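A schematic of one way such timing claims can be probed (the connectivity measure and lag handling are assumptions, not the authors' exact analysis): compute a per-timepoint model fit in each region, then use lagged correlations between the two fit time courses to ask which region leads.
```python
# Schematic of one way to probe feedforward/feedback timing (measure and lag
# handling assumed): per-timepoint model fits in two regions, then a lagged
# correlation to ask which region's fit time course leads the other's.
import numpy as np
from scipy.stats import spearmanr

def model_fit_timecourse(region_rdms, model_rdm):
    """Spearman fit of a model RDM at each timepoint; region_rdms: (T, n_pairs)."""
    return np.array([spearmanr(t, model_rdm).statistic for t in region_rdms])

def lagged_corr(a, b, max_lag=10):
    """Correlation of a (candidate leader) with b shifted by each lag (samples)."""
    return {lag: np.corrcoef(a[:len(a) - lag], b[lag:])[0, 1]
            for lag in range(max_lag + 1)}

# Stronger correlations at positive lags where the posterior region's fits
# lead the anterior region's would suggest feedforward flow; swapping the
# arguments tests the feedback direction.
```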
Affiliation(s)
- Jacqueline von Seth
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Lorraine K Tyler
- Department of Psychology, University of Cambridge, Cambridge, UK
- Cambridge Centre for Ageing and Neuroscience (Cam-CAN), University of Cambridge and MRC Cognition and Brain Sciences Unit, Cambridge, UK
- Alex Clarke
- Department of Psychology, University of Cambridge, Cambridge, UK.