1. Hansen BC, Greene MR, Lewinsohn HAS, Kris AE, Smyth S, Tang B. Brain-guided convolutional neural networks reveal task-specific representations in scene processing. Sci Rep 2025; 15:13025. PMID: 40234494; PMCID: PMC12000445; DOI: 10.1038/s41598-025-96307-w.
Abstract
Scene categorization is the dominant proxy for visual understanding, yet humans can perform a large number of visual tasks within any scene. Consequently, we know little about how different tasks change how a scene is processed and represented, and how its features are ultimately used. Here, we developed a novel brain-guided convolutional neural network (CNN) in which each convolutional layer was separately guided by neural responses taken at different time points while observers performed a pre-cued object detection task or a scene affordance task on the same set of images. We then reconstructed each layer's activation maps via deconvolution to spatially assess how different features were used within each task. The brain-guided CNN made use of image features that human observers identified as crucial for completing each task, an effect that emerged around 244 ms and persisted to 402 ms. Critically, because the same images were used across the two tasks, the CNN could only succeed if the neural data captured task-relevant differences. Our analyses of the activation maps across layers revealed that the brain's spatiotemporal representation of local image features evolves systematically over time. This underscores how distinct image features emerge at different stages of processing, shaped by the observer's goals and behavioral context.
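For readers who want a concrete picture of the guidance idea, the sketch below shows one plausible way to couple a convolutional layer to neural data: an auxiliary loss that pulls the layer's representational dissimilarity matrix (RDM) toward an EEG-derived RDM from one time window. This is a minimal illustration under assumed shapes and helper names (rdm, alignment_loss, the toy data), not the authors' implementation.

```python
# Hedged sketch: one way to "guide" a CNN layer with neural data, penalizing
# mismatch between the layer's RDM and an EEG RDM from one time window.
# All shapes and names are illustrative assumptions.
import torch

def rdm(acts):
    # acts: (n_images, n_features) -> (n_images, n_images) correlation distance
    z = (acts - acts.mean(1, keepdim=True)) / (acts.std(1, keepdim=True) + 1e-8)
    return 1.0 - (z @ z.T) / z.shape[1]

def alignment_loss(layer_acts, neural_rdm):
    model_rdm = rdm(layer_acts.flatten(1))
    n = model_rdm.shape[0]
    iu = torch.triu_indices(n, n, offset=1)          # upper-triangle entries only
    a, b = model_rdm[iu[0], iu[1]], neural_rdm[iu[0], iu[1]]
    return ((a - a.mean()) - (b - b.mean())).pow(2).mean()  # simple mismatch term

# Toy usage: 12 images, fake conv activations, fake EEG RDM at one time point.
acts = torch.randn(12, 64, 8, 8, requires_grad=True)
eeg_rdm = rdm(torch.randn(12, 128))
loss = alignment_loss(acts, eeg_rdm)
loss.backward()  # gradients could be mixed into the task loss during training
```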
Affiliation(s)
- Bruce C Hansen: Department of Psychological & Brain Sciences, Neuroscience Program, Colgate University, Hamilton, NY, USA
- Michelle R Greene: Barnard College, Department of Psychology, Columbia University, New York, NY, USA
- Henry A S Lewinsohn: Department of Psychological & Brain Sciences, Neuroscience Program, Colgate University, Hamilton, NY, USA
- Audrey E Kris: Department of Psychological & Brain Sciences, Neuroscience Program, Colgate University, Hamilton, NY, USA
- Sophie Smyth: Department of Psychological & Brain Sciences, Neuroscience Program, Colgate University, Hamilton, NY, USA
- Binghui Tang: Department of Psychological & Brain Sciences, Neuroscience Program, Colgate University, Hamilton, NY, USA
2. Yao L, Fu Q, Liu CH, Wang J, Yi Z. Distinguishing the roles of edge, color, and other surface information in basic and superordinate scene representation. Neuroimage 2025; 310:121100. PMID: 40021071; DOI: 10.1016/j.neuroimage.2025.121100.
Abstract
The human brain possesses a remarkable ability to recognize scenes depicted in line drawings, even though these drawings contain only edge information. It remains unclear how the brain uses this information alongside surface information in scene recognition. Here, we combined electroencephalogram (EEG) and multivariate pattern analysis (MVPA) methods to distinguish the roles of edge, color, and other surface information in scene representation at the basic category level and the superordinate naturalness level over time. The time-resolved decoding results indicated that edge information in line drawings is both sufficient and more effective than in color photographs and grayscale images at the superordinate naturalness level. Meanwhile, color and other surface information are exclusively involved in neural representation at the basic category level. The time generalization analysis further revealed that edge information is crucial for representation at both levels of abstraction. These findings highlight the distinct roles of edge, color, and other surface information in dynamic neural scene processing, shedding light on how the human brain represents scene information at different levels of abstraction.
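The time-resolved decoding the abstract refers to is commonly implemented by cross-validating a separate linear classifier at every time point, yielding a decoding-accuracy time course. A minimal sketch with simulated data (all shapes and labels are illustrative assumptions, not this study's pipeline):

```python
# Hedged sketch of time-resolved EEG decoding: one cross-validated linear
# classifier per time point. Data shapes are illustrative assumptions.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64, 120))   # trials x EEG channels x time points
y = rng.integers(0, 2, 200)               # e.g., natural vs. man-made scenes

acc = np.array([
    cross_val_score(LinearDiscriminantAnalysis(), X[:, :, t], y, cv=5).mean()
    for t in range(X.shape[2])            # one decoder per time point
])
print("peak accuracy %.2f at time point %d" % (acc.max(), acc.argmax()))
```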
Affiliation(s)
- Liansheng Yao: State Key Laboratory of Cognitive Science and Mental Health, Institute of Psychology, Chinese Academy of Sciences, Beijing, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Qiufang Fu: State Key Laboratory of Cognitive Science and Mental Health, Institute of Psychology, Chinese Academy of Sciences, Beijing, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Chang Hong Liu: Department of Psychology, Bournemouth University, Fern Barrow, Poole, UK
- Jianyong Wang: Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, China
- Zhang Yi: Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, China
3. Schmidt F, Hebart MN, Schmid AC, Fleming RW. Core dimensions of human material perception. Proc Natl Acad Sci U S A 2025; 122:e2417202122. PMID: 40042912; PMCID: PMC11912425; DOI: 10.1073/pnas.2417202122.
Abstract
Visually categorizing and comparing materials is crucial for everyday behavior, but what organizational principles underlie our mental representation of materials? Here, we used a large-scale data-driven approach to uncover core latent dimensions of material representations from behavior. First, we created an image dataset of 200 systematically sampled materials and 600 photographs (STUFF dataset, https://osf.io/myutc/). Using these images, we next collected 1.87 million triplet similarity judgments and used a computational model to derive a set of sparse, positive dimensions underlying these judgments. The resulting multidimensional embedding space predicted independent material similarity judgments and the similarity matrix of all images at a level close to the human intersubject consistency. We found that representations of individual images were captured by a combination of 36 material dimensions that were highly reproducible and interpretable, comprising perceptual (e.g., grainy, blue) as well as conceptual (e.g., mineral, viscous) dimensions. These results provide the foundation for a comprehensive understanding of how humans make sense of materials.
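The modeling step, deriving sparse, positive dimensions from triplet judgments, resembles SPoSE-style embedding models. A minimal sketch under assumed sizes, loss form, and hyperparameters (not the authors' code):

```python
# Hedged sketch: learn sparse, non-negative embedding dimensions from
# odd-one-out triplet judgments. Sizes and the optimizer are assumptions.
import torch

n_items, n_dims = 200, 36
W = torch.nn.Parameter(torch.rand(n_items, n_dims))   # item embeddings
opt = torch.optim.Adam([W], lr=0.01)

def triplet_loss(i, j, k, l1=0.01):
    E = torch.relu(W)                                  # enforce positivity
    sij = (E[i] * E[j]).sum(-1)                        # pairwise similarities
    sik = (E[i] * E[k]).sum(-1)
    sjk = (E[j] * E[k]).sum(-1)
    # probability that (i, j) is judged most similar (k is the odd one out)
    p = torch.softmax(torch.stack([sij, sik, sjk], -1), -1)[..., 0]
    return -torch.log(p + 1e-9).mean() + l1 * E.mean() # sparsity penalty

i, j, k = (torch.randint(0, n_items, (512,)) for _ in range(3))  # fake triplets
for _ in range(100):
    opt.zero_grad()
    loss = triplet_loss(i, j, k)
    loss.backward()
    opt.step()
```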
Affiliation(s)
- Filipp Schmidt: Experimental Psychology, Justus Liebig University, Giessen 35394, Germany; Center for Mind, Brain and Behavior, Universities of Marburg, Giessen, and Darmstadt, Marburg 35032, Germany
- Martin N. Hebart: Center for Mind, Brain and Behavior, Universities of Marburg, Giessen, and Darmstadt, Marburg 35032, Germany; Department of Medicine, Justus Liebig University, Giessen 35390, Germany; Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Alexandra C. Schmid: Experimental Psychology, Justus Liebig University, Giessen 35394, Germany; Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, MD 20814
- Roland W. Fleming: Experimental Psychology, Justus Liebig University, Giessen 35394, Germany; Center for Mind, Brain and Behavior, Universities of Marburg, Giessen, and Darmstadt, Marburg 35032, Germany
4. Wang G, Chen L, Cichy RM, Kaiser D. Enhanced and idiosyncratic neural representations of personally typical scenes. Proc Biol Sci 2025; 292:20250272. PMID: 40132631; PMCID: PMC11936675; DOI: 10.1098/rspb.2025.0272.
Abstract
Previous research shows that the typicality of visual scenes (i.e. whether they are good examples of a category) determines how easily they can be perceived and represented in the brain. However, the unique visual diets individuals are exposed to across their lifetimes should sculpt very personal notions of typicality. Here, we thus investigated whether scenes that are more typical to individual observers are more accurately perceived and represented in the brain. We used drawings to enable participants to describe typical scenes (e.g. a kitchen) and converted these drawings into three-dimensional renders. These renders were used as stimuli in a scene categorization task, during which we recorded electroencephalography (EEG). In line with previous findings, categorization was most accurate for renders resembling the typical scene drawings of individual participants. Our EEG analyses reveal two critical insights into how these individual differences emerge at the neural level. First, personally typical scenes yielded enhanced neural representations from around 200 ms after onset. Second, personally typical scenes were represented in idiosyncratic ways, with reduced dependence on high-level visual features. We interpret these findings in a predictive processing framework, where individual differences in internal models of scene categories formed through experience shape visual analysis in idiosyncratic ways.
Affiliation(s)
- Gongting Wang: Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany; Department of Mathematics and Computer Science, Physics, Geography, Justus-Liebig-Universität Gießen, Gießen, Germany
- Lixiang Chen: Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany; Department of Mathematics and Computer Science, Physics, Geography, Justus-Liebig-Universität Gießen, Gießen, Germany
- Daniel Kaiser: Department of Mathematics and Computer Science, Physics, Geography, Justus-Liebig-Universität Gießen, Gießen, Germany; Center for Mind, Brain and Behavior (CMBB), Justus-Liebig-Universität Gießen, Philipps-Universität Marburg and Technische Universität Darmstadt, Marburg, Germany
5. Cortinovis D, Peelen MV, Bracci S. Tool Representations in Human Visual Cortex. J Cogn Neurosci 2025; 37:515-531. PMID: 39620956; DOI: 10.1162/jocn_a_02281.
Abstract
Tools such as pens, forks, and scissors play an important role in many daily-life activities, an importance underscored by the presence in visual cortex of a set of tool-selective brain regions. This review synthesizes decades of neuroimaging research that investigated the representational spaces in the visual ventral stream for objects, such as tools, that are specifically characterized by action-related properties. Overall, results reveal a dissociation between representational spaces in ventral and lateral occipito-temporal cortex (OTC). While lateral OTC encodes both visual (shape) and action-related properties of objects, distinguishing between objects acting as end-effectors (e.g., tools, hands) versus similar noneffector manipulable objects (e.g., a glass), ventral OTC primarily represents objects' visual features such as their surface properties (e.g., material and texture). These areas act in concert with regions outside of OTC to support object interaction and tool use. The parallel investigation of the dimensions underlying object representations in artificial neural networks reveals both the possibilities and the difficulties in capturing the action-related dimensions that distinguish tools from other objects. Although artificial neural networks offer promise as models of visual cortex computations, challenges persist in replicating the action-related dimensions that go beyond mere visual features. Taken together, we propose that regions in OTC support the representation of tools based on a behaviorally relevant action code and suggest future paths to generate a computational model of this object space.
6. Mononen R, Saarela T, Vallinoja J, Olkkonen M, Henriksson L. Cortical Encoding of Spatial Structure and Semantic Content in 3D Natural Scenes. J Neurosci 2025; 45:e2157232024. PMID: 39788741; PMCID: PMC11866997; DOI: 10.1523/jneurosci.2157-23.2024.
Abstract
Our visual system enables us to effortlessly navigate and recognize real-world visual environments. Functional magnetic resonance imaging (fMRI) studies suggest a network of scene-responsive cortical visual areas, but much less is known about the temporal order in which different scene properties are analyzed by the human visual system. In this study, we selected a set of 36 full-color natural scenes that varied in spatial structure and semantic content, which our male and female human participants viewed both in 2D and 3D while we recorded magnetoencephalography (MEG) data. MEG enables tracking of cortical activity in humans at a millisecond timescale. We compared the representational geometry in the MEG responses with predictions based on the scene stimuli using the representational similarity analysis framework. The representational structure first reflected the spatial structure in the scenes in the 90-125 ms time window, followed by the semantic content in the 140-175 ms time window after stimulus onset. The 3D stereoscopic viewing of the scenes affected the responses relatively late, from ∼140 ms after stimulus onset. Taken together, our results indicate that the human visual system rapidly encodes a scene's spatial structure and suggest that this information is based on monocular rather than binocular depth cues.
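The representational similarity analysis (RSA) logic described here can be summarized compactly: at each time point, the MEG RDM is compared (e.g., by Spearman correlation) with candidate model RDMs for spatial structure and semantic content. A simulated sketch (shapes and model RDMs are stand-in assumptions):

```python
# Hedged sketch of time-resolved RSA: correlate the MEG RDM at each time
# point with model RDMs. All data here are simulated stand-ins.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
meg = rng.standard_normal((36, 204, 180))            # stimuli x sensors x time
spatial_rdm = pdist(rng.standard_normal((36, 5)))    # stand-in model RDMs
semantic_rdm = pdist(rng.standard_normal((36, 5)))

for name, model in [("spatial", spatial_rdm), ("semantic", semantic_rdm)]:
    r = [spearmanr(pdist(meg[:, :, t], metric="correlation"), model)[0]
         for t in range(meg.shape[2])]               # RSA time course
    print(name, "peak r = %.3f at t = %d" % (max(r), int(np.argmax(r))))
```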
Affiliation(s)
- Riikka Mononen: Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland; MEG Core, Aalto NeuroImaging, Aalto University, Espoo FI-00076, Finland
- Toni Saarela: Department of Psychology, University of Helsinki, Helsinki FI-00014, Finland
- Jaakko Vallinoja: Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland; MEG Core, Aalto NeuroImaging, Aalto University, Espoo FI-00076, Finland
- Maria Olkkonen: Department of Psychology, University of Helsinki, Helsinki FI-00014, Finland
- Linda Henriksson: Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland; MEG Core, Aalto NeuroImaging, Aalto University, Espoo FI-00076, Finland
7. Greene MR, Rohan AM. The brain prioritizes the basic level of object category abstraction. Sci Rep 2025; 15:31. PMID: 39747114; PMCID: PMC11695711; DOI: 10.1038/s41598-024-80546-4.
Abstract
The same object can be described at multiple levels of abstraction ("parka", "coat", "clothing"), yet human observers consistently name objects at a mid-level of specificity known as the basic level. Little is known about the temporal dynamics involved in retrieving neural representations that prioritize the basic level, nor how these dynamics change with evolving task demands. In this study, observers viewed 1080 objects arranged in a three-tier category taxonomy while 64-channel EEG was recorded. Observers performed a categorical one-back task in different recording sessions on the basic or subordinate levels. We used time-resolved multiple regression to assess the utility of superordinate-, basic-, and subordinate-level categories across the scalp. We found robust use of basic-level category information starting at about 50 ms after stimulus onset and moving from posterior electrodes (149 ms) through lateral (261 ms) to anterior sites (332 ms). Task differences were not evident in the first 200 ms of processing but were observed between 200-300 ms after stimulus presentation. Together, this work demonstrates that object category representations prioritize the basic level and do so relatively early, congruent with results showing that basic-level categorization is an automatic and obligatory process.
Affiliation(s)
- Michelle R Greene: Bates College Program in Neuroscience, Bates College, Lewiston, ME, USA; Department of Psychology, Barnard College, Columbia University, 3009 Broadway, New York, NY 10027, USA
- Alyssa Magill Rohan: Bates College Program in Neuroscience, Bates College, Lewiston, ME, USA; Boston Children's Hospital, Boston, USA
8. Peng Y, Gong X, Lu H, Fang F. Human Visual Pathways for Action Recognition versus Deep Convolutional Neural Networks: Representation Correspondence in Late but Not Early Layers. J Cogn Neurosci 2024; 36:2458-2480. PMID: 39106158; DOI: 10.1162/jocn_a_02233.
Abstract
Deep convolutional neural networks (DCNNs) have attained human-level performance for object categorization and exhibited representation alignment between network layers and brain regions. Does such representation alignment naturally extend to other visual tasks beyond recognizing objects in static images? In this study, we expanded the exploration to the recognition of human actions from videos and assessed the representation capabilities and alignment of two-stream DCNNs in comparison with brain regions situated along the ventral and dorsal pathways. Using decoding analysis and representational similarity analysis, we show that DCNN models do not show hierarchical representation alignment to the human brain across visual regions when processing action videos. Instead, later layers of DCNN models demonstrate greater representation similarities to the human visual cortex. These findings were revealed for two display formats: photorealistic avatars with full-body information and simplified stimuli in the point-light display. The discrepancies in representation alignment suggest fundamental differences in how DCNNs and the human brain represent dynamic visual information related to actions.
Affiliation(s)
- Yujia Peng: School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China; Institute for Artificial Intelligence, Peking University, Beijing, China; National Key Laboratory of General Artificial Intelligence, Beijing Institute for General Artificial Intelligence, Beijing, China; Department of Psychology, University of California, Los Angeles
- Xizi Gong: School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Hongjing Lu: Department of Psychology, University of California, Los Angeles; Department of Statistics, University of California, Los Angeles
- Fang Fang: School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China; IDG/McGovern Institute for Brain Research, Peking University, Beijing, China; Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China; Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, China
9. Naveilhan C, Saulay-Carret M, Zory R, Ramanoël S. Spatial Contextual Information Modulates Affordance Processing and Early Electrophysiological Markers of Scene Perception. J Cogn Neurosci 2024; 36:2084-2099. PMID: 39023371; DOI: 10.1162/jocn_a_02223.
Abstract
Scene perception allows humans to extract information from their environment and plan navigation efficiently. The automatic extraction of potential paths in a scene, also referred to as navigational affordance, is supported by scene-selective regions (SSRs) that enable efficient human navigation. Recent evidence suggests that the activity of these SSRs can be influenced by information from adjacent spatial memory areas. However, it remains unexplored how this contextual information could influence the extraction of bottom-up information, such as navigational affordances, from a scene, and what the underlying neural dynamics are. Therefore, we analyzed ERPs in 26 young adults performing scene and spatial memory tasks in artificially generated rooms with varying numbers and locations of available doorways. We found that increasing the number of navigational affordances impaired performance only in the spatial memory task. ERP results showed a similar pattern of activity for both tasks, but with increased P2 amplitude in the spatial memory task compared with the scene memory task. Finally, we observed no modulation of the P2 component by the number of affordances in either task. This modulation of early markers of visual processing suggests that the dynamics of SSR activity are influenced by a priori knowledge, with increased amplitude when participants have more contextual information about the perceived scene. Overall, our results suggest that prior spatial knowledge about the scene, such as the location of a goal, modulates early cortical activity associated with SSRs, and that this information may interact with bottom-up processing of scene content, such as navigational affordances.
Affiliation(s)
- Raphaël Zory: LAMHESS, Université Côte d'Azur, Nice, France; Institut Universitaire de France (IUF)
- Stephen Ramanoël: LAMHESS, Université Côte d'Azur, Nice, France; INSERM, CNRS, Institut de la Vision, Sorbonne Université, Paris, France
10. Duecker K, Idiart M, van Gerven M, Jensen O. Oscillations in an artificial neural network convert competing inputs into a temporal code. PLoS Comput Biol 2024; 20:e1012429. PMID: 39259769; PMCID: PMC11419396; DOI: 10.1371/journal.pcbi.1012429.
Abstract
The field of computer vision has long drawn inspiration from neuroscientific studies of the human and non-human primate visual system. The development of convolutional neural networks (CNNs), for example, was informed by the properties of simple and complex cells in early visual cortex. However, the computational relevance of the oscillatory dynamics experimentally observed in the visual system is typically not considered in artificial neural networks (ANNs). Computational models of neocortical dynamics, on the other hand, rarely take inspiration from computer vision. Here, we combine methods from computational neuroscience and machine learning to implement multiplexing in a simple ANN using oscillatory dynamics. We first trained the network to classify individually presented letters. Post-training, we added temporal dynamics to the hidden layer, introducing refraction in the hidden units as well as pulsed inhibition mimicking neuronal alpha oscillations. Without these dynamics, the trained network correctly classified individual letters but produced a mixed output when presented with two letters simultaneously, indicating a bottleneck problem. When introducing refraction and oscillatory inhibition, the output nodes corresponding to the two stimuli activated sequentially, ordered along the phase of the inhibitory oscillations. Our model implements the idea that inhibitory oscillations segregate competing inputs in time. The results of our simulations pave the way for applications in deeper network architectures and more complicated machine learning problems.
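The core mechanism, pulsed inhibition plus refraction converting input strength into spike timing, can be illustrated in a few lines. This toy simulation uses assumed constants and update rules, not the paper's exact model:

```python
# Hedged sketch: units under 10 Hz "alpha-like" pulsed inhibition plus a
# refractory trace. Strongly driven units escape inhibition earlier in each
# cycle, turning input strength into a temporal (phase) code.
import numpy as np

rng = np.random.default_rng(2)
n_hidden, T, dt = 100, 1000, 1e-3
drive = rng.random(n_hidden)                 # net input, e.g. two mixed letters
refraction = np.zeros(n_hidden)
spikes = np.zeros((T, n_hidden))

for t in range(T):
    alpha = 0.5 * (1 + np.cos(2 * np.pi * 10 * t * dt))  # 10 Hz inhibition
    active = drive - alpha - refraction > 0              # who escapes inhibition
    spikes[t] = active
    refraction = refraction * 0.95 + 0.5 * active        # refractory build-up

first_spike = spikes.argmax(0)               # first spike time per unit
# Stronger drive -> earlier first spike, so a negative correlation is expected.
print(np.corrcoef(drive, first_spike)[0, 1])
```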
Affiliation(s)
- Katharina Duecker: Centre for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, United Kingdom; Department of Neuroscience, Brown University, Providence, Rhode Island, United States of America
- Marco Idiart: Institute of Physics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
- Marcel van Gerven: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
- Ole Jensen: Centre for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, United Kingdom
11. Ma AC, Cameron AD, Wiener M. Memorability shapes perceived time (and vice versa). Nat Hum Behav 2024; 8:1296-1308. PMID: 38649460; DOI: 10.1038/s41562-024-01863-2.
Abstract
Visual stimuli are known to vary in their perceived duration. Some visual stimuli are also known to linger for longer in memory. Yet, whether these two features of visual processing are linked is unknown. Despite early assumptions that time is an extracted or higher-order feature of perception, more recent work over the past two decades has demonstrated that timing may be instantiated within sensory modality circuits. A primary location for many of these studies is the visual system, where duration-sensitive responses have been demonstrated. Furthermore, visual stimulus features have been observed to shift perceived duration. These findings suggest that visual circuits mediate or construct perceived time. Here we present evidence across a series of experiments that perceived time is affected by the image properties of scene size, clutter and memorability. More specifically, we observe that scene size and memorability dilate time, whereas clutter contracts it. Furthermore, the durations of more memorable images are also perceived more precisely. Conversely, the longer the perceived duration of an image, the more memorable it is. To explain these findings, we applied a recurrent convolutional neural network model of the ventral visual system, in which images are progressively processed over time. We find that more memorable images are processed faster, and that this increase in processing speed predicts both the lengthening and the increased precision of perceived durations. These findings provide evidence for a link between image features, time perception and memory that can be further explored with models of visual processing.
Affiliation(s)
- Alex C Ma: Department of Psychology, George Mason University, Fairfax, VA, USA
- Ayana D Cameron: Department of Psychology, George Mason University, Fairfax, VA, USA
- Martin Wiener: Department of Psychology, George Mason University, Fairfax, VA, USA
12. Stecher R, Kaiser D. Representations of imaginary scenes and their properties in cortical alpha activity. Sci Rep 2024; 14:12796. PMID: 38834699; DOI: 10.1038/s41598-024-63320-4.
Abstract
Imagining natural scenes enables us to engage with a myriad of simulated environments. How do our brains generate such complex mental images? Recent research suggests that cortical alpha activity carries information about individual objects during visual imagery. However, it remains unclear whether more complex imagined contents, such as natural scenes, are similarly represented in alpha activity. Here, we answer this question by decoding the contents of imagined scenes from rhythmic cortical activity patterns. In an EEG experiment, participants imagined natural scenes based on detailed written descriptions, which conveyed four complementary scene properties: openness, naturalness, clutter level and brightness. By conducting classification analyses on EEG power patterns across neural frequencies, we were able to decode both individual imagined scenes and their properties from the alpha band, showing that the contents of complex visual images are also represented in alpha rhythms. A cross-classification analysis between alpha power patterns during the imagery task and during a perception task, in which participants were presented with images of the described scenes, showed that scene representations in the alpha band are partly shared between imagery and late stages of perception. This suggests that alpha activity mediates the top-down re-activation of scene-related visual contents during imagery.
Affiliation(s)
- Rico Stecher: Mathematical Institute, Department of Mathematics and Computer Science, Physics, Geography, Justus Liebig University Gießen, 35392 Gießen, Germany
- Daniel Kaiser: Mathematical Institute, Department of Mathematics and Computer Science, Physics, Geography, Justus Liebig University Gießen, 35392 Gießen, Germany; Center for Mind, Brain and Behavior (CMBB), Philipps-University Marburg and Justus Liebig University Gießen, 35032 Marburg, Germany
13. Morales-Torres R, Wing EA, Deng L, Davis SW, Cabeza R. Visual Recognition Memory of Scenes Is Driven by Categorical, Not Sensory, Visual Representations. J Neurosci 2024; 44:e1479232024. PMID: 38569925; PMCID: PMC11112637; DOI: 10.1523/jneurosci.1479-23.2024.
Abstract
When we perceive a scene, our brain processes various types of visual information simultaneously, ranging from sensory features, such as line orientations and colors, to categorical features, such as objects and their arrangements. Whereas the role of sensory and categorical visual representations in predicting subsequent memory has been studied using isolated objects, their impact on memory for complex scenes remains largely unknown. To address this gap, we conducted an fMRI study in which female and male participants encoded pictures of familiar scenes (e.g., an airport picture) and later recalled them, while rating the vividness of their visual recall. Outside the scanner, participants had to distinguish each seen scene from three similar lures (e.g., three airport pictures). We modeled the sensory and categorical visual features of multiple scenes using both early and late layers of a deep convolutional neural network. Then, we applied representational similarity analysis to determine which brain regions represented stimuli in accordance with the sensory and categorical models. We found that categorical, but not sensory, representations predicted subsequent memory. In line with this result, and only for the categorical model, the average recognition performance for each scene correlated positively with the average visual dissimilarity between the scene in question and its respective lures. These results strongly suggest that even in memory tests that ostensibly rely solely on visual cues (such as forced-choice visual recognition with similar distractors), memory decisions for scenes may be primarily influenced by categorical rather than sensory representations.
Affiliation(s)
- Erik A Wing: Rotman Research Institute, Baycrest Health Sciences, Toronto, Ontario M6A 2E1, Canada
- Lifu Deng: Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
- Simon W Davis: Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708; Department of Neurology, Duke University School of Medicine, Durham, North Carolina 27708
- Roberto Cabeza: Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
14. Lahner B, Mohsenzadeh Y, Mullin C, Oliva A. Visual perception of highly memorable images is mediated by a distributed network of ventral visual regions that enable a late memorability response. PLoS Biol 2024; 22:e3002564. PMID: 38557761; PMCID: PMC10984539; DOI: 10.1371/journal.pbio.3002564.
Abstract
Behavioral and neuroscience studies in humans and primates have shown that memorability is an intrinsic property of an image that predicts its strength of encoding into and retrieval from memory. While previous work has independently probed when or where this memorability effect may occur in the human brain, a description of its spatiotemporal dynamics is missing. Here, we used representational similarity analysis (RSA) to combine functional magnetic resonance imaging (fMRI) with source-estimated magnetoencephalography (MEG) to simultaneously measure when and where the human cortex is sensitive to differences in image memorability. Results reveal that visual perception of High Memorable images, compared to Low Memorable images, recruits a set of regions of interest (ROIs) distributed throughout the ventral visual cortex: a late memorability response (from around 300 ms) in early visual cortex (EVC), inferior temporal cortex, lateral occipital cortex, fusiform gyrus, and banks of the superior temporal sulcus. The magnitude of image memorability is represented after high-level feature processing in visual regions and is reflected in classical memory regions in the medial temporal lobe (MTL). Our results present, to our knowledge, the first unified spatiotemporal account of the visual memorability effect across the human cortex, further supporting the levels-of-processing theory of perception and memory.
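MEG-fMRI fusion by RSA, as used here, correlates the MEG RDM at each time point with time-invariant fMRI RDMs from different ROIs, giving each region a temporal profile. A simulated sketch under assumed shapes (not the authors' pipeline):

```python
# Hedged sketch of MEG-fMRI fusion: the MEG RDM time course is compared
# against fixed fMRI ROI RDMs. All data are simulated stand-ins.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_imgs = 40
meg = rng.standard_normal((n_imgs, 306, 200))              # images x sensors x time
roi_rdms = {roi: pdist(rng.standard_normal((n_imgs, 50)))  # stand-in ROI RDMs
            for roi in ["EVC", "IT", "MTL"]}

fusion = {roi: np.array([spearmanr(pdist(meg[:, :, t], metric="correlation"),
                                   rdm)[0]
                         for t in range(meg.shape[2])])
          for roi, rdm in roi_rdms.items()}
for roi, r in fusion.items():
    print(roi, "peaks at time point", int(r.argmax()))     # per-ROI latency
```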
Affiliation(s)
- Benjamin Lahner: Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Yalda Mohsenzadeh: The Brain and Mind Institute, The University of Western Ontario, London, Canada; Department of Computer Science, The University of Western Ontario, London, Canada; Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Caitlin Mullin: Vision: Science to Application (VISTA), York University, Toronto, Ontario, Canada
- Aude Oliva: Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
15. Dwivedi K, Sadiya S, Balode MP, Roig G, Cichy RM. Visual features are processed before navigational affordances in the human brain. Sci Rep 2024; 14:5573. PMID: 38448446; PMCID: PMC10917749; DOI: 10.1038/s41598-024-55652-y.
Abstract
To navigate through their immediate environment, humans process scene information rapidly. How does the cascade of neural processing elicited by scene viewing unfold over time to facilitate navigational planning? To investigate, we recorded human brain responses to visual scenes with electroencephalography and related those to computational models that operationalize three aspects of scene processing (2D, 3D, and semantic information), as well as to a behavioral model capturing navigational affordances. We found a temporal processing hierarchy: navigational affordance is processed later than the other scene features (2D, 3D, and semantic) investigated. This reveals the temporal order in which the human brain computes complex scene information and suggests that the brain leverages these pieces of information to plan navigation.
Affiliation(s)
- Kshitij Dwivedi: Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany; Department of Computer Science, Goethe University Frankfurt, Frankfurt, Germany
- Sari Sadiya: Department of Computer Science, Goethe University Frankfurt, Frankfurt, Germany; Frankfurt Institute for Advanced Studies (FIAS), Frankfurt, Germany
- Marta P Balode: Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany; Institute of Neuroinformatics, ETH Zurich and University of Zurich, Zurich, Switzerland
- Gemma Roig: Department of Computer Science, Goethe University Frankfurt, Frankfurt, Germany; The Hessian Center for Artificial Intelligence (hessian.AI), Darmstadt, Germany
- Radoslaw M Cichy: Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
16. Liu J, Fan T, Chen Y, Zhao J. Seeking the neural representation of statistical properties in print during implicit processing of visual words. NPJ Sci Learn 2023; 8:60. PMID: 38102191; PMCID: PMC10724295; DOI: 10.1038/s41539-023-00209-3.
Abstract
Statistical learning (SL) plays a key role in literacy acquisition. Studies have increasingly revealed the influence of the distributional statistical properties of words on visual word processing, including the effects of word frequency (lexical level) and mappings between orthography, phonology, and semantics (sub-lexical level). However, there has been scant evidence directly confirming that the statistical properties contained in print can be characterized by neural activities. Using time-resolved representational similarity analysis (RSA), the present study examined neural representations of different types of statistical properties in visual word processing. From the perspective of predictive coding, an equal-probability sequence with low built-in prediction precision and three oddball sequences with high built-in prediction precision were designed with consistent and three types of inconsistent (orthographically inconsistent, orthography-to-phonology inconsistent, and orthography-to-semantics inconsistent) Chinese characters as visual stimuli. In the three oddball sequences, consistent characters were set as the standard stimuli (probability of occurrence p = 0.75) and the three types of inconsistent characters were set as deviant stimuli (p = 0.25), respectively. In the equal-probability sequence, the same consistent and inconsistent characters were presented randomly with identical occurrence probability (p = 0.25). Significant neural representation activities of word frequency were observed in the equal-probability sequence. By contrast, neural representations of sub-lexical statistics only emerged in oddball sequences where short-term predictions were shaped. These findings reveal that the statistical properties learned from the long-term print environment continue to play a role in current word processing mechanisms and that these mechanisms can be modulated by short-term predictions.
Affiliation(s)
- Jianyi Liu: School of Psychology, Shaanxi Normal University, and Key Laboratory for Behavior and Cognitive Neuroscience of Shaanxi Province, Xi'an, China
- Tengwen Fan: School of Psychology, Shaanxi Normal University, and Key Laboratory for Behavior and Cognitive Neuroscience of Shaanxi Province, Xi'an, China
- Yan Chen: Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China; Key Laboratory of Human Development and Mental Health of Hubei Province, School of Psychology, Central China Normal University, Wuhan, China
- Jingjing Zhao: School of Psychology, Shaanxi Normal University, and Key Laboratory for Behavior and Cognitive Neuroscience of Shaanxi Province, Xi'an, China
17. Csaky R, van Es MWJ, Jones OP, Woolrich M. Interpretable many-class decoding for MEG. Neuroimage 2023; 282:120396. PMID: 37805019; PMCID: PMC10938061; DOI: 10.1016/j.neuroimage.2023.120396.
Abstract
Multivariate pattern analysis (MVPA) of magnetoencephalography (MEG) and electroencephalography (EEG) data is a valuable tool for understanding how the brain represents and discriminates between different stimuli. Identifying the spatial and temporal signatures of stimuli is typically a crucial output of these analyses. Such analyses are mainly performed using linear, pairwise, sliding-window decoding models. These allow for relative ease of interpretation, e.g. by estimating a time course of decoding accuracy, but have limited decoding performance. On the other hand, full-epoch multiclass decoding models, commonly used for brain-computer interface (BCI) applications, can provide better decoding performance. However, interpretation methods for such models have been designed with a low number of classes in mind. In this paper, we propose an approach that combines a multiclass, full-epoch decoding model with supervised dimensionality reduction, while still being able to reveal the contributions of spatiotemporal and spectral features using permutation feature importance. Crucially, we introduce a way of doing supervised dimensionality reduction of input features within a neural network optimised for the classification task, improving performance substantially. We demonstrate the approach on three different many-class task-MEG datasets using image presentations. Our results demonstrate that this approach consistently achieves higher accuracy than the peak accuracy of a sliding-window decoder while estimating the relevant spatiotemporal features in the MEG signal.
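The key architectural idea, a learned linear projection of the sensor dimension trained jointly with a full-epoch multiclass classifier, can be sketched as follows; layer sizes and the class count are illustrative assumptions, not the published model:

```python
# Hedged sketch: supervised dimensionality reduction inside a full-epoch
# multiclass decoder. A 1x1 convolution acts as a learned linear projection
# of the sensor dimension, trained end to end with the classifier.
import torch
import torch.nn as nn

n_channels, n_times, n_dims, n_classes = 306, 100, 10, 118

model = nn.Sequential(
    # project 306 sensors down to 10 learned components at every time point
    nn.Conv1d(n_channels, n_dims, kernel_size=1, bias=False),
    nn.Flatten(),                            # -> (batch, n_dims * n_times)
    nn.Linear(n_dims * n_times, n_classes),  # one decision over all classes
)

x = torch.randn(32, n_channels, n_times)     # a batch of full MEG epochs
logits = model(x)
loss = nn.functional.cross_entropy(logits, torch.randint(0, n_classes, (32,)))
loss.backward()                              # trained jointly, not per window
```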
Affiliation(s)
- Richard Csaky: Oxford Centre for Human Brain Activity, Department of Psychiatry, University of Oxford, OX3 7JX, Oxford, UK; Wellcome Centre for Integrative Neuroimaging, OX3 9DU, Oxford, UK; Christ Church, OX1 1DP, Oxford, UK
- Mats W J van Es: Oxford Centre for Human Brain Activity, Department of Psychiatry, University of Oxford, OX3 7JX, Oxford, UK; Wellcome Centre for Integrative Neuroimaging, OX3 9DU, Oxford, UK
- Oiwi Parker Jones: Wellcome Centre for Integrative Neuroimaging, OX3 9DU, Oxford, UK; Department of Engineering Science, University of Oxford, OX1 3PJ, Oxford, UK; Jesus College, OX1 3DW, Oxford, UK
- Mark Woolrich: Oxford Centre for Human Brain Activity, Department of Psychiatry, University of Oxford, OX3 7JX, Oxford, UK; Wellcome Centre for Integrative Neuroimaging, OX3 9DU, Oxford, UK
18. Lee Masson H, Isik L. Rapid Processing of Observed Touch through Social Perceptual Brain Regions: An EEG-fMRI Fusion Study. J Neurosci 2023; 43:7700-7711. PMID: 37871963; PMCID: PMC10634570; DOI: 10.1523/jneurosci.0995-23.2023.
Abstract
Seeing social touch triggers a strong social-affective response that involves multiple brain networks, including visual, social perceptual, and somatosensory systems. Previous studies have identified the specific functional role of each system, but little is known about the speed and directionality of the information flow. Is this information extracted via the social perceptual system or via simulation in somatosensory cortex? To address this, we examined the spatiotemporal neural processing of observed touch. Twenty-one human participants (seven males) watched 500-ms video clips showing social and nonsocial touch during electroencephalogram (EEG) recording. Visual and social-affective features were rapidly extracted in the brain, beginning at 90 and 150 ms after video onset, respectively. Combining the EEG data with functional magnetic resonance imaging (fMRI) data from our prior study with the same stimuli reveals that neural information first arises in early visual cortex (EVC), then in the temporoparietal junction and posterior superior temporal sulcus (TPJ/pSTS), and finally in the somatosensory cortex. EVC and TPJ/pSTS uniquely explain EEG neural patterns, while somatosensory cortex does not contribute to EEG patterns alone, suggesting that social-affective information may flow from TPJ/pSTS to somatosensory cortex. Together, these findings show that social touch is processed quickly, within the timeframe of feedforward visual processes, and that the social-affective meaning of touch is first extracted by a social perceptual pathway. Such rapid processing of social touch may be vital to its effective use during social interaction.

SIGNIFICANCE STATEMENT: Seeing physical contact between people evokes a strong social-emotional response. Previous research has identified the brain systems responsible for this response, but little is known about how quickly and in what direction the information flows. We demonstrated that the brain processes the social-emotional meaning of observed touch quickly, starting as early as 150 ms after stimulus onset. By combining electroencephalogram (EEG) data with functional magnetic resonance imaging (fMRI) data, we show for the first time that the social-affective meaning of touch is first extracted by a social perceptual pathway, followed by the later involvement of somatosensory simulation. This rapid processing of touch through the social perceptual route may play a pivotal role in the effective use of touch in social communication and interaction.
Affiliation(s)
- Haemy Lee Masson: Department of Psychology, Durham University, Durham DH1 3LE, United Kingdom; Department of Cognitive Science, Johns Hopkins University, Baltimore, Maryland 21218
- Leyla Isik: Department of Cognitive Science, Johns Hopkins University, Baltimore, Maryland 21218
19. Karapetian A, Boyanova A, Pandaram M, Obermayer K, Kietzmann TC, Cichy RM. Empirically Identifying and Computationally Modeling the Brain-Behavior Relationship for Human Scene Categorization. J Cogn Neurosci 2023; 35:1879-1897. PMID: 37590093; PMCID: PMC10586810; DOI: 10.1162/jocn_a_02043.
Abstract
Humans effortlessly make quick and accurate perceptual decisions about the nature of their immediate visual environment, such as the category of the scene they face. Previous research has revealed a rich set of cortical representations potentially underlying this feat. However, it remains unknown which of these representations are suitably formatted for decision-making. Here, we approached this question empirically and computationally, using neuroimaging and computational modeling. For the empirical part, we collected EEG data and RTs from human participants during a scene categorization task (natural vs. man-made). We then related the EEG data to behavior using a multivariate extension of signal detection theory. We observed a correlation between neural data and behavior specifically between ∼100 msec and ∼200 msec after stimulus onset, suggesting that the neural scene representations in this time period are suitably formatted for decision-making. For the computational part, we evaluated a recurrent convolutional neural network (RCNN) as a model of brain and behavior. Unifying our previous observations in an image-computable model, the RCNN accurately predicted the neural representations, the behavioral scene categorization data, and the relationship between them. Our results identify and computationally characterize the neural and behavioral correlates of scene categorization in humans.
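One way to picture the multivariate signal-detection approach: take each trial's distance from a linear decision boundary as a neural decision variable and relate it to behavior such as RTs. The sketch below is a loose illustration with simulated data, not the authors' exact method:

```python
# Hedged sketch: an LDA decision value per trial (distance from the boundary)
# serves as a neural evidence measure and is correlated with reaction times.
# All data are simulated stand-ins.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(4)
X = rng.standard_normal((300, 64))        # EEG patterns in one time window
y = rng.integers(0, 2, 300)               # natural vs. man-made
rt = rng.gamma(2.0, 0.3, 300)             # simulated reaction times

dv = cross_val_predict(LinearDiscriminantAnalysis(), X, y,
                       method="decision_function", cv=5)
signed = np.where(y == 1, dv, -dv)        # evidence for the correct category
print("corr(evidence, RT) = %.3f" % np.corrcoef(signed, rt)[0, 1])
```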
Affiliation(s)
- Agnessa Karapetian: Freie Universität Berlin, Germany; Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Germany; Bernstein Center for Computational Neuroscience Berlin, Germany
- Klaus Obermayer: Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Germany; Bernstein Center for Computational Neuroscience Berlin, Germany; Technische Universität Berlin, Germany; Humboldt-Universität zu Berlin, Germany
- Radoslaw M Cichy: Freie Universität Berlin, Germany; Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Germany; Bernstein Center for Computational Neuroscience Berlin, Germany; Humboldt-Universität zu Berlin, Germany
20. Brandman T, Peelen MV. Objects sharpen visual scene representations: evidence from MEG decoding. Cereb Cortex 2023; 33:9524-9531. PMID: 37365829; PMCID: PMC10431745; DOI: 10.1093/cercor/bhad222.
Abstract
Real-world scenes consist of objects, defined by local information, and scene background, defined by global information. Although objects and scenes are processed in separate pathways in visual cortex, their processing interacts. Specifically, previous studies have shown that scene context makes blurry objects look sharper, an effect that can be observed as a sharpening of object representations in visual cortex from around 300 ms after stimulus onset. Here, we use MEG to show that objects can also sharpen scene representations, with the same temporal profile. Photographs of indoor (closed) and outdoor (open) scenes were blurred such that they were difficult to categorize on their own but easily disambiguated by the inclusion of an object. Classifiers were trained to distinguish MEG response patterns to intact indoor and outdoor scenes, presented in an independent run, and tested on degraded scenes in the main experiment. Results revealed better decoding of scenes with objects than scenes alone and objects alone from 300 ms after stimulus onset. This effect was strongest over left posterior sensors. These findings show that the influence of objects on scene representations occurs at similar latencies as the influence of scenes on object representations, in line with a common predictive processing mechanism.
Affiliation(s)
- Talia Brandman: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen 6525 GD, The Netherlands
- Marius V Peelen: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen 6525 GD, The Netherlands
21. Wencheng W, Ge Y, Zuo Z, Chen L, Qin X, Zuxiang L. Visual number sense for real-world scenes shared by deep neural networks and humans. Heliyon 2023; 9:e18517. PMID: 37560656; PMCID: PMC10407052; DOI: 10.1016/j.heliyon.2023.e18517.
Abstract
Recently, visual number sense has been identified in deep neural networks (DNNs). However, whether DNNs have the same capacity for real-world scenes, rather than the simple geometric figures that are often tested, is unclear. In this study, we explore the number perception of scenes using AlexNet and find that numerosity can be represented by the pattern of group activation of the category-layer units. The global activation of these units increases with the number of objects in the scene, and the variation in their activation decreases accordingly. By decoding the numerosity from this pattern, we reveal that the embedding coefficient of a scene determines the likelihood of potential objects to contribute to numerical perception. This was demonstrated by better performance for pictures with relatively high embedding coefficients in both DNNs and humans. This study shows for the first time that a distinct feature of visual environments, revealed by DNNs, can modulate human perception, supported by a group-coding mechanism.
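Reading out "group activation of the category-layer units", as described here, amounts to summarizing the final-layer activation pattern per image. A minimal sketch (untrained weights and random images as stand-ins; the summary statistics are assumptions about the abstract's measures, not the study's code):

```python
# Hedged sketch: extract AlexNet category-layer activations for a batch of
# images and compute simple group statistics. weights=None keeps the example
# self-contained (no download); real analyses would use trained weights.
import torch
from torchvision.models import alexnet

model = alexnet(weights=None).eval()
imgs = torch.rand(8, 3, 224, 224)          # stand-ins for real-world scenes
with torch.no_grad():
    acts = model(imgs)                     # (8, 1000) category-layer units
print("global activation:", acts.mean(1))  # rises with object number, per paper
print("dispersion:", acts.std(1))          # falls with object number, per paper
```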
Affiliation(s)
- Wu Wencheng: AHU-IAI AI Joint Laboratory, Anhui University, Hefei, 230601, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China
- Yingxi Ge: State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, 15 Datun Road, Beijing, 100101, China; CAS Center for Excellence in Brain Science and Intelligence Technology, China; University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing, 100049, China
- Zhentao Zuo: Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China; State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, 15 Datun Road, Beijing, 100101, China; CAS Center for Excellence in Brain Science and Intelligence Technology, China; University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing, 100049, China
- Lin Chen: Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China; State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, 15 Datun Road, Beijing, 100101, China; CAS Center for Excellence in Brain Science and Intelligence Technology, China; University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing, 100049, China
- Xu Qin: Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Hefei, 230601, China; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University, Hefei, 230601, China; School of Computer Science and Technology, Anhui University, Hefei, 230601, China
- Liu Zuxiang: Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China; State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, 15 Datun Road, Beijing, 100101, China; CAS Center for Excellence in Brain Science and Intelligence Technology, China; University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing, 100049, China
22. Doerig A, Sommers RP, Seeliger K, Richards B, Ismael J, Lindsay GW, Kording KP, Konkle T, van Gerven MAJ, Kriegeskorte N, Kietzmann TC. The neuroconnectionist research programme. Nat Rev Neurosci 2023. PMID: 37253949; DOI: 10.1038/s41583-023-00705-w.
Abstract
Artificial neural networks (ANNs) inspired by biology are beginning to be widely used to model behavioural and neural data, an approach we call 'neuroconnectionism'. ANNs have been not only lauded as the current best models of information processing in the brain but also criticized for failing to account for basic cognitive functions. In this Perspective article, we propose that arguing about the successes and failures of a restricted set of current ANNs is the wrong approach to assess the promise of neuroconnectionism for brain science. Instead, we take inspiration from the philosophy of science, and in particular from Lakatos, who showed that the core of a scientific research programme is often not directly falsifiable but should be assessed by its capacity to generate novel insights. Following this view, we present neuroconnectionism as a general research programme centred around ANNs as a computational language for expressing falsifiable theories about brain computation. We describe the core of the programme, the underlying computational framework and its tools for testing specific neuroscientific hypotheses and deriving novel understanding. Taking a longitudinal view, we review past and present neuroconnectionist projects and their responses to challenges and argue that the research programme is highly progressive, generating new and otherwise unreachable insights into the workings of the brain.
Collapse
Affiliation(s)
- Adrien Doerig
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany.
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands.
| | - Rowan P Sommers
- Department of Neurobiology of Language, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
| | - Katja Seeliger
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| | - Blake Richards
- Department of Neurology and Neurosurgery, McGill University, Montréal, QC, Canada
- School of Computer Science, McGill University, Montréal, QC, Canada
- Mila, Montréal, QC, Canada
- Montréal Neurological Institute, Montréal, QC, Canada
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
| | | | | | - Konrad P Kording
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Departments of Bioengineering and Neuroscience, University of Pennsylvania, Philadelphia, PA, USA
| | | | | | | | - Tim C Kietzmann
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
| |
Collapse
|
23
|
Wu YH, Podvalny E, He BJ. Spatiotemporal neural dynamics of object recognition under uncertainty in humans. eLife 2023; 12:e84797. [PMID: 37184213 PMCID: PMC10231926 DOI: 10.7554/elife.84797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2022] [Accepted: 05/12/2023] [Indexed: 05/16/2023] Open
Abstract
While there is a wealth of knowledge about core object recognition (our ability to recognize clear, high-contrast object images), how the brain accomplishes object recognition tasks under increased uncertainty remains poorly understood. We investigated the spatiotemporal neural dynamics underlying object recognition under increased uncertainty by combining MEG and 7 Tesla (7T) fMRI in humans during a threshold-level object recognition task. We observed an early, parallel rise of recognition-related signals across ventral visual and frontoparietal regions that preceded the emergence of category-related information. Recognition-related signals in ventral visual regions were best explained by a two-state representational format whereby brain activity bifurcated for recognized and unrecognized images. By contrast, recognition-related signals in frontoparietal regions exhibited a reduced representational space for recognized images, yet with sharper category information. These results provide a spatiotemporally resolved view of neural activity supporting object recognition under uncertainty, revealing a pattern distinct from that underlying core object recognition.
Collapse
Affiliation(s)
- Yuan-hao Wu
- Neuroscience Institute, New York University Grossman School of Medicine, New York, United States
| | - Ella Podvalny
- Neuroscience Institute, New York University Grossman School of Medicine, New York, United States
| | - Biyu J He
- Neuroscience Institute, New York University Grossman School of Medicine, New York, United States
- Department of Neurology, New York University Grossman School of Medicine, New York, United States
- Department of Neuroscience & Physiology, New York University Grossman School of Medicine, New York, United States
- Department of Radiology, New York University Grossman School of Medicine, New York, United States
| |
Collapse
|
24
|
Effects of Natural Scene Inversion on Visual-evoked Brain Potentials and Pupillary Responses: A Matter of Effortful Processing of Unfamiliar Configurations. Neuroscience 2023; 509:201-209. [PMID: 36462569 DOI: 10.1016/j.neuroscience.2022.11.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2022] [Revised: 11/17/2022] [Accepted: 11/21/2022] [Indexed: 12/03/2022]
Abstract
The inversion of a picture of a face hampers the accuracy and speed at which observers can perceptually process it. Event-related potentials and pupillary responses, successfully used as biomarkers of face inversion in the past, suggest that the perception of visual features that are organized in an unfamiliar manner recruits demanding additional processes. However, it remains unclear whether such inversion effects generalize beyond face stimuli and whether more mental effort is indeed needed to process inverted images. Here we aimed to study the effects of natural scene inversion on visual evoked potentials and pupil dilations. We simultaneously measured responses of 47 human participants to presentations of images showing upright or inverted natural scenes. For inverted scenes, we observed relatively stronger occipito-temporo-parietal N1 peak amplitudes and larger pupil dilations (on top of an initial orienting response) than for upright scenes. This study revealed neural and physiological markers of natural scene inversion that are in line with inversion effects of other stimulus types and demonstrates the robustness and generalizability of the phenomenon that unfamiliar configurations of visual content require increased processing effort.
Collapse
|
25
|
Wingfield C, Zhang C, Devereux B, Fonteneau E, Thwaites A, Liu X, Woodland P, Marslen-Wilson W, Su L. On the similarities of representations in artificial and brain neural networks for speech recognition. Front Comput Neurosci 2022; 16:1057439. [PMID: 36618270 PMCID: PMC9811675 DOI: 10.3389/fncom.2022.1057439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2022] [Accepted: 11/29/2022] [Indexed: 12/24/2022] Open
Abstract
Introduction: In recent years, machines powered by deep learning have achieved near-human levels of performance in speech recognition. The fields of artificial intelligence and cognitive neuroscience have finally reached a similar level of performance, despite their huge differences in implementation, and so deep learning models can, in principle, serve as candidates for mechanistic models of the human auditory system. Methods: Utilizing high-performance automatic speech recognition systems, and advanced non-invasive human neuroimaging technology such as magnetoencephalography and multivariate pattern-information analysis, the current study aimed to relate machine-learned representations of speech to recorded human brain representations of the same speech. Results: In one direction, we found a quasi-hierarchical functional organization in human auditory cortex qualitatively matched with the hidden layers of deep artificial neural networks trained as part of an automatic speech recognizer. In the reverse direction, we modified the hidden layer organization of the artificial neural network based on neural activation patterns in human brains. The result was a substantial improvement in word recognition accuracy and learned speech representations. Discussion: We have demonstrated that artificial and brain neural networks can be mutually informative in the domain of speech recognition.
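The core of such model-brain comparisons is representational similarity analysis. The sketch below uses stand-in random arrays rather than the study's data, but shows the essential computation: build a representational dissimilarity matrix (RDM) for a network layer and for MEG sensor patterns over the same stimuli, then rank-correlate the two.

```python
# RSA sketch under assumed inputs: `layer_acts` (n_stimuli x n_units) hidden-layer
# activations and `meg_patterns` (n_stimuli x n_sensors) brain responses to the
# same stimuli. Not the study's pipeline, just the core idea.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_stimuli = 40
layer_acts = rng.normal(size=(n_stimuli, 512))     # stand-in DNN activations
meg_patterns = rng.normal(size=(n_stimuli, 306))   # stand-in MEG patterns

# Representational dissimilarity matrices, stored as condensed upper triangles.
rdm_dnn = pdist(layer_acts, metric="correlation")
rdm_meg = pdist(meg_patterns, metric="correlation")

# Second-order similarity: rank-correlate the two RDMs.
rho, p = spearmanr(rdm_dnn, rdm_meg)
print(f"DNN-MEG representational similarity: rho = {rho:.3f} (p = {p:.3g})")
```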
Collapse
Affiliation(s)
- Cai Wingfield
- Department of Psychology, Lancaster University, Lancaster, United Kingdom
| | - Chao Zhang
- Department of Engineering, University of Cambridge, Cambridge, United Kingdom
| | - Barry Devereux
- School of Electronics, Electrical Engineering and Computer Science, Queens University Belfast, Belfast, United Kingdom
| | - Elisabeth Fonteneau
- Department of Psychology, University Paul Valéry Montpellier, Montpellier, France
| | - Andrew Thwaites
- Department of Psychology, University of Cambridge, Cambridge, United Kingdom
| | - Xunying Liu
- Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
| | - Phil Woodland
- Department of Engineering, University of Cambridge, Cambridge, United Kingdom
| | | | - Li Su
- Department of Neuroscience, Neuroscience Institute, Insigneo Institute for in silico Medicine, University of Sheffield, Sheffield, United Kingdom
- Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
| |
Collapse
|
26
|
Kaiser D. Spectral brain signatures of aesthetic natural perception in the α and β frequency bands. J Neurophysiol 2022; 128:1501-1505. [PMID: 36259673 DOI: 10.1152/jn.00385.2022] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023] Open
Abstract
During our everyday lives, visual beauty is often conveyed by sustained and dynamic visual stimulation, such as when we walk through an enchanting forest or watch our pets playing. Here, I devised an MEG experiment that mimics such situations: participants viewed 8 s videos of everyday situations and rated their beauty. Using multivariate analysis, I linked aesthetic ratings to 1) sustained MEG broadband responses and 2) spectral MEG responses in the α and β frequency bands. These effects were not accounted for by a set of high- and low-level visual descriptors of the videos, suggesting that they are genuinely related to aesthetic perception. My findings provide the first characterization of spectral brain signatures linked to aesthetic experiences in the real world. NEW & NOTEWORTHY: In the real world, aesthetic experiences arise from complex and dynamic inputs. This study shows that such aesthetic experiences are represented in a spectral neural code: cortical α and β activity track our judgments of the aesthetic appearance of natural videos, providing a new starting point for studying the neural correlates of beauty through rhythmic brain activity.
Collapse
Affiliation(s)
- Daniel Kaiser
- Mathematical Institute, Department of Mathematics and Computer Science, Physics, Geography, Justus-Liebig-University, Gießen, Germany
- Center for Mind, Brain and Behavior (CMBB), Philipps-University Marburg and Justus-Liebig-University Gießen, Germany
| |
Collapse
|
27
|
Kaiser D. Characterizing Dynamic Neural Representations of Scene Attractiveness. J Cogn Neurosci 2022; 34:1988-1997. [PMID: 35802607 DOI: 10.1162/jocn_a_01891] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Aesthetic experiences during natural vision are varied: They can arise from viewing scenic landscapes, interesting architecture, or attractive people. Recent research in the field of neuroaesthetics has taught us a lot about where in the brain such aesthetic experiences are represented. Much less is known about when such experiences arise during the cortical processing cascade. Particularly, the dynamic neural representation of perceived attractiveness for rich natural scenes is not well understood. Here, I present data from an EEG experiment, in which participants provided attractiveness judgments for a set of diverse natural scenes. Using multivariate pattern analysis, I demonstrate that scene attractiveness is mirrored in early brain signals that arise within 200 msec of vision, suggesting that the aesthetic appeal of scenes is first resolved during perceptual processing. In more detailed analyses, I show that even such early neural correlates of scene attractiveness are partly related to interindividual variation in aesthetic preferences and that they generalize across scene contents. Together, these results characterize the time-resolved neural dynamics that give rise to aesthetic experiences in complex natural environments.
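A minimal sketch of time-resolved multivariate decoding of this kind, on stand-in data rather than the study's EEG: at each time point a cross-validated classifier separates high- from low-attractiveness trials, and the onset of above-chance accuracy dates when the distinction is first resolved.

```python
# Time-resolved MVPA sketch on stand-in data (not the study's EEG): at every
# time point, a cross-validated classifier separates trials rated high vs. low
# in attractiveness from the pattern of activity across sensors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_trials, n_channels, n_times = 200, 64, 120
X = rng.normal(size=(n_trials, n_channels, n_times))  # stand-in EEG epochs
y = rng.integers(0, 2, size=n_trials)                 # high vs. low rating

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
accuracy = np.array([
    cross_val_score(clf, X[:, :, t], y, cv=5).mean() for t in range(n_times)
])
# With real data, accuracy rising above chance within ~200 ms of stimulus
# onset would mirror the early effects reported above.
print(f"peak accuracy {accuracy.max():.2f} at time bin {int(accuracy.argmax())}")
```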
Collapse
|
28
|
Yang T, Yu X, Ma N, Zhang Y, Li H. Deep representation-based transfer learning for deep neural networks. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
29
|
Li Y, Zhang M, Liu S, Luo W. EEG decoding of multidimensional information from emotional faces. Neuroimage 2022; 258:119374. [PMID: 35700944 DOI: 10.1016/j.neuroimage.2022.119374] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2022] [Revised: 06/03/2022] [Accepted: 06/10/2022] [Indexed: 10/18/2022] Open
Abstract
Humans can detect and recognize faces quickly, but there has been little research on the temporal dynamics of the different dimensional face information that is extracted. The present study aimed to investigate the time course of neural responses to the representation of different dimensional face information, such as age, gender, emotion, and identity. We used support vector machine decoding to obtain representational dissimilarity matrices of event-related potential responses to different faces for each subject over time. In addition, we performed representational similarity analysis with the model representational dissimilarity matrices that contained different dimensional face information. Three significant findings were observed. First, the extraction process of facial emotion occurred before that of facial identity and lasted for a long time, which was specific to the right frontal region. Second, arousal was preferentially extracted before valence during the processing of facial emotional information. Third, different dimensional face information exhibited representational stability during different periods. In conclusion, these findings reveal the precise temporal dynamics of multidimensional information processing in faces and provide powerful support for computational models on emotional face perception.
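The analysis logic can be sketched as follows (stand-in data, not the study's): pairwise decoding accuracies between conditions form a neural RDM, which is then rank-correlated with a binary model RDM for each face dimension, here a hypothetical "emotion" labeling.

```python
# RSA sketch with assumed data: a neural RDM of pairwise SVM decoding
# accuracies is rank-correlated with a binary model RDM for one face dimension.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n_conditions = 16
# Stand-in neural RDM: pairwise decoding accuracy for every condition pair.
neural_rdm = rng.uniform(0.4, 0.9, size=(n_conditions, n_conditions))
neural_rdm = (neural_rdm + neural_rdm.T) / 2          # enforce symmetry

# Model RDM for one dimension: 1 where two faces differ in (hypothetical) emotion.
emotion = rng.integers(0, 2, size=n_conditions)
model_rdm = (emotion[:, None] != emotion[None, :]).astype(float)

iu = np.triu_indices(n_conditions, k=1)               # upper triangle only
rho, p = spearmanr(neural_rdm[iu], model_rdm[iu])
print(f"neural-model similarity for 'emotion': rho = {rho:.3f}")
```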
Collapse
Affiliation(s)
- Yiwen Li
- Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian 116029, China; Key Laboratory of Brain and Cognitive Neuroscience, Liaoning Province, Dalian 116029, China
| | - Mingming Zhang
- Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian 116029, China; Key Laboratory of Brain and Cognitive Neuroscience, Liaoning Province, Dalian 116029, China
| | - Shuaicheng Liu
- Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian 116029, China; Key Laboratory of Brain and Cognitive Neuroscience, Liaoning Province, Dalian 116029, China
| | - Wenbo Luo
- Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian 116029, China; Key Laboratory of Brain and Cognitive Neuroscience, Liaoning Province, Dalian 116029, China.
| |
Collapse
|
30
|
Beach SD, Ozernov-Palchik O, May SC, Centanni TM, Perrachione TK, Pantazis D, Gabrieli JDE. The Neural Representation of a Repeated Standard Stimulus in Dyslexia. Front Hum Neurosci 2022; 16:823627. [PMID: 35634200 PMCID: PMC9133793 DOI: 10.3389/fnhum.2022.823627] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2021] [Accepted: 04/19/2022] [Indexed: 11/13/2022] Open
Abstract
The neural representation of a repeated stimulus is the standard against which a deviant stimulus is measured in the brain, giving rise to the well-known mismatch response. It has been suggested that individuals with dyslexia have poor implicit memory for recently repeated stimuli, such as the train of standards in an oddball paradigm. Here, we examined how the neural representation of a standard emerges over repetitions, asking whether there is less sensitivity to repetition and/or less accrual of "standardness" over successive repetitions in dyslexia. We recorded magnetoencephalography (MEG) as adults with and without dyslexia were passively exposed to speech syllables in a roving-oddball design. We performed time-resolved multivariate decoding of the MEG sensor data to identify the neural signature of standard vs. deviant trials, independent of stimulus differences. This "multivariate mismatch" was equally robust and had a similar time course in the two groups. In both groups, standards generated by as few as two repetitions were distinct from deviants, indicating normal sensitivity to repetition in dyslexia. However, only in the control group did standards become increasingly different from deviants with repetition. These results suggest that many of the mechanisms that give rise to neural adaptation as well as mismatch responses are intact in dyslexia, with the possible exception of a putatively predictive mechanism that successively integrates recent sensory information into feedforward processing.
Collapse
Affiliation(s)
- Sara D. Beach
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, United States
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, United States
| | - Ola Ozernov-Palchik
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, United States
| | - Sidney C. May
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, United States
| | - Tracy M. Centanni
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, United States
| | - Tyler K. Perrachione
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, United States
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, MA, United States
| | - Dimitrios Pantazis
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, United States
| | - John D. E. Gabrieli
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, United States
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, United States
| |
Collapse
|
31
|
Hermann KL, Singh SR, Rosenthal IA, Pantazis D, Conway BR. Temporal dynamics of the neural representation of hue and luminance polarity. Nat Commun 2022; 13:661. [PMID: 35115511 PMCID: PMC8814185 DOI: 10.1038/s41467-022-28249-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2021] [Accepted: 01/12/2022] [Indexed: 11/09/2022] Open
Abstract
Hue and luminance contrast are basic visual features. Here we use multivariate analyses of magnetoencephalography data to investigate the timing of the neural computations that extract them, and whether they depend on common neural circuits. We show that hue and luminance-contrast polarity can be decoded from MEG data and, with lower accuracy, both features can be decoded across changes in the other feature. These results are consistent with the existence of both common and separable neural mechanisms. The decoding time course is earlier and more temporally precise for luminance polarity than hue, a result that does not depend on task, suggesting that luminance contrast is an updating signal that separates visual events. Meanwhile, cross-temporal generalization is slightly greater for representations of hue compared to luminance polarity, providing a neural correlate of the preeminence of hue in perceptual grouping and memory. Finally, decoding of luminance polarity varies depending on the hues used to obtain training and testing data. The pattern of results is consistent with observations that luminance contrast is mediated by both L-M and S cone sub-cortical mechanisms.
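Cross-temporal generalization of this sort can be sketched with a double training/testing loop over time points, as below on stand-in data; MNE-Python's mne.decoding.GeneralizingEstimator implements the same scheme with proper cross-validation.

```python
# Temporal generalization sketch on stand-in data: train a classifier at one
# time point and test it at every other, yielding a time x time accuracy
# matrix whose off-diagonal spread indexes representational stability.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(3)
n_trials, n_sensors, n_times = 120, 102, 50
X = rng.normal(size=(n_trials, n_sensors, n_times))   # stand-in MEG epochs
y = rng.integers(0, 2, size=n_trials)                 # e.g. two hue conditions

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
gen = np.zeros((n_times, n_times))
for t_train in range(n_times):
    clf = LinearSVC(max_iter=5000).fit(Xtr[:, :, t_train], ytr)
    for t_test in range(n_times):
        # gen[i, j] = accuracy when training at time i, testing at time j.
        gen[t_train, t_test] = clf.score(Xte[:, :, t_test], yte)
```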
Collapse
Affiliation(s)
- Katherine L Hermann
- Laboratory of Sensorimotor Research, National Eye Institute, Bethesda, MD, 20892, USA
- Department of Psychology, Stanford University, Stanford, CA, 94305, USA
| | - Shridhar R Singh
- Laboratory of Sensorimotor Research, National Eye Institute, Bethesda, MD, 20892, USA
| | - Isabelle A Rosenthal
- Laboratory of Sensorimotor Research, National Eye Institute, Bethesda, MD, 20892, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
| | - Dimitrios Pantazis
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Bevil R Conway
- Laboratory of Sensorimotor Research, National Eye Institute, Bethesda, MD, 20892, USA.
- National Institute of Mental Health, Bethesda, MD, 20892, USA.
| |
Collapse
|
32
|
Son G, Walther DB, Mack ML. Scene wheels: Measuring perception and memory of real-world scenes with a continuous stimulus space. Behav Res Methods 2022; 54:444-456. [PMID: 34244986 DOI: 10.3758/s13428-021-01630-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/19/2021] [Indexed: 11/08/2022]
Abstract
Precisely characterizing mental representations of visual experiences requires careful control of experimental stimuli. Recent work leveraging such stimulus control has led to important insights; however, these findings are constrained to simple visual properties like color and line orientation. There remains a critical methodological barrier to characterizing perceptual and mnemonic representations of realistic visual experiences. Here, we introduce a novel method to systematically control visual properties of natural scene stimuli. Using generative adversarial networks (GANs), a state-of-the-art deep learning technique for creating highly realistic synthetic images, we generated scene wheels in which continuously changing visual properties smoothly transition between meaningful realistic scenes. To validate the efficacy of scene wheels, we conducted two behavioral experiments that assess perceptual and mnemonic representations attained from the scene wheels. In the perceptual validation experiment, we tested whether the continuous transition of scene images along the wheel is reflected in human perceptual similarity judgment. The perceived similarity of the scene images correspondingly decreased as distances between the images increase on the wheel. In the memory experiment, participants reconstructed to-be-remembered scenes from the scene wheels. Reconstruction errors for these scenes resemble error distributions observed in prior studies using simple stimulus properties. Importantly, perceptual similarity judgment and memory precision varied systematically with scene wheel radius. These findings suggest our novel approach offers a window into the mental representations of naturalistic visual experiences.
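Conceptually, a scene wheel is a circle of latent codes inside a two-dimensional plane of the GAN's latent space. The sketch below constructs such a circle with plain NumPy; the generator itself is left as a hypothetical pretrained model G, since the study's GAN is not reproduced here.

```python
# Conceptual scene-wheel sketch: latent codes sampled on a circle of a chosen
# radius inside a 2-D plane of the latent space, so that neighbouring wheel
# positions yield smoothly related scenes.
import numpy as np

latent_dim = 512
rng = np.random.default_rng(4)

# Two random orthonormal directions span the plane of the wheel.
u = rng.normal(size=latent_dim)
u /= np.linalg.norm(u)
v = rng.normal(size=latent_dim)
v -= (v @ u) * u                        # Gram-Schmidt orthogonalization
v /= np.linalg.norm(v)

center = rng.normal(size=latent_dim)    # wheel centre in latent space
radius = 2.0                            # larger radius -> more dissimilar scenes

angles = np.deg2rad(np.arange(0, 360))  # one latent code per wheel degree
wheel = np.array([center + radius * (np.cos(a) * u + np.sin(a) * v)
                  for a in angles])

# images = [G(z) for z in wheel]        # G: hypothetical pretrained scene GAN
```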
Collapse
Affiliation(s)
- Gaeun Son
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada.
| | - Dirk B Walther
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada
| | - Michael L Mack
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada
| |
Collapse
|
33
|
Harel A, Nador JD, Bonner MF, Epstein RA. Early Electrophysiological Markers of Navigational Affordances in Scenes. J Cogn Neurosci 2021; 34:397-410. [PMID: 35015877 DOI: 10.1162/jocn_a_01810] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Scene perception and spatial navigation are interdependent cognitive functions, and there is increasing evidence that cortical areas that process perceptual scene properties also carry information about the potential for navigation in the environment (navigational affordances). However, the temporal stages by which visual information is transformed into navigationally relevant information are not yet known. We hypothesized that navigational affordances are encoded during perceptual processing and therefore should modulate early visually evoked ERPs, especially the scene-selective P2 component. To test this idea, we recorded ERPs from participants while they passively viewed computer-generated room scenes matched in visual complexity. By simply changing the number of doors (no doors, 1 door, 2 doors, 3 doors), we were able to systematically vary the number of pathways that afford movement in the local environment, while keeping the overall size and shape of the environment constant. We found that rooms with no doors evoked a higher P2 response than rooms with three doors, consistent with prior research reporting higher P2 amplitude to closed relative to open scenes. Moreover, we found P2 amplitude scaled linearly with the number of doors in the scenes. Navigability effects on the ERP waveform were also observed in a multivariate analysis, which showed significant decoding of the number of doors and their location at earlier time windows. Together, our results suggest that navigational affordances are represented in the early stages of scene perception. This complements research showing that the occipital place area automatically encodes the structure of navigable space and strengthens the link between scene perception and navigation.
Collapse
|
34
|
Functional selectivity for social interaction perception in the human superior temporal sulcus during natural viewing. Neuroimage 2021; 245:118741. [PMID: 34800663 DOI: 10.1016/j.neuroimage.2021.118741] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2021] [Revised: 09/15/2021] [Accepted: 11/16/2021] [Indexed: 11/22/2022] Open
Abstract
Recognizing others' social interactions is a crucial human ability. Using simple stimuli, previous studies have shown that social interactions are selectively processed in the superior temporal sulcus (STS), but prior work with movies has suggested that social interactions are processed in the medial prefrontal cortex (mPFC), part of the theory of mind network. It remains unknown to what extent social interaction selectivity is observed in real world stimuli when controlling for other covarying perceptual and social information, such as faces, voices, and theory of mind. The current study utilizes a functional magnetic resonance imaging (fMRI) movie paradigm and advanced machine learning methods to uncover the brain mechanisms uniquely underlying naturalistic social interaction perception. We analyzed two publicly available fMRI datasets, collected while both male and female human participants (n = 17 and 18) watched two different commercial movies in the MRI scanner. By performing voxel-wise encoding and variance partitioning analyses, we found that broad social-affective features predict neural responses in social brain regions, including the STS and mPFC. However, only the STS showed robust and unique selectivity specifically to social interactions, independent from other covarying features. This selectivity was observed across two separate fMRI datasets. These findings suggest that naturalistic social interaction perception recruits dedicated neural circuitry in the STS, separate from the theory of mind network, and is a critical dimension of human social understanding.
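Voxel-wise encoding with variance partitioning reduces to comparing cross-validated fits of full and reduced regression models. The sketch below (stand-in regressors and voxel response, not the study's data) computes the variance uniquely explained by social-interaction features.

```python
# Variance-partitioning sketch: the variance a feature set explains uniquely in
# a voxel is the drop in cross-validated R^2 when that set is removed from the
# full encoding model.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_timepoints = 600
social_interaction = rng.normal(size=(n_timepoints, 4))   # stand-in regressors
other_social = rng.normal(size=(n_timepoints, 10))        # faces, voices, ToM...
voxel = rng.normal(size=n_timepoints)                     # stand-in voxel response

def cv_r2(X, y):
    return cross_val_score(Ridge(alpha=10.0), X, y, cv=5, scoring="r2").mean()

r2_full = cv_r2(np.hstack([social_interaction, other_social]), voxel)
r2_reduced = cv_r2(other_social, voxel)
unique = r2_full - r2_reduced   # variance only social interaction explains
print(f"unique R^2 for social interaction: {unique:.4f}")
```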
Collapse
|
35
|
Abstract
During natural vision, our brains are constantly exposed to complex, but regularly structured environments. Real-world scenes are defined by typical part-whole relationships, where the meaning of the whole scene emerges from configurations of localized information present in individual parts of the scene. Such typical part-whole relationships suggest that information from individual scene parts is not processed independently, but that there are mutual influences between the parts and the whole during scene analysis. Here, we review recent research that used a straightforward, but effective approach to study such mutual influences: By dissecting scenes into multiple arbitrary pieces, these studies provide new insights into how the processing of whole scenes is shaped by their constituent parts and, conversely, how the processing of individual parts is determined by their role within the whole scene. We highlight three facets of this research: First, we discuss studies demonstrating that the spatial configuration of multiple scene parts has a profound impact on the neural processing of the whole scene. Second, we review work showing that cortical responses to individual scene parts are shaped by the context in which these parts typically appear within the environment. Third, we discuss studies demonstrating that missing scene parts are interpolated from the surrounding scene context. Bridging these findings, we argue that efficient scene processing relies on an active use of the scene's part-whole structure, where the visual brain matches scene inputs with internal models of what the world should look like.
Collapse
Affiliation(s)
- Daniel Kaiser
- Justus-Liebig-Universität Gießen, Germany
- Philipps-Universität Marburg, Germany
- University of York, United Kingdom
| | - Radoslaw M Cichy
- Freie Universität Berlin, Germany
- Humboldt-Universität zu Berlin, Germany
- Bernstein Centre for Computational Neuroscience Berlin, Germany
| |
Collapse
|
36
|
Sun ED, Dekel R. ImageNet-trained deep neural networks exhibit illusion-like response to the Scintillating grid. J Vis 2021; 21:15. [PMID: 34677575 PMCID: PMC8543405 DOI: 10.1167/jov.21.11.15] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Deep neural network (DNN) models for computer vision are capable of human-level object recognition. Consequently, similarities between DNN and human vision are of interest. Here, we characterize DNN representations of Scintillating grid visual illusion images in which white disks are perceived to be partially black. Specifically, we use VGG-19 and ResNet-101 DNN models that were trained for image classification and consider the representational dissimilarity (L1 distance in the penultimate layer) between pairs of images: one with white Scintillating grid disks and the other with disks of decreasing luminance levels. Results showed a nonmonotonic relation, such that decreasing disk luminance led to an increase and subsequently a decrease in representational dissimilarity. That is, the Scintillating grid image with white disks was closer, in terms of the representation, to images with black disks than images with gray disks. In control nonillusion images, such nonmonotonicity was rare. These results suggest that nonmonotonicity in a deep computational representation is a potential test for illusion-like response geometry in DNN models.
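The dissimilarity measure is straightforward to sketch: extract penultimate-layer (fc7) activations from a pretrained VGG-19 and take L1 distances between a white-disk reference image and darker-disk variants. The image files below are hypothetical placeholders, not the study's stimuli.

```python
# Sketch of the L1 dissimilarity measure in VGG-19's penultimate layer.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).eval()
# Truncate the classifier just after the penultimate fully connected layer.
penultimate = torch.nn.Sequential(*list(vgg.classifier.children())[:-1])

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def fc7(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feats = vgg.avgpool(vgg.features(x)).flatten(1)
        return penultimate(feats).squeeze(0)

ref = fc7("grid_white_disks.png")                     # hypothetical reference
for lum in [100, 75, 50, 25, 0]:                      # disk luminance (%)
    d = torch.abs(fc7(f"grid_disks_{lum}.png") - ref).sum().item()
    print(f"luminance {lum}%: L1 distance {d:.1f}")   # expect a rise then fall
```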
Collapse
Affiliation(s)
- Eric D Sun
- Mather House, Harvard University, Cambridge, MA, USA.
| | - Ron Dekel
- Department of Neurobiology, Weizmann Institute of Science, Rehovot, Israel.
| |
Collapse
|
37
|
Hansen BC, Greene MR, Field DJ. Dynamic Electrode-to-Image (DETI) mapping reveals the human brain's spatiotemporal code of visual information. PLoS Comput Biol 2021; 17:e1009456. [PMID: 34570753 PMCID: PMC8496831 DOI: 10.1371/journal.pcbi.1009456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2021] [Revised: 10/07/2021] [Accepted: 09/16/2021] [Indexed: 11/18/2022] Open
Abstract
A number of neuroimaging techniques have been employed to understand how visual information is transformed along the visual pathway. Although each technique has spatial and temporal limitations, they can each provide important insights into the visual code. While the BOLD signal of fMRI can be quite informative, the visual code is not static and this can be obscured by fMRI’s poor temporal resolution. In this study, we leveraged the high temporal resolution of EEG to develop an encoding technique based on the distribution of responses generated by a population of real-world scenes. This approach maps neural signals to each pixel within a given image and reveals location-specific transformations of the visual code, providing a spatiotemporal signature for the image at each electrode. Our analyses of the mapping results revealed that scenes undergo a series of nonuniform transformations that prioritize different spatial frequencies at different regions of scenes over time. This mapping technique offers a potential avenue for future studies to explore how dynamic feedforward and recurrent processes inform and refine high-level representations of our visual world.

The visual information that we sample from our environment undergoes a series of neural modifications, with each modification state (or visual code) consisting of a unique distribution of responses across neurons along the visual pathway. However, current noninvasive neuroimaging techniques provide an account of that code that is coarse with respect to time or space. Here, we present dynamic electrode-to-image (DETI) mapping, an analysis technique that capitalizes on the high temporal resolution of EEG to map neural signals to each pixel within a given image to reveal location-specific modifications of the visual code. The DETI technique reveals maps of features that are associated with the neural signal at each pixel and at each time point. DETI mapping shows that real-world scenes undergo a series of nonuniform modifications over both space and time. Specifically, we find that the visual code varies in a location-specific manner, likely reflecting that neural processing prioritizes different features at different image locations over time. DETI mapping therefore offers a potential avenue for future studies to explore how each modification state informs and refines the conceptual meaning of our visual world.
Collapse
Affiliation(s)
- Bruce C. Hansen
- Colgate University, Department of Psychological & Brain Sciences, Neuroscience Program, Hamilton, New York, United States of America
| | - Michelle R. Greene
- Bates College, Neuroscience Program, Lewiston, Maine, United States of America
| | - David J. Field
- Cornell University, Department of Psychology, Ithaca, New York, United States of America
| |
Collapse
|
38
|
Lindsay GW. Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future. J Cogn Neurosci 2021; 33:2017-2031. [DOI: 10.1162/jocn_a_01544] [Citation(s) in RCA: 96] [Impact Index Per Article: 24.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
Convolutional neural networks (CNNs) were inspired by early findings in the study of biological vision. They have since become successful tools in computer vision and state-of-the-art models of both neural activity and behavior on visual tasks. This review highlights what, in the context of CNNs, it means to be a good model in computational neuroscience and the various ways models can provide insight. Specifically, it covers the origins of CNNs and the methods by which we validate them as models of biological vision. It then goes on to elaborate on what we can learn about biological vision by understanding and experimenting on CNNs and discusses emerging opportunities for the use of CNNs in vision research beyond basic object recognition.
Collapse
|
39
|
Dwivedi K, Cichy RM, Roig G. Unraveling Representations in Scene-selective Brain Regions Using Scene-Parsing Deep Neural Networks. J Cogn Neurosci 2021; 33:2032-2043. [PMID: 32897121 PMCID: PMC7612022 DOI: 10.1162/jocn_a_01624] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/24/2024]
Abstract
Visual scene perception is mediated by a set of cortical regions that respond preferentially to images of scenes, including the occipital place area (OPA) and parahippocampal place area (PPA). However, the differential contribution of OPA and PPA to scene perception remains an open research question. In this study, we take a deep neural network (DNN)-based computational approach to investigate the differences in OPA and PPA function. In a first step, we search for a computational model that predicts fMRI responses to scenes in OPA and PPA well. We find that DNNs trained to predict scene components (e.g., wall, ceiling, floor) explain higher variance uniquely in OPA and PPA than a DNN trained to predict scene category (e.g., bathroom, kitchen, office). This result is robust across several DNN architectures. On this basis, we then determine whether particular scene components predicted by DNNs differentially account for unique variance in OPA and PPA. We find that variance in OPA responses uniquely explained by the navigation-related floor component is higher compared to the variance explained by the wall and ceiling components. In contrast, PPA responses are better explained by the combination of wall and floor, that is, scene components that together contain the structure and texture of the scene. This differential sensitivity to scene components suggests differential functions of OPA and PPA in scene processing. Moreover, our results further highlight the potential of the proposed computational approach as a general tool in the investigation of the neural basis of human scene perception.
Collapse
Affiliation(s)
- Kshitij Dwivedi
- Department of Education and Psychology, Freie Universität Berlin, Germany
- Department of Computer Science, Goethe University, Frankfurt am Main, Germany
| | | | - Gemma Roig
- Department of Computer Science, Goethe University, Frankfurt am Main, Germany
| |
Collapse
|
40
|
Marcolin F, Vezzetti E, Monaci M. Face perception foundations for pattern recognition algorithms. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.02.074] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
|
41
|
Abstract
Categorization performance is a popular metric of scene recognition and understanding in behavioral and computational research. However, categorical constructs and their labels can be somewhat arbitrary. Derived from exhaustive vocabularies of place names (e.g., Deng et al., 2009), or the judgements of small groups of researchers (e.g., Fei-Fei, Iyer, Koch, & Perona, 2007), these categories may not correspond with human-preferred taxonomies. Here, we propose clustering by increasing the rand index via coordinate ascent (CIRCA): an unsupervised, data-driven clustering method for deriving ground-truth scene categories. In Experiment 1, human participants organized 80 stereoscopic images of outdoor scenes from the Southampton-York Natural Scenes (SYNS) dataset (Adams et al., 2016) into discrete categories. In separate tasks, images were grouped according to i) semantic content, ii) three-dimensional spatial structure, or iii) two-dimensional image appearance. Participants provided text labels for each group. Using the CIRCA method, we determined the most representative category structure and then derived category labels for each task/dimension. In Experiment 2, we found that these categories generalized well to a larger set of SYNS images, and new observers. In Experiment 3, we tested the relationship between our category systems and the spatial envelope model (Oliva & Torralba, 2001). Finally, in Experiment 4, we validated CIRCA on a larger, independent dataset of same-different category judgements. The derived category systems outperformed the SUN taxonomy (Xiao, Hays, Ehinger, Oliva, & Torralba, 2010) and an alternative clustering method (Greene, 2019). In summary, we believe this novel categorization method can be applied to a wide range of datasets to derive optimal categorical groupings and labels from psychophysical judgements of stimulus similarity.
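A compressed reimplementation sketch of the CIRCA idea follows (not the authors' code): starting from one participant's grouping, each image is greedily reassigned to whichever category maximizes the mean Rand index with all participants' groupings, here using scikit-learn's rand_score on stand-in labels.

```python
# Coordinate-ascent sketch of CIRCA: hill-climb a consensus grouping toward
# maximal mean Rand index with every participant's grouping.
import numpy as np
from sklearn.metrics import rand_score

rng = np.random.default_rng(6)
n_images, n_participants, k = 80, 20, 6
# Stand-in participant groupings: one category label per image per participant.
participant_labels = rng.integers(0, k, size=(n_participants, n_images))

consensus = participant_labels[0].copy()      # initialize from one participant

def mean_rand(labels):
    return np.mean([rand_score(p, labels) for p in participant_labels])

current = mean_rand(consensus)
improved = True
while improved:                               # coordinate ascent over images
    improved = False
    for i in range(n_images):
        best_label, best_score = consensus[i], current
        for c in range(k):                    # try every category for image i
            consensus[i] = c
            s = mean_rand(consensus)
            if s > best_score:
                best_label, best_score = c, s
        consensus[i] = best_label             # keep the best assignment
        if best_score > current:
            current = best_score
            improved = True

print(f"consensus grouping reaches mean Rand index {current:.3f}")
```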
Collapse
Affiliation(s)
- Matt D Anderson
- Centre for Vision and Cognition, Psychology, University of Southampton, Southampton, UK
| | - Erich W Graf
- Centre for Vision and Cognition, Psychology, University of Southampton, Southampton, UK
| | - James H Elder
- Centre for Vision Research, Department of Psychology, Department of Electrical Engineering and Computer Science, York University, Toronto, Ontario, Canada
| | - Krista A Ehinger
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
| | - Wendy J Adams
- Centre for Vision and Cognition, Psychology, University of Southampton, Southampton, UK
| |
Collapse
|
42
|
Al-Tahan H, Mohsenzadeh Y. Reconstructing feedback representations in the ventral visual pathway with a generative adversarial autoencoder. PLoS Comput Biol 2021; 17:e1008775. [PMID: 33760819 PMCID: PMC8059812 DOI: 10.1371/journal.pcbi.1008775] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2020] [Revised: 04/21/2021] [Accepted: 02/08/2021] [Indexed: 11/19/2022] Open
Abstract
While vision evokes a dense network of feedforward and feedback neural processes in the brain, visual processes are primarily modeled with feedforward hierarchical neural networks, leaving the computational role of feedback processes poorly understood. Here, we developed a generative autoencoder neural network model and adversarially trained it on a categorically diverse data set of images. We hypothesized that the feedback processes in the ventral visual pathway can be represented by reconstruction of the visual information performed by the generative model. We compared representational similarity of the activity patterns in the proposed model with temporal (magnetoencephalography) and spatial (functional magnetic resonance imaging) visual brain responses. The proposed generative model identified two segregated neural dynamics in the visual brain. A temporal hierarchy of processes transforming low level visual information into high level semantics in the feedforward sweep, and a temporally later dynamics of inverse processes reconstructing low level visual information from a high level latent representation in the feedback sweep. Our results add to previous studies on neural feedback processes by presenting new insight into the algorithmic function and the information carried by the feedback processes in the ventral visual pathway.
Collapse
Affiliation(s)
- Haider Al-Tahan
- Department of Computer Science, The University of Western Ontario, London, Ontario, Canada
- Brain and Mind Institute, The University of Western Ontario, London, Ontario, Canada
| | - Yalda Mohsenzadeh
- Department of Computer Science, The University of Western Ontario, London, Ontario, Canada
- Brain and Mind Institute, The University of Western Ontario, London, Ontario, Canada
| |
Collapse
|
43
|
Tidare J, Leon M, Astrand E. Time-resolved estimation of strength of motor imagery representation by multivariate EEG decoding. J Neural Eng 2021; 18. [PMID: 33264756 DOI: 10.1088/1741-2552/abd007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2020] [Accepted: 12/02/2020] [Indexed: 11/11/2022]
Abstract
Objective: Multivariate decoding enables access to information encoded in multiple brain activity features with high temporal resolution. However, whether the strength with which this information is represented in the brain can be extracted across time within single trials remains largely unexplored. Approach: In this study, we addressed this question by applying a support vector machine (SVM) to extract motor imagery (MI) representations from electroencephalogram (EEG) data, and by performing time-resolved single-trial analyses of the multivariate decoding. EEG was recorded from a group of healthy participants during MI of opening and closing of the same hand. Main results: Cross-temporal decoding revealed both dynamic and stationary MI-relevant features during the task. Specifically, features representing MI evolved dynamically early in the trial and later stabilized into a stationary network of MI features. Using a hierarchical genetic algorithm for selection of MI-relevant features, we identified primarily contralateral alpha and beta frequency features over the sensorimotor and parieto-occipital cortices as stationary, which extended into a bilateral pattern in the later part of the trial. During the stationary encoding of MI, by extracting the SVM prediction scores, we analyzed MI-relevant EEG activity patterns with respect to the temporal dynamics within single trials. We show that the SVM prediction score correlates with the amplitude of univariate MI-relevant features (as documented in an extensive repertoire of previous MI studies) within single trials, strongly suggesting that these are functional variations of MI strength hidden in trial averages. Significance: Our work demonstrates a powerful approach for estimating MI strength continually within single trials, with far-reaching impact for single-trial analyses. In terms of MI neurofeedback for motor rehabilitation, these results set the ground for more refined neurofeedback reflecting the strength of MI that can be provided to patients continually in time.
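The single-trial strength estimate rests on the classifier's continuous output rather than its binary label. Below is a sketch on stand-in data: a linear SVM is trained per time point, and its signed decision value for each held-out trial gives a time-resolved estimate of MI strength.

```python
# Single-trial prediction-score sketch: the distance of a held-out trial to the
# SVM hyperplane, per time point, serves as a continuous strength estimate.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(7)
n_trials, n_features, n_times = 160, 40, 80     # e.g. band-power EEG features
X = rng.normal(size=(n_trials, n_features, n_times))
y = rng.integers(0, 2, size=n_trials)           # open vs. close imagery

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)

# One score per held-out trial and time point.
scores = np.stack([
    LinearSVC(max_iter=5000).fit(Xtr[:, :, t], ytr).decision_function(Xte[:, :, t])
    for t in range(n_times)
], axis=1)                                      # shape: (n_test_trials, n_times)

# The within-trial time course of `scores` tracks fluctuations in MI strength.
print("single-trial score matrix:", scores.shape)
```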
Collapse
Affiliation(s)
- Jonatan Tidare
- School of Innovation, Design, and Engineering, Mälardalen University, Högskoleplan 1, 722 20, Västerås, Sweden
| | - Miguel Leon
- School of Innovation, Design, and Engineering, Mälardalen University, Högskoleplan 1, 722 20, Västerås, Sweden
| | - Elaine Astrand
- School of Innovation, Design, and Engineering, Mälardalen University, Högskoleplan 1, 722 20, Västerås, Sweden
| |
Collapse
|
44
|
Causal Evidence for a Double Dissociation between Object- and Scene-Selective Regions of Visual Cortex: A Preregistered TMS Replication Study. J Neurosci 2020; 41:751-756. [PMID: 33262244 DOI: 10.1523/jneurosci.2162-20.2020] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2020] [Revised: 10/23/2020] [Accepted: 10/26/2020] [Indexed: 12/15/2022] Open
Abstract
Natural scenes are characterized by individual objects as well as by global scene properties such as spatial layout. Functional neuroimaging research has shown that this distinction between object and scene processing is one of the main organizing principles of human high-level visual cortex. For example, object-selective regions, including the lateral occipital complex (LOC), were shown to represent object content (but not scene layout), while scene-selective regions, including the occipital place area (OPA), were shown to represent scene layout (but not object content). Causal evidence for a double dissociation between LOC and OPA in representing objects and scenes is currently limited, however. One TMS experiment, conducted in a relatively small sample (N = 13), reported an interaction between LOC and OPA stimulation and object and scene recognition performance (Dilks et al., 2013). Here, we present a high-powered preregistered replication of this study (N = 72, including male and female human participants), using group-average fMRI coordinates to target LOC and OPA. Results revealed unambiguous evidence for a double dissociation between LOC and OPA: relative to vertex stimulation, TMS over LOC selectively impaired the recognition of objects, while TMS over OPA selectively impaired the recognition of scenes. Furthermore, we found that these effects were stable over time and consistent across individual objects and scenes. These results show that LOC and OPA can be reliably and selectively targeted with TMS, even when defined based on group-average fMRI coordinates. More generally, they support the distinction between object and scene processing as an organizing principle of human high-level visual cortex. SIGNIFICANCE STATEMENT: Our daily-life environments are characterized both by individual objects and by global scene properties. The distinction between object and scene processing features prominently in visual cognitive neuroscience, with fMRI studies showing that this distinction is one of the main organizing principles of human high-level visual cortex. However, causal evidence for the selective involvement of object- and scene-selective regions in processing their preferred category is less conclusive. Here, testing a large sample (N = 72) using an established paradigm and a preregistered protocol, we found that TMS over object-selective cortex (lateral occipital complex) selectively impaired object recognition, while TMS over scene-selective cortex (occipital place area) selectively impaired scene recognition. These results provide strong causal evidence for the distinction between object and scene processing in human visual cortex.
Collapse
|
45
|
Vlcek K, Fajnerova I, Nekovarova T, Hejtmanek L, Janca R, Jezdik P, Kalina A, Tomasek M, Krsek P, Hammer J, Marusic P. Mapping the Scene and Object Processing Networks by Intracranial EEG. Front Hum Neurosci 2020; 14:561399. [PMID: 33192393 PMCID: PMC7581859 DOI: 10.3389/fnhum.2020.561399] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2020] [Accepted: 09/02/2020] [Indexed: 11/13/2022] Open
Abstract
Human perception and cognition are based predominantly on visual information processing. Much of the information regarding neuronal correlates of visual processing has been derived from functional imaging studies, which have identified a variety of brain areas contributing to visual analysis, recognition, and processing of objects and scenes. However, only two of these areas, namely the parahippocampal place area (PPA) and the lateral occipital complex (LOC), were verified and further characterized by intracranial electroencephalogram (iEEG). iEEG is a unique measurement technique that samples a local neuronal population with high temporal and anatomical resolution. In the present study, we aimed to expand on previous reports and examine brain activity for selectivity of scenes and objects in the broadband high-gamma frequency range (50–150 Hz). We collected iEEG data from 27 epileptic patients while they watched a series of images, containing objects and scenes, and we identified 375 bipolar channels responding to at least one of these two categories. Using K-means clustering, we delineated their brain localization. In addition to the two areas described previously, we detected significant responses in two other scene-selective areas, not yet reported by any electrophysiological studies; namely the occipital place area (OPA) and the retrosplenial complex. Moreover, using iEEG we revealed a much broader network underlying visual processing than that described to date, using specialized functional imaging experimental designs. Here, we report the selective brain areas for scene processing include the posterior collateral sulcus and the anterior temporal region, which were already shown to be related to scene novelty and landmark naming. The object-selective responses appeared in the parietal, frontal, and temporal regions connected with tool use and object recognition. The temporal analyses specified the time course of the category selectivity through the dorsal and ventral visual streams. The receiver operating characteristic analyses identified the PPA and the fusiform portion of the LOC as being the most selective for scenes and objects, respectively. Our findings represent a valuable overview of visual processing selectivity for scenes and objects based on iEEG analyses and thus, contribute to a better understanding of visual processing in the human brain.
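Two of the channel-level analyses lend themselves to a short sketch on stand-in data: K-means clustering of per-channel response profiles (simplified here to mean scene and object responses, whereas the study clustered anatomical locations) and an ROC analysis of each channel's category selectivity.

```python
# Channel-level sketch with assumed data: cluster bipolar channels by their
# mean high-gamma response to scenes vs. objects, and quantify each channel's
# category selectivity with an ROC analysis.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(8)
n_channels, n_trials = 375, 120
# Stand-in high-gamma (50-150 Hz) responses; half the trials are scenes.
is_scene = np.repeat([1, 0], n_trials // 2)
responses = rng.normal(size=(n_channels, n_trials))

# Per-channel selectivity profile: mean response to scenes and to objects.
profile = np.stack([responses[:, is_scene == 1].mean(axis=1),
                    responses[:, is_scene == 0].mean(axis=1)], axis=1)
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(profile)

# ROC AUC per channel: 0.5 = no selectivity, 1.0 = perfect scene selectivity.
auc = np.array([roc_auc_score(is_scene, responses[ch])
                for ch in range(n_channels)])
print("most scene-selective channel:", int(np.argmax(auc)))
```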
Collapse
Affiliation(s)
- Kamil Vlcek
- Department of Neurophysiology of Memory, Institute of Physiology, Czech Academy of Sciences, Prague, Czechia
| | - Iveta Fajnerova
- Department of Neurophysiology of Memory, Institute of Physiology, Czech Academy of Sciences, Prague, Czechia.,National Institute of Mental Health, Prague, Czechia
| | - Tereza Nekovarova
- Department of Neurophysiology of Memory, Institute of Physiology, Czech Academy of Sciences, Prague, Czechia.,National Institute of Mental Health, Prague, Czechia
| | - Lukas Hejtmanek
- Department of Neurophysiology of Memory, Institute of Physiology, Czech Academy of Sciences, Prague, Czechia
| | - Radek Janca
- Department of Circuit Theory, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czechia
| | - Petr Jezdik
- Department of Circuit Theory, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czechia
| | - Adam Kalina
- Department of Neurology, Second Faculty of Medicine, Charles University and Motol University Hospital, Prague, Czechia
| | - Martin Tomasek
- Department of Neurosurgery, Second Faculty of Medicine, Charles University and Motol University Hospital, Prague, Czechia
| | - Pavel Krsek
- Department of Paediatric Neurology, Second Faculty of Medicine, Charles University and Motol University Hospital, Prague, Czechia
| | - Jiri Hammer
- Department of Neurology, Second Faculty of Medicine, Charles University and Motol University Hospital, Prague, Czechia
| | - Petr Marusic
- Department of Neurology, Second Faculty of Medicine, Charles University and Motol University Hospital, Prague, Czechia
| |
Collapse
|
46
|
Zhang X, Yao L, Wang X, Monaghan JJM, Mcalpine D, Zhang Y. A survey on deep learning-based non-invasive brain signals: recent advances and new frontiers. J Neural Eng 2020; 18. [PMID: 33171452 DOI: 10.1088/1741-2552/abc902] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2020] [Accepted: 11/10/2020] [Indexed: 12/25/2022]
Abstract
Brain signals refer to the biometric information collected from the human brain. The research on brain signals aims to discover the underlying neurological or physical status of individuals by signal decoding. The emerging deep learning techniques have improved the study of brain signals significantly in recent years. In this work, we first present a taxonomy of non-invasive brain signals and the basics of deep learning algorithms. Then, we provide a comprehensive survey of the frontiers of applying deep learning for non-invasive brain signals analysis, by summarizing a large number of recent publications. Moreover, upon the deep learning-powered brain signal studies, we report the potential real-world applications which benefit not only disabled people but also normal individuals. Finally, we discuss the open challenges and future directions.
Collapse
Affiliation(s)
- Xiang Zhang
- Harvard University, Cambridge, Massachusetts, UNITED STATES
| | - Lina Yao
- University of New South Wales, Sydney, New South Wales, AUSTRALIA
| | - Xianzhi Wang
- Faculty of Engineering and IT, University of Technology Sydney, 81 Broadway, Ultimo, Sydney, New South Wales, 2007, AUSTRALIA
| | | | - David Mcalpine
- Macquarie University, Sydney, New South Wales, AUSTRALIA
| | - Yu Zhang
- Stanford University, Stanford, California, 94305-6104, UNITED STATES
| |
Collapse
|
47
|
Multilayer perceptron based deep neural network for early detection of coronary heart disease. HEALTH AND TECHNOLOGY 2020. [DOI: 10.1007/s12553-020-00509-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
48
|
Děchtěrenko F, Lukavský J, Štipl J. False memories for scenes using the DRM paradigm. Vision Res 2020; 178:48-59. [PMID: 33113436 DOI: 10.1016/j.visres.2020.09.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2019] [Revised: 09/25/2020] [Accepted: 09/29/2020] [Indexed: 11/24/2022]
Abstract
People are remarkably good at remembering photographs. To further investigate the nature of the stored representations and the fidelity of human memories, it would be useful to evaluate the visual similarity of stimuli presented in experiments. Here, we explored the use of convolutional neural networks (CNNs) as a measure of the perceptual or representational similarity of visual scenes in visual memory research. In Experiment 1, we presented participants with sets of nine images from the same scene category and tested whether they were able to detect the most distant scene in the image space defined by the CNN. Experiment 2 was a visual variant of the Deese-Roediger-McDermott paradigm. We asked participants to remember a set of photographs from the same scene category. The photographs were preselected based on their distance to a particular visual prototype (defined as the centroid of the image space). In the recognition test, we observed higher false alarm rates for scenes closer to this visual prototype. Our findings show that similarity as measured by a CNN is reflected in human behavior: people can detect odd-one-out scenes and can be lured into false alarms by similar stimuli. This method can be used in further studies of visual memory for complex scenes.
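The prototype-and-distance logic described here can be sketched in a few lines: embed images with a pretrained CNN, take the centroid of the embeddings as the visual prototype, and rank images by their distance to it. The backbone, preprocessing, and file names below are assumptions, not the authors' exact setup.

```python
# Sketch of CNN-based scene similarity: embed images, define the category
# "prototype" as the embedding centroid, and rank images by distance to it.
# The backbone choice and the image paths are illustrative assumptions.
import torch
import torchvision.models as models
from PIL import Image

weights = models.ResNet18_Weights.DEFAULT
backbone = models.resnet18(weights=weights)
backbone.fc = torch.nn.Identity()      # drop the classifier, keep 512-d features
backbone.eval()
preprocess = weights.transforms()

@torch.no_grad()
def embed(paths):
    imgs = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    return backbone(imgs)              # (n_images, 512)

# Hypothetical photographs from one scene category.
paths = ["scene_01.jpg", "scene_02.jpg", "scene_03.jpg"]
feats = embed(paths)
centroid = feats.mean(dim=0)                        # the visual prototype
dists = torch.linalg.norm(feats - centroid, dim=1)
print("most prototypical:", paths[int(dists.argmin())])
print("odd one out:", paths[int(dists.argmax())])
```

Under this scheme, the "odd-one-out" stimulus of Experiment 1 corresponds to the image with the largest distance, and the lure strength of Experiment 2 grows as distance to the centroid shrinks.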
Collapse
Affiliation(s)
- Filip Děchtěrenko
- Institute of Psychology, Czech Academy of Sciences, Hybernská 8, 110 00 Prague, Czech Republic; Faculty of Arts, Charles University, Celetná 20, 110 00 Prague, Czech Republic.
| | - Jiří Lukavský
- Institute of Psychology, Czech Academy of Sciences, Hybernská 8, 110 00 Prague, Czech Republic; Faculty of Arts, Charles University, Celetná 20, 110 00 Prague, Czech Republic
| | - Jiří Štipl
- Faculty of Arts, Charles University, Celetná 20, 110 00 Prague, Czech Republic
| |
Collapse
|
49
|
Cai J, Feng J, Wang J, Zhao Y. Quasi-synchronization of neural networks with diffusion effects via intermittent control of regional division. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.05.037] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
50
|
Kaiser D, Inciuraite G, Cichy RM. Rapid contextualization of fragmented scene information in the human visual system. Neuroimage 2020; 219:117045. [PMID: 32540354 DOI: 10.1016/j.neuroimage.2020.117045] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2020] [Revised: 04/24/2020] [Accepted: 06/09/2020] [Indexed: 10/24/2022] Open
Abstract
Real-world environments are extremely rich in visual information. At any given moment, only a fraction of this information is available to the eyes and the brain, rendering naturalistic vision a collection of incomplete snapshots. Previous research suggests that, in order to successfully contextualize this fragmented information, the visual system sorts inputs according to spatial schemata, that is, knowledge about the typical composition of the visual world. Here, we used a large set of 840 different natural scene fragments to investigate whether this sorting mechanism can operate across the diverse visual environments encountered during real-world vision. We recorded brain activity using electroencephalography (EEG) while participants viewed incomplete scene fragments at fixation. Using representational similarity analysis (RSA) on the EEG data, we tracked the fragments' cortical representations across time. We found that the fragments' typical vertical location within the environment (top or bottom) predicted their cortical representations, indexing a sorting of information according to spatial schemata. The fragments' cortical representations were most strongly organized by their vertical location at around 200 ms after image onset, suggesting rapid perceptual sorting of information according to spatial schemata. In control analyses, we show that this sorting is flexible with respect to visual features: it is explained neither by commonalities between visually similar indoor and outdoor scenes nor by the feature organization emerging from a deep neural network trained on scene categorization. Demonstrating such flexible sorting across a wide range of visually diverse scenes suggests a contextualization mechanism suitable for complex and variable real-world environments.
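A time-resolved RSA of the kind used here can be sketched as follows: build a neural representational dissimilarity matrix (RDM) at each time point and correlate it with a model RDM coding the fragments' vertical location (top vs. bottom). All array shapes and labels below are illustrative assumptions, not the study's data.

```python
# Minimal time-resolved RSA sketch: correlate a neural RDM at each time point
# with a model RDM coding vertical location. Shapes and labels are assumed.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_items, n_sensors, n_times = 20, 64, 100
# hypothetical item x sensor x time EEG patterns (e.g., trial averages)
eeg = rng.standard_normal((n_items, n_sensors, n_times))
# model RDM: 0 if two fragments share a vertical location, 1 otherwise
location = np.r_[np.zeros(10), np.ones(10)]          # top / bottom labels
model_rdm = pdist(location[:, None], metric="hamming")

rsa_timecourse = np.empty(n_times)
for t in range(n_times):
    # condensed pairwise correlation distances between item patterns at time t
    neural_rdm = pdist(eeg[:, :, t], metric="correlation")
    rho, _ = spearmanr(neural_rdm, model_rdm)
    rsa_timecourse[t] = rho

print("peak model correlation at time index", rsa_timecourse.argmax())
```

On real data, the peak of this timecourse is what the authors report at around 200 ms after image onset.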
Collapse
Affiliation(s)
- Daniel Kaiser
- Department of Psychology, University of York, York, UK.
| | - Gabriele Inciuraite
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
| | - Radoslaw M Cichy
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany; Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin, Germany; Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany
| |
Collapse
|