1. Dado T, Papale P, Lozano A, Le L, Wang F, van Gerven M, Roelfsema P, Güçlütürk Y, Güçlü U. Brain2GAN: Feature-disentangled neural encoding and decoding of visual perception in the primate brain. PLoS Comput Biol 2024; 20:e1012058. [PMID: 38709818; PMCID: PMC11098503; DOI: 10.1371/journal.pcbi.1012058]
Abstract
A challenging goal of neural coding is to characterize the neural representations underlying visual perception. To this end, multi-unit activity (MUA) of macaque visual cortex was recorded in a passive fixation task upon presentation of faces and natural images. We analyzed the relationship between MUA and latent representations of state-of-the-art deep generative models, including the conventional and feature-disentangled representations of generative adversarial networks (GANs) (i.e., the z- and w-latents of StyleGAN, respectively) and the language-contrastive representations of latent diffusion networks (i.e., the CLIP-latents of Stable Diffusion). A mass univariate neural encoding analysis of the latent representations showed that the feature-disentangled w representations outperform both the z and CLIP representations in explaining neural responses. Further, w-latent features were found to be positioned at the higher end of the complexity gradient, which indicates that they capture visual information relevant to high-level neural activity. Subsequently, a multivariate neural decoding analysis of the feature-disentangled representations resulted in state-of-the-art spatiotemporal reconstructions of visual perception. Taken together, our results not only highlight the important role of feature disentanglement in shaping high-level neural representations underlying visual perception but also serve as an important benchmark for the future of neural coding.
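The mass univariate encoding analysis described above can be sketched in a few lines: regress latent features onto per-site responses with ridge regression, then score each site with a Pearson correlation on held-out data. This is a toy illustration on synthetic data; all sizes, the ridge penalty, and names (`W_train`, `Y_train`, `lam`) are hypothetical, not the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: low-dimensional "latents" for 400 training and 50
# test images, and responses of 100 recording sites (all sizes hypothetical).
n_train, n_test, n_lat, n_sites = 400, 50, 64, 100
W_train = rng.standard_normal((n_train, n_lat))
W_test = rng.standard_normal((n_test, n_lat))
B_true = rng.standard_normal((n_lat, n_sites)) / np.sqrt(n_lat)
Y_train = W_train @ B_true + 0.1 * rng.standard_normal((n_train, n_sites))
Y_test = W_test @ B_true + 0.1 * rng.standard_normal((n_test, n_sites))

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression: B = (X'X + lam*I)^-1 X'Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

B_hat = ridge_fit(W_train, Y_train)
Y_pred = W_test @ B_hat

# Mass univariate evaluation: one Pearson r per recording site.
def columnwise_r(A, B):
    A = A - A.mean(0)
    B = B - B.mean(0)
    return (A * B).sum(0) / (np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0))

site_r = columnwise_r(Y_test, Y_pred)
print(round(float(site_r.mean()), 3))
```

Ridge (rather than plain least squares) is the usual choice here because real latent dimensions far outnumber the trials available per recording site.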
Affiliation(s)
- Thirza Dado
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Paolo Papale
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
- Antonio Lozano
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
- Lynn Le
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Feng Wang
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
- Marcel van Gerven
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Pieter Roelfsema
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
- Laboratory of Visual Brain Therapy, Sorbonne University, Paris, France
- Department of Integrative Neurophysiology, VU Amsterdam, Amsterdam, Netherlands
- Department of Psychiatry, Amsterdam UMC, Amsterdam, Netherlands
- Yağmur Güçlütürk
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Umut Güçlü
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
2. Liu P, Bo K, Ding M, Fang R. Emergence of Emotion Selectivity in Deep Neural Networks Trained to Recognize Visual Objects. bioRxiv 2024:2023.04.16.537079. [PMID: 37163104; PMCID: PMC10168209; DOI: 10.1101/2023.04.16.537079]
Abstract
Recent neuroimaging studies have shown that the visual cortex plays an important role in representing the affective significance of visual input. The origin of these affect-specific visual representations is debated: are they intrinsic to the visual system, or do they arise through reentry from frontal emotion-processing structures such as the amygdala? We examined this problem by combining convolutional neural network (CNN) models of the human ventral visual cortex pre-trained on ImageNet with two datasets of affective images. Our results show that (1) in all layers of the CNN models, there were artificial neurons that responded consistently and selectively to neutral, pleasant, or unpleasant images, and (2) lesioning these neurons by setting their output to 0, or enhancing them by increasing their gain, led to decreased or increased emotion recognition performance, respectively. These results support the idea that the visual system may have the intrinsic ability to represent the affective significance of visual input and suggest that CNNs offer a fruitful platform for testing neuroscientific theories.
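The lesioning and gain manipulations can be illustrated on a toy network. Everything below is a hypothetical stand-in (a random ReLU layer with a least-squares readout on synthetic two-class data), not the paper's ImageNet-trained CNNs; it only shows the mechanics of selecting class-selective units, zeroing their output, or scaling their gain.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic "image" classes separated along a random signal direction.
n, d, h = 400, 40, 64
signal = 0.4 * rng.standard_normal(d)
labels = rng.integers(0, 2, size=n)
X = rng.standard_normal((n, d)) + np.outer(2 * labels - 1, signal)

W1 = rng.standard_normal((d, h)) / np.sqrt(d)
H = np.maximum(X @ W1, 0.0)                       # hidden-layer activations

# Least-squares readout trained on the hidden layer.
w_out, *_ = np.linalg.lstsq(H, (2 * labels - 1).astype(float), rcond=None)

def accuracy(Hmat):
    return float(((Hmat @ w_out > 0).astype(int) == labels).mean())

# "Selective" units: largest absolute class difference in mean activation.
diff = np.abs(H[labels == 1].mean(0) - H[labels == 0].mean(0))
sel = np.argsort(diff)[-48:]                      # top 48 of 64 units

H_lesion = H.copy()
H_lesion[:, sel] = 0.0                            # lesion: zero their output
H_gain = H.copy()
H_gain[:, sel] *= 2.0                             # enhance: double their gain
print(accuracy(H), accuracy(H_lesion), accuracy(H_gain))
```

With the 48 most selective of 64 units silenced, the readout loses most of its class signal, so lesioned accuracy falls below baseline; the effect of the gain manipulation depends on how much headroom the baseline leaves.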
Affiliation(s)
- Peng Liu
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
| | - Ke Bo
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
| | - Mingzhou Ding
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
| | - Ruogu Fang
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Center for Cognitive Aging and Memory, McKnight Brain Institute, University of Florida, Gainesville, FL, USA
| |
3. Liu P, Bo K, Ding M, Fang R. Emergence of Emotion Selectivity in Deep Neural Networks Trained to Recognize Visual Objects. PLoS Comput Biol 2024; 20:e1011943. [PMID: 38547053; PMCID: PMC10977720; DOI: 10.1371/journal.pcbi.1011943]
Abstract
Recent neuroimaging studies have shown that the visual cortex plays an important role in representing the affective significance of visual input. The origin of these affect-specific visual representations is debated: are they intrinsic to the visual system, or do they arise through reentry from frontal emotion-processing structures such as the amygdala? We examined this problem by combining convolutional neural network (CNN) models of the human ventral visual cortex pre-trained on ImageNet with two datasets of affective images. Our results show that in all layers of the CNN models, there were artificial neurons that responded consistently and selectively to neutral, pleasant, or unpleasant images, and that lesioning these neurons by setting their output to zero, or enhancing them by increasing their gain, led to decreased or increased emotion recognition performance, respectively. These results support the idea that the visual system may have the intrinsic ability to represent the affective significance of visual input and suggest that CNNs offer a fruitful platform for testing neuroscientific theories.
Affiliation(s)
- Peng Liu
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
| | - Ke Bo
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
| | - Mingzhou Ding
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
| | - Ruogu Fang
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
- Center for Cognitive Aging and Memory, McKnight Brain Institute, University of Florida, Gainesville, Florida, United States of America
| |
4. von Seth J, Nicholls VI, Tyler LK, Clarke A. Recurrent connectivity supports higher-level visual and semantic object representations in the brain. Commun Biol 2023; 6:1207. [PMID: 38012301; PMCID: PMC10682037; DOI: 10.1038/s42003-023-05565-9]
Abstract
Visual object recognition has traditionally been conceptualised as a predominantly feedforward process through the ventral visual pathway. While feedforward artificial neural networks (ANNs) can achieve human-level classification on some image-labelling tasks, it is unclear whether computational models of vision alone can accurately capture the evolving spatiotemporal neural dynamics. Here, we probe these dynamics using a combination of representational similarity and connectivity analyses of fMRI and MEG data recorded during the recognition of familiar, unambiguous objects. Modelling the visual and semantic properties of our stimuli using an artificial neural network as well as a semantic feature model, we find that unique aspects of the neural architecture and connectivity dynamics relate to visual and semantic object properties. Critically, we show that recurrent processing between the anterior and posterior ventral temporal cortex relates to higher-level visual properties prior to semantic object properties, in addition to semantic-related feedback from the frontal lobe to the ventral temporal lobe between 250 and 500 ms after stimulus onset. These results demonstrate the distinct contribution made by semantic object properties in explaining neural activity and connectivity, highlighting them as a core part of object recognition not fully accounted for by current biologically inspired neural networks.
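The representational similarity component of such an analysis reduces to comparing dissimilarity structures. The sketch below builds RDMs (1 − Pearson r between condition patterns) for a hypothetical model feature space and a noisy "neural" pattern set that share an underlying structure, then compares their upper triangles with a Spearman correlation; all sizes and names are illustrative assumptions, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(2)

# 20 stimuli; model features and noisy "neural" patterns derived from the
# same latent structure, so their RDMs should agree.
n_stim = 20
latent = rng.standard_normal((n_stim, 8))
model_feat = latent @ rng.standard_normal((8, 30))
neural_pat = latent @ rng.standard_normal((8, 50)) + 0.5 * rng.standard_normal((n_stim, 50))

def rdm(X):
    """Representational dissimilarity matrix: 1 - Pearson r between rows."""
    Xc = X - X.mean(1, keepdims=True)
    Xn = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)
    return 1.0 - Xn @ Xn.T

def upper(D):
    i, j = np.triu_indices_from(D, k=1)
    return D[i, j]

def spearman(a, b):
    # Rank both vectors, then take the Pearson correlation of the ranks.
    ra, rb = a.argsort().argsort(), b.argsort().argsort()
    return float(np.corrcoef(ra, rb)[0, 1])

rsa_r = spearman(upper(rdm(model_feat)), upper(rdm(neural_pat)))
print(round(rsa_r, 3))
```

Comparing only the upper triangles avoids the trivial zero diagonal and the duplicated symmetric entries.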
Affiliation(s)
- Jacqueline von Seth
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
| | | | - Lorraine K Tyler
- Department of Psychology, University of Cambridge, Cambridge, UK
- Cambridge Centre for Ageing and Neuroscience (Cam-CAN), University of Cambridge and MRC Cognition and Brain Sciences Unit, Cambridge, UK
| | - Alex Clarke
- Department of Psychology, University of Cambridge, Cambridge, UK.
| |
5. Karapetian A, Boyanova A, Pandaram M, Obermayer K, Kietzmann TC, Cichy RM. Empirically Identifying and Computationally Modeling the Brain-Behavior Relationship for Human Scene Categorization. J Cogn Neurosci 2023; 35:1879-1897. [PMID: 37590093; PMCID: PMC10586810; DOI: 10.1162/jocn_a_02043]
Abstract
Humans effortlessly make quick and accurate perceptual decisions about the nature of their immediate visual environment, such as the category of the scene they face. Previous research has revealed a rich set of cortical representations potentially underlying this feat. However, it remains unknown which of these representations are suitably formatted for decision-making. Here, we approached this question empirically and computationally, using neuroimaging and computational modeling. For the empirical part, we collected EEG data and RTs from human participants during a scene categorization task (natural vs. man-made). We then related the EEG data to behavior using a multivariate extension of signal detection theory. We observed a correlation between neural data and behavior specifically between ∼100 msec and ∼200 msec after stimulus onset, suggesting that the neural scene representations in this time period are suitably formatted for decision-making. For the computational part, we evaluated a recurrent convolutional neural network (RCNN) as a model of brain and behavior. Unifying our previous observations in an image-computable model, the RCNN predicted the neural representations, the behavioral scene categorization data, and the relationship between them well. Our results identify and computationally characterize the neural and behavioral correlates of scene categorization in humans.
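A multivariate extension of signal detection theory of the kind described above treats the distance of each trial's pattern from a linear decision boundary as neural "evidence" and relates it to RTs. The sketch below fabricates EEG-like data in which evidence varies across trials and RTs are faster on high-evidence trials, then recovers the expected negative distance-RT correlation; every size and parameter here is an illustrative assumption, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Per-trial "EEG" patterns for two scene categories, with trial-to-trial
# variation in signal strength, plus RTs that shrink with evidence.
n_trials, n_chan = 300, 32
labels = rng.integers(0, 2, n_trials)             # e.g. natural vs. man-made
w_true = rng.standard_normal(n_chan)
evidence = rng.uniform(0.5, 2.0, n_trials)        # per-trial signal strength
X = np.outer((2 * labels - 1) * evidence, w_true) + rng.standard_normal((n_trials, n_chan))
rt = 600 - 80 * evidence + 20 * rng.standard_normal(n_trials)   # ms

# Linear discriminant from the class-mean difference; each trial's distance
# to the decision boundary is its multivariate "decision value".
w = X[labels == 1].mean(0) - X[labels == 0].mean(0)
dist = np.abs(X @ w) / np.linalg.norm(w)

r = float(np.corrcoef(dist, rt)[0, 1])
print(round(r, 3))   # expected: negative (farther from the bound -> faster RT)
```

In real data the discriminant would be fit with cross-validation so that the decision values are not overfit to the trials being correlated with behavior.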
Affiliation(s)
- Agnessa Karapetian
- Freie Universität Berlin, Germany
- Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Germany
| | | | | | - Klaus Obermayer
- Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Germany
- Technische Universität Berlin, Germany
- Humboldt-Universität zu Berlin, Germany
| | | | - Radoslaw M Cichy
- Freie Universität Berlin, Germany
- Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Germany
- Humboldt-Universität zu Berlin, Germany
| |
6. Doerig A, Sommers RP, Seeliger K, Richards B, Ismael J, Lindsay GW, Kording KP, Konkle T, van Gerven MAJ, Kriegeskorte N, Kietzmann TC. The neuroconnectionist research programme. Nat Rev Neurosci 2023. [PMID: 37253949; DOI: 10.1038/s41583-023-00705-w]
Abstract
Artificial neural networks (ANNs) inspired by biology are beginning to be widely used to model behavioural and neural data, an approach we call 'neuroconnectionism'. ANNs have not only been lauded as the current best models of information processing in the brain but also criticized for failing to account for basic cognitive functions. In this Perspective article, we propose that arguing about the successes and failures of a restricted set of current ANNs is the wrong approach to assessing the promise of neuroconnectionism for brain science. Instead, we take inspiration from the philosophy of science, and in particular from Lakatos, who showed that the core of a scientific research programme is often not directly falsifiable but should be assessed by its capacity to generate novel insights. Following this view, we present neuroconnectionism as a general research programme centred around ANNs as a computational language for expressing falsifiable theories about brain computation. We describe the core of the programme, the underlying computational framework and its tools for testing specific neuroscientific hypotheses and deriving novel understanding. Taking a longitudinal view, we review past and present neuroconnectionist projects and their responses to challenges, and argue that the research programme is highly progressive, generating new and otherwise unreachable insights into the workings of the brain.
Affiliation(s)
- Adrien Doerig
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany.
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands.
| | - Rowan P Sommers
- Department of Neurobiology of Language, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
| | - Katja Seeliger
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| | - Blake Richards
- Department of Neurology and Neurosurgery, McGill University, Montréal, QC, Canada
- School of Computer Science, McGill University, Montréal, QC, Canada
- Mila, Montréal, QC, Canada
- Montréal Neurological Institute, Montréal, QC, Canada
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
| | | | | | - Konrad P Kording
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Bioengineering, Neuroscience, University of Pennsylvania, Pennsylvania, PA, USA
| | | | | | | | - Tim C Kietzmann
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
| |
7. Dynamic speaker localization based on a novel lightweight R–CNN model. Neural Comput Appl 2023. [DOI: 10.1007/s00521-023-08251-3]
8. Gifford AT, Dwivedi K, Roig G, Cichy RM. A large and rich EEG dataset for modeling human visual object recognition. Neuroimage 2022; 264:119754. [PMID: 36400378; PMCID: PMC9771828; DOI: 10.1016/j.neuroimage.2022.119754]
Abstract
The human brain achieves visual object recognition through multiple stages of linear and nonlinear transformations operating at a millisecond scale. To predict and explain these rapid transformations, computational neuroscientists employ machine learning modeling techniques. However, state-of-the-art models require massive amounts of data to train properly, and to date there is a lack of large brain datasets that extensively sample the temporal dynamics of visual object recognition. Here we collected a large and rich dataset of high temporal resolution EEG responses to images of objects on a natural background. This dataset includes 10 participants, each with 82,160 trials spanning 16,740 image conditions. Through computational modeling we established the quality of this dataset in five ways. First, we trained linearizing encoding models that successfully synthesized the EEG responses to arbitrary images. Second, we correctly identified the image conditions of the recorded EEG data in a zero-shot fashion, using EEG responses synthesized for hundreds of thousands of candidate image conditions. Third, we show that both the high number of conditions and the trial repetitions of the EEG dataset contribute to the trained models' prediction accuracy. Fourth, we built encoding models whose predictions generalize well to novel participants. Fifth, we demonstrate full end-to-end training of randomly initialized DNNs that output EEG responses for arbitrary input images. We release this dataset as a tool to foster research in visual neuroscience and computer vision.
Affiliation(s)
- Alessandro T. Gifford
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany,Einstein Center for Neurosciences Berlin, Charité - Universitätsmedizin Berlin, Berlin, Germany,Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany,Corresponding author.
| | - Kshitij Dwivedi
- Department of Computer Science, Goethe Universität, Frankfurt am Main, Germany
| | - Gemma Roig
- Department of Computer Science, Goethe Universität, Frankfurt am Main, Germany
| | - Radoslaw M. Cichy
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany,Einstein Center for Neurosciences Berlin, Charité - Universitätsmedizin Berlin, Berlin, Germany,Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany,Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin, Germany
| |
9. Xu H, Liu M, Zhang D. How does the brain represent the semantic content of an image? Neural Netw 2022; 154:31-42. [DOI: 10.1016/j.neunet.2022.06.034]
10. Armeni K, Güçlü U, van Gerven M, Schoffelen JM. A 10-hour within-participant magnetoencephalography narrative dataset to test models of language comprehension. Sci Data 2022; 9:278. [PMID: 35676293; PMCID: PMC9177538; DOI: 10.1038/s41597-022-01382-7]
Abstract
Recently, cognitive neuroscientists have increasingly studied brain responses to narratives. At the same time, we are witnessing exciting developments in natural language processing, where large-scale neural network models can be used to instantiate cognitive hypotheses about narrative processing. Yet they learn from text alone, and we lack ways of incorporating biological constraints during training. To mitigate this gap, we provide a narrative comprehension magnetoencephalography (MEG) data resource that can be used to train neural network models directly on brain data. We recorded from three participants, each completing 10 separate hour-long recording sessions, while they listened to audiobooks in English. After story listening, participants answered short questions about their experience. To minimize head movement, the participants wore MEG-compatible head casts, which immobilized their head position during recording. We report a basic evoked-response analysis showing that the responses accurately localize to primary auditory areas. The responses are robust and conserved across the 10 sessions for every participant. We also provide usage notes and briefly outline possible future uses of the resource.
Affiliation(s)
- Kristijan Armeni
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
| | - Umut Güçlü
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
| | - Marcel van Gerven
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
| | - Jan-Mathijs Schoffelen
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands.
| |
11. Karimi-Rouzbahani H, Woolgar A. When the Whole Is Less Than the Sum of Its Parts: Maximum Object Category Information and Behavioral Prediction in Multiscale Activation Patterns. Front Neurosci 2022; 16:825746. [PMID: 35310090; PMCID: PMC8924472; DOI: 10.3389/fnins.2022.825746]
Abstract
Neural codes are reflected in complex neural activation patterns. Conventional electroencephalography (EEG) decoding analyses summarize activations by averaging/down-sampling signals within the analysis window, which diminishes informative fine-grained patterns. While previous studies have proposed distinct statistical features capable of capturing variability-dependent neural codes, it has been suggested that the brain could use a combination of encoding protocols not reflected in any one mathematical feature alone. To test this, we combined 30 features using state-of-the-art supervised and unsupervised feature selection procedures (n = 17). Across three datasets, we compared decoding of visual object category between these 17 sets of combined features, and between combined and individual features. Object category could be robustly decoded using the combined features from all 17 algorithms. However, the combined features, which were equalized in dimension to the individual features, were outperformed across most time points by the multiscale feature of wavelet coefficients. Moreover, the wavelet coefficients also explained behavioral performance more accurately than the combined features. These results suggest that a single but multiscale encoding protocol may capture the EEG neural codes better than any combination of protocols. Our findings put new constraints on models of neural information encoding in EEG.
Affiliation(s)
- Hamid Karimi-Rouzbahani
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Department of Cognitive Science, Perception in Action Research Centre, Macquarie University, Sydney, NSW, Australia
- Department of Computing, Macquarie University, Sydney, NSW, Australia
| | - Alexandra Woolgar
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Department of Cognitive Science, Perception in Action Research Centre, Macquarie University, Sydney, NSW, Australia
| |
12.

13. Kong NCL, Margalit E, Gardner JL, Norcia AM. Increasing neural network robustness improves match to macaque V1 eigenspectrum, spatial frequency preference and predictivity. PLoS Comput Biol 2022; 18:e1009739. [PMID: 34995280; PMCID: PMC8775238; DOI: 10.1371/journal.pcbi.1009739]
Abstract
Task-optimized convolutional neural networks (CNNs) show striking similarities to the ventral visual stream. However, human-imperceptible image perturbations can cause a CNN to make incorrect predictions. Here we provide insight into this brittleness by investigating the representations of models that are either robust or not robust to image perturbations. Theory suggests that the robustness of a system to these perturbations could be related to the power law exponent of the eigenspectrum of its set of neural responses, where power law exponents closer to and larger than one would indicate a system that is less susceptible to input perturbations. We show that neural responses in mouse and macaque primary visual cortex (V1) obey the predictions of this theory, where their eigenspectra have power law exponents of at least one. We also find that the eigenspectra of model representations decay slowly relative to those observed in neurophysiology and that robust models have eigenspectra that decay slightly faster and have higher power law exponents than those of non-robust models. The slow decay of the eigenspectra suggests that substantial variance in the model responses is related to the encoding of fine stimulus features. We therefore investigated the spatial frequency tuning of artificial neurons and found that a large proportion of them preferred high spatial frequencies and that robust models had preferred spatial frequency distributions more aligned with the measured spatial frequency distribution of macaque V1 cells. Furthermore, robust models were quantitatively better models of V1 than non-robust models. Our results are consistent with other findings that there is a misalignment between human and machine perception. They also suggest that it may be useful to penalize slow-decaying eigenspectra or to bias models to extract features of lower spatial frequencies during task-optimization in order to improve robustness and V1 neural response predictivity.
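The eigenspectrum analysis above can be sanity-checked on synthetic data: draw responses whose covariance eigenvalues follow a known power law with exponent α, then recover α from the PCA eigenspectrum with a log-log linear fit. The exponent, fit range, and sizes below are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(5)

# Responses whose per-dimension variances decay as rank^(-alpha_true).
n_units, n_stim, alpha_true = 200, 5000, 1.2
variances = np.arange(1, n_units + 1, dtype=float) ** (-alpha_true)
responses = rng.standard_normal((n_stim, n_units)) * np.sqrt(variances)

# Eigenspectrum of the response covariance via PCA (SVD of centered data).
Xc = responses - responses.mean(0)
eigvals = np.linalg.svd(Xc, compute_uv=False) ** 2 / (n_stim - 1)

# Estimate the power-law exponent over an intermediate range of ranks,
# where sampling noise and edge effects are smallest.
ranks = np.arange(1, n_units + 1)
lo, hi = 4, 100
slope, _ = np.polyfit(np.log(ranks[lo:hi]), np.log(eigvals[lo:hi]), 1)
alpha_hat = -slope
print(round(float(alpha_hat), 2))   # should land near alpha_true = 1.2
```

Under this view, an estimated exponent at or above one corresponds to the fast-decaying, perturbation-resistant regime the theory describes, while exponents well below one indicate substantial variance in fine, high-frequency features.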
Affiliation(s)
- Nathan C. L. Kong
- Department of Psychology, Stanford University, Stanford, California, United States of America
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
| | - Eshed Margalit
- Neurosciences Graduate Program, Stanford University, Stanford, California, United States of America
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
| | - Justin L. Gardner
- Department of Psychology, Stanford University, Stanford, California, United States of America
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
| | - Anthony M. Norcia
- Department of Psychology, Stanford University, Stanford, California, United States of America
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
| |
14. Ribeiro FL, Bollmann S, Puckett AM. Predicting the retinotopic organization of human visual cortex from anatomy using geometric deep learning. Neuroimage 2021; 244:118624. [PMID: 34607019; DOI: 10.1016/j.neuroimage.2021.118624]
Abstract
Whether it be in a single neuron or a more complex biological system like the human brain, form and function are often directly related. The functional organization of human visual cortex, for instance, is tightly coupled with the underlying anatomy with cortical shape having been shown to be a useful predictor of the retinotopic organization in early visual cortex. Although the current state-of-the-art in predicting retinotopic maps is able to account for gross individual differences, such models are unable to account for any idiosyncratic differences in the structure-function relationship from anatomical information alone due to their initial assumption of a template. Here we developed a geometric deep learning model capable of exploiting the actual structure of the cortex to learn the complex relationship between brain function and anatomy in human visual cortex such that more realistic and idiosyncratic maps could be predicted. We show that our neural network was not only able to predict the functional organization throughout the visual cortical hierarchy, but that it was also able to predict nuanced variations across individuals. Although we demonstrate its utility for modeling the relationship between structure and function in human visual cortex, our approach is flexible and well-suited for a range of other applications involving data structured in non-Euclidean spaces.
Affiliation(s)
- Fernanda L Ribeiro
- School of Psychology, The University of Queensland, Saint Lucia, Brisbane, QLD 4072, Australia; Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia.
| | - Steffen Bollmann
- School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Alexander M Puckett
- School of Psychology, The University of Queensland, Saint Lucia, Brisbane, QLD 4072, Australia; Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia
| |
15. Hansen BC, Greene MR, Field DJ. Dynamic Electrode-to-Image (DETI) mapping reveals the human brain's spatiotemporal code of visual information. PLoS Comput Biol 2021; 17:e1009456. [PMID: 34570753; PMCID: PMC8496831; DOI: 10.1371/journal.pcbi.1009456]
Abstract
A number of neuroimaging techniques have been employed to understand how visual information is transformed along the visual pathway. Although each technique has spatial and temporal limitations, they can each provide important insights into the visual code. While the BOLD signal of fMRI can be quite informative, the visual code is not static and this can be obscured by fMRI's poor temporal resolution. In this study, we leveraged the high temporal resolution of EEG to develop an encoding technique based on the distribution of responses generated by a population of real-world scenes. This approach maps neural signals to each pixel within a given image and reveals location-specific transformations of the visual code, providing a spatiotemporal signature for the image at each electrode. Our analyses of the mapping results revealed that scenes undergo a series of nonuniform transformations that prioritize different spatial frequencies at different regions of scenes over time. This mapping technique offers a potential avenue for future studies to explore how dynamic feedforward and recurrent processes inform and refine high-level representations of our visual world.

The visual information that we sample from our environment undergoes a series of neural modifications, with each modification state (or visual code) consisting of a unique distribution of responses across neurons along the visual pathway. However, current noninvasive neuroimaging techniques provide an account of that code that is coarse with respect to time or space. Here, we present dynamic electrode-to-image (DETI) mapping, an analysis technique that capitalizes on the high temporal resolution of EEG to map neural signals to each pixel within a given image to reveal location-specific modifications of the visual code. The DETI technique reveals maps of features that are associated with the neural signal at each pixel and at each time point. DETI mapping shows that real-world scenes undergo a series of nonuniform modifications over both space and time. Specifically, we find that the visual code varies in a location-specific manner, likely reflecting that neural processing prioritizes different features at different image locations over time. DETI mapping therefore offers a potential avenue for future studies to explore how each modification state informs and refines the conceptual meaning of our visual world.
Affiliation(s)
- Bruce C. Hansen
- Colgate University, Department of Psychological & Brain Sciences, Neuroscience Program, Hamilton, New York, United States of America
- Michelle R. Greene
- Bates College, Neuroscience Program, Lewiston, Maine, United States of America
- David J. Field
- Cornell University, Department of Psychology, Ithaca, New York, United States of America
16
Shi R, Zhao Y, Cao Z, Liu C, Kang Y, Zhang J. Categorizing objects from MEG signals using EEGNet. Cogn Neurodyn 2021; 16:365-377. [PMID: 35401863 PMCID: PMC8934895 DOI: 10.1007/s11571-021-09717-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2021] [Revised: 08/09/2021] [Accepted: 09/02/2021] [Indexed: 11/25/2022] Open
Abstract
Magnetoencephalography (MEG) signals have demonstrated their practical application to reading human minds. Current neural decoding studies have made great progress in building subject-wise decoding models to extract and discriminate the temporal/spatial features in neural signals. In this paper, we used a compact convolutional neural network, EEGNet, to build a common decoder across subjects, which deciphered the categories of objects (faces, tools, animals, and scenes) from MEG data. This study investigated the influence of the spatiotemporal structure of MEG on EEGNet's classification performance. Furthermore, we replaced EEGNet's convolution layers with two sets of parallel convolution structures to extract the spatial and temporal features simultaneously. Our results showed that the organization of the MEG data fed into EEGNet affects classification accuracy, and that the parallel convolution structures are beneficial for extracting and fusing spatial and temporal MEG features. The classification accuracy demonstrated that EEGNet succeeds in building a common decoder model across subjects and outperforms several state-of-the-art feature-fusing methods.
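The parallel spatial/temporal feature extraction described above can be illustrated with a minimal NumPy sketch. This is not the authors' EEGNet implementation; the channel count, kernel sizes, and random data are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_times = 32, 200                 # hypothetical MEG epoch shape
epoch = rng.standard_normal((n_channels, n_times))

# Temporal branch: a 1-D filter slid along the time axis of every channel.
t_kernel = rng.standard_normal(15)
temporal = np.stack([np.convolve(ch, t_kernel, mode="valid") for ch in epoch])

# Spatial branch: a weighted combination across all sensors at each time point,
# loosely analogous to EEGNet's spatial (depthwise) convolution.
s_kernel = rng.standard_normal(n_channels)
spatial = s_kernel @ epoch

print(temporal.shape, spatial.shape)          # (32, 186) (200,)
```

In an actual network both branches would be learned filter banks applied side by side and their outputs fused before classification; the sketch only shows why the two operations capture complementary axes of the MEG array.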
Affiliation(s)
- Ran Shi
- School of Artificial Intelligence, Beijing Normal University, Beijing, 100875, China
- Yanyu Zhao
- School of Artificial Intelligence, Beijing Normal University, Beijing, 100875, China
- Zhiyuan Cao
- School of Artificial Intelligence, Beijing Normal University, Beijing, 100875, China
- Chunyu Liu
- School of Artificial Intelligence, Beijing Normal University, Beijing, 100875, China
- Yi Kang
- School of Artificial Intelligence, Beijing Normal University, Beijing, 100875, China
- Jiacai Zhang
- School of Artificial Intelligence, Beijing Normal University, Beijing, 100875, China
- Engineering Research Center of Intelligent Technology and Educational Application, Ministry of Education, Beijing, 100875, China
17
Lindsay GW. Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future. J Cogn Neurosci 2021; 33:2017-2031. [DOI: 10.1162/jocn_a_01544] [Citation(s) in RCA: 96] [Impact Index Per Article: 32.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
Convolutional neural networks (CNNs) were inspired by early findings in the study of biological vision. They have since become successful tools in computer vision and state-of-the-art models of both neural activity and behavior on visual tasks. This review highlights what, in the context of CNNs, it means to be a good model in computational neuroscience and the various ways models can provide insight. Specifically, it covers the origins of CNNs and the methods by which we validate them as models of biological vision. It then goes on to elaborate on what we can learn about biological vision by understanding and experimenting on CNNs and discusses emerging opportunities for the use of CNNs in vision research beyond basic object recognition.
18
Lai X, Huang Q, Xin J, Yu H, Wen J, Huang S, Zhang H, Shen H, Tang Y. Identifying Methamphetamine Abstainers With Convolutional Neural Networks and Short-Time Fourier Transform. Front Psychol 2021; 12:684001. [PMID: 34456796 PMCID: PMC8385271 DOI: 10.3389/fpsyg.2021.684001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2021] [Accepted: 07/12/2021] [Indexed: 11/13/2022] Open
Abstract
Few studies have investigated the functional patterns of methamphetamine abstainers. A better understanding of the underlying neurobiological mechanisms in the brains of methamphetamine abstainers will help to explain their abnormal behaviors. Forty-two male methamphetamine abstainers, currently in long-term abstinence (for at least 14 months), and 32 male healthy controls were recruited. All subjects underwent functional MRI while responding to drug-associated cues. This study proposes combining a convolutional neural network with a short-time Fourier transform to identify different brain patterns between methamphetamine abstainers and controls. The short-time Fourier transform provides time-localized frequency information, while the convolutional neural network extracts the structural features of the time-frequency spectrograms. The results showed that the classifier achieved satisfactory performance (98.9% accuracy) and could extract robust brain voxel information. The most discriminative voxels were mainly concentrated in the left inferior orbital frontal gyrus, the bilateral postcentral gyri, and the bilateral paracentral lobules. This study provides a novel insight into the different functional patterns between methamphetamine abstainers and healthy controls. It also elucidates the pathological mechanism in methamphetamine abstainers from the perspective of time-frequency spectrograms.
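The time-frequency front end described above can be sketched with SciPy's `stft`. The sampling rate, window length, and synthetic BOLD-like signal below are illustrative assumptions, not the study's actual parameters:

```python
import numpy as np
from scipy.signal import stft

fs = 2.0                                # assumed sampling rate, samples/s
t = np.arange(0, 300, 1 / fs)           # 300 s synthetic recording
rng = np.random.default_rng(1)
signal = np.sin(2 * np.pi * 0.05 * t) + 0.1 * rng.standard_normal(t.size)

# Short-time Fourier transform: time-localized frequency content of the signal.
freqs, times, Z = stft(signal, fs=fs, nperseg=64)
spectrogram = np.abs(Z)                 # magnitude spectrogram fed to the CNN
print(spectrogram.shape)                # (n_freqs, n_windows)
```

The resulting 2-D magnitude spectrogram is exactly the kind of image-like input a convolutional classifier can consume, which is the pairing the study exploits.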
Affiliation(s)
- Xin Lai
- School of Computer Science and Engineering, Central South University, Changsha, China
- Qiuping Huang
- National Clinical Research Center for Mental Disorders, Department of Psychiatry, The Second Xiangya Hospital of Central South University, Changsha, China; Institute of Mental Health of Central South University, Chinese National Technology Institute on Mental Disorders, Hunan Key Laboratory of Psychiatry and Mental Health, Hunan Medical Center for Mental Health, Changsha, China
- Jiang Xin
- School of Computer Science and Engineering, Central South University, Changsha, China
- Hufei Yu
- School of Computer Science and Engineering, Central South University, Changsha, China
- Jingxi Wen
- School of Computer Science and Engineering, Central South University, Changsha, China
- Shucai Huang
- National Clinical Research Center for Mental Disorders, Department of Psychiatry, The Second Xiangya Hospital of Central South University, Changsha, China; Institute of Mental Health of Central South University, Chinese National Technology Institute on Mental Disorders, Hunan Key Laboratory of Psychiatry and Mental Health, Hunan Medical Center for Mental Health, Changsha, China; The Fourth People's Hospital of Wuhu, Wuhu, China
- Hao Zhang
- School of Computer Science and Engineering, Central South University, Changsha, China
- Hongxian Shen
- National Clinical Research Center for Mental Disorders, Department of Psychiatry, The Second Xiangya Hospital of Central South University, Changsha, China; Institute of Mental Health of Central South University, Chinese National Technology Institute on Mental Disorders, Hunan Key Laboratory of Psychiatry and Mental Health, Hunan Medical Center for Mental Health, Changsha, China
- Yan Tang
- School of Computer Science and Engineering, Central South University, Changsha, China
19
Kwon SW, Choi IJ, Kang JY, Jang WI, Lee GH, Lee MC. Ultrasonographic Thyroid Nodule Classification Using a Deep Convolutional Neural Network with Surgical Pathology. J Digit Imaging 2021; 33:1202-1208. [PMID: 32705433 DOI: 10.1007/s10278-020-00362-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022] Open
Abstract
Ultrasonography with fine-needle aspiration biopsy is commonly used to detect thyroid cancer. However, thyroid ultrasonography is prone to subjective interpretation and interobserver variability. The objective of this study was to develop a thyroid nodule classification system for ultrasonography using convolutional neural networks. Transverse and longitudinal ultrasonographic thyroid images of 762 patients were used to create a deep learning model. After surgical biopsy, 325 cases were confirmed to be benign and 437 cases were confirmed to be papillary thyroid carcinoma. Image annotation marks were removed, and missing regions were recovered using neighboring parenchyma. To reduce overfitting of the deep learning model, we applied data augmentation and global average pooling, and performed 4-fold cross-validation to detect overfitting. We employed a transfer learning method with the pretrained deep learning model VGG16. The average area under the curve of the model was 0.916, and its specificity and sensitivity were 0.70 and 0.92, respectively. Positive and negative predictive values were 0.90 and 0.75, respectively. We introduce a new fine-tuned deep learning model for classifying thyroid nodules in ultrasonography. We expect that this model will help physicians diagnose thyroid nodules with ultrasonography.
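Global average pooling, one of the overfitting-reduction steps mentioned above, is simple enough to show directly. A minimal NumPy sketch follows; the batch size and feature-map shape are hypothetical (chosen to resemble the last convolutional block of a VGG16-style network), not taken from the study:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical activations from the last conv block of a VGG16-style network:
# (batch, height, width, channels).
features = rng.standard_normal((4, 7, 7, 512))

# Global average pooling collapses each feature map to a single number,
# replacing large dense layers and cutting the trainable parameter count.
pooled = features.mean(axis=(1, 2))
print(pooled.shape)                     # (4, 512)
```

Because each 7×7 map becomes one scalar, the classifier head sees 512 features per image instead of 25,088, which is why the technique curbs overfitting on small medical datasets.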
Affiliation(s)
- Soon Woo Kwon
- Radiation Medicine Clinical Research Division, Korea Institute of Radiological and Medical Sciences (KIRAMS), Seoul, South Korea
- Ik Joon Choi
- Department of Otorhinolaryngology, Korea Cancer Center Hospital, Korea Institute of Radiological and Medical Sciences (KIRAMS), 75 Nowon-gil, Nowon-gu, Seoul, 139-706, South Korea
- Ju Yong Kang
- Department of Otorhinolaryngology, Korea Cancer Center Hospital, Korea Institute of Radiological and Medical Sciences (KIRAMS), 75 Nowon-gil, Nowon-gu, Seoul, 139-706, South Korea
- Won Il Jang
- Radiation Oncology, Korea Cancer Center Hospital, Korea Institute of Radiological and Medical Sciences (KIRAMS), Seoul, South Korea
- Guk-Haeng Lee
- Department of Otorhinolaryngology, Korea Cancer Center Hospital, Korea Institute of Radiological and Medical Sciences (KIRAMS), 75 Nowon-gil, Nowon-gu, Seoul, 139-706, South Korea
- Myung-Chul Lee
- Department of Otorhinolaryngology, Korea Cancer Center Hospital, Korea Institute of Radiological and Medical Sciences (KIRAMS), 75 Nowon-gil, Nowon-gu, Seoul, 139-706, South Korea
20
Cole ZJ, Kuntzelman KM, Dodd MD, Johnson MR. Convolutional neural networks can decode eye movement data: A black box approach to predicting task from eye movements. J Vis 2021; 21:9. [PMID: 34264288 PMCID: PMC8288051 DOI: 10.1167/jov.21.7.9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Previous attempts to classify task from eye movement data have relied on model architectures designed to emulate theoretically defined cognitive processes and/or data that have been processed into aggregate (e.g., fixations, saccades) or statistical (e.g., fixation density) features. Black box convolutional neural networks (CNNs) are capable of identifying relevant features in raw and minimally processed data and images, but difficulty interpreting these model architectures has contributed to challenges in generalizing lab-trained CNNs to applied contexts. In the current study, a CNN classifier was used to classify task from two eye movement datasets (Exploratory and Confirmatory) in which participants searched, memorized, or rated indoor and outdoor scene images. The Exploratory dataset was used to tune the hyperparameters of the model, and the resulting model architecture was retrained, validated, and tested on the Confirmatory dataset. The data were formatted into timelines (i.e., x-coordinate, y-coordinate, pupil size) and minimally processed images. To further understand the informational value of each component of the eye movement data, the timeline and image datasets were broken down into subsets with one or more components systematically removed. Classification of the timeline data consistently outperformed the image data. The Memorize condition was most often confused with Search and Rate. Pupil size was the least uniquely informative component when compared with the x- and y-coordinates. The general pattern of results for the Exploratory dataset was replicated in the Confirmatory dataset. Overall, the present study provides a practical and reliable black box solution to classifying task from eye movement data.
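The timeline format and the component-ablation subsets described above can be sketched in a few lines of NumPy. The sampling count, screen dimensions, and random gaze data are hypothetical stand-ins, not the study's recordings:

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples = 500                              # hypothetical samples per trial
x = rng.uniform(0, 1024, n_samples)          # horizontal gaze position (px)
y = rng.uniform(0, 768, n_samples)           # vertical gaze position (px)
pupil = rng.uniform(2.0, 5.0, n_samples)     # pupil size (arbitrary units)

# Stack the raw components into a (component, time) timeline, the minimally
# processed input format fed to the CNN classifier.
timeline = np.stack([x, y, pupil])

# Systematically removing a component (here, pupil size) yields the subsets
# used to probe each component's unique informational value.
no_pupil = timeline[:2]
print(timeline.shape, no_pupil.shape)        # (3, 500) (2, 500)
```

Comparing classifier accuracy on `timeline` versus `no_pupil` is the ablation logic the study uses to conclude that pupil size is the least uniquely informative component.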
21
Liu C, Kang Y, Zhang L, Zhang J. Rapidly Decoding Image Categories From MEG Data Using a Multivariate Short-Time FC Pattern Analysis Approach. IEEE J Biomed Health Inform 2021; 25:1139-1150. [PMID: 32750957 DOI: 10.1109/jbhi.2020.3008731] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Recent advances in the development of multivariate analysis methods have led to the application of multivariate pattern analysis (MVPA) to investigate the interactions between brain regions using graph theory (functional connectivity, FC) and to decode visual categories from functional magnetic resonance imaging (fMRI) data in a continuous multicategory paradigm. To estimate stable FC patterns from fMRI data, previous studies required long periods on the order of several minutes, whereas the human brain categorizes visual stimuli within hundreds of milliseconds. Constructing short-time dynamic FC patterns on the order of milliseconds and decoding visual categories is a relatively novel concept. In this study, we developed a multivariate decoding algorithm based on FC patterns and applied it to magnetoencephalography (MEG) data. MEG data were recorded from participants presented with image stimuli in four categories (faces, scenes, animals and tools). MEG data from 17 participants demonstrate that short-time dynamic FC patterns yield brain activity patterns that can be used to decode visual categories with high accuracy. Our results show that FC patterns change over the time window, and FC patterns extracted in the time window of 0∼200 ms after stimulus onset were the most stable. Further, categorization accuracy peaked (mean binary accuracy above 78.6% at the individual level) for FC patterns estimated within the 0∼200 ms interval. These findings elucidate the underlying connectivity information during visual category processing on a relatively small time scale and demonstrate that the contribution of FC patterns to categorization fluctuates over time.
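Short-time dynamic FC patterns of the kind described above are commonly built by correlating sensors within a sliding window. Here is a minimal NumPy sketch; the sensor count, window length, and stride are assumptions for illustration, not the study's settings:

```python
import numpy as np

rng = np.random.default_rng(4)
n_sensors, n_times = 20, 600         # hypothetical MEG epoch, 1 sample per ms
data = rng.standard_normal((n_sensors, n_times))

win, step = 200, 100                 # assumed 200 ms windows, 100 ms stride
iu = np.triu_indices(n_sensors, k=1) # upper triangle = unique sensor pairs
fc_patterns = []
for start in range(0, n_times - win + 1, step):
    window = data[:, start:start + win]
    fc = np.corrcoef(window)         # sensor-by-sensor correlation matrix
    fc_patterns.append(fc[iu])       # vectorized short-time FC pattern
fc_patterns = np.array(fc_patterns)

print(fc_patterns.shape)             # (n_windows, n_pairs) = (5, 190)
```

Each row is one window's multivariate FC feature vector; feeding these rows to a classifier and comparing accuracy across windows mirrors how the study locates the most decodable interval (0∼200 ms).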
22
Cross L, Cockburn J, Yue Y, O'Doherty JP. Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments. Neuron 2021; 109:724-738.e7. [PMID: 33326755 PMCID: PMC7897245 DOI: 10.1016/j.neuron.2020.11.021] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2020] [Revised: 10/15/2020] [Accepted: 11/17/2020] [Indexed: 11/21/2022]
Abstract
Humans possess an exceptional aptitude to efficiently make decisions from high-dimensional sensory observations. However, it is unknown how the brain compactly represents the current state of the environment to guide this process. The deep Q-network (DQN) achieves this by capturing highly nonlinear mappings from multivariate inputs to the values of potential actions. We deployed DQN as a model of brain activity and behavior in participants playing three Atari video games during fMRI. Hidden layers of DQN exhibited a striking resemblance to voxel activity in a distributed sensorimotor network, extending throughout the dorsal visual pathway into posterior parietal cortex. Neural state-space representations emerged from nonlinear transformations of the pixel space bridging perception to action and reward. These transformations reshape axes to reflect relevant high-level features and strip away information about task-irrelevant sensory features. Our findings shed light on the neural encoding of task representations for decision-making in real-world situations.
Affiliation(s)
- Logan Cross
- Computation and Neural Systems, California Institute of Technology, Pasadena, CA 91125, USA
- Jeff Cockburn
- Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91125, USA
- Yisong Yue
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA
- John P O'Doherty
- Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91125, USA
23
Tong S, Liang X, Kumada T, Iwaki S. Putative ratios of facial attractiveness in a deep neural network. Vision Res 2020; 178:86-99. [PMID: 33186876 DOI: 10.1016/j.visres.2020.10.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2019] [Revised: 08/25/2020] [Accepted: 10/02/2020] [Indexed: 12/01/2022]
Abstract
Empirical evidence has shown that there is an ideal arrangement of facial features (ideal ratios) that can optimize the attractiveness of a person's face. These putative ratios define facial attractiveness in terms of spatial relations and provide important rules for measuring the attractiveness of a face. In this paper, we show that a deep neural network (DNN) model can learn putative ratios from face images based only on categorical annotation when no annotated facial features for attractiveness are explicitly given. To this end, we conducted three experiments. In Experiment 1, we trained a DNN model to recognize the attractiveness (female/male × high/low attractiveness) of faces in the images using four category-specific neurons (CSNs). In Experiment 2, face-like images were generated by reversing the DNN model (e.g., deconvolution). These images depict the intuitive attributes encoded in CSNs of the four categories of facial attractiveness and reveal certain consistencies with reported evidence on the putative ratios. In Experiment 3, simulated psychophysical experiments on face images with varying putative ratios reveal changes in the activity of the CSNs that are remarkably similar to those of human judgements reported in a previous study. These results show that the trained DNN model can learn putative ratios as key features for the representation of facial attractiveness. This finding advances our understanding of facial attractiveness from a DNN-based perspective.
Affiliation(s)
- Song Tong
- IST, Graduate School of Informatics, Kyoto University, Kyoto, Japan
- Xuefeng Liang
- School of Artificial Intelligence, Xidian University, Xi'an, PR China
- Takatsune Kumada
- IST, Graduate School of Informatics, Kyoto University, Kyoto, Japan
- Sunao Iwaki
- Information Technology and Human Factors, AIST, Tsukuba, Japan
24
Cui Y, Zhang C, Qiao K, Wang L, Yan B, Tong L. Study on Representation Invariances of CNNs and Human Visual Information Processing Based on Data Augmentation. Brain Sci 2020; 10:E602. [PMID: 32887405 PMCID: PMC7564968 DOI: 10.3390/brainsci10090602] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2020] [Revised: 08/09/2020] [Accepted: 08/13/2020] [Indexed: 11/17/2022] Open
Abstract
Representation invariance plays a significant role in the performance of deep convolutional neural networks (CNNs) and in human visual information processing across various complicated image-based tasks. However, there has been considerable confusion concerning the representation invariance mechanisms of the two sophisticated systems. To investigate their relationship under common conditions, we proposed a representation invariance analysis approach based on data augmentation technology. Firstly, the original image library was expanded by data augmentation. The representation invariances of CNNs and the ventral visual stream were then studied by comparing the similarities of the corresponding layer features of CNNs and the prediction performance of visual encoding models based on functional magnetic resonance imaging (fMRI) before and after data augmentation. Our experimental results suggest that the architecture of CNNs, i.e., the combination of convolutional and fully-connected layers, develops the representation invariance of CNNs. Remarkably, we found that representation invariance is present at all successive stages of the ventral visual stream. Hence, the internal correlation between CNNs and the human visual system in representation invariance was revealed. Our study promotes the advancement of invariant representation in computer vision and a deeper comprehension of the representation invariance mechanism of human visual information processing.
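The library-expansion step described above can be sketched with a few label-preserving transformations in NumPy. The toy image and the particular transformations are illustrative assumptions; the study's actual augmentation pipeline is not specified here:

```python
import numpy as np

rng = np.random.default_rng(5)
image = rng.integers(0, 256, size=(8, 8))  # toy grayscale image stand-in

# Label-preserving transformations used to expand the original image library.
library = np.stack([
    image,
    image[:, ::-1],    # horizontal flip
    image[::-1, :],    # vertical flip
    np.rot90(image),   # 90-degree rotation
])
print(library.shape)   # (4, 8, 8)
```

Comparing layer features (or encoding-model performance) on the original versus augmented versions of each image is then what operationalizes "representation invariance" in this approach.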
Affiliation(s)
- Li Tong
- Henan Key Laboratory of Imaging and Intelligent Processing, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China; (Y.C.); (C.Z.); (K.Q.); (L.W.); (B.Y.)
25
Dijkstra N, Ambrogioni L, Vidaurre D, van Gerven M. Neural dynamics of perceptual inference and its reversal during imagery. eLife 2020; 9:e53588. [PMID: 32686645 PMCID: PMC7371419 DOI: 10.7554/elife.53588] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2019] [Accepted: 06/30/2020] [Indexed: 12/27/2022] Open
Abstract
After the presentation of a visual stimulus, neural processing cascades from low-level sensory areas to increasingly abstract representations in higher-level areas. It is often hypothesised that a reversal in neural processing underlies the generation of mental images as abstract representations are used to construct sensory representations in the absence of sensory input. According to predictive processing theories, such reversed processing also plays a central role in later stages of perception. Direct experimental evidence of reversals in neural information flow has been missing. Here, we used a combination of machine learning and magnetoencephalography to characterise neural dynamics in humans. We provide direct evidence for a reversal of the perceptual feed-forward cascade during imagery and show that, during perception, such reversals alternate with feed-forward processing in an 11 Hz oscillatory pattern. Together, these results show how common feedback processes support both veridical perception and mental imagery.
Affiliation(s)
- Nadine Dijkstra
- Donders Centre for Cognition, Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
- Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
- Luca Ambrogioni
- Donders Centre for Cognition, Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
- Diego Vidaurre
- Oxford Centre for Human Brain Activity, Oxford University, Oxford, United Kingdom
- Department of Clinical Health, Aarhus University, Aarhus, Denmark
- Marcel van Gerven
- Donders Centre for Cognition, Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
26
Clarke A. Dynamic activity patterns in the anterior temporal lobe represents object semantics. Cogn Neurosci 2020; 11:111-121. [PMID: 32249714 PMCID: PMC7446031 DOI: 10.1080/17588928.2020.1742678] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2019] [Revised: 02/07/2020] [Indexed: 02/07/2023]
Abstract
The anterior temporal lobe (ATL) is considered a crucial area for the representation of transmodal concepts. Recent evidence suggests that specific regions within the ATL support the representation of individual object concepts, as shown by studies combining multivariate analysis methods and explicit measures of semantic knowledge. This research seeks to further our understanding by probing conceptual representations at a spatially and temporally resolved neural scale. Representational similarity analysis was applied to human intracranial recordings from anatomically defined lateral to medial ATL sub-regions. Neural similarity patterns were tested against semantic similarity measures, where semantic similarity was defined by a hybrid corpus-based and feature-based approach. Analyses show that the perirhinal cortex, in the medial ATL, showed significant semantic effects around 200 to 400 ms that were greater than those in more lateral ATL regions. Further, semantic effects were present in low-frequency (theta and alpha) oscillatory phase signals. These results provide converging support for the view that more medial regions of the ATL represent basic-level visual object concepts within the first 400 ms, and provide a bridge between prior fMRI and MEG work by offering detailed evidence for the presence of conceptual representations within the ATL.
Affiliation(s)
- Alex Clarke
- Department of Psychology, University of Cambridge, Cambridge, UK
27
Wardle SG, Baker C. Recent advances in understanding object recognition in the human brain: deep neural networks, temporal dynamics, and context. F1000Res 2020; 9. [PMID: 32566136 PMCID: PMC7291077 DOI: 10.12688/f1000research.22296.1] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 06/08/2020] [Indexed: 12/17/2022] Open
Abstract
Object recognition is the ability to identify an object or category based on the combination of visual features observed. It is a remarkable feat of the human brain, given that the patterns of light received by the eye associated with the properties of a given object vary widely with simple changes in viewing angle, ambient lighting, and distance. Furthermore, different exemplars of a specific object category can vary widely in visual appearance, such that successful categorization requires generalization across disparate visual features. In this review, we discuss recent advances in understanding the neural representations underlying object recognition in the human brain. We highlight three current trends in the approach towards this goal within the field of cognitive neuroscience. Firstly, we consider the influence of deep neural networks both as potential models of object vision and in how their representations relate to those in the human brain. Secondly, we review the contribution that time-series neuroimaging methods have made towards understanding the temporal dynamics of object representations beyond their spatial organization within different brain regions. Finally, we argue that an increasing emphasis on the context (both visual and task) within which object recognition occurs has led to a broader conceptualization of what constitutes an object representation for the brain. We conclude by identifying some current challenges facing the experimental pursuit of understanding object recognition and outline some emerging directions that are likely to yield new insight into this complex cognitive process.
Affiliation(s)
- Susan G Wardle
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, 20892, USA
- Chris Baker
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, 20892, USA
28
Xie S, Kaiser D, Cichy RM. Visual Imagery and Perception Share Neural Representations in the Alpha Frequency Band. Curr Biol 2020; 30:2621-2627.e5. [PMID: 32531274 PMCID: PMC7342016 DOI: 10.1016/j.cub.2020.04.074] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2019] [Revised: 04/06/2020] [Accepted: 04/27/2020] [Indexed: 11/21/2022]
Abstract
To behave adaptively with sufficient flexibility, biological organisms must cognize beyond immediate reaction to a physically present stimulus. For this, humans use visual mental imagery [1, 2], the ability to conjure up a vivid internal experience from memory that stands in for the percept of the stimulus. Visually imagined contents subjectively mimic perceived contents, suggesting that imagery and perception share common neural mechanisms. Using multivariate pattern analysis on human electroencephalography (EEG) data, we compared the oscillatory time courses of mental imagery and perception of objects. We found that representations shared between imagery and perception emerged specifically in the alpha frequency band. These representations were present in posterior, but not anterior, electrodes, suggesting an origin in parieto-occipital cortex. Comparison of the shared representations to computational models using representational similarity analysis revealed a relationship to later layers of deep neural networks trained on object representations, but not to auditory or semantic models, suggesting representations of complex visual features as the basis of commonality. Together, our results identify and characterize alpha oscillations as a cortical signature of representations shared between visual mental imagery and perception.
- Perception and imagery share neural representations in the alpha frequency band
- Shared representations stem from parieto-occipital sources
- Modeling suggests the contents of shared representations are complex visual features
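Representational similarity analysis, the model-comparison method used above, reduces to two steps: compute a representational dissimilarity matrix (RDM) for each system, then rank-correlate the RDMs. A minimal SciPy sketch follows; the item counts and random patterns are hypothetical stand-ins for EEG patterns and DNN layer activations:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(6)
n_items = 12                                        # hypothetical stimuli
eeg_patterns = rng.standard_normal((n_items, 64))   # stand-in EEG patterns
dnn_patterns = rng.standard_normal((n_items, 256))  # stand-in DNN activations

# Representational dissimilarity matrices: pairwise distances between items,
# stored as condensed vectors of the upper triangle.
eeg_rdm = pdist(eeg_patterns, metric="correlation")
dnn_rdm = pdist(dnn_patterns, metric="correlation")

# RSA: rank-correlate the two RDMs to test for shared representational geometry.
rho, p = spearmanr(eeg_rdm, dnn_rdm)
print(round(float(rho), 3))
```

Because both RDMs live in the same item-by-item space regardless of how many measurement channels each system has, this comparison works across EEG sensors, DNN layers, and auditory or semantic models alike.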
Affiliation(s)
- Siying Xie
- Department of Education and Psychology, Freie Universität Berlin, Habelschwerdter Allee 45, Berlin 14195, Germany
- Daniel Kaiser
- Department of Psychology, University of York, Heslington, York YO10 5DD, UK
- Radoslaw M Cichy
- Department of Education and Psychology, Freie Universität Berlin, Habelschwerdter Allee 45, Berlin 14195, Germany; Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin 10099, Germany; Bernstein Centre for Computational Neuroscience Berlin, Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin 10099, Germany
Collapse
|
29
|
Bone MB, Ahmad F, Buchsbaum BR. Feature-specific neural reactivation during episodic memory. Nat Commun 2020; 11:1945. [PMID: 32327642 PMCID: PMC7181630 DOI: 10.1038/s41467-020-15763-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2019] [Accepted: 03/12/2020] [Indexed: 12/04/2022] Open
Abstract
We present a multi-voxel analytical approach, feature-specific informational connectivity (FSIC), that leverages hierarchical representations from a neural network to decode neural reactivation in fMRI data collected while participants performed an episodic visual recall task. We show that neural reactivation associated with low-level (e.g. edges), high-level (e.g. facial features), and semantic (e.g. “terrier”) features occurs throughout the dorsal and ventral visual streams and extends into the frontal cortex. Moreover, we show that reactivation of both low- and high-level features correlates with the vividness of the memory, whereas only reactivation of low-level features correlates with recognition accuracy when the lure and target images are semantically similar. In addition to demonstrating the utility of FSIC for mapping feature-specific reactivation, these findings resolve the contributions of low- and high-level features to the vividness of visual memories and challenge a strict interpretation of the posterior-to-anterior visual hierarchy.
Memory recollection involves reactivation of neural activity that occurred during the recalled experience. Here, the authors show that neural reactivation can be decomposed into visual-semantic features, is widely synchronized throughout the brain, and predicts memory vividness and accuracy.
Affiliation(s)
- Michael B Bone
- Rotman Research Institute at Baycrest, Toronto, ON, M6A 2E1, Canada; Department of Psychology, University of Toronto, Toronto, ON, M5S 1A1, Canada
- Fahad Ahmad
- Rotman Research Institute at Baycrest, Toronto, ON, M6A 2E1, Canada
- Bradley R Buchsbaum
- Rotman Research Institute at Baycrest, Toronto, ON, M6A 2E1, Canada; Department of Psychology, University of Toronto, Toronto, ON, M5S 1A1, Canada

30
Fritsche M, Lawrence SJD, de Lange FP. Temporal tuning of repetition suppression across the visual cortex. J Neurophysiol 2019; 123:224-233. [PMID: 31774368 DOI: 10.1152/jn.00582.2019] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
The visual system adapts to its recent history. A phenomenon related to this is repetition suppression (RS), a reduction in neural responses to repeated compared with nonrepeated visual input. An intriguing hypothesis is that the timescale over which RS occurs across the visual hierarchy is tuned to the temporal statistics of visual input features, which change rapidly in low-level areas but are more stable in higher-level areas. Here, we tested this hypothesis by studying the influence of the temporal lag between successive visual stimuli on RS throughout the visual system using functional (f)MRI. Twelve human volunteers engaged in four fMRI sessions in which we characterized the blood oxygen level-dependent response to pairs of repeated and nonrepeated natural images with interstimulus intervals (ISI) ranging from 50 to 1,000 ms to quantify the temporal tuning of RS along the posterior-anterior axis of the visual system. As expected, RS was maximal for short ISIs and decayed with increasing ISI. Crucially, however, and against our hypothesis, RS decayed at a similar rate in early and late visual areas. This finding challenges the prevailing view that the timescale of RS increases along the posterior-anterior axis of the visual system and suggests that RS is not tuned to temporal input regularities.
NEW & NOTEWORTHY: Visual areas show reduced neural responses to repeated compared with nonrepeated visual input, a phenomenon termed repetition suppression (RS). Here we show that RS decays at a similar rate in low- and high-level visual areas, suggesting that the short-term decay of RS across the visual hierarchy is not tuned to temporal input regularities. This may limit the specificity with which the mechanisms underlying RS could optimize the processing of input features across the visual hierarchy.
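The ISI-dependent decay at the heart of this study can be illustrated with a simple exponential fit. The RS magnitudes below are invented for illustration, and the exponential form is an assumption; the paper's actual question is whether the fitted decay rate differs between early and late visual areas.

```python
import numpy as np
from scipy.optimize import curve_fit

# ISIs (ms) spanning the study's 50-1,000 ms range; RS magnitudes
# (nonrepeated minus repeated response) are invented illustrative values.
isi = np.array([50.0, 100.0, 200.0, 500.0, 1000.0])
rs = np.array([0.60, 0.48, 0.35, 0.20, 0.12])

def exp_decay(t, a, tau, c):
    # Assumed form: RS decays toward a baseline c with time constant tau.
    return a * np.exp(-t / tau) + c

(a, tau, c), _ = curve_fit(exp_decay, isi, rs, p0=(0.5, 300.0, 0.1))
print(f"fitted time constant tau ~ {tau:.0f} ms")
```

Fitting tau separately per visual area and comparing the estimates along the posterior-anterior axis is one way to operationalize the temporal-tuning hypothesis the authors tested.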
Affiliation(s)
- Matthias Fritsche
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Samuel J D Lawrence
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Floris P de Lange
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands

31
Han K, Wen H, Shi J, Lu KH, Zhang Y, Fu D, Liu Z. Variational autoencoder: An unsupervised model for encoding and decoding fMRI activity in visual cortex. Neuroimage 2019; 198:125-136. [PMID: 31103784 PMCID: PMC6592726 DOI: 10.1016/j.neuroimage.2019.05.039] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2018] [Revised: 04/13/2019] [Accepted: 05/15/2019] [Indexed: 01/21/2023] Open
Abstract
Goal-driven and feedforward-only convolutional neural networks (CNN) have been shown to be able to predict and decode cortical responses to natural images or videos. Here, we explored an alternative deep neural network, variational auto-encoder (VAE), as a computational model of the visual cortex. We trained a VAE with a five-layer encoder and a five-layer decoder to learn visual representations from a diverse set of unlabeled images. Using the trained VAE, we predicted and decoded cortical activity observed with functional magnetic resonance imaging (fMRI) from three human subjects passively watching natural videos. Compared to CNN, VAE could predict the video-evoked cortical responses with comparable accuracy in early visual areas, but relatively lower accuracy in higher-order visual areas. The distinction between CNN and VAE in terms of encoding performance was primarily attributed to their different learning objectives, rather than their different model architecture or number of parameters. Despite lower encoding accuracies, VAE offered a more convenient strategy for decoding the fMRI activity to reconstruct the video input, by first converting the fMRI activity to the VAE's latent variables, and then converting the latent variables to the reconstructed video frames through the VAE's decoder. This strategy was more advantageous than alternative decoding methods, e.g. partial least squares regression, for being able to reconstruct both the spatial structure and color of the visual input. Such findings highlight VAE as an unsupervised model for learning visual representation, as well as its potential and limitations for explaining cortical responses and reconstructing naturalistic and diverse visual experiences.
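The two-step decoding strategy described above (fMRI activity → VAE latents → reconstructed frames) reduces, in its first step, to a linear regression problem. A minimal sketch with synthetic data standing in for voxels and latents; the dimensions, noise level, and ridge penalty are all assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)

# Synthetic stand-ins: VAE latents of video frames and noisy linear
# fMRI responses to them (real data would come from the recordings).
n_train, n_test, n_voxels, n_latent = 200, 20, 500, 32
latents = rng.standard_normal((n_train + n_test, n_latent))
mixing = rng.standard_normal((n_latent, n_voxels))
fmri = latents @ mixing + 0.5 * rng.standard_normal((n_train + n_test, n_voxels))

# Step 1: regress fMRI activity onto the VAE's latent variables.
ridge = Ridge(alpha=10.0).fit(fmri[:n_train], latents[:n_train])
predicted_latents = ridge.predict(fmri[-n_test:])

# Step 2 would feed predicted_latents through the trained VAE decoder to
# reconstruct video frames; here we only check accuracy in latent space.
r = np.corrcoef(predicted_latents.ravel(), latents[-n_test:].ravel())[0, 1]
print(f"latent recovery correlation r = {r:.2f}")
```

The appeal of this strategy, per the abstract, is that the generative decoder handles the latents-to-pixels step, so the brain-side model only has to predict a compact latent code rather than full images.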
Affiliation(s)
- Kuan Han
- School of Electrical and Computer Engineering, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, 47906, USA
- Haiguang Wen
- School of Electrical and Computer Engineering, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, 47906, USA
- Junxing Shi
- School of Electrical and Computer Engineering, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, 47906, USA
- Kun-Han Lu
- School of Electrical and Computer Engineering, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, 47906, USA
- Yizhen Zhang
- School of Electrical and Computer Engineering, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, 47906, USA
- Di Fu
- School of Electrical and Computer Engineering, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, 47906, USA
- Zhongming Liu
- Weldon School of Biomedical Engineering, USA; School of Electrical and Computer Engineering, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, 47906, USA

32
Brunton BW, Beyeler M. Data-driven models in human neuroscience and neuroengineering. Curr Opin Neurobiol 2019; 58:21-29. [PMID: 31325670 DOI: 10.1016/j.conb.2019.06.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2019] [Accepted: 06/22/2019] [Indexed: 12/26/2022]
Abstract
Discoveries in modern human neuroscience are increasingly driven by quantitative understanding of complex data. Data-intensive modeling approaches promise to dramatically advance our understanding of the brain and critically enable neuroengineering capabilities. In this review, we provide an accessible primer to modern modeling approaches and highlight recent data-driven discoveries in the domains of neuroimaging, single-neuron and neuronal population responses, and device neuroengineering. Further, we suggest that meaningful progress requires the community to tackle open challenges in the realms of model interpretability and generalizability, training pipelines of data-fluent human neuroscientists, and integrated consideration of data ethics.
Affiliation(s)
- Bingni W Brunton
- Department of Biology, University of Washington, Seattle, WA 98195, USA; Institute for Neuroengineering, University of Washington, Seattle, WA 98195, USA; eScience Institute, University of Washington, Seattle, WA 98195, USA
- Michael Beyeler
- Institute for Neuroengineering, University of Washington, Seattle, WA 98195, USA; eScience Institute, University of Washington, Seattle, WA 98195, USA; Department of Psychology, University of Washington, Seattle, WA 98195, USA

33
Tripp B. Approximating the Architecture of Visual Cortex in a Convolutional Network. Neural Comput 2019; 31:1551-1591. [PMID: 31260392 DOI: 10.1162/neco_a_01211] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Deep convolutional neural networks (CNNs) have certain structural, mechanistic, representational, and functional parallels with primate visual cortex and also many differences. However, perhaps some of the differences can be reconciled. This study develops a cortex-like CNN architecture via (1) a loss function that quantifies the consistency of a CNN architecture with neural data from tract tracing, cell reconstruction, and electrophysiology studies; (2) a hyperparameter-optimization approach for reducing this loss; and (3) heuristics for organizing units into convolutional-layer grids. The optimized hyperparameters are consistent with neural data. The cortex-like architecture differs from typical CNN architectures. In particular, it has longer skip connections, larger kernels and strides, and qualitatively different connection sparsity. Importantly, layers of the cortex-like network have one-to-one correspondences with cortical neuron populations. This should allow unambiguous comparison of model and brain representations in the future and, consequently, more precise measurement of progress toward more biologically realistic deep networks.
Affiliation(s)
- Bryan Tripp
- Department of Systems Design Engineering and Centre for Theoretical Neuroscience, University of Waterloo, Waterloo, ON N2L 3G1, Canada

34
Angrick M, Herff C, Mugler E, Tate MC, Slutzky MW, Krusienski DJ, Schultz T. Speech synthesis from ECoG using densely connected 3D convolutional neural networks. J Neural Eng 2019; 16:036019. [PMID: 30831567 PMCID: PMC6822609 DOI: 10.1088/1741-2552/ab0c59] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
Abstract
OBJECTIVE: Direct synthesis of speech from neural signals could provide a fast and natural way of communication to people with neurological diseases. Invasively-measured brain activity (electrocorticography; ECoG) supplies the necessary temporal and spatial resolution to decode fast and complex processes such as speech production. A number of impressive advances in speech decoding using neural signals have been achieved in recent years, but the complex dynamics are still not fully understood. However, it is unlikely that simple linear models can capture the relation between neural activity and continuous spoken speech.
APPROACH: Here we show that deep neural networks can be used to map ECoG from speech production areas onto an intermediate representation of speech (logMel spectrogram). The proposed method uses a densely connected convolutional neural network topology which is well-suited to work with the small amount of data available from each participant.
MAIN RESULTS: In a study with six participants, we achieved correlations up to r = 0.69 between the reconstructed and original logMel spectrograms. We transferred our prediction back into an audible waveform by applying a Wavenet vocoder. The vocoder was conditioned on logMel features that harnessed a much larger, pre-existing data corpus to provide the most natural acoustic output.
SIGNIFICANCE: To the best of our knowledge, this is the first time that high-quality speech has been reconstructed from neural recordings during speech production using deep neural networks.
Affiliation(s)
- Miguel Angrick
- Cognitive Systems Lab, University of Bremen, Bremen, Germany

35
Dijkstra N, Bosch SE, van Gerven MA. Shared Neural Mechanisms of Visual Perception and Imagery. Trends Cogn Sci 2019; 23:423-434. [DOI: 10.1016/j.tics.2019.02.004] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2018] [Revised: 02/07/2019] [Accepted: 02/20/2019] [Indexed: 12/16/2022]
36
Dima DC, Perry G, Singh KD. Spatial frequency supports the emergence of categorical representations in visual cortex during natural scene perception. Neuroimage 2018; 179:102-116. [PMID: 29902586 PMCID: PMC6057270 DOI: 10.1016/j.neuroimage.2018.06.033] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2017] [Revised: 06/01/2018] [Accepted: 06/09/2018] [Indexed: 11/22/2022] Open
Abstract
In navigating our environment, we rapidly process and extract meaning from visual cues. However, the relationship between visual features and categorical representations in natural scene perception is still not well understood. Here, we used natural scene stimuli from different categories and filtered at different spatial frequencies to address this question in a passive viewing paradigm. Using representational similarity analysis (RSA) and cross-decoding of magnetoencephalography (MEG) data, we show that categorical representations emerge in human visual cortex at ∼180 ms and are linked to spatial frequency processing. Furthermore, dorsal and ventral stream areas reveal temporally and spatially overlapping representations of low- and high-level layer activations extracted from a feedforward neural network. Our results suggest that neural patterns from extrastriate visual cortex switch from low-level to categorical representations within 200 ms, highlighting the rapid cascade of processing stages essential in human visual perception.
Affiliation(s)
- Diana C Dima
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff, CF24 4HQ, United Kingdom
- Gavin Perry
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff, CF24 4HQ, United Kingdom
- Krish D Singh
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff, CF24 4HQ, United Kingdom

37
Lindsay GW, Miller KD. How biological attention mechanisms improve task performance in a large-scale visual system model. eLife 2018; 7:e38105. [PMID: 30272560 PMCID: PMC6207429 DOI: 10.7554/elife.38105] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2018] [Accepted: 09/28/2018] [Indexed: 11/13/2022] Open
Abstract
How does attentional modulation of neural activity enhance performance? Here we use a deep convolutional neural network as a large-scale model of the visual system to address this question. We model the feature similarity gain model of attention, in which attentional modulation is applied according to neural stimulus tuning. Using a variety of visual tasks, we show that neural modulations of the kind and magnitude observed experimentally lead to performance changes of the kind and magnitude observed experimentally. We find that, at earlier layers, attention applied according to tuning does not successfully propagate through the network, and has a weaker impact on performance than attention applied according to values computed for optimally modulating higher areas. This raises the question of whether biological attention might be applied at least in part to optimize function rather than strictly according to tuning. We suggest a simple experiment to distinguish these alternatives.
Affiliation(s)
- Grace W Lindsay
- Center for Theoretical Neuroscience, College of Physicians and Surgeons, Columbia University, New York, United States
- Mortimer B. Zuckerman Mind Brain Behaviour Institute, Columbia University, New York, United States
- Kenneth D Miller
- Center for Theoretical Neuroscience, College of Physicians and Surgeons, Columbia University, New York, United States
- Mortimer B. Zuckerman Mind Brain Behaviour Institute, Columbia University, New York, United States
- Swartz Program in Theoretical Neuroscience, Kavli Institute for Brain Science, New York, United States
- Department of Neuroscience, Columbia University, New York, United States
38

39
Kuzovkin I, Vicente R, Petton M, Lachaux JP, Baciu M, Kahane P, Rheims S, Vidal JR, Aru J. Activations of deep convolutional neural networks are aligned with gamma band activity of human visual cortex. Commun Biol 2018; 1:107. [PMID: 30271987 PMCID: PMC6123818 DOI: 10.1038/s42003-018-0110-y] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2017] [Accepted: 07/15/2018] [Indexed: 11/08/2022] Open
Abstract
Recent advances in the field of artificial intelligence have revealed principles about neural processing, in particular about vision. Previous work demonstrated a direct correspondence between the hierarchy of the human visual areas and layers of deep convolutional neural networks (DCNN) trained on visual object recognition. We use DCNNs to investigate which frequency bands correlate with feature transformations of increasing complexity along the ventral visual pathway. By capitalizing on intracranial depth recordings from 100 patients, we assess the alignment between the DCNN and signals at different frequency bands. We find that gamma activity (30-70 Hz) matches the increasing complexity of visual feature representations in DCNN. These findings show that the activity of the DCNN captures the essential characteristics of biological object recognition not only in space and time, but also in the frequency domain. These results demonstrate the potential that artificial intelligence algorithms have in advancing our understanding of the brain.
Affiliation(s)
- Ilya Kuzovkin
- Computational Neuroscience Lab, Institute of Computer Science, University of Tartu, Tartu, 51005, Estonia
- Raul Vicente
- Computational Neuroscience Lab, Institute of Computer Science, University of Tartu, Tartu, 51005, Estonia
- Mathilde Petton
- INSERM U1028, CNRS UMR5292, Brain Dynamics and Cognition Team, Lyon Neuroscience Research Center, Bron, 69500, France
- Université Claude Bernard, Lyon, France
- Jean-Philippe Lachaux
- INSERM U1028, CNRS UMR5292, Brain Dynamics and Cognition Team, Lyon Neuroscience Research Center, Bron, 69500, France
- Université Claude Bernard, Lyon, France
- Monica Baciu
- University Grenoble Alpes, LPNC, F-38040, Grenoble, France
- CNRS, LPNC UMR 5105, F38040, Grenoble, France
- Philippe Kahane
- Inserm, U1216, F-38000, Grenoble, France
- Neurology Department, CHU de Grenoble, Hôpital Michallon, F-38000, Grenoble, France
- Sylvain Rheims
- INSERM U1028, CNRS UMR5292, TIGER Team, Lyon Neuroscience Research Center, Bron, 69500, France
- Department of Functional Neurology and Epileptology, Hospices Civils de Lyon, Bron, 69500, France
- Epilepsy Institute, Bron, 69500, France
- Juan R Vidal
- University Grenoble Alpes, LPNC, F-38040, Grenoble, France
- CNRS, LPNC UMR 5105, F38040, Grenoble, France
- Catholic University of Lyon, Lyon, 69002, France
- Jaan Aru
- Computational Neuroscience Lab, Institute of Computer Science, University of Tartu, Tartu, 51005, Estonia
- Department of Penal Law, School of Law, University of Tartu, Tallinn, 10119, Estonia

40
Wen H, Shi J, Chen W, Liu Z. Transferring and generalizing deep-learning-based neural encoding models across subjects. Neuroimage 2018; 176:152-163. [PMID: 29705690 PMCID: PMC5976558 DOI: 10.1016/j.neuroimage.2018.04.053] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2018] [Accepted: 04/23/2018] [Indexed: 12/11/2022] Open
Abstract
Recent studies have shown the value of using deep learning models for mapping and characterizing how the brain represents and organizes information for natural vision. However, modeling the relationship between deep learning models and the brain (i.e., building encoding models) requires measuring cortical responses to large and diverse sets of natural visual stimuli from single subjects. This requirement limits prior studies to a few subjects, making it difficult to generalize findings across subjects or for a population. In this study, we developed new methods to transfer and generalize encoding models across subjects. To train encoding models specific to a target subject, the models trained for other subjects were used as the prior models and were refined efficiently using Bayesian inference with a limited amount of data from the target subject. To train encoding models for a population, the models were progressively trained and updated with incremental data from different subjects. For the proof of principle, we applied these methods to functional magnetic resonance imaging (fMRI) data from three subjects watching tens of hours of naturalistic videos, while a deep residual neural network driven by image recognition was used to model visual cortical processing. Results demonstrate that the methods developed herein provide an efficient and effective strategy to establish both subject-specific and population-wide predictive models of cortical representations of high-dimensional and hierarchical visual features.
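The idea of using other subjects' encoding models as a prior can be sketched as ridge regression that shrinks toward the prior weights rather than toward zero. This closed form is a simplified stand-in for the paper's Bayesian refinement, and all sizes, noise levels, and penalties below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def refine_encoding_model(X, y, w_prior, lam):
    """MAP-style estimate that shrinks toward prior weights:
    minimize ||y - X w||^2 + lam * ||w - w_prior||^2.
    A simplified stand-in for the paper's Bayesian inference."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w_prior)

# Hypothetical setup: 80 feature dimensions but only 30 stimuli
# from the target subject (less data than dimensions).
d, n = 80, 30
w_true = rng.standard_normal(d)                 # target subject's true model
w_prior = w_true + 0.3 * rng.standard_normal(d)  # other subjects' model
X = rng.standard_normal((n, d))
y = X @ w_true + 0.1 * rng.standard_normal(n)

w_transfer = refine_encoding_model(X, y, w_prior, lam=5.0)
w_scratch = refine_encoding_model(X, y, np.zeros(d), lam=5.0)
err_transfer = np.linalg.norm(w_transfer - w_true)
err_scratch = np.linalg.norm(w_scratch - w_true)
print(f"error with prior: {err_transfer:.2f}, from scratch: {err_scratch:.2f}")
```

When the target subject's data are scarce, the prior fills in the weight components the limited stimulus set cannot constrain, which is the intuition behind transferring encoding models across subjects.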
Affiliation(s)
- Haiguang Wen
- School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, USA
- Junxing Shi
- School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, USA
- Wei Chen
- Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota Medical School, Minneapolis, MN, USA
- Zhongming Liu
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN, USA; School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, USA

41
Gruber LZ, Haruvi A, Basri R, Irani M. Perceptual Dominance in Brief Presentations of Mixed Images: Human Perception vs. Deep Neural Networks. Front Comput Neurosci 2018; 12:57. [PMID: 30087604 PMCID: PMC6066547 DOI: 10.3389/fncom.2018.00057] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2018] [Accepted: 07/03/2018] [Indexed: 11/23/2022] Open
Abstract
Visual perception involves continuously choosing the most prominent inputs while suppressing others. Neuroscientists induce visual competitions in various ways to study why and how the brain makes choices of what to perceive. Recently, deep neural networks (DNNs) have been used as models of the ventral stream of the visual system, due to similarities in both accuracy and hierarchy of feature representation. In this study, we created non-dynamic visual competitions for humans by briefly presenting mixtures of two images. We then tested feed-forward DNNs with similar mixtures and examined their behavior. We found that both humans and DNNs tend to perceive only one image when presented with a mixture of two. We revealed image parameters which predict this perceptual dominance and compared their predictability for the two visual systems. Our findings can be used both to improve DNNs as models and, potentially, to improve their performance by imitating biological behaviors.
Affiliation(s)
- Liron Z Gruber
- Department of Neurobiology, Weizmann Institute of Science, Rehovot, Israel
- Aia Haruvi
- Department of Neurobiology, Weizmann Institute of Science, Rehovot, Israel
- Ronen Basri
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Michal Irani
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel

42
Shared spatiotemporal category representations in biological and artificial deep neural networks. PLoS Comput Biol 2018; 14:e1006327. [PMID: 30040821 PMCID: PMC6075788 DOI: 10.1371/journal.pcbi.1006327] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2018] [Revised: 08/03/2018] [Accepted: 06/26/2018] [Indexed: 11/24/2022] Open
Abstract
Visual scene category representations emerge very rapidly, yet the computational transformations that enable such invariant categorizations remain elusive. Deep convolutional neural networks (CNNs) perform visual categorization at near human-level accuracy using a feedforward architecture, providing neuroscientists with the opportunity to assess one successful series of representational transformations that enable categorization in silico. The goal of the current study is to assess the extent to which sequential scene category representations built by a CNN map onto those built in the human brain as assessed by high-density, time-resolved event-related potentials (ERPs). We found correspondence both over time and across the scalp: earlier (0–200 ms) ERP activity was best explained by early CNN layers at all electrodes. Although later activity at most electrode sites corresponded to earlier CNN layers, activity in right occipito-temporal electrodes was best explained by the later, fully-connected layers of the CNN around 225 ms post-stimulus, along with similar patterns in frontal electrodes. Taken together, these results suggest that the emergence of scene category representations develop through a dynamic interplay between early activity over occipital electrodes as well as later activity over temporal and frontal electrodes.
We categorize visual scenes rapidly and effortlessly, but still have little insight into the neural processing stages that enable this feat. In a parallel development, deep convolutional neural networks (CNNs) have been developed that perform visual categorization with human-like accuracy. We hypothesized that the stages of processing in a CNN may parallel the stages of processing in the human brain. We found that this is indeed the case, with early brain signals best explained by early stages of the CNN and later brain signals explained by later CNN layers.
We also found that category-specific information seems to first emerge in sensory cortex and is then rapidly fed up to frontal areas. The similarities between biological brains and artificial neural networks provide neuroscientists with the opportunity to better understand the process of categorization by studying the artificial systems.
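The layer-to-time mapping reported in this study can be sketched with RSA: for each time point, compare the ERP pattern geometry against each CNN layer's geometry and record the best-matching layer. Everything below is synthetic stand-in data; the condition counts, electrode numbers, and layer sizes are assumptions, not the study's setup.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(3)

# Stand-ins: ERP patterns (time points x conditions x electrodes) and
# CNN activations for the same conditions, one array per layer.
n_time, n_cond = 50, 20
erp = rng.standard_normal((n_time, n_cond, 64))
layers = [rng.standard_normal((n_cond, 2 ** (6 + i))) for i in range(5)]
layer_rdms = [pdist(acts, metric="correlation") for acts in layers]

# For each time point, find the CNN layer whose RDM best matches the ERP RDM.
best_layer = []
for t in range(n_time):
    erp_rdm = pdist(erp[t], metric="correlation")
    rhos = [spearmanr(erp_rdm, lr)[0] for lr in layer_rdms]
    best_layer.append(int(np.argmax(rhos)))

print(best_layer[:10])  # best-explaining layer index over early time points
```

Plotting the best-matching layer index against time is what reveals the early-layers-early, late-layers-late progression the abstract describes.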
43
Large-Scale, High-Resolution Comparison of the Core Visual Object Recognition Behavior of Humans, Monkeys, and State-of-the-Art Deep Artificial Neural Networks. J Neurosci 2018; 38:7255-7269. [PMID: 30006365 DOI: 10.1523/jneurosci.0388-18.2018] [Citation(s) in RCA: 149] [Impact Index Per Article: 24.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2018] [Revised: 06/06/2018] [Accepted: 07/08/2018] [Indexed: 11/21/2022] Open
Abstract
Primates, including humans, can typically recognize objects in visual images at a glance despite naturally occurring identity-preserving image transformations (e.g., changes in viewpoint). A primary neuroscience goal is to uncover neuron-level mechanistic models that quantitatively explain this behavior by predicting primate performance for each and every image. Here, we applied this stringent behavioral prediction test to the leading mechanistic models of primate vision (specifically, deep, convolutional, artificial neural networks; ANNs) by directly comparing their behavioral signatures against those of humans and rhesus macaque monkeys. Using high-throughput data collection systems for human and monkey psychophysics, we collected more than one million behavioral trials from 1472 anonymous humans and five male macaque monkeys for 2400 images over 276 binary object discrimination tasks. Consistent with previous work, we observed that state-of-the-art deep, feedforward convolutional ANNs trained for visual categorization (termed DCNNIC models) accurately predicted primate patterns of object-level confusion. However, when we examined behavioral performance for individual images within each object discrimination task, we found that all tested DCNNIC models were significantly nonpredictive of primate performance, and that this prediction failure was neither accounted for by simple image attributes nor rescued by simple model modifications. These results show that current DCNNIC models cannot account for the image-level behavioral patterns of primates and that new ANN models are needed to more precisely capture the neural mechanisms underlying primate object vision. To this end, large-scale, high-resolution primate behavioral benchmarks such as those obtained here could serve as direct guides for discovering such models.
SIGNIFICANCE STATEMENT Recently, specific feedforward deep convolutional artificial neural network (ANN) models have dramatically advanced our quantitative understanding of the neural mechanisms underlying primate core object recognition. In this work, we tested the limits of those ANNs by systematically comparing the behavioral responses of these models with the behavioral responses of humans and monkeys at the resolution of individual images. Using these high-resolution metrics, we found that all tested ANN models significantly diverged from primate behavior. Going forward, these high-resolution, large-scale primate behavioral benchmarks could serve as direct guides for discovering better ANN models of the primate visual system.
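The image-level comparison the abstract describes can be illustrated with a toy consistency score. Note this is a simplifying sketch: the actual study compared noise-corrected, signed-discriminability (d') signatures, whereas here a plain Pearson correlation is computed over hypothetical per-image accuracies (the values below are invented for illustration).

```python
from math import sqrt

# Hypothetical per-image accuracies (fraction correct) on the same ten
# test images, for pooled primates and for one candidate ANN model.
primate_acc = [0.95, 0.60, 0.88, 0.72, 0.99, 0.55, 0.81, 0.67, 0.91, 0.74]
model_acc   = [0.90, 0.85, 0.70, 0.92, 0.88, 0.78, 0.95, 0.60, 0.83, 0.80]

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

# A model that matches primate behavior at the image level should find
# hard the same images the primates find hard, giving r close to 1;
# matching only object-level confusions leaves r near 0 at this resolution.
r = pearson(primate_acc, model_acc)
print(f"image-level consistency r = {r:.3f}")
```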
44
Dijkstra N, Mostert P, de Lange FP, Bosch S, van Gerven MAJ. Differential temporal dynamics during visual imagery and perception. eLife 2018; 7:e33904. [PMID: 29807570 PMCID: PMC5973830 DOI: 10.7554/elife.33904]
Abstract
Visual perception and imagery rely on similar representations in the visual cortex. During perception, visual activity is characterized by distinct processing stages, but the temporal dynamics underlying imagery remain unclear. Here, we investigated the dynamics of visual imagery in human participants using magnetoencephalography. First, we show that, compared to perception, imagery decoding becomes significant later, and that representations at the start of imagery already overlap with those at later time points. This suggests that during imagery the entire visual representation is activated at once, or that the timing of imagery varies widely between trials. Second, we found consistent overlap between imagery and perceptual processing around 160 ms and from 300 ms after stimulus onset. This indicates that the N170 is reactivated during imagery and that imagery does not rely on early perceptual representations. Together, these results provide important insights into the neural mechanisms of visual imagery.
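The time-resolved decoding logic behind these findings (fit a decoder at one time point, test it at every other, and read off when and how long representations generalize) can be sketched on synthetic data. The nearest-centroid decoder, the dimensions, and the late information onset below are illustrative assumptions, not the study's MEG pipeline.

```python
import random

random.seed(0)

N_TRIALS, N_TIMES, N_SENSORS = 20, 10, 5
ONSET = 6  # class information appears only from this time point (assumption)

def make_trial(label):
    # Each trial is a (time x sensor) array; the class mean shift of
    # `label` is present only from ONSET onward, mimicking a late-onset
    # representation like the one imagery decoding revealed.
    return [[random.gauss(label if t >= ONSET else 0.0, 1.0)
             for _ in range(N_SENSORS)] for t in range(N_TIMES)]

class0 = [(make_trial(0), 0) for _ in range(2 * N_TRIALS)]
class1 = [(make_trial(1), 1) for _ in range(2 * N_TRIALS)]
train = class0[:N_TRIALS] + class1[:N_TRIALS]
test = class0[N_TRIALS:] + class1[N_TRIALS:]

def centroid(vectors):
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(N_SENSORS)]

def temporal_generalization(train, test):
    """acc[t_train][t_test]: accuracy of a nearest-centroid decoder
    fit at one time point and evaluated at another."""
    acc = [[0.0] * N_TIMES for _ in range(N_TIMES)]
    for t_tr in range(N_TIMES):
        c0 = centroid([x[t_tr] for x, y in train if y == 0])
        c1 = centroid([x[t_tr] for x, y in train if y == 1])
        for t_te in range(N_TIMES):
            hits = 0
            for x, y in test:
                d0 = sum((a - b) ** 2 for a, b in zip(x[t_te], c0))
                d1 = sum((a - b) ** 2 for a, b in zip(x[t_te], c1))
                hits += int((d1 < d0) == (y == 1))
            acc[t_tr][t_te] = hits / len(test)
    return acc

acc = temporal_generalization(train, test)
print(f"within-time accuracy, early: {acc[2][2]:.2f}, late: {acc[8][8]:.2f}")
```

Decoding is near chance before the (assumed) onset and well above chance after it; a square block of above-chance off-diagonal cells in the late window is the signature of a sustained, simultaneously active representation.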
Affiliation(s)
- Nadine Dijkstra, Pim Mostert, Floris P de Lange, Sander Bosch, Marcel AJ van Gerven: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
45
State-of-the-Art Mobile Intelligence: Enabling Robots to Move Like Humans by Estimating Mobility with Artificial Intelligence. Appl Sci (Basel) 2018. [DOI: 10.3390/app8030379]
46
Automatic Detection of Acromegaly From Facial Photographs Using Machine Learning Methods. EBioMedicine 2017; 27:94-102. [PMID: 29269039 PMCID: PMC5828367 DOI: 10.1016/j.ebiom.2017.12.015]
Abstract
BACKGROUND: Automatic early detection of acromegaly from facial photographs is theoretically possible and could enable earlier treatment and a higher probability of cure. METHODS: Several popular machine learning algorithms were trained on a retrospective development dataset of 527 acromegaly patients and 596 normal subjects. We first used OpenCV to detect the face bounding box, then cropped and resized it to fixed pixel dimensions. From the detected faces, the locations of facial landmarks that serve as potential clinical indicators were extracted. Frontalization was then applied to synthesize frontal-facing views and improve performance. Several machine learning methods, including LM, KNN, SVM, RT, CNN, and EM, were used to identify acromegaly automatically from the detected facial photographs, the extracted facial landmarks, and the synthesized frontal faces. The trained models were evaluated on a separate dataset, half of which had been diagnosed as acromegaly by a growth hormone suppression test. RESULTS: The best of the proposed methods achieved a PPV of 96%, an NPV of 95%, a sensitivity of 96%, and a specificity of 96%. CONCLUSIONS: Artificial intelligence can detect acromegaly early, automatically, and with high sensitivity and specificity.
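The classification stage of the pipeline described above can be sketched with the KNN method the abstract names, applied to landmark-derived feature vectors. The feature values, their meaning, and the choice of k here are invented for illustration; they are not the paper's actual features or hyperparameters.

```python
import math
from collections import Counter

# Toy stand-in for the KNN stage: each sample is a short vector of
# facial-landmark-derived measurements (hypothetical values), labeled
# 1 (acromegaly) or 0 (normal).
train_set = [
    ([2.1, 3.9, 1.2], 1), ([2.3, 4.1, 1.3], 1), ([2.0, 4.0, 1.1], 1),
    ([1.1, 2.8, 0.7], 0), ([1.0, 3.0, 0.6], 0), ([1.2, 2.9, 0.8], 0),
]

def knn_predict(query, train, k=3):
    """Majority vote among the k training samples closest to `query`."""
    nearest = sorted(train, key=lambda s: math.dist(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# A query near the positive cluster is classified as acromegaly (1).
print(knn_predict([2.2, 4.0, 1.2], train_set))  # → 1
```

In the paper's pipeline this classifier would sit downstream of face detection, cropping, landmark extraction, and frontalization; evaluation against the growth-hormone-suppression-test diagnoses then yields the reported PPV/NPV/sensitivity/specificity.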