1. Rolls ET, Zhang C, Feng J. Slow semantic learning in the cerebral cortex, and its relation to the hippocampal episodic memory system. Cereb Cortex 2025; 35:bhaf107. PMID: 40347159; DOI: 10.1093/cercor/bhaf107.
Abstract
A key question is how new semantic representations are formed in the human brain and how this may benefit from the hippocampal episodic memory system. Here, we describe the major effective connectivity between the hippocampal memory system and the anterior temporal lobe (ATL) semantic memory system in humans. Then, we present and model a theory of how semantic representations may be formed in the human ATL using slow associative learning in semantic attractor networks that receive inputs from the hippocampal episodic memory system. The hypothesis is that if one category of semantic representations is processed for several seconds, a biologically plausible associative learning rule incorporating a slow short-term memory trace enables all the components present during that time to be associated together in a semantic attractor network. This benefits from the binding of components provided by the hippocampal episodic memory system. The theory is modeled in a four-layer network for view-invariant visual object recognition, followed by a semantic attractor network layer that uses a temporal trace associative learning rule to form semantic categories from inputs that occur close together in time, whether those inputs come from the hippocampal system or from the world.
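The slow trace rule at the center of this theory can be illustrated numerically. Below is a minimal sketch, assuming a standard trace formulation (low-pass-filtered postsynaptic activity gating a Hebbian update) and made-up parameters; it is not the authors' published four-layer model.

```python
import numpy as np

# Minimal trace-learning sketch (assumed parameters, not the published model):
# a slow trace of postsynaptic activity binds inputs that occur close in time.
rng = np.random.default_rng(0)

def trace_learning(stream, n_out=10, eta=0.8, alpha=0.05, epochs=20):
    """Hebbian learning gated by a short-term memory trace of output activity."""
    w = rng.random((n_out, stream.shape[1]))
    w /= np.linalg.norm(w, axis=1, keepdims=True)
    for _ in range(epochs):
        trace = np.zeros(n_out)
        for x in stream:
            y = w @ x
            y[y < np.sort(y)[-2]] = 0.0          # simple competition: top 2 stay active
            trace = (1 - eta) * y + eta * trace  # slow short-term memory trace
            w += alpha * np.outer(trace, x)      # associative (Hebbian) update
            w /= np.linalg.norm(w, axis=1, keepdims=True)
    return w

# Two "semantic categories", each presented as a block of noisy exemplars in time.
protos = rng.random((2, 50))
stream = np.vstack([np.clip(p + 0.1 * rng.standard_normal((20, 50)), 0, None)
                    for p in protos])
w = trace_learning(stream)
labels = np.argmax(w @ stream.T, axis=0)
print(labels[:20])   # exemplars of category 0 should share winning neurons
print(labels[20:])   # ...and differ from those of category 1
```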
Affiliation(s)
- Edmund T Rolls
- Oxford Centre for Computational Neuroscience, Oxford, United Kingdom
- Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom
- Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai 200403, China
- Chenfei Zhang
- Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai 200403, China
- Jianfeng Feng
- Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom
- Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai 200403, China
2. Rolls ET. A Theory and Model of Scene Representations With Hippocampal Spatial View Cells. Hippocampus 2025; 35:e70013. PMID: 40296500; PMCID: PMC12038316; DOI: 10.1002/hipo.70013.
Abstract
A theory and network model are presented of how scene representations are built by forming spatial view cells in the ventromedial visual cortical scene pathway to the hippocampus in primates including humans. Layer 1, corresponding to V1-V4, connects to Layer 2 in the retrosplenial scene area and uses competitive learning to form visual feature combination neurons for the part of the scene being fixated, a visual fixation scene patch. In Layer 3, corresponding to the parahippocampal scene area and hippocampus, the visual fixation scene patches are stitched together to form whole scene representations. This is performed with a continuous attractor network for a whole scene made from the overlapping Gaussian receptive fields of the neurons as the head rotates to view the whole scene. In addition, in Layer 3, gain modulation by gaze direction maps visual fixation scene patches to the correct part of the whole scene representation when saccades are made. Each neuron in Layer 3 is thus a spatial view cell that responds to a location in a viewed scene based on visual features in a part of the scene. The novel conceptual advances are that this theory shows how scene representations may be built in primates, including humans, based on features in spatial scenes that anchor the scene representation to the world being viewed (to allocentric, world-based, space); and how gaze direction contributes to this. This offers a revolutionary approach to understanding the spatial representations for navigation and episodic memory in primates, including humans.
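The gain-modulation mechanism in Layer 3 can be sketched as a toy computation. The sketch below assumes Gaussian tuning and gain fields with made-up widths; it is not the published network, but it shows why multiplying retinotopic tuning by a gaze-direction gain field yields a response anchored to an allocentric scene location (scene location = gaze direction + retinal offset).

```python
import numpy as np

# Toy spatial view cell (assumed tuning widths, not the published model):
# retinotopic tuning multiplied by a gaze-direction gain field gives a response
# anchored to an allocentric scene location, stable across saccades.
def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def spatial_view_cell(retinal_offset, gaze, pref_scene_loc):
    visual = gauss(retinal_offset, pref_scene_loc - gaze, sigma=8.0)  # retinotopic tuning
    gain = gauss(gaze, pref_scene_loc, sigma=25.0)                    # gaze gain field
    return visual * gain

pref = 15.0                                     # preferred scene location (deg)
for gaze in (-10.0, 0.0, 25.0):                 # three different fixations
    offset = pref - gaze                        # where that location lands on the retina
    print(f"gaze {gaze:+.0f} deg -> response {spatial_view_cell(offset, gaze, pref):.2f}")
```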
Affiliation(s)
- Edmund T. Rolls
- Oxford Centre for Computational Neuroscience, Oxford, UK
- Department of Computer Science, University of Warwick, Coventry, UK
3. Rolls ET, Turova TS. Visual cortical networks for "What" and "Where" to the human hippocampus revealed with dynamical graphs. Cereb Cortex 2025; 35:bhaf106. PMID: 40347158; DOI: 10.1093/cercor/bhaf106.
Abstract
Key questions for understanding hippocampal function in memory and navigation in humans concern the type and source of visual information that reaches the human hippocampus. We measured bidirectional pairwise effective connectivity with functional magnetic resonance imaging between 360 cortical regions while 956 Human Connectome Project participants viewed scenes, faces, tools, or body parts. We developed a method using deterministic dynamical graphs to define whole cortical networks and the flow in both directions between their cortical regions over timesteps after a signal is applied to V1. We revealed that a ventromedial cortical visual "Where" network from V1 via the retrosplenial and medial parahippocampal scene areas reaches the hippocampus when scenes are viewed. A ventrolateral "What" visual cortical network reaches the hippocampus from V1 via V2-V4, the fusiform face cortex, and lateral parahippocampal region TF when faces or objects are viewed. There are major implications for understanding the computations of the human versus rodent hippocampus in memory and navigation: primates, with their fovea and highly developed cortical visual processing networks, process information about the location of faces, objects, and landmarks in viewed scenes, whereas in rodents the representations in the hippocampal system are mainly about the place where the individual is located and self-motion between places.
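The dynamical-graph method can be caricatured in a few lines. The sketch below is an assumed toy form with invented region names and weights, not the authors' algorithm or data: a signal applied to V1 is propagated deterministically through a directed effective-connectivity matrix, and the timestep at which each region is first reached traces out the network.

```python
import numpy as np

# Toy deterministic dynamical graph (illustrative weights, not the paper's data):
# drive V1, propagate through directed effective connectivity, and record when
# each region first exceeds threshold.
regions = ["V1", "V2", "ProStriate", "VMV", "PHA", "Hippocampus"]
C = np.zeros((6, 6))
for src, dst in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:  # a 'Where'-like chain
    C[dst, src] = 0.9

x = np.zeros(6)
x[0] = 1.0                              # apply signal to V1
first_hit = {0: 0}
for t in range(1, 8):
    x = np.maximum(C @ x, x)            # deterministic propagation, no decay
    for i in np.flatnonzero(x > 0.1):
        first_hit.setdefault(int(i), t)

for i, name in enumerate(regions):
    print(f"{name:12s} first reached at timestep {first_hit.get(i)}")
```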
Affiliation(s)
- Edmund T Rolls
- Oxford Centre for Computational Neuroscience, Oxford, United Kingdom
- Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom
- Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai, China
4. Altavini TS, Chen M, Astorga G, Yan Y, Li W, Freiwald W, Gilbert CD. Expectation-dependent stimulus selectivity in the ventral visual cortical pathway. Proc Natl Acad Sci U S A 2025; 122:e2406684122. PMID: 40146852; PMCID: PMC12002251; DOI: 10.1073/pnas.2406684122.
Abstract
The hierarchical view of the ventral object recognition pathway is primarily based on feedforward mechanisms, starting from a fixed basis set of object primitives and ending with a representation of whole objects in the inferotemporal cortex. Here, we provide a different view. Rather than being a fixed "labeled line" for a specific feature, neurons continually change their stimulus selectivities on a moment-to-moment basis, as dictated by top-down influences of object expectation and perceptual task. We also derive selectivity for stimulus features from an ethologically curated stimulus set, based on a delayed match-to-sample task, which identifies components that are informative for object recognition in addition to full objects; the top-down effects were seen for both informative and uninformative components. Cortical areas responding to these stimuli were identified with functional MRI in order to guide placement of chronically implanted electrode arrays.
Affiliation(s)
- Tiago S. Altavini
- Laboratory of Neurobiology, The Rockefeller University, New York, NY 10065
- Minggui Chen
- Laboratory of Neurobiology, The Rockefeller University, New York, NY 10065
- Guadalupe Astorga
- Laboratory of Neurobiology, The Rockefeller University, New York, NY 10065
- Yin Yan
- Beijing Normal University, Beijing 100875, China
- Wu Li
- Beijing Normal University, Beijing 100875, China
- Winrich Freiwald
- Laboratory of Neurobiology, The Rockefeller University, New York, NY 10065
- Charles D. Gilbert
- Laboratory of Neurobiology, The Rockefeller University, New York, NY 10065
5. Rolls ET. Hippocampal Discoveries: Spatial View Cells, Connectivity, and Computations for Memory and Navigation, in Primates Including Humans. Hippocampus 2025; 35:e23666. PMID: 39690918; DOI: 10.1002/hipo.23666.
Abstract
Two key series of discoveries about the hippocampus are described. One is the discovery of hippocampal spatial view cells in primates. This discovery opens the way to a much better understanding of human episodic memory, for episodic memory prototypically involves a memory of where people, objects, or rewards have been seen in locations "out there", which could never be implemented by the place cells that encode the location of a rat or mouse. Further, spatial view cells are valuable for navigation using vision and viewed landmarks, and provide for much richer, vision-based navigation than the place-to-place self-motion update performed by rats and mice, who live in dark underground tunnels. Spatial view cells thus offer a revolution in our understanding of the functions of the hippocampus in memory and navigation in humans and other primates with well-developed foveate vision. The second series of discoveries led to a computational theory of the hippocampal-neocortical memory system that includes the only quantitative theory of how information is recalled from the hippocampus to the neocortex. Foundations for this research were the discovery of reward neurons for food reward, and non-reward, in the primate orbitofrontal cortex, and of representations of value, including monetary value, in the human orbitofrontal cortex; and the discovery of face identity and face expression cells in the primate inferior temporal visual cortex and how they represent transform-invariant information. This research illustrates how, in order to understand a brain computation, a whole series of integrated interdisciplinary discoveries is needed to build a theory of the operation of each neural system.
Affiliation(s)
- Edmund T Rolls
- Oxford Centre for Computational Neuroscience, Oxford, UK
- Department of Computer Science, University of Warwick, Coventry, UK
6. Susan S. Neuroscientific insights about computer vision models: a concise review. Biol Cybern 2024; 118:331-348. PMID: 39382577; DOI: 10.1007/s00422-024-00998-9.
Abstract
The development of biologically inspired computational models has been a focus of study ever since the artificial neuron was introduced by McCulloch and Pitts in 1943. However, a scrutiny of the literature reveals that most attempts to replicate the highly efficient and complex biological visual system have been futile or have met with limited success. Recent state-of-the-art computer vision models, such as pre-trained deep neural networks and vision transformers, may not be biologically inspired per se. Nevertheless, certain aspects of biological vision are still found embedded, knowingly or unknowingly, in the architecture and functioning of these models. This paper explores several principles related to visual neuroscience and the biological visual pathway that resonate, in some manner, in the architectural design and functioning of contemporary computer vision models. The findings of this survey can provide useful insights for building future bio-inspired computer vision models. The survey is conducted from a historical perspective, tracing the biological connections of computer vision models starting with the basic artificial neuron and proceeding to modern technologies such as deep convolutional neural networks (CNNs) and spiking neural networks (SNNs). One highlight of the survey is a discussion of biologically plausible neural networks and bio-inspired unsupervised learning mechanisms adapted for computer vision tasks in recent times.
Affiliation(s)
- Seba Susan
- Department of Information Technology, Delhi Technological University, Delhi, India.
7. Wood JN, Pandey L, Wood SMW. Digital Twin Studies for Reverse Engineering the Origins of Visual Intelligence. Annu Rev Vis Sci 2024; 10:145-170. PMID: 39292554; DOI: 10.1146/annurev-vision-101322-103628.
Abstract
What are the core learning algorithms in brains? Nativists propose that intelligence emerges from innate domain-specific knowledge systems, whereas empiricists propose that intelligence emerges from domain-general systems that learn domain-specific knowledge from experience. We address this debate by reviewing digital twin studies designed to reverse engineer the learning algorithms in newborn brains. In digital twin studies, newborn animals and artificial agents are raised in the same environments and tested with the same tasks, permitting direct comparison of their learning abilities. Supporting empiricism, digital twin studies show that domain-general algorithms learn animal-like object perception when trained on the first-person visual experiences of newborn animals. Supporting nativism, digital twin studies show that domain-general algorithms produce innate domain-specific knowledge when trained on prenatal experiences (retinal waves). We argue that learning across humans, animals, and machines can be explained by a universal principle, which we call space-time fitting. Space-time fitting explains both empiricist and nativist phenomena, providing a unified framework for understanding the origins of intelligence.
Affiliation(s)
- Justin N Wood
- Informatics Department, Indiana University Bloomington, Bloomington, Indiana, USA
- Cognitive Science Program, Indiana University Bloomington, Bloomington, Indiana, USA
- Neuroscience Department, Indiana University Bloomington, Bloomington, Indiana, USA
- Lalit Pandey
- Informatics Department, Indiana University Bloomington, Bloomington, Indiana, USA
- Samantha M W Wood
- Informatics Department, Indiana University Bloomington, Bloomington, Indiana, USA
- Cognitive Science Program, Indiana University Bloomington, Bloomington, Indiana, USA
- Neuroscience Department, Indiana University Bloomington, Bloomington, Indiana, USA
8. Rolls ET, Yan X, Deco G, Zhang Y, Jousmaki V, Feng J. A ventromedial visual cortical 'Where' stream to the human hippocampus for spatial scenes revealed with magnetoencephalography. Commun Biol 2024; 7:1047. PMID: 39183244; PMCID: PMC11345434; DOI: 10.1038/s42003-024-06719-z.
Abstract
The primate hippocampus, including the human hippocampus, is implicated in episodic memory and navigation and represents spatial views, very different from the place representations found in rodents. To understand this system in humans, and the computations it performs, the pathway by which this spatial view information reaches the hippocampus was analysed. Whole-brain effective connectivity was measured with magnetoencephalography between 30 visual cortical regions and 150 other cortical regions using the HCP-MMP1 atlas in 21 participants performing a 0-back scene memory task. In a ventromedial visual stream, V1-V4 connect to the ProStriate region, where the retrosplenial scene area is located. The ProStriate region has connectivity to ventromedial visual regions VMV1-3 and VVC. These ventromedial regions connect to the medial parahippocampal region PHA1-3, which, with the VMV regions, includes the parahippocampal scene area. The medial parahippocampal regions have effective connectivity to the entorhinal cortex, perirhinal cortex, and hippocampus. In contrast, when viewing faces, the effective connectivity was more through a ventrolateral visual cortical stream via the fusiform face cortex to the inferior temporal visual cortex regions TE2p and TE2a. A ventromedial visual cortical 'Where' stream to the hippocampus for spatial scenes was supported by diffusion topography in 171 HCP participants at 7 T.
Affiliation(s)
- Edmund T Rolls
- Oxford Centre for Computational Neuroscience, Oxford, UK
- Department of Computer Science, University of Warwick, Coventry, UK
- Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai, China
- Xiaoqian Yan
- Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai, China
- Gustavo Deco
- Department of Information and Communication Technologies, Center for Brain and Cognition, Computational Neuroscience Group, Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de la Recerca i Estudis Avançats (ICREA), Universitat Pompeu Fabra, Passeig Lluís Companys 23, Barcelona, Spain
- Yi Zhang
- Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai, China
- Veikko Jousmaki
- Aalto NeuroImaging, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Jianfeng Feng
- Department of Computer Science, University of Warwick, Coventry, UK
- Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai, China
9. Rolls ET. Two what, two where, visual cortical streams in humans. Neurosci Biobehav Rev 2024; 160:105650. PMID: 38574782; DOI: 10.1016/j.neubiorev.2024.105650.
Abstract
Recent cortical connectivity investigations lead to new concepts about 'What' and 'Where' visual cortical streams in humans, and how they connect to other cortical systems. A ventrolateral 'What' visual stream leads to the inferior temporal visual cortex for object and face identity, and provides 'What' information to the hippocampal episodic memory system, the anterior temporal lobe semantic system, and the orbitofrontal cortex emotion system. A superior temporal sulcus (STS) 'What' visual stream utilising connectivity from the temporal and parietal visual cortex responds to moving objects and faces, and face expression, and connects to the orbitofrontal cortex for emotion and social behaviour. A ventromedial 'Where' visual stream builds feature combinations for scenes, and provides 'Where' inputs via the parahippocampal scene area to the hippocampal episodic memory system that are also useful for landmark-based navigation. The dorsal 'Where' visual pathway to the parietal cortex provides for actions in space, but also provides coordinate transforms to provide inputs to the parahippocampal scene area for self-motion update of locations in scenes in the dark or when the view is obscured.
Affiliation(s)
- Edmund T Rolls
- Oxford Centre for Computational Neuroscience, Oxford, UK; Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK; Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai 200403, China.
10. Matteucci G, Piasini E, Zoccolan D. Unsupervised learning of mid-level visual representations. Curr Opin Neurobiol 2024; 84:102834. PMID: 38154417; DOI: 10.1016/j.conb.2023.102834.
Abstract
Recently, a confluence between trends in neuroscience and machine learning has brought a renewed focus on unsupervised learning, where sensory processing systems learn to exploit the statistical structure of their inputs in the absence of explicit training targets or rewards. Sophisticated experimental approaches have enabled the investigation of the influence of sensory experience on neural self-organization and its synaptic bases. Meanwhile, novel algorithms for unsupervised and self-supervised learning have become increasingly popular both as inspiration for theories of the brain, particularly for the function of intermediate visual cortical areas, and as building blocks of real-world learning machines. Here we review some of these recent developments, placing them in historical context and highlighting some research lines that promise exciting breakthroughs in the near future.
Affiliation(s)
- Giulio Matteucci
- Department of Basic Neurosciences, University of Geneva, Geneva, 1206, Switzerland. https://twitter.com/giulio_matt
- Eugenio Piasini
- International School for Advanced Studies (SISSA), Trieste, 34136, Italy
- Davide Zoccolan
- International School for Advanced Studies (SISSA), Trieste, 34136, Italy
11. Deng Z, Xie W, Zhang C, Wang C, Zhu F, Xie R, Chen J. Development of the mirror-image sensitivity for different object categories: evidence from the mirror costs of object images in children and adults. J Vis 2023; 23:9. PMID: 37971767; PMCID: PMC10664729; DOI: 10.1167/jov.23.13.9.
Abstract
Object recognition relies on a multitude of factors, including size and orientation. Mirrored orientation holds special significance among object orientations, particularly because of children's mirror confusion in reading. Brain imaging studies suggest that the visual ventral and dorsal streams exhibit distinct orientation sensitivity across diverse object categories. Yet it remains unclear whether mirror-orientation sensitivity also varies among these categories during development at the behavioral level. Here, we explored the mirror sensitivity of children and adults across five distinct categories: tools, which activate both the visual ventral stream for function information and the dorsal stream for manipulation information; animals and faces, which mainly activate the ventral stream; and two types of symbols, letters and Chinese characters. Mirror sensitivity was assessed through mirror costs, that is, the additional reaction time or error rate in the mirrored versus the same orientation condition when judging the identity of object pairs. The mirror costs in reaction times and error rates consistently revealed that children exhibited null mirror costs for tools, and the mirror costs for tools in adults were minimal, if any, and were smaller than those for letters and characters. The mirror costs reflected in absolute reaction time and error rate were similar across adults and children, but when the overall difference in reaction times was considered, adults showed a larger mirror cost than children. Overall, our investigation unveils categorical distinctions and developmental change in mirror sensitivity of object recognition across the ventral and dorsal streams.
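The mirror-cost measure is a simple difference score. The numbers below are hypothetical, purely to show the computation and the distinction between absolute costs and costs scaled by overall reaction time.

```python
import numpy as np

# Mirror cost = performance on mirrored pairs minus same-orientation pairs.
# Hypothetical per-subject mean reaction times (ms); not data from the study.
rt_same = np.array([612.0, 655.0, 630.0])
rt_mirror = np.array([689.0, 702.0, 671.0])

cost_abs = rt_mirror - rt_same        # absolute mirror cost (ms)
cost_rel = cost_abs / rt_same         # cost relative to overall reaction time
print("absolute cost (ms):", cost_abs)
print("relative cost:     ", np.round(cost_rel, 3))
```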
Affiliation(s)
- Zhiqing Deng
- Center for the Study of Applied Psychology, Guangdong Key Laboratory of Mental Health and Cognitive Science, and the School of Psychology, South China Normal University, Guangzhou, Guangdong Province, China
- Weili Xie
- Center for the Study of Applied Psychology, Guangdong Key Laboratory of Mental Health and Cognitive Science, and the School of Psychology, South China Normal University, Guangzhou, Guangdong Province, China
- Can Zhang
- Center for the Study of Applied Psychology, Guangdong Key Laboratory of Mental Health and Cognitive Science, and the School of Psychology, South China Normal University, Guangzhou, Guangdong Province, China
- Can Wang
- Center for the Study of Applied Psychology, Guangdong Key Laboratory of Mental Health and Cognitive Science, and the School of Psychology, South China Normal University, Guangzhou, Guangdong Province, China
- Fuying Zhu
- Center for the Study of Applied Psychology, Guangdong Key Laboratory of Mental Health and Cognitive Science, and the School of Psychology, South China Normal University, Guangzhou, Guangdong Province, China
- Ran Xie
- Center for the Study of Applied Psychology, Guangdong Key Laboratory of Mental Health and Cognitive Science, and the School of Psychology, South China Normal University, Guangzhou, Guangdong Province, China
- Juan Chen
- Center for the Study of Applied Psychology, Guangdong Key Laboratory of Mental Health and Cognitive Science, and the School of Psychology, South China Normal University, Guangzhou, Guangdong Province, China
- Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, Guangdong Province, China
12. Rolls ET, Deco G, Zhang Y, Feng J. Hierarchical organization of the human ventral visual streams revealed with magnetoencephalography. Cereb Cortex 2023; 33:10686-10701. PMID: 37689834; DOI: 10.1093/cercor/bhad318.
Abstract
The hierarchical organization between 25 ventral stream visual cortical regions and 180 cortical regions was measured with magnetoencephalography using the Human Connectome Project Multimodal Parcellation atlas in 83 Human Connectome Project participants performing a visual memory task. The aim was to reveal the hierarchical organization using a whole-brain model based on generative effective connectivity with this fast neuroimaging method. V1-V4 formed a first group of interconnected regions. V4 in particular had connectivity to a ventrolateral visual stream: V8, the fusiform face cortex, and posterior inferior temporal cortex PIT. These regions in turn had effective connectivity to inferior temporal cortex visual regions TE2p and TE1p. TE2p and TE1p then have connectivity to anterior temporal lobe regions TE1a, TE1m, TE2a, and TGv, which are multimodal. In a ventromedial visual stream, V1-V4 connect to ventromedial regions VMV1-3 and VVC. VMV1-3 and VVC connect to the medial parahippocampal gyrus PHA1-3, which, with the VMV regions, includes the parahippocampal scene area. The medial parahippocampal PHA1-3 regions have connectivity to the hippocampal system regions: the perirhinal cortex, entorhinal cortex, and hippocampus. These effective connectivities of two ventral visual cortical streams measured with magnetoencephalography provide support for the hierarchical organization of brain systems measured with fMRI, and new evidence on directionality.
Affiliation(s)
- Edmund T Rolls
- Oxford Centre for Computational Neuroscience, Oxford, United Kingdom
- Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom
- Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai 200403, China
- Gustavo Deco
- Center for Brain and Cognition, Computational Neuroscience Group, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Roc Boronat 138, Barcelona 08018, Spain
- Brain and Cognition, Pompeu Fabra University, Barcelona 08018, Spain
- Institució Catalana de la Recerca i Estudis Avançats (ICREA), Universitat Pompeu Fabra, Passeig Lluís Companys 23, Barcelona 08010, Spain
- Yi Zhang
- Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai 200403, China
- Jianfeng Feng
- Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom
- Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai 200403, China
13. Farahat A, Effenberger F, Vinck M. A novel feature-scrambling approach reveals the capacity of convolutional neural networks to learn spatial relations. Neural Netw 2023; 167:400-414. PMID: 37673027; PMCID: PMC7616855; DOI: 10.1016/j.neunet.2023.08.021.
Abstract
Convolutional neural networks (CNNs) are one of the most successful computer vision systems to solve object recognition. Furthermore, CNNs have major applications in understanding the nature of visual representations in the human brain. Yet it remains poorly understood how CNNs actually make their decisions, what the nature of their internal representations is, and how their recognition strategies differ from humans. Specifically, there is a major debate about the question of whether CNNs primarily rely on surface regularities of objects, or whether they are capable of exploiting the spatial arrangement of features, similar to humans. Here, we develop a novel feature-scrambling approach to explicitly test whether CNNs use the spatial arrangement of features (i.e. object parts) to classify objects. We combine this approach with a systematic manipulation of effective receptive field sizes of CNNs as well as minimal recognizable configurations (MIRCs) analysis. In contrast to much previous literature, we provide evidence that CNNs are in fact capable of using relatively long-range spatial relationships for object classification. Moreover, the extent to which CNNs use spatial relationships depends heavily on the dataset, e.g. texture vs. sketch. In fact, CNNs even use different strategies for different classes within heterogeneous datasets (ImageNet), suggesting CNNs have a continuous spectrum of classification strategies. Finally, we show that CNNs learn the spatial arrangement of features only up to an intermediate level of granularity, which suggests that intermediate rather than global shape features provide the optimal trade-off between sensitivity and specificity in object classification. These results provide novel insights into the nature of CNN representations and the extent to which they rely on the spatial arrangement of features for object classification.
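The core manipulation can be sketched directly. Below is an assumed minimal form of a feature-scrambling test, not the authors' exact pipeline: permute image patches at several grid sizes and ask how classification degrades; sensitivity to coarse but not fine scrambling indicates reliance on intermediate-scale spatial arrangement.

```python
import numpy as np

# Minimal patch-scrambling sketch (assumed form, not the paper's pipeline).
rng = np.random.default_rng(0)

def scramble(img, grid):
    """Randomly permute the positions of grid x grid patches of a square image."""
    h = img.shape[0] // grid
    patches = [img[r*h:(r+1)*h, c*h:(c+1)*h].copy()
               for r in range(grid) for c in range(grid)]
    out = np.empty_like(img)
    for k, idx in enumerate(rng.permutation(len(patches))):
        r, c = divmod(k, grid)
        out[r*h:(r+1)*h, c*h:(c+1)*h] = patches[idx]
    return out

img = rng.random((64, 64))
for grid in (2, 4, 8):                   # coarse -> fine scrambling
    s = scramble(img, grid)              # in practice, feed `s` to the CNN under test
    print(f"grid {grid}: pixel MSE vs original {np.mean((img - s) ** 2):.3f}")
```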
Affiliation(s)
- Amr Farahat
- Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, Frankfurt, Germany; Donders Centre for Neuroscience, Department of Neuroinformatics, Radboud University, Nijmegen, The Netherlands
- Felix Effenberger
- Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, Frankfurt, Germany; Frankfurt Institute for Advanced Studies, Frankfurt, Germany
- Martin Vinck
- Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, Frankfurt, Germany; Donders Centre for Neuroscience, Department of Neuroinformatics, Radboud University, Nijmegen, The Netherlands
14. Pusch R, Clark W, Rose J, Güntürkün O. Visual categories and concepts in the avian brain. Anim Cogn 2023; 26:153-173. PMID: 36352174; PMCID: PMC9877096; DOI: 10.1007/s10071-022-01711-8.
Abstract
Birds are excellent model organisms for studying perceptual categorization and concept formation. The renewed focus on avian neuroscience has sparked an explosion of new data in the field. At the same time, our understanding of sensory and particularly visual structures in the avian brain has shifted fundamentally. These recent discoveries have revealed how categorization is mediated in the avian brain and have generated a theoretical framework that goes beyond the realm of birds. We review the contributions of avian categorization research at the methodological, behavioral, and neurobiological levels. To this end, we first introduce avian categorization from a behavioral perspective and present the common elements model of categorization. Second, we describe the functional and structural organization of the avian visual system, followed by an overview of recent anatomical discoveries and the new perspective on the avian 'visual cortex'. Third, we focus on the neurocomputational basis of perceptual categorization in the bird's visual system. Fourth, an overview of the avian prefrontal cortex and the prefrontal contribution to perceptual categorization is provided. The fifth section outlines how asymmetries of the visual system contribute to categorization. Finally, we present a mechanistic view of the neural principles of avian visual categorization and its putative extension to concept learning.
Affiliation(s)
- Roland Pusch
- Biopsychology, Faculty of Psychology, Ruhr University Bochum, 44780 Bochum, Germany
- William Clark
- Neural Basis of Learning, Faculty of Psychology, Ruhr University Bochum, 44780 Bochum, Germany
- Jonas Rose
- Neural Basis of Learning, Faculty of Psychology, Ruhr University Bochum, 44780 Bochum, Germany
- Onur Güntürkün
- Biopsychology, Faculty of Psychology, Ruhr University Bochum, 44780 Bochum, Germany
15. Tesileanu T, Piasini E, Balasubramanian V. Efficient processing of natural scenes in visual cortex. Front Cell Neurosci 2022; 16:1006703. PMID: 36545653; PMCID: PMC9760692; DOI: 10.3389/fncel.2022.1006703.
Abstract
Neural circuits in the periphery of the visual, auditory, and olfactory systems are believed to use limited resources efficiently to represent sensory information by adapting to the statistical structure of the natural environment. This "efficient coding" principle has been used to explain many aspects of early visual circuits including the distribution of photoreceptors, the mosaic geometry and center-surround structure of retinal receptive fields, the excess OFF pathways relative to ON pathways, saccade statistics, and the structure of simple cell receptive fields in V1. We know less about the extent to which such adaptations may occur in deeper areas of cortex beyond V1. We thus review recent developments showing that the perception of visual textures, which depends on processing in V2 and beyond in mammals, is adapted in rats and humans to the multi-point statistics of luminance in natural scenes. These results suggest that central circuits in the visual brain are adapted for seeing key aspects of natural scenes. We conclude by discussing how adaptation to natural temporal statistics may aid in learning and representing visual objects, and propose two challenges for the future: (1) explaining the distribution of shape sensitivity in the ventral visual stream from the statistics of object shape in natural images, and (2) explaining cell types of the vertebrate retina in terms of feature detectors that are adapted to the spatio-temporal structures of natural stimuli. We also discuss how new methods based on machine learning may complement the normative, principles-based approach to theoretical neuroscience.
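As a concrete instance of the efficient-coding principle discussed in this review, the sketch below shows the textbook whitening account under Gaussian assumptions; it is generic, not a model of any specific circuit covered above: a linear filter that removes the pairwise correlations of naturalistic inputs, often invoked to explain center-surround retinal filtering.

```python
import numpy as np

# Generic efficient-coding sketch: whitening decorrelates correlated inputs.
rng = np.random.default_rng(0)
idx = np.arange(5)
cov = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 2.0)  # smooth, natural-like correlations
x = rng.multivariate_normal(np.zeros(5), cov, size=20000)

evals, evecs = np.linalg.eigh(np.cov(x.T))
W = evecs @ np.diag(evals ** -0.5) @ evecs.T              # ZCA whitening filter
y = x @ W.T
print(np.round(np.cov(y.T), 2))                           # ~identity: outputs decorrelated
```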
Affiliation(s)
- Tiberiu Tesileanu
- Center for Computational Neuroscience, Flatiron Institute, New York, NY, United States
- Eugenio Piasini
- Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
- Vijay Balasubramanian
- Department of Physics and Astronomy, David Rittenhouse Laboratory, University of Pennsylvania, Philadelphia, PA, United States
- Santa Fe Institute, Santa Fe, NM, United States
16. Schyns PG, Snoek L, Daube C. Degrees of algorithmic equivalence between the brain and its DNN models. Trends Cogn Sci 2022; 26:1090-1102. PMID: 36216674; DOI: 10.1016/j.tics.2022.09.003.
Abstract
Deep neural networks (DNNs) have become powerful and increasingly ubiquitous tools to model human cognition, and often produce similar behaviors. For example, with their hierarchical, brain-inspired organization of computations, DNNs apparently categorize real-world images in the same way as humans do. Does this imply that their categorization algorithms are also similar? We have framed the question with three embedded degrees that progressively constrain algorithmic similarity evaluations: equivalence of (i) behavioral/brain responses, which is current practice, (ii) the stimulus features that are processed to produce these outcomes, which is more constraining, and (iii) the algorithms that process these shared features, the ultimate goal. To improve DNNs as models of cognition, we develop for each degree an increasingly constrained benchmark that specifies the epistemological conditions for the considered equivalence.
Affiliation(s)
- Philippe G Schyns
- School of Psychology and Neuroscience, University of Glasgow, Glasgow G12 8QB, UK
- Lukas Snoek
- School of Psychology and Neuroscience, University of Glasgow, Glasgow G12 8QB, UK
- Christoph Daube
- School of Psychology and Neuroscience, University of Glasgow, Glasgow G12 8QB, UK
17. Cheon J, Baek S, Paik SB. Invariance of object detection in untrained deep neural networks. Front Comput Neurosci 2022; 16:1030707. DOI: 10.3389/fncom.2022.1030707.
Abstract
The ability to perceive visual objects with various types of transformations, such as rotation, translation, and scaling, is crucial for consistent object recognition. In machine learning, invariant object detection for a network is often implemented by augmentation with a massive number of training images, but the mechanism of invariant object detection in biological brains—how invariance arises initially and whether it requires visual experience—remains elusive. Here, using a model neural network of the hierarchical visual pathway of the brain, we show that invariance of object detection can emerge spontaneously in the complete absence of learning. First, we found that units selective to a particular object class arise in randomly initialized networks even before visual training. Intriguingly, these units show robust tuning to images of each object class under a wide range of image transformation types, such as viewpoint rotation. We confirmed that this “innate” invariance of object selectivity enables untrained networks to perform an object-detection task robustly, even with images that have been significantly modulated. Our computational model predicts that invariant object tuning originates from combinations of non-invariant units via random feedforward projections, and we confirmed that the predicted profile of feedforward projections is observed in untrained networks. Our results suggest that invariance of object detection is an innate characteristic that can emerge spontaneously in random feedforward networks.
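The proposed mechanism, invariance from random feedforward combinations of non-invariant units, can be sketched with toy tuning curves (assumed shapes and sizes, not the authors' network):

```python
import numpy as np

# Toy sketch: randomly pooling many viewpoint-tuned (non-invariant) units
# yields a unit whose response is nearly flat across viewpoint.
rng = np.random.default_rng(0)
views = np.linspace(0.0, 180.0, 19)              # viewpoint rotations (deg)
prefs = rng.uniform(0.0, 180.0, 200)             # preferred views of 200 units
tuning = np.exp(-0.5 * ((views[:, None] - prefs[None, :]) / 15.0) ** 2)

pooled = tuning @ (rng.random(200) < 0.25)       # random feedforward projection
single = tuning[:, 0]                            # one non-invariant unit
cv = lambda r: np.std(r) / np.mean(r)            # response variability across views
print(f"single-unit CV across views: {cv(single):.2f}")
print(f"pooled-unit CV across views: {cv(pooled):.2f}")   # much smaller
```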
18. Zhang M, Armendariz M, Xiao W, Rose O, Bendtz K, Livingstone M, Ponce C, Kreiman G. Look twice: A generalist computational model predicts return fixations across tasks and species. PLoS Comput Biol 2022; 18:e1010654. PMID: 36413523; PMCID: PMC9681066; DOI: 10.1371/journal.pcbi.1010654.
Abstract
Primates constantly explore their surroundings via saccadic eye movements that bring different parts of an image into high resolution. In addition to exploring new regions in the visual field, primates also make frequent return fixations, revisiting previously foveated locations. We systematically studied a total of 44,328 return fixations out of 217,440 fixations. Return fixations were ubiquitous across different behavioral tasks, in monkeys and humans, both when subjects viewed static images and when subjects performed natural behaviors. Return fixation locations were consistent across subjects, tended to occur within short temporal offsets, and typically followed a 180-degree turn in saccadic direction. To understand the origin of return fixations, we propose a proof-of-principle, biologically inspired, and image-computable neural network model. The model combines five key modules: an image feature extractor, bottom-up saliency cues, task-relevant visual features, finite inhibition-of-return, and saccade size constraints. Even though there are no free parameters fine-tuned for each specific task, species, or condition, the model produces fixation sequences resembling the universal properties of return fixations. These results provide initial steps towards a mechanistic understanding of the trade-off between rapid foveal recognition and the need to scrutinize previous fixation locations.
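The contribution of the finite (decaying) inhibition-of-return module can be isolated in a toy scanpath simulation. The sketch below uses made-up parameters and a random saliency map, not the published five-module model; the point is that decaying inhibition allows previously fixated locations to win again, producing return fixations.

```python
import numpy as np

# Toy winner-take-all scanpath with finite inhibition-of-return (IOR).
rng = np.random.default_rng(0)
saliency = rng.random(25)               # flattened 5x5 saliency map
inhibition = np.zeros(25)
fixations = []
for _ in range(15):
    loc = int(np.argmax(saliency - inhibition))
    fixations.append(loc)
    inhibition[loc] += 0.6              # suppress the just-fixated location...
    inhibition *= 0.7                   # ...but let inhibition decay (finite IOR)

print("scanpath:", fixations)
print("return fixations:", len(fixations) - len(set(fixations)))
```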
Affiliation(s)
- Mengmi Zhang
- Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Center for Brains, Minds and Machines, Cambridge, Massachusetts, United States of America
- CFAR and I2R, Agency for Science, Technology and Research, Singapore
- Marcelo Armendariz
- Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Center for Brains, Minds and Machines, Cambridge, Massachusetts, United States of America
- Laboratory for Neuro- and Psychophysiology, KU Leuven, Leuven, Belgium
- Will Xiao
- Department of Neurobiology, Harvard Medical School, Boston, Massachusetts, United States of America
- Olivia Rose
- Department of Neurobiology, Harvard Medical School, Boston, Massachusetts, United States of America
- Katarina Bendtz
- Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Center for Brains, Minds and Machines, Cambridge, Massachusetts, United States of America
- Margaret Livingstone
- Department of Neurobiology, Harvard Medical School, Boston, Massachusetts, United States of America
- Carlos Ponce
- Department of Neurobiology, Harvard Medical School, Boston, Massachusetts, United States of America
- Gabriel Kreiman
- Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Center for Brains, Minds and Machines, Cambridge, Massachusetts, United States of America
19. Bowen EFW, Rodriguez AM, Sowinski DR, Granger R. Visual stream connectivity predicts assessments of image quality. J Vis 2022; 22:4. PMID: 36219145; PMCID: PMC9580224; DOI: 10.1167/jov.22.11.4.
Abstract
Despite extensive study of early vision, new and unexpected mechanisms continue to be identified. We introduce a novel formal treatment of the psychophysics of image similarity, derived directly from straightforward connectivity patterns in early visual pathways. The resulting differential geometry formulation is shown to provide accurate and explanatory accounts of human perceptual similarity judgments. The direct formal predictions are then shown to be further improved via simple regression on human behavioral reports, which in turn are used to construct more elaborate hypothesized neural connectivity patterns. It is shown that the predictive approaches introduced here outperform a standard successful published measure of perceived image fidelity; moreover, the approach provides clear explanatory principles of these similarity findings.
Affiliation(s)
- Elijah F W Bowen
- Brain Engineering Laboratory, Department of Psychological and Brain Sciences, Dartmouth, Hanover, NH, USA
- Antonio M Rodriguez
- Brain Engineering Laboratory, Department of Psychological and Brain Sciences, Dartmouth, Hanover, NH, USA
- Damian R Sowinski
- Brain Engineering Laboratory, Department of Psychological and Brain Sciences, Dartmouth, Hanover, NH, USA
- Richard Granger
- Brain Engineering Laboratory, Department of Psychological and Brain Sciences, Dartmouth, Hanover, NH, USA
20. Rolls ET, Deco G, Huang CC, Feng J. Multiple cortical visual streams in humans. Cereb Cortex 2022; 33:3319-3349. PMID: 35834308; DOI: 10.1093/cercor/bhac276.
Abstract
The effective connectivity between 55 visual cortical regions and 360 cortical regions was measured in 171 HCP participants using the HCP-MMP atlas, and complemented with functional connectivity and diffusion tractography. A Ventrolateral Visual "What" Stream for object and face recognition projects hierarchically to the inferior temporal visual cortex, which projects to the orbitofrontal cortex for reward value and emotion, and to the hippocampal memory system. A Ventromedial Visual "Where" Stream for scene representations connects to the parahippocampal gyrus and hippocampus. An Inferior STS (superior temporal sulcus) cortex Semantic Stream receives from the Ventrolateral Visual Stream, from visual inferior parietal PGi, and from the ventromedial-prefrontal reward system and connects to language systems. A Dorsal Visual Stream connects via V2 and V3A to MT+ Complex regions (including MT and MST), which connect to intraparietal regions (including LIP, VIP and MIP) involved in visual motion and actions in space. It performs coordinate transforms for idiothetic update of Ventromedial Stream scene representations. A Superior STS cortex Semantic Stream receives visual inputs from the Inferior STS Visual Stream, PGi, and STV, and auditory inputs from A5, is activated by face expression, motion and vocalization, and is important in social behaviour, and connects to language systems.
Affiliation(s)
- Edmund T Rolls
- Oxford Centre for Computational Neuroscience, Oxford, United Kingdom; Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom; Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai 200403, China
- Gustavo Deco
- Computational Neuroscience Group, Department of Information and Communication Technologies, Center for Brain and Cognition, Universitat Pompeu Fabra, Roc Boronat 138, Barcelona 08018, Spain; Brain and Cognition, Pompeu Fabra University, Barcelona 08018, Spain; Institució Catalana de la Recerca i Estudis Avançats (ICREA), Universitat Pompeu Fabra, Passeig Lluís Companys 23, Barcelona 08010, Spain
- Chu-Chung Huang
- Shanghai Key Laboratory of Brain Functional Genomics (Ministry of Education), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200602, China; Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai 200602, China
- Jianfeng Feng
- Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom; Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai 200403, China
21. Pegado F. Written Language Acquisition Is Both Shaped by and Has an Impact on Brain Functioning and Cognition. Front Hum Neurosci 2022; 16:819956. PMID: 35754773; PMCID: PMC9226919; DOI: 10.3389/fnhum.2022.819956.
Abstract
Spoken language is a distinctive trait of our species and is naturally acquired during infancy. Written language, in contrast, is artificial, and the correspondences between arbitrary visual symbols and the spoken language must be explicitly learned, with external help, for reading and writing. In this paper, I present several examples of how written language acquisition is both shaped by and has an impact on brain function and cognition. They show, on the one hand, how our phylogenetic legacy influences education and, on the other hand, how ontogenetic needs for education can rapidly subdue deeply rooted neurocognitive mechanisms. Understanding these bidirectional influences provides a more dynamic view of how plasticity interfaces phylogeny and ontogeny in human learning, with implications for both neuroscience and education.
Affiliation(s)
- Felipe Pegado
- Aix-Marseille University, CNRS, LPC, Marseille, France
22. Benucci A. Motor-related signals support localization invariance for stable visual perception. PLoS Comput Biol 2022; 18:e1009928. PMID: 35286305; PMCID: PMC8947590; DOI: 10.1371/journal.pcbi.1009928.
Abstract
Our ability to perceive a stable visual world in the presence of continuous movements of the body, head, and eyes has long puzzled researchers in neuroscience. We reformulated this problem in the context of hierarchical convolutional neural networks (CNNs), whose architectures have been inspired by the hierarchical signal processing of the mammalian visual system, and examined perceptual stability as an optimization process that identifies image-defining features for accurate image classification in the presence of movements. Movement signals, multiplexed with visual inputs along overlapping convolutional layers, aided classification invariance of shifted images by making the classification faster to learn and more robust relative to input noise. Classification invariance was reflected in activity manifolds associated with image categories emerging in late CNN layers, and in network units acquiring movement-associated activity modulations as observed experimentally during saccadic eye movements. Our findings provide a computational framework that unifies a multitude of biological observations on perceptual stability under optimality principles for image classification in artificial neural networks.
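The benefit of multiplexing movement signals with visual inputs can be shown in a drastically reduced setting. The sketch below is an assumed linear toy, not the paper's CNNs: combining an efference copy of the shift multiplicatively with the image, in the spirit of a gain field, makes shifted patterns separable by a linear readout when the image alone is not.

```python
import numpy as np

# Toy demo: a linear readout classifies shifted 1-D "images" much better when
# an efference copy of the shift is combined multiplicatively with the input.
rng = np.random.default_rng(0)
P = {0: np.array([1.0, 0.0, 1.0]), 1: np.array([1.0, 1.0, 0.0])}  # two shapes

def make_data(n, with_motor):
    X, y = [], []
    for _ in range(n):
        cls = int(rng.integers(2))
        shift = int(rng.integers(10))
        img = np.zeros(20)
        img[shift:shift + 3] = P[cls] + 0.1 * rng.standard_normal(3)
        if with_motor:
            motor = np.eye(10)[shift]                 # efference copy of the shift
            X.append(np.outer(img, motor).ravel())    # multiplicative multiplexing
        else:
            X.append(img)
        y.append(cls)
    return np.array(X), np.array(y)

for with_motor in (False, True):
    Xtr, ytr = make_data(3000, with_motor)
    w, *_ = np.linalg.lstsq(Xtr, 2.0 * ytr - 1.0, rcond=None)  # linear readout
    Xte, yte = make_data(1000, with_motor)
    print(f"motor signal={with_motor}: accuracy {np.mean((Xte @ w > 0) == yte):.2f}")
```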
Affiliation(s)
- Andrea Benucci
- RIKEN Center for Brain Science, Wako-shi, Japan
- University of Tokyo, Graduate School of Information Science and Technology, Department of Mathematical Informatics, Tokyo, Japan
24. Wood SMW, Wood JN. Distorting Face Representations in Newborn Brains. Cogn Sci 2021; 45:e13021. PMID: 34379331; DOI: 10.1111/cogs.13021.
Abstract
What role does experience play in the development of face recognition? A growing body of evidence indicates that newborn brains need slowly changing visual experiences to develop accurate visual recognition abilities. All of the work supporting this "slowness constraint" on visual development comes from studies testing basic-level object recognition. Here, we present the results of controlled-rearing experiments that provide evidence for a slowness constraint on the development of face recognition, a prototypical subordinate-level object recognition task. We found that (1) newborn chicks can rapidly develop view-invariant face recognition and (2) the development of this ability relies on experience with slowly moving faces. When chicks were reared with quickly moving faces, they built distorted face representations that largely lacked invariance to viewpoint changes, effectively "breaking" their face recognition abilities. These results provide causal evidence that slowly changing visual experiences play a critical role in the development of face recognition, akin to basic-level object recognition. Thus, face recognition is not a hardwired property of vision but is learned rapidly as the visual system adapts to the temporal structure of the animal's visual environment.
Affiliation(s)
- Justin N Wood
- Informatics Department, Indiana University; Center for the Integrated Study of Animal Behavior, Indiana University; Cognitive Science Program, Indiana University; Department of Neuroscience, Indiana University
25. Rolls ET. The connections of neocortical pyramidal cells can implement the learning of new categories, attractor memory, and top-down recall and attention. Brain Struct Funct 2021; 226:2523-2536. PMID: 34347165; PMCID: PMC8448704; DOI: 10.1007/s00429-021-02347-z.
Abstract
Neocortical pyramidal cells have three key classes of excitatory input: forward inputs from the previous cortical area (or thalamus); recurrent collateral synapses from nearby pyramidal cells; and backprojection inputs from the following cortical area. The neocortex performs three major types of computation: (1) unsupervised learning of new categories, by allocating neurons to respond to combinations of inputs from the preceding cortical stage, which can be performed using competitive learning; (2) short-term memory, which can be performed by an attractor network using the recurrent collaterals; and (3) recall of what has been learned by top–down backprojections from the following cortical area. There is only one type of excitatory neuron involved, pyramidal cells, with these three types of input. It is proposed, and tested by simulations of a neuronal network model, that pyramidal cells can implement all three types of learning simultaneously, and can subsequently usefully categorise the forward inputs; keep them active in short-term memory; and later recall the representations using the backprojection input. This provides a new approach to understanding how one type of excitatory neuron in the neocortex can implement these three major types of computation, and provides a conceptual advance in understanding how the cerebral neocortex may work.
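The three computations can be combined in a single toy layer. The sketch below is heavily simplified, with assumed parameters and hand-picked input patterns, not the paper's simulations:

```python
import numpy as np

# Toy layer implementing: (1) competitive categorisation of forward inputs,
# (2) short-term memory via a recurrent-collateral attractor, and
# (3) recall from a degraded top-down (backprojection-like) cue.
rng = np.random.default_rng(0)
n_in, n_out, k = 40, 12, 4                     # k = number of active output neurons

def top_k(y):
    out = np.zeros_like(y)
    out[np.argsort(y)[-k:]] = 1.0              # binarised sparse activity
    return out

Wf = rng.random((n_out, n_in))
Wf /= np.linalg.norm(Wf, axis=1, keepdims=True)
Wr = np.zeros((n_out, n_out))                  # recurrent collaterals

patterns = np.zeros((2, n_in))                 # two non-overlapping inputs
patterns[0, :12] = 1.0
patterns[1, 20:32] = 1.0

for _ in range(30):                            # (1) competitive learning
    for x in patterns:
        y = top_k(Wf @ x)
        Wf += 0.1 * np.outer(y, x)
        Wf /= np.linalg.norm(Wf, axis=1, keepdims=True)
        Wr += np.outer(y, y)                   # Hebbian learning on collaterals
np.fill_diagonal(Wr, 0.0)

y0 = top_k(Wf @ patterns[0])                   # category assembly for pattern 0
print("assembly:", np.flatnonzero(y0))

y = y0.copy()
for _ in range(5):                             # (2) attractor short-term memory:
    y = top_k(Wr @ y)                          # activity should persist, input removed
print("after delay:", np.flatnonzero(y))

cue = y0.copy()                                # (3) recall: a degraded top-down cue
cue[np.flatnonzero(y0)[0]] = 0.0               # drop one assembly member
print("recalled:", np.flatnonzero(top_k(Wr @ cue)))  # should match the assembly
```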
Collapse
Affiliation(s)
- Edmund T Rolls
- Oxford Centre for Computational Neuroscience, Oxford, UK
- Department of Computer Science, University of Warwick, Coventry, CV4 7AL, UK
| |
Collapse
|
26
|
Piasini E, Soltuzu L, Muratore P, Caramellino R, Vinken K, Op de Beeck H, Balasubramanian V, Zoccolan D. Temporal stability of stimulus representation increases along rodent visual cortical hierarchies. Nat Commun 2021; 12:4448. [PMID: 34290247 PMCID: PMC8295255 DOI: 10.1038/s41467-021-24456-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2019] [Accepted: 06/14/2021] [Indexed: 11/09/2022] Open
Abstract
Cortical representations of brief, static stimuli become more invariant to identity-preserving transformations along the ventral stream. Likewise, increased invariance along the visual hierarchy should imply greater temporal persistence of responses to temporally structured dynamic stimuli, possibly complemented by temporal broadening of neuronal receptive fields. However, such stimuli could engage adaptive and predictive processes, whose impact on neural coding dynamics is unknown. By probing the rat analog of the ventral stream with movies, we uncovered a hierarchy of temporal scales, with deeper areas encoding visual information more persistently. Furthermore, the impact of intrinsic dynamics on the stability of stimulus representations grew gradually along the hierarchy. A database of recordings from mice showed similar trends, additionally revealing dependencies on the behavioral state. Overall, these findings show that visual representations become progressively more stable along rodent visual processing hierarchies, with an important contribution provided by intrinsic processing.
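One way to make "temporal persistence" concrete is to measure how quickly population response vectors decorrelate over time. The sketch below is a simplified proxy, not the analyses used in the paper; the 0.5 criterion and the toy data are assumptions. It estimates a stability timescale as the smallest lag at which the mean correlation between population vectors drops below the criterion.
```python
import numpy as np

def stability_timescale(resp, thresh=0.5):
    """resp: (n_neurons, n_timebins) firing rates to a movie.
    Returns the smallest lag (in bins) at which the mean correlation
    between population vectors falls below `thresh`."""
    n_t = resp.shape[1]
    for lag in range(1, n_t):
        cc = [np.corrcoef(resp[:, t], resp[:, t + lag])[0, 1]
              for t in range(n_t - lag)]
        if np.mean(cc) < thresh:
            return lag
    return n_t

rng = np.random.default_rng(1)
# Toy data: a "deep" area with slower dynamics than an "early" area
early = rng.standard_normal((30, 200))                    # fast, iid in time
slow = np.cumsum(rng.standard_normal((30, 200)), axis=1)  # smoother in time
print(stability_timescale(early), stability_timescale(slow))
```
The smoother population shows a much longer timescale, the signature the paper reports for deeper areas.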
Collapse
Affiliation(s)
- Eugenio Piasini
- Computational Neuroscience Initiative, University of Pennsylvania, Philadelphia, PA, United States
| | - Liviu Soltuzu
- Visual Neuroscience Lab, International School for Advanced Studies (SISSA), Trieste, Italy
- Blue Brain Project, École polytechnique fédérale de Lausanne (EPFL), Campus Biotech, Geneva, Switzerland
| | - Paolo Muratore
- Visual Neuroscience Lab, International School for Advanced Studies (SISSA), Trieste, Italy
| | - Riccardo Caramellino
- Visual Neuroscience Lab, International School for Advanced Studies (SISSA), Trieste, Italy
| | - Kasper Vinken
- Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Laboratory for Neuro- and Psychophysiology, Department of Neurosciences, KU Leuven, Leuven, Belgium
| | - Hans Op de Beeck
- Department of Brain and Cognition, Leuven Brain Institute, KU Leuven, Leuven, Belgium
| | - Vijay Balasubramanian
- Computational Neuroscience Initiative, University of Pennsylvania, Philadelphia, PA, United States
| | - Davide Zoccolan
- Visual Neuroscience Lab, International School for Advanced Studies (SISSA), Trieste, Italy.
| |
Collapse
|
27
|
Rolls ET. Learning Invariant Object and Spatial View Representations in the Brain Using Slow Unsupervised Learning. Front Comput Neurosci 2021; 15:686239. [PMID: 34366818 PMCID: PMC8335547 DOI: 10.3389/fncom.2021.686239] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2021] [Accepted: 06/29/2021] [Indexed: 11/13/2022] Open
Abstract
First, neurophysiological evidence for the learning of invariant representations in the inferior temporal visual cortex is described. This includes object and face representations with invariance for position, size, lighting, view and morphological transforms in the temporal lobe visual cortex; global object motion in the cortex in the superior temporal sulcus; and spatial view representations in the hippocampus that are invariant with respect to eye position, head direction, and place. Second, computational mechanisms that enable the brain to learn these invariant representations are proposed. For the ventral visual system, one key adaptation is the use of information available in the statistics of the environment in slow unsupervised learning to learn transform-invariant representations of objects. This contrasts with deep supervised learning in artificial neural networks, which uses training with thousands of exemplars forced into different categories by neuronal teachers. Similar slow learning principles apply to the learning of global object motion in the dorsal visual system leading to the cortex in the superior temporal sulcus. The learning rule that has been explored in VisNet is an associative rule with a short-term memory trace. The feed-forward architecture has four stages, with convergence from stage to stage. This type of slow learning is implemented in the brain in hierarchically organized competitive neuronal networks with convergence from stage to stage, with only 4-5 stages in the hierarchy. Slow learning is also shown to help the learning of coordinate transforms using gain modulation in the dorsal visual system extending into the parietal cortex and retrosplenial cortex. Representations are learned that are in allocentric spatial view coordinates of locations in the world and that are independent of eye position, head direction, and the place where the individual is located. This enables hippocampal spatial view cells to use idiothetic, self-motion, signals for navigation when the view details are obscured for short periods.
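The trace learning rule referred to here has a simple form: the synaptic update uses a short-term, exponentially decaying trace of the postsynaptic activity, y_bar(t) = (1 - eta) * y(t) + eta * y_bar(t - 1), so that transforms of the same object seen close together in time strengthen onto the same neuron. Below is a minimal single-neuron NumPy sketch; the parameter values are illustrative, and in VisNet itself this rule operates within competitive layers of a four-stage hierarchy rather than on one neuron.
```python
import numpy as np

def trace_learning(views, w, eta=0.8, lr=0.05):
    """One pass of trace-rule learning for a single output neuron.
    views: sequence of input vectors (successive transforms of one object)."""
    y_bar = 0.0
    for x in views:
        y = max(0.0, w @ x)                  # simple rectified response
        y_bar = (1 - eta) * y + eta * y_bar  # short-term memory trace
        w = w + lr * y_bar * x               # associative update using the trace
        w = w / np.linalg.norm(w)            # weight normalisation (competition)
    return w

rng = np.random.default_rng(0)
obj_a = [rng.random(100) for _ in range(5)]  # five "transforms" of one object
w = rng.random(100)
w /= np.linalg.norm(w)
for _ in range(10):
    w = trace_learning(obj_a, w)
print([round(float(w @ x), 3) for x in obj_a])  # similar responses across views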
Collapse
Affiliation(s)
- Edmund T Rolls
- Oxford Centre for Computational Neuroscience, Oxford, United Kingdom
- Department of Computer Science, University of Warwick, Coventry, United Kingdom
| |
Collapse
|
28
|
Bornet A, Doerig A, Herzog MH, Francis G, Van der Burg E. Shrinking Bouma's window: How to model crowding in dense displays. PLoS Comput Biol 2021; 17:e1009187. [PMID: 34228703 PMCID: PMC8284675 DOI: 10.1371/journal.pcbi.1009187] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2020] [Revised: 07/16/2021] [Accepted: 06/16/2021] [Indexed: 11/22/2022] Open
Abstract
In crowding, perception of a target deteriorates in the presence of nearby flankers. Traditionally, it is thought that visual crowding obeys Bouma's law, i.e., all elements within a certain distance interfere with the target, and that adding more elements always leads to stronger crowding. Crowding is predominantly studied using sparse displays (a target surrounded by a few flankers). However, many studies have shown that this approach leads to wrong conclusions about human vision. Van der Burg and colleagues proposed a paradigm to measure crowding in dense displays using genetic algorithms. Displays were selected and combined over several generations to maximize human performance. In contrast to Bouma's law, only the target's nearest neighbours affected performance. Here, we tested various models to explain these results. We used the same genetic algorithm, but instead of selecting displays based on human performance we selected displays based on the model's outputs. We found that all models based on the traditional feedforward pooling framework of vision were unable to reproduce human behaviour. In contrast, all models involving a dedicated grouping stage explained the results successfully. We show how traditional models can be improved by adding a grouping stage.
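The model-in-the-loop genetic algorithm described here can be summarised compactly: display fitness is given by a model's predicted accuracy rather than a human observer's. In the sketch below, the model_performance stand-in and all GA parameters are assumptions for illustration (a real study would plug in a pooling or grouping model); only the selection-crossover-mutation loop is the point.
```python
import numpy as np

rng = np.random.default_rng(0)
N_ELEMENTS, POP, GENS = 20, 30, 50

def model_performance(display):
    """Stand-in crowding model: predicted accuracy for a display
    (a binary vector saying which flanker positions are filled).
    Here we pretend only the nearest four flankers hurt performance."""
    return 1.0 / (1.0 + display[:4].sum())

pop = rng.integers(0, 2, (POP, N_ELEMENTS))
for gen in range(GENS):
    fitness = np.array([model_performance(d) for d in pop])
    parents = pop[np.argsort(fitness)[-POP // 2:]]          # selection
    kids = []
    for _ in range(POP - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, N_ELEMENTS)
        child = np.concatenate([a[:cut], b[cut:]])          # crossover
        flip = rng.random(N_ELEMENTS) < 0.02                # mutation
        child[flip] = 1 - child[flip]
        kids.append(child)
    pop = np.vstack([parents, kids])

# Nearest positions should empty out across generations for this toy model
print(pop.mean(axis=0)[:6].round(2))
```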
Collapse
Affiliation(s)
- Alban Bornet
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Adrien Doerig
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
| | - Michael H. Herzog
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Gregory Francis
- Department of Psychological Sciences, Purdue University, West Lafayette, Indiana, United States of America
| | - Erik Van der Burg
- TNO, Human Factors, Soesterberg, The Netherlands
- Brain and Cognition, University of Amsterdam, Amsterdam, The Netherlands
| |
Collapse
|
29
|
Abstract
Cognition is often defined as a dual process of physical and non-physical mechanisms. This duality originated from past theory on the constituent parts of the natural world. Even though material causation is not an explanation for all natural processes, phenomena at the cellular level of life are modeled by physical causes. These phenomena include explanations for the function of organ systems, including the nervous system and information processing in the cerebrum. This review restricts the definition of cognition to a mechanistic process and surveys studies that support an abstract set of proximate mechanisms. Specifically, this process is approached from a large-scale perspective, the flow of information in a neural system. Study at this scale further constrains the possible explanations for cognition, since the information flow is amenable to theory, unlike a lower-level approach where the problem becomes intractable. These hypotheses include stochastic processes as explanations for cognition, along with principles that support an abstract format for the encoded information.
Collapse
|
30
|
Parr T, Sajid N, Da Costa L, Mirza MB, Friston KJ. Generative Models for Active Vision. Front Neurorobot 2021; 15:651432. [PMID: 33927605 PMCID: PMC8076554 DOI: 10.3389/fnbot.2021.651432] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2021] [Accepted: 03/15/2021] [Indexed: 11/13/2022] Open
Abstract
The active visual system comprises the visual cortices, cerebral attention networks, and oculomotor system. While fascinating in its own right, it is also an important model for sensorimotor networks in general. A prominent approach to studying this system is active inference-which assumes the brain makes use of an internal (generative) model to predict proprioceptive and visual input. This approach treats action as ensuring sensations conform to predictions (i.e., by moving the eyes) and posits that visual percepts are the consequence of updating predictions to conform to sensations. Under active inference, the challenge is to identify the form of the generative model that makes these predictions-and thus directs behavior. In this paper, we provide an overview of the generative models that the brain must employ to engage in active vision. This means specifying the processes that explain retinal cell activity and proprioceptive information from oculomotor muscle fibers. In addition to the mechanics of the eyes and retina, these processes include our choices about where to move our eyes. These decisions rest upon beliefs about salient locations, or the potential for information gain and belief-updating. A key theme of this paper is the relationship between "looking" and "seeing" under the brain's implicit generative model of the visual world.
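Under active inference, choosing where to look can be cast as selecting the fixation with the greatest expected information gain about latent causes, one component of the salience the abstract mentions. The minimal discrete sketch below is only schematic: the scene/location likelihood matrix A and all numbers are invented, and a full active-inference treatment would use expected free energy rather than bare information gain.
```python
import numpy as np

# Toy generative model: 3 possible scenes, 4 fixation locations.
# A[s, loc] = probability of observing "feature present" at loc in scene s.
A = np.array([[0.9, 0.1, 0.5, 0.5],
              [0.1, 0.9, 0.5, 0.5],
              [0.5, 0.5, 0.9, 0.1]])
q = np.ones(3) / 3                         # prior belief over scenes

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def expected_info_gain(q, loc):
    """Expected reduction in uncertainty about the scene from fixating loc."""
    gain = 0.0
    for obs in (0, 1):                                  # feature absent/present
        like = A[:, loc] if obs else 1 - A[:, loc]
        p_obs = like @ q
        post = like * q / p_obs
        gain += p_obs * (entropy(q) - entropy(post))
    return gain

true_scene = 0
rng = np.random.default_rng(0)
for saccade in range(3):
    loc = max(range(4), key=lambda l: expected_info_gain(q, l))  # where to look
    obs = int(rng.random() < A[true_scene, loc])                 # what is seen
    like = A[:, loc] if obs else 1 - A[:, loc]
    q = like * q / (like @ q)                                    # belief update
    print(f"saccade {saccade}: fixate {loc}, observe {obs}, belief {q.round(2)}")
```
Each saccade is chosen to resolve the most uncertainty, illustrating the "looking" in the service of "seeing" theme.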
Collapse
Affiliation(s)
- Thomas Parr
- Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology, London, United Kingdom
| | - Noor Sajid
- Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology, London, United Kingdom
| | - Lancelot Da Costa
- Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology, London, United Kingdom
- Department of Mathematics, Imperial College London, London, United Kingdom
| | - M. Berk Mirza
- Department of Neuroimaging, Centre for Neuroimaging Sciences, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom
| | - Karl J. Friston
- Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology, London, United Kingdom
| |
Collapse
|
31
|
Nam Y, Sato T, Uchida G, Malakhova E, Ullman S, Tanifuji M. View-tuned and view-invariant face encoding in IT cortex is explained by selected natural image fragments. Sci Rep 2021; 11:7827. [PMID: 33837223 PMCID: PMC8035202 DOI: 10.1038/s41598-021-86842-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2020] [Accepted: 03/08/2021] [Indexed: 11/24/2022] Open
Abstract
Humans recognize individual faces regardless of variation in the facial view. The view-tuned face neurons in the inferior temporal (IT) cortex are regarded as the neural substrate for view-invariant face recognition. This study approximated visual features encoded by these neurons as combinations of local orientations and colors, originating from natural image fragments. The resultant features reproduced the preference of these neurons for particular facial views. We also found that faces of one identity were separable from the faces of other identities in a space where each axis represented one of these features. These results suggested that view-invariant face representation was established by combining view-sensitive visual features. The face representation with these features suggested that, with respect to view-invariant face representation, the seemingly complex and deeply layered ventral visual pathway can be approximated via a shallow network, comprised of layers of low-level processing for local orientations and colors (V1/V2-level) and layers that detect particular sets of low-level elements derived from natural image fragments (IT-level).
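The fragment-based account can be illustrated with a toy "fragment detector": a unit whose response is the maximum normalised cross-correlation of a stored image fragment over the input image, with identity then read out from the vector of such responses. The sketch below uses random stand-in images rather than faces and is only schematic.
```python
import numpy as np

def fragment_response(image, fragment):
    """Max normalised cross-correlation of a fragment over an image,
    a toy version of a 'fragment detector' unit."""
    fh, fw = fragment.shape
    f = (fragment - fragment.mean()) / (fragment.std() + 1e-9)
    best = -np.inf
    for i in range(image.shape[0] - fh + 1):
        for j in range(image.shape[1] - fw + 1):
            patch = image[i:i + fh, j:j + fw]
            p = (patch - patch.mean()) / (patch.std() + 1e-9)
            best = max(best, float((p * f).mean()))   # Pearson r in [-1, 1]
    return best

rng = np.random.default_rng(0)
faces = [rng.random((32, 32)) for _ in range(2)]      # stand-in "face" images
fragments = [f[8:16, 8:16] for f in faces]            # fragments cut from them
# Each image drives the detector built from its own fragment hardest
features = np.array([[fragment_response(im, fr) for fr in fragments]
                     for im in faces])
print(features.round(2))
```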
Collapse
Affiliation(s)
- Yunjun Nam
- Laboratory for Integrative Neural Systems, RIKEN Center for Brain Science, Wako-shi, Saitama, Japan
| | - Takayuki Sato
- Research Promotion Division, Fukushima University, Fukushima, Japan
| | - Go Uchida
- Laboratory for Integrative Neural Systems, RIKEN Center for Brain Science, Wako-shi, Saitama, Japan
| | - Ekaterina Malakhova
- Lab. Physiology of Vision, Pavlov Institute of Physiology, Saint-Petersburg, Russia
| | - Shimon Ullman
- Department of Computer Science and Applied Mathematics, The Weizmann Institute of Science, Rehovot, Israel
| | - Manabu Tanifuji
- Laboratory for Integrative Neural Systems, RIKEN Center for Brain Science, Wako-shi, Saitama, Japan
- Department of Life Science and Medical Bio-Science, Faculty of Science and Engineering, Waseda University, Shinjuku, Tokyo, Japan
| |
Collapse
|
32
|
Wang RH, Dai L, Okamura JY, Fuchida T, Wang G. Object discrimination performance and dynamics evaluated by inferotemporal cell population activity. IBRO Neurosci Rep 2021; 10:171-177. [PMID: 33842920 PMCID: PMC8019996 DOI: 10.1016/j.ibneur.2021.02.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2020] [Accepted: 02/24/2021] [Indexed: 11/15/2022] Open
Abstract
We have previously reported an increase in the response tolerance of inferotemporal cells around trained views. However, an inferotemporal cell usually displays different response patterns in an initial response phase immediately after stimulus onset and in a late phase from approximately 260 ms after stimulus onset. This study aimed to understand the difference between the two time periods and their involvement in view-invariant object recognition. Responses to object images with and without prior experience of object discrimination across views, recorded by microelectrodes, were pooled together from our previous experiments. Using a machine learning algorithm, we trained classifiers for object discrimination. In the early phase, the performance of classifiers built from responses to object images with prior training of object discrimination across views did not differ significantly from that of classifiers built from responses to object images without such prior experience. However, the performance was significantly better in the late phase. Furthermore, compared to the preferred stimulus image in the early phase, we found that two-thirds of cells changed their preference in the late phase. For object images with prior experience of object discrimination training across views, a significantly higher percentage of cells responded in the late phase to the same objects as in the early phase, but under different views. The results demonstrate the dynamics of selectivity changes and suggest the involvement of the late phase, rather than the early phase, in view-invariant object recognition. In summary: inferotemporal cells respond to stimulus presentation in two phases; two-thirds of cells change their stimulus preferences between the two phases; and prior object discrimination training leads to better classifier performance in the late phase.
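The decoding analysis described here, classifying object identity from population spike counts in separate early and late windows, can be sketched with simulated data. Everything below (cell and trial counts, the Poisson response model, and the assumption that the late window is more informative) is invented for illustration; only the cross-validated classifier comparison mirrors the paper's logic.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_cells = 200, 40
labels = rng.integers(0, 2, n_trials)        # two objects

def spike_counts(selectivity):
    """Simulated population spike counts; `selectivity` scales how
    informative the window is about object identity."""
    tuning = rng.standard_normal(n_cells) * selectivity
    lam = 9 + np.outer(2 * labels - 1, tuning).clip(-4, 4)   # rates in [5, 13]
    return rng.poisson(lam)

early = spike_counts(0.3)   # weakly informative window after stimulus onset
late = spike_counts(1.0)    # assumed more informative window (~260 ms onward)

for name, X in [("early", early), ("late", late)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5)
    print(name, acc.mean().round(2))
```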
Collapse
Affiliation(s)
- Ridey H Wang
- Dept. of Bioengineering, Graduate School of Science and Engineering, Kagoshima University, Kagoshima 890-0065, Japan
| | - Lulin Dai
- Dept. of Bioengineering, Graduate School of Science and Engineering, Kagoshima University, Kagoshima 890-0065, Japan
| | - Jun-Ya Okamura
- Dept. of Bioengineering, Graduate School of Science and Engineering, Kagoshima University, Kagoshima 890-0065, Japan
| | - Takayasu Fuchida
- Dept. of Bioengineering, Graduate School of Science and Engineering, Kagoshima University, Kagoshima 890-0065, Japan
| | - Gang Wang
- Dept. of Bioengineering, Graduate School of Science and Engineering, Kagoshima University, Kagoshima 890-0065, Japan
| |
Collapse
|
33
|
Friedman R. Themes of advanced information processing in the primate brain. AIMS Neurosci 2020; 7:373-388. [PMID: 33263076 PMCID: PMC7701368 DOI: 10.3934/neuroscience.2020023] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2020] [Accepted: 10/09/2020] [Indexed: 11/30/2022] Open
Abstract
Here is a review of several empirical examples of information processing that occur in the primate cerebral cortex. These include visual processing, object identification and perception, information encoding, and memory. Also, there is a discussion of the higher scale neural organization, mainly theoretical, which suggests hypotheses on how the brain internally represents objects. Altogether they support the general attributes of the mechanisms of brain computation, such as efficiency, resiliency, data compression, and a modularization of neural function and their pathways. Moreover, the specific neural encoding schemes are expectedly stochastic, abstract and not easily decoded by theoretical or empirical approaches.
Collapse
Affiliation(s)
- Robert Friedman
- Department of Biological Sciences, University of South Carolina, Columbia 29208, USA
| |
Collapse
|
34
|
Spoerer CJ, Kietzmann TC, Mehrer J, Charest I, Kriegeskorte N. Recurrent neural networks can explain flexible trading of speed and accuracy in biological vision. PLoS Comput Biol 2020; 16:e1008215. [PMID: 33006992 PMCID: PMC7556458 DOI: 10.1371/journal.pcbi.1008215] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2019] [Revised: 10/14/2020] [Accepted: 08/03/2020] [Indexed: 11/18/2022] Open
Abstract
Deep feedforward neural network models of vision dominate in both computational neuroscience and engineering. The primate visual system, by contrast, contains abundant recurrent connections. Recurrent signal flow enables recycling of limited computational resources over time, and so might boost the performance of a physically finite brain or model. Here we show: (1) Recurrent convolutional neural network models outperform feedforward convolutional models matched in their number of parameters in large-scale visual recognition tasks on natural images. (2) Setting a confidence threshold, at which recurrent computations terminate and a decision is made, enables flexible trading of speed for accuracy. At a given confidence threshold, the model expends more time and energy on images that are harder to recognise, without requiring additional parameters for deeper computations. (3) The recurrent model's reaction time for an image predicts the human reaction time for the same image better than several parameter-matched and state-of-the-art feedforward models. (4) Across confidence thresholds, the recurrent model emulates the behaviour of feedforward control models in that it achieves the same accuracy at approximately the same computational cost (mean number of floating-point operations). However, the recurrent model can be run longer (higher confidence threshold) and then outperforms parameter-matched feedforward comparison models. These results suggest that recurrent connectivity, a hallmark of biological visual systems, may be essential for understanding the accuracy, flexibility, and dynamics of human visual recognition.
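The confidence-threshold mechanism is easy to state in code: recurrent steps are repeated with the same weights until the softmax confidence of the most probable class crosses a threshold, so harder inputs consume more steps. The sketch below uses a small untrained random network purely to illustrate the termination rule, not the trained recurrent convolutional models of the paper; weak inputs typically terminate later or hit the step limit.
```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def recurrent_classify(x, W_in, W_rec, W_out, threshold=0.9, max_steps=20):
    """Run recurrent computation until the top class probability exceeds
    `threshold`; returns (class, steps used, final confidence)."""
    h = np.zeros(W_rec.shape[0])
    for step in range(1, max_steps + 1):
        h = np.tanh(W_in @ x + W_rec @ h)   # recycle the same weights over time
        p = softmax(W_out @ h)
        if p.max() >= threshold:
            return int(p.argmax()), step, round(float(p.max()), 3)
    return int(p.argmax()), max_steps, round(float(p.max()), 3)

rng = np.random.default_rng(0)
W_in = rng.standard_normal((32, 10)) * 0.5
W_rec = rng.standard_normal((32, 32)) * 0.3
W_out = rng.standard_normal((4, 32)) * 0.5
easy = rng.standard_normal(10)              # strong input
hard = 0.1 * rng.standard_normal(10)        # weak ("harder") input
print(recurrent_classify(easy, W_in, W_rec, W_out))
print(recurrent_classify(hard, W_in, W_rec, W_out))
```
Raising the threshold buys accuracy at the cost of compute time, which is exactly the speed-accuracy trade the paper maps onto human reaction times.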
Collapse
Affiliation(s)
- Courtney J. Spoerer
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
| | - Tim C. Kietzmann
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| | - Johannes Mehrer
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
| | - Ian Charest
- School of Psychology and Centre for Human Brain Health, University of Birmingham, United Kingdom
| | - Nikolaus Kriegeskorte
- Department of Psychology, Department of Neuroscience, Department of Electrical Engineering, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| |
Collapse
|
35
|
Deep learning and cognitive science. Cognition 2020; 203:104365. [PMID: 32563082 DOI: 10.1016/j.cognition.2020.104365] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2019] [Revised: 05/31/2020] [Accepted: 06/03/2020] [Indexed: 11/22/2022]
Abstract
In recent years, the family of algorithms collected under the term "deep learning" has revolutionized artificial intelligence, enabling machines to reach human-like performance in many complex cognitive tasks. Although deep learning models are grounded in the connectionist paradigm, their recent advances were basically developed with engineering goals in mind. Despite their applied focus, deep learning models eventually seem fruitful for cognitive purposes. This can be thought of as a kind of biological exaptation, where a physiological structure becomes applicable to a function different from that for which it was selected. In this paper, it will be argued that it is time for cognitive science to seriously come to terms with deep learning, and we try to spell out the reasons why this is the case. First, the path of the evolution of deep learning from the connectionist project is traced, demonstrating the remarkable continuity as well as the differences. Then, it will be considered how deep learning models can be useful for many cognitive topics, especially those where they have achieved performance comparable to humans, from perception to language. It will be maintained that deep learning poses questions that the cognitive sciences should try to answer. One such question is why deep convolutional models that are disembodied, inactive, unaware of context, and static are by far the closest to the patterns of activation in the brain's visual system.
Collapse
|
36
|
Wardle SG, Baker C. Recent advances in understanding object recognition in the human brain: deep neural networks, temporal dynamics, and context. F1000Res 2020; 9. [PMID: 32566136 PMCID: PMC7291077 DOI: 10.12688/f1000research.22296.1] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 06/08/2020] [Indexed: 12/17/2022] Open
Abstract
Object recognition is the ability to identify an object or category based on the combination of visual features observed. It is a remarkable feat of the human brain, given that the patterns of light received by the eye associated with the properties of a given object vary widely with simple changes in viewing angle, ambient lighting, and distance. Furthermore, different exemplars of a specific object category can vary widely in visual appearance, such that successful categorization requires generalization across disparate visual features. In this review, we discuss recent advances in understanding the neural representations underlying object recognition in the human brain. We highlight three current trends in the approach towards this goal within the field of cognitive neuroscience. Firstly, we consider the influence of deep neural networks both as potential models of object vision and in how their representations relate to those in the human brain. Secondly, we review the contribution that time-series neuroimaging methods have made towards understanding the temporal dynamics of object representations beyond their spatial organization within different brain regions. Finally, we argue that an increasing emphasis on the context (both visual and task) within which object recognition occurs has led to a broader conceptualization of what constitutes an object representation for the brain. We conclude by identifying some current challenges facing the experimental pursuit of understanding object recognition and outline some emerging directions that are likely to yield new insight into this complex cognitive process.
Collapse
Affiliation(s)
- Susan G Wardle
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Chris Baker
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, 20892, USA
| |
Collapse
|
37
|
Matteucci G, Zoccolan D. Unsupervised experience with temporal continuity of the visual environment is causally involved in the development of V1 complex cells. SCIENCE ADVANCES 2020; 6:eaba3742. [PMID: 32523998 PMCID: PMC7259963 DOI: 10.1126/sciadv.aba3742] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/27/2019] [Accepted: 03/27/2020] [Indexed: 06/11/2023]
Abstract
Unsupervised adaptation to the spatiotemporal statistics of visual experience is a key computational principle that has long been assumed to govern postnatal development of visual cortical tuning, including orientation selectivity of simple cells and position tolerance of complex cells in primary visual cortex (V1). Yet, causal empirical evidence supporting this hypothesis is scant. Here, we show that degrading the temporal continuity of visual experience during early postnatal life leads to a sizable reduction of the number of complex cells and to an impairment of their functional properties while fully sparing the development of simple cells. This causally implicates adaptation to the temporal structure of the visual input in the development of transformation tolerance but not of shape tuning, thus tightly constraining computational models of unsupervised cortical learning.
Collapse
Affiliation(s)
- Giulio Matteucci
- Visual Neuroscience Laboratory, International School for Advanced Studies (SISSA), Trieste, Italy
| | | |
Collapse
|
38
|
An investigation of the effect of temporal contiguity training on size-tolerant representations in object-selective cortex. Neuroimage 2020; 217:116881. [PMID: 32353487 DOI: 10.1016/j.neuroimage.2020.116881] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2019] [Revised: 04/17/2020] [Accepted: 04/23/2020] [Indexed: 02/06/2023] Open
Abstract
The human visual system has a remarkable ability to reliably identify objects across variations in appearance, such as variations in viewpoint, lighting and size. Here we used fMRI in humans to test whether temporal contiguity training with natural and altered image dynamics can respectively build and break neural size tolerance for objects. Participants (N = 23) were presented with sequences of images of "growing" and "shrinking" objects. In half of the trials, the object also changed identity when the size change happened. According to the temporal contiguity hypothesis, and studies with a similar paradigm in monkeys, this training process should alter size tolerance. After the training phase, BOLD responses to each of the object images were measured in the scanner. Neural patterns in LOC and V1 contained information on size, similarity and identity. In LOC, the representation of object identity was partially invariant to changes in size. However, temporal contiguity training did not affect size tolerance in LOC. Size tolerance in human object-selective cortex is more robust to variations in input statistics than expected based on prior work in monkeys supporting the temporal contiguity hypothesis.
Collapse
|
40
|
Rolls ET. Spatial coordinate transforms linking the allocentric hippocampal and egocentric parietal primate brain systems for memory, action in space, and navigation. Hippocampus 2019; 30:332-353. [PMID: 31697002 DOI: 10.1002/hipo.23171] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2019] [Revised: 10/05/2019] [Accepted: 10/09/2019] [Indexed: 01/03/2023]
Abstract
A theory and model of spatial coordinate transforms in the dorsal visual system through the parietal cortex that enable an interface via posterior cingulate and related retrosplenial cortex to allocentric spatial representations in the primate hippocampus is described. First, a new approach to coordinate transform learning in the brain is proposed, in which the traditional gain modulation is complemented by temporal trace rule competitive network learning. It is shown in a computational model that the new approach works much more precisely than gain modulation alone, by enabling neurons to represent the different combinations of signal and gain modulator more accurately. This understanding may have application to many brain areas where coordinate transforms are learned. Second, a set of coordinate transforms is proposed for the dorsal visual system/parietal areas that enables a representation to be formed in allocentric spatial view coordinates. The input stimulus is merely a stimulus at a given position in retinal space, and the gain modulation signals needed are eye position, head direction, and place, all of which are present in the primate brain. Neurons that encode the bearing to a landmark are involved in the coordinate transforms. Part of the importance here is that the coordinates of the allocentric view produced in this model are the same as those of spatial view cells that respond to allocentric view recorded in the primate hippocampus and parahippocampal cortex. The result is that information from the dorsal visual system can be used to update the spatial input to the hippocampus in the appropriate allocentric coordinate frame, including providing for idiothetic update to allow for self-motion. It is further shown how hippocampal spatial view cells could be useful for the transform from hippocampal allocentric coordinates to egocentric coordinates useful for actions in space and for navigation.
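The paper's central proposal, that gain modulation be complemented by temporal trace rule competitive learning, can be shown in miniature: input units carry a Gaussian retinal response multiplied by an eye-position gain field, and output neurons learn, via a postsynaptic trace spanning successive (retina, eye) combinations of the same head-centred location, to respond in the transformed frame. The sketch below is a toy one-dimensional version with invented sizes and parameters, not the paper's model; the intended outcome is that one winner fires for all combinations sharing the same sum of retinal and eye position.
```python
import numpy as np

rng = np.random.default_rng(0)
n_ret, n_eye = 11, 11                   # discretised retinal and eye positions

def input_vector(ret, eye, sigma=1.0):
    """Gaussian retinal input multiplied by an eye-position gain field,
    flattened into one input vector: a toy gain-field representation."""
    r = np.exp(-0.5 * ((np.arange(n_ret) - ret) / sigma) ** 2)
    e = np.exp(-0.5 * ((np.arange(n_eye) - eye) / sigma) ** 2)
    return np.outer(r, e).ravel()

n_out = n_ret + n_eye - 1               # one per head-centred location (ret + eye)
w = rng.random((n_out, n_ret * n_eye))
w /= np.linalg.norm(w, axis=1, keepdims=True)

eta, lr = 0.8, 0.1
for epoch in range(30):
    for head in range(n_out):           # visit each head-centred location...
        y_bar = np.zeros(n_out)
        for ret in range(n_ret):        # ...through its (retina, eye) combinations
            eye = head - ret
            if not 0 <= eye < n_eye:
                continue
            x = input_vector(ret, eye)
            out = np.zeros(n_out)
            out[(w @ x).argmax()] = 1.0             # competition (winner-take-all)
            y_bar = (1 - eta) * out + eta * y_bar   # trace links the combinations
            w += lr * np.outer(y_bar, x)
            w /= np.linalg.norm(w, axis=1, keepdims=True)

# Ideally one winner now fires for every (ret, eye) pair with ret + eye fixed:
print([int((w @ input_vector(r, 8 - r)).argmax()) for r in range(3, 8)])
```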
Collapse
Affiliation(s)
- Edmund T Rolls
- Oxford Centre for Computational Neuroscience, Oxford, UK
- Department of Computer Science, University of Warwick, Coventry, UK
| |
Collapse
|
41
|
Crijns E, Kaliukhovich DA, Vankelecom L, Op de Beeck H. Unsupervised Temporal Contiguity Experience Does Not Break the Invariance of Orientation Selectivity Across Spatial Frequency. Front Syst Neurosci 2019; 13:22. [PMID: 31231196 PMCID: PMC6558410 DOI: 10.3389/fnsys.2019.00022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2019] [Accepted: 04/30/2019] [Indexed: 11/28/2022] Open
Abstract
The images projected onto the retina can vary widely for a single object. Despite these transformations primates can quickly and reliably recognize objects. At the neural level, transformation tolerance in monkey inferotemporal cortex is affected by the temporal contiguity statistics of the visual input. Here we investigated whether temporal contiguity learning also influences the basic feature detectors in lower levels of the visual hierarchy, in particular the independent coding of orientation and spatial frequency (SF) in primary visual cortex. Eight male Long Evans rats were repeatedly exposed to a temporal transition between two gratings that changed in SF and had either the same (control SF) or a different (swap SF) orientation. Electrophysiological evidence showed that the responses of single neurons during this exposure were sensitive to the change in orientation. Nevertheless, the tolerance of orientation selectivity for changes in SF was unaffected by the temporal contiguity manipulation, as observed in 239 single neurons isolated pre-exposure and 234 post-exposure. Temporal contiguity learning did not affect orientation selectivity in V1. The basic filter mechanisms that characterize V1 processing seem unaffected by temporal contiguity manipulations.
Collapse
Affiliation(s)
- Els Crijns
- Laboratory of Biological Psychology, Department of Brain and Cognition, KU Leuven, Leuven, Belgium
- Leuven Brain Institute, Leuven, Belgium
| | - Dzmitry A Kaliukhovich
- Laboratory of Biological Psychology, Department of Brain and Cognition, KU Leuven, Leuven, Belgium
| | - Lara Vankelecom
- Laboratory of Biological Psychology, Department of Brain and Cognition, KU Leuven, Leuven, Belgium
| | - Hans Op de Beeck
- Laboratory of Biological Psychology, Department of Brain and Cognition, KU Leuven, Leuven, Belgium
- Leuven Brain Institute, Leuven, Belgium
| |
Collapse
|
42
|
Prasad A, Wood SMW, Wood JN. Using automated controlled rearing to explore the origins of object permanence. Dev Sci 2019; 22:e12796. [PMID: 30589167 DOI: 10.1111/desc.12796] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2018] [Revised: 11/07/2018] [Accepted: 11/12/2018] [Indexed: 01/13/2023]
Abstract
What are the origins of object permanence? Despite widespread interest in this question, methodological barriers have prevented detailed analysis of how experience shapes the development of object permanence in newborn organisms. Here, we introduce an automated controlled-rearing method for studying the emergence of object permanence in strictly controlled virtual environments. We used newborn chicks as an animal model and recorded their behavior continuously (24/7) from the onset of vision. Across four experiments, we found that object permanence can develop rapidly, within the first few days of life. This ability developed even when chicks were reared in impoverished visual environments containing no object occlusion events. Object permanence failed to develop, however, when chicks were reared in environments containing temporally non-smooth objects (objects moving on discontinuous spatiotemporal paths). These results suggest that experience with temporally smooth objects facilitates the development of object permanence, confirming a key prediction of temporal learning models in computational neuroscience.
Collapse
Affiliation(s)
- Aditya Prasad
- Department of Psychology, University of Southern California, Los Angeles, California
| | - Samantha M W Wood
- Department of Psychology, University of Southern California, Los Angeles, California
| | - Justin N Wood
- Department of Psychology, University of Southern California, Los Angeles, California
| |
Collapse
|
43
|
What do neurons really want? The role of semantics in cortical representations. Psychology of Learning and Motivation 2019. [DOI: 10.1016/bs.plm.2019.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
44
|
Navarro DM, Mender BMW, Smithson HE, Stringer SM. Self-organising coordinate transformation with peaked and monotonic gain modulation in the primate dorsal visual pathway. PLoS One 2018; 13:e0207961. [PMID: 30496225 PMCID: PMC6264903 DOI: 10.1371/journal.pone.0207961] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2018] [Accepted: 11/08/2018] [Indexed: 11/20/2022] Open
Abstract
We study a self-organising neural network model of how visual representations in the primate dorsal visual pathway are transformed from an eye-centred to head-centred frame of reference. The model has previously been shown to robustly develop head-centred output neurons with a standard trace learning rule, but only under limited conditions. Specifically it fails when incorporating visual input neurons with monotonic gain modulation by eye-position. Since eye-centred neurons with monotonic gain modulation are so common in the dorsal visual pathway, it is an important challenge to show how efferent synaptic connections from these neurons may self-organise to produce head-centred responses in a subpopulation of postsynaptic neurons. We show for the first time how a variety of modified, yet still biologically plausible, versions of the standard trace learning rule enable the model to perform a coordinate transformation from eye-centred to head-centred reference frames when the visual input neurons have monotonic gain modulation by eye-position.
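The distinction the paper turns on, peaked versus monotonic gain modulation by eye position, can be written down directly: a peaked gain field is typically modelled as a Gaussian of eye position and a monotonic one as a sigmoid, each multiplying the retinal drive. A small sketch with assumed parameter values:
```python
import numpy as np

eye_positions = np.linspace(-20, 20, 9)       # degrees

def peaked_gain(eye, centre=0.0, width=10.0):
    """Gaussian (peaked) gain modulation by eye position."""
    return np.exp(-0.5 * ((eye - centre) / width) ** 2)

def monotonic_gain(eye, slope=0.1, offset=0.0):
    """Sigmoidal (monotonic) gain modulation by eye position, the common
    case in the dorsal pathway that defeats the standard trace rule."""
    return 1.0 / (1.0 + np.exp(-slope * (eye - offset)))

retinal_drive = 1.0                           # fixed retinal response
for eye in eye_positions:
    print(f"eye {eye:+5.1f} deg:  peaked {retinal_drive * peaked_gain(eye):.2f}"
          f"   monotonic {retinal_drive * monotonic_gain(eye):.2f}")
```
With monotonic gain the same retinal stimulus never yields a localised "bump" over eye position, which is why the standard trace rule struggles and the modified rules studied in the paper are needed.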
Collapse
Affiliation(s)
- Daniel M. Navarro
- Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford, Oxfordshire, United Kingdom
- Oxford Perception Lab, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford, Oxfordshire, United Kingdom
| | - Bedeho M. W. Mender
- Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford, Oxfordshire, United Kingdom
| | - Hannah E. Smithson
- Oxford Perception Lab, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford, Oxfordshire, United Kingdom
| | - Simon M. Stringer
- Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford, Oxfordshire, United Kingdom
| |
Collapse
|
45
|
Zhang M, Feng J, Ma KT, Lim JH, Zhao Q, Kreiman G. Finding any Waldo with zero-shot invariant and efficient visual search. Nat Commun 2018; 9:3730. [PMID: 30213937 PMCID: PMC6137219 DOI: 10.1038/s41467-018-06217-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2018] [Accepted: 08/10/2018] [Indexed: 11/11/2022] Open
Abstract
Searching for a target object in a cluttered scene constitutes a fundamental challenge in daily vision. Visual search must be selective enough to discriminate the target from distractors, invariant to changes in the appearance of the target, efficient to avoid exhaustive exploration of the image, and must generalize to locate novel target objects with zero-shot training. Previous work on visual search has focused on searching for perfect matches of a target after extensive category-specific training. Here, we show for the first time that humans can efficiently and invariantly search for natural objects in complex scenes. To gain insight into the mechanisms that guide visual search, we propose a biologically inspired computational model that can locate targets without exhaustive sampling and which can generalize to novel objects. The model provides an approximation to the mechanisms integrating bottom-up and top-down signals during search in natural scenes.
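A toy version of the bottom-up/top-down integration described here: weight each bottom-up feature map by the target's feature vector to form a priority map, fixate its maximum, and apply inhibition of return until the target is found. Everything below (map sizes, random features, the way the target is embedded) is invented for illustration and is far simpler than the paper's model.
```python
import numpy as np

rng = np.random.default_rng(0)
n_features, grid = 8, (16, 16)

# Bottom-up feature maps for a scene: one activation map per feature channel
scene = rng.random((n_features,) + grid)
target_loc = (4, 11)
target_features = rng.random(n_features)          # the target's feature vector
scene[:, target_loc[0], target_loc[1]] = target_features   # embed the target

def attention_map(scene, target_features):
    """Top-down modulation: weight each feature map by how strongly the
    target expresses that feature, then sum into one priority map. No
    location-specific training is needed, so novel targets work zero-shot."""
    w = target_features / np.linalg.norm(target_features)
    return np.tensordot(w, scene, axes=1)         # (grid,) priority map

amap = attention_map(scene, target_features)
fixations = []
for _ in range(5):                                # search without exhaustion
    loc = np.unravel_index(amap.argmax(), amap.shape)
    fixations.append(loc)
    if loc == target_loc:
        break
    amap[loc] = -np.inf                           # inhibition of return
print(fixations)
```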
Collapse
Affiliation(s)
- Mengmi Zhang
- Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore, 138632, Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore, 138632, Singapore
- Visual Intelligence Unit, Image/Video Analytics Dept, A*STAR, Singapore, 138632, Singapore
| | - Jiashi Feng
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore, 138632, Singapore
| | - Keng Teck Ma
- Artificial Intelligence Program, Agency for Science, Technology and Research, Singapore, 138632, Singapore
| | - Joo Hwee Lim
- Visual Intelligence Unit, Image/Video Analytics Dept, A*STAR, Singapore, 138632, Singapore
| | - Qi Zhao
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN, 55455, USA
| | - Gabriel Kreiman
- Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA.
| |
Collapse
|
46
|
Greene E. New encoding concepts for shape recognition are needed. AIMS Neurosci 2018; 5:162-178. [PMID: 32341959 PMCID: PMC7179345 DOI: 10.3934/neuroscience.2018.3.162] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2018] [Accepted: 02/26/2018] [Indexed: 11/18/2022] Open
Abstract
Models designed to explain how shapes are perceived and stored by the nervous system commonly emphasize encoding of contour features, especially orientation, curvature, and linear extent. A number of experiments from my laboratory provide evidence that contours deliver a multitude of location markers, and shapes can be identified when relatively few of the markers are displayed. The emphasis on filtering for orientation and other contour features has directed attention away from full and effective examination of how the location information is registered and used for summarizing shapes. Neural network (connectionist) models try to deal with location information by modifying linkage among neuronal populations through training trials. Connections that are initially diffuse and not useful in achieving recognition get eliminated or changed in strength, resulting in selective response to a given shape. But results from my laboratory, reviewed here, demonstrate that unknown shapes that are displayed only once can be identified using a matching task. These findings show that our visual system can immediately encode shape information with no requirement for training trials. This encoding might be accomplished by neuronal circuits in the retina.
Collapse
Affiliation(s)
- Ernest Greene
- Laboratory for Neurometric Research, Department of Psychology, University of Southern California, Los Angeles, California, USA
| |
Collapse
|
47
|
Isbister JB, Eguchi A, Ahmad N, Galeazzi JM, Buckley MJ, Stringer S. A new approach to solving the feature-binding problem in primate vision. Interface Focus 2018; 8:20180021. [PMID: 29951198 PMCID: PMC6015810 DOI: 10.1098/rsfs.2018.0021] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/04/2018] [Indexed: 12/02/2022] Open
Abstract
We discuss a recently proposed approach to solve the classic feature-binding problem in primate vision that uses neural dynamics known to be present within the visual cortex. Broadly, the feature-binding problem in the visual context concerns not only how a hierarchy of features such as edges and objects within a scene are represented, but also the hierarchical relationships between these features at every spatial scale across the visual field. This is necessary for the visual brain to be able to make sense of its visuospatial world. Solving this problem is an important step towards the development of artificial general intelligence. In neural network simulation studies, it has been found that neurons encoding the binding relations between visual features, known as binding neurons, emerge during visual training when key properties of the visual cortex are incorporated into the models. These biological network properties include (i) bottom-up, lateral and top-down synaptic connections, (ii) spiking neuronal dynamics, (iii) spike timing-dependent plasticity, and (iv) a random distribution of axonal transmission delays (of the order of several milliseconds) in the propagation of spikes between neurons. After training the network on a set of visual stimuli, modelling studies have reported observing the gradual emergence of polychronization through successive layers of the network, in which subpopulations of neurons have learned to emit their spikes in regularly repeating spatio-temporal patterns in response to specific visual stimuli. Such a subpopulation of neurons is known as a polychronous neuronal group (PNG). Some neurons embedded within these PNGs receive convergent inputs from neurons representing lower- and higher-level visual features, and thus appear to encode the hierarchical binding relationship between features. Neural activity with this kind of spatio-temporal structure robustly emerges in the higher network layers even when neurons in the input layer represent visual stimuli with spike timings that are randomized according to a Poisson distribution. The resulting hierarchical representation of visual scenes in such models, including the representation of hierarchical binding relations between lower- and higher-level visual features, is consistent with the hierarchical phenomenology or subjective experience of primate vision and is distinct from approaches interested in segmenting a visual scene into a finite set of objects.
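Two of the listed ingredients, spike timing-dependent plasticity and distributed axonal delays, interact in a way that is easy to show numerically: what enters the STDP window is the arrival time of the presynaptic spike (emission time plus delay), so the same pre/post spike pair can yield potentiation or depression depending on the delay. This is the raw material for polychronization. A minimal pair-based STDP sketch follows; the parameter values are common textbook choices, not the paper's.
```python
import numpy as np

def stdp(delta_t, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based spike-timing-dependent plasticity.
    delta_t = t_post - t_pre_arrival (ms): potentiate if the presynaptic
    spike arrives before the postsynaptic spike, depress otherwise."""
    if delta_t > 0:
        return a_plus * np.exp(-delta_t / tau)
    return -a_minus * np.exp(delta_t / tau)

# Axonal delays shift the arrival time of the presynaptic spike:
t_pre, t_post = 5.0, 12.0           # ms, spike emission times
for delay in (1.0, 5.0, 10.0):      # ms, axonal transmission delays
    dt = t_post - (t_pre + delay)
    print(f"delay {delay:4.1f} ms -> dt {dt:+5.1f} ms, dw {stdp(dt):+.4f}")
```
With a distribution of delays across a subpopulation, STDP selectively strengthens just those pathways whose delays make spikes arrive in a repeatable temporal order, which is how polychronous neuronal groups can emerge.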
Collapse
Affiliation(s)
- James B Isbister
- Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, University of Oxford, Oxford OX2 6GG, UK
| | - Akihiro Eguchi
- Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, University of Oxford, Oxford OX2 6GG, UK
| | - Nasir Ahmad
- Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, University of Oxford, Oxford OX2 6GG, UK
| | - Juan M Galeazzi
- Oxford Brain and Behaviour Group, Department of Experimental Psychology, University of Oxford, Oxford OX2 6GG, UK
| | - Mark J Buckley
- Oxford Brain and Behaviour Group, Department of Experimental Psychology, University of Oxford, Oxford OX2 6GG, UK
| | - Simon Stringer
- Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, University of Oxford, Oxford OX2 6GG, UK
| |
Collapse
|
48
|
Homma T. Hand Recognition Obtained by Simulation of Hand Regard. Front Psychol 2018; 9:729. [PMID: 29867687 PMCID: PMC5962778 DOI: 10.3389/fpsyg.2018.00729] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2017] [Accepted: 04/26/2018] [Indexed: 11/13/2022] Open
Abstract
Eye-hand coordination of an infant is observed during the early months of their development. Hand regard, which is an example of this coordination, occurs at about 2 months. It is considered that after experiencing hand regard, an infant may recognize their own hands. However, it is unknown how an infant recognizes their hands through hand regard. Accordingly, the process by which an infant recognizes their hands and distinguishes between their hands and other objects was simulated. A simple neural network was trained with a modified real-time recurrent learning (RTRL) algorithm to deal with time-varying input and output during hand regard. The simulation results show that information about recognition of the modeled hands of an infant is stored in cell assemblies, which were self-organized. Cell assemblies appear during the phase of U-shaped developments of hand regard, and the configuration of the cell assemblies changes with each U-shaped development. Furthermore, movements like general movements (GMs) appear during the phase of U-shaped developments of hand regard.
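For reference, the core of (unmodified) real-time recurrent learning is the forward propagation of a sensitivity tensor P[k, i, j] = dh[k]/dW[i, j] alongside the hidden state, which permits online weight updates from time-varying targets, the property that makes RTRL suited to the time-varying input and output of hand regard. The sketch below is standard RTRL on a tiny tanh network with invented signals, not the modified algorithm of the paper; the tracking error should tend to fall over the run.
```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                    # recurrent units
W = rng.standard_normal((n, n)) * 0.3    # recurrent weights (learned online)
U = rng.standard_normal((n, 2)) * 0.3    # input weights (kept fixed here)
P = np.zeros((n, n, n))                  # P[k, i, j] = d h[k] / d W[i, j]
h = np.zeros(n)
lr, errs = 0.05, []

for t in range(300):
    x = np.array([np.sin(0.1 * t), np.cos(0.1 * t)])   # time-varying input
    target = 0.8 * np.sin(0.1 * t + 0.5)               # time-varying target
    h_prev = h
    h = np.tanh(W @ h_prev + U @ x)
    d = 1.0 - h ** 2                                   # tanh'(pre-activation)

    # RTRL sensitivity recursion:
    # P[k,i,j] <- d[k] * ( [k == i] * h_prev[j] + sum_m W[k,m] * P[m,i,j] )
    direct = np.zeros((n, n, n))
    direct[np.arange(n), np.arange(n), :] = h_prev
    P = d[:, None, None] * (direct + np.einsum('km,mij->kij', W, P))

    err = h[0] - target                                # unit 0 is the readout
    errs.append(abs(err))
    dL_dh = np.zeros(n)
    dL_dh[0] = err
    W -= lr * np.einsum('k,kij->ij', dL_dh, P)         # online gradient step

print(round(np.mean(errs[:30]), 3), round(np.mean(errs[-30:]), 3))
```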
Collapse
Affiliation(s)
- Takahiro Homma
- Center for Industrial and Governmental Relations, University of Electro-Communications, Tokyo, Japan
| |
Collapse
|
49
|
Nordberg H, Hautus MJ, Greene E. Visual encoding of partial unknown shape boundaries. AIMS Neurosci 2018; 5:132-147. [PMID: 32341957 PMCID: PMC7181889 DOI: 10.3934/neuroscience.2018.2.132] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2018] [Accepted: 05/10/2018] [Indexed: 12/21/2022] Open
Abstract
Prior research has found that known shapes and letters can be recognized from a sparse sampling of dots that mark locations on their boundaries. Further, unknown shapes that are displayed only once can be identified by a matching protocol, and here also, above-chance performance requires very few boundary markers. The present work examines whether partial boundaries can be identified under similar low-information conditions. Several experiments were conducted that used a match-recognition task, with initial display of a target shape followed quickly by a comparison shape. The comparison shape was either derived from the target shape or was based on a different shape, and the respondent was asked for a matching judgment, i.e., did it "match" the target shape. Stimulus treatments included establishing how density affected the probability of a correct decision, followed by assessment of how much positioning of boundary dots affected this probability. Results indicate that correct judgments were possible when partial boundaries were displayed with a sparse sampling of dots. We argue for a process that quickly registers the locations of boundary markers and distills that information into a shape summary that can be used to identify the shape even when only a portion of the boundary is represented.
Collapse
Affiliation(s)
- Hannah Nordberg
- Department of Psychology, University of Southern California, Los Angeles, California USA
| | - Michael J Hautus
- The School of Psychology, University of Auckland, Auckland, New Zealand
| | - Ernest Greene
- Department of Psychology, University of Southern California, Los Angeles, California USA
| |
Collapse
|
50
|
Rolls ET, Mills WPC. Non-accidental properties, metric invariance, and encoding by neurons in a model of ventral stream visual object recognition, VisNet. Neurobiol Learn Mem 2018; 152:20-31. [PMID: 29723671 DOI: 10.1016/j.nlm.2018.04.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2018] [Revised: 04/02/2018] [Accepted: 04/27/2018] [Indexed: 11/18/2022]
Abstract
When objects transform into different views, some properties are maintained, such as whether the edges are convex or concave, and these non-accidental properties are likely to be important in view-invariant object recognition. The metric properties, such as the degree of curvature, may change with different views, and are less likely to be useful in object recognition. It is shown that in a model of invariant visual object recognition in the ventral visual stream, VisNet, non-accidental properties are encoded much more than metric properties by neurons. Moreover, it is shown how, with temporal trace rule training in VisNet, non-accidental properties of objects become encoded by neurons, and how metric properties are treated invariantly. We also show how VisNet can generalize between different objects if they have the same non-accidental property, because the metric properties are likely to overlap. VisNet is a four-layer unsupervised model of visual object recognition trained by competitive learning that utilizes a temporal trace learning rule to implement the learning of invariance using views that occur close together in time. A second crucial property of this model concerns whether, when neurons in the level corresponding to the inferior temporal visual cortex respond selectively to objects, neurons in the intermediate layers can respond to combinations of features that may be parts of two or more objects. In an investigation using the four sides of a square presented in every possible combination, it was shown that even though different layer 4 neurons are tuned to encode each feature or feature combination orthogonally, neurons in the intermediate layers can respond to features or feature combinations present in several objects. This property is an important part of the way in which high capacity can be achieved in the four-layer ventral visual cortical pathway. These findings concerning non-accidental properties and the use of neurons in intermediate layers of the hierarchy help to emphasise fundamental underlying principles of the computations that may be implemented in the ventral cortical visual stream used in object recognition.
Collapse
Affiliation(s)
- Edmund T Rolls
- Oxford Centre for Computational Neuroscience, Oxford, UK.
| | - W Patrick C Mills
- University of Warwick, Department of Computer Science, Coventry, UK. http://www.oxcns.org
| |
Collapse
|