1. Rolls ET. Learning Invariant Object and Spatial View Representations in the Brain Using Slow Unsupervised Learning. Front Comput Neurosci 2021;15:686239. [PMID: 34366818; PMCID: PMC8335547; DOI: 10.3389/fncom.2021.686239]
Abstract
First, neurophysiological evidence for the learning of invariant representations in the inferior temporal visual cortex is described. This includes object and face representations with invariance for position, size, lighting, view and morphological transforms in the temporal lobe visual cortex; global object motion in the cortex in the superior temporal sulcus; and spatial view representations in the hippocampus that are invariant with respect to eye position, head direction, and place. Second, computational mechanisms that enable the brain to learn these invariant representations are proposed. For the ventral visual system, one key adaptation is the use of information available in the statistics of the environment in slow unsupervised learning to learn transform-invariant representations of objects. This contrasts with deep supervised learning in artificial neural networks, which uses training with thousands of exemplars forced into different categories by neuronal teachers. Similar slow learning principles apply to the learning of global object motion in the dorsal visual system leading to the cortex in the superior temporal sulcus. The learning rule that has been explored in VisNet is an associative rule with a short-term memory trace. The feed-forward architecture has four stages, with convergence from stage to stage. This type of slow learning is implemented in the brain in hierarchically organized competitive neuronal networks with convergence from stage to stage, with only 4-5 stages in the hierarchy. Slow learning is also shown to help the learning of coordinate transforms using gain modulation in the dorsal visual system extending into the parietal cortex and retrosplenial cortex. Representations are learned that are in allocentric spatial view coordinates of locations in the world and that are independent of eye position, head direction, and the place where the individual is located. This enables hippocampal spatial view cells to use idiothetic (self-motion) signals for navigation when the view details are obscured for short periods.
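The trace rule referred to here can be written compactly: the synaptic change is driven by a temporally filtered postsynaptic term, so that successive transforms of one object, seen close together in time, are bound onto the same neurons. Below is a minimal sketch in Python; the parameter values and the weight-renormalization step are illustrative assumptions, not the exact VisNet settings.

```python
import numpy as np

def trace_learning_step(w, x, y, y_trace_prev, alpha=0.01, eta=0.8):
    """One update of an associative trace rule of the kind used in VisNet-style models.

    w            : (n_post, n_pre) synaptic weights
    x            : (n_pre,) presynaptic rates for the current transform of an object
    y            : (n_post,) current postsynaptic firing rates
    y_trace_prev : (n_post,) short-term memory trace of recent postsynaptic activity
    alpha, eta   : learning rate and trace decay (illustrative values)
    """
    # The trace mixes current activity with recent history, so successive
    # transforms of the same object drive weight growth onto the same neurons.
    y_trace = (1.0 - eta) * y + eta * y_trace_prev
    # Associative (Hebbian) update gated by the trace rather than by y alone.
    w = w + alpha * np.outer(y_trace, x)
    # Renormalize each neuron's weight vector to keep competition bounded.
    w = w / np.linalg.norm(w, axis=1, keepdims=True)
    return w, y_trace
```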
Affiliation(s)
- Edmund T Rolls
- Oxford Centre for Computational Neuroscience, Oxford, United Kingdom
- Department of Computer Science, University of Warwick, Coventry, United Kingdom
2. Born J, Galeazzi JM, Stringer SM. Hebbian learning of hand-centred representations in a hierarchical neural network model of the primate visual system. PLoS One 2017;12:e0178304. [PMID: 28562618; PMCID: PMC5451055; DOI: 10.1371/journal.pone.0178304]
Abstract
A subset of neurons in the posterior parietal and premotor areas of the primate brain respond to the locations of visual targets in a hand-centred frame of reference. Such hand-centred visual representations are thought to play an important role in visually guided reaching to target locations in space. In this paper we show how a biologically plausible, Hebbian learning mechanism may account for the development of localised hand-centred representations in a hierarchical neural network model of the primate visual system, VisNet. The hand-centred neurons developed in the model use an invariance learning mechanism known as continuous transformation (CT) learning. In contrast to previous theoretical proposals for the development of hand-centred visual representations, CT learning does not need a memory trace of recent neuronal activity to be incorporated in the synaptic learning rule. Instead, CT learning relies solely on a Hebbian learning rule, which is able to exploit the spatial overlap that naturally occurs between successive images of a hand-object configuration as it is shifted across different retinal locations due to saccades. Our simulations show how individual neurons in the network model can learn to respond selectively to target objects in particular locations with respect to the hand, irrespective of where the hand-object configuration occurs on the retina. The response properties of these hand-centred neurons further generalise to localised receptive fields in hand-centred space when tested on novel hand-object configurations that were not explored during training. Indeed, even when the network is trained with target objects presented across a near continuum of locations around the hand, the model continues to develop hand-centred neurons with localised receptive fields in hand-centred space. With the help of principal component analysis, we provide the first theoretical framework that explains the behaviour of Hebbian learning in VisNet.
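The contrast drawn here between CT learning and trace learning is easy to see in code: CT learning is a plain Hebbian update inside a competitive layer, with no temporally filtered term, relying on the spatial overlap of successive inputs to keep the same winners active. A minimal sketch follows; the k-winner competition and the learning rate are assumptions for illustration, not the paper's exact settings.

```python
import numpy as np

def ct_learning_step(w, x, alpha=0.05, k=5):
    """Purely Hebbian update of the kind used by continuous transformation (CT) learning.

    Because successive images of a shifting hand-object configuration overlap,
    the same postsynaptic cells win the competition again and strengthen their
    connections onto the new input: no memory trace is needed.
    """
    y = w @ x                         # feedforward activation
    winners = np.argsort(y)[-k:]      # simple k-winner-take-all competition
    y_out = np.zeros_like(y)
    y_out[winners] = y[winners]       # winners keep graded rates
    w = w + alpha * np.outer(y_out, x)  # Hebbian: only co-active pre/post pairs grow
    w = w / np.linalg.norm(w, axis=1, keepdims=True)  # bound the weights
    return w, y_out
```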
Affiliation(s)
- Jannis Born
- Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, Department of Experimental Psychology, University of Oxford, Oxfordshire, United Kingdom
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
- Juan M. Galeazzi
- Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, Department of Experimental Psychology, University of Oxford, Oxfordshire, United Kingdom
- Simon M. Stringer
- Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, Department of Experimental Psychology, University of Oxford, Oxfordshire, United Kingdom
3. Galeazzi JM, Navajas J, Mender BMW, Quian Quiroga R, Minini L, Stringer SM. The visual development of hand-centered receptive fields in a neural network model of the primate visual system trained with experimentally recorded human gaze changes. Network (Bristol, England) 2016;27:29-51. [PMID: 27253452; PMCID: PMC4926791; DOI: 10.1080/0954898x.2016.1187311]
Abstract
Neurons have been found in the primate brain that respond to objects in specific locations in hand-centered coordinates. A key theoretical challenge is to explain how such hand-centered neuronal responses may develop through visual experience. In this paper we show how hand-centered visual receptive fields can develop using an artificial neural network model, VisNet, of the primate visual system when driven by gaze changes recorded from human test subjects as they completed a jigsaw. A camera mounted on the head captured images of the hand and jigsaw, while eye movements were recorded using an eye-tracking device. This combination of data allowed us to reconstruct the retinal images seen as humans undertook the jigsaw task. These retinal images were then fed into the neural network model during self-organization of its synaptic connectivity using a biologically plausible trace learning rule. A trace learning mechanism encourages neurons in the model to learn to respond to input images that tend to occur in close temporal proximity. In the data recorded from human subjects, we found that the participant's gaze often shifted through a sequence of locations around a fixed spatial configuration of the hand and one of the jigsaw pieces. In this case, trace learning should bind these retinal images together onto the same subset of output neurons. The simulation results consequently confirmed that some cells learned to respond selectively to the hand and a jigsaw piece in a fixed spatial configuration across different retinal views.
Affiliation(s)
- Juan M. Galeazzi
- Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, Department of Experimental Psychology, University of Oxford, Oxford, UK
- Joaquín Navajas
- Institute of Cognitive Neuroscience, University College London, London, UK
- Centre for Systems Neuroscience, University of Leicester, Leicester, UK
- Bedeho M. W. Mender
- Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, Department of Experimental Psychology, University of Oxford, Oxford, UK
- Loredana Minini
- Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, Department of Experimental Psychology, University of Oxford, Oxford, UK
- Simon M. Stringer
- Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, Department of Experimental Psychology, University of Oxford, Oxford, UK
4. Galeazzi JM, Minini L, Stringer SM. The Development of Hand-Centered Visual Representations in the Primate Brain: A Computer Modeling Study Using Natural Visual Scenes. Front Comput Neurosci 2015;9:147. [PMID: 26696876; PMCID: PMC4678233; DOI: 10.3389/fncom.2015.00147]
Abstract
Neurons that respond to visual targets in a hand-centered frame of reference have been found within various areas of the primate brain. We investigate how hand-centered visual representations may develop in a neural network model of the primate visual system called VisNet, when the model is trained on images of the hand seen against natural visual scenes. The simulations show how such neurons may develop through a biologically plausible process of unsupervised competitive learning and self-organization. In an advance on our previous work, the visual scenes consisted of multiple targets presented simultaneously with respect to the hand. Three experiments are presented. First, VisNet was trained with computerized images consisting of a realistic image of a hand and a variety of natural objects, presented against different textured backgrounds during training. The network was then tested with just one textured object near the hand to verify whether the output cells were capable of building hand-centered representations with a single localized receptive field. We explain the underlying principles of the statistical decoupling that allows the output cells of the network to develop single localized receptive fields even when the network is trained with multiple objects. In a second simulation we examined how some of the cells with hand-centered receptive fields decreased their shape selectivity and started responding to a localized region of hand-centered space as the number of objects presented in overlapping locations during training increased. Lastly, we explored the same learning principles by training the network with natural visual scenes collected by volunteers. These results provide an important step in showing how single, localized, hand-centered receptive fields could emerge under more ecologically realistic visual training conditions.
Affiliation(s)
- Juan M. Galeazzi
- Department of Experimental Psychology, Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, University of Oxford, Oxford, UK
5. Robinson L, Rolls ET. Invariant visual object recognition: biologically plausible approaches. Biol Cybern 2015;109:505-535. [PMID: 26335743; PMCID: PMC4572081; DOI: 10.1007/s00422-015-0658-2]
Abstract
Key properties of inferior temporal cortex neurons are described, and then the biological plausibility of two leading approaches to invariant visual object recognition in the ventral visual system is assessed, to investigate whether they account for these properties. Experiment 1 shows that VisNet performs object classification with random exemplars comparably to HMAX, except that the final-layer C neurons of HMAX have a very non-sparse representation (unlike that in the brain) that provides little information in the single-neuron responses about the object class. Experiment 2 shows that VisNet forms invariant representations when trained with different views of each object, whereas HMAX performs poorly when assessed with a biologically plausible pattern association network, as HMAX has no mechanism to learn view invariance. Experiment 3 shows that VisNet neurons do not respond to scrambled images of faces, and thus encode shape information. HMAX neurons responded with similarly high rates to the unscrambled and scrambled faces, indicating that low-level features including texture may be relevant to HMAX performance. Experiment 4 shows that VisNet can learn to recognize objects even when the view provided by the object changes catastrophically as it transforms, whereas HMAX has no learning mechanism in its S-C hierarchy that provides for view-invariant learning. This highlights some requirements for the neurobiological mechanisms of high-level vision, and shows how some different approaches perform, in order to help understand the fundamental underlying principles of invariant visual object recognition in the ventral visual stream.
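The "very non-sparse representation" criticism of HMAX's final-layer C neurons is typically quantified in this literature with the Treves-Rolls sparseness measure; a minimal sketch follows (that this is the exact measure used in the paper is an assumption).

```python
import numpy as np

def treves_rolls_sparseness(rates):
    """Treves-Rolls sparseness a = (sum(r)/n)^2 / (sum(r^2)/n) of a rate profile.

    Returns 1/n for a one-hot (maximally sparse) response profile and 1.0 when
    all neurons fire equally (non-sparse, as reported for HMAX's final-layer
    C neurons in the text above).
    """
    r = np.asarray(rates, dtype=float)
    n = r.size
    return (r.sum() / n) ** 2 / (np.sum(r ** 2) / n)

# Example: one active neuron out of four gives a = 1/4.
assert np.isclose(treves_rolls_sparseness([0.0, 0.0, 5.0, 0.0]), 0.25)
```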
Affiliation(s)
- Leigh Robinson
- Department of Computer Science, University of Warwick, Coventry, UK
- Edmund T Rolls
- Department of Computer Science, University of Warwick, Coventry, UK.
- Oxford Centre for Computational Neuroscience, Oxford, UK.
6. Mender BMW, Stringer SM. A self-organizing model of perisaccadic visual receptive field dynamics in primate visual and oculomotor system. Front Comput Neurosci 2015;9:17. [PMID: 25717301; PMCID: PMC4324147; DOI: 10.3389/fncom.2015.00017]
Abstract
We propose and examine a model for how perisaccadic visual receptive field dynamics, observed in a range of primate brain areas such as LIP, FEF, SC, V3, V3A, V2, and V1, may develop through a biologically plausible process of unsupervised visually guided learning. These dynamics are associated with remapping, which is the phenomenon where receptive fields anticipate the consequences of saccadic eye movements. We find that a neural network model using a local associative synaptic learning rule, when exposed to visual scenes in conjunction with saccades, can account for a range of associated phenomena. In particular, our model demonstrates predictive and pre-saccadic remapping, responsiveness shifts around the time of saccades, and remapping from multiple directions.
Affiliation(s)
- Bedeho M W Mender
- Department of Experimental Psychology, Centre for Theoretical Neuroscience and Artificial Intelligence, University of Oxford, Oxford, UK
- Simon M Stringer
- Department of Experimental Psychology, Centre for Theoretical Neuroscience and Artificial Intelligence, University of Oxford, Oxford, UK
7. Rolls ET, Webb TJ. Finding and recognizing objects in natural scenes: complementary computations in the dorsal and ventral visual systems. Front Comput Neurosci 2014;8:85. [PMID: 25161619; PMCID: PMC4130325; DOI: 10.3389/fncom.2014.00085]
Abstract
Searching for and recognizing objects in complex natural scenes is implemented by multiple saccades until the target falls within the reduced receptive fields of inferior temporal cortex (IT) neurons. We analyze and model how the dorsal and ventral visual streams both contribute to this. Saliency detection in the dorsal visual system, including area LIP, is modeled by graph-based visual saliency, and allows the eyes to fixate potential objects to within several degrees. Visual information at the fixated location, subtending approximately 9° and corresponding to the receptive fields of IT neurons, is then passed through a four-layer hierarchical model of the ventral cortical visual system, VisNet. We show that VisNet can be trained using a synaptic modification rule with a short-term memory trace of recent neuronal activity to capture both the required view and translation invariances, allowing approximately 90% correct object recognition in the model for four objects shown in any view across a range of 135° anywhere in a scene. The model was able to generalize correctly within the four trained views and the 25 trained translations. This approach analyses the principles by which complementary computations in the dorsal and ventral visual cortical streams enable objects to be located and recognized in complex natural scenes.
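The division of labour described here, dorsal stream for saliency-driven fixation and ventral stream for recognition at the fovea, can be summarized as a loop. In the sketch below, saliency_fn and classifier_fn are hypothetical stand-ins for the graph-based visual saliency model and the trained VisNet hierarchy; the pixel scale and the inhibition-of-return step are illustrative assumptions.

```python
import numpy as np

def find_and_recognize(scene, saliency_fn, classifier_fn,
                       patch_deg=9, px_per_deg=32, n_fixations=5):
    """Sketch of the dorsal/ventral division of labour described above.

    scene         : 2-D grayscale image array
    saliency_fn   : stand-in for a graph-based visual saliency model (dorsal)
    classifier_fn : stand-in for a trained VisNet-style hierarchy (ventral)
    """
    half = (patch_deg * px_per_deg) // 2
    saliency = saliency_fn(scene)
    labels = []
    for _ in range(n_fixations):
        # Dorsal stream: saccade to the current saliency peak.
        fy, fx = np.unravel_index(np.argmax(saliency), saliency.shape)
        # Ventral stream: classify the ~9 degree patch at fixation.
        patch = scene[max(fy - half, 0):fy + half, max(fx - half, 0):fx + half]
        labels.append(classifier_fn(patch))
        # Inhibition of return, so the next saccade targets a new region.
        saliency[max(fy - half, 0):fy + half, max(fx - half, 0):fx + half] = 0
    return labels
```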
Affiliation(s)
- Edmund T. Rolls
- Department of Computer Science, University of Warwick, Coventry, UK
- Oxford Centre for Computational Neuroscience, Oxford, UK
- Tristan J. Webb
- Department of Computer Science, University of Warwick, Coventry, UK
8. Mender BMW, Stringer SM. Self-organization of head-centered visual responses under ecological training conditions. Network (Bristol, England) 2014;25:116-136. [PMID: 24992518; DOI: 10.3109/0954898x.2014.918671]
Abstract
We have studied the development of head-centered visual responses in an unsupervised, self-organizing neural network model that was trained under ecological conditions. Four independent spatio-temporal characteristics of the training stimuli were explored to investigate the feasibility of self-organization under more ecological conditions. First, the number of head-centered visual training locations was varied over a broad range. Model performance improved as the number of training locations approached the continuous sampling of head-centered space. Second, the model depended on periods of time during which visual targets remained stationary in head-centered space while it performed saccades around the scene, and the severity of this constraint was explored by introducing increasing levels of random eye movement and stimulus dynamics. Model performance was robust over a range of randomization. Third, the model was trained on visual scenes in which multiple simultaneous targets were always visible. Model self-organization was successful, despite the model never being exposed to a visual target in isolation. Fourth, the duration of fixations during training was made stochastic. With suitable changes to the learning rule, the model self-organized successfully. These findings suggest that the fundamental learning mechanism upon which the model rests is robust to the many forms of stimulus variability under ecological training conditions.
9. Webb TJ, Rolls ET. Deformation-specific and deformation-invariant visual object recognition: pose vs. identity recognition of people and deforming objects. Front Comput Neurosci 2014;8:37. [PMID: 24744725; PMCID: PMC3978248; DOI: 10.3389/fncom.2014.00037]
Abstract
When we see a human sitting down, standing up, or walking, we can recognize one of these poses independently of the individual, or we can recognize the individual person independently of the pose. The same issues arise for deforming objects. For example, if we see a flag deformed by the wind, either blowing out or hanging languidly, we can usually recognize the flag independently of its deformation; or we can recognize the deformation independently of the identity of the flag. We hypothesize that these types of recognition can be implemented by the primate visual system using, as a learning principle, the temporo-spatial continuity of objects as they transform. In particular, we hypothesize that pose or deformation can be learned under conditions in which large numbers of different people are successively seen in the same pose, or objects in the same deformation. We also hypothesize that person-specific representations that are independent of pose, and object-specific representations that are independent of deformation and view, could be built when individual people or objects are observed successively transforming from one pose or deformation and view to another. These hypotheses were tested in a simulation of the ventral visual system, VisNet, that uses temporal continuity, implemented in a synaptic learning rule with a short-term memory trace of previous neuronal activity, to learn invariant representations. It was found that, depending on the statistics of the visual input, either pose-specific or deformation-specific representations could be built that were invariant with respect to individual and view; or identity-specific representations could be built that were invariant with respect to pose or deformation and view. We propose that this is how pose-specific and pose-invariant, and deformation-specific and deformation-invariant, perceptual representations are built in the brain.
Affiliation(s)
- Tristan J. Webb
- Department of Computer Science, University of Warwick, Coventry, UK
- Edmund T. Rolls
- Department of Computer Science, University of Warwick, Coventry, UK
- Oxford Centre for Computational Neuroscience, Oxford, UK
10. Galeazzi JM, Mender BMW, Paredes M, Tromans JM, Evans BD, Minini L, Stringer SM. A self-organizing model of the visual development of hand-centred representations. PLoS One 2013;8:e66272. [PMID: 23799086; PMCID: PMC3683017; DOI: 10.1371/journal.pone.0066272]
Abstract
We show how hand-centred visual representations could develop in the primate posterior parietal and premotor cortices during visually guided learning in a self-organizing neural network model. The model incorporates trace learning in the feed-forward synaptic connections between successive neuronal layers. Trace learning encourages neurons to learn to respond to input images that tend to occur close together in time. We assume that sequences of eye movements are performed around individual scenes containing a fixed hand-object configuration. Trace learning will then encourage individual cells to learn to respond to particular hand-object configurations across different retinal locations. The plausibility of this hypothesis is demonstrated in computer simulations.
Affiliation(s)
- Juan M Galeazzi
- Department of Experimental Psychology, University of Oxford, Oxford, Oxfordshire, United Kingdom.
11. Evans BD, Stringer SM. Transformation-invariant visual representations in self-organizing spiking neural networks. Front Comput Neurosci 2012;6:46. [PMID: 22848199; PMCID: PMC3404434; DOI: 10.3389/fncom.2012.00046]
Abstract
The ventral visual pathway achieves object and face recognition by building transformation-invariant representations from elementary visual features. In previous computer simulation studies with rate-coded neural networks, the development of transformation-invariant representations has been demonstrated using either of two biologically plausible learning mechanisms, trace learning and continuous transformation (CT) learning. However, it has not previously been investigated how transformation-invariant representations may be learned in a more biologically accurate spiking neural network. A key issue is how the synaptic connection strengths in such a spiking network might self-organize through spike-timing-dependent plasticity (STDP), where the change in synaptic strength depends on the relative times of the spikes emitted by the presynaptic and postsynaptic neurons, rather than simply on correlated activity driving changes in synaptic efficacy. Here we present simulations with conductance-based integrate-and-fire (IF) neurons using an STDP learning rule to address these gaps in our understanding. It is demonstrated that, with the appropriate selection of model parameters and training regime, the spiking network model can utilize either trace-like or CT-like learning mechanisms to achieve transform-invariant representations.
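The STDP rule at the centre of this study is conventionally written as a pair of exponential windows: potentiation when the presynaptic spike precedes the postsynaptic spike, depression otherwise. A minimal sketch, with illustrative time constants and amplitudes rather than the paper's parameters:

```python
import numpy as np

def stdp_dw(dt, a_plus=0.005, a_minus=0.00525,
            tau_plus=0.020, tau_minus=0.020):
    """Pairwise STDP weight change for a spike-time difference.

    dt = t_post - t_pre in seconds. Potentiate when the presynaptic spike
    precedes the postsynaptic one; depress otherwise. Parameter values are
    illustrative, not those of the paper.
    """
    if dt >= 0:
        return a_plus * np.exp(-dt / tau_plus)    # pre before post: LTP
    return -a_minus * np.exp(dt / tau_minus)       # post before pre: LTD

# Example: a pre-spike 10 ms before the post-spike yields a positive change.
print(stdp_dw(0.010))
```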
Affiliation(s)
- Benjamin D. Evans
- Department of Experimental Psychology, Centre for Theoretical Neuroscience and Artificial Intelligence, University of Oxford, Oxford, UK
12. Tromans JM, Higgins I, Stringer SM. Learning view invariant recognition with partially occluded objects. Front Comput Neurosci 2012;6:48. [PMID: 22848200; PMCID: PMC3404435; DOI: 10.3389/fncom.2012.00048]
Abstract
This paper investigates how a neural network model of the ventral visual pathway, VisNet, can form separate view invariant representations of a number of objects seen rotating together. In particular, in the current work one of the rotating objects is always partially occluded by the other objects present during training. A key challenge for the model is to link together the separate partial views of the occluded object into a single view invariant representation of that object. We show how this can be achieved by Continuous Transformation (CT) learning, which relies on spatial similarity between successive views of each object. After training, the network had developed cells in the output layer which had learned to respond invariantly to particular objects over most or all views, with each cell responding to only one object. All objects, including the partially occluded object, were individually represented by a unique subset of output cells.
Affiliation(s)
- James M. Tromans
- Experimental Psychology, Oxford Foundation for Theoretical Neuroscience and Artificial Intelligence, University of Oxford, Oxford, UK
13. Baeck A, Windey I, Op de Beeck HP. The transfer of object learning across exemplars and their orientation is related to perceptual similarity. Vision Res 2012;68:40-47. [PMID: 22819729; DOI: 10.1016/j.visres.2012.06.023]
Abstract
Recognition of objects improves after training. The exact characteristics of this visual learning process remain unclear. We examined to what extent object learning depends on the exact exemplar and orientation used during training. Participants were trained to name object pictures at as short a picture presentation time as possible. The required presentation time diminished over training. After training, participants were tested with a completely new set of objects as well as with two variants of the trained object set: an orientation change and a change of the exact exemplar shown. Both manipulations led to a decrease in performance compared to the original picture set. Nevertheless, performance with the manipulated versions of the trained stimuli was better than performance with the completely new set, at least when only one manipulation was performed. The amount of transfer to new images of an object was related to perceptual similarity, but not to pixel overlap or to measurements of similarity in the different layers of a popular hierarchical object recognition model (HMAX). Thus, object learning generalizes only partially over changes in exemplars and orientation, which is consistent with the tuning properties of neurons in object-selective cortical regions and the role of perceptual similarity in these representations.
Affiliation(s)
- Annelies Baeck
- Laboratory of Biological Psychology, University of Leuven (KU Leuven), Tiensestraat 102, 3000 Leuven, Belgium.
14. Hegdé J, Thompson SK, Brady M, Kersten D. Object recognition in clutter: cortical responses depend on the type of learning. Front Hum Neurosci 2012;6:170. [PMID: 22723774; PMCID: PMC3378082; DOI: 10.3389/fnhum.2012.00170]
Abstract
Theoretical studies suggest that the visual system uses prior knowledge of visual objects to recognize them in visual clutter, and posit that the strategies for recognizing objects in clutter may differ depending on whether or not the object was learned in clutter to begin with. We tested this hypothesis using functional magnetic resonance imaging (fMRI) of human subjects. We trained subjects to recognize naturalistic, yet novel objects in strong or weak clutter. We then tested subjects' recognition performance for both sets of objects in strong clutter. We found many brain regions that were differentially responsive to objects during object recognition depending on whether they were learned in strong or weak clutter. In particular, the responses of the left fusiform gyrus (FG) reliably reflected, on a trial-to-trial basis, subjects' object recognition performance for objects learned in the presence of strong clutter. These results indicate that the visual system does not use a single, general-purpose mechanism to cope with clutter. Instead, there are two distinct spatial patterns of activation whose responses are attributable not to the visual context in which the objects were seen, but to the context in which the objects were learned.
Affiliation(s)
- Jay Hegdé
- Department of Ophthalmology, Vision Discovery Institute, Brain and Behavior Discovery Institute, Georgia Health Sciences University, Augusta, GA, USA
15. Rolls ET. Invariant Visual Object and Face Recognition: Neural and Computational Bases, and a Model, VisNet. Front Comput Neurosci 2012;6:35. [PMID: 22723777; PMCID: PMC3378046; DOI: 10.3389/fncom.2012.00035]
Abstract
Neurophysiological evidence for invariant representations of objects and faces in the primate inferior temporal visual cortex is described. Then a computational approach to how invariant representations are formed in the brain is described that builds on the neurophysiology. A feature hierarchy model in which invariant representations can be built by self-organizing learning based on the temporal and spatial statistics of the visual input produced by objects as they transform in the world is described. VisNet can use temporal continuity in an associative synaptic learning rule with a short-term memory trace, and/or it can use spatial continuity in continuous spatial transformation learning which does not require a temporal trace. The model of visual processing in the ventral cortical stream can build representations of objects that are invariant with respect to translation, view, size, and also lighting. The model has been extended to provide an account of invariant representations in the dorsal visual system of the global motion produced by objects such as looming, rotation, and object-based movement. The model has been extended to incorporate top-down feedback connections to model the control of attention by biased competition in, for example, spatial and object search tasks. The approach has also been extended to account for how the visual system can select single objects in complex visual scenes, and how multiple objects can be represented in a scene. The approach has also been extended to provide, with an additional layer, for the development of representations of spatial scenes of the type found in the hippocampus.
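One stage of the feature hierarchy sketched in this abstract reduces to a feedforward projection followed by competition; stacking four such stages with convergence between them, and training the weights with trace or CT rules of the kind shown earlier, gives the VisNet architecture in outline. The percentile-based competition below is a simple stand-in for lateral inhibition, an assumption for illustration rather than the model's exact mechanism.

```python
import numpy as np

def visnet_stage(x, w, sparseness=0.05):
    """One competitive stage of a VisNet-like feature hierarchy.

    x : (n_pre,) firing rates from the previous, topologically converging stage
    w : (n_post, n_pre) feedforward weights (rows unit-normalized by learning)
    Graded competition is approximated by keeping only the most active
    fraction of neurons, a stand-in for lateral inhibition.
    """
    y = w @ x
    threshold = np.quantile(y, 1.0 - sparseness)
    return np.where(y >= threshold, y, 0.0)  # winners keep graded rates

# A four-stage hierarchy is then just repeated application:
# y = x_retina
# for w in [w1, w2, w3, w4]:
#     y = visnet_stage(y, w)
```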
Affiliation(s)
- Edmund T. Rolls
- Oxford Centre for Computational Neuroscience, Oxford, UK
- Department of Computer Science, University of Warwick, Coventry, UK
16. Teichmann M, Wiltschut J, Hamker F. Learning Invariance from Natural Images Inspired by Observations in the Primary Visual Cortex. Neural Comput 2012;24:1271-1296. [DOI: 10.1162/neco_a_00268]
Abstract
The human visual system has the remarkable ability to recognize objects largely invariant of their position, rotation, and scale. A good interpretation of neurobiological findings involves a computational model that simulates the signal processing of the visual cortex. In part, this is likely achieved step by step from early to late areas of visual perception. While several algorithms have been proposed for learning feature detectors, only a few studies address the issue of biologically plausible learning of such invariance. In this study, a set of Hebbian learning rules based on calcium dynamics and homeostatic regulations of single neurons is proposed. Their performance is verified within a simple model of the primary visual cortex to learn so-called complex cells, based on a sequence of static images. As a result, the learned complex-cell responses are largely invariant to phase and position.
Affiliation(s)
- Jan Wiltschut
- Chemnitz University of Technology, 09107 Chemnitz, Germany, and Westfälische Wilhelms-Universität Münster, 48149 Münster, Germany
- Fred Hamker
- Chemnitz University of Technology, 09107 Chemnitz, Germany
17. Tromans JM, Page HJI, Stringer SM. Learning separate visual representations of independently rotating objects. Network (Bristol, England) 2012;23:1-23. [PMID: 22364581; DOI: 10.3109/0954898x.2011.651520]
Abstract
Individual cells that respond preferentially to particular objects have been found in the ventral visual pathway. How the brain is able to develop neurons that exhibit these object selective responses poses a significant challenge for computational models of object recognition. Typically, many objects make up a complex natural scene and are never presented in isolation. Nonetheless, the visual system is able to build invariant object selective responses. In this paper, we present a model of the ventral visual stream, VisNet, which can solve the problem of learning object selective representations even when multiple objects are always present during training. Past research with the VisNet model has shown that the network can operate successfully in a similar training paradigm, but only when training comprises many different object pairs. Numerous pairings are required for statistical decoupling between objects. In this research, we show for the first time that VisNet is capable of utilizing the statistics inherent in independent rotation to form object selective representations when training with just two objects, always presented together. Crucially, our results show that in a dependent rotation paradigm, the model fails to build object selective representations and responds as if the two objects are in fact one. If the objects begin to rotate independently, the network forms representations for each object separately.
Affiliation(s)
- James Matthew Tromans
- Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford OX1 3UD, UK
18. Tromans JM, Harris M, Stringer SM. A computational model of the development of separate representations of facial identity and expression in the primate visual system. PLoS One 2011;6:e25616. [PMID: 21998673; PMCID: PMC3188551; DOI: 10.1371/journal.pone.0025616]
Abstract
Experimental studies have provided evidence that the visual processing areas of the primate brain represent facial identity and facial expression within different subpopulations of neurons. For example, in non-human primates there is evidence that cells within the inferior temporal gyrus (TE) respond primarily to facial identity, while cells within the superior temporal sulcus (STS) respond to facial expression. More recently, it has been found that the orbitofrontal cortex (OFC) of non-human primates contains some cells that respond exclusively to changes in facial identity, while other cells respond exclusively to facial expression. How might the primate visual system develop physically separate representations of facial identity and expression, given that the visual system is always exposed to simultaneous combinations of facial identity and expression during learning? In this paper, a biologically plausible neural network model, VisNet, of the ventral visual pathway is trained on a set of carefully designed cartoon faces with different identities and expressions. The VisNet model architecture is composed of a hierarchical series of four Self-Organising Maps (SOMs), with associative learning in the feedforward synaptic connections between successive layers. During learning, the network develops separate clusters of cells that respond exclusively to either facial identity or facial expression. We interpret the performance of the network in terms of the learning properties of SOMs, which are able to exploit the statistical independence between facial identity and expression.
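Each layer here is a Self-Organising Map, whose defining update pulls the best-matching unit and its map neighbours toward the input, so that statistically independent factors such as identity and expression come to occupy separate clusters. A generic SOM update step is sketched below; the learning rate and neighbourhood width are illustrative, not the paper's values.

```python
import numpy as np

def som_update(w, grid, x, lr=0.1, sigma=1.5):
    """One generic Self-Organising Map (SOM) update step.

    w    : (n_units, n_inputs) weight vectors, one per map unit
    grid : (n_units, 2) unit coordinates on the 2-D cortical sheet
    x    : (n_inputs,) current input pattern
    A Gaussian neighbourhood around the best-matching unit pulls nearby units
    toward the input, so similar inputs (e.g. one identity across expressions)
    cluster spatially on the map.
    """
    bmu = np.argmin(np.linalg.norm(w - x, axis=1))   # best-matching unit
    d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)     # squared map distances
    h = np.exp(-d2 / (2.0 * sigma ** 2))             # neighbourhood kernel
    w += lr * h[:, None] * (x - w)                   # pull neighbours toward x
    return w
```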
Affiliation(s)
- James Matthew Tromans
- Department of Experimental Psychology, University of Oxford, Oxford, Oxfordshire, United Kingdom
- Mitchell Harris
- Department of Experimental Psychology, University of Oxford, Oxford, Oxfordshire, United Kingdom
- Simon Maitland Stringer
- Department of Experimental Psychology, University of Oxford, Oxford, Oxfordshire, United Kingdom
19. Masquelier T. Relative spike time coding and STDP-based orientation selectivity in the early visual system in natural continuous and saccadic vision: a computational model. J Comput Neurosci 2011;32:425-441. [PMID: 21938439; DOI: 10.1007/s10827-011-0361-9]
Abstract
We have built a phenomenological spiking model of the cat early visual system comprising the retina, the Lateral Geniculate Nucleus (LGN) and V1's layer 4, and established four main results. (1) When exposed to videos that reproduce with high fidelity what a cat experiences under natural conditions, adjacent Retinal Ganglion Cells (RGCs) have spike-time correlations at a short timescale (~30 ms), despite neuronal noise and possible jitter accumulation. (2) In accordance with recent experimental findings, the LGN filters out some noise. It thus increases the spike reliability and temporal precision and the sparsity, and, importantly, further decreases adjacent cells' correlation timescale down to ~15 ms. (3) Downstream simple cells in V1's layer 4, if equipped with Spike-Timing-Dependent Plasticity (STDP), may detect these fine-scale cross-correlations, and thus connect principally to ON- and OFF-centre cells with Receptive Fields (RF) aligned in the visual space, and thereby become orientation selective, in accordance with Hubel and Wiesel's (Journal of Physiology 160:106-154, 1962) classic model. Up to this point we dealt with continuous vision, and there was no absolute time reference such as a stimulus onset, yet information was encoded and decoded in the relative spike times. (4) We then simulated saccades to a static image and benchmarked relative spike time coding and time-to-first-spike coding, with respect to saccade landing, in the context of orientation representation. In both the retina and the LGN, relative spike times are more precise, less affected by pre-landing history and global contrast than absolute ones, and lead to robust, contrast-invariant orientation representations in V1.
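The key coding idea in result (4), referencing spike times to the population rather than to an external event, is simple to state: subtract the population's earliest latency, and any global shift (for example from contrast, which delays all cells together) cancels. A minimal sketch of this reading of the relative code:

```python
import numpy as np

def relative_latencies(first_spike_times):
    """Relative spike-time code: latencies referenced to the population's
    first spike rather than to stimulus or saccade onset.

    A global shift that delays all cells alike (e.g. lower contrast or
    pre-landing history) cancels out, which is why the relative code is
    the more robust one in the account above. NaN marks silent cells.
    """
    t = np.asarray(first_spike_times, dtype=float)
    return t - np.nanmin(t)

# Example: a uniform 5 ms delay leaves the relative code unchanged.
a = relative_latencies([0.021, 0.034, np.nan, 0.027])
b = relative_latencies([0.026, 0.039, np.nan, 0.032])
assert np.allclose(a[~np.isnan(a)], b[~np.isnan(b)])
```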
Affiliation(s)
- Timothée Masquelier
- Unit for Brain and Cognition, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain.
20. Spratling M. Learning Posture Invariant Spatial Representations Through Temporal Correlations. IEEE Trans Auton Ment Dev 2009. [DOI: 10.1109/tamd.2009.2038494]
21. Rolls ET, Tromans JM, Stringer SM. Spatial scene representations formed by self-organizing learning in a hippocampal extension of the ventral visual system. Eur J Neurosci 2009;28:2116-2127. [PMID: 19046392; DOI: 10.1111/j.1460-9568.2008.06486.x]
Abstract
We show in a unifying computational approach that representations of spatial scenes can be formed by adding an additional self-organizing layer of processing beyond the inferior temporal visual cortex in the ventral visual stream without the introduction of new computational principles. The invariant representations of objects by neurons in the inferior temporal visual cortex can be modelled by a multilayer feature hierarchy network with feedforward convergence from stage to stage, and an associative learning rule with a short-term memory trace to capture the invariant statistical properties of objects as they transform over short time periods in the world. If an additional layer is added to this architecture, training now with whole scenes that consist of a set of objects in a given fixed spatial relation to each other results in neurons in the added layer that respond to one of the trained whole scenes but do not respond if the objects in the scene are rearranged to make a new scene from the same objects. The formation of these scene-specific representations in the added layer is related to the fact that in the inferior temporal cortex and, we show, in the VisNet model, the receptive fields of inferior temporal cortex neurons shrink and become asymmetric when multiple objects are present simultaneously in a natural scene. This reduced size and asymmetry of the receptive fields of inferior temporal cortex neurons also provides a solution to the representation of multiple objects, and their relative spatial positions, in complex natural scenes.
Affiliation(s)
- Edmund T Rolls
- Department of Experimental Psychology, Centre for Computational Neuroscience, Oxford University, Oxford, UK.
22. Rolls ET. Top-down control of visual perception: attention in natural vision. Perception 2008;37:333-354. [PMID: 18491712; DOI: 10.1068/p5877]
Abstract
Top-down perceptual influences can bias (or pre-empt) perception. In natural scenes, the receptive fields of neurons in the inferior temporal visual cortex (IT) shrink to become close to the size of objects. This facilitates the read-out of information from the ventral visual system, because the information is primarily about the object at the fovea. Top-down attentional influences are much less evident in natural scenes than when objects are shown against blank backgrounds, though they are still present. It is suggested that the reduced receptive-field size in natural scenes and the effects of top-down attention contribute to change blindness. The receptive fields of IT neurons in complex scenes, though including the fovea, are frequently asymmetric around the fovea, and it is proposed that this is the solution the IT uses to represent multiple objects and their relative spatial positions in a scene. Networks that implement probabilistic decision-making are described, and it is suggested that, when in perceptual systems they take decisions (or 'test hypotheses'), they influence lower-level networks to bias visual perception. Finally, it is shown that similar processes extend to systems involved in the processing of emotion-provoking sensory stimuli, in that word-level cognitive states provide top-down biasing that reaches as far down as the orbitofrontal cortex, where, at the first stage of affective representations, olfactory, taste, flavour, and touch processing is biased (or pre-empted) in humans.
Affiliation(s)
- Edmund T Rolls
- Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford OX1 3UD, UK.
23. Stringer SM, Rolls ET. Learning transform invariant object recognition in the visual system with multiple stimuli present during training. Neural Netw 2008;21:888-903. [PMID: 18440774; DOI: 10.1016/j.neunet.2007.11.004]
Abstract
Over successive stages, the visual system develops neurons that respond with view, size and position invariance to objects or faces. A number of computational models have been developed to explain how transform-invariant cells could develop in the visual system. However, a major limitation of computer modelling studies to date has been that the visual stimuli are typically presented one at a time to the network during training. In this paper, we investigate how vision models may self-organize when multiple stimuli are presented together within each visual image during training. We show that as the number of independent stimuli grows large enough, standard competitive neural networks can suddenly switch from learning representations of the multi-stimulus input patterns to representing the individual stimuli. Furthermore, the competitive networks can learn transform (e.g. position or view) invariant representations of the individual stimuli if the network is presented with input patterns containing multiple transforming stimuli during training. Finally, we extend these results to a multi-layer hierarchical network model (VisNet) of the ventral visual system. The network is trained on input images containing multiple rotating 3D objects. We show that the network is able to develop view-invariant representations of the individual objects.
Affiliation(s)
- S M Stringer
- Oxford University, Centre for Computational Neuroscience, Department of Experimental Psychology, South Parks Road, Oxford OX1 3UD, England, United Kingdom.
24. Masquelier T, Thorpe SJ. Unsupervised learning of visual features through spike timing dependent plasticity. PLoS Comput Biol 2007;3:e31. [PMID: 17305422; PMCID: PMC1797822; DOI: 10.1371/journal.pcbi.0030031]
Abstract
Spike timing dependent plasticity (STDP) is a learning rule that modifies synaptic strength as a function of the relative timing of pre- and postsynaptic spikes. When a neuron is repeatedly presented with similar inputs, STDP is known to have the effect of concentrating high synaptic weights on afferents that systematically fire early, while postsynaptic spike latencies decrease. Here we use this learning rule in an asynchronous feedforward spiking neural network that mimics the ventral visual pathway, and show that when the network is presented with natural images, selectivity to intermediate-complexity visual features emerges. Those features, which correspond to prototypical patterns that are both salient and consistently present in the images, are highly informative and enable robust object recognition, as demonstrated on various classification tasks. Taken together, these results show that temporal codes may be a key to understanding the phenomenal processing speed achieved by the visual system, and that STDP can lead to fast and selective responses.
Affiliation(s)
- Timothée Masquelier
- Centre de Recherche Cerveau et Cognition, Centre National de la Recherche Scientifique, Université Paul Sabatier, Faculté de Médecine de Rangueil, Toulouse, France.
25. Rolls ET, Stringer SM. Invariant Global Motion Recognition in the Dorsal Visual System: A Unifying Theory. Neural Comput 2007;19:139-169. [PMID: 17134320; DOI: 10.1162/neco.2007.19.1.139]
Abstract
The motion of an object (such as a wheel rotating) is seen as consistent independent of its position and size on the retina. Neurons in higher cortical visual areas respond to these global motion stimuli invariantly, but neurons in early cortical areas with small receptive fields cannot represent this motion, not only because of the aperture problem but also because they do not have invariant representations. In a unifying hypothesis with the design of the ventral cortical visual system, we propose that the dorsal visual system uses a hierarchical feedforward network architecture (V1, V2, MT, MSTd, parietal cortex) with training of the connections with a short-term memory trace associative synaptic modification rule to capture what is invariant at each stage. Simulations show that the proposal is computationally feasible, in that invariant representations of the motion flow fields produced by objects self-organize in the later layers of the architecture. The model produces invariant representations of the motion flow fields produced by global in-plane motion of an object, in-plane rotational motion, looming versus receding of the object, and object-based rotation about a principal axis. Thus, the dorsal and ventral visual systems may share some similar computational principles.
Affiliation(s)
- Edmund T Rolls
- Oxford University, Centre for Computational Neuroscience, Department of Experimental Psychology, Oxford OX1 3UD, England.
26. Rolls ET, Deco G. Attention in natural scenes: Neurophysiological and computational bases. Neural Netw 2006;19:1383-1394. [PMID: 17011749; DOI: 10.1016/j.neunet.2006.08.007]
Abstract
How does attention operate in natural scenes? We show that the receptive fields of inferior temporal cortex neurons that implement object representations become small and located at the fovea in complex natural scenes. This facilitates the readout of information about an object that may be reward or punishment associated, and may be the target for action. Top-down biased competition to implement attention has a much weaker effect in complex natural scenes than in otherwise blank scenes with two objects. Part of the solution to the binding problem is thus that competition and the foveal cortical magnification factor emphasize what is present at the fovea, and limit the binding problem. Part of the solution to the binding problem is that neurons respond to combinations of features present in the correct relative spatial positions. Stimulus-dependent neuronal synchrony does not appear to be quantitatively important in feature binding, and in attention, in natural visual scenes, at least in the inferior temporal visual cortex, as shown by information theoretic analyses. The perception of multiple objects in a scene is facilitated by the fact that inferior temporal visual cortex neurons have asymmetrical receptive fields with respect to the fovea in complex scenes. Computational models of this processing are described.
Affiliation(s)
- Edmund T Rolls
- University of Oxford, Department of Experimental Psychology, South Parks Road, Oxford OX1 3UD, United Kingdom.
27.
Abstract
How are invariant representations of objects formed in the visual cortex? We describe a neurophysiological and computational approach which focusses on a feature hierarchy model in which invariant representations can be built by self-organizing learning based on the statistics of the visual input. The model can use temporal continuity in an associative synaptic learning rule with a short-term memory trace, and/or it can use spatial continuity in continuous transformation learning. The model of visual processing in the ventral cortical stream can build representations of objects that are invariant with respect to translation, view, and size, and in this paper we show also lighting. The model has been extended to provide an account of invariant representations in the dorsal visual system of the global motion produced by objects, such as looming, rotation, and object-based movement. The model has been extended to incorporate top-down feedback connections to model the control of attention by biased competition in, for example, spatial and object search tasks. The model has also been extended to account for how the visual system can select single objects in complex visual scenes, and how multiple objects can be represented in a scene.
Affiliation(s)
- Edmund T Rolls
- Oxford University, Centre for Computational Neuroscience, Department of Experimental Psychology, South Parks Road, Oxford OX1 3UD, England, United Kingdom.
28. Gowen E, Abadi RV, Poliakoff E. Paying attention to saccadic intrusions. Brain Res Cogn Brain Res 2005;25:810-825. [PMID: 16256318; DOI: 10.1016/j.cogbrainres.2005.09.002]
Abstract
Fixation on a target in primary gaze is invariably interrupted by physiological conjugate saccadic intrusions (SI). These small idiosyncratic eye movements (usually <1 degree in amplitude) take the form of an initial horizontal fast eye movement away from the desired eye position, followed after a variable duration by a return saccade or drift. As the aetiology of SI is still unclear, the aim of this study was to investigate whether SI are related to exogenous or endogenous attentional processes. This was achieved by varying (a) the 'bottom-up' target viewing conditions (target presence, servo control of the target, target background, target size) and (b) the 'top-down' attentional state (instruction change: 'look' versus 'hold eyes steady'; passive fixation versus active 'respond to change' fixation) in 13 subjects (the number of participants in each task varied between 7 and 11). We also manipulated the orientation of purely exogenous attention through a cue-target task, during which subjects were required to respond to a target, preceded by a non-informative cue, by either pressing a button or making a saccade towards the target. SI amplitude, duration, frequency, and direction were measured. SI amplitude was significantly higher when the target was absent, and SI frequency was significantly lower during open-loop conditions. Target size and background influenced SI behaviour in an idiosyncratic manner, although there was a trend for subjects to exhibit lower SI frequencies and amplitudes when a patterned background was present, and larger SI amplitudes with larger target sizes. SI frequency decreased during the 'hold eyes steady' passive command as well as during active fixation, but SI direction was not influenced by the exogenous cue-target task. These results suggest that SI are related to endogenous rather than exogenous attention mechanisms. Our experiments lead us to propose that SI represent shifts in endogenous attention that reflect a baseline attentional state present during laboratory fixation tasks, and they may prove to be a useful tool for exploring higher cortical control of fixation.
Affiliation(s)
- E Gowen
- Behavioural Brain Sciences, School of Psychology, Hills Building, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK.
|
29
|
Spratling MW. Learning viewpoint invariant perceptual representations from cluttered images. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2005; 27:753-61. [PMID: 15875796 DOI: 10.1109/tpami.2005.105] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/02/2023]
Abstract
In order to perform object recognition, it is necessary to form perceptual representations that are sufficiently specific to distinguish between objects, but that are also sufficiently flexible to generalize across changes in location, rotation, and scale. A standard method for learning perceptual representations that are invariant to viewpoint is to form temporal associations across image sequences showing object transformations. However, this method requires that individual stimuli be presented in isolation and is therefore unlikely to succeed in real-world applications where multiple objects can co-occur in the visual input. This paper proposes a simple modification to the learning method that overcomes this limitation and results in more robust learning of invariant representations.
|
30
|
Borisyuk RM, Kazanovich YB. Oscillatory model of attention-guided object selection and novelty detection. Neural Netw 2004; 17:899-915. [PMID: 15312834 DOI: 10.1016/j.neunet.2004.03.005] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2002] [Revised: 03/12/2004] [Accepted: 03/12/2004] [Indexed: 10/26/2022]
Abstract
We develop a new oscillatory model that combines consecutive selection of objects with discrimination between new and familiar objects. The model works with visual information and performs the following operations: (1) separation of different objects according to their spatial connectivity; (2) consecutive selection of objects located in the visual field into the attention focus; (3) extraction of features; (4) representation of objects in working memory; (5) novelty detection of objects. The functioning of the model is based on two main principles: the synchronization of oscillators through phase-locking, and a resonant increase in the amplitude of oscillators that work in phase with other oscillators. The results of computer simulations of the model are described for visual stimuli representing printed words.
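The two principles named in this abstract, phase-locking and resonant amplitude growth, can be sketched with generic Kuramoto-style phase dynamics. This is a hedged illustration, not the authors' published equations; all frequencies, coupling strengths, and thresholds are assumptions:

```python
import numpy as np

n, dt, steps = 5, 0.01, 2000
omega = np.array([9.5, 10.2, 14.0, 6.0, 10.0])  # peripheral natural frequencies
theta = np.zeros(n)                             # peripheral phases
theta_c, omega_c = 0.0, 10.0                    # central "attention" oscillator
K = 3.0                                         # central-to-peripheral coupling
amp = np.ones(n)                                # peripheral amplitudes

for _ in range(steps):
    # phase-locking: peripheral phases are pulled toward the central phase
    theta += dt * (omega + K * np.sin(theta_c - theta))
    theta_c += dt * omega_c
    # resonance: amplitude grows for oscillators in phase with the centre
    # and decays for those that drift out of phase
    amp += dt * amp * (np.cos(theta_c - theta) - 0.5)
    amp = np.clip(amp, 0.0, 10.0)

print(np.round(amp, 2))  # e.g. [10. 10.  0.  0. 10.]: only lockable oscillators survive
```

Peripheral oscillators whose natural frequency lets them lock to the central oscillator grow in amplitude and are, in this reading, the "attended" ones; oscillators that drift out of phase decay away.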
Affiliation(s)
- Roman M Borisyuk
- Centre for Theoretical & Computational Neuroscience, University of Plymouth, Plymouth PL4 8AA, UK.
|
31
|
Altmann CF, Deubelius A, Kourtzi Z. Shape Saliency Modulates Contextual Processing in the Human Lateral Occipital Complex. J Cogn Neurosci 2004; 16:794-804. [PMID: 15200707 DOI: 10.1162/089892904970825] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Visual context influences our perception of target objects in natural scenes. However, little is known about the analysis of context information and its role in shape perception in the human brain. We investigated whether the human lateral occipital complex (LOC), known to be involved in the visual analysis of shapes, also processes information about the context of shapes within cluttered scenes. We employed an fMRI adaptation paradigm in which fMRI responses are lower for two identical than for two different stimuli presented consecutively. The stimuli consisted of closed target contours defined by aligned Gabor elements embedded in a background of randomly oriented Gabors. We measured fMRI adaptation in the LOC across changes in the context of the target shapes by manipulating the position and orientation of the background elements. No adaptation was observed across context changes when the background elements were presented in the same plane as the target elements. However, adaptation was observed when the grouping of the target elements was enhanced in a bottom-up (i.e., grouping by disparity or motion) or top-down (i.e., shape priming) manner and thus the saliency of the target shape increased. These findings suggest that the LOC processes information not only about shapes, but also about their context. This processing of context information in the LOC is modulated by figure–ground segmentation and grouping processes. That is, neural populations in the LOC encode context information when relevant to the perception of target shapes, but represent salient targets independent of context changes.
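The logic of the fMRI-adaptation comparison can be made concrete with a small worked example. The adaptation index below is a standard way of quantifying such designs, but the function name and the response values are hypothetical, not data from the study:

```python
import numpy as np

def adaptation_index(resp_repeated, resp_different):
    """Positive index: lower response when the target shape repeats than when
    it changes, i.e. adaptation, implying invariance to whatever else varied."""
    rr, rd = np.mean(resp_repeated), np.mean(resp_different)
    return (rd - rr) / (rd + rr)

# hypothetical percent-signal-change values; the background context changes in both cases
low_saliency  = adaptation_index([0.80, 0.70, 0.90], [0.85, 0.75, 0.90])  # ~0.02: no adaptation
high_saliency = adaptation_index([0.50, 0.60, 0.55], [0.90, 1.00, 0.95])  # ~0.27: adaptation survives the context change
print(round(low_saliency, 2), round(high_saliency, 2))
```

An index near zero, as in the low-saliency case, is the signature reported when background elements share the target's plane; a clearly positive index corresponds to the conditions in which grouping made the target shape salient.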
|
32
|
Abstract
We describe a model of invariant visual object recognition in the brain that incorporates feedback biasing effects of top-down attentional mechanisms on a hierarchically organized set of visual cortical areas with convergent forward connectivity, reciprocal feedback connections, and local intra-area competition. The model displays space-based and object-based covert visual search by using attentional top-down feedback from the posterior parietal or inferior temporal cortex (IT) modules, with interactions between the two processing streams occurring in V1 and V2. The model explains the gradually increasing magnitude of attentional modulation found in fMRI experiments from earlier visual areas (V1, V2) to higher ventral-stream visual areas (V4, IT), and explains how the effective size of the receptive fields of IT neurons becomes smaller in cluttered natural scenes; it also makes predictions about interactions between stimuli within those receptive fields.
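A minimal sketch of the biased-competition mechanism this model builds on: rate units in one area compete through shared inhibition, and a weak top-down bias shifts the outcome. The function, parameters, and values are illustrative assumptions, not the published neurodynamical equations:

```python
import numpy as np

def biased_competition(ff_input, bias, tau=10.0, dt=1.0, steps=500, w_inh=1.0):
    """Rate units competing through pooled inhibition; a small top-down
    bias tips the competition toward the attended unit (illustrative sketch)."""
    y = np.zeros_like(ff_input)
    for _ in range(steps):
        inhibition = w_inh * y.sum()                     # pooled competitive inhibition
        drive = ff_input + bias - inhibition             # feedforward + top-down - inhibition
        y += (dt / tau) * (-y + np.maximum(drive, 0.0))  # rectified rate dynamics
    return y

# two equally strong bottom-up stimuli; attention biases unit 0 only weakly
print(biased_competition(np.array([1.0, 1.0]), np.array([0.15, 0.0])))
# -> roughly [0.43, 0.28]: the weak bias is amplified by the competition
```

Stacking such pools with convergent feedforward and weaker feedback connections would be the natural way to explore the fMRI gradient described in the abstract, since a bias fed back from the top modulates higher areas more strongly than earlier ones.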
Affiliation(s)
- Gustavo Deco
- Department of Technology, Computational Neuroscience, Institució Catalana de Recerca i Estudis Avançats, Universitat Pompeu Fabra, Passeig de Circumval·lació, 08003 Barcelona, Spain
|
33
|
Who Opened Pandora's Box? Cortex 2004. [DOI: 10.1016/s0010-9452(08)70146-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
34
|
Novelty Detection in Video Surveillance Using Hierarchical Neural Networks. ARTIFICIAL NEURAL NETWORKS — ICANN 2002 2002. [DOI: 10.1007/3-540-46084-5_202] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/12/2023]
|
35
|
Abstract
A learning paradigm for a new biophysical vision model (BVM) is presented. It incorporates anatomical and physiological evidence from micro- and macroscopic research on vision as reported in the literature during the past five years. Anatomical and physiological vision research tends to drift away from the technological foundations of encoding and reproducing size-defined images of real ongoing life scenarios. White and colored light waves reflected from life scenarios are converted by the retina into encoded electrical pulse trains carrying real information to be decoded by cortical vision neurons. The BVM paradigm is based on the ideas that: (1) cinema technology reproduces real-life scenes just as the human eye sees them; and (2) virtual reality and robotics are computerized replications of categorized human vision faculties in operation. We believe that vision-related technology may extend our knowledge about vision and direct vision research toward new horizons. The biophysical vision model has three prerequisites: (1) the faculties of human vision must be categorized; (2) logic circuits of the 'hardware' of neuronal vision must be present; and (3) vision faculties are operated by self-induced 'software'. Vision research may be enhanced with devices constructed according to the BVM that would enable biophysical vision experiments in both humans and animals.
Affiliation(s)
- Y Naisberg
- Kfar Yidud Rehabilitation Center, Netanya, Israel.
|
36
|
Rolls ET. Functions of the primate temporal lobe cortical visual areas in invariant visual object and face recognition. Neuron 2000; 27:205-18. [PMID: 10985342 DOI: 10.1016/s0896-6273(00)00030-1] [Citation(s) in RCA: 214] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Affiliation(s)
- E T Rolls
- University of Oxford, Department of Experimental Psychology, United Kingdom.
|