1. Lindeberg T. Orientation selectivity properties for the affine Gaussian derivative and the affine Gabor models for visual receptive fields. J Comput Neurosci 2025;53:61-98. PMID: 39878929; PMCID: PMC11868404; DOI: 10.1007/s10827-024-00888-w.
Abstract
This paper presents an in-depth theoretical analysis of the orientation selectivity properties of simple cells and complex cells that can be well modelled by the generalized Gaussian derivative model for visual receptive fields, with the purely spatial component of the receptive fields determined by oriented affine Gaussian derivatives for different orders of spatial differentiation. A detailed mathematical analysis is presented for three cases: (i) purely spatial receptive fields, (ii) space-time separable spatio-temporal receptive fields, and (iii) velocity-adapted spatio-temporal receptive fields. Closed-form expressions for the orientation selectivity curves of idealized models of simple and complex cells are derived for all these cases, and it is shown that the orientation selectivity of the receptive fields becomes narrower as a scale parameter ratio κ, defined as the ratio between the scale parameters in the directions perpendicular to vs. parallel with the preferred orientation of the receptive field, increases. It is also shown that the orientation selectivity becomes narrower with increasing order of spatial differentiation in the underlying affine Gaussian derivative operators over the spatial domain. A corresponding theoretical orientation selectivity analysis is also presented for purely spatial receptive fields according to an affine Gabor model, showing that: (i) the orientation selectivity becomes narrower when the receptive fields are made wider in the direction perpendicular to the preferred orientation of the receptive field; and (ii) an additional degree of freedom in the affine Gabor model also strongly affects the orientation selectivity properties.
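As a concrete illustration of the model family analyzed in this paper, an oriented affine Gaussian derivative kernel can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's exact parameterization: the first-order differentiation, the axis conventions (derivative taken along one principal axis, with κ scaling the orthogonal axis), and all parameter values are illustrative assumptions.

```python
import numpy as np

def affine_gaussian_deriv(size=33, sigma=2.0, kappa=2.0, theta=0.0):
    """First-order affine Gaussian derivative kernel (illustrative sketch).

    sigma  -- scale parameter along the differentiation axis
    kappa  -- scale parameter ratio between the two principal axes
    theta  -- orientation of the differentiation axis, in radians
    """
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    # coordinates in the kernel's principal axes
    u = x * np.cos(theta) + y * np.sin(theta)
    v = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(u ** 2 / (2 * sigma ** 2)
                 + v ** 2 / (2 * (kappa * sigma) ** 2)))
    g /= g.sum()
    # differentiate the normalized Gaussian along the u axis
    return -(u / sigma ** 2) * g

k = affine_gaussian_deriv()
```

Increasing `kappa` elongates the kernel along one axis, which is the geometric change whose effect on orientation selectivity the paper quantifies in closed form.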
Affiliation(s)
- Tony Lindeberg: Computational Brain Science Lab, Division of Computational Science and Technology, KTH Royal Institute of Technology, SE-100 44, Stockholm, Sweden
2. Almasi A, Sun SH, Jung YJ, Ibbotson M, Meffin H. Data-driven modelling of visual receptive fields: comparison between the generalized quadratic model and the nonlinear input model. J Neural Eng 2024;21:046014. PMID: 38941988; DOI: 10.1088/1741-2552/ad5d15.
Abstract
Objective: Neurons in primary visual cortex (V1) display a range of sensitivity in their response to translations of their preferred visual features within their receptive field, from high specificity to a precise position through to complete invariance. This visual feature selectivity and invariance is frequently modeled by applying a selection of linear spatial filters to the input image, which define the feature selectivity, followed by a nonlinear function that combines the filter outputs, which defines the invariance, to predict the neural response. We compare two such classes of model that are both popular and parsimonious: the generalized quadratic model (GQM) and the nonlinear input model (NIM). These two classes differ primarily in that the NIM can accommodate a greater diversity in the form of nonlinearity that is applied to the outputs of the filters. Approach: We compare the two model types by applying them to data from multielectrode recordings from cat primary visual cortex in response to spatially white Gaussian noise. After fitting both classes of model to a database of 342 single units (SUs), we analyze the qualitative and quantitative differences in the visual feature processing performed by the two models and their ability to predict the neural response. Main results: We find that the NIM predicts response rates on held-out data at least as well as the GQM for 95% of SUs. Superior performance occurs predominantly for those units with above-average spike rates and is largely due to the NIM's ability to capture aspects of the nonlinear function that cannot be captured with the GQM, rather than to differences in the visual features being processed by the two models. Significance: These results can help guide model choice for data-driven receptive field modelling.
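The two model classes being compared can be caricatured in a few lines. This is a hedged sketch, not the authors' fitting code: the softplus output nonlinearity, the squaring upstream functions, and the random filters are assumptions for illustration. It does show the structural difference: the GQM applies one quadratic form to the stimulus, while the NIM passes each filter output through its own nonlinearity before summation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(s):
    return np.log1p(np.exp(s))

def gqm_rate(x, C, b, a):
    """Generalized quadratic model: output nonlinearity applied to a
    quadratic form of the stimulus plus a linear term."""
    return softplus(x @ C @ x + b @ x + a)

def nim_rate(x, filters, upstream, weights, a):
    """Nonlinear input model: each filter output first passes through its
    own upstream nonlinearity; the weighted sum then drives the output
    nonlinearity."""
    drive = a + sum(w * f(k @ x)
                    for k, f, w in zip(filters, upstream, weights))
    return softplus(drive)

# A GQM whose quadratic term pools two filters implements an energy-like
# computation; a NIM with squaring upstream nonlinearities matches it.
k1, k2 = rng.normal(size=(2, 16))
x = rng.normal(size=16)
C = np.outer(k1, k1) + np.outer(k2, k2)
r_gqm = gqm_rate(x, C, np.zeros(16), 0.0)
r_nim = nim_rate(x, [k1, k2], [np.square, np.square], [1.0, 1.0], 0.0)
```

Swapping the squaring functions for, say, asymmetric rectifiers gives NIM behavior the GQM cannot reproduce, which is the extra flexibility the comparison above turns on.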
Affiliation(s)
- Ali Almasi: National Vision Research Institute, Carlton, VIC 3053, Australia
- Shi H Sun: National Vision Research Institute, Carlton, VIC 3053, Australia
- Young Jun Jung: National Vision Research Institute, Carlton, VIC 3053, Australia
- Michael Ibbotson: National Vision Research Institute, Carlton, VIC 3053, Australia; Department of Optometry and Vision Sciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Hamish Meffin: National Vision Research Institute, Carlton, VIC 3053, Australia; Department of Biomedical Engineering, The University of Melbourne, Parkville, VIC 3010, Australia
3. Matteucci G, Piasini E, Zoccolan D. Unsupervised learning of mid-level visual representations. Curr Opin Neurobiol 2024;84:102834. PMID: 38154417; DOI: 10.1016/j.conb.2023.102834.
Abstract
Recently, a confluence between trends in neuroscience and machine learning has brought a renewed focus on unsupervised learning, where sensory processing systems learn to exploit the statistical structure of their inputs in the absence of explicit training targets or rewards. Sophisticated experimental approaches have enabled the investigation of the influence of sensory experience on neural self-organization and its synaptic bases. Meanwhile, novel algorithms for unsupervised and self-supervised learning have become increasingly popular both as inspiration for theories of the brain, particularly for the function of intermediate visual cortical areas, and as building blocks of real-world learning machines. Here we review some of these recent developments, placing them in historical context and highlighting some research lines that promise exciting breakthroughs in the near future.
Affiliation(s)
- Giulio Matteucci: Department of Basic Neurosciences, University of Geneva, Geneva, 1206, Switzerland
- Eugenio Piasini: International School for Advanced Studies (SISSA), Trieste, 34136, Italy
- Davide Zoccolan: International School for Advanced Studies (SISSA), Trieste, 34136, Italy
4. Benucci A. Motor-related signals support localization invariance for stable visual perception. PLoS Comput Biol 2022;18:e1009928. PMID: 35286305; PMCID: PMC8947590; DOI: 10.1371/journal.pcbi.1009928.
Abstract
Our ability to perceive a stable visual world in the presence of continuous movements of the body, head, and eyes has long puzzled researchers in neuroscience. We reformulated this problem in the context of hierarchical convolutional neural networks (CNNs), whose architectures have been inspired by the hierarchical signal processing of the mammalian visual system, and examined perceptual stability as an optimization process that identifies image-defining features for accurate image classification in the presence of movements. Movement signals, multiplexed with visual inputs along overlapping convolutional layers, aided classification invariance of shifted images by making the classification faster to learn and more robust to input noise. Classification invariance was reflected in activity manifolds associated with image categories emerging in late CNN layers, and in network units acquiring movement-associated activity modulations, as observed experimentally during saccadic eye movements. Our findings provide a computational framework that unifies a multitude of biological observations on perceptual stability under optimality principles for image classification in artificial neural networks.
Affiliation(s)
- Andrea Benucci: RIKEN Center for Brain Science, Wako-shi, Japan; University of Tokyo, Graduate School of Information Science and Technology, Department of Mathematical Informatics, Tokyo, Japan
5. Piasini E, Soltuzu L, Muratore P, Caramellino R, Vinken K, Op de Beeck H, Balasubramanian V, Zoccolan D. Temporal stability of stimulus representation increases along rodent visual cortical hierarchies. Nat Commun 2021;12:4448. PMID: 34290247; PMCID: PMC8295255; DOI: 10.1038/s41467-021-24456-3.
Abstract
Cortical representations of brief, static stimuli become more invariant to identity-preserving transformations along the ventral stream. Likewise, increased invariance along the visual hierarchy should imply greater temporal persistence of responses to temporally structured dynamic stimuli, possibly complemented by temporal broadening of neuronal receptive fields. However, such stimuli could engage adaptive and predictive processes, whose impact on neural coding dynamics is unknown. By probing the rat analog of the ventral stream with movies, we uncovered a hierarchy of temporal scales, with deeper areas encoding visual information more persistently. Furthermore, the impact of intrinsic dynamics on the stability of stimulus representations grew gradually along the hierarchy. A database of recordings from mice showed similar trends, additionally revealing dependencies on the behavioral state. Overall, these findings show that visual representations become progressively more stable along rodent visual processing hierarchies, with an important contribution provided by intrinsic processing.
Affiliation(s)
- Eugenio Piasini: Computational Neuroscience Initiative, University of Pennsylvania, Philadelphia, PA, USA
- Liviu Soltuzu: Visual Neuroscience Lab, International School for Advanced Studies (SISSA), Trieste, Italy; Blue Brain Project, École polytechnique fédérale de Lausanne (EPFL), Campus Biotech, Geneva, Switzerland
- Paolo Muratore: Visual Neuroscience Lab, International School for Advanced Studies (SISSA), Trieste, Italy
- Riccardo Caramellino: Visual Neuroscience Lab, International School for Advanced Studies (SISSA), Trieste, Italy
- Kasper Vinken: Boston Children's Hospital, Harvard Medical School, Boston, MA, USA; Laboratory for Neuro- and Psychophysiology, Department of Neurosciences, KU Leuven, Leuven, Belgium
- Hans Op de Beeck: Department of Brain and Cognition, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Vijay Balasubramanian: Computational Neuroscience Initiative, University of Pennsylvania, Philadelphia, PA, USA
- Davide Zoccolan: Visual Neuroscience Lab, International School for Advanced Studies (SISSA), Trieste, Italy
6. Lian Y, Almasi A, Grayden DB, Kameneva T, Burkitt AN, Meffin H. Learning receptive field properties of complex cells in V1. PLoS Comput Biol 2021;17:e1007957. PMID: 33651790; PMCID: PMC7954310; DOI: 10.1371/journal.pcbi.1007957.
Abstract
There are two distinct classes of cells in the primary visual cortex (V1): simple cells and complex cells. One defining feature of complex cells is their spatial phase invariance; they respond strongly to oriented grating stimuli with a preferred orientation but with a wide range of spatial phases. A classical model of complete spatial phase invariance in complex cells is the energy model, in which the responses are the sum of the squared outputs of two linear, spatially phase-shifted filters. However, recent experimental studies have shown that complex cells have a diverse range of spatial phase invariance and only a subset can be characterized by the energy model. While several models have been proposed to explain how complex cells could learn to be selective to orientation but invariant to spatial phase, most existing models overlook many biologically important details. We propose a biologically plausible model for complex cells that learns to pool inputs from simple cells based on the presentation of natural scene stimuli. The model is a three-layer network with rate-based neurons that describes the activities of LGN cells (layer 1), V1 simple cells (layer 2), and V1 complex cells (layer 3). The first two layers implement a recently proposed simple cell model that is biologically plausible and accounts for many experimental phenomena. The neural dynamics of the complex cells are modeled as the integration of simple-cell inputs along with response normalization. Connections between LGN and simple cells are learned using Hebbian and anti-Hebbian plasticity. Connections between simple and complex cells are learned using a modified version of the Bienenstock, Cooper, and Munro (BCM) rule. Our results demonstrate that the learning rule can describe a diversity of complex cells, similar to those observed experimentally.
Many cortical functions originate from the learning ability of the brain. How the properties of cortical cells are learned is vital for understanding how the brain works. There are many models that explain how V1 simple cells can be learned. However, how V1 complex cells are learned remains unclear. In this paper, we propose a model of learning in complex cells based on the Bienenstock, Cooper, and Munro (BCM) rule. We demonstrate that the receptive field properties of complex cells can be learned using this biologically plausible learning rule. Quantitative comparisons between the model and experimental data show that model complex cells can account for the diversity of complex cells found in experimental studies. In summary, this study provides a plausible explanation for how complex cells can be learned using biologically plausible plasticity mechanisms. Our findings help us to better understand biological vision processing and provide insights into the general signal processing principles that the visual cortex employs to process visual information.
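The classical energy model that this line of work starts from can be sketched directly. The Gabor parameters and grid size below are illustrative assumptions; the sketch only demonstrates the phase invariance the abstract describes: summing the squared outputs of two filters 90° apart in spatial phase yields a response nearly independent of the grating's phase.

```python
import numpy as np

def gabor(size, sigma, freq, theta, phase):
    """Gabor filter: a Gaussian envelope times an oriented sinusoid."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    u = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * freq * u + phase)

def energy_response(img, sigma=4.0, freq=0.1, theta=0.0):
    """Energy model of a complex cell: sum of squared outputs of two
    linear filters in quadrature (90 degrees apart in spatial phase)."""
    size = img.shape[0]
    g_even = gabor(size, sigma, freq, theta, 0.0)
    g_odd = gabor(size, sigma, freq, theta, np.pi / 2)
    return np.sum(img * g_even) ** 2 + np.sum(img * g_odd) ** 2

# Responses to a matched grating are nearly independent of spatial phase.
size = 33
r = size // 2
y, x = np.mgrid[-r:r + 1, -r:r + 1]
responses = [energy_response(np.cos(2 * np.pi * 0.1 * x + p))
             for p in (0.0, 1.0, 2.0, 3.0)]
```

The learned pooling model in this paper replaces this fixed quadrature pair with simple-cell weights acquired through BCM-like plasticity, which is what lets it capture partially phase-invariant cells that the fixed energy model cannot.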
Affiliation(s)
- Yanbo Lian: Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Ali Almasi: National Vision Research Institute, The Australian College of Optometry, Melbourne, Australia
- David B. Grayden: Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Tatiana Kameneva: Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia; Faculty of Science, Engineering and Technology, Swinburne University, Melbourne, Australia
- Anthony N. Burkitt: Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Hamish Meffin: Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia; National Vision Research Institute, The Australian College of Optometry, Melbourne, Australia; Department of Optometry and Vision Sciences, The University of Melbourne, Melbourne, Australia
7. Matteucci G, Zoccolan D. Unsupervised experience with temporal continuity of the visual environment is causally involved in the development of V1 complex cells. Sci Adv 2020;6:eaba3742. PMID: 32523998; PMCID: PMC7259963; DOI: 10.1126/sciadv.aba3742.
Abstract
Unsupervised adaptation to the spatiotemporal statistics of visual experience is a key computational principle that has long been assumed to govern postnatal development of visual cortical tuning, including orientation selectivity of simple cells and position tolerance of complex cells in primary visual cortex (V1). Yet, causal empirical evidence supporting this hypothesis is scant. Here, we show that degrading the temporal continuity of visual experience during early postnatal life leads to a sizable reduction of the number of complex cells and to an impairment of their functional properties while fully sparing the development of simple cells. This causally implicates adaptation to the temporal structure of the visual input in the development of transformation tolerance but not of shape tuning, thus tightly constraining computational models of unsupervised cortical learning.
Affiliation(s)
- Giulio Matteucci: Visual Neuroscience Laboratory, International School for Advanced Studies (SISSA), Trieste, Italy
8. Hosoya H, Hyvärinen A. Learning visual spatial pooling by strong PCA dimension reduction. Neural Comput 2016;28:1249-64. PMID: 27171856; DOI: 10.1162/neco_a_00843.
Abstract
In visual modeling, invariance properties of visual cells are often explained by a pooling mechanism, in which outputs of neurons with similar selectivities to some stimulus parameters are integrated so as to gain some extent of invariance to other parameters. For example, the classical energy model of phase-invariant V1 complex cells pools model simple cells preferring similar orientation but different phases. Prior studies, such as independent subspace analysis, have shown that phase-invariance properties of V1 complex cells can be learned from spatial statistics of natural inputs. However, those previous approaches assumed a squaring nonlinearity on the neural outputs to capture energy correlation; such nonlinearity is arguably unnatural from a neurobiological viewpoint but hard to change due to its tight integration into their formalisms. Moreover, they used somewhat complicated objective functions requiring expensive computations for optimization. In this study, we show that visual spatial pooling can be learned in a much simpler way using strong dimension reduction based on principal component analysis. This approach learns to ignore a large part of detailed spatial structure of the input and thereby estimates a linear pooling matrix. Using this framework, we demonstrate that pooling of model V1 simple cells learned in this way, even with nonlinearities other than squaring, can reproduce standard tuning properties of V1 complex cells. For further understanding, we analyze several variants of the pooling model and argue that a reasonable pooling can generally be obtained from any kind of linear transformation that retains several of the first principal components and suppresses the remaining ones. In particular, we show how the classic Wiener filtering theory leads to one such variant.
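The core idea, estimating a linear pooling matrix by keeping only a few principal components, can be illustrated on toy data. This is a sketch under stated assumptions: the paired-unit responses below stand in for phase-shifted model simple cells driven by a common local pattern, and are not the authors' stimuli or training procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

def strong_pca_pooling(responses, k):
    """Estimate a linear pooling matrix by strong dimension reduction:
    keep only the first k principal components of the centered
    model-simple-cell response matrix and suppress the rest."""
    X = responses - responses.mean(axis=0)
    # rows of Vt are the principal directions of the response space
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]

# Toy responses: adjacent pairs of units share a common underlying factor,
# standing in for simple cells that differ only in spatial phase.
n_samples, n_units = 500, 8
latent = rng.normal(size=(n_samples, 4))
X = np.repeat(latent, 2, axis=1) + 0.1 * rng.normal(size=(n_samples, n_units))
P = strong_pca_pooling(X, k=4)
```

Because each shared factor dominates the variance of its pair, the retained subspace is spanned (up to rotation) by pair-summing directions, i.e., a linear pooling over co-driven units, which matches the paper's observation that many variance-retaining linear transformations yield reasonable pooling.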
Affiliation(s)
- Haruo Hosoya: Computational Neuroscience Laboratories, ATR International, Kyoto 619-0288, Japan; Presto, Japan Science and Technology Agency, Saitama 332-0012, Japan
- Aapo Hyvärinen: Department of Computer Science and HIIT, University of Helsinki, Helsinki 00560, Finland
9. Golden JR, Vilankar KP, Wu MCK, Field DJ. Conjectures regarding the nonlinear geometry of visual neurons. Vision Res 2016;120:74-92. PMID: 26902730; DOI: 10.1016/j.visres.2015.10.015.
Abstract
From the earliest stages of sensory processing, neurons show inherent nonlinearities: the response to a complex stimulus is not the sum of the responses to a set of constituent basis stimuli. These nonlinearities come in a number of forms and have been explained in terms of a number of functional goals. The family of spatial nonlinearities includes interactions that occur both within and outside of the classical receptive field: saturation, cross-orientation inhibition, contrast normalization, end-stopping, and a variety of non-classical effects. In addition, neurons show a number of facilitatory and invariance-related effects, such as those exhibited by complex cells (integration across position). Here, we describe an approach that attempts to explain many of these nonlinearities under a single geometric framework. In line with Zetzsche and colleagues (e.g., Zetzsche et al., 1999), we propose that many of the principal nonlinearities can be described by a geometry where the neural response space has a simple curvature. In this paper, we focus on the geometry that produces both increased selectivity (curving outward) and increased tolerance (curving inward). We demonstrate that overcomplete sparse coding with both low-dimensional synthetic data and high-dimensional natural scene data can result in curvature that is responsible for a variety of known non-classical effects, including end-stopping and gain control. We believe that this approach provides a more fundamental explanation of these nonlinearities and does not require one to postulate a variety of separate explanations (e.g., that gain must be controlled or that the ends of lines must be detected). In its standard form, however, sparse coding does not produce the invariance/tolerance represented by inward curvature. We speculate on some of the requirements needed to produce such curvature.
Affiliation(s)
- James R Golden: Department of Psychology, Cornell University, Ithaca, NY, USA
- Michael C K Wu: Biophysics Graduate Group, University of California, Berkeley, CA, USA; Lithium Technologies Inc., San Francisco, CA, USA
- David J Field: Department of Psychology, Cornell University, Ithaca, NY, USA
10. Parker SM, Serre T. Unsupervised invariance learning of transformation sequences in a model of object recognition yields selectivity for non-accidental properties. Front Comput Neurosci 2015;9:115. PMID: 26500528; PMCID: PMC4595784; DOI: 10.3389/fncom.2015.00115.
Abstract
Non-accidental properties (NAPs) correspond to image properties that are invariant to changes in viewpoint (e.g., straight vs. curved contours) and are distinguished from metric properties (MPs) that can change continuously with in-depth object rotation (e.g., aspect ratio, degree of curvature, etc.). Behavioral and electrophysiological studies of shape processing have demonstrated greater sensitivity to differences in NAPs than in MPs. However, previous work has shown that such sensitivity is lacking in multiple-views models of object recognition such as Hmax. These models typically assume that object processing is based on populations of view-tuned neurons with distributed symmetrical bell-shaped tuning that are modulated at least as much by differences in MPs as in NAPs. Here, we test the hypothesis that unsupervised learning of invariances to object transformations may increase the sensitivity to differences in NAPs vs. MPs in Hmax. We collected a database of video sequences with objects slowly rotating in-depth in an attempt to mimic sequences viewed during object manipulation by young children during early developmental stages. We show that unsupervised learning yields shape-tuning in higher stages with greater sensitivity to differences in NAPs vs. MPs in agreement with monkey IT data. Together, these results suggest that greater NAP sensitivity may arise from experiencing different in-depth rotations of objects.
Affiliation(s)
- Sarah M. Parker: Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI, USA
- Thomas Serre: Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI, USA; Brown Institute for Brain Sciences, Providence, RI, USA
11. Phillips WA. Cognitive functions of intracellular mechanisms for contextual amplification. Brain Cogn 2015;112:39-53. PMID: 26428863; DOI: 10.1016/j.bandc.2015.09.005.
Abstract
Evidence for the hypothesis that input to the apical tufts of neocortical pyramidal cells plays a central role in cognition by amplifying their responses to feedforward input is reviewed. Apical tufts are electrically remote from the soma, and their inputs come from diverse sources including direct feedback from higher cortical regions, indirect feedback via the thalamus, and long-range lateral connections both within and between cortical regions. This suggests that input to tuft dendrites may amplify the cell's response to the basal inputs it receives via layer 4, which have synapses closer to the soma. ERP data supporting this inference are noted. Intracellular studies of apical amplification (AA) and of disamplification by inhibitory interneurons targeted only at tufts are reviewed. Cognitive processes that have been related to them by computational, electrophysiological, and psychopathological studies are then outlined. These processes include: figure-ground segregation and Gestalt grouping; contextual disambiguation in perception and sentence comprehension; priming; winner-take-all competition; attention and working memory; setting the level of consciousness; cognitive control; and learning. It is argued that theories in cognitive neuroscience should not assume that all neurons function as integrate-and-fire point processors, but should use the capabilities of cells with distinct sites of integration for driving and modulatory inputs. Potentially 'unifying' theories that depend upon these capabilities are reviewed. It is concluded that evolution of the primitives of AA and disamplification in neocortex may have extended cognitive capabilities beyond those built from the long-established primitives of excitation, inhibition, and disinhibition.
Affiliation(s)
- William A Phillips: School of Natural Sciences, University of Stirling, Stirling FK9 4LA, UK
12. Banerjee B, Dutta JK. SELP: A general-purpose framework for learning the norms from saliencies in spatiotemporal data. Neurocomputing 2014. DOI: 10.1016/j.neucom.2013.02.044.
13. Dähne S, Wilbert N, Wiskott L. Slow feature analysis on retinal waves leads to V1 complex cells. PLoS Comput Biol 2014;10:e1003564. PMID: 24810948; PMCID: PMC4014395; DOI: 10.1371/journal.pcbi.1003564.
Abstract
The developing visual system of many mammalian species is partially structured and organized even before the onset of vision. Spontaneous neural activity, which spreads in waves across the retina, has been suggested to play a major role in these prenatal structuring processes. Recently, it has been shown that when employing an efficient coding strategy, such as sparse coding, these retinal activity patterns lead to basis functions that resemble optimal stimuli of simple cells in primary visual cortex (V1). Here we present the results of applying a coding strategy that optimizes for temporal slowness, namely Slow Feature Analysis (SFA), to a biologically plausible model of retinal waves. Previously, SFA has been successfully applied to model parts of the visual system, most notably in reproducing a rich set of complex-cell features by training SFA with quasi-natural image sequences. In the present work, we obtain SFA units that share a number of properties with cortical complex-cells by training on simulated retinal waves. The emergence of two distinct properties of the SFA units (phase invariance and orientation tuning) is thoroughly investigated via control experiments and mathematical analysis of the input-output functions found by SFA. The results support the idea that retinal waves share relevant temporal and spatial properties with natural visual input. Hence, retinal waves seem suitable training stimuli to learn invariances and thereby shape the developing early visual system such that it is best prepared for coding input from the natural world.
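A linear version of SFA is compact enough to sketch: whiten the signal, then rotate to the directions whose temporal derivative has minimal variance. This is illustrative only; the paper trains nonlinear SFA on simulated retinal waves, whereas this toy uses a slow sinusoid linearly mixed with fast white noise.

```python
import numpy as np

rng = np.random.default_rng(2)

def linear_sfa(x, n_out):
    """Linear slow feature analysis: find unit-variance projections of the
    centered, whitened signal whose discrete temporal derivative has
    minimal variance (smallest eigenvalues of the derivative covariance)."""
    x = x - x.mean(axis=0)
    d, E = np.linalg.eigh(np.cov(x, rowvar=False))
    W = E / np.sqrt(d)                  # whitening transform
    z = x @ W
    dz = np.diff(z, axis=0)             # discrete temporal derivative
    _, V = np.linalg.eigh(np.cov(dz, rowvar=False))
    return W @ V[:, :n_out]             # slowest directions first

# A slowly varying sinusoid linearly mixed with fast white noise:
# the slowest SFA feature should recover the slow source.
t = np.arange(2000)
slow = np.sin(2 * np.pi * t / 400.0)
fast = rng.normal(size=t.size)
x = np.stack([slow, fast], axis=1) @ np.array([[1.0, 0.3], [0.5, 1.0]])
w = linear_sfa(x, n_out=1)
y = (x - x.mean(axis=0)) @ w
corr = abs(np.corrcoef(y[:, 0], slow)[0, 1])
```

The claim in the abstract, that retinal waves are suitable SFA training stimuli, amounts to saying that waves, like this toy mixture, contain temporally slow structure that the slowness objective can latch onto.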
Affiliation(s)
- Sven Dähne: Machine Learning Group, Department of Computer Science, Berlin Institute of Technology, Berlin, Germany; Institute for Theoretical Biology, Humboldt-University, Berlin, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany
- Niko Wilbert: Institute for Theoretical Biology, Humboldt-University, Berlin, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany
- Laurenz Wiskott: Institute for Theoretical Biology, Humboldt-University, Berlin, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany; Institute for Neural Computation, Ruhr-University Bochum, Bochum, Germany
14. Lies JP, Häfner RM, Bethge M. Slowness and sparseness have diverging effects on complex cell learning. PLoS Comput Biol 2014;10:e1003468. PMID: 24603197; PMCID: PMC3945087; DOI: 10.1371/journal.pcbi.1003468.
Abstract
Following earlier studies which showed that a sparse coding principle may explain the receptive field properties of complex cells in primary visual cortex, it has been concluded that the same properties may be equally derived from a slowness principle. In contrast to this claim, we here show that slowness and sparsity drive the representations towards substantially different receptive field properties. To do so, we present complete sets of basis functions learned with slow subspace analysis (SSA) in case of natural movies as well as translations, rotations, and scalings of natural images. SSA directly parallels independent subspace analysis (ISA) with the only difference that SSA maximizes slowness instead of sparsity. We find a large discrepancy between the filter shapes learned with SSA and ISA. We argue that SSA can be understood as a generalization of the Fourier transform where the power spectrum corresponds to the maximally slow subspace energies in SSA. Finally, we investigate the trade-off between slowness and sparseness when combined in one objective function.
Affiliation(s)
- Jörn-Philipp Lies
- Werner Reichardt Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Ralf M. Häfner
- Swartz Center for Theoretical Neurobiology, Brandeis University, Waltham, Massachusetts, United States of America
- Matthias Bethge
- Werner Reichardt Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Bernstein Center for Computational Neuroscience, Tübingen, Germany
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany
15
Amiri A, Haykin S. Improved Sparse Coding Under the Influence of Perceptual Attention. Neural Comput 2014; 26:377-420. [DOI: 10.1162/neco_a_00546] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 11/04/2022]
Abstract
Sparse coding has established itself as a useful tool for the representation of natural data in the neuroscience as well as the signal-processing literature. The aim of this letter, inspired by the human brain, is to improve the performance of the sparse coding algorithm by bridging the gap between neuroscience and engineering. To this end, we build on the localized perception-action cycle in cognitive neuroscience by categorizing it under the umbrella of perceptual attention, which lends itself to gradually increasing the contrast between relevant and irrelevant information. Stated another way, irrelevant information is filtered away, while relevant information about the environment is enhanced from one cycle to the next. We may thus think in terms of the information filter, introduced in a Bayesian context by Fraser (1967), which provides a method for the algorithmic implementation of perceptual attention. The information filter may therefore be viewed as the basis for improving the algorithmic performance of sparse coding. To support this performance improvement, the letter presents two computer experiments. The first experiment uses simulated (real-valued) data that are generated to purposely make the problem challenging. The second uses real-life radar data that are complex valued, hence the proposal to introduce Wirtinger calculus into the derivation of the new algorithm.
Affiliation(s)
- Ashkan Amiri
- Cognitive Systems Laboratory, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Simon Haykin
- Cognitive Systems Laboratory, McMaster University, Hamilton, Ontario L8S 4K1, Canada
16
Krüger N, Janssen P, Kalkan S, Lappe M, Leonardis A, Piater J, Rodríguez-Sánchez AJ, Wiskott L. Deep hierarchies in the primate visual cortex: what can we learn for computer vision? IEEE Trans Pattern Anal Mach Intell 2013; 35:1847-1871. [PMID: 23787340 DOI: 10.1109/tpami.2012.272] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.9] [Indexed: 06/02/2023]
Abstract
Computational modeling of the primate visual system yields insights of potential relevance to some of the challenges that computer vision is facing, such as object recognition and categorization, motion detection and activity recognition, or vision-based navigation and manipulation. This paper reviews some functional principles and structures that are generally thought to underlie the primate visual cortex, and attempts to extract biological principles that could further advance computer vision research. Organized for a computer vision audience, we present functional principles of the processing hierarchies present in the primate visual system considering recent discoveries in neurophysiology. The hierarchical processing in the primate visual system is characterized by a sequence of different levels of processing (on the order of 10) that constitute a deep hierarchy in contrast to the flat vision architectures predominantly used in today's mainstream computer vision. We hope that the functional description of the deep hierarchies realized in the primate visual system provides valuable insights for the design of computer vision algorithms, fostering increasingly productive interaction between biological and computer vision research.
Affiliation(s)
- Norbert Krüger
- Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Campusvej 55, Odense M 5230, Denmark.
17
Predictions in the light of your own action repertoire as a general computational principle. Behav Brain Sci 2013; 36:219-20. [PMID: 23663324 DOI: 10.1017/s0140525x12002294] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
Abstract
We argue that brains generate predictions only within the constraints of the action repertoire. This makes the computational complexity tractable and fosters a step-by-step parallel development of sensory and motor systems. Hence, it is more of a benefit than a literal constraint and may serve as a universal normative principle to understand sensorimotor coupling and interactions with the world.
18
Teichmann M, Wiltschut J, Hamker F. Learning Invariance from Natural Images Inspired by Observations in the Primary Visual Cortex. Neural Comput 2012; 24:1271-96. [DOI: 10.1162/neco_a_00268] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/04/2022]
Abstract
The human visual system has the remarkable ability to recognize objects largely invariant to their position, rotation, and scale. A good interpretation of neurobiological findings involves a computational model that simulates the signal processing of the visual cortex. In part, this is likely achieved step by step from early to late areas of visual perception. While several algorithms have been proposed for learning feature detectors, only a few studies cover the issue of biologically plausible learning of such invariance. In this study, a set of Hebbian learning rules based on calcium dynamics and homeostatic regulations of single neurons is proposed. Their performance is verified within a simple model of the primary visual cortex to learn so-called complex cells, based on a sequence of static images. As a result, the learned complex-cell responses are largely invariant to phase and position.
Affiliation(s)
- Jan Wiltschut
- Chemnitz University of Technology, 09107 Chemnitz, Germany, and Westfälische Wilhelms-Universität Münster, 48149 Münster, Germany
- Fred Hamker
- Chemnitz University of Technology, 09107 Chemnitz, Germany
19
Natural versus synthetic stimuli for estimating receptive field models: a comparison of predictive robustness. J Neurosci 2012; 32:1560-76. [PMID: 22302799 DOI: 10.1523/jneurosci.4661-12.2012] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Indexed: 11/21/2022]
Abstract
An ultimate goal of visual neuroscience is to understand the neural encoding of complex, everyday scenes. Yet most of our knowledge of neuronal receptive fields has come from studies using simple artificial stimuli (e.g., bars, gratings) that may fail to reveal the full nature of a neuron's actual response properties. Our goal was to compare the utility of artificial and natural stimuli for estimating receptive field (RF) models. Using extracellular recordings from simple type cells in cat A18, we acquired responses to three types of broadband stimulus ensembles: two widely used artificial patterns (white noise and short bars), and natural images. We used a primary dataset to estimate the spatiotemporal receptive field (STRF) with two hold-back datasets for regularization and validation. STRFs were estimated using an iterative regression algorithm with regularization and subsequently fit with a zero-memory nonlinearity. Each RF model (STRF and zero-memory nonlinearity) was then used in simulations to predict responses to the same stimulus type used to estimate it, as well as to other broadband stimuli and sinewave gratings. White noise stimuli often elicited poor responses leading to noisy RF estimates, while short bars and natural image stimuli were more successful in driving A18 neurons and producing clear RF estimates with strong predictive ability. Natural image-derived RF models were the most robust at predicting responses to other broadband stimulus ensembles that were not used in their estimation and also provided good predictions of tuning curves for sinewave gratings.
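The regularized-regression idea behind such receptive-field estimates can be sketched in a few lines (this is plain ridge regression on a toy linear-nonlinear neuron, not the authors' iterative algorithm; all names, sizes, and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_rf(stimuli, responses, ridge=10.0):
    """Linear receptive-field estimate via ridge-regularized least squares."""
    X = stimuli.reshape(len(stimuli), -1)       # flatten each stimulus frame
    A = X.T @ X + ridge * np.eye(X.shape[1])    # regularized normal equations
    return np.linalg.solve(A, X.T @ responses)

# Toy ground truth: an oriented-bar RF driving a half-wave-rectified unit.
rf_true = np.zeros((8, 8))
rf_true[3:5, :] = 1.0
stimuli = rng.standard_normal((5000, 8, 8))     # white-noise ensemble
responses = np.maximum(stimuli.reshape(5000, -1) @ rf_true.ravel(), 0.0)

rf_est = estimate_rf(stimuli, responses).reshape(8, 8)
# rf_est recovers rf_true up to a scale factor set by the rectification.
```

With white-noise input the estimate is proportional to the true linear kernel even through the output nonlinearity; the regularization term is what keeps the estimate stable for less well-conditioned (e.g. natural-image) stimulus ensembles.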
20
Cadieu CF, Olshausen BA. Learning intermediate-level representations of form and motion from natural movies. Neural Comput 2011; 24:827-66. [PMID: 22168556 DOI: 10.1162/neco_a_00247] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 11/04/2022]
Abstract
We present a model of intermediate-level visual representation that is based on learning invariances from movies of the natural environment. The model is composed of two stages of processing: an early feature representation layer and a second layer in which invariances are explicitly represented. Invariances are learned as the result of factoring apart the temporally stable and dynamic components embedded in the early feature representation. The structure contained in these components is made explicit in the activities of second-layer units that capture invariances in both form and motion. When trained on natural movies, the first layer produces a factorization, or separation, of image content into a temporally persistent part representing local edge structure and a dynamic part representing local motion structure, consistent with known response properties in early visual cortex (area V1). This factorization linearizes statistical dependencies among the first-layer units, making them learnable by the second layer. The second-layer units are split into two populations according to the factorization in the first layer. The form-selective units receive their input from the temporally persistent part (local edge structure) and after training result in a diverse set of higher-order shape features consisting of extended contours, multiscale edges, textures, and texture boundaries. The motion-selective units receive their input from the dynamic part (local motion structure) and after training result in a representation of image translation over different spatial scales and directions, in addition to more complex deformations. These representations provide a rich description of dynamic natural images and testable hypotheses regarding intermediate-level representation in visual cortex.
Affiliation(s)
- Charles F Cadieu
- Redwood Center for Theoretical Neuroscience, Helen Wills Neuroscience Institute, and School of Optometry, University of California, Berkeley, Berkeley, CA 94720, USA.
21
Masquelier T. Relative spike time coding and STDP-based orientation selectivity in the early visual system in natural continuous and saccadic vision: a computational model. J Comput Neurosci 2011; 32:425-41. [PMID: 21938439 DOI: 10.1007/s10827-011-0361-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 03/24/2011] [Revised: 09/05/2011] [Accepted: 09/08/2011] [Indexed: 10/17/2022]
Abstract
We have built a phenomenological spiking model of the cat early visual system comprising the retina, the Lateral Geniculate Nucleus (LGN) and V1's layer 4, and established four main results: (1) When exposed to videos that reproduce with high fidelity what a cat experiences under natural conditions, adjacent Retinal Ganglion Cells (RGCs) have spike-time correlations at a short timescale (~30 ms), despite neuronal noise and possible jitter accumulation. (2) In accordance with recent experimental findings, the LGN filters out some noise. It thus increases the spike reliability and temporal precision and the sparsity, and, importantly, further decreases adjacent cells' correlation timescale down to ~15 ms. (3) Downstream simple cells in V1's layer 4, if equipped with Spike Timing-Dependent Plasticity (STDP), may detect these fine-scale cross-correlations, and thus connect principally to ON- and OFF-centre cells with Receptive Fields (RFs) aligned in the visual space, thereby becoming orientation selective, in accordance with Hubel and Wiesel's (Journal of Physiology 160:106-154, 1962) classic model. Up to this point we dealt with continuous vision, where there was no absolute time reference such as a stimulus onset, yet information was encoded and decoded in the relative spike times. (4) We then simulated saccades to a static image and benchmarked relative spike time coding and time-to-first-spike coding with respect to saccade landing in the context of orientation representation. In both the retina and the LGN, relative spike times are more precise and less affected by pre-landing history and global contrast than absolute ones, and they lead to robust contrast-invariant orientation representations in V1.
Affiliation(s)
- Timothée Masquelier
- Unit for Brain and Cognition, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain.
22
Antolík J, Bednar JA. Development of maps of simple and complex cells in the primary visual cortex. Front Comput Neurosci 2011; 5:17. [PMID: 21559067 PMCID: PMC3082289 DOI: 10.3389/fncom.2011.00017] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 01/20/2011] [Accepted: 03/30/2011] [Indexed: 11/13/2022]
Abstract
Hubel and Wiesel (1962) classified primary visual cortex (V1) neurons as either simple, with responses modulated by the spatial phase of a sine grating, or complex, i.e., largely phase invariant. Much progress has been made in understanding how simple cells develop, and there are now detailed computational models establishing how they can form topographic maps ordered by orientation preference. There are also models of how complex cells can develop using outputs from simple cells with different phase preferences, but no model of how a topographic orientation map of complex cells could be formed based on the actual connectivity patterns found in V1. Addressing this question is important, because the majority of existing developmental models of simple-cell maps group neurons selective to similar spatial phases together, which is contrary to experimental evidence and makes it difficult to construct complex cells. Overcoming this limitation is not trivial, because the mechanisms responsible for map development drive the receptive fields (RFs) of nearby neurons to be highly correlated, while co-oriented RFs of opposite phases are anti-correlated. In this work, we model V1 as two topographically organized sheets representing cortical layers 4 and 2/3. Only layer 4 receives direct thalamic input. Both sheets are connected with narrow feed-forward and feedback connectivity. Only layer 2/3 contains strong long-range lateral connectivity, in line with current anatomical findings. Initially all weights in the model are random, and each is modified via a Hebbian learning rule. The model develops smooth, matching orientation preference maps in both sheets. Layer 4 units become simple cells, with phase preference arranged randomly, while those in layer 2/3 are primarily complex cells. To our knowledge this model is the first to explain how simple cells can develop with random phase preference, and how maps of complex cells can develop, using only realistic patterns of connectivity.
Affiliation(s)
- Ján Antolík
- Institute for Adaptive and Neural Computation, University of Edinburgh, Edinburgh, UK
- Department of Neuroscience, Physiology and Pharmacology, University College London, London, UK
- Unité de Neurosciences Information et Complexité, CNRS, Gif-sur-Yvette, France
- James A. Bednar
- Institute for Adaptive and Neural Computation, University of Edinburgh, Edinburgh, UK
23
Klampfl S, Maass W. A theoretical basis for emergent pattern discrimination in neural systems through slow feature extraction. Neural Comput 2010; 22:2979-3035. [PMID: 20858129 DOI: 10.1162/neco_a_00050] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/04/2022]
Abstract
Neurons in the brain are able to detect and discriminate salient spatiotemporal patterns in the firing activity of presynaptic neurons. It remains open how they can learn to achieve this, especially without the help of a supervisor. We show that a well-known unsupervised learning algorithm for linear neurons, slow feature analysis (SFA), is able to acquire the discrimination capability of one of the best algorithms for supervised linear discrimination learning, the Fisher linear discriminant (FLD), given suitable input statistics. We demonstrate the power of this principle by showing that it enables readout neurons from simulated cortical microcircuits to learn, without any supervision, to discriminate between spoken digits and to detect repeated firing patterns that are embedded into a stream of noise spike trains with the same firing statistics. Both these computer simulations and our theoretical analysis show that slow feature extraction enables neurons to extract and collect information that is spread out over a trajectory of firing states lasting several hundred ms. In addition, it enables neurons to learn, without supervision, to keep track of time (relative to a stimulus onset, or the initiation of a motor response). Hence, these results elucidate how the brain could compute with trajectories of firing states rather than only with fixed point attractors. They also provide a theoretical basis for understanding recent experimental results on the emergence of view- and position-invariant classification of visual objects in inferior temporal cortex.
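For linear neurons, the slow feature extraction the abstract refers to reduces to a generalized eigenproblem. The following is a minimal textbook-style sketch (not the paper's microcircuit model; the demo signals are illustrative): minimize the mean squared temporal difference of a projection subject to unit variance.

```python
import numpy as np
from scipy.linalg import eigh

def linear_sfa(X, n_components=1):
    """Linear slow feature analysis: find weights w minimizing the mean squared
    temporal difference of X @ w, subject to unit variance, by solving the
    generalized symmetric eigenproblem A w = lambda B w."""
    X = X - X.mean(axis=0)
    Xdot = np.diff(X, axis=0)
    A = Xdot.T @ Xdot / len(Xdot)   # covariance of temporal differences
    B = X.T @ X / len(X)            # covariance of the signal
    _, V = eigh(A, B)               # eigenvalues ascending: slowest first
    return V[:, :n_components]

# Demo: unmix a slow sinusoid from a fast one.
t = np.linspace(0.0, 8.0 * np.pi, 2000)
slow, fast = np.sin(t), np.sin(37.0 * t)
mixed = np.stack([slow + 0.5 * fast, slow - 0.5 * fast], axis=1)
recovered = mixed @ linear_sfa(mixed)   # ~ proportional to the slow source
```

The slowest projection is the eigenvector with the smallest generalized eigenvalue; here it recovers the slow source (up to sign and scale) from the mixture.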
Affiliation(s)
- Stefan Klampfl
- Institute for Theoretical Computer Science, Graz University of Technology, A-8010 Graz, Austria.
24
Continuous transformation learning of translation invariant representations. Exp Brain Res 2010; 204:255-70. [PMID: 20544186 DOI: 10.1007/s00221-010-2309-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 02/04/2009] [Accepted: 05/21/2010] [Indexed: 01/24/2023]
Abstract
We show that spatial continuity can enable a network to learn translation invariant representations of objects by self-organization in a hierarchical model of cortical processing in the ventral visual system. During 'continuous transformation learning', the active synapses from each overlapping transform are associatively modified onto the set of postsynaptic neurons. Because other transforms of the same object overlap with previously learned exemplars, a common set of postsynaptic neurons is activated by the new transforms, and learning of the new active inputs onto the same postsynaptic neurons is facilitated. We show that the transforms must be close for this to occur; that the temporal order of presentation of each transformed image during training is not crucial for learning to occur; that relatively large numbers of transforms can be learned; and that such continuous transformation learning can be usefully combined with temporal trace training.
25
Duff A, Verschure PF. Unifying perceptual and behavioral learning with a correlative subspace learning rule. Neurocomputing 2010. [DOI: 10.1016/j.neucom.2009.11.048] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/19/2022]
26
Michler F, Eckhorn R, Wachtler T. Using spatiotemporal correlations to learn topographic maps for invariant object recognition. J Neurophysiol 2009; 102:953-64. [PMID: 19494190 DOI: 10.1152/jn.90651.2008] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]
Abstract
The retinal image of visual objects can vary drastically with changes of viewing angle. Nevertheless, our visual system is capable of recognizing objects fairly invariant of viewing angle. Under natural viewing conditions, different views of the same object tend to occur in temporal proximity, thereby generating temporal correlations in the sequence of retinal images. Such spatial and temporal stimulus correlations can be exploited for learning invariant representations. We propose a biologically plausible mechanism that implements this learning strategy using the principle of self-organizing maps. We developed a network of spiking neurons that uses spatiotemporal correlations in the inputs to map different views of objects onto a topographic representation. After learning, different views of the same object are represented in a connected neighborhood of neurons. Model neurons of a higher processing area that receive unspecific input from a local neighborhood in the map show view-invariant selectivities for visual objects. The findings suggest a functional relevance of cortical topographic maps.
Affiliation(s)
- Frank Michler
- NeuroPhysics Group, Philipps-University Marburg, 35032 Marburg, Germany.
27
Einhäuser W, Schumann F, Bardins S, Bartl K, Böning G, Schneider E, König P. Human eye-head co-ordination in natural exploration. Network (Bristol, England) 2007; 18:267-297. [PMID: 17926195 DOI: 10.1080/09548980701671094] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Indexed: 05/25/2023]
Abstract
During natural behavior humans continuously adjust their gaze by moving head and eyes, yielding rich dynamics of the retinal input. Sensory coding models, however, typically assume visual input to be smooth or a sequence of static images interleaved by volitional gaze shifts. Are these assumptions valid during free exploration behavior in natural environments? We used an innovative technique to simultaneously record gaze and head movements in humans who freely explored various environments (forest, train station, apartment). Most movements occur along the cardinal axes, and the predominance of vertical or horizontal movements depends on the environment. Eye and head movements co-occur more frequently than their individual statistics predict under an independence assumption. The majority of co-occurring movements point in opposite directions, consistent with a gaze-stabilizing role of eye movements. Nevertheless, a substantial fraction of eye movements point in the same direction as co-occurring head movements. Even under the most conservative assumptions, saccadic eye movements alone cannot account for these synergistic movements. Hence nonsaccadic eye movements, which interact synergistically with head movements to adjust gaze, cannot be neglected in natural visual input. Natural retinal input is continuously dynamic and cannot be faithfully modeled as a mere sequence of static frames with interleaved large saccades.
28
Kulvicius T, Porr B, Wörgötter F. Development of receptive fields in a closed-loop behavioural system. Neurocomputing 2007. [DOI: 10.1016/j.neucom.2006.10.132] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/23/2022]
29
Martinez LM. The generation of receptive-field structure in cat primary visual cortex. Prog Brain Res 2007; 154:73-92. [PMID: 17010704 DOI: 10.1016/s0079-6123(06)54004-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 02/24/2023]
Abstract
Cells in primary visual cortex show a remarkable variety of receptive-field structures. In spite of the extensive experimental and theoretical effort over the past 50 years, it has been difficult to establish how this diversity of functional-response properties emerges in the cortex. One of the reasons is that while functional studies in the early visual pathway have been usually carried out in vivo with extracellular recording techniques, investigations about the precise structure of the cortical network have mainly been conducted in vitro. Thus, the link between structure and function has rarely been explicitly established, remaining a well-known controversial issue. In this chapter, I review recent data that simultaneously combines anatomy with physiology at the intracellular level; trying to understand how the primary visual cortex transforms the information it receives from the thalamus to generate receptive-field structure, contrast-invariant orientation tuning and other functional-response properties.
Affiliation(s)
- L M Martinez
- Departamento de Medicina, Facultade de Ciencias da Saude, Campus de Oza, Universidade da Coruña, 15006 La Coruña, Spain.
30
Learning Temporally Stable Representations from Natural Sounds: Temporal Stability as a General Objective Underlying Sensory Processing. Lect Notes Comput Sci 2007. [DOI: 10.1007/978-3-540-74695-9_14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]
31
Schwartz O, Sejnowski TJ, Dayan P. Soft mixer assignment in a hierarchical generative model of natural scene statistics. Neural Comput 2006; 18:2680-718. [PMID: 16999575 PMCID: PMC2915771 DOI: 10.1162/neco.2006.18.11.2680] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/04/2022]
Abstract
Gaussian scale mixture models offer a top-down description of signal generation that captures key bottom-up statistical characteristics of filter responses to images. However, the pattern of dependence among the filters for this class of models is prespecified. We propose a novel extension to the gaussian scale mixture model that learns the pattern of dependence from observed inputs and thereby induces a hierarchical representation of these inputs. Specifically, we propose that inputs are generated by gaussian variables (modeling local filter structure), multiplied by a mixer variable that is assigned probabilistically to each input from a set of possible mixers. We demonstrate inference of both components of the generative model, for synthesized data and for different classes of natural images, such as a generic ensemble and faces. For natural images, the mixer variable assignments show invariances resembling those of complex cells in visual cortex; the statistics of the gaussian components of the model are in accord with the outputs of divisive normalization models. We also show how our model helps interrelate a wide range of models of image statistics and cortical processing.
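The core generative step can be sketched in a few lines (an illustrative single-mixer sketch, not the paper's full soft-assignment model; names and parameters are assumptions): independent Gaussians multiplied by a shared positive mixer yield responses that are uncorrelated yet have correlated magnitudes, the signature dependence of filter outputs on natural images.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_gsm(n_samples, n_filters, sigma=0.5):
    """Draw Gaussian-scale-mixture responses: independent Gaussians
    (local filter structure) times one shared log-normal mixer per sample."""
    g = rng.standard_normal((n_samples, n_filters))
    v = rng.lognormal(mean=0.0, sigma=sigma, size=(n_samples, 1))
    return v * g

x = sample_gsm(100_000, 2)
r_raw = np.corrcoef(x[:, 0], x[:, 1])[0, 1]                  # near zero
r_mag = np.corrcoef(np.abs(x[:, 0]), np.abs(x[:, 1]))[0, 1]  # clearly positive
```

Dividing each response by an estimate of the shared mixer removes the magnitude dependence, which is the connection to divisive normalization noted in the abstract.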
Affiliation(s)
- Odelia Schwartz
- Howard Hughes Medical Institute, Computational Neurobiology Lab, Salk Institute for Biological Studies, La Jolla, CA 92037, U.S.A
- Terrence J. Sejnowski
- Howard Hughes Medical Institute, Computational Neurobiology Lab, Salk Institute for Biological Studies, La Jolla, CA 92037, and Department of Biology, University of California at San Diego, La Jolla, CA 92093, U.S.A
- Peter Dayan
- Gatsby Computational Neuroscience Unit, University College, London WC1N 3AR, U.K
32
Perry G, Rolls ET, Stringer SM. Spatial vs temporal continuity in view invariant visual object recognition learning. Vision Res 2006; 46:3994-4006. [PMID: 16996556 DOI: 10.1016/j.visres.2006.07.025] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 02/08/2006] [Revised: 06/22/2006] [Accepted: 07/24/2006] [Indexed: 11/29/2022]
Abstract
We show in a 4-layer competitive neuronal network that continuous transformation learning, which uses spatial correlations and a purely associative (Hebbian) synaptic modification rule, can build view invariant representations of complex 3D objects. This occurs even when views of the different objects are interleaved, a condition where temporal trace learning fails. Human psychophysical experiments showed that view invariant object learning can occur when spatial but not temporal continuity applies because of interleaving of stimuli, although sequential presentation, which produces temporal continuity, can facilitate learning. Thus continuous transformation learning is an important principle that may contribute to view invariant object recognition.
Affiliation(s)
- Gavin Perry
- Oxford University, Centre for Computational Neuroscience, Department of Experimental Psychology, Oxford, UK
33
Einhäuser W, Hipp J, Eggert J, Körner E, König P. Learning viewpoint invariant object representations using a temporal coherence principle. Biol Cybern 2005; 93:79-90. [PMID: 16021516 DOI: 10.1007/s00422-005-0585-8] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.4] [Received: 02/19/2004] [Accepted: 05/23/2005] [Indexed: 05/03/2023]
Abstract
Invariant object recognition is arguably one of the major challenges for contemporary machine vision systems. In contrast, the mammalian visual system performs this task virtually effortlessly. How can we exploit our knowledge of the biological system to improve artificial systems? Our understanding of the mammalian early visual system has been augmented by the discovery that general coding principles can explain many aspects of neuronal response properties. How can such schemes be transferred to system-level performance? In the present study we train cells on a particular variant of the general principle of temporal coherence, the "stability" objective. These cells are trained on unlabeled real-world images without a teaching signal. We show that after training, the cells form a representation that is largely independent of the viewpoint from which the stimulus is viewed. This finding includes generalization to previously unseen viewpoints. The achieved representation is better suited for viewpoint-invariant object classification than the cells' input patterns. This capacity to facilitate viewpoint-invariant classification is maintained even if training and classification take place in the presence of a distractor object, which is also unlabeled. In summary, we show that unsupervised learning using a general coding principle facilitates the classification of real-world objects that are not segmented from the background and undergo complex, non-isomorphic transformations.
Affiliation(s)
- Wolfgang Einhäuser
- Institute of Neuroinformatics, University & ETH Zürich, Zürich, Switzerland.
34
Hipp J, Einhäuser W, Conradt J, König P. Learning of somatosensory representations for texture discrimination using a temporal coherence principle. Network 2005; 16:223-38. [PMID: 16411497 DOI: 10.1080/09548980500361582] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/06/2023]
Abstract
In order to perform appropriate actions, animals need to quickly and reliably classify their sensory input. How can representations suitable for classification be acquired from statistical properties of the animal's natural environment? Akin to behavioural studies in rats, we investigate this question using texture discrimination by the vibrissae system as a model. To account for the rat's active sensing behaviour, we record whisker movements in a hardware model. Based on these signals, we determine the response of primary neurons, modelled as spatio-temporal filters. Using their output, we train a second layer of neurons to optimise a temporal coherence objective function. The performance in classifying textures using a single cell strongly correlates with the cell's temporal coherence; hence output cells outperform primary cells. Using a simple, unsupervised classifier, the performance on the output cell population is the same as when using a sophisticated supervised classifier on the primary cells. Our results demonstrate that the optimisation of temporal coherence yields a representation that facilitates subsequent classification by selectively conveying relevant information.
Affiliation(s)
- Joerg Hipp
- Institute of Neuroinformatics, University of Zürich & Swiss Federal Institute of Technology (ETH), Zürich, Switzerland.
35
Li M, Clark JJ. A Temporal Stability Approach to Position and Attention-Shift-Invariant Recognition. Neural Comput 2004; 16:2293-321. [PMID: 15476602 DOI: 10.1162/0899766041941907] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Incorporation of vision-related self-action signals can help neural networks learn invariance. We describe a method that can produce a network with invariance to changes in visual input caused by eye movements and covert attention shifts. Training of the network is controlled by signals associated with eye movements and covert attention shifting. A temporal perceptual stability constraint is used to drive the output of the network toward remaining constant across temporal sequences of saccadic motions and covert attention shifts. We use a four-layer neural network model to perform the position-invariant extraction of local features and temporal integration of invariant representations of local features in a bottom-up structure. We present results on both simulated data and real images to demonstrate that our network can acquire both position and attention-shift invariance.
Affiliation(s)
- Muhua Li
- Centre for Intelligent Machines, McGill University, Montréal, Québec, Canada H3A 2A7.
36
Körding KP, Kayser C, Einhäuser W, König P. How are complex cell properties adapted to the statistics of natural stimuli? J Neurophysiol 2004; 91:206-12. [PMID: 12904330 DOI: 10.1152/jn.00149.2003] [Citation(s) in RCA: 60] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
Sensory areas should be adapted to the properties of their natural stimuli. What are the underlying rules that match the properties of complex cells in primary visual cortex to their natural stimuli? To address this issue, we sampled movies from a camera carried by a freely moving cat, capturing the dynamics of image motion as the animal explores an outdoor environment. We use these movie sequences as input to simulated neurons. Following the intuition that many meaningful high-level variables, e.g., identities of visible objects, do not change rapidly in natural visual stimuli, we adapt the neurons to exhibit firing rates that are stable over time. We find that simulated neurons, which have optimally stable activity, display many properties that are observed for cortical complex cells. Their response is invariant with respect to stimulus translation and reversal of contrast polarity. Furthermore, spatial frequency selectivity and the aspect ratio of the receptive field quantitatively match the experimentally observed characteristics of complex cells. Hence, the population of complex cells in the primary visual cortex can be described as forming an optimally stable representation of natural stimuli.
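The temporal-stability idea shared by this and the preceding temporal-coherence entries can be illustrated with a minimal slowness score: mean squared frame-to-frame change of a unit's activity, normalized by its variance. This is a hypothetical sketch, not the authors' exact objective; the function name and normalization are assumptions:

```python
import numpy as np

def slowness(responses):
    """Temporal-stability score of a response trace: mean squared
    frame-to-frame change, normalized by the overall response variance.
    Lower values = more temporally stable ('slower') activity.
    """
    r = np.asarray(responses, dtype=float)
    return np.mean(np.diff(r) ** 2) / np.var(r)

# A slowly drifting response is far more stable than white noise:
t = np.linspace(0.0, 2.0 * np.pi, 200)
slow = np.sin(t)
fast = np.random.default_rng(0).standard_normal(200)
assert slowness(slow) < slowness(fast)
```

Optimizing receptive fields to minimize such a score on natural movie input is what drives the simulated neurons toward complex-cell-like invariances in the study above.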
Affiliation(s)
- Konrad P Körding
- Institute of Neurology, University College London, London, WC1N 3BG, United Kingdom.
37
Abstract
In the early 1960s, Hubel and Wiesel reported the first physiological description of cells in cat primary visual cortex. They distinguished two main cell types: simple cells and complex cells. Based on their distinct response properties, they suggested that the two cell types could represent two consecutive stages in receptive-field construction. Since the 1960s, new experimental and computational evidence provided serious alternatives to this hierarchical model. Parallel models put forward the idea that both simple and complex receptive fields could be built in parallel by direct geniculate inputs. Recurrent models suggested that simple cells and complex cells may not be different cell types after all. To this day, a consensus among hierarchical, parallel, and recurrent models has been difficult to attain; however, the circuitry used by all models is becoming increasingly similar. The authors review theoretical and experimental evidence for each line of models emphasizing their strengths and weaknesses.
Affiliation(s)
- Luis M. Martinez
- Neuroscience and Motor Control Group (Neurocom), Universidade de A Coruña, A Coruña, Spain
- Department of Medicine, Campus de Oza, Universidade de A Coruña, A Coruña, 15006, Spain
- Jose-Manuel Alonso
- Department of Psychology, University of Connecticut, Storrs, CT 06269, USA
- To whom correspondence should be addressed at: Department of Biological Sciences, SUNY-Optometry, New York, NY 10036. Phone: (212) 780-0523; Fax: (212) 780-5194
38
Abstract
An emerging paradigm analyses in what respect the properties of the nervous system reflect properties of natural scenes. It is hypothesized that neurons form sparse representations of natural stimuli: each neuron should respond strongly to some stimuli while being inactive upon presentation of most others. For a given network, sparse representations need fewest spikes, and thus the nervous system can consume the least energy. To obtain optimally sparse responses the receptive fields of simulated neurons are optimized. Algorithmically this is identical to searching for basis functions that allow coding for the stimuli with sparse coefficients. The problem is identical to maximizing the log likelihood of a generative model with prior knowledge of natural images. It is found that the resulting simulated neurons share most properties of simple cells found in primary visual cortex. Thus, forming optimally sparse representations is a very compact approach to describing simple cell properties. Many ways of defining sparse responses exist and it is widely believed that the particular choice of the sparse prior of the generative model does not significantly influence the estimated basis functions. Here we examine this assumption more closely. We include the constraint of unit variance of neuronal activity, used in most studies, into the objective functions. We then analyze learning on a database of natural (cat-cam) visual stimuli. We show that the effective objective functions are largely dominated by the constraint, and are therefore very similar. The resulting receptive fields show some similarities but also qualitative differences. Even for coefficient values for which the objective functions are dissimilar, the distributions of coefficients are similar and do not match the priors of the assumed generative model. In conclusion, the specific choice of the sparse prior is relevant, as is the choice of additional constraints, such as normalization of variance.
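The objective discussed above, reconstruction error of a generative model plus a sparse prior on the coefficients, together with the unit-variance constraint on neuronal activity, can be sketched as follows. The `log(1 + a^2)` prior and all names are illustrative assumptions, not the exact functions compared in the study:

```python
import numpy as np

def sparse_coding_objective(X, B, A, lam=0.1):
    """Reconstruction error plus a sparse prior on the coefficients.

    X : (n_pixels, n_patches) image patches
    B : (n_pixels, n_basis)   basis functions ('receptive fields')
    A : (n_basis, n_patches)  coefficients ('neural activities')
    """
    recon = np.sum((X - B @ A) ** 2)      # generative reconstruction error
    sparsity = np.sum(np.log1p(A ** 2))   # log(1 + a^2) sparse prior
    return recon + lam * sparsity

def unit_variance(A):
    """The constraint discussed above: unit variance of each unit's activity."""
    return A / A.std(axis=1, keepdims=True)

# Coefficients that actually generated the data score better than all-zeros:
rng = np.random.default_rng(1)
B = rng.standard_normal((16, 8))
A_true = rng.standard_normal((8, 5))
X = B @ A_true
assert sparse_coding_objective(X, B, A_true) < sparse_coding_objective(X, B, np.zeros_like(A_true))
```

The study's point is that once the variance constraint is imposed, the effective objectives for different sparse priors become very similar, even though the learned basis functions can still differ qualitatively.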
Affiliation(s)
- Konrad P Körding
- Institute of Neuroinformatics, University and ETH Zürich, Zürich, Switzerland.
39
Abstract
Learning in neural networks is usually applied to parameters related to linear kernels and keeps the nonlinearity of the model fixed. Thus, for successful models, properties and parameters of the nonlinearity have to be specified using a priori knowledge, which often is missing. Here, we investigate adapting the nonlinearity simultaneously with the linear kernel. We use natural visual stimuli for training a simple model of the visual system. Many of the neurons converge to an energy detector matching existing models of complex cells. The overall distribution of the parameter describing the nonlinearity well matches recent physiological results. Controls with randomly shuffled natural stimuli and pink noise demonstrate that the match of simulation and experimental results depends on the higher-order statistical properties of natural stimuli.
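The "energy detector" that the trained neurons converge to refers to the classical energy model of complex cells: two linear subunits in quadrature are squared and summed, which makes the response invariant to stimulus phase. A textbook sketch (the quadrature-pair filters here are illustrative assumptions, not the learned nonlinearity from the study above):

```python
import numpy as np

def energy_response(stimulus, even_filter, odd_filter):
    """Energy model: square and sum two quadrature linear subunits,
    giving a response invariant to the spatial phase of the stimulus."""
    q1 = np.dot(even_filter, stimulus)
    q2 = np.dot(odd_filter, stimulus)
    return q1 ** 2 + q2 ** 2

# Quadrature pair: cosine and sine filters at the same spatial frequency.
x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
even, odd = np.cos(2 * x), np.sin(2 * x)

# Shifting the grating's phase leaves the energy response unchanged:
r0 = energy_response(np.cos(2 * x), even, odd)
r1 = energy_response(np.cos(2 * x + 1.0), even, odd)
assert np.isclose(r0, r1)
```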
Affiliation(s)
- Christoph Kayser
- Institute of Neuroinformatics, University / ETH Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland.
40
McCandliss BD, Cohen L, Dehaene S. The visual word form area: expertise for reading in the fusiform gyrus. Trends Cogn Sci 2003; 7:293-299. [PMID: 12860187 DOI: 10.1016/s1364-6613(03)00134-7] [Citation(s) in RCA: 966] [Impact Index Per Article: 43.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Brain imaging studies reliably localize a region of visual cortex that is especially responsive to visual words. This brain specialization is essential to rapid reading ability because it enhances perception of words by becoming specifically tuned to recurring properties of a writing system. The origin of this specialization poses a challenge for evolutionary accounts involving innate mechanisms for functional brain organization. We propose an alternative account, based on studies of other forms of visual expertise (i.e. bird and car experts) that lead to functional reorganization. We argue that the interplay between the unique demands of word reading and the structural constraints of the visual system lead to the emergence of the Visual Word Form Area.
Affiliation(s)
- Bruce D. McCandliss
- Sackler Institute for Developmental Psychobiology, Weill Medical College of Cornell University, Box 140, 1300 York Avenue, 10021, New York, NY, USA
41
Wersing H, Körner E. Learning optimized features for hierarchical models of invariant object recognition. Neural Comput 2003; 15:1559-88. [PMID: 12816566 DOI: 10.1162/089976603321891800] [Citation(s) in RCA: 119] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
There is an ongoing debate over the capabilities of hierarchical neural feedforward architectures for performing real-world invariant object recognition. Although a variety of hierarchical models exists, appropriate supervised and unsupervised learning methods are still an issue of intense research. We propose a feedforward model for recognition that shares components like weight sharing, pooling stages, and competitive nonlinearities with earlier approaches but focuses on new methods for learning optimal feature-detecting cells in intermediate stages of the hierarchical network. We show that principles of sparse coding, which were previously mostly applied to the initial feature detection stages, can also be employed to obtain optimized intermediate complex features. We suggest a new approach to optimize the learning of sparse features under the constraints of a weight-sharing or convolutional architecture that uses pooling operations to achieve gradual invariance in the feature hierarchy. The approach explicitly enforces symmetry constraints like translation invariance on the feature set. This leads to a dimension reduction in the search space of optimal features and allows the basis representatives that achieve a sparse decomposition of the input to be determined more efficiently. We analyze the quality of the learned feature representation by investigating the recognition performance of the resulting hierarchical network on object and face databases. We show that a hierarchy with features learned on a single object data set can also be applied to face recognition without parameter changes and is competitive with other recent machine learning recognition approaches. To investigate the effect of the interplay between sparse coding and processing nonlinearities, we also consider alternative feedforward pooling nonlinearities such as presynaptic maximum selection and sum-of-squares integration. The comparison shows that a combination of strong competitive nonlinearities with sparse coding offers the best recognition performance in the difficult scenario of segmentation-free recognition in cluttered surround. We demonstrate that for both learning and recognition, a precise segmentation of the objects is not necessary.
Affiliation(s)
- Heiko Wersing
- HONDA Research Institute Europe GmbH, 63073 Offenbach/Main, Germany.
42
Wiemer JC. The time-organized map algorithm: extending the self-organizing map to spatiotemporal signals. Neural Comput 2003; 15:1143-71. [PMID: 12803960 DOI: 10.1162/089976603765202695] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
The new time-organized map (TOM) is presented for a better understanding of the self-organization and geometric structure of cortical signal representations. The algorithm extends the common self-organizing map (SOM) from the processing of purely spatial signals to the processing of spatiotemporal signals. The main additional idea of the TOM compared with the SOM is the functionally reasonable transfer of temporal signal distances into spatial signal distances in topographic neural representations. This is achieved by neural dynamics of propagating waves, allowing current and former signals to interact spatiotemporally in the neural network. Within a biologically plausible framework, the TOM algorithm (1) reveals how dynamic neural networks can self-organize to embed spatial signals in temporal context in order to realize functionally meaningful invariances, (2) predicts time-organized representational structures in cortical areas representing signals with systematic temporal relation, and (3) suggests that the strength with which signals interact in the cortex determines the type of signal topology realized in topographic maps (e.g., spatially or temporally defined signal topology). Moreover, the TOM algorithm supports the explanation of topographic reorganizations based on time-to-space transformations (Wiemer, Spengler, Joublin, Stagge, & Wacquant, 2000).
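For reference, the standard SOM update step that the TOM extends can be sketched as follows; the parameter names and Gaussian neighborhood are conventional choices, not specifics of the TOM paper:

```python
import numpy as np

def som_update(weights, x, grid, lr=0.1, sigma=1.0):
    """One learning step of the standard self-organizing map (SOM).

    weights : (n_units, dim) prototype vectors of the map units
    x       : (dim,)         current input signal
    grid    : (n_units, 2)   fixed positions of the units on the map
    """
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
    d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)          # squared map distance to BMU
    h = np.exp(-d2 / (2.0 * sigma ** 2))                  # Gaussian neighborhood
    return weights + lr * h[:, None] * (x - weights)      # pull units toward x

# 3x3 map: after one step every unit has moved (weakly) toward the input.
grid = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)
rng = np.random.default_rng(2)
W = rng.standard_normal((9, 4))
x = np.ones(4)
W_new = som_update(W, x, grid)
assert np.all(np.linalg.norm(W_new - x, axis=1) < np.linalg.norm(W - x, axis=1))
```

The TOM's addition is that temporally adjacent signals also interact via propagating waves on the map, so temporal signal distances are transferred into spatial map distances; reproducing that dynamics faithfully would go beyond this sketch.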
Affiliation(s)
- Jan C Wiemer
- Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany.