1
Qiu Y, Klindt DA, Szatko KP, Gonschorek D, Hoefling L, Schubert T, Busse L, Bethge M, Euler T. Efficient coding of natural scenes improves neural system identification. PLoS Comput Biol 2023; 19:e1011037. [PMID: 37093861; PMCID: PMC10159360; DOI: 10.1371/journal.pcbi.1011037]
Abstract
Neural system identification aims to learn the response function of neurons to arbitrary stimuli from experimentally recorded data, but it typically does not leverage normative principles such as efficient coding of natural environments. Visual systems, however, have evolved to process input from the natural environment efficiently. Here, we present a normative network regularization for system identification models that incorporates, as a regularizer, the efficient coding hypothesis, which states that the response properties of sensory representations are strongly shaped by the need to preserve most of the stimulus information with limited resources. Using this approach, we explored whether a system identification model can be improved by sharing its convolutional filters with those of an autoencoder that aims to encode natural stimuli efficiently. To this end, we built a hybrid model to predict the responses of retinal neurons to noise stimuli. This approach not only yielded higher performance than the "stand-alone" system identification model, it also produced more biologically plausible filters, meaning that they more closely resembled neural representations in early visual systems. These results held for retinal responses to different artificial stimuli and across model architectures. Moreover, our normatively regularized model performed particularly well in predicting the responses of direction-of-motion-sensitive retinal neurons. The benefit of natural scene statistics became marginal, however, when predicting responses to natural movies. In summary, our results indicate that efficiently encoding environmental inputs can improve system identification models, at least for noise stimuli, and point to the benefit of probing the visual system with naturalistic stimuli.
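The shared-filter idea in this abstract can be sketched as a joint objective: one branch predicts recorded responses through a filter bank, while an autoencoder branch reuses the same filters to reconstruct natural images. Below is a minimal linear-nonlinear sketch with illustrative names and shapes, not the authors' convolutional implementation:

```python
import numpy as np

def hybrid_losses(W, readout, stimuli, responses, natural_images, alpha=1.0):
    """Joint objective of a normatively regularized encoding model (sketch).

    W              : shared filter bank (pixels x features), used by both branches
    readout        : per-neuron readout weights (features x neurons)
    stimuli        : stimuli shown in the experiment (trials x pixels)
    responses      : recorded neural responses (trials x neurons)
    natural_images : unlabeled natural inputs for the autoencoder branch
    """
    # system-identification branch: predict recorded responses
    pred = np.maximum(stimuli @ W, 0.0) @ readout        # ReLU features -> readout
    sysid_loss = np.mean((pred - responses) ** 2)
    # efficient-coding branch: reconstruct natural inputs with the same filters
    code = np.maximum(natural_images @ W, 0.0)
    recon = code @ W.T                                   # tied-weight decoder
    ae_loss = np.mean((recon - natural_images) ** 2)
    return sysid_loss + alpha * ae_loss                  # alpha weighs the regularizer
```

In the paper both branches are convolutional and trained jointly; here `alpha` plays the role of the regularization strength that trades off response prediction against efficient coding of natural input.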
Affiliation(s)
- Yongrong Qiu
  - Institute for Ophthalmic Research, U Tübingen, Tübingen, Germany
  - Centre for Integrative Neuroscience (CIN), U Tübingen, Tübingen, Germany
  - Graduate Training Centre of Neuroscience (GTC), International Max Planck Research School, U Tübingen, Tübingen, Germany
- David A Klindt
  - Institute for Ophthalmic Research, U Tübingen, Tübingen, Germany
  - Centre for Integrative Neuroscience (CIN), U Tübingen, Tübingen, Germany
  - Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Klaudia P Szatko
  - Institute for Ophthalmic Research, U Tübingen, Tübingen, Germany
  - Centre for Integrative Neuroscience (CIN), U Tübingen, Tübingen, Germany
  - Graduate Training Centre of Neuroscience (GTC), International Max Planck Research School, U Tübingen, Tübingen, Germany
  - Bernstein Center for Computational Neuroscience, Tübingen, Germany
- Dominic Gonschorek
  - Institute for Ophthalmic Research, U Tübingen, Tübingen, Germany
  - Centre for Integrative Neuroscience (CIN), U Tübingen, Tübingen, Germany
  - Research Training Group 2381, U Tübingen, Tübingen, Germany
- Larissa Hoefling
  - Institute for Ophthalmic Research, U Tübingen, Tübingen, Germany
  - Centre for Integrative Neuroscience (CIN), U Tübingen, Tübingen, Germany
  - Bernstein Center for Computational Neuroscience, Tübingen, Germany
- Timm Schubert
  - Institute for Ophthalmic Research, U Tübingen, Tübingen, Germany
  - Centre for Integrative Neuroscience (CIN), U Tübingen, Tübingen, Germany
- Laura Busse
  - Division of Neurobiology, Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
  - Bernstein Center for Computational Neuroscience, Planegg-Martinsried, Germany
- Matthias Bethge
  - Centre for Integrative Neuroscience (CIN), U Tübingen, Tübingen, Germany
  - Bernstein Center for Computational Neuroscience, Tübingen, Germany
  - Institute for Theoretical Physics, U Tübingen, Tübingen, Germany
- Thomas Euler
  - Institute for Ophthalmic Research, U Tübingen, Tübingen, Germany
  - Centre for Integrative Neuroscience (CIN), U Tübingen, Tübingen, Germany
  - Bernstein Center for Computational Neuroscience, Tübingen, Germany
2
Statistical analysis and optimality of neural systems. Neuron 2021; 109:1227-1241.e5. [DOI: 10.1016/j.neuron.2021.01.020]
3
Loxley PN. A sparse code increases the speed and efficiency of neuro-dynamic programming for optimal control tasks with correlated inputs. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.10.069]
4
Exploitation of image statistics with sparse coding in the case of stereo vision. Neural Netw 2020; 135:158-176. [PMID: 33388507; DOI: 10.1016/j.neunet.2020.12.016]
Abstract
The sparse coding algorithm has served as a model for early processing in mammalian vision. It has been assumed that the brain uses sparse coding to exploit statistical properties of the sensory stream. We hypothesize that sparse coding discovers patterns from the data set, which can be used to estimate a set of stimulus parameters by simple readout. In this study, we chose a model of stereo vision to test our hypothesis. We used the Locally Competitive Algorithm (LCA), followed by a naïve Bayes classifier, to infer stereo disparity. From the results we report three observations. First, disparity inference was successful with this naturalistic processing pipeline. Second, an expanded, highly redundant representation is required to robustly identify the input patterns. Third, the inference error can be predicted from the number of active coefficients in the LCA representation. We conclude that sparse coding can generate a suitable general representation for subsequent inference tasks.
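The LCA dynamics referenced in this abstract can be sketched in a few lines. This is a generic soft-threshold LCA (after Rozell et al.), not the authors' stereo pipeline:

```python
import numpy as np

def lca(x, D, lam=0.1, n_iter=200, dt=0.1):
    """Locally Competitive Algorithm for sparse coding.

    Membrane potentials u are driven toward the feedforward input D.T @ x
    while active units inhibit each other through their filter overlaps;
    soft-thresholding u gives sparse coefficients a such that D @ a
    approximately reconstructs x.
    """
    G = D.T @ D - np.eye(D.shape[1])       # lateral inhibition (self-term removed)
    b = D.T @ x                            # feedforward drive
    u = np.zeros(D.shape[1])               # membrane potentials
    soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
    for _ in range(n_iter):
        a = soft(u)
        u += dt * (b - u - G @ a)          # leaky competitive dynamics
    return soft(u)
```

The number of nonzero entries in the returned coefficient vector is the "number of active coefficients" that the abstract reports as a predictor of inference error.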
5
Paiton DM, Frye CG, Lundquist SY, Bowen JD, Zarcone R, Olshausen BA. Selectivity and robustness of sparse coding networks. J Vis 2020; 20:10. [PMID: 33237290; PMCID: PMC7691792; DOI: 10.1167/jov.20.12.10]
Abstract
We investigate how the population nonlinearities resulting from lateral inhibition and thresholding in sparse coding networks influence neural response selectivity and robustness. We show that when compared to pointwise nonlinear models, such population nonlinearities improve the selectivity to a preferred stimulus and protect against adversarial perturbations of the input. These findings are predicted from the geometry of the single-neuron iso-response surface, which provides new insight into the relationship between selectivity and adversarial robustness. Inhibitory lateral connections curve the iso-response surface outward in the direction of selectivity. Since adversarial perturbations are orthogonal to the iso-response surface, adversarial attacks tend to be aligned with directions of selectivity. Consequently, the network is less easily fooled by perceptually irrelevant perturbations to the input. Together, these findings point to benefits of integrating computational principles found in biological vision systems into artificial neural networks.
Affiliation(s)
- Dylan M Paiton
- Vision Science Graduate Group, University of California Berkeley, Berkeley, CA, USA.,Redwood Center for Theoretical Neuroscience, University of California Berkeley, Berkeley, CA, USA.,
| | - Charles G Frye
- Redwood Center for Theoretical Neuroscience, University of California Berkeley, Berkeley, CA, USA.,Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA, USA.,
| | - Sheng Y Lundquist
- Department of Computer Science, Portland State University, Portland, OR, USA.,
| | - Joel D Bowen
- Vision Science Graduate Group, University of California Berkeley, Berkeley, CA, USA.,
| | - Ryan Zarcone
- Redwood Center for Theoretical Neuroscience, University of California Berkeley, Berkeley, CA, USA.,Biophysics, University of California Berkeley, Berkeley, CA, USA.,
| | - Bruno A Olshausen
- Vision Science Graduate Group, University of California Berkeley, Berkeley, CA, USA.,Redwood Center for Theoretical Neuroscience, University of California Berkeley, Berkeley, CA, USA.,Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA, USA.,
| |
Collapse
|
6
Luo YL, Wang YY, Zhu SF, Zhao L, Yin YL, Geng MW, Lei CQ, Yang YH, Li JF, Ni GX. An EZ-Diffusion Model Analysis of Attentional Ability in Patients With Retinitis Pigmentosa. Front Neurosci 2020; 14:583493. [PMID: 33505235; PMCID: PMC7829550; DOI: 10.3389/fnins.2020.583493]
Abstract
Retinitis pigmentosa (RP) is characterized by decreased visual acuity and visual field loss. However, the impact of visual field loss on the cognitive performance of RP patients remains unknown. In the present study, to understand whether and how RP affects spatial processing and attentional function, one spatial processing task and three attentional tasks were administered to RP patients and healthy controls. In addition, an EZ-diffusion model was used for further analysis, yielding four parameters: mean decision time, non-decision time, drift rate, and boundary separation. In the spatial processing task, compared with the control group, the RP group exhibited slower responses at large and medium visual eccentricities and a slower drift rate for the large stimulus, consistent with the significant linear correlation between visual field eccentricity and both reaction time (p = 0.047) and non-decision time (p = 0.043) in RP patients. In the attentional orienting and attentional switching tasks, RP patients showed reduced speed and increased non-decision time in every condition, along with a decreased drift rate in the orienting task and decreased boundary separation in the switching task. In addition, a switching cost for the large stimulus was observed in the control group but not in the RP group. The stop-signal task demonstrated similar inhibition function in the two groups. These findings imply that RP impairs spatial cognition in a manner correlated with visual field eccentricity, mainly in the peripheral visual field. Moreover, specific to the peripheral visual field, RP patients showed deficits in attentional orienting and flexibility but not in attentional inhibition.
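The EZ-diffusion model used here has a closed form (Wagenmakers et al., 2007): drift rate, boundary separation, and non-decision time follow directly from accuracy, RT variance, and mean RT. A sketch with the conventional scaling parameter s = 0.1; the input values in the usage below are illustrative, not the study's data:

```python
import math

def ez_diffusion(pc, vrt, mrt, s=0.1):
    """Closed-form EZ-diffusion estimates from proportion correct (pc),
    RT variance (vrt, in s^2), and mean RT (mrt, in s).

    Returns drift rate v, boundary separation a, non-decision time ter.
    Assumes pc > 0.5 (above-chance, edge-corrected accuracy).
    """
    if pc <= 0.5 or pc >= 1.0:
        raise ValueError("pc must lie strictly between 0.5 and 1")
    L = math.log(pc / (1.0 - pc))                    # logit of accuracy
    x = L * (L * pc**2 - L * pc + pc - 0.5) / vrt
    v = s * x**0.25                                  # drift rate
    a = s**2 * L / v                                 # boundary separation
    y = -v * a / s**2
    mdt = (a / (2.0 * v)) * (1.0 - math.exp(y)) / (1.0 + math.exp(y))
    return v, a, mrt - mdt                           # ter = mean RT minus decision time
```

Slower responses with unchanged accuracy thus show up as increased non-decision time, which is how the abstract's parameter-level statements map onto the raw behavioral measures.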
Affiliation(s)
- Yan-Lin Luo
  - Department of Neurobiology, Capital Medical University, Beijing, China
- Yuan-Ying Wang
  - Department of Neurobiology, Capital Medical University, Beijing, China
- Su-Fang Zhu
  - Second Hospital of Armed Police Beijing Office, Beijing, China
- Li Zhao
  - Department of Neurobiology, Capital Medical University, Beijing, China
- Yan-Ling Yin
  - Department of Neurobiology, Capital Medical University, Beijing, China
- Meng-Wen Geng
  - Department of Neurobiology, Capital Medical University, Beijing, China
- Chu-Qi Lei
  - Department of Neurobiology, Capital Medical University, Beijing, China
- Yan-Hui Yang
  - Xuanwu Hospital, Capital Medical University, Beijing, China
- Jun-Fa Li
  - Department of Neurobiology, Capital Medical University, Beijing, China
- Guo-Xin Ni (correspondence)
  - School of Sports Medicine and Rehabilitation, Beijing Sport University, Beijing, China
7
Dodds EM, DeWeese MR. On the Sparse Structure of Natural Sounds and Natural Images: Similarities, Differences, and Implications for Neural Coding. Front Comput Neurosci 2019; 13:39. [PMID: 31293408; PMCID: PMC6606779; DOI: 10.3389/fncom.2019.00039]
Abstract
Sparse coding models of natural images and sounds have been able to predict several response properties of neurons in the visual and auditory systems. While the success of these models suggests that the structure they capture is universal across domains to some degree, it is not yet clear which aspects of this structure are universal and which vary across sensory modalities. To address this, we fit complete and highly overcomplete sparse coding models to natural images and spectrograms of speech and report on differences in the statistics learned by these models. We find several types of sparse features in natural images, which all appear in similar, approximately Laplace distributions, whereas the many types of sparse features in speech exhibit a broad range of sparse distributions, many of which are highly asymmetric. Moreover, individual sparse coding units tend to exhibit higher lifetime sparseness for overcomplete models trained on images compared to those trained on speech. Conversely, population sparseness tends to be greater for these networks trained on speech compared with sparse coding models of natural images. To illustrate the relevance of these findings to neural coding, we studied how they impact a biologically plausible sparse coding network's representations in each sensory modality. In particular, a sparse coding network with synaptically local plasticity rules learns different sparse features from speech data than are found by more conventional sparse coding algorithms, but the learned features are qualitatively the same for these models when trained on natural images.
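Lifetime and population sparseness, as contrasted above, can both be computed with the same standard Treves-Rolls measure, applied across stimuli for one unit or across units for one stimulus. A minimal version:

```python
import numpy as np

def treves_rolls_sparseness(values):
    """Treves-Rolls sparseness of a nonnegative response vector.

    Returns ~0 for a uniform response and 1 for a one-hot response.
    Applied across stimuli for a single unit this is lifetime
    sparseness; across units for a single stimulus, population
    sparseness.
    """
    v = np.abs(np.asarray(values, dtype=float))
    n = v.size
    a = (v.sum() / n) ** 2 / np.mean(v ** 2)   # Treves-Rolls activity ratio
    return (1.0 - a) / (1.0 - 1.0 / n)
```

The abstract's finding is that these two numbers trade off differently for image-trained versus speech-trained overcomplete models.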
Affiliation(s)
- Eric McVoy Dodds
  - Redwood Center for Theoretical Neuroscience, University of California, Berkeley, Berkeley, CA, United States
  - Department of Physics, University of California, Berkeley, Berkeley, CA, United States
- Michael Robert DeWeese
  - Redwood Center for Theoretical Neuroscience, University of California, Berkeley, Berkeley, CA, United States
  - Department of Physics, University of California, Berkeley, Berkeley, CA, United States
  - Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, United States
8
Cadena SA, Denfield GH, Walker EY, Gatys LA, Tolias AS, Bethge M, Ecker AS. Deep convolutional models improve predictions of macaque V1 responses to natural images. PLoS Comput Biol 2019; 15:e1006897. [PMID: 31013278; PMCID: PMC6499433; DOI: 10.1371/journal.pcbi.1006897]
Abstract
Despite great efforts over several decades, our best models of primary visual cortex (V1) still predict spiking activity quite poorly when probed with natural stimuli, highlighting our limited understanding of the nonlinear computations in V1. Recently, two approaches based on deep learning have emerged for modeling these nonlinear computations: transfer learning from artificial neural networks trained on object recognition, and data-driven convolutional neural network models trained end-to-end on large populations of neurons. Here, we test the ability of both approaches to predict spiking activity in response to natural images in V1 of awake monkeys. We found that the transfer learning approach performed similarly well to the data-driven approach, and both outperformed classical linear-nonlinear and wavelet-based feature representations that build on existing theories of V1. Notably, transfer learning using a pre-trained feature space required substantially less experimental time to achieve the same performance. In conclusion, multi-layer convolutional neural networks (CNNs) set the new state of the art for predicting neural responses to natural images in primate V1, and deep features learned for object recognition are better explanations for V1 computation than all previous filter bank theories. This finding strengthens the necessity of V1 models that are multiple nonlinearities away from the image domain and supports the idea of explaining early visual cortex based on high-level functional goals.

Author summary: Predicting the responses of sensory neurons to arbitrary natural stimuli is of major importance for understanding their function. Arguably the most studied cortical area is primary visual cortex (V1), where many models have been developed to explain its function. However, the most successful models built on neurophysiologists' intuitions still fail to account for spiking responses to natural images. Here, we model spiking activity in primary visual cortex (V1) of monkeys using deep convolutional neural networks (CNNs), which have been successful in computer vision. We both trained CNNs directly to fit the data and used CNNs trained to solve a high-level task (object categorization). With these approaches, we are able to outperform previous models and improve the state of the art in predicting the responses of early visual neurons to natural images. Our results have two important implications. First, since V1 is the result of several nonlinear stages, it should be modeled as such. Second, functional models of entire visual pathways, of which V1 is an early stage, not only account for higher areas of such pathways but also provide useful representations for V1 predictions.
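At its core, the transfer-learning approach fits a regularized linear readout from a fixed, pretrained feature space to recorded responses. A ridge-regression sketch, where the feature matrix stands in for the activations of a task-trained CNN layer (names and shapes are illustrative):

```python
import numpy as np

def fit_readout(features, responses, lam=1.0):
    """Ridge readout: W = (F'F + lam*I)^(-1) F'R.

    features  : stimuli x features (activations of a fixed network layer)
    responses : stimuli x neurons (recorded spike counts or rates)
    lam       : ridge penalty controlling readout smoothness
    """
    F = np.asarray(features, dtype=float)
    R = np.asarray(responses, dtype=float)
    n = F.shape[1]
    return np.linalg.solve(F.T @ F + lam * np.eye(n), F.T @ R)

# predictions for held-out stimuli: F_test @ W
```

Because only the readout is estimated from neural data, far fewer recorded trials are needed than for training a network end-to-end, which is the experimental-time advantage the abstract reports.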
Affiliation(s)
- Santiago A. Cadena
  - Centre for Integrative Neuroscience and Institute for Theoretical Physics, University of Tübingen, Tübingen, Germany
  - Bernstein Center for Computational Neuroscience, Tübingen, Germany
  - Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
- George H. Denfield
  - Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
  - Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States of America
- Edgar Y. Walker
  - Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
  - Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States of America
- Leon A. Gatys
  - Centre for Integrative Neuroscience and Institute for Theoretical Physics, University of Tübingen, Tübingen, Germany
  - Bernstein Center for Computational Neuroscience, Tübingen, Germany
- Andreas S. Tolias
  - Bernstein Center for Computational Neuroscience, Tübingen, Germany
  - Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
  - Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States of America
  - Department of Electrical and Computer Engineering, Rice University, Houston, Texas, United States of America
- Matthias Bethge
  - Centre for Integrative Neuroscience and Institute for Theoretical Physics, University of Tübingen, Tübingen, Germany
  - Bernstein Center for Computational Neuroscience, Tübingen, Germany
  - Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
  - Max Planck Institute for Biological Cybernetics, Tübingen, Germany
- Alexander S. Ecker
  - Centre for Integrative Neuroscience and Institute for Theoretical Physics, University of Tübingen, Tübingen, Germany
  - Bernstein Center for Computational Neuroscience, Tübingen, Germany
  - Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
9
Sanchez-Giraldo LG, Laskar MNU, Schwartz O. Normalization and pooling in hierarchical models of natural images. Curr Opin Neurobiol 2019; 55:65-72. [PMID: 30785005; DOI: 10.1016/j.conb.2019.01.008]
Abstract
Divisive normalization and subunit pooling are two canonical classes of computation that have become widely used in descriptive (what) models of visual cortical processing. Normative (why) models from natural image statistics can help constrain the form and parameters of such classes of models. We focus on recent advances in two particular directions, namely deriving richer forms of divisive normalization, and advances in learning pooling from image statistics. We discuss the incorporation of such components into hierarchical models. We consider both hierarchical unsupervised learning from image statistics, and discriminative supervised learning in deep convolutional neural networks (CNNs). We further discuss studies on the utility and extensions of the convolutional architecture, which has also been adopted by recent descriptive models. We review the recent literature and discuss the current promises and gaps of using such approaches to gain a better understanding of how cortical neurons represent and process complex visual stimuli.
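The canonical divisive normalization computation discussed above has a compact form: each unit's driven response is divided by a weighted sum of squared responses from its normalization pool. A minimal version (the particular exponent and pool weights vary across the models reviewed):

```python
import numpy as np

def divisive_normalization(drive, pool_weights=None, sigma=1.0):
    """Canonical divisive normalization:
    r_i = x_i^2 / (sigma^2 + sum_j w_ij * x_j^2)

    drive        : linear filter responses x (1-D array)
    pool_weights : w_ij, weights of the normalization pool (default: uniform)
    sigma        : semi-saturation constant
    """
    x = np.asarray(drive, dtype=float)
    if pool_weights is None:
        pool_weights = np.ones((x.size, x.size))
    return x**2 / (sigma**2 + pool_weights @ x**2)
```

The "richer forms" of normalization the review discusses amount to learning `pool_weights` (and stimulus dependence thereof) from natural image statistics rather than fixing them by hand.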
Affiliation(s)
- Luis G Sanchez-Giraldo
  - Computational Neuroscience Lab, Dept. of Computer Science, University of Miami, FL 33146, United States
- Md Nasir Uddin Laskar
  - Computational Neuroscience Lab, Dept. of Computer Science, University of Miami, FL 33146, United States
- Odelia Schwartz
  - Computational Neuroscience Lab, Dept. of Computer Science, University of Miami, FL 33146, United States
10
Turner MH, Sanchez Giraldo LG, Schwartz O, Rieke F. Stimulus- and goal-oriented frameworks for understanding natural vision. Nat Neurosci 2019; 22:15-24. [PMID: 30531846; PMCID: PMC8378293; DOI: 10.1038/s41593-018-0284-0]
Abstract
Our knowledge of sensory processing has advanced dramatically in the last few decades, but this understanding remains far from complete, especially for stimuli with the large dynamic range and strong temporal and spatial correlations characteristic of natural visual inputs. Here we describe some of the issues that make understanding the encoding of natural images a challenge. We highlight two broad strategies for approaching this problem: a stimulus-oriented framework and a goal-oriented one. Different contexts can call for one framework or the other. Looking forward, recent advances, particularly those based in machine learning, show promise in borrowing key strengths of both frameworks and, by doing so, illuminating a path to a more comprehensive understanding of the encoding of natural stimuli.
Affiliation(s)
- Maxwell H Turner
  - Department of Physiology and Biophysics, University of Washington, Seattle, WA, USA
  - Graduate Program in Neuroscience, University of Washington, Seattle, WA, USA
- Odelia Schwartz
  - Department of Computer Science, University of Miami, Coral Gables, FL, USA
- Fred Rieke
  - Department of Physiology and Biophysics, University of Washington, Seattle, WA, USA
11
Loxley PN. The Two-Dimensional Gabor Function Adapted to Natural Image Statistics: A Model of Simple-Cell Receptive Fields and Sparse Structure in Images. Neural Comput 2017; 29:2769-2799. [PMID: 28777727; DOI: 10.1162/neco_a_00997]
Abstract
The two-dimensional Gabor function is adapted to natural image statistics, leading to a tractable probabilistic generative model that can be used to model simple cell receptive field profiles, or generate basis functions for sparse coding applications. Learning is found to be most pronounced in three Gabor function parameters representing the size and spatial frequency of the two-dimensional Gabor function and characterized by a nonuniform probability distribution with heavy tails. All three parameters are found to be strongly correlated, resulting in a basis of multiscale Gabor functions with similar aspect ratios and size-dependent spatial frequencies. A key finding is that the distribution of receptive-field sizes is scale invariant over a wide range of values, so there is no characteristic receptive field size selected by natural image statistics. The Gabor function aspect ratio is found to be approximately conserved by the learning rules and is therefore not well determined by natural image statistics. This allows for three distinct solutions: a basis of Gabor functions with sharp orientation resolution at the expense of spatial-frequency resolution, a basis of Gabor functions with sharp spatial-frequency resolution at the expense of orientation resolution, or a basis with unit aspect ratio. Arbitrary mixtures of all three cases are also possible. Two parameters controlling the shape of the marginal distributions in a probabilistic generative model fully account for all three solutions. The best-performing probabilistic generative model for sparse coding applications is found to be a gaussian copula with Pareto marginal probability density functions.
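The two-dimensional Gabor function at the center of this model is a sinusoidal carrier under an elongated Gaussian envelope. A sketch of the standard parameterization (parameter names are illustrative; the paper additionally learns distributions over these parameters):

```python
import numpy as np

def gabor_2d(size, wavelength, theta, sigma_x, sigma_y, phase=0.0):
    """Sample a 2D Gabor function on a (size x size) grid.

    A cosine grating of the given wavelength and orientation theta,
    windowed by a Gaussian with widths sigma_x (along the carrier)
    and sigma_y (across it): the standard parametric model of V1
    simple-cell receptive fields. The aspect ratio is sigma_y/sigma_x.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # rotate coordinates into the Gabor's own frame
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * ((xr / sigma_x) ** 2 + (yr / sigma_y) ** 2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength + phase)
    return envelope * carrier
```

The three parameters the abstract singles out as most strongly adapted correspond here to `sigma_x`, `sigma_y` (size), and `wavelength` (spatial frequency).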
Affiliation(s)
- P N Loxley
- School of Science and Technology, University of New England, Armidale 2351, NSW, Australia
12
Khaligh-Razavi SM, Henriksson L, Kay K, Kriegeskorte N. Fixed versus mixed RSA: Explaining visual representations by fixed and mixed feature sets from shallow and deep computational models. J Math Psychol 2017; 76:184-197. [PMID: 28298702; PMCID: PMC5341758; DOI: 10.1016/j.jmp.2016.10.007]
Abstract
Studies of the primate visual system have begun to test a wide range of complex computational object-vision models. Realistic models have many parameters, which in practice cannot be fitted using the limited amounts of brain-activity data typically available. Task performance optimization (e.g. using backpropagation to train neural networks) provides major constraints for fitting parameters and discovering nonlinear representational features appropriate for the task (e.g. object classification). Model representations can be compared to brain representations in terms of the representational dissimilarities they predict for an image set. This method, called representational similarity analysis (RSA), enables us to test the representational feature space as is (fixed RSA) or to fit a linear transformation that mixes the nonlinear model features so as to best explain a cortical area's representational space (mixed RSA). Like voxel/population-receptive-field modelling, mixed RSA uses a training set (different stimuli) to fit one weight per model feature and response channel (voxels here), so as to best predict the response profile across images for each response channel. We analysed response patterns elicited by natural images, which were measured with functional magnetic resonance imaging (fMRI). We found that early visual areas were best accounted for by shallow models, such as a Gabor wavelet pyramid (GWP). The GWP model performed similarly with and without mixing, suggesting that the original features already approximated the representational space, obviating the need for mixing. However, a higher ventral-stream visual representation (lateral occipital region) was best explained by the higher layers of a deep convolutional network, and mixing of its feature set was essential for this model to explain the representation. We suspect that mixing was essential because the convolutional network had been trained to discriminate a set of 1000 categories, whose frequencies in the training set did not match their frequencies in natural experience or their behavioural importance. The latter factors might determine the representational prominence of semantic dimensions in higher-level ventral-stream areas. Our results demonstrate the benefits of testing both the specific representational hypothesis expressed by a model's original feature space and the hypothesis space generated by linear transformations of that feature space.
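Fixed RSA, as described in the abstract, reduces to comparing two representational dissimilarity matrices (RDMs). A minimal sketch using correlation distance and Pearson agreement between the RDM upper triangles; the published analyses use rank correlations and additional inference machinery:

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between the response patterns (rows) for each pair of stimuli."""
    return 1.0 - np.corrcoef(patterns)

def fixed_rsa_score(model_patterns, brain_patterns):
    """Fixed RSA: agreement between the model RDM and the brain RDM,
    computed over the off-diagonal upper triangle only."""
    m, b = rdm(model_patterns), rdm(brain_patterns)
    iu = np.triu_indices_from(m, k=1)
    return np.corrcoef(m[iu], b[iu])[0, 1]
```

Mixed RSA differs only in that `model_patterns` is first passed through a linear transformation fitted on a separate training set of stimuli before the RDM comparison.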
Affiliation(s)
- Seyed-Mahdi Khaligh-Razavi
  - MRC Cognition and Brain Sciences Unit, Cambridge, UK
  - Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Linda Henriksson
  - MRC Cognition and Brain Sciences Unit, Cambridge, UK
  - Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Kendrick Kay
  - Department of Psychology, Washington University in St. Louis, St. Louis, MO, USA
13
Brito CSN, Gerstner W. Nonlinear Hebbian Learning as a Unifying Principle in Receptive Field Formation. PLoS Comput Biol 2016; 12:e1005070. [PMID: 27690349; PMCID: PMC5045191; DOI: 10.1371/journal.pcbi.1005070]
Abstract
The development of sensory receptive fields has been modeled in the past by a variety of models including normative models such as sparse coding or independent component analysis and bottom-up models such as spike-timing dependent plasticity or the Bienenstock-Cooper-Munro model of synaptic plasticity. Here we show that the above variety of approaches can all be unified into a single common principle, namely nonlinear Hebbian learning. When nonlinear Hebbian learning is applied to natural images, receptive field shapes were strongly constrained by the input statistics and preprocessing, but exhibited only modest variation across different choices of nonlinearities in neuron models or synaptic plasticity rules. Neither overcompleteness nor sparse network activity are necessary for the development of localized receptive fields. The analysis of alternative sensory modalities such as auditory models or V2 development lead to the same conclusions. In all examples, receptive fields can be predicted a priori by reformulating an abstract model as nonlinear Hebbian learning. Thus nonlinear Hebbian learning and natural statistics can account for many aspects of receptive field formation across models and sensory modalities. The question of how the brain self-organizes to develop precisely tuned neurons has puzzled neuroscientists at least since the discoveries of Hubel and Wiesel. In the past decades, a variety of theories and models have been proposed to describe receptive field formation, notably V1 simple cells, from natural inputs. We cut through the jungle of candidate explanations by demonstrating that in fact a single principle is sufficient to explain receptive field development. Our results follow from two major insights. First, we show that many representative models of sensory development are in fact implementing variations of a common principle: nonlinear Hebbian learning. 
Second, we reveal that nonlinear Hebbian learning is sufficient for receptive field formation through sensory inputs. The surprising result is that our findings are robust to the specific details of a model and allow for robust predictions of the learned receptive fields. Nonlinear Hebbian learning is therefore general in two senses: it applies to many models developed by theoreticians, and to many sensory modalities studied by experimental neuroscientists.
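As a rough illustration of the principle summarized above (a sketch under assumed toy data, nonlinearity, and parameters, not the authors' code), the generic nonlinear Hebbian update Δw ∝ f(w·x)x with a norm constraint can be written as:

```python
import numpy as np

rng = np.random.default_rng(0)

def nonlinear_hebbian(X, f, eta=0.005, epochs=20):
    """Generic nonlinear Hebbian rule: dw ∝ f(w·x) x, with a norm constraint.

    X: (n_samples, n_inputs) preprocessed (e.g. whitened) inputs.
    f: pointwise nonlinearity applied to the output y = w·x.
    """
    w = rng.standard_normal(X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            y = w @ x
            w += eta * f(y) * x          # Hebbian: presynaptic input times g(post)
            w /= np.linalg.norm(w)       # resource constraint keeps w bounded
    return w

# Toy input: sparse (super-Gaussian) sources linearly mixed into two channels.
S = rng.laplace(size=(2000, 2))
X = S @ np.array([[1.0, 0.4], [0.4, 1.0]])
X -= X.mean(axis=0)
# Whitening, standing in for the preprocessing step the abstract refers to.
vals, vecs = np.linalg.eigh(np.cov(X.T))
X = X @ vecs / np.sqrt(vals)
# A kurtosis-seeking nonlinearity (here y**3) drives the filter toward a source axis.
w = nonlinear_hebbian(X, f=lambda y: y ** 3)
```

Per the abstract, swapping f for another monotone nonlinearity should change the learned filter only modestly once the input statistics and whitening are fixed.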
Affiliation(s)
- Carlos S. N. Brito
- School of Computer and Communication Sciences and School of Life Science, Brain Mind Institute, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland
- Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom
- Wulfram Gerstner
- School of Computer and Communication Sciences and School of Life Science, Brain Mind Institute, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland
14
Population-Level Neural Codes Are Robust to Single-Neuron Variability from a Multidimensional Coding Perspective. Cell Rep 2016; 16:2486-98. [DOI: 10.1016/j.celrep.2016.07.065]
15
Doi E, Lewicki MS. A simple model of optimal population coding for sensory systems. PLoS Comput Biol 2014; 10:e1003761. [PMID: 25121492] [PMCID: PMC4133057] [DOI: 10.1371/journal.pcbi.1003761]
Abstract
A fundamental task of a sensory system is to infer information about the environment. It has long been suggested that an important goal of the first stage of this process is to encode the raw sensory signal efficiently by reducing its redundancy in the neural representation. Some redundancy, however, would be expected because it can provide robustness to noise inherent in the system. Encoding the raw sensory signal itself is also problematic, because it contains distortion and noise. The optimal solution would be constrained further by limited biological resources. Here, we analyze a simple theoretical model that incorporates these key aspects of sensory coding, and apply it to conditions in the retina. The model specifies the optimal way to incorporate redundancy in a population of noisy neurons, while also optimally compensating for sensory distortion and noise. Importantly, it allows an arbitrary input-to-output cell ratio between sensory units (photoreceptors) and encoding units (retinal ganglion cells), providing predictions of retinal codes at different eccentricities. Compared to earlier models based on redundancy reduction, the proposed model conveys more information about the original signal. Interestingly, redundancy reduction can be near-optimal when the number of encoding units is limited, such as in the peripheral retina. We show that there exist multiple, equally-optimal solutions whose receptive field structure and organization vary significantly. Among these, the one which maximizes the spatial locality of the computation, but not the sparsity of either synaptic weights or neural responses, is consistent with known basic properties of retinal receptive fields. The model further predicts that receptive field structure changes less with light adaptation at higher input-to-output cell ratios, such as in the periphery. 
Studies of the computational principles of sensory coding have largely focused on the redundancy reduction hypothesis, which posits that a neural population should encode the raw sensory signal efficiently by reducing its redundancy. Models based on this idea, however, have not taken into account some important aspects of sensory systems. First, neurons are noisy, and therefore, some redundancy in the code can be useful for transmitting information reliably. Second, the sensory signal itself is noisy, which should be counteracted as early as possible in the sensory pathway. Finally, neural resources such as the number of neurons are limited, which should strongly affect the form of the sensory code. Here we examine a simple model that takes all these factors into account. We find that the model conveys more information compared to redundancy reduction. When applied to the retina, the model provides a unified functional account for several known properties of retinal coding and makes novel predictions that have yet to be tested experimentally. The generality of the framework allows it to model a wide range of conditions and can be applied to predict optimal sensory coding in other systems.
Affiliation(s)
- Eizaburo Doi
- Electrical Engineering and Computer Science Department, Case Western Reserve University, Cleveland, Ohio, United States of America
- Michael S. Lewicki
- Electrical Engineering and Computer Science Department, Case Western Reserve University, Cleveland, Ohio, United States of America
16
Froudarakis E, Berens P, Ecker AS, Cotton RJ, Sinz FH, Yatsenko D, Saggau P, Bethge M, Tolias AS. Population code in mouse V1 facilitates readout of natural scenes through increased sparseness. Nat Neurosci 2014; 17:851-7. [PMID: 24747577] [PMCID: PMC4106281] [DOI: 10.1038/nn.3707]
Abstract
Neural codes are believed to have adapted to the statistical properties of the natural environment. However, the principles that govern the organization of ensemble activity in the visual cortex during natural visual input are unknown. We recorded populations of up to 500 neurons in the mouse primary visual cortex and characterized the structure of their activity, comparing responses to natural movies with those to control stimuli. We found that higher order correlations in natural scenes induced a sparser code, in which information is encoded by reliable activation of a smaller set of neurons and can be read out more easily. This computationally advantageous encoding for natural scenes was state-dependent and apparent only in anesthetized and active awake animals, but not during quiet wakefulness. Our results argue for a functional benefit of sparsification that could be a general principle governing the structure of the population activity throughout cortical microcircuits.
Affiliation(s)
- Philipp Berens
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Bernstein Center for Computational Neuroscience, Tübingen, Germany
- Werner-Reichardt-Center for Integrative Neuroscience and Institute for Theoretical Physics, University of Tübingen, Germany
- Alexander S. Ecker
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Bernstein Center for Computational Neuroscience, Tübingen, Germany
- Werner-Reichardt-Center for Integrative Neuroscience and Institute for Theoretical Physics, University of Tübingen, Germany
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany
- R. James Cotton
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Fabian H. Sinz
- Bernstein Center for Computational Neuroscience, Tübingen, Germany
- Institute for Neurobiology, Department for Neuroethology, University Tübingen, Germany
- Dimitri Yatsenko
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Peter Saggau
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Bioengineering, Rice University, Houston, TX, USA
- Matthias Bethge
- Bernstein Center for Computational Neuroscience, Tübingen, Germany
- Werner-Reichardt-Center for Integrative Neuroscience and Institute for Theoretical Physics, University of Tübingen, Germany
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany
- Andreas S. Tolias
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Bernstein Center for Computational Neuroscience, Tübingen, Germany
- Department of Computational and Applied Mathematics, Rice University, Houston, TX, USA
17
Sinz FH, Bethge M. What is the limit of redundancy reduction with divisive normalization? Neural Comput 2013; 25:2809-14. [PMID: 23895047] [DOI: 10.1162/neco_a_00505]
Abstract
Divisive normalization has been proposed as a nonlinear redundancy reduction mechanism capturing contrast correlations. Its basic function is a radial rescaling of the population response. Because of the saturation of divisive normalization, however, it is impossible to achieve a fully independent representation. In this letter, we derive an analytical upper bound on the inevitable residual redundancy of any saturating radial rescaling mechanism.
Affiliation(s)
- Fabian H Sinz
- Institute for Neurobiology, Department for Neuroethology, Eberhard Karls University Tübingen, 72076 Tübingen, Germany
18
Hunt JJ, Dayan P, Goodhill GJ. Sparse coding can predict primary visual cortex receptive field changes induced by abnormal visual input. PLoS Comput Biol 2013; 9:e1003005. [PMID: 23675290] [PMCID: PMC3649976] [DOI: 10.1371/journal.pcbi.1003005]
Abstract
Receptive fields acquired through unsupervised learning of sparse representations of natural scenes have similar properties to primary visual cortex (V1) simple cell receptive fields. However, what drives in vivo development of receptive fields remains controversial. The strongest evidence for the importance of sensory experience in visual development comes from receptive field changes in animals reared with abnormal visual input. However, most sparse coding accounts have considered only normal visual input and the development of monocular receptive fields. Here, we applied three sparse coding models to binocular receptive field development across six abnormal rearing conditions. In every condition, the changes in receptive field properties previously observed experimentally were matched to a similar and highly faithful degree by all the models, suggesting that early sensory development can indeed be understood in terms of an impetus towards sparsity. As previously predicted in the literature, we found that asymmetries in inter-ocular correlation across orientations lead to orientation-specific binocular receptive fields. Finally we used our models to design a novel stimulus that, if present during rearing, is predicted by the sparsity principle to lead robustly to radically abnormal receptive fields. The responses of neurons in the primary visual cortex (V1), a region of the brain involved in encoding visual input, are modified by the visual experience of the animal during development. For example, most neurons in animals reared viewing stripes of a particular orientation only respond to the orientation that the animal experienced. The responses of V1 cells in normal animals are similar to responses that simple optimisation algorithms can learn when trained on images. However, whether the similarity between these algorithms and V1 responses is merely coincidental has been unclear. 
Here, we used the results of a number of experiments where animals were reared with modified visual experience to test the explanatory power of three related optimisation algorithms. We did this by filtering the images for the algorithms in ways that mimicked the visual experience of the animals. This allowed us to show that the changes in V1 responses in these experiments were consistent with the algorithms. This is evidence that the precepts of the algorithms, notably sparsity, can be used to understand the development of V1 responses. Further, we used our model to propose a novel rearing condition which we expect to have a dramatic effect on development.
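The sparse inference step that models of this kind rely on, finding a code that trades faithful reconstruction of a patch against an L1 sparsity penalty, can be sketched with the standard ISTA algorithm (an illustrative choice with a random dictionary; the paper's three models may use different optimizers and learned dictionaries):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(D, x, lam=0.1, n_iter=200):
    """ISTA: minimize 0.5*||x - D@a||^2 + lam*||a||_1 over the code a."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a + D.T @ (x - D @ a) / L, lam / L)
    return a

# Tiny demo: recover a 2-sparse code under an overcomplete random dictionary.
rng = np.random.default_rng(1)
D = rng.standard_normal((8, 16))
D /= np.linalg.norm(D, axis=0)           # unit-norm dictionary atoms
a_true = np.zeros(16)
a_true[[2, 11]] = [1.5, -2.0]
x = D @ a_true
a_hat = sparse_code(D, x)
```

The L1 penalty is what zeroes out most coefficients, which is the "impetus towards sparsity" the abstract credits with shaping receptive fields.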
Affiliation(s)
- Jonathan J. Hunt
- Queensland Brain Institute, University of Queensland, St Lucia, Australia
- Peter Dayan
- Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom
- Geoffrey J. Goodhill
- Queensland Brain Institute, University of Queensland, St Lucia, Australia
- School of Mathematics and Physics, University of Queensland, St Lucia, Australia
19
Makin JG, Fellows MR, Sabes PN. Learning multisensory integration and coordinate transformation via density estimation. PLoS Comput Biol 2013; 9:e1003035. [PMID: 23637588] [PMCID: PMC3630212] [DOI: 10.1371/journal.pcbi.1003035]
Abstract
Sensory processing in the brain includes three key operations: multisensory integration-the task of combining cues into a single estimate of a common underlying stimulus; coordinate transformations-the change of reference frame for a stimulus (e.g., retinotopic to body-centered) effected through knowledge about an intervening variable (e.g., gaze position); and the incorporation of prior information. Statistically optimal sensory processing requires that each of these operations maintains the correct posterior distribution over the stimulus. Elements of this optimality have been demonstrated in many behavioral contexts in humans and other animals, suggesting that the neural computations are indeed optimal. That the relationships between sensory modalities are complex and plastic further suggests that these computations are learned-but how? We provide a principled answer, by treating the acquisition of these mappings as a case of density estimation, a well-studied problem in machine learning and statistics, in which the distribution of observed data is modeled in terms of a set of fixed parameters and a set of latent variables. In our case, the observed data are unisensory-population activities, the fixed parameters are synaptic connections, and the latent variables are multisensory-population activities. In particular, we train a restricted Boltzmann machine with the biologically plausible contrastive-divergence rule to learn a range of neural computations not previously demonstrated under a single approach: optimal integration; encoding of priors; hierarchical integration of cues; learning when not to integrate; and coordinate transformation. The model makes testable predictions about the nature of multisensory representations.
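A minimal sketch of the learning rule named in the abstract, contrastive divergence (CD-1) for a binary restricted Boltzmann machine, is below; the architecture, toy data, and parameters are illustrative assumptions, not the paper's multisensory setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(W, b, c, v0, eta=0.05):
    """One CD-1 step for a binary RBM with visible bias b and hidden bias c.

    v0: (batch, n_visible) batch of observed visible vectors.
    """
    ph0 = sigmoid(v0 @ W + c)                    # P(h=1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + b)                  # one-step reconstruction
    ph1 = sigmoid(pv1 @ W + c)
    n = len(v0)
    W += eta * (v0.T @ ph0 - pv1.T @ ph1) / n    # <v h>_data - <v h>_recon
    b += eta * (v0 - pv1).mean(axis=0)
    c += eta * (ph0 - ph1).mean(axis=0)
    return W, b, c

# Toy usage: two complementary binary patterns standing in for population input.
V = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
W = 0.1 * rng.standard_normal((4, 2))
b, c = np.zeros(4), np.zeros(2)
for _ in range(200):
    W, b, c = cd1_update(W, b, c, V)
```

In the paper's framing the visible units are the unisensory populations, the hidden units the multisensory population, and W the learned synaptic connections.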
Affiliation(s)
- Joseph G Makin
- Department of Physiology and the Center for Integrative Neuroscience, University of California San Francisco, San Francisco, California, USA
20
Sinz F, Bethge M. Temporal adaptation enhances efficient contrast gain control on natural images. PLoS Comput Biol 2013; 9:e1002889. [PMID: 23382664] [PMCID: PMC3561086] [DOI: 10.1371/journal.pcbi.1002889]
Abstract
Divisive normalization in primary visual cortex has been linked to adaptation to natural image statistics in accordance with Barlow's redundancy reduction hypothesis. Using recent advances in natural image modeling, we show that the previously studied static model of divisive normalization is rather inefficient in reducing local contrast correlations, but that a simple temporal contrast adaptation mechanism of the half-saturation constant can substantially increase its efficiency. Our findings reveal the experimentally observed temporal dynamics of divisive normalization to be critical for redundancy reduction. The redundancy reduction hypothesis postulates that neural representations adapt to sensory input statistics such that their responses become as statistically independent as possible. Based on this hypothesis, many properties of early visual neurons (like orientation selectivity or divisive normalization) have been linked to natural image statistics. Divisive normalization, in particular, models a widely observed neural response property: the divisive inhibition of a single neuron by a pool of others. This mechanism has been shown to reduce the redundancy among neural responses to typical contrast dependencies in natural images. Here, we show that the standard model of divisive normalization achieves substantially less redundancy reduction than a theoretically optimal mechanism called radial factorization. On the other hand, we find that radial factorization is inconsistent with existing neurophysiological observations. As a solution we suggest a new physiologically plausible modification of the standard model which accounts for the dynamics of the visual input by adapting to local contrasts during fixations. In this way the dynamic version of the standard model achieves almost optimal redundancy reduction performance.
Our results imply that the dynamics of natural viewing conditions are critical for testing the role of divisive normalization for redundancy reduction.
Affiliation(s)
- Fabian Sinz
- Department for Neuroethology, University Tübingen, Tübingen, Germany
21
How sensitive is the human visual system to the local statistics of natural images? PLoS Comput Biol 2013; 9:e1002873. [PMID: 23358106] [PMCID: PMC3554546] [DOI: 10.1371/journal.pcbi.1002873]
Abstract
A key hypothesis in sensory system neuroscience is that sensory representations are adapted to the statistical regularities in sensory signals and thereby incorporate knowledge about the outside world. Supporting this hypothesis, several probabilistic models of local natural image regularities have been proposed that reproduce neural response properties. Although many such physiological links have been made, these models have not been linked directly to visual sensitivity. Previous psychophysical studies of sensitivity to natural image regularities focus on global perception of large images, but much less is known about sensitivity to local natural image regularities. We present a new paradigm for controlled psychophysical studies of local natural image regularities and compare how well such models capture perceptually relevant image content. To produce stimuli with precise statistics, we start with a set of patches cut from natural images and alter their content to generate a matched set whose joint statistics are equally likely under a probabilistic natural image model. The task is forced choice to discriminate natural patches from model patches. The results show that human observers can learn to discriminate the higher-order regularities in natural images from those of model samples after very few exposures and that no current model is perfect for patches as small as 5 by 5 pixels or larger. Discrimination performance was accurately predicted by model likelihood, an information theoretic measure of model efficacy, indicating that the visual system possesses a surprisingly detailed knowledge of natural image higher-order correlations, much more so than current image models. We also perform three cue identification experiments to interpret how model features correspond to perceptually relevant image features. Several aspects of primate visual physiology have been identified as adaptations to local regularities of natural images. 
However, much less work has measured visual sensitivity to local natural image regularities. Most previous work focuses on global perception of large images and shows that observers are more sensitive to visual information when image properties resemble those of natural images. In this work we measure human sensitivity to local natural image regularities using stimuli generated by patch-based probabilistic natural image models that have been related to primate visual physiology. We find that human observers can learn to discriminate the statistical regularities of natural image patches from those represented by current natural image models after very few exposures and that discriminability depends on the degree of regularities captured by the model. The quick learning we observed suggests that the human visual system is biased for processing natural images, even at very fine spatial scales, and that it has a surprisingly large knowledge of the regularities in natural images, at least in comparison to the state-of-the-art statistical models of natural images.
22
Abstract
Animals living in groups collectively produce social structure. In this context individuals make strategic decisions about when to cooperate and compete. This requires that individuals can perceive patterns in collective dynamics, but how this pattern extraction occurs is unclear. Our goal is to identify a model that extracts meaningful social patterns from a behavioral time series while remaining cognitively parsimonious by making the fewest demands on memory. Using fine-grained conflict data from macaques, we show that sparse coding, an important principle of neural compression, is an effective method for compressing collective behavior. The sparse code is shown to be efficient, predictive, and socially meaningful. In our monkey society, the sparse code of conflict is composed of related individuals, the policers, and the alpha female. Our results suggest that sparse coding is a natural technique for pattern extraction when cognitive constraints and small sample sizes limit the complexity of inferential models. Our approach highlights the need for cognitive experiments addressing how individuals perceive collective features of social organization.
23
Doi E, Lewicki MS. Characterization of Minimum Error Linear Coding with Sensory and Neural Noise. Neural Comput 2011; 23:2498-510. [PMID: 21732860] [DOI: 10.1162/neco_a_00181]
Abstract
Robust coding has been proposed as a solution to the problem of minimizing decoding error in the presence of neural noise. Many real-world problems, however, have degradation in the input signal, not just in neural representations. This generalized problem is more relevant to biological sensory coding, where internal noise arises from limited neural precision and external noise from distortion of the sensory signal, such as blurring and phototransduction noise. In this note, we show that the optimal linear encoder for this problem can be decomposed exactly into two serial processes that can be optimized separately. One is Wiener filtering, which optimally compensates for input degradation. The other is robust coding, which best uses the available representational capacity for signal transmission with a noisy population of linear neurons. We also present a spectral analysis of the decomposition that characterizes how the reconstruction error is minimized under different input signal spectra, types and amounts of degradation, degrees of neural precision, and neural population sizes.
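In the simplest scalar case, the Wiener-filtering stage of this decomposition reduces to shrinking the observation by the signal-to-(signal plus noise) variance ratio; a sketch of that special case follows (the paper treats the general multivariate linear encoder, so the setup below is illustrative only):

```python
import numpy as np

def wiener_gain(var_s, var_n):
    """Optimal linear (Wiener) gain for x = s + n with independent zero-mean s, n."""
    return var_s / (var_s + var_n)

rng = np.random.default_rng(0)
var_s, var_n = 1.0, 1.0
s = rng.normal(0.0, np.sqrt(var_s), size=100_000)       # underlying signal
x = s + rng.normal(0.0, np.sqrt(var_n), size=s.shape)   # degraded observation
s_hat = wiener_gain(var_s, var_n) * x                   # compensated estimate

mse_raw = np.mean((x - s) ** 2)       # ~ var_n = 1.0
mse_wiener = np.mean((s_hat - s) ** 2)  # ~ var_s*var_n/(var_s+var_n) = 0.5
```

The shrinkage halves the mean squared error here, which is the sense in which this stage "optimally compensates for input degradation" before the robust-coding stage allocates representational capacity.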
Affiliation(s)
- Eizaburo Doi
- Center for Neural Science, New York University, New York, NY 10003, U.S.A.
- Michael S. Lewicki
- Electrical Engineering and Computer Science Department, Case Western Reserve University, Cleveland, OH 44106, U.S.A.
24
Lyu S. Dependency reduction with divisive normalization: justification and effectiveness. Neural Comput 2011; 23:2942-73. [PMID: 21851283] [DOI: 10.1162/neco_a_00197]
Abstract
Efficient coding transforms that reduce or remove statistical dependencies in natural sensory signals are important for both biology and engineering. In recent years, divisive normalization (DN) has been advocated as a simple and effective nonlinear efficient coding transform. In this work, we first elaborate on the theoretical justification for DN as an efficient coding transform. Specifically, we use the multivariate t model to represent several important statistical properties of natural sensory signals and show that DN approximates the optimal transforms that eliminate statistical dependencies in the multivariate t model. Second, we show that several forms of DN used in the literature are equivalent in their effects as efficient coding transforms. Third, we provide a quantitative evaluation of the overall dependency reduction performance of DN for both the multivariate t models and natural sensory signals. Finally, we find that statistical dependencies in the multivariate t model and natural sensory signals are increased by the DN transform when input dimensions are low. This implies that for DN to be an effective efficient coding transform, it has to pool over a sufficiently large number of inputs.
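A minimal form of the DN transform analyzed here, each response divided by a saturating function of pooled squared activity, can be sketched as follows (one common parameterization; the paper shows that several forms in the literature are equivalent as efficient coding transforms):

```python
import numpy as np

def divisive_normalization(r, sigma=1.0):
    """Radial rescaling: y_i = r_i / sqrt(sigma^2 + sum_j r_j^2).

    Pools over the last axis (the neural population dimension).
    """
    pool = np.sqrt(sigma ** 2 + np.sum(r ** 2, axis=-1, keepdims=True))
    return r / pool

r = np.array([3.0, 4.0])        # population response with norm 5
y = divisive_normalization(r)   # same direction, radius rescaled to below 1
```

Because the rescaling saturates (the output norm never reaches 1), a fully independent representation is unattainable; that is the residual-redundancy bound derived in entry 17 above.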
Affiliation(s)
- Siwei Lyu
- Computer Science Department, University at Albany, State University of New York, Albany, NY 12222, USA
25
Laparra V, Camps-Valls G, Malo J. Iterative Gaussianization: from ICA to random rotations. IEEE Trans Neural Netw 2011; 22:537-49. [PMID: 21349790] [DOI: 10.1109/tnn.2011.2106511]
Abstract
Most signal processing problems involve the challenging task of multidimensional probability density function (PDF) estimation. In this paper, we propose a solution to this problem by using a family of rotation-based iterative Gaussianization (RBIG) transforms. The general framework consists of the sequential application of a univariate marginal Gaussianization transform followed by an orthonormal transform. The proposed procedure looks for differentiable transforms to a known PDF so that the unknown PDF can be estimated at any point of the original domain. In particular, we aim at a zero-mean unit-covariance Gaussian for convenience. RBIG is formally similar to classical iterative projection pursuit algorithms. However, we show that, unlike in PP methods, the particular class of rotations used has no special qualitative relevance in this context, since looking for interestingness is not a critical issue for PDF estimation. The key difference is that our approach focuses on the univariate part (marginal Gaussianization) of the problem rather than on the multivariate part (rotation). This difference implies that one may select the most convenient rotation suited to each practical application. The differentiability, invertibility, and convergence of RBIG are theoretically and experimentally analyzed. Relation to other methods, such as radial Gaussianization, one-class support vector domain description, and deep neural networks is also pointed out. The practical performance of RBIG is successfully illustrated in a number of multidimensional problems such as image synthesis, classification, denoising, and multi-information estimation.
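One RBIG iteration, marginal Gaussianization followed by an orthonormal rotation, can be sketched as follows; the rank-based Gaussianization and the PCA rotation are illustrative choices, consistent with the paper's point that the particular rotation class is largely free:

```python
import numpy as np
from scipy.stats import norm

def marginal_gaussianization(X):
    """Map each column to ~N(0,1) through its empirical CDF (rank transform)."""
    n = X.shape[0]
    ranks = X.argsort(axis=0).argsort(axis=0)   # 0..n-1 within each column
    return norm.ppf((ranks + 0.5) / n)          # offset avoids +/- infinity

def rbig_step(X):
    """One iteration: marginal Gaussianization, then a rotation (here PCA)."""
    G = marginal_gaussianization(X)
    _, _, Vt = np.linalg.svd(G - G.mean(axis=0), full_matrices=False)
    return G @ Vt.T

rng = np.random.default_rng(0)
X = rng.exponential(size=(1000, 3))   # strongly non-Gaussian input
Y = rbig_step(X)                      # Gaussianized marginals, decorrelated axes
```

Iterating `rbig_step` drives the joint distribution toward a zero-mean unit-covariance Gaussian, which is what makes the unknown PDF estimable at any point of the original domain.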
Affiliation(s)
- Valero Laparra
- Image Processing Laboratory, Universitat de València, Paterna 46980, Spain
26
Lower bounds on the redundancy of natural images. Vision Res 2010; 50:2213-22. [DOI: 10.1016/j.visres.2010.07.025]
27
Malo J, Laparra V. Psychophysically tuned divisive normalization approximately factorizes the PDF of natural images. Neural Comput 2010; 22:3179-206. [PMID: 20858127] [DOI: 10.1162/neco_a_00046]
Abstract
The conventional approach in computational neuroscience in favor of the efficient coding hypothesis goes from image statistics to perception. It has been argued that the behavior of the early stages of biological visual processing (e.g., spatial frequency analyzers and their nonlinearities) may be obtained from image samples and the efficient coding hypothesis using no psychophysical or physiological information. In this work we address the same issue in the opposite direction: from perception to image statistics. We show that psychophysically fitted image representation in V1 has appealing statistical properties, for example, approximate PDF factorization and substantial mutual information reduction, even though no statistical information is used to fit the V1 model. These results are complementary evidence in favor of the efficient coding hypothesis.
Affiliation(s)
- Jesús Malo
- Image Processing Laboratory, Universitat de València, 46980 Paterna, València, Spain
28
Getting real: sensory processing of natural stimuli. Curr Opin Neurobiol 2010; 20:389-95. [PMID: 20434327] [DOI: 10.1016/j.conb.2010.03.010]
Abstract
Normal sensory experience rarely presents us with isolated bars, gratings, or other stimuli that have shaped our knowledge of sensory representations. Instead, typical input adheres to certain statistical regularities, which make it 'natural' and cannot be adequately modeled by linear superposition of simple stimuli. Natural stimuli necessitate a paradigm shift with a focus on downstream processing. This shift currently follows three main lines: quantification of the information a downstream area can read out (decoding); describing a representation as the optimization of computational principles with respect to natural input (normative approach); understanding the sensory representation as optimal for the systems' tasks and intended actions (behavioral context). The interaction between representational levels, intermediate-level features, and bidirectional coupling through attention are key elements for sensory processing.