851
McClure P, Kriegeskorte N. Representational Distance Learning for Deep Neural Networks. Front Comput Neurosci 2016; 10:131. [PMID: 28082889] [PMCID: PMC5187453] [DOI: 10.3389/fncom.2016.00131]
Abstract
Deep neural networks (DNNs) provide useful models of visual representational transformations. We present a method that enables a DNN (student) to learn from the internal representational spaces of a reference model (teacher), which could be another DNN or, in the future, a biological brain. Representational spaces of the student and the teacher are characterized by representational distance matrices (RDMs). We propose representational distance learning (RDL), a stochastic gradient descent method that drives the RDMs of the student to approximate the RDMs of the teacher. We demonstrate that RDL is competitive with other transfer learning techniques for two publicly available benchmark computer vision datasets (MNIST and CIFAR-100), while allowing for architectural differences between student and teacher. By pulling the student's RDMs toward those of the teacher, RDL significantly improved visual classification performance when compared to baseline networks that did not use transfer learning. In the future, RDL may enable combined supervised training of deep neural networks using task constraints (e.g., images and category labels) and constraints from brain-activity measurements, so as to build models that replicate the internal representational spaces of biological brains.
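To make the method concrete, here is a minimal Python sketch of the two ingredients described above: an RDM computed from a layer's activation patterns, and a loss that pulls the student's RDM toward the teacher's. It assumes correlation distance for the RDMs; all names are illustrative, and the gradient-descent training loop itself is omitted.

```python
import numpy as np

def rdm(activations):
    """Representational distance matrix: 1 - Pearson correlation between
    the response patterns of every pair of stimuli.
    `activations` has shape (n_stimuli, n_units)."""
    return 1.0 - np.corrcoef(activations)

def rdl_loss(student_acts, teacher_acts):
    """Squared mismatch between student and teacher RDMs
    (upper triangle only, to avoid double counting)."""
    iu = np.triu_indices(student_acts.shape[0], k=1)
    diff = rdm(student_acts)[iu] - rdm(teacher_acts)[iu]
    return np.mean(diff ** 2)

# Toy usage: 8 stimuli; student layer with 32 units, teacher layer with 64.
rng = np.random.default_rng(0)
loss = rdl_loss(rng.standard_normal((8, 32)), rng.standard_normal((8, 64)))
print(f"RDM mismatch: {loss:.3f}")
```

Because the loss depends only on pairwise distances, the student and teacher layers are free to differ in dimensionality, which is what allows RDL to transfer across architectures.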
852
Lake BM, Ullman TD, Tenenbaum JB, Gershman SJ. Building machines that learn and think like people. Behav Brain Sci 2017; 40:e253.
Abstract
Recent progress in artificial intelligence has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats that of humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn and how they learn it. Specifically, we argue that these machines should (1) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (2) ground learning in intuitive theories of physics and psychology to support and enrich the knowledge that is learned; and (3) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes toward these goals that can combine the strengths of recent neural network advances with more structured cognitive models.
853
Cheyette SJ, Plaut DC. Modeling the N400 ERP component as transient semantic over-activation within a neural network model of word comprehension. Cognition 2016; 162:153-166. [PMID: 27871623] [DOI: 10.1016/j.cognition.2016.10.016]
Abstract
The study of the N400 event-related brain potential has provided fundamental insights into the nature of real-time comprehension processes, and its amplitude is modulated by a wide variety of stimulus and context factors. It is generally thought to reflect the difficulty of semantic access, but formulating a precise characterization of this process has proved difficult. Laszlo and colleagues (Laszlo & Plaut, 2012; Laszlo & Armstrong, 2014) used physiologically constrained neural networks to model the N400 as transient over-activation within semantic representations, arising as a consequence of the distribution of excitation and inhibition within and between cortical areas. The current work extends this approach to successfully model the effects of word frequency, semantic richness, repetition, semantic and associative priming, and orthographic neighborhood size on both N400 amplitudes and behavior. The account is argued to be preferable to one based on "implicit semantic prediction error" (Rabovsky & McRae, 2014) for a number of reasons, the most fundamental of which is that the current model actually produces N400-like waveforms in its real-time activation dynamics.
Affiliation(s)
- Samuel J Cheyette
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY 14627, USA.
- David C Plaut
- Department of Psychology and the Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
854
Lillicrap TP, Cownden D, Tweed DB, Akerman CJ. Random synaptic feedback weights support error backpropagation for deep learning. Nat Commun 2016; 7:13276. [PMID: 27824044] [PMCID: PMC5105169] [DOI: 10.1038/ncomms13276]
Abstract
The brain processes information through multiple layers of neurons. This deep architecture is representationally powerful, but complicates learning because it is difficult to identify the responsible neurons when a mistake is made. In machine learning, the backpropagation algorithm assigns blame by multiplying error signals with all the synaptic weights on each neuron's axon and further downstream. However, this involves a precise, symmetric backward connectivity pattern, which is thought to be impossible in the brain. Here we demonstrate that this strong architectural constraint is not required for effective error propagation. We present a surprisingly simple mechanism that assigns blame by multiplying errors by even random synaptic weights. This mechanism can transmit teaching signals across multiple layers of neurons and performs as effectively as backpropagation on a variety of tasks. Our results help reopen questions about how the brain could use error signals and dispel long-held assumptions about algorithmic constraints on learning.
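The mechanism is simple enough to sketch in a few lines. In the toy two-layer network below, the only departure from backpropagation is that the hidden-layer error signal is computed with a fixed random matrix B instead of the transpose of the forward weights. This is an illustrative Python reconstruction, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out, lr = 10, 30, 5, 0.02
W1 = rng.standard_normal((n_hid, n_in)) * 0.1   # forward weights, layer 1
W2 = rng.standard_normal((n_out, n_hid)) * 0.1  # forward weights, layer 2
B = rng.standard_normal((n_hid, n_out)) * 0.1   # fixed random feedback weights
T = rng.standard_normal((n_out, n_in))          # toy linear teaching target

for step in range(2000):
    x = rng.standard_normal((n_in, 1))
    h = np.tanh(W1 @ x)
    y = W2 @ h
    e = y - T @ x                                # output error
    # Backprop would use W2.T @ e; feedback alignment uses B @ e instead.
    dh = (B @ e) * (1 - h ** 2)
    W2 -= lr * e @ h.T
    W1 -= lr * dh @ x.T

print("final squared error:", float(e.T @ e))
```

During training the forward weights tend to align with the fixed feedback weights, which is why the random error projection still conveys useful teaching signals.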
Affiliation(s)
- Timothy P. Lillicrap
- Department of Pharmacology, University of Oxford, Oxford OX1 3QT, UK
- Google DeepMind, 5 New Street Square, London EC4A 3TW, UK
- Daniel Cownden
- School of Biology, University of St Andrews, Harold Mitchel Building, St Andrews, Fife KY16 9TH, UK
- Douglas B. Tweed
- Departments of Physiology and Medicine, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Centre for Vision Research, York University, Toronto, Ontario M3J 1P3, Canada
- Colin J. Akerman
- Department of Pharmacology, University of Oxford, Oxford OX1 3QT, UK
855
Wood JN, Wood SM. Measuring the speed of newborn object recognition in controlled visual worlds. Dev Sci 2016; 20. [DOI: 10.1111/desc.12470]
Affiliation(s)
- Justin N. Wood
- Department of Psychology, University of Southern California, USA
856
Neurophysiological Organization of the Middle Face Patch in Macaque Inferior Temporal Cortex. J Neurosci 2016; 36:12729-12745. [PMID: 27810930] [DOI: 10.1523/jneurosci.0237-16.2016]
Abstract
While early cortical visual areas contain fine-scale spatial organization of neuronal properties, such as orientation preference, the spatial organization of higher-level visual areas is less well understood. The fMRI demonstration of face-preferring regions in human ventral cortex and monkey inferior temporal cortex ("face patches") raises the question of how neural selectivity for faces is organized. Here, we targeted hundreds of spatially registered neural recordings to the largest fMRI-identified face-preferring region in monkeys, the middle face patch (MFP), and show that the MFP contains a graded enrichment of face-preferring neurons. At its center, as much as 93% of the sites we sampled responded twice as strongly to faces as to nonface objects. We estimate the maximum neurophysiological size of the MFP to be ∼6 mm in diameter, consistent with its previously reported size under fMRI. Importantly, face selectivity in the MFP varied strongly even between neighboring sites. Additionally, extremely face-selective sites were ∼40 times more likely to be present inside the MFP than outside. These results provide the first direct quantification of the size and neural composition of the MFP by showing that the cortical tissue localized to the fMRI-defined region consists of a very high fraction of face-preferring sites near its center, and a monotonic decrease in that fraction along any radial spatial axis.

SIGNIFICANCE STATEMENT The underlying organization of neurons that give rise to the large spatial regions of activity observed with fMRI is not well understood. Neurophysiological studies that have targeted the fMRI-identified face patches in monkeys have provided evidence for both large-scale clustering and a heterogeneous spatial organization. Here we used a novel x-ray imaging system to spatially map the responses of hundreds of sites in and around the middle face patch. We observed that face-selective signal localized to the middle face patch was characterized by a gradual spatial enrichment. Furthermore, strongly face-selective sites were ∼40 times more likely to be found inside the patch than outside of the patch.
857
Pagan M, Simoncelli EP, Rust NC. Neural Quadratic Discriminant Analysis: Nonlinear Decoding with V1-Like Computation. Neural Comput 2016; 28:2291-2319. [PMID: 27626960] [PMCID: PMC6395528] [DOI: 10.1162/neco_a_00890]
Abstract
Linear-nonlinear (LN) models and their extensions have proven successful in describing transformations from stimuli to spiking responses of neurons in early stages of sensory hierarchies. Neural responses at later stages are highly nonlinear and have generally been better characterized in terms of their decoding performance on prespecified tasks. Here we develop a biologically plausible decoding model for classification tasks, which we refer to as neural quadratic discriminant analysis (nQDA). Specifically, we reformulate an optimal quadratic classifier as an LN-LN computation, analogous to "subunit" encoding models that have been used to describe responses in retina and primary visual cortex. We propose a physiological mechanism by which the parameters of the nQDA classifier could be optimized, using a supervised variant of a Hebbian learning rule. As an example of its applicability, we show that nQDA provides a better account than many comparable alternatives for the transformation between neural representations in two high-level brain areas recorded as monkeys performed a visual delayed-match-to-sample task.
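The reformulation rests on a standard identity: the quadratic term of the Gaussian log-likelihood ratio can be rewritten as a weighted sum of squared linear filter outputs, with the eigenvectors of the quadratic kernel playing the role of "subunits". The Python sketch below illustrates that identity only; it is not the paper's fitting procedure or its Hebbian learning rule.

```python
import numpy as np

def qda_as_subunits(mu0, S0, mu1, S1):
    """Express the optimal quadratic (QDA) discriminant as an LN-LN cascade:
    a bank of linear filters whose outputs are squared, weighted, and summed.
    Returns (V, w, b, c) such that the log-likelihood ratio is
    d(x) = sum_k w_k (v_k . x)^2 + b . x + c."""
    P0, P1 = np.linalg.inv(S0), np.linalg.inv(S1)
    A = 0.5 * (P0 - P1)                       # quadratic kernel
    b = P1 @ mu1 - P0 @ mu0                   # linear term
    c = (0.5 * (mu0 @ P0 @ mu0 - mu1 @ P1 @ mu1)
         + 0.5 * (np.log(np.linalg.det(S0)) - np.log(np.linalg.det(S1))))
    w, V = np.linalg.eigh(A)                  # filters = eigenvectors of A
    return V, w, b, c

# Toy usage: two 2-D Gaussian classes.
mu0, mu1 = np.zeros(2), np.ones(2)
S0, S1 = np.eye(2), np.array([[2.0, 0.3], [0.3, 0.5]])
V, w, b, c = qda_as_subunits(mu0, S0, mu1, S1)
x = np.array([0.5, 1.5])
d = np.sum(w * (V.T @ x) ** 2) + b @ x + c    # d > 0 favors class 1
print(f"discriminant: {d:.3f}")
```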
Affiliation(s)
- Marino Pagan
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, U.S.A.
- Eero P Simoncelli
- Center for Neural Science and Courant Institute of Mathematical Sciences, New York University, New York, NY 10003, U.S.A. and Howard Hughes Medical Institute
- Nicole C Rust
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, U.S.A.
858
Making Sense of Real-World Scenes. Trends Cogn Sci 2016; 20:843-856. [PMID: 27769727] [DOI: 10.1016/j.tics.2016.09.003]
Abstract
To interact with the world, we have to make sense of the continuous sensory input conveying information about our environment. A recent surge of studies has investigated the processes enabling scene understanding, using increasingly complex stimuli and sophisticated analyses to highlight the visual features and brain regions involved. However, there are two major challenges to producing a comprehensive framework for scene understanding. First, scene perception is highly dynamic, subserving multiple behavioral goals. Second, a multitude of different visual properties co-occur across scenes and may be correlated or independent. We synthesize the recent literature and argue that for a complete view of scene understanding, it is necessary to account for both differing observer goals and the contribution of diverse scene properties.
859
Affiliation(s)
- Ruth Rosenholtz
- Department of Brain and Cognitive Sciences, CSAIL, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
860
Gauthier I, Tarr MJ. Visual Object Recognition: Do We (Finally) Know More Now Than We Did? Annu Rev Vis Sci 2016; 2:377-396. [DOI: 10.1146/annurev-vision-111815-114621]
Affiliation(s)
- Isabel Gauthier
- Department of Psychology, Vanderbilt University, Nashville, Tennessee 37240-7817
- Michael J. Tarr
- Department of Psychology, Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
861
Brito CSN, Gerstner W. Nonlinear Hebbian Learning as a Unifying Principle in Receptive Field Formation. PLoS Comput Biol 2016; 12:e1005070. [PMID: 27690349] [PMCID: PMC5045191] [DOI: 10.1371/journal.pcbi.1005070]
Abstract
The development of sensory receptive fields has been modeled in the past by a variety of models, including normative models such as sparse coding or independent component analysis and bottom-up models such as spike-timing-dependent plasticity or the Bienenstock-Cooper-Munro model of synaptic plasticity. Here we show that the above variety of approaches can all be unified into a single common principle, namely nonlinear Hebbian learning. When nonlinear Hebbian learning is applied to natural images, receptive field shapes are strongly constrained by the input statistics and preprocessing, but exhibit only modest variation across different choices of nonlinearities in neuron models or synaptic plasticity rules. Neither overcompleteness nor sparse network activity is necessary for the development of localized receptive fields. The analysis of alternative sensory modalities, such as auditory models or V2 development, leads to the same conclusions. In all examples, receptive fields can be predicted a priori by reformulating an abstract model as nonlinear Hebbian learning. Thus nonlinear Hebbian learning and natural statistics can account for many aspects of receptive field formation across models and sensory modalities.

The question of how the brain self-organizes to develop precisely tuned neurons has puzzled neuroscientists at least since the discoveries of Hubel and Wiesel. In the past decades, a variety of theories and models have been proposed to describe receptive field formation, notably of V1 simple cells, from natural inputs. We cut through the jungle of candidate explanations by demonstrating that a single principle is sufficient to explain receptive field development. Our results follow from two major insights. First, we show that many representative models of sensory development are in fact implementing variations of a common principle: nonlinear Hebbian learning. Second, we reveal that nonlinear Hebbian learning is sufficient for receptive field formation through sensory inputs. The surprising result is that our findings are robust to the specific details of a model, which allows for robust predictions about the learned receptive fields. Nonlinear Hebbian learning is therefore general in two senses: it applies to many models developed by theoreticians, and to many sensory modalities studied by experimental neuroscientists.
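The unifying principle is compact enough to state in code. The sketch below applies a generic nonlinear Hebbian update, delta-w proportional to f(w . x) x with homeostatic weight normalization, to whitened inputs built from sparse sources; the learned direction aligns with one source, as a single ICA-like unit would. The cubic nonlinearity is one arbitrary choice among the many that, on the paper's account, lead to similar receptive fields.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy input: two sparse (Laplacian) sources, linearly mixed, then whitened,
# so the only structure left for learning is higher-order (non-Gaussian).
S = rng.laplace(size=(2, 5000))
X = np.array([[1.0, 0.6], [0.2, 1.0]]) @ S
X = X - X.mean(axis=1, keepdims=True)
E, U = np.linalg.eigh(np.cov(X))
X = (U / np.sqrt(E)).T @ X                     # whitening transform

w = rng.standard_normal(2)
w /= np.linalg.norm(w)
eta, f = 0.01, lambda u: u ** 3                # any expansive nonlinearity
for x in X.T:
    w += eta * f(w @ x) * x                    # nonlinear Hebbian update
    w /= np.linalg.norm(w)                     # homeostatic normalization

print("learned direction:", w.round(3))
```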
Affiliation(s)
- Carlos S. N. Brito
- School of Computer and Communication Sciences and School of Life Science, Brain Mind Institute, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland
- Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom
- Wulfram Gerstner
- School of Computer and Communication Sciences and School of Life Science, Brain Mind Institute, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland
862
Kheradpisheh SR, Ghodrati M, Ganjtabesh M, Masquelier T. Deep Networks Can Resemble Human Feed-forward Vision in Invariant Object Recognition. Sci Rep 2016; 6:32672. [PMID: 27601096] [PMCID: PMC5013454] [DOI: 10.1038/srep32672]
Abstract
Deep convolutional neural networks (DCNNs) have attracted much attention recently, and have been shown to recognize thousands of object categories in natural image databases. Their architecture is somewhat similar to that of the human visual system: both use restricted receptive fields, and a hierarchy of layers which progressively extract more and more abstracted features. Yet it is unknown whether DCNNs match human performance at the task of view-invariant object recognition, whether they make similar errors and use similar representations for this task, and whether the answers depend on the magnitude of the viewpoint variations. To investigate these issues, we benchmarked eight state-of-the-art DCNNs, the HMAX model, and a baseline shallow model, and compared their results to those of humans with backward masking. Unlike in all previous DCNN studies, we carefully controlled the magnitude of the viewpoint variations to demonstrate that shallow nets can outperform deep nets and humans when variations are weak. When facing larger variations, however, more layers were needed to match human performance and error distributions, and to have representations that are consistent with human behavior. A very deep net with 18 layers even outperformed humans at the highest variation level, using the most human-like representations.
Affiliation(s)
- Saeed Reza Kheradpisheh
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran
- CERCO UMR 5549, CNRS – Université de Toulouse, F-31300, France
- Masoud Ghodrati
- Department of Physiology, Monash University, Clayton 3800, Australia
- Neuroscience Program, Biomedicine Discovery Institute, Monash University
- Mohammad Ganjtabesh
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran
- Timothée Masquelier
- CERCO UMR 5549, CNRS – Université de Toulouse, F-31300, France
- INSERM, U968, Paris, F-75012, France
- Sorbonne Universités, UPMC Univ Paris 06, UMR-S 968, Institut de la Vision, Paris, F-75012, France
- CNRS, UMR-7210, Paris, F-75012, France
863
Bio-inspired unsupervised learning of visual features leads to robust invariant object recognition. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.04.029]
864
Sharpee TO. How Invariant Feature Selectivity Is Achieved in Cortex. Front Synaptic Neurosci 2016; 8:26. [PMID: 27601991] [PMCID: PMC4993779] [DOI: 10.3389/fnsyn.2016.00026]
Abstract
Parsing the visual scene into objects is paramount to survival. Yet, how this is accomplished by the nervous system remains largely unknown, even in the comparatively well understood visual system. It is especially unclear how detailed peripheral signal representations are transformed into the object-oriented representations that are independent of object position and are provided by the final stages of visual processing. This perspective discusses advances in computational algorithms for fitting large-scale models that make it possible to reconstruct the intermediate steps of visual processing based on neural responses to natural stimuli. In particular, it is now possible to characterize how different types of position invariance, such as local (also known as phase invariance) and more global, are interleaved with nonlinear operations to allow for coding of curved contours. Neurons in the mid-level visual area V4 exhibit selectivity to pairs of even- and odd-symmetric profiles along curved contours. Such pairing is reminiscent of the response properties of complex cells in the primary visual cortex (V1) and suggests specific ways in which V1 signals are transformed within subsequent visual cortical areas. These examples illustrate that large-scale models fitted to neural responses to natural stimuli can provide generative models of successive stages of sensory processing.
Affiliation(s)
- Tatyana O. Sharpee
- Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA, USA
865
Kreiman G. A null model for cortical representations with grandmothers galore. Lang Cogn Neurosci 2016; 32:274-285. [PMID: 29204455] [PMCID: PMC5710804] [DOI: 10.1080/23273798.2016.1218033]
Abstract
There has been extensive discussion in the literature about the extent to which cortical representations can be described as localist or distributed. Here we discuss a simple null model that encompasses a family of related architectures describing the transformation of signals throughout the parts of the visual system involved in object recognition. This family of models constitutes a rigorous first approximation to explain the neurophysiological properties of ventral visual cortex. This null model contains both distributed and local representations throughout the entire hierarchy of computations and the responses of individual units are meaningful and interpretable when encoding is adequately defined for each computational stage.
866
Roelfsema PR, de Lange FP. Early Visual Cortex as a Multiscale Cognitive Blackboard. Annu Rev Vis Sci 2016; 2:131-151.
Abstract
Neurons in early visual cortical areas not only represent incoming visual information but are also engaged by higher level cognitive processes, including attention, working memory, imagery, and decision-making. Are these cognitive effects an epiphenomenon or are they functionally relevant for these mental operations? We review evidence supporting the hypothesis that the modulation of activity in early visual areas has a causal role in cognition. The modulatory influences allow the early visual cortex to act as a multiscale cognitive blackboard for read and write operations by higher visual areas, which can thereby efficiently exchange information. This blackboard architecture explains how the activity of neurons in the early visual cortex contributes to scene segmentation and working memory, and relates to the subject's inferences about the visual world. The architecture also has distinct advantages for the processing of visual routines that rely on a number of sequentially executed processing steps.
Affiliation(s)
- Pieter R Roelfsema
- Netherlands Institute for Neuroscience, 1105 BA Amsterdam, The Netherlands; Department of Integrative Neurophysiology, VU University Amsterdam, 1081 HV Amsterdam, The Netherlands; Psychiatry Department, Academic Medical Center, 1105 AZ Amsterdam, The Netherlands
- Floris P de Lange
- Donders Institute for Brain, Cognition and Behavior, Radboud University, 6525 EN Nijmegen, The Netherlands
867
Jeurissen D, Self MW, Roelfsema PR. Serial grouping of 2D-image regions with object-based attention in humans. eLife 2016; 5:e14320. [PMID: 27291188] [PMCID: PMC4905743] [DOI: 10.7554/eLife.14320]
Abstract
After an initial stage of local analysis within the retina and early visual pathways, the human visual system creates a structured representation of the visual scene by co-selecting image elements that are part of behaviorally relevant objects. The mechanisms underlying this perceptual organization process are only partially understood. Here we investigate the time-course of perceptual grouping of two-dimensional image-regions by measuring the reaction times of human participants and report that it is associated with the gradual spread of object-based attention. Attention spreads fastest over large and homogeneous areas and is slowed down at locations that require small-scale processing. We find that the time-course of the object-based selection process is well explained by a 'growth-cone' model, which selects surface elements in an incremental, scale-dependent manner. We discuss how the visual cortical hierarchy can implement this scale-dependent spread of object-based attention, leveraging the different receptive field sizes in distinct cortical areas.

When we look at an object, we perceive it as a whole. However, this is not how the brain processes objects. Instead, cells at early stages of the visual system respond selectively to single features of the object, such as edges. Moreover, each cell responds to its target feature in only a small region of space known as its receptive field. At higher levels of the visual system, cells respond to more complex features: angles rather than edges, for example. The receptive fields of the cells are also larger. For us to see an object, the brain must therefore 'stitch' together diverse features into a unified impression. This process is termed perceptual grouping. But how does it work? Jeurissen et al. hypothesized that this process depends on the visual system's attention spreading over a region in the image occupied by an object, and that the speed of the process will depend on the size of the receptive fields involved. If an image region is narrow, the visual system must recruit cells with small receptive fields to process the individual features. Grouping will therefore be slow. By contrast, if the object consists of large uniform areas lacking in detail, grouping should be fast. These assumptions give rise to a model called the "growth-cone model", which makes a number of specific predictions about reaction times during perceptual grouping. Jeurissen et al. tested the growth-cone model's predictions by measuring the speed of perceptual grouping in 160 human volunteers. These volunteers looked at an image made up of two simple shapes, and reported whether two dots fell on the same or different shapes. The results supported the growth-cone model. People were able to group large and uniform areas quickly, but were slower for narrow areas. Grouping also took more time when the distance between the dots increased. Hence, perceptual grouping of everyday objects calls on a step-by-step process that resembles solving a small maze. The results also revealed that perceptual grouping of simple shapes relies on the spreading of visual attention over the relevant object. Furthermore, the data support the hypothesis that perceptual grouping makes use of the different sizes of receptive fields at various levels of the visual system. Further research will be needed to translate these findings to the more complex natural scenes we encounter in our daily lives.
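One way to make the growth-cone idea concrete is to treat attentional spread as a shortest-path process whose local speed is set by the distance to the object boundary: wide, homogeneous regions are traversed quickly, narrow regions slowly. The Python sketch below is a loose illustration under that assumption, not the authors' model; predicted arrival times stand in for reaction times.

```python
import heapq
import numpy as np
from scipy.ndimage import distance_transform_edt

def growth_cone_time(shape_mask, seed):
    """Scale-dependent spread of object-based attention: the cost of
    crossing a pixel is inversely proportional to its distance from the
    object boundary, so narrow parts slow the spread."""
    width = distance_transform_edt(shape_mask)          # local scale
    t = np.full(shape_mask.shape, np.inf)
    t[seed] = 0.0
    heap = [(0.0, seed)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > t[r, c]:
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < t.shape[0] and 0 <= cc < t.shape[1] and shape_mask[rr, cc]:
                nd = d + 1.0 / width[rr, cc]            # narrow -> slow
                if nd < t[rr, cc]:
                    t[rr, cc] = nd
                    heapq.heappush(heap, (nd, (rr, cc)))
    return t   # arrival time at each pixel, np.inf outside the object

# Toy shape: two squares joined by a thin bridge; spread starts in the left one.
m = np.zeros((20, 50), bool)
m[:, :20] = m[:, 30:] = True
m[9:12, 20:30] = True
times = growth_cone_time(m, (10, 5))
print(f"same square: {times[10, 15]:.1f}, across the bridge: {times[10, 40]:.1f}")
```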
Affiliation(s)
- Danique Jeurissen
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Royal Netherlands Academy of Arts and Sciences, Amsterdam, The Netherlands
- Matthew W Self
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Royal Netherlands Academy of Arts and Sciences, Amsterdam, The Netherlands
- Pieter R Roelfsema
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Royal Netherlands Academy of Arts and Sciences, Amsterdam, The Netherlands; Department of Psychiatry, Academic Medical Center, Amsterdam, The Netherlands; Department of Integrative Neurophysiology, Centre for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
868
Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Sci Rep 2016; 6:27755. [PMID: 27282108] [PMCID: PMC4901271] [DOI: 10.1038/srep27755]
Abstract
The complex multi-stage architecture of cortical visual pathways provides the neural basis for efficient visual object recognition in humans. However, the stage-wise computations therein remain poorly understood. Here, we compared temporal (magnetoencephalography) and spatial (functional MRI) visual brain representations with representations in an artificial deep neural network (DNN) tuned to the statistics of real-world visual recognition. We showed that the DNN captured the stages of human visual processing in both time and space from early visual areas towards the dorsal and ventral streams. Further investigation of crucial DNN parameters revealed that while model architecture was important, training on real-world categorization was necessary to enforce spatio-temporal hierarchical relationships with the brain. Together our results provide an algorithmically informed view on the spatio-temporal dynamics of visual object recognition in the human visual brain.
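The time-resolved comparison in this line of work is a representational similarity analysis: at each time point, an RDM is computed from the MEG data and correlated with the RDM of a DNN layer. A hedged Python sketch of that analysis (correlation-distance RDMs, Spearman correlation of their upper triangles; shapes and names are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr

def rsa_time_course(meg, dnn_rdm):
    """Correlate one model RDM with time-resolved MEG RDMs.
    `meg` has shape (n_stimuli, n_sensors, n_times); the RDM at each time
    point is 1 - Pearson correlation between stimulus-wise sensor patterns."""
    n_stim, _, n_times = meg.shape
    iu = np.triu_indices(n_stim, k=1)
    rho = np.empty(n_times)
    for t in range(n_times):
        meg_rdm = 1.0 - np.corrcoef(meg[:, :, t])
        rho[t], _ = spearmanr(meg_rdm[iu], dnn_rdm[iu])
    return rho

# Toy usage: 12 stimuli, 64 sensors, 50 time points, random 'DNN layer' RDM.
rng = np.random.default_rng(3)
meg = rng.standard_normal((12, 64, 50))
dnn_rdm = 1.0 - np.corrcoef(rng.standard_normal((12, 100)))
print(rsa_time_course(meg, dnn_rdm)[:5].round(2))
```

Repeating this for each DNN layer yields the layer-by-time correspondence map: early layers should peak early and match early visual areas, deeper layers later.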
869
Kadipasaoglu CM, Conner CR, Whaley ML, Baboyan VG, Tandon N. Category-Selectivity in Human Visual Cortex Follows Cortical Topology: A Grouped icEEG Study. PLoS One 2016; 11:e0157109. [PMID: 27272936] [PMCID: PMC4896492] [DOI: 10.1371/journal.pone.0157109]
Abstract
Neuroimaging studies suggest that category-selective regions in higher-order visual cortex are topologically organized around specific anatomical landmarks: the mid-fusiform sulcus (MFS) in the ventral temporal cortex (VTC) and the lateral occipital sulcus (LOS) in the lateral occipital cortex (LOC). To derive precise structure-function maps from direct neural signals, we collected intracranial EEG (icEEG) recordings in a large human cohort (n = 26) undergoing implantation of subdural electrodes. A surface-based approach to grouped icEEG analysis was used to overcome challenges from sparse electrode coverage within subjects and variable cortical anatomy across subjects. The topology of category-selectivity in bilateral VTC and LOC was assessed for five classes of visual stimuli (faces, animate non-face stimuli [animals/body-parts], places, tools, and words) using correlational and linear mixed effects analyses. In the LOC, selectivity for living (faces and animate non-face) and non-living (places and tools) classes was arranged in a ventral-to-dorsal axis along the LOS. In the VTC, selectivity for living and non-living stimuli was arranged in a latero-medial axis along the MFS. Written-word selectivity was reliably localized to the intersection of the left MFS and the occipito-temporal sulcus. These findings provide direct electrophysiological evidence for topological information structuring of functional representations within higher-order visual cortex.
Affiliation(s)
- Cihan Mehmet Kadipasaoglu
- Vivian Smith Department of Neurosurgery, University of Texas Medical School at Houston, Houston, TX, United States of America
- Christopher Richard Conner
- Vivian Smith Department of Neurosurgery, University of Texas Medical School at Houston, Houston, TX, United States of America
- Meagan Lee Whaley
- Vivian Smith Department of Neurosurgery, University of Texas Medical School at Houston, Houston, TX, United States of America
- Vatche George Baboyan
- Vivian Smith Department of Neurosurgery, University of Texas Medical School at Houston, Houston, TX, United States of America
- Nitin Tandon
- Vivian Smith Department of Neurosurgery, University of Texas Medical School at Houston, Houston, TX, United States of America
- Memorial Hermann Hospital, Texas Medical Center, Houston, TX, United States of America
870
Ziemba CM, Freeman J, Movshon JA, Simoncelli EP. Selectivity and tolerance for visual texture in macaque V2. Proc Natl Acad Sci U S A 2016; 113:E3140-E3149.
Abstract
As information propagates along the ventral visual hierarchy, neuronal responses become both more specific for particular image features and more tolerant of image transformations that preserve those features. Here, we present evidence that neurons in area V2 are selective for local statistics that occur in natural visual textures, and tolerant of manipulations that preserve these statistics. Texture stimuli were generated by sampling from a statistical model, with parameters chosen to match the parameters of a set of visually distinct natural texture images. Stimuli generated with the same statistics are perceptually similar to each other despite differences, arising from the sampling process, in the precise spatial location of features. We assessed the accuracy with which these textures could be classified based on the responses of V1 and V2 neurons recorded individually in anesthetized macaque monkeys. We also assessed the accuracy with which particular samples could be identified, relative to other statistically matched samples. For populations of up to 100 cells, V1 neurons supported better performance in the sample identification task, whereas V2 neurons exhibited better performance in texture classification. Relative to V1, the responses of V2 show greater selectivity and tolerance for the representation of texture statistics.
871
Dehaqani MRA, Vahabie AH, Kiani R, Ahmadabadi MN, Araabi BN, Esteky H. Temporal dynamics of visual category representation in the macaque inferior temporal cortex. J Neurophysiol 2016; 116:587-601. [PMID: 27169503] [DOI: 10.1152/jn.00018.2016]
Abstract
Object categories are recognized at multiple levels of hierarchical abstraction. Psychophysical studies have shown more rapid perceptual access to mid-level category information (e.g., human faces) than to the higher (superordinate; e.g., animal) or the lower (subordinate; e.g., face identity) level. Mid-level category members share many features, whereas few features are shared among members of different mid-level categories. To better understand the neural basis of this expedited access to mid-level category information, we examined neural responses of the inferior temporal (IT) cortex of macaque monkeys viewing a large number of object images. We found an earlier representation of mid-level categories in the IT population and single-unit responses compared with superordinate- and subordinate-level categories. The short-latency representation of mid-level category information shows that visual cortex first divides the category shape space at its sharpest boundaries, defined by high within-group and low between-group similarity. This short-latency, mid-level category boundary map may be a prerequisite for representation of other categories at more global and finer scales.
Affiliation(s)
- Mohammad-Reza A Dehaqani
- School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran; Research Center for Brain and Cognitive Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Abdol-Hossein Vahabie
- School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran; Research Center for Brain and Cognitive Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Roozbeh Kiani
- School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran; Center for Neural Science, New York University, New York, New York
- Majid Nili Ahmadabadi
- School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran; Cognitive Systems Lab, Control and Intelligent Processing Centre of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
- Babak Nadjar Araabi
- School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran; Cognitive Systems Lab, Control and Intelligent Processing Centre of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
- Hossein Esteky
- School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran; Research Center for Brain and Cognitive Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
872
Perceptual similarity of visual patterns predicts dynamic neural activation patterns measured with MEG. Neuroimage 2016; 132:59-70. [DOI: 10.1016/j.neuroimage.2016.02.019]
873
Tian M, Yamins D, Grill-Spector K. Learning the 3-D structure of objects from 2-D views depends on shape, not format. J Vis 2016; 16(7):7. [PMID: 27153196] [PMCID: PMC4898268] [DOI: 10.1167/16.7.7]
Abstract
Humans can learn to recognize new objects just from observing example views. However, it is unknown what structural information enables this learning. To address this question, we manipulated the amount of structural information given to subjects during unsupervised learning by varying the format of the trained views. We then tested how format affected participants' ability to discriminate similar objects across views that were rotated 90° apart. We found that, after training, participants' performance increased and generalized to new views in the same format. Surprisingly, the improvement was similar across line drawings, shape from shading, and shape from shading + stereo even though the latter two formats provide richer depth information compared to line drawings. In contrast, participants' improvement was significantly lower when training used silhouettes, suggesting that silhouettes do not have enough information to generate a robust 3-D structure. To test whether the learned object representations were format-specific or format-invariant, we examined if learning novel objects from example views transfers across formats. We found that learning objects from example line drawings transferred to shape from shading and vice versa. These results have important implications for theories of object recognition because they suggest that (a) learning the 3-D structure of objects does not require rich structural cues during training as long as shape information of internal and external features is provided and (b) learning generates shape-based object representations independent of the training format.
874
Kubilius J, Bracci S, Op de Beeck HP. Deep Neural Networks as a Computational Model for Human Shape Sensitivity. PLoS Comput Biol 2016; 12:e1004896. [PMID: 27124699] [PMCID: PMC4849740] [DOI: 10.1371/journal.pcbi.1004896]
Abstract
Theories of object recognition agree that shape is of primordial importance, but there is no consensus about how shape might be represented, and so far attempts to implement a model of shape perception that would work with realistic stimuli have largely failed. Recent studies suggest that state-of-the-art convolutional 'deep' neural networks (DNNs) capture important aspects of human object perception. We hypothesized that these successes might be partially related to a human-like representation of object shape. Here we demonstrate that sensitivity for shape features, characteristic of human and primate vision, emerges in DNNs when trained for generic object recognition from natural photographs. We show that these models explain human shape judgments for several benchmark behavioral and neural stimulus sets on which earlier models mostly failed. In particular, although never explicitly trained for such stimuli, DNNs develop acute sensitivity to minute variations in shape and to non-accidental properties that have long been implicated to form the basis for object recognition. Even more strikingly, when tested with a challenging stimulus set in which shape and category membership are dissociated, the most complex model architectures capture human shape sensitivity as well as some aspects of the category structure that emerges from human judgments. As a whole, these results indicate that convolutional neural networks not only learn physically correct representations of object categories but also develop perceptually accurate representational spaces of shapes. An even more complete model of human object representations might be in sight by training deep architectures for multiple tasks, which is so characteristic of human development.

Shape plays an important role in object recognition. Despite years of research, no model of vision could account for shape understanding as found in human vision of natural images. Given recent successes of deep neural networks (DNNs) in object recognition, we hypothesized that DNNs might in fact learn to capture perceptually salient shape dimensions. Using a variety of stimulus sets, we demonstrate here that the output layers of several DNNs develop representations that relate closely to human perceptual shape judgments. Surprisingly, such sensitivity to shape develops in these models even though they were never explicitly trained for shape processing. Moreover, we show that these models also represent categorical object similarity that follows human semantic judgments, albeit to a lesser extent. Taken together, our results bring forward the exciting idea that DNNs capture not only objective dimensions of stimuli, such as their category, but also their subjective, or perceptual, aspects, such as shape and semantic similarity as judged by humans.
Affiliation(s)
- Jonas Kubilius
- Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium
- Stefania Bracci
- Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium
- Hans P. Op de Beeck
- Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium
875
Wood JN, Wood SMW. The development of newborn object recognition in fast and slow visual worlds. Proc Biol Sci 2016; 283:20160166. [PMID: 27097925] [PMCID: PMC4855384] [DOI: 10.1098/rspb.2016.0166]
Abstract
Object recognition is central to perception and cognition. Yet relatively little is known about the environmental factors that cause invariant object recognition to emerge in the newborn brain. Is this ability a hardwired property of vision? Or does the development of invariant object recognition require experience with a particular kind of visual environment? Here, we used a high-throughput controlled-rearing method to examine whether newborn chicks (Gallus gallus) require visual experience with slowly changing objects to develop invariant object recognition abilities. When newborn chicks were raised with a slowly rotating virtual object, the chicks built invariant object representations that generalized across novel viewpoints and rotation speeds. In contrast, when newborn chicks were raised with a virtual object that rotated more quickly, the chicks built viewpoint-specific object representations that failed to generalize to novel viewpoints and rotation speeds. Moreover, there was a direct relationship between the speed of the object and the amount of invariance in the chick's object representation. Thus, visual experience with slowly changing objects plays a critical role in the development of invariant object recognition. These results indicate that invariant object recognition is not a hardwired property of vision, but is learned rapidly when newborns encounter a slowly changing visual world.
Affiliation(s)
- Justin N Wood
- Department of Psychology, University of Southern California, Los Angeles, CA 90089, USA
- Samantha M W Wood
- Department of Psychology, University of Southern California, Los Angeles, CA 90089, USA
876
A specialized face-processing model inspired by the organization of monkey face patches explains several face-specific phenomena observed in humans. Sci Rep 2016; 6:25025. [PMID: 27113635] [PMCID: PMC4844965] [DOI: 10.1038/srep25025]
Abstract
Converging reports indicate that face images are processed through specialized neural networks in the brain, i.e., face patches in monkeys and the fusiform face area (FFA) in humans. These studies were designed to find out how faces are processed in the visual system compared to other objects. Yet the underlying mechanism of face processing is not completely understood. Here, we show that a hierarchical computational model, inspired by electrophysiological evidence on face processing in primates, is able to generate representational properties similar to those observed in monkey face patches (posterior, middle, and anterior patches). Since the most important goal of sensory neuroscience is linking neural responses with behavioral outputs, we test whether the proposed model, which is designed to account for neural responses in monkey face patches, is also able to predict well-documented behavioral face phenomena observed in humans. We show that the proposed model reproduces several cognitive face effects, such as the composite face effect and the idea of canonical face views. Our model provides insights into the underlying computations that transfer visual information from posterior to anterior face patches.
877
Wood JN, Prasad A, Goldman JG, Wood SMW. Enhanced learning of natural visual sequences in newborn chicks. Anim Cogn 2016; 19:835-845. [PMID: 27079969] [DOI: 10.1007/s10071-016-0982-5]
Abstract
To what extent are newborn brains designed to operate over natural visual input? To address this question, we used a high-throughput controlled-rearing method to examine whether newborn chicks (Gallus gallus) show enhanced learning of natural visual sequences at the onset of vision. We took the same set of images and grouped them into either natural sequences (i.e., sequences showing different viewpoints of the same real-world object) or unnatural sequences (i.e., sequences showing different images of different real-world objects). When raised in virtual worlds containing natural sequences, newborn chicks developed the ability to recognize familiar images of objects. Conversely, when raised in virtual worlds containing unnatural sequences, newborn chicks' object recognition abilities were severely impaired. In fact, the majority of the chicks raised with the unnatural sequences failed to recognize familiar images of objects despite acquiring over 100 h of visual experience with those images. Thus, newborn chicks show enhanced learning of natural visual sequences at the onset of vision. These results indicate that newborn brains are designed to operate over natural visual input.
Affiliation(s)
- Justin N Wood
- Department of Psychology, University of Southern California, Building SGM, Room 501, 3620 South McClintock Avenue, Los Angeles, CA, 90089, USA.
- Aditya Prasad
- Department of Psychology, University of Southern California, Building SGM, Room 501, 3620 South McClintock Avenue, Los Angeles, CA, 90089, USA
- Jason G Goldman
- Department of Psychology, University of Southern California, Building SGM, Room 501, 3620 South McClintock Avenue, Los Angeles, CA, 90089, USA
- Samantha M W Wood
- Department of Psychology, University of Southern California, Building SGM, Room 501, 3620 South McClintock Avenue, Los Angeles, CA, 90089, USA
878
Iordan MC, Greene MR, Beck DM, Fei-Fei L. Typicality sharpens category representations in object-selective cortex. Neuroimage 2016; 134:170-179. [PMID: 27079531] [DOI: 10.1016/j.neuroimage.2016.04.012]
Abstract
The purpose of categorization is to identify generalizable classes of objects whose members can be treated equivalently. Within a category, however, some exemplars are more representative of that concept than others. Despite long-standing behavioral effects, little is known about how typicality influences the neural representation of real-world objects from the same category. Using fMRI, we showed participants 64 subordinate object categories (exemplars) grouped into 8 basic categories. Typicality for each exemplar was assessed behaviorally and we used several multi-voxel pattern analyses to characterize how typicality affects the pattern of responses elicited in early visual and object-selective areas: V1, V2, V3v, hV4, LOC. We found that in LOC, but not in early areas, typical exemplars elicited activity more similar to the central category tendency and created sharper category boundaries than less typical exemplars, suggesting that typicality enhances within-category similarity and between-category dissimilarity. Additionally, we uncovered a brain region (cIPL) where category boundaries favor less typical categories. Our results suggest that typicality may constitute a previously unexplored principle of organization for intra-category neural structure and, furthermore, that this representation is not directly reflected in image features describing natural input, but rather built by the visual system at an intermediate processing stage.
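One of the analyses implied above, the similarity of each exemplar's multi-voxel pattern to its category's central tendency, can be sketched in a few lines. This is an illustrative Python reconstruction (leave-one-out centroids to avoid bias; names are hypothetical), not the authors' exact pipeline.

```python
import numpy as np

def typicality_scores(patterns, labels):
    """Correlation of each exemplar's response pattern with the mean
    pattern of the other exemplars from the same category."""
    scores = np.empty(len(labels))
    for i, lab in enumerate(labels):
        others = [j for j, l in enumerate(labels) if l == lab and j != i]
        centroid = patterns[others].mean(axis=0)
        scores[i] = np.corrcoef(patterns[i], centroid)[0, 1]
    return scores   # compare with behavioral typicality ratings

# Toy usage: 16 exemplars (4 categories x 4 exemplars), 200 voxels.
rng = np.random.default_rng(4)
labels = np.repeat(np.arange(4), 4)
patterns = rng.standard_normal((16, 200)) + labels[:, None]
print(typicality_scores(patterns, labels).round(2))
```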
Affiliation(s)
- Michelle R Greene
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA.
- Diane M Beck
- Beckman Institute and Department of Psychology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
- Li Fei-Fei
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA.
879
Stimulus features coded by single neurons of a macaque body category selective patch. Proc Natl Acad Sci U S A 2016; 113:E2450-E2459. [PMID: 27071095] [DOI: 10.1073/pnas.1520371113]
Abstract
Body category-selective regions of the primate temporal cortex respond to images of bodies, but it is unclear which fragments of such images drive single neurons' responses in these regions. Here we applied the Bubbles technique to the responses of single macaque middle superior temporal sulcus (midSTS) body patch neurons to reveal the image fragments the neurons respond to. We found that local image fragments such as extremities (limbs), curved boundaries, and parts of the torso drove the large majority of neurons. Bubbles revealed the whole body in only a few neurons. Neurons coded the features in a manner that was tolerant to translation and scale changes. Most image fragments were excitatory but for a few neurons both inhibitory and excitatory fragments (opponent coding) were present in the same image. The fragments we reveal here in the body patch with Bubbles differ from those suggested in previous studies of face-selective neurons in face patches. Together, our data indicate that the majority of body patch neurons respond to local image fragments that occur frequently, but not exclusively, in bodies, with a coding that is tolerant to translation and scale. Overall, the data suggest that the body category selectivity of the midSTS body patch depends more on the feature statistics of bodies (e.g., extensions occur more frequently in bodies) than on semantics (bodies as an abstract category).
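The Bubbles technique itself is straightforward to sketch: each trial reveals the stimulus through a random set of Gaussian apertures, and the diagnostic map is the response-weighted average of those apertures relative to their unweighted average. The Python toy below reconstructs that logic under those assumptions; it is not the authors' analysis code.

```python
import numpy as np

def random_bubble_mask(shape, n_bubbles=5, sigma=4.0, rng=None):
    """One trial's aperture: a sum of Gaussian windows ('bubbles') at
    random image locations, clipped to [0, 1]."""
    rng = rng or np.random.default_rng()
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    mask = np.zeros(shape)
    for cy, cx in rng.random((n_bubbles, 2)) * np.array(shape):
        mask += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    return np.clip(mask, 0.0, 1.0)

def bubbles_map(responses, masks):
    """Diagnostic map: response-weighted mean aperture minus the plain
    mean aperture, so fragments that drove the response stand out."""
    masks, r = np.asarray(masks, float), np.asarray(responses, float)
    return (r[:, None, None] * masks).mean(axis=0) / r.mean() - masks.mean(axis=0)

# Toy usage: a simulated unit that fires when the upper-left region is visible.
rng = np.random.default_rng(5)
masks = [random_bubble_mask((32, 32), rng=rng) for _ in range(500)]
resp = [m[:8, :8].mean() for m in masks]
print("map peak:", np.unravel_index(bubbles_map(resp, masks).argmax(), (32, 32)))
```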
880
Hong H, Yamins DLK, Majaj NJ, DiCarlo JJ. Explicit information for category-orthogonal object properties increases along the ventral stream. Nat Neurosci 2016; 19:613-622. [PMID: 26900926] [DOI: 10.1038/nn.4247]
Abstract
Extensive research has revealed that the ventral visual stream hierarchically builds a robust representation for supporting visual object categorization tasks. We systematically explored the ability of multiple ventral visual areas to support a variety of 'category-orthogonal' object properties such as position, size and pose. For complex naturalistic stimuli, we found that the inferior temporal (IT) population encodes all measured category-orthogonal object properties, including those properties often considered to be low-level features (for example, position), more explicitly than earlier ventral stream areas. We also found that the IT population better predicts human performance patterns across properties. A hierarchical neural network model based on simple computational principles generates these same cross-area patterns of information. Taken together, our empirical results support the hypothesis that all behaviorally relevant object properties are extracted in concert up the ventral visual hierarchy, and our computational model explains how that hierarchy might be built.
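"Explicit" information is operationalized here by linear readout: a property is explicitly encoded to the extent that a simple linear decoder can recover it from the population response. A minimal cross-validated ridge-regression sketch of that logic in Python (toy data; names are illustrative):

```python
import numpy as np

def decode_property(responses, prop, n_train, lam=1.0):
    """Train a ridge-regression readout of a continuous object property
    (e.g., horizontal position) on one half of the trials and report its
    accuracy, as a correlation, on the held-out half."""
    Xtr, ytr = responses[:n_train], prop[:n_train]
    Xte, yte = responses[n_train:], prop[n_train:]
    W = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(Xtr.shape[1]), Xtr.T @ ytr)
    return np.corrcoef(Xte @ W, yte)[0, 1]

# Toy usage: a 'population' of 50 units that mix position with other signals.
rng = np.random.default_rng(6)
pos = rng.uniform(-1, 1, 400)
units = np.tanh(rng.standard_normal((400, 50)) + pos[:, None] * rng.standard_normal(50))
print(f"decoded position, r = {decode_property(units, pos, 200):.2f}")
```

Running the same readout on populations recorded from different areas is what supports the claim that explicitness of category-orthogonal properties increases along the hierarchy.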
Affiliation(s)
- Ha Hong
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Harvard-Massachusetts Institute of Technology Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Daniel L K Yamins
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Najib J Majaj
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- James J DiCarlo
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
881
Yamins DLK, DiCarlo JJ. Eight open questions in the computational modeling of higher sensory cortex. Curr Opin Neurobiol 2016; 37:114-120. [DOI: 10.1016/j.conb.2016.02.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2015] [Revised: 02/03/2016] [Accepted: 02/04/2016] [Indexed: 10/22/2022]
882
883
Serre T. Models of visual categorization. Wiley Interdiscip Rev Cogn Sci 2016; 7:197-213. [DOI: 10.1002/wcs.1385] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/17/2013] [Revised: 01/12/2016] [Accepted: 01/13/2016] [Indexed: 11/08/2022]
Affiliation(s)
- Thomas Serre
- Cognitive, Linguistic & Psychological Sciences Department, Institute for Brain Sciences, Brown University, Providence, RI, USA
884
Ramkumar P, Hansen BC, Pannasch S, Loschky LC. Visual information representation and rapid-scene categorization are simultaneous across cortex: An MEG study. Neuroimage 2016; 134:295-304. [PMID: 27001497 DOI: 10.1016/j.neuroimage.2016.03.027] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2015] [Revised: 03/04/2016] [Accepted: 03/13/2016] [Indexed: 11/17/2022] Open
Abstract
Perceiving the visual world around us requires the brain to represent the features of stimuli and to categorize the stimulus based on these features. Incorrect categorization can result either from errors in visual representation or from errors in processes that lead to categorical choice. To understand the temporal relationship between the neural signatures of such systematic errors, we recorded whole-scalp magnetoencephalography (MEG) data from human subjects performing a rapid-scene categorization task. We built scene category decoders based on (1) spatiotemporally resolved neural activity, (2) spatial envelope (SpEn) image features, and (3) behavioral responses. Using confusion matrices, we tracked how well the pattern of errors from neural decoders could be explained by SpEn decoders and behavioral errors, over time and across cortical areas. Across the visual cortex and the medial temporal lobe, we found that both SpEn and behavioral errors explained unique variance in the errors of neural decoders. Critically, these effects were nearly simultaneous, and most prominent between 100 and 250 ms after stimulus onset. Thus, during rapid-scene categorization, neural processes that ultimately result in behavioral categorization are simultaneous and co-localized with neural processes underlying visual information representation.
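The error-pattern analysis can be sketched compactly. Assuming the three decoders have produced row-normalized confusion matrices (the names `cm_neural`, `cm_spen`, and `cm_behavior` below are hypothetical), "unique variance" amounts to a partial correlation between off-diagonal error patterns:

```python
import numpy as np

def confusion_matrix(true_labels, predicted, n_classes):
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, predicted):
        cm[t, p] += 1
    return cm / cm.sum(axis=1, keepdims=True)    # rows: P(predicted | true)

def off_diagonal(cm):
    return cm[~np.eye(cm.shape[0], dtype=bool)]  # the error pattern

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing z out of both."""
    def residual(a, b):
        return a - np.polyval(np.polyfit(b, a, 1), b)
    return np.corrcoef(residual(x, z), residual(y, z))[0, 1]

# Unique contribution of SpEn errors to the neural-decoder errors, controlling
# for behavioral errors (swap arguments for the converse question):
# partial_corr(off_diagonal(cm_neural), off_diagonal(cm_spen), off_diagonal(cm_behavior))
```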
Affiliation(s)
- Pavan Ramkumar
- Brain Research Unit, O.V. Lounasmaa Laboratory, Aalto University School of Science, Espoo, Finland.
- Bruce C Hansen
- Department of Psychology and Neuroscience Program, Colgate University, Hamilton, NY, USA.
- Sebastian Pannasch
- Brain Research Unit, O.V. Lounasmaa Laboratory, Aalto University School of Science, Espoo, Finland; Department of Psychology, Technische Universität Dresden, Dresden, Germany.
- Lester C Loschky
- Department of Psychological Sciences, Kansas State University, Manhattan, KS, USA.
885
Ullman S, Assif L, Fetaya E, Harari D. Atoms of recognition in human and computer vision. Proc Natl Acad Sci U S A 2016; 113:2744-9. [PMID: 26884200 PMCID: PMC4790978 DOI: 10.1073/pnas.1513198113] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Discovering the visual features and representations used by the brain to recognize objects is a central problem in the study of vision. Recently, neural network models of visual object recognition, including biological and deep network models, have shown remarkable progress and have begun to rival human performance in some challenging tasks. These models are trained on image examples and learn to extract features and representations and to use them for categorization. It remains unclear, however, whether the representations and learning processes discovered by current models are similar to those used by the human visual system. Here we show, by introducing and using minimal recognizable images, that the human visual system uses features and processes that are not used by current models and that are critical for recognition. We found by psychophysical studies that at the level of minimal recognizable images a minute change in the image can have a drastic effect on recognition, thus identifying features that are critical for the task. Simulations then showed that current models cannot explain this sensitivity to precise feature configurations and, more generally, do not learn to recognize minimal images at a human level. The role of the features shown here is revealed uniquely at the minimal level, where the contribution of each feature is essential. A full understanding of the learning and use of such features will extend our understanding of visual recognition and its cortical mechanisms and will enhance the capacity of computational models to learn from visual experience and to deal with recognition and detailed image interpretation.
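The search for minimal recognizable images can be phrased as a recursive procedure: an image qualifies when it is recognized but none of its slightly reduced descendants are. The sketch below assumes a black-box `is_recognized` predicate (human observers in the psychophysics, a trained model in the simulations) and uses Pillow; the roughly 20% reduction step follows the paper's description, while everything else is an illustrative assumption.

```python
from PIL import Image

def descendants(img, crop_frac=0.8):
    """Five slightly reduced versions: four corner crops and one downscaling."""
    w, h = img.size
    cw, ch = int(w * crop_frac), int(h * crop_frac)
    crops = [img.crop(box) for box in
             [(0, 0, cw, ch), (w - cw, 0, w, ch),
              (0, h - ch, cw, h), (w - cw, h - ch, w, h)]]
    return crops + [img.resize((cw, ch))]

def find_minimal_images(img, is_recognized, found=None):
    """Collect images that are recognized while all of their descendants
    are not: the 'minimal recognizable image' criterion."""
    if found is None:
        found = []
    if min(img.size) < 8 or not is_recognized(img):
        return found
    children = descendants(img)
    if not any(is_recognized(c) for c in children):
        found.append(img)   # one more small reduction abolishes recognition
        return found
    for c in children:
        find_minimal_images(c, is_recognized, found)
    return found
```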
Affiliation(s)
- Shimon Ullman
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- Liav Assif
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Ethan Fetaya
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Daniel Harari
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel; McGovern Institute for Brain Research, Cambridge, MA 02139
886
Tanaka H. Modeling the motor cortex: Optimality, recurrent neural networks, and spatial dynamics. Neurosci Res 2016; 104:64-71. [DOI: 10.1016/j.neures.2015.10.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2015] [Revised: 10/16/2015] [Accepted: 10/19/2015] [Indexed: 01/28/2023]
887
Yamins DLK, DiCarlo JJ. Using goal-driven deep learning models to understand sensory cortex. Nat Neurosci 2016; 19:356-65. [PMID: 26906502 DOI: 10.1038/nn.4244] [Citation(s) in RCA: 699] [Impact Index Per Article: 77.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2015] [Accepted: 01/13/2016] [Indexed: 11/08/2022]
Abstract
Fueled by innovation in the computer vision and artificial intelligence communities, recent developments in computational neuroscience have used goal-driven hierarchical convolutional neural networks (HCNNs) to make strides in modeling neural single-unit and population responses in higher visual cortical areas. In this Perspective, we review the recent progress in a broader modeling context and describe some of the key technical innovations that have supported it. We then outline how the goal-driven HCNN approach can be used to delve even more deeply into understanding the development and organization of sensory cortical processing.
Affiliation(s)
- Daniel L K Yamins
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- James J DiCarlo
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
888
Song HF, Yang GR, Wang XJ. Training Excitatory-Inhibitory Recurrent Neural Networks for Cognitive Tasks: A Simple and Flexible Framework. PLoS Comput Biol 2016; 12:e1004792. [PMID: 26928718 PMCID: PMC4771709 DOI: 10.1371/journal.pcbi.1004792] [Citation(s) in RCA: 134] [Impact Index Per Article: 14.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2015] [Accepted: 02/04/2016] [Indexed: 12/20/2022] Open
Abstract
The ability to simultaneously record from large numbers of neurons in behaving animals has ushered in a new era for the study of the neural circuit mechanisms underlying cognitive functions. One promising approach to uncovering the dynamical and computational principles governing population responses is to analyze model recurrent neural networks (RNNs) that have been optimized to perform the same tasks as behaving animals. Because the optimization of network parameters specifies the desired output but not the manner in which to achieve this output, “trained” networks serve as a source of mechanistic hypotheses and a testing ground for data analyses that link neural computation to behavior. Complete access to the activity and connectivity of the circuit, and the ability to manipulate them arbitrarily, make trained networks a convenient proxy for biological circuits and a valuable platform for theoretical investigation. However, existing RNNs lack basic biological features such as the distinction between excitatory and inhibitory units (Dale’s principle), which are essential if RNNs are to provide insights into the operation of biological circuits. Moreover, trained networks can achieve the same behavioral performance but differ substantially in their structure and dynamics, highlighting the need for a simple and flexible framework for the exploratory training of RNNs. Here, we describe a framework for gradient descent-based training of excitatory-inhibitory RNNs that can incorporate a variety of biological knowledge. We provide an implementation based on the machine learning library Theano, whose automatic differentiation capabilities facilitate modifications and extensions. We validate this framework by applying it to well-known experimental paradigms such as perceptual decision-making, context-dependent integration, multisensory integration, parametric working memory, and motor sequence generation. Our results demonstrate the wide range of neural activity patterns and behavior that can be modeled, and suggest a unified setting in which diverse cognitive computations and mechanisms can be studied.

Cognitive functions arise from the coordinated activity of many interconnected neurons. As neuroscientists increasingly use large datasets of simultaneously recorded neurons to study the brain, one approach that has emerged as a promising tool for interpreting population responses is to analyze model recurrent neural networks (RNNs) that have been optimized to perform the same tasks as recorded animals. Complete access to the activity and connectivity of the circuit, and the ability to manipulate them in arbitrary ways, make trained networks a convenient proxy for biological circuits and a valuable platform for theoretical investigation. However, existing RNNs lack basic biological features that are essential if RNNs are to provide insights into the circuit-level operation of the brain. Moreover, trained networks can achieve the same behavioral performance but differ substantially in their structure and dynamics, highlighting the need for a simple and flexible framework for the exploratory training of RNNs. Here we describe and provide an implementation for such a framework, which we apply to several well-known experimental paradigms that illustrate the diversity of detail that can be modeled. Our work provides a foundation for neuroscientists to harness trained RNNs in their own investigations of the neural basis of cognition.
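The framework's central constraint, separate excitatory and inhibitory units, reduces to a sign-constrained parameterization of the recurrent weight matrix. A minimal NumPy sketch of that idea follows; the sizes, gains, and rectified-linear rate function are illustrative choices, and the published implementation builds the equivalent constraint into Theano so that gradients flow through it during training.

```python
import numpy as np

rng = np.random.default_rng(0)

n_exc, n_inh = 80, 20                 # e.g., a 4:1 E:I ratio
n = n_exc + n_inh
dt, tau = 10.0, 100.0                 # ms

# Dale's principle: the sign of every outgoing weight is fixed by the
# presynaptic unit's type (+1 excitatory, -1 inhibitory).
sign = np.diag([1.0] * n_exc + [-1.0] * n_inh)
w_free = rng.normal(scale=0.1, size=(n, n))   # unconstrained trainable parameter

def recurrent_weights(w_free):
    # |w_free| @ sign keeps excitatory columns nonnegative and inhibitory
    # columns nonpositive, whatever values gradient descent writes to w_free.
    return np.abs(w_free) @ sign

def step(x, u, w_in):
    """One Euler step of tau dx/dt = -x + W r + W_in u, with rates r = [x]_+."""
    r = np.maximum(x, 0.0)            # firing rates are nonnegative
    dx = (-x + recurrent_weights(w_free) @ r + w_in @ u) / tau
    return x + dt * dx
```

Training then proceeds by ordinary gradient descent on w_free; the rectification and the fixed sign matrix guarantee that the learned network still obeys Dale's principle.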
Affiliation(s)
- H. Francis Song
- Center for Neural Science, New York University, New York, New York, United States of America
- Guangyu R. Yang
- Center for Neural Science, New York University, New York, New York, United States of America
- Xiao-Jing Wang
- Center for Neural Science, New York University, New York, New York, United States of America
- NYU-ECNU Institute of Brain and Cognitive Science, NYU Shanghai, Shanghai, China
889
Golden JR, Vilankar KP, Wu MCK, Field DJ. Conjectures regarding the nonlinear geometry of visual neurons. Vision Res 2016; 120:74-92. [PMID: 26902730 DOI: 10.1016/j.visres.2015.10.015] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2014] [Revised: 09/16/2015] [Accepted: 10/10/2015] [Indexed: 12/01/2022]
Abstract
From the earliest stages of sensory processing, neurons show inherent non-linearities: the response to a complex stimulus is not a sum of the responses to a set of constituent basis stimuli. These non-linearities come in a number of forms and have been explained in terms of a number of functional goals. The family of spatial non-linearities has included interactions that occur both within and outside of the classical receptive field. They include saturation, cross-orientation inhibition, contrast normalization, end-stopping, and a variety of non-classical effects. In addition, neurons show a number of facilitatory and invariance-related effects such as those exhibited by complex cells (integration across position). Here, we describe an approach that attempts to explain many of the non-linearities under a single geometric framework. In line with Zetzsche and colleagues (e.g., Zetzsche et al., 1999), we propose that many of the principal non-linearities can be described by a geometry where the neural response space has a simple curvature. In this paper, we focus on the geometry that produces both increased selectivity (curving outward) and increased tolerance (curving inward). We demonstrate that overcomplete sparse coding with both low-dimensional synthetic data and high-dimensional natural scene data can result in curvature that is responsible for a variety of different known non-classical effects, including end-stopping and gain control. We believe that this approach provides a more fundamental explanation of these non-linearities and does not require that one postulate a variety of separate explanations (e.g., that gain must be controlled or the ends of lines must be detected). In its standard form, however, sparse coding does not produce the invariance/tolerance represented by inward curvature. We speculate on some of the requirements needed to produce such curvature.
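As a concrete anchor for the sparse-coding step, an overcomplete code can be inferred in a few lines with ISTA (iterative soft-thresholding). In the sketch below the dictionary is random with twice as many atoms as pixels; in the study it would be learned from natural scenes, so the values here are illustrative assumptions. The geometric analysis then examines how such codes curve as the stimulus moves along a trajectory.

```python
import numpy as np

def ista_sparse_code(D, x, lam=0.1, n_iter=200):
    """Minimize 0.5 * ||x - D a||^2 + lam * ||a||_1 by soft-thresholded gradient steps."""
    L = np.linalg.norm(D, ord=2) ** 2           # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a -= D.T @ (D @ a - x) / L              # gradient step on the quadratic term
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(0)
n_pixels, n_atoms = 64, 128                     # 2x overcomplete dictionary
D = rng.normal(size=(n_pixels, n_atoms))
D /= np.linalg.norm(D, axis=0)                  # unit-norm atoms
x = rng.normal(size=n_pixels)                   # stands in for a whitened image patch
a = ista_sparse_code(D, x)
print(f"{np.mean(a != 0):.0%} of coefficients active")
```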
Affiliation(s)
- James R Golden
- Department of Psychology, Cornell University, Ithaca, NY, USA.
- Michael C K Wu
- Biophysics Graduate Group, University of California, Berkeley, CA, USA; Lithium Technologies Inc., San Francisco, CA, USA.
- David J Field
- Department of Psychology, Cornell University, Ithaca, NY, USA.
890
Miller KD. Canonical computations of cerebral cortex. Curr Opin Neurobiol 2016; 37:75-84. [PMID: 26868041 DOI: 10.1016/j.conb.2016.01.008] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2016] [Accepted: 01/14/2016] [Indexed: 12/23/2022]
Abstract
The idea that there is a fundamental cortical circuit that performs canonical computations remains compelling though far from proven. Here we review evidence for two canonical operations within sensory cortical areas: a feedforward computation of selectivity; and a recurrent computation of gain in which, given sufficiently strong external input, perhaps from multiple sources, intracortical input largely, but not completely, cancels this external input. This operation leads to many characteristic cortical nonlinearities in integrating multiple stimuli. The cortical computation must combine such local processing with hierarchical processing across areas. We point to important changes in moving from sensory cortex to motor and frontal cortex and the possibility of substantial differences between cortex in rodents vs. species with columnar organization of selectivity.
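One concretely implemented family of models for this recurrent gain operation is the supralinear stabilized network, in which each population's rate is a power law of its net input and strong recurrent inhibition progressively cancels the external drive as that drive grows. A minimal two-population sketch follows; the parameter values are illustrative, chosen so that inhibition dominates at high rates, and are not taken from this review.

```python
import numpy as np

k, n = 0.04, 2.0                       # supralinear power-law gain: k * [input]_+^n
tau = np.array([20.0, 10.0])           # ms; E and I time constants
W = np.array([[1.0, -1.0],             # W[post, pre]; E column +, I column -
              [1.5, -1.2]])

def steady_state(h, dt=0.1, n_steps=50000):
    r = np.zeros(2)
    for _ in range(n_steps):
        drive = W @ r + h              # recurrent input partially cancels h
        r += dt / tau * (-r + k * np.maximum(drive, 0.0) ** n)
    return r

for c in [1.0, 2.0, 20.0, 40.0]:       # external drive, e.g. contrast levels
    rE, rI = steady_state(np.array([c, c]))
    print(f"input {c:5.1f} -> E rate {rE:8.3f}, I rate {rI:8.3f}")
```

With these illustrative parameters, the growth per doubling of input falls from roughly fourfold at weak drive toward twofold at strong drive, the signature of intracortical input increasingly, but not completely, cancelling the external input.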
Affiliation(s)
- Kenneth D Miller
- Center for Theoretical Neuroscience, Department of Neuroscience, Swartz Program in Theoretical Neuroscience, Kavli Institute for Brain Science, College of Physicians and Surgeons, Columbia University, New York, NY 10032-2695, United States.
891
Fusi S, Miller EK, Rigotti M. Why neurons mix: high dimensionality for higher cognition. Curr Opin Neurobiol 2016; 37:66-74. [PMID: 26851755 DOI: 10.1016/j.conb.2016.01.010] [Citation(s) in RCA: 394] [Impact Index Per Article: 43.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2015] [Revised: 01/14/2016] [Accepted: 01/18/2016] [Indexed: 12/15/2022]
Abstract
Neurons often respond to diverse combinations of task-relevant variables. This form of mixed selectivity plays an important computational role which is related to the dimensionality of the neural representations: high-dimensional representations with mixed selectivity allow a simple linear readout to generate a huge number of different potential responses. In contrast, neural representations based on highly specialized neurons are low dimensional and they preclude a linear readout from generating several responses that depend on multiple task-relevant variables. Here we review the conceptual and theoretical framework that explains the importance of mixed selectivity and the experimental evidence that recorded neural representations are high-dimensional. We end by discussing the implications for the design of future experiments.
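The computational point, that nonlinear mixed selectivity expands dimensionality and thereby the set of responses a linear readout can implement, fits in a few lines. The toy sketch below uses synthetic data, with a single product neuron standing in for nonlinear mixing; it shows a linear classifier failing on an XOR-like dichotomy under pure selectivity and succeeding once a mixed-selective dimension is added.

```python
import numpy as np
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=400)   # task variable 1
b = rng.integers(0, 2, size=400)   # task variable 2
y = a ^ b                          # XOR: not linearly separable from (a, b) alone

def population(mixed):
    cols = [a, b]                  # 'pure' neurons, each coding one variable
    if mixed:
        cols.append(a * b)         # a nonlinearly mixed-selective neuron
    X = np.stack(cols, axis=1).astype(float)
    return X + rng.normal(scale=0.1, size=X.shape)

for mixed in (False, True):
    acc = Perceptron(max_iter=1000).fit(population(mixed), y).score(population(mixed), y)
    print(f"mixed selectivity = {mixed}: linear readout accuracy on XOR = {acc:.2f}")
```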
Affiliation(s)
- Stefano Fusi
- Center for Theoretical Neuroscience, Columbia University College of Physicians and Surgeons, USA.
- Earl K Miller
- The Picower Institute for Learning and Memory & Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, USA
- Mattia Rigotti
- IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
892
Giese MA, Rizzolatti G. Neural and Computational Mechanisms of Action Processing: Interaction between Visual and Motor Representations. Neuron 2015; 88:167-80. [PMID: 26447579 DOI: 10.1016/j.neuron.2015.09.040] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Abstract
Action recognition has received enormous interest in the field of neuroscience over the last two decades. In spite of this interest, the knowledge in terms of fundamental neural mechanisms that provide constraints for underlying computations remains rather limited. This fact stands in contrast with a wide variety of speculative theories about how action recognition might work. This review focuses on new fundamental electrophysiological results in monkeys, which provide constraints for the detailed underlying computations. In addition, we review models for action recognition and processing that have concrete mathematical implementations, as opposed to conceptual models. We think that only such implemented models can be meaningfully linked quantitatively to physiological data and have a potential to narrow down the many possible computational explanations for action recognition. In addition, only concrete implementations allow judging whether postulated computational concepts have a feasible implementation in terms of realistic neural circuits.
Affiliation(s)
- Martin A Giese
- Section on Computational Sensomotorics, Hertie Institute for Clinical Brain Research & Center for Integrative Neuroscience, University Clinic Tübingen, Otfried-Müller Str. 25, 72076 Tübingen, Germany.
- Giacomo Rizzolatti
- IIT Brain Center for Social and Motor Cognition, 43100, Parma, Italy; Dipartimento di Neuroscienze, Università di Parma, 43100 Parma, Italy.
893
Wang P, Gauthier I, Cottrell G. Are Face and Object Recognition Independent? A Neurocomputational Modeling Exploration. J Cogn Neurosci 2016; 28:558-74. [PMID: 26741802 DOI: 10.1162/jocn_a_00919] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Are face and object recognition abilities independent? Although it is commonly believed that they are, Gauthier et al. [Gauthier, I., McGugin, R. W., Richler, J. J., Herzmann, G., Speegle, M., & VanGulick, A. E. Experience moderates overlap between object and face recognition, suggesting a common ability. Journal of Vision, 14, 7, 2014] recently showed that these abilities become more correlated as experience with nonface categories increases. They argued that there is a single underlying visual ability, v, that is expressed in performance with both face and nonface categories as experience grows. Using the Cambridge Face Memory Test and the Vanderbilt Expertise Test, they showed that the shared variance between Cambridge Face Memory Test and Vanderbilt Expertise Test performance increases monotonically as experience increases. Here, we address why a shared resource across different visual domains does not lead to competition and to an inverse correlation in abilities. We explain this conundrum using our neurocomputational model of face and object processing ["The Model", TM; Cottrell, G. W., & Hsiao, J. H. Neurocomputational models of face processing. In A. J. Calder, G. Rhodes, M. Johnson, & J. Haxby (Eds.), The Oxford handbook of face perception. Oxford, UK: Oxford University Press, 2011]. We model the domain-general ability v as the available computational resources (number of hidden units) in the mapping from input to label, and experience as the frequency with which individual exemplars of an object category appear during network training. Our results show that, as in the behavioral data, the correlation between subordinate-level face and object recognition accuracy increases as experience grows. We suggest that different domains do not compete for resources because the relevant features are shared between faces and objects. The essential power of experience is to generate a "spreading transform" for faces (separating them in representational space) that generalizes to objects that must be individuated. Interestingly, when the task of the network is basic-level categorization, no increase in the correlation between domains is observed. Hence, our model predicts that it is the type of experience that matters and that the source of the correlation is in the fusiform face area, rather than in cortical areas that subserve basic-level categorization. This result is consistent with our previous modeling elucidating why the FFA is recruited for novel domains of expertise [Tong, M. H., Joyce, C. A., & Cottrell, G. W. Why is the fusiform face area recruited for novel categories of expertise? A neurocomputational investigation. Brain Research, 1202, 14-24, 2008].
894
Simple Learned Weighted Sums of Inferior Temporal Neuronal Firing Rates Accurately Predict Human Core Object Recognition Performance. J Neurosci 2015; 35:13402-18. [PMID: 26424887 DOI: 10.1523/jneurosci.5181-14.2015] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
To go beyond qualitative models of the biological substrate of object recognition, we ask: can a single ventral stream neuronal linking hypothesis quantitatively account for core object recognition performance over a broad range of tasks? We measured human performance in 64 object recognition tests using thousands of challenging images that explore shape similarity and identity preserving object variation. We then used multielectrode arrays to measure neuronal population responses to those same images in visual areas V4 and inferior temporal (IT) cortex of monkeys and simulated V1 population responses. We tested leading candidate linking hypotheses and control hypotheses, each postulating how ventral stream neuronal responses underlie object recognition behavior. Specifically, for each hypothesis, we computed the predicted performance on the 64 tests and compared it with the measured pattern of human performance. All tested hypotheses based on low- and mid-level visually evoked activity (pixels, V1, and V4) were very poor predictors of the human behavioral pattern. However, simple learned weighted sums of distributed average IT firing rates exactly predicted the behavioral pattern. More elaborate linking hypotheses relying on IT trial-by-trial correlational structure, finer IT temporal codes, or ones that strictly respect the known spatial substructures of IT ("face patches") did not improve predictive power. Although these results do not reject those more elaborate hypotheses, they suggest a simple, sufficient quantitative model: each object recognition task is learned from the spatially distributed mean firing rates (100 ms) of ∼60,000 IT neurons and is executed as a simple weighted sum of those firing rates.

Significance Statement: We sought to go beyond qualitative models of visual object recognition and determine whether a single neuronal linking hypothesis can quantitatively account for core object recognition behavior. To achieve this, we designed a database of images for evaluating object recognition performance. We used multielectrode arrays to characterize hundreds of neurons in the visual ventral stream of nonhuman primates and measured the object recognition performance of >100 human observers. Remarkably, we found that simple learned weighted sums of firing rates of neurons in monkey inferior temporal (IT) cortex accurately predicted human performance. Although previous work led us to expect that IT would outperform V4, we were surprised by the quantitative precision with which simple IT-based linking hypotheses accounted for human behavior.
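The winning linking hypothesis is simple enough to state as code: task performance is the cross-validated accuracy of a learned weighted sum of trial-averaged firing rates. In the sketch below, random Poisson counts stand in for the IT rates and random labels for one of the binary tests; only the shapes of the problem are real, the data are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# rates: images x neurons, trial-averaged spike counts in a 100 ms window.
# In the study these are hundreds of recorded IT sites (extrapolated to
# ~60,000 neurons) and there are 64 binary object recognition tests.
n_img, n_neurons = 600, 168
rates = rng.poisson(5.0, size=(n_img, n_neurons)).astype(float)
labels = rng.integers(0, 2, size=n_img)        # object A vs object B per image

# The linking hypothesis: a simple learned weighted sum of firing rates.
acc = cross_val_score(LogisticRegression(max_iter=1000), rates, labels, cv=5).mean()
print(f"predicted performance of the IT weighted-sum readout: {acc:.2f}")
# Repeating this per task yields a 64-element performance pattern to be
# correlated with the measured human pattern.
```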
895
Koenig-Robert R, VanRullen R, Tsuchiya N. Semantic Wavelet-Induced Frequency-Tagging (SWIFT) Periodically Activates Category Selective Areas While Steadily Activating Early Visual Areas. PLoS One 2015; 10:e0144858. [PMID: 26691722 PMCID: PMC4686956 DOI: 10.1371/journal.pone.0144858] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2015] [Accepted: 11/23/2015] [Indexed: 11/19/2022] Open
Abstract
Primate visual systems process natural images in a hierarchical manner: at the early stage, neurons are tuned to local image features, while neurons in high-level areas are tuned to abstract object categories. Standard models of visual processing assume that the transition of tuning from image features to object categories emerges gradually along the visual hierarchy. Direct tests of such models remain difficult due to confounding alteration in low-level image properties when contrasting distinct object categories. When such contrast is performed in a classic functional localizer method, the desired activation in high-level visual areas is typically accompanied with activation in early visual areas. Here we used a novel image-modulation method called SWIFT (semantic wavelet-induced frequency-tagging), a variant of frequency-tagging techniques. Natural images modulated by SWIFT reveal object semantics periodically while keeping low-level properties constant. Using functional magnetic resonance imaging (fMRI), we indeed found that faces and scenes modulated with SWIFT periodically activated the prototypical category-selective areas while they elicited sustained and constant responses in early visual areas. SWIFT and the localizer were selective and specific to a similar extent in activating category-selective areas. Only SWIFT progressively activated the visual pathway from low- to high-level areas, consistent with predictions from standard hierarchical models. We confirmed these results with criterion-free methods, generalizing the validity of our approach, and showed that it is possible to dissociate neural activation in early and category-selective areas. Our results provide direct evidence for the hierarchical nature of the representation of visual objects along the visual stream and open up future applications of frequency-tagging methods in fMRI.
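The frequency-tagging readout itself is generic: activity locked to the periodic semantic modulation shows up as a spectral peak at the tagging frequency, while untagged activity does not. A minimal sketch of that analysis on synthetic time courses; the sampling rate, tag frequency, and signal shapes are illustrative assumptions, not the paper's values.

```python
import numpy as np

def tag_snr(signal, fs, f_tag, n_neighbors=10):
    """Amplitude at the tag frequency over the mean amplitude of neighboring
    bins; values well above 1 indicate tag-locked (periodic) activation."""
    amps = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    i = int(np.argmin(np.abs(freqs - f_tag)))
    neighbors = np.r_[amps[i - n_neighbors:i], amps[i + 1:i + 1 + n_neighbors]]
    return amps[i] / neighbors.mean()

rng = np.random.default_rng(0)
fs, f_tag = 10.0, 0.8                       # Hz; illustrative values
t = np.arange(0, 120, 1.0 / fs)             # 120 s of signal
tagged = 0.8 * np.sin(2 * np.pi * f_tag * t) + rng.normal(size=t.size)
steady = rng.normal(size=t.size)            # early-visual-like: no tag locking
print(f"category-selective-like SNR: {tag_snr(tagged, fs, f_tag):.1f}")
print(f"early-visual-like SNR:       {tag_snr(steady, fs, f_tag):.1f}")
```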
Affiliation(s)
- Roger Koenig-Robert
- School of Psychological Sciences, Faculty of Biomedical and Psychological Sciences, Monash University, Melbourne, Australia
- Rufin VanRullen
- CNRS, UMR5549, Centre de Recherche Cerveau et Cognition, Faculté de Médecine de Purpan, 31052 Toulouse, France
- Université de Toulouse, Centre de Recherche Cerveau et Cognition, Université Paul Sabatier, 31052 Toulouse, France
- Naotsugu Tsuchiya
- School of Psychological Sciences, Faculty of Biomedical and Psychological Sciences, Monash University, Melbourne, Australia
- Decoding and Controlling Brain Information, Japan Science and Technology Agency, Chiyoda-ku, Tokyo, Japan, 102–8266
896
897
Varieties of perceptual truth and their possible evolutionary roots. Psychon Bull Rev 2015; 22:1519-22. [DOI: 10.3758/s13423-014-0741-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2014] [Accepted: 09/22/2014] [Indexed: 11/08/2022]
898
Lim S, McKee JL, Woloszyn L, Amit Y, Freedman DJ, Sheinberg DL, Brunel N. Inferring learning rules from distributions of firing rates in cortical neurons. Nat Neurosci 2015; 18:1804-10. [PMID: 26523643 PMCID: PMC4666720 DOI: 10.1038/nn.4158] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2015] [Accepted: 10/07/2015] [Indexed: 02/08/2023]
Abstract
Information about external stimuli is thought to be stored in cortical circuits through experience-dependent modifications of synaptic connectivity. These modifications of network connectivity should lead to changes in neuronal activity as a particular stimulus is repeatedly encountered. Here we ask what plasticity rules are consistent with the differences in the statistics of the visual response to novel and familiar stimuli in inferior temporal cortex, an area underlying visual object recognition. We introduce a method that allows one to infer the dependence of the presumptive learning rule on postsynaptic firing rate, and we show that the inferred learning rule exhibits depression for low postsynaptic rates and potentiation for high rates. The threshold separating depression from potentiation is strongly correlated with both mean and s.d. of the firing rate distribution. Finally, we show that network models implementing a rule extracted from data show stable learning dynamics and lead to sparser representations of stimuli.
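The inference method can be sketched as quantile matching: if learning preserves the rank order of a neuron's responses across stimuli, then the familiar-stimulus rate corresponding to a given novel-stimulus rate is the familiar distribution's quantile at the same rank, and their difference traces out the rate dependence of the rule. This is a sketch of the core idea only; the lognormal inputs below are synthetic placeholders, not the recorded distributions.

```python
import numpy as np

def inferred_rate_change(novel, familiar, grid):
    """Quantile-matching under the rank-preservation assumption: map each
    novel rate r to the familiar-distribution quantile at CDF_novel(r) and
    return the implied change in rate at each point of `grid`."""
    novel = np.sort(novel)
    cdf = np.clip(np.searchsorted(novel, grid) / len(novel), 0.0, 1.0)
    return np.quantile(familiar, cdf) - grid

rng = np.random.default_rng(0)
novel = rng.lognormal(mean=1.5, sigma=0.5, size=5000)      # Hz, synthetic
familiar = rng.lognormal(mean=1.3, sigma=0.7, size=5000)   # broader and sparser
grid = np.linspace(1, 20, 50)
delta = inferred_rate_change(novel, familiar, grid)
threshold = grid[np.argmin(np.abs(delta))]                 # depression/potentiation crossover
print(f"depression below ~{threshold:.1f} Hz, potentiation above")
```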
Affiliation(s)
- Sukbin Lim
- Department of Neurobiology, University of Chicago, Chicago, IL 60637, USA
- Jillian L. McKee
- Department of Neurobiology, University of Chicago, Chicago, IL 60637, USA
- Luke Woloszyn
- Department of Neuroscience, Columbia University, New York, NY 10032, USA
- Yali Amit
- Department of Statistics, University of Chicago, Chicago, IL 60637, USA
- Department of Computer Science, University of Chicago, Chicago, IL 60637, USA
- David J. Freedman
- Department of Neurobiology, University of Chicago, Chicago, IL 60637, USA
- Nicolas Brunel
- Department of Neurobiology, University of Chicago, Chicago, IL 60637, USA
- Department of Statistics, University of Chicago, Chicago, IL 60637, USA
899
Abstract
Although the rhesus monkey is used widely as an animal model of human visual processing, it is not known whether invariant visual object recognition behavior is quantitatively comparable across monkeys and humans. To address this question, we systematically compared the core object recognition behavior of two monkeys with that of human subjects. To test true object recognition behavior (rather than image matching), we generated several thousand naturalistic synthetic images of 24 basic-level objects with high variation in viewing parameters and image background. Monkeys were trained to perform binary object recognition tasks on a match-to-sample paradigm. Data from 605 human subjects performing the same tasks on Mechanical Turk were aggregated to characterize "pooled human" object recognition behavior, as well as 33 separate Mechanical Turk subjects to characterize individual human subject behavior. Our results show that monkeys learn each new object in a few days, after which they not only match mean human performance but show a pattern of object confusion that is highly correlated with pooled human confusion patterns and is statistically indistinguishable from individual human subjects. Importantly, this shared human and monkey pattern of 3D object confusion is not shared with low-level visual representations (pixels, V1+; models of the retina and primary visual cortex) but is shared with a state-of-the-art computer vision feature representation. Together, these results are consistent with the hypothesis that rhesus monkeys and humans share a common neural shape representation that directly supports object perception.

Significance Statement: To date, several mammalian species have shown promise as animal models for studying the neural mechanisms underlying high-level visual processing in humans. In light of this diversity, making tight comparisons between nonhuman and human primates is particularly critical in determining the best use of nonhuman primates to further the goal of the field of translating knowledge gained from animal models to humans. To the best of our knowledge, this study is the first systematic attempt at comparing a high-level visual behavior of humans and macaque monkeys.
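A common way to quantify "statistically indistinguishable from individual human subjects" is to correlate off-diagonal confusion patterns across species and normalize by the human split-half reliability. A schematic of that comparison; the function and variable names are illustrative, not the paper's, and the paper's exact statistics may differ.

```python
import numpy as np

def offdiag(cm):
    """Off-diagonal entries of a confusion matrix: the pattern of errors."""
    return cm[~np.eye(cm.shape[0], dtype=bool)]

def consistency(cm_monkey, cm_human_split1, cm_human_split2):
    """Monkey-human correlation of confusion patterns, normalized by the
    square root of the human split-half reliability; values near 1 mean the
    monkey pattern is as human-like as the human data's noise allows."""
    human_mean = (cm_human_split1 + cm_human_split2) / 2
    r_mh = np.corrcoef(offdiag(cm_monkey), offdiag(human_mean))[0, 1]
    r_hh = np.corrcoef(offdiag(cm_human_split1), offdiag(cm_human_split2))[0, 1]
    return r_mh / np.sqrt(r_hh)
```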
900
Kriegeskorte N. Deep Neural Networks: A New Framework for Modeling Biological Vision and Brain Information Processing. Annu Rev Vis Sci 2015; 1:417-446. [PMID: 28532370 DOI: 10.1146/annurev-vision-082114-035447] [Citation(s) in RCA: 466] [Impact Index Per Article: 46.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Recent advances in neural network modeling have enabled major strides in computer vision and other artificial intelligence applications. Human-level visual recognition abilities are coming within reach of artificial systems. Artificial neural networks are inspired by the brain, and their computations could be implemented in biological neurons. Convolutional feedforward networks, which now dominate computer vision, take further inspiration from the architecture of the primate visual hierarchy. However, the current models are designed with engineering goals, not to model brain computations. Nevertheless, initial studies comparing internal representations between these models and primate brains find surprisingly similar representational spaces. With human-level performance no longer out of reach, we are entering an exciting new era, in which we will be able to build biologically faithful feedforward and recurrent computational models of how biological brains perform high-level feats of intelligence, including vision.
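The comparisons of internal representations mentioned here are typically made with representational similarity analysis: compute a representational dissimilarity matrix for the same stimuli in each system and correlate the two. A minimal sketch with random placeholders standing in for the model layer and the brain data:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_stimuli = 92
model_layer = rng.normal(size=(n_stimuli, 4096))   # e.g., a deep CNN layer
brain_region = rng.normal(size=(n_stimuli, 300))   # e.g., IT voxel patterns

# Representational dissimilarity matrices: pairwise correlation distance
# between response patterns to the same stimuli, one matrix per system.
rdm_model = pdist(model_layer, metric="correlation")
rdm_brain = pdist(brain_region, metric="correlation")

# Representational similarity: rank-correlate the two RDMs.
rho, _ = spearmanr(rdm_model, rdm_brain)
print(f"model-brain representational similarity (Spearman rho): {rho:.3f}")
```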
Affiliation(s)
- Nikolaus Kriegeskorte
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, United Kingdom