1101
Spratling MW. Learning viewpoint invariant perceptual representations from cluttered images. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2005; 27:753-61. [PMID: 15875796 DOI: 10.1109/tpami.2005.105] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/02/2023]
Abstract
In order to perform object recognition, it is necessary to form perceptual representations that are sufficiently specific to distinguish between objects, but that are also sufficiently flexible to generalize across changes in location, rotation, and scale. A standard method for learning perceptual representations that are invariant to viewpoint is to form temporal associations across image sequences showing object transformations. However, this method requires that individual stimuli be presented in isolation and is therefore unlikely to succeed in real-world applications where multiple objects can co-occur in the visual input. This paper proposes a simple modification to the learning method that can overcome this limitation and results in more robust learning of invariant representations.
1102
Sigala R, Serre T, Poggio T, Giese M. Learning Features of Intermediate Complexity for the Recognition of Biological Motion. ARTIFICIAL NEURAL NETWORKS: BIOLOGICAL INSPIRATIONS – ICANN 2005 2005. [DOI: 10.1007/11550822_39] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]
1103
Fukushima K. Use of non-uniform spatial blur for image comparison: symmetry axis extraction. Neural Netw 2005; 18:23-32. [PMID: 15649659 DOI: 10.1016/j.neunet.2004.08.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2004] [Accepted: 08/03/2004] [Indexed: 10/26/2022]
Abstract
This paper shows that the introduction of non-uniform blur is very useful for comparing images, and proposes a neural network model that extracts axes of symmetry from visual patterns. The blurring operation greatly increases robustness against deformations and various kinds of noise, and largely reduces computational cost. Asymmetry between two groups of signals can be detected in a single action by the use of non-uniform blur having a cone-shaped distribution. The proposed model is a hierarchical multi-layered network, which consists of a contrast-extracting layer, edge-extracting layers (simple and complex types), and layers extracting symmetry axes. The model extracts oriented edges from an input image first, and then tries to extract axes of symmetry. The model checks conditions of symmetry, not directly from the oriented edges, but from a blurred version of the response of edge-extracting layer. The input patterns can be complicated line drawings, plane figures or gray-scaled natural images taken by CCD cameras.
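The central computational claim here, that blurring before comparison buys robustness to small displacements and deformations at little cost, can be illustrated with a short self-contained sketch. It uses a plain uniform Gaussian blur on a synthetic 1-D response pattern, not the paper's cone-shaped non-uniform blur or its layered network, so it is only a toy demonstration of the underlying intuition:

```python
import numpy as np

def gaussian_blur_1d(signal, sigma):
    """Blur a 1-D signal with a Gaussian kernel (reflect padding)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(signal, radius, mode="reflect")
    return np.convolve(padded, kernel, mode="valid")

def similarity(a, b):
    """Normalized correlation between two signals."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(0)
pattern = (rng.random(200) > 0.95).astype(float)   # sparse "edge responses"
shifted = np.roll(pattern, 3)                      # same pattern, slightly displaced

# Without blur the displaced copies barely overlap; after blurring they match well.
print("raw similarity:    ", round(similarity(pattern, shifted), 3))
print("blurred similarity:", round(similarity(gaussian_blur_1d(pattern, 4.0),
                                               gaussian_blur_1d(shifted, 4.0)), 3))
```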
Affiliation(s)
- Kunihiko Fukushima
- School of Media Science, Tokyo University of Technology, 1404-1, Katakura, Hachioji, Tokyo 192-0982, Japan.
1104
Kirstein S, Wersing H, Körner E. Rapid Online Learning of Objects in a Biologically Motivated Recognition Architecture. LECTURE NOTES IN COMPUTER SCIENCE 2005. [DOI: 10.1007/11550518_38] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]
1105
Fiser J, Aslin RN. Encoding Multielement Scenes: Statistical Learning of Visual Feature Hierarchies. J Exp Psychol Gen 2005; 134:521-37. [PMID: 16316289 DOI: 10.1037/0096-3445.134.4.521] [Citation(s) in RCA: 120] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The authors investigated how human adults encode and remember parts of multielement scenes composed of recursively embedded visual shape combinations. The authors found that shape combinations that are parts of larger configurations are less well remembered than shape combinations of the same kind that are not embedded. Combined with basic mechanisms of statistical learning, this embeddedness constraint enables the development of complex new features for acquiring internal representations efficiently without being computationally intractable. The resulting representations also encode parts and wholes by chunking the visual input into components according to the statistical coherence of their constituents. These results suggest that a bootstrapping approach of constrained statistical learning offers a unified framework for investigating the formation of different internal representations in pattern and scene perception.
Affiliation(s)
- József Fiser
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA.
1106
Murray SO, Schrater P, Kersten D. Perceptual grouping and the interactions between visual cortical areas. Neural Netw 2004; 17:695-705. [PMID: 15288893 DOI: 10.1016/j.neunet.2004.03.010] [Citation(s) in RCA: 104] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2004] [Accepted: 03/31/2004] [Indexed: 10/26/2022]
Abstract
Visual perception involves the grouping of individual elements into coherent patterns, such as object representations, that reduce the descriptive complexity of a visual scene. The computational and physiological bases of this perceptual grouping remain poorly understood. We discuss recent fMRI evidence from our laboratory in which we measured activity in a higher object processing area (LOC), and in primary visual cortex (V1), in response to visual elements that were either grouped into objects or randomly arranged. We observed significant activity increases in the LOC and concurrent reductions of activity in V1 when elements formed coherent shapes, suggesting that activity in early visual areas is reduced as a result of grouping processes performed in higher areas. In light of these results we review related empirical findings of context-dependent changes in activity, recent neurophysiology research related to cortical feedback, and computational models that incorporate feedback operations. We suggest that feedback from high-level visual areas reduces activity in lower areas in order to simplify the description of a visual image, consistent with both predictive coding models of perception and probabilistic notions of 'explaining away.'
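The 'explaining away' idea invoked at the end can be sketched with a minimal linear predictive-coding loop, a generic Rao-and-Ballard-style toy rather than the authors' fMRI analysis or any specific cortical circuit: once the higher area finds a cause that predicts the input, the residual activity carried by the lower area drops, whereas an unexplained (ungrouped) input leaves a large residual.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two "object templates" known to the higher area (unit-norm columns of W).
W = rng.random((16, 2))
W /= np.linalg.norm(W, axis=0, keepdims=True)

def settle(x, W, steps=300, lr=0.2):
    """Minimal linear predictive-coding loop: the higher area adjusts its
    causes y so that W @ y predicts the input x, while the lower area
    carries the prediction error x - W @ y."""
    y = np.zeros(W.shape[1])
    error = x.copy()
    for _ in range(steps):
        error = x - W @ y          # residual activity in the lower area
        y += lr * (W.T @ error)    # higher area refines its explanation
    return y, error

coherent = W @ np.array([1.0, 0.0])   # stimulus a stored template can explain
scrambled = rng.random(16)            # unstructured arrangement of elements

for name, x in [("coherent shape", coherent), ("random elements", scrambled)]:
    _, error = settle(x, W)
    print(f"{name:15s} -> residual energy in lower area: {float(error @ error):.4f}")
```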
Affiliation(s)
- Scott O Murray
- Department of Psychology, University of Minnesota, 75 E. River Road, Minneapolis, MN 55455, USA.
1107
Li M, Clark JJ. A Temporal Stability Approach to Position and Attention-Shift-Invariant Recognition. Neural Comput 2004; 16:2293-321. [PMID: 15476602 DOI: 10.1162/0899766041941907] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Incorporation of visual-related self-action signals can help neural networks learn invariance. We describe a method that can produce a network with invariance to changes in visual input caused by eye movements and covert attention shifts. Training of the network is controlled by signals associated with eye movements and covert attention shifting. A temporal perceptual stability constraint is used to drive the output of the network toward remaining constant across temporal sequences of saccadic motions and covert attention shifts. We use a four-layer neural network model to perform the position-invariant extraction of local features and temporal integration of invariant representations of local features in a bottom-up structure. We present results on both simulated data and real images to demonstrate that our network can acquire both position and attention-shift invariance.
Affiliation(s)
- Muhua Li
- Centre for Intelligent Machines, McGill University, Montréal, Québec, Canada H3A 2A7.
1108
Suzuki N, Hashimoto N, Kashimori Y, Zheng M, Kambara T. A neural model of predictive recognition in form pathway of visual cortex. Biosystems 2004; 76:33-42. [PMID: 15351128 DOI: 10.1016/j.biosystems.2004.05.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2003] [Revised: 07/11/2003] [Accepted: 08/01/2003] [Indexed: 10/26/2022]
Abstract
We present a functional model of the form pathway in visual cortex based on a predictive coding scheme, in which the prediction is compared with feedforward signals filtered by two kinds of spatial resolution maps: a broad and a fine resolution map. We propose here a functional role for the prediction and for the two kinds of resolution maps in the perception of object form in the visual system. The prediction is represented based on memory of dynamical attractors in temporal cortex, categorized by an elemental figure in posterior temporal cortex. The prediction is generated by the feedforward signals of main neurons in the broad resolution maps of V1 and V4, and is then compared with the feedforward signals of main neurons in the fine resolution maps of V1 and V4.
Affiliation(s)
- Nobuyuki Suzuki
- Department of Information Network Science, Graduate School of Information Science, University of Electro-Communications, Chofu, Tokyo 182-8585, Japan.
1109
Affiliation(s)
- Thomas J Palmeri
- Department of Psychology, Center for Integrative and Cognitive Neuroscience, Vanderbilt University, 301 Wilson Hall, Nashville, Tennessee 37203, USA.
1110
Abstract
This paper proposes a new neocognitron that accepts incremental learning, without severely damaging old memories or reducing learning speed. The new neocognitron uses competitive learning, and the learning of all stages of the hierarchical network progresses simultaneously. To increase the learning speed, conventional neocognitrons of recent versions sacrificed the ability of incremental learning and used a technique of sequential construction of layers, by which the learning of a layer started only after the learning of the preceding layers had completely finished. If the learning speed is simply set high for the conventional neocognitron, simultaneous construction of layers produces many garbage cells, which remain permanently silent once learning has finished. The proposed neocognitron, with a new learning method, can prevent the generation of such garbage cells even at a high learning speed, allowing incremental learning.
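The general flavour of the scheme, competitive learning in which a new cell is recruited only when no existing cell already matches the input well, so that later learning leaves earlier memories intact, can be sketched as follows. This is a toy winner-take-all layer, not the neocognitron's actual cell planes, thresholds, or learning rule:

```python
import numpy as np

def normalize(v):
    return v / (np.linalg.norm(v) + 1e-12)

class CompetitiveLayer:
    """Toy winner-take-all layer that recruits a new cell whenever no stored
    template matches the input well, so earlier memories are left intact."""
    def __init__(self, threshold=0.9, lr=0.2):
        self.weights = []          # one weight vector per recruited cell
        self.threshold = threshold
        self.lr = lr

    def learn(self, x):
        x = normalize(x)
        if self.weights:
            sims = [float(w @ x) for w in self.weights]
            winner = int(np.argmax(sims))
            if sims[winner] >= self.threshold:
                # strengthen only the winning cell; other memories are untouched
                self.weights[winner] = normalize(self.weights[winner] + self.lr * x)
                return winner
        self.weights.append(x.copy())  # recruit a new cell for a novel pattern
        return len(self.weights) - 1

rng = np.random.default_rng(2)
prototypes = [normalize(rng.random(25)) for _ in range(3)]
layer = CompetitiveLayer()

# Initial phase: two pattern classes; later, a third class is learned incrementally.
for _ in range(20):
    for p in prototypes[:2]:
        layer.learn(normalize(p + 0.05 * rng.standard_normal(25)))
print("cells after initial phase:    ", len(layer.weights))
for _ in range(20):
    layer.learn(normalize(prototypes[2] + 0.05 * rng.standard_normal(25)))
print("cells after incremental phase:", len(layer.weights))
```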
Affiliation(s)
- Kunihiko Fukushima
- School of Media Science, Tokyo University of Technology, 1404-1 Katakura Hachioji, Tokyo 192-0982, Japan.
1111
1112
Körding KP, Kayser C, Einhäuser W, König P. How are complex cell properties adapted to the statistics of natural stimuli? J Neurophysiol 2004; 91:206-12. [PMID: 12904330 DOI: 10.1152/jn.00149.2003] [Citation(s) in RCA: 60] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
Sensory areas should be adapted to the properties of their natural stimuli. What are the underlying rules that match the properties of complex cells in primary visual cortex to their natural stimuli? To address this issue, we sampled movies from a camera carried by a freely moving cat, capturing the dynamics of image motion as the animal explores an outdoor environment. We use these movie sequences as input to simulated neurons. Following the intuition that many meaningful high-level variables, e.g., identities of visible objects, do not change rapidly in natural visual stimuli, we adapt the neurons to exhibit firing rates that are stable over time. We find that simulated neurons, which have optimally stable activity, display many properties that are observed for cortical complex cells. Their response is invariant with respect to stimulus translation and reversal of contrast polarity. Furthermore, spatial frequency selectivity and the aspect ratio of the receptive field quantitatively match the experimentally observed characteristics of complex cells. Hence, the population of complex cells in the primary visual cortex can be described as forming an optimally stable representation of natural stimuli.
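The stability principle itself is easy to demonstrate in a toy setting (a sketch of the objective only, not the authors' optimization on the cat-camera movies): when the phase of a stimulus changes quickly while its contrast changes slowly, a phase-invariant 'energy' unit built from a quadrature filter pair has far more temporally stable output than a phase-sensitive linear unit.

```python
import numpy as np

rng = np.random.default_rng(0)
n, frames = 64, 400
x = np.arange(n)
freq = 2 * np.pi * 4 / n                              # 4 cycles across the patch
envelope = np.exp(-0.5 * ((x - n / 2) / 8) ** 2)
gabor_even = envelope * np.cos(freq * (x - n / 2))    # quadrature filter pair
gabor_odd = envelope * np.sin(freq * (x - n / 2))

def instability(response):
    """Mean squared temporal difference of the variance-normalized response;
    smaller values mean more temporally stable activity."""
    r = (response - response.mean()) / (response.std() + 1e-12)
    return float(np.mean(np.diff(r) ** 2))

simple, energy = [], []
for t in range(frames):
    contrast = 1.0 + 0.5 * np.sin(2 * np.pi * t / frames)   # slowly varying
    phase = rng.uniform(0, 2 * np.pi)                       # rapidly varying
    stimulus = contrast * np.cos(freq * (x - n / 2) - phase)
    e, o = gabor_even @ stimulus, gabor_odd @ stimulus
    simple.append(e)                # phase-sensitive, simple-cell-like response
    energy.append(e ** 2 + o ** 2)  # phase-invariant, complex-cell-like response

print("instability, simple-like unit :", round(instability(np.array(simple)), 4))
print("instability, complex-like unit:", round(instability(np.array(energy)), 4))
```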
Affiliation(s)
- Konrad P Körding
- Institute of Neurology, University College London, London, WC1N 3BG, United Kingdom.
1113
Mitchell JF, Zipser D. Sequential memory-guided saccades and target selection: a neural model of the frontal eye fields. Vision Res 2003; 43:2669-95. [PMID: 14552808 DOI: 10.1016/s0042-6989(03)00468-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
We present a neural model of the frontal eye fields. It consists of several retinotopic arrays of neuron-like units that are recurrently connected. The network is trained to make memory-guided saccades to sequentially flashed targets that appear at arbitrary locations. This task is interesting because the large number of possible sequences does not permit a pre-learned response. Instead locations and their priority must be maintained in active working memory. The network learns to perform the task. Surprisingly, after training it can also select targets in visual search tasks. When targets are shown in parallel it chooses them according to their salience. Its search behavior is comparable to that of humans. It exhibits saccadic averaging, increased reaction times with more distractors, latency vs accuracy trade-offs, and inhibition of return. Analysis of the network shows that it operates like a queue, storing the potential targets in sequence for later execution. A small number of unit types are sufficient to encode this information, but the manner of coding is non-obvious. Units respond to multiple targets similar to quasi-visual cells recently studied [Exp. Brain Res. 130 (2000) 433]. Predictions are made that can be experimentally tested.
Affiliation(s)
- Jude F Mitchell
- Cognitive Science, University of California at San Diego, La Jolla, CA 92093-0515, USA.
1114
Abstract
Previous studies have suggested that both the prefrontal cortex (PFC) and inferior temporal cortex (ITC) are involved in high-level visual processing and categorization, but their respective roles are not known. To address this, we trained monkeys to categorize a continuous set of visual stimuli into two categories, "cats" and "dogs." The stimuli were parametrically generated using a computer graphics morphing system (Shelton, 2000) that allowed precise control over stimulus shape. After training, we recorded neural activity from the PFC and the ITC of monkeys while they performed a category-matching task. We found that the PFC and the ITC play distinct roles in category-based behaviors: the ITC seems more involved in the analysis of currently viewed shapes, whereas the PFC showed stronger category signals, memory effects, and a greater tendency to encode information in terms of its behavioral meaning.
1115
A visual system for invariant recognition in animated image sequences. Neurocomputing 2003. [DOI: 10.1016/s0925-2312(02)00845-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
1116
Matsugu M, Mori K, Mitari Y, Kaneda Y. Subject independent facial expression recognition with robust face detection using a convolutional neural network. Neural Netw 2003; 16:555-9. [PMID: 12850007 DOI: 10.1016/s0893-6080(03)00115-1] [Citation(s) in RCA: 134] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Reliable detection of ordinary facial expressions (e.g. smile) despite the variability among individuals as well as in face appearance is an important step toward the realization of a perceptual user interface with autonomous perception of persons. We describe a rule-based algorithm for robust facial expression recognition combined with robust face detection using a convolutional neural network. In this study, we address the problem of subject independence as well as translation, rotation, and scale invariance in the recognition of facial expression. The result shows reliable detection of smiles with a recognition rate of 97.6% for 5600 still images of more than 10 subjects. The proposed algorithm demonstrated the ability to discriminate smiling from talking based on the saliency score obtained from voting visual cues. To the best of our knowledge, this is the first facial expression recognition model with the property of subject independence combined with robustness to variability in facial appearance.
Affiliation(s)
- Masakazu Matsugu
- Canon Research Center, 5-1, Morinosato-Wakamiya, Atsugi 243-0193, Japan.
1117
Perez C, Salinas C, Estevez P, Valenzuela P. Genetic design of biologically inspired receptive fields for neural pattern recognition. IEEE Trans Syst Man Cybern B Cybern 2003; 33:258-70. [DOI: 10.1109/tsmcb.2003.810441] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
1118
Abstract
The visual recognition of complex movements and actions is crucial for the survival of many species. It is important not only for communication and recognition at a distance, but also for the learning of complex motor actions by imitation. Movement recognition has been studied in psychophysical, neurophysiological and imaging experiments, and several cortical areas involved in it have been identified. We use a neurophysiologically plausible and quantitative model as a tool for organizing and making sense of the experimental data, despite their growing size and complexity. We review the main experimental findings and discuss possible neural mechanisms, and show that a learning-based, feedforward model provides a neurophysiologically plausible and consistent summary of many key experimental results.
Affiliation(s)
- Martin A Giese
- Laboratory for Action Representation and Learning, Department of Cognitive Neurology, University Clinic Tübingen, Spemannstrasse 34, D-72076 Tübingen, Germany.
1119
Wyss R, Konig P, Verschure PFMJ. Invariant representations of visual patterns in a temporal population code. Proc Natl Acad Sci U S A 2003; 100:324-9. [PMID: 12502790 PMCID: PMC140966 DOI: 10.1073/pnas.0136977100] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2002] [Accepted: 11/15/2002] [Indexed: 11/18/2022] Open
Abstract
Mammalian visual systems are characterized by their ability to recognize stimuli invariant to various transformations. Here, we investigate the hypothesis that this ability is achieved by the temporal encoding of visual stimuli. By using a model of a cortical network, we show that this encoding is invariant to several transformations and robust with respect to stimulus variability. Furthermore, we show that the proposed model provides a rapid encoding, in accordance with recent physiological results. Taking into account properties of primary visual cortex, the application of the encoding scheme to an enhanced network demonstrates favorable scaling and high performance in a task humans excel at.
Affiliation(s)
- Reto Wyss
- Institute of Neuroinformatics, University of Zürich and Swiss Federal Institute of Technology, Switzerland.
1120
Lee SK, Chung PC, Chang CI, Lo CS, Lee T, Hsu GC, Yang CW. Classification of clustered microcalcifications using a Shape Cognitron neural network. Neural Netw 2003; 16:121-32. [PMID: 12576111 DOI: 10.1016/s0893-6080(02)00164-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Abstract
A new shape recognition-based neural network built with universal feature planes, called the Shape Cognitron (S-Cognitron), is introduced to classify clustered microcalcifications. The architecture of the S-Cognitron consists of two modules and an extra layer, called the 3D figure layer, which lies in between. The first module contains a shape orientation layer, built with 20 cell planes of low-level universal shape features to convert first-order shape orientations into numeric values, and a complex layer to extract second-order shape features. The 3D figure layer is a feature extract-display layer that extracts the shape curvatures of an input pattern and displays them as a 3D figure. It is then followed by a second module made up of a feature formation layer and a probabilistic neural network-based classification layer. The system is evaluated using the Nijmegen mammogram database, and experimental results show that sensitivity and specificity reach 86.1% and 74.1%, respectively.
Affiliation(s)
- San Kan Lee
- Department of Radiology, Taichung Veterans General Hospital, VACRS, 40705, Taichung, Taiwan, ROC
1121
Yu AJ, Giese MA, Poggio TA. Biophysiologically plausible implementations of the maximum operation. Neural Comput 2002; 14:2857-81. [PMID: 12487795 DOI: 10.1162/089976602760805313] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Visual processing in the cortex can be characterized by a predominantly hierarchical architecture, in which specialized brain regions along the processing pathways extract visual features of increasing complexity, accompanied by greater invariance in stimulus properties such as size and position. Various studies have postulated that a nonlinear pooling function such as the maximum (MAX) operation could be fundamental in achieving such selectivity and invariance. In this article, we are concerned with neurally plausible mechanisms that may be involved in realizing the MAX operation. Different canonical models are proposed, each based on neural mechanisms that have been previously discussed in the context of cortical processing. Through simulations and mathematical analysis, we compare the performance and robustness of these mechanisms. We derive experimentally verifiable predictions for each model and discuss the relevant physiological considerations.
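One soft approximation of the MAX operation commonly used in this modeling literature is an exponentially weighted (softmax-like) pooling that approaches the hard maximum as a sharpness parameter grows; the sketch below only illustrates that limit and makes no claim about which of the paper's candidate circuit mechanisms is correct:

```python
import numpy as np

def softmax_pool(x, q):
    """Exponentially weighted pooling; as q grows the output approaches max(x)."""
    w = np.exp(q * (x - x.max()))          # subtract the max for numerical stability
    return float(np.sum(x * w) / np.sum(w))

afferents = np.array([0.2, 0.5, 0.9, 0.4])
for q in [0, 1, 4, 16, 64]:
    print(f"q = {q:2d}  ->  pooled response {softmax_pool(afferents, q):.3f}")
print("hard MAX  ->", afferents.max())
```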
Affiliation(s)
- Angela J Yu
- Gatsby Computational Neuroscience Unit, University College London, London WC1N 3AR, UK.
1122
1123
Murray SO, Kersten D, Olshausen BA, Schrater P, Woods DL. Shape perception reduces activity in human primary visual cortex. Proc Natl Acad Sci U S A 2002; 99:15164-9. [PMID: 12417754 PMCID: PMC137561 DOI: 10.1073/pnas.192579399] [Citation(s) in RCA: 292] [Impact Index Per Article: 12.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2002] [Accepted: 09/24/2002] [Indexed: 11/18/2022] Open
Abstract
Visual perception involves the grouping of individual elements into coherent patterns that reduce the descriptive complexity of a visual scene. The physiological basis of this perceptual simplification remains poorly understood. We used functional MRI to measure activity in a higher object processing area, the lateral occipital complex, and in primary visual cortex in response to visual elements that were either grouped into objects or randomly arranged. We observed significant activity increases in the lateral occipital complex and concurrent reductions of activity in primary visual cortex when elements formed coherent shapes, suggesting that activity in early visual areas is reduced as a result of grouping processes performed in higher areas. These findings are consistent with predictive coding models of vision that postulate that inferences of high-level areas are subtracted from incoming sensory information in lower areas through cortical feedback.
Affiliation(s)
- Scott O Murray
- Center for Neuroscience, Department of Psychology, University of California, Davis 95616, USA.
1124
Shibata K, Nishino T, Okabe Y. Active perception and recognition learning system based on Actor-Q architecture. Syst Comput Jpn 2002. [DOI: 10.1002/scj.10207] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
1125
Abstract
Single-unit recordings from behaving monkeys and human functional magnetic resonance imaging studies have continued to provide a host of experimental data on the properties and mechanisms of object recognition in cortex. Recent advances in object recognition, spanning issues regarding invariance, selectivity, representation and levels of recognition have allowed us to propose a putative model of object recognition in cortex.
Affiliation(s)
- Maximilian Riesenhuber
- McGovern Institute for Brain Research, Department of Brain & Cognitive Sciences, Center for Biological and Computational Learning and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, 02142, USA.
1126
Einhäuser W, Kayser C, König P, Körding KP. Learning the invariance properties of complex cells from their responses to natural stimuli. Eur J Neurosci 2002; 15:475-86. [PMID: 11876775 DOI: 10.1046/j.0953-816x.2001.01885.x] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
Abstract
Neurons in primary visual cortex are typically classified as either simple or complex. Whereas simple cells respond strongly to grating and bar stimuli displayed at a certain phase and visual field location, complex cell responses are insensitive to small translations of the stimulus within the receptive field [Hubel & Wiesel (1962) J. Physiol. (Lond.), 160, 106-154; Kjaer et al. (1997) J. Neurophysiol., 78, 3187-3197]. This constancy in the response to variations of the stimuli is commonly called invariance. Hubel and Wiesel's classical model of the primary visual cortex proposes a connectivity scheme which successfully describes simple and complex cell response properties. However, the question as to how this connectivity arises during normal development is left open. Based on their work and inspired by recent physiological findings, we suggest a network model capable of learning from natural stimuli and developing receptive field properties which match those of cortical simple and complex cells. Stimuli are drawn from videos obtained by a camera mounted to a cat's head, so they should approximate the natural input to the cat's visual system. The network uses a competitive scheme to learn simple and complex cell response properties. Employing delayed signals to learn connections between simple and complex cells enables the model to utilize temporal properties of the input. We show that the temporal structure of the input gives rise to the emergence and refinement of complex cell receptive fields, whereas removing temporal continuity prevents this process. This model provides a physiologically based explanation of the development of complex cells' invariant response properties.
Affiliation(s)
- Wolfgang Einhäuser
- Institute of Neuroinformatics, University of Zürich and ETH Zürich, Winterthurerstr. 190, 8057 Zürich, Switzerland.
1127
Bussey TJ, Saksida LM. The organization of visual object representations: a connectionist model of effects of lesions in perirhinal cortex. Eur J Neurosci 2002; 15:355-64. [PMID: 11849301 DOI: 10.1046/j.0953-816x.2001.01850.x] [Citation(s) in RCA: 169] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
Abstract
We have developed a simple connectionist model based on the idea that perirhinal cortex has properties similar to other regions in the ventral visual stream, or 'what' pathway. The model is based on the assumption that representations in the ventral visual stream are organized hierarchically, such that representations of simple features of objects are stored in caudal regions of the ventral visual stream, and representations of the conjunctions of these features are stored in more rostral regions. We propose that a function of these feature conjunction representations is to help to resolve 'feature ambiguity', a property of visual discrimination problems that can emerge when features of an object predict a given outcome (e.g. reward) when part of one object, but predict a different outcome when part of another object. Several recently reported effects of lesions of perirhinal cortex in monkeys have provided key insights into the functions of this region. In the present study these effects were simulated by comparing the performance of connectionist networks before and after removal of a layer of units corresponding to perirhinal cortex. The results of these simulations suggest that effects of lesions in perirhinal cortex on visual discrimination may be due not to the impairment of a specific type of learning or memory, such as declarative or procedural, but to compromising the representations of visual stimuli. Furthermore, we propose that attempting to classify perirhinal cortex function as either 'perceptual' or 'mnemonic' may be misguided, as it seems unlikely that these broad constructs will map neatly onto anatomically defined regions of the brain.
Affiliation(s)
- Timothy J Bussey
- Section on the Neurobiology of Learning and Memory, Laboratory of Neuropsychology, National Institute of Mental Health, Bethesda, MD 20892, USA.
1128
Walther D, Itti L, Riesenhuber M, Poggio T, Koch C. Attentional Selection for Object Recognition — A Gentle Way. BIOLOGICALLY MOTIVATED COMPUTER VISION 2002. [DOI: 10.1007/3-540-36181-2_47] [Citation(s) in RCA: 107] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
1129
Körding KP, König P. Neurons with two sites of synaptic integration learn invariant representations. Neural Comput 2001; 13:2823-49. [PMID: 11705412 DOI: 10.1162/089976601317098547] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Neurons in mammalian cerebral cortex combine specific responses with respect to some stimulus features with invariant responses to other stimulus features. For example, in primary visual cortex, complex cells code for orientation of a contour but ignore its position to a certain degree. In higher areas, such as the inferotemporal cortex, translation-invariant, rotation-invariant, and even view point-invariant responses can be observed. Such properties are of obvious interest to artificial systems performing tasks like pattern recognition. It remains to be resolved how such response properties develop in biological systems. Here we present an unsupervised learning rule that addresses this problem. It is based on a neuron model with two sites of synaptic integration, allowing qualitatively different effects of input to basal and apical dendritic trees, respectively. Without supervision, the system learns to extract invariance properties using temporal or spatial continuity of stimuli. Furthermore, top-down information can be smoothly integrated in the same framework. Thus, this model lends a physiological implementation to approaches of unsupervised learning of invariant-response properties.
Affiliation(s)
- K P Körding
- Institute of Neuroinformatics, ETH/University Zürich, 8057 Zürich, Switzerland.
1130
Abstract
A neural network and the associated learning algorithm are presented as a generic approach for invariant recognition of visual patterns independent of their geometric attributes, such as spatial location, orientation and scale. The network is a multi-layer hierarchy with each layer composed of a set of groups of nodes. The groups of the input layer represent local areas spatially arranged in the visual field according to the geometric variations. Each node in the subsequent higher layers receives input laterally from other groups of the same layer as well as vertically from the layer below. The learning that takes place in the vertical feedforward paths between layers is based on an unsupervised hybrid algorithm combining both competitive learning and Hebbian learning. As a result of the architecture and the hybrid learning, the desired invariant recognition emerges at the output layer of the network. The network can serve as a simple and biologically plausible computational model to account for invariant object recognition in the biological visual system. Also, as the algorithm is generic and robust, it can be applied to solve various practical recognition problems.
Affiliation(s)
- R Wang
- Engineering Department, Harvey Mudd College, Claremont, CA 91711, USA.
1131
Salinas E, Abbott LF. Coordinate transformations in the visual system: how to generate gain fields and what to compute with them. PROGRESS IN BRAIN RESEARCH 2001; 130:175-90. [PMID: 11480274 DOI: 10.1016/s0079-6123(01)30012-2] [Citation(s) in RCA: 59] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/03/2022]
Affiliation(s)
- E Salinas
- Howard Hughes Medical Institute, Computational Neurobiology Laboratory, Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA.
1132
Reed S, Coupland J. Cascaded linear shift-invariant processors in optical pattern recognition. APPLIED OPTICS 2001; 40:3843-3849. [PMID: 18360417 DOI: 10.1364/ao.40.003843] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
We study a cascade of linear shift-invariant processing modules (correlators), each augmented with a nonlinear threshold as a means to increase the performance of high-speed optical pattern recognition. This configuration is a special class of multilayer, feed-forward neural networks and has been proposed in the literature as a relatively fast best-guess classifier. However, it seems that, although cascaded correlation has been proposed in a number of specific pattern recognition problems, the importance of the configuration has been largely overlooked. We prove that the cascaded architecture is the exact structure that must be adopted if a multilayer feed-forward neural network is trained to produce a shift-invariant output. In contrast with more generalized multilayer networks, the approach is easily implemented in practice with optical techniques and is therefore ideally suited to the high-speed analysis of large images. We have trained a digital model of the system using a modified backpropagation algorithm with optimization using simulated annealing techniques. The resulting cascade has been applied to a defect recognition problem in the canning industry as a benchmark for comparison against a standard linear correlation filter, the minimum average correlation energy (MACE) filter. We show that the nonlinear performance of the cascade is a significant improvement over that of the linear MACE filter in this case.
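A minimal numerical sketch of this architecture, linear shift-invariant correlation stages interleaved with pointwise thresholds, is given below; it uses random kernels rather than a trained MACE-style filter, and only demonstrates that the cascade's output shifts when its input shifts:

```python
import numpy as np

def correlate(image, kernel):
    """Circular cross-correlation computed with the FFT (a linear, shift-invariant stage)."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.conj(np.fft.fft2(kernel))))

def cascade(image, kernels, threshold=0.5):
    """Linear shift-invariant correlators, each followed by a pointwise nonlinear threshold."""
    out = image
    for k in kernels:
        out = np.maximum(correlate(out, k) - threshold, 0.0)
    return out

rng = np.random.default_rng(3)
img = rng.random((32, 32))
kernels = [rng.random((32, 32)) * 0.05 for _ in range(2)]   # stand-ins for trained filters

out = cascade(img, kernels)
out_shifted = cascade(np.roll(img, (5, 7), axis=(0, 1)), kernels)

# Shifting the input only shifts the output: the whole cascade stays shift-invariant.
print(np.allclose(out_shifted, np.roll(out, (5, 7), axis=(0, 1))))
```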
1133
Hoshino O, Inoue S, Kashimori Y, Kambara T. A hierarchical dynamical map as a basic frame for cortical mapping and its application to priming. Neural Comput 2001; 13:1781-810. [PMID: 11506670 DOI: 10.1162/08997660152469341] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
A hierarchical dynamical map is proposed as the basic framework for sensory cortical mapping. To show how the hierarchical dynamical map works in cognitive processes, we applied it to a typical cognitive task known as priming, in which cognitive performance is facilitated as a consequence of prior experience. Prior to the priming task, the network memorizes a sensory scene containing multiple objects presented simultaneously using a hierarchical dynamical map. Each object is composed of different sensory features. The hierarchical dynamical map presented here is formed by random itinerancy among limit-cycle attractors into which these objects are encoded. Each limit-cycle attractor contains multiple point attractors into which elemental features belonging to the same object are encoded. When a feature stimulus is presented as a priming cue, the network state is changed from the itinerant state to a limit-cycle attractor relevant to the priming cue. After a short priming period, the network state reverts to the itinerant state. Under application of the test cue, consisting of some feature belonging to the object relevant to the priming cue and fragments of features belonging to others, the network state is changed to a limit-cycle attractor and finally to a point attractor relevant to the target feature. This process is considered as the identification of the target. The model consistently reproduces various observed results for priming processes such as the difference in identification time between cross-modality and within-modality priming tasks, the effect of interval between priming cue and test cue on identification time, the effect of priming duration on the time, and the effect of repetition of the same priming task on neural activity.
Affiliation(s)
- O Hoshino
- Department of Human Welfare Engineering, Oita University, Oita 870-1192, Japan
1134
Tetko IV, Kovalishyn VV, Livingstone DJ. Volume learning algorithm artificial neural networks for 3D QSAR studies. J Med Chem 2001; 44:2411-20. [PMID: 11448223 DOI: 10.1021/jm010858e] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
The current study introduces a new method, the volume learning algorithm (VLA), for the investigation of three-dimensional quantitative structure-activity relationships (QSAR) of chemical compounds. This method incorporates the advantages of comparative molecular field analysis (CoMFA) and artificial neural network approaches. VLA is a combination of supervised and unsupervised neural networks applied to solve the same problem. The supervised algorithm is a feed-forward neural network trained with a back-propagation algorithm while the unsupervised network is a self-organizing map of Kohonen. The use of both of these algorithms makes it possible to cluster the input CoMFA field variables and to use only a small number of the most relevant parameters to correlate spatial properties of the molecules with their activity. The statistical coefficients calculated by the proposed algorithm for cannabimimetic aminoalkyl indoles were comparable to, or improved, in comparison to the original study using the partial least squares algorithm. The results of the algorithm can be visualized and easily interpreted. Thus, VLA is a new convenient tool for three-dimensional QSAR studies.
Affiliation(s)
- I V Tetko
- Biomedical Department, Institute of Bioorganic & Petroleum Chemistry, Murmanskaya 1, Kiev-660, 253660 Ukraine.
1135
McGuire P, D'Eleuterio G. Eigenpaxels and a neural-network approach to image classification. IEEE Trans Neural Netw 2001; 12:625-35. [DOI: 10.1109/72.925566] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
1136
Watanabe M, Nakanishi O, Aihara K. Solving the binding problem of the brain with bi-directional functional connectivity. Neural Netw 2001; 14:395-406. [PMID: 11411628 DOI: 10.1016/s0893-6080(01)00036-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
Abstract
We propose a neural network model which gives one solution to the binding problem on the basis of 'functional connectivity' and bidirectional connections. Here, 'functional connectivity' is dynamic neuronal connectivity peculiar to temporal spike coding neural networks with coincidence detector neurons. The model consists of a single primary map and two higher modules which extract two different features shown on the primary map. There exist three layers in each higher module and the layers are connected bi-directionally. An object in the outer world is represented by a 'global dynamical cell assembly' which is organized across the primary map and the two higher modules. Detailed, but spatially localized, information is coded in the primary map, whereas coarse, but spatially extracted information or globally integrated information is coded in the higher modules. Computer simulations of the proposed model show that multiple cell assemblies sharing the same neurons partially can co-exist. Furthermore, we introduce a three-dimensional J-PSTH (Joint-Peri Stimulus Time Histogram) which is capable of tracking such cell assemblies, altering its constituent neurons as in our proposed model.
Affiliation(s)
- M Watanabe
- Department of Mathematical Engineering and Information Physics, Graduate School of Engineering, The University of Tokyo, Japan.
1137
Abstract
Understanding how biological visual systems recognize objects is one of the ultimate goals in computational neuroscience. From the computational viewpoint of learning, different recognition tasks, such as categorization and identification, are similar, representing different trade-offs between specificity and invariance. Thus, the different tasks do not require different classes of models. We briefly review some recent trends in computational vision and then focus on feedforward, view-based models that are supported by psychophysical and physiological data.
Affiliation(s)
- M Riesenhuber
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, Center for Biological and Computational Learning and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge 02142, USA
1138
Abstract
Many visual tasks can be decomposed into a sequence of simpler subtasks. Ullman suggested that such subtasks are carried out by elemental operations that are implemented by specialized processes in the visual brain [Ullman, S. (1984). Visual routines. Cognition (18), 97-159]. According to this hypothesis, there are a limited number of elemental operations that, since they can be applied sequentially, may nevertheless give rise to a large number of visual routines. Examples of such elemental operations are visual search, texture segregation and contour grouping. Here we attempt to delineate how such elemental operations are implemented in the visual brain. When an image appears, feedforward processing rapidly leads to an activity pattern that is distributed across many visual areas. Thereafter, elemental operations come into play, and these are implemented by the modulation of firing rates. Firing rate modulations effectuate grouping of neural responses into coherent object representations. Moreover, they permit transfer of information from one operator to the next, which allows flexibility in the sequencing of operations. We discuss how the elemental operations provide a tool to relate cortical physiology to psychophysics, and suggest a reclassification of pre-attentive and attentive processes.
Affiliation(s)
- P R Roelfsema
- Department of Visual System Analysis, Academic Medical Center (UvA), Graduate School of Neurosciences Amsterdam, P.O. Box 12011, 1100 AC, Amsterdam, The Netherlands.
1139
Rolls ET. Functions of the primate temporal lobe cortical visual areas in invariant visual object and face recognition. Neuron 2000; 27:205-18. [PMID: 10985342 DOI: 10.1016/s0896-6273(00)00030-1] [Citation(s) in RCA: 214] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Affiliation(s)
- E T Rolls
- University of Oxford, Department of Experimental Psychology, United Kingdom.
1140
Stringer SM, Rolls ET. Position invariant recognition in the visual system with cluttered environments. Neural Netw 2000; 13:305-15. [PMID: 10937964 DOI: 10.1016/s0893-6080(00)00017-4] [Citation(s) in RCA: 46] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
The effects of cluttered environments are investigated on the performance of a hierarchical multilayer model of invariant object recognition in the visual system (VisNet) that employs learning rules that utilise a trace of previous neural activity. This class of model relies on the spatio-temporal statistics of natural visual inputs to be able to associate together different exemplars of the same stimulus or object which will tend to occur in temporal proximity. In this paper the different exemplars of a stimulus are the same stimulus in different positions. First it is shown that if the stimuli have been learned previously against a plain background, then the stimuli can be correctly recognised even in environments with cluttered (e.g. natural) backgrounds which form complex scenes. Second it is shown that the functional architecture has difficulty in learning new objects if they are presented against cluttered backgrounds. It is suggested that processes such as the use of a high-resolution fovea, or attention, may be particularly useful in suppressing the effects of background noise and in segmenting objects from their background when new objects need to be learned. However, it is shown third that this problem may be ameliorated by the prior existence of stimulus tuned feature detecting neurons in the early layers of the VisNet, and that these feature detecting neurons may be set up through previous exposure to the relevant class of objects. Fourth we extend these results to partially occluded objects, showing that (in contrast with many artificial vision systems) correct recognition in this class of architecture can occur if the objects have been learned previously without occlusion.
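The trace learning rule referred to here can be sketched for a single linear unit (a minimal illustration under simplifying assumptions, not VisNet's multilayer architecture with competition): pairing each input with a decaying trace of recent activity lets tuning learned at one position spread to positions that follow it in time, whereas plain Hebbian learning preserves the initial position-specific tuning.

```python
import numpy as np

def train(eta, sweeps=100, alpha=0.1):
    """Single linear unit trained with a trace rule,
    dw = alpha * trace * x,  trace = (1 - eta) * y + eta * trace;
    eta = 0 reduces to plain Hebbian learning."""
    w = np.array([1.0, 0.1, 0.1, 0.1])   # initially tuned to position 0 only
    trace = 0.0
    for _ in range(sweeps):
        for p in range(4):               # the object sweeps across four positions
            x = np.zeros(4)
            x[p] = 1.0
            y = w @ x                     # current response
            trace = (1 - eta) * y + eta * trace
            w += alpha * trace * x        # input paired with the activity trace
    return w / w.max()                    # relative tuning across the four positions

print("plain Hebbian (eta=0.0):", np.round(train(0.0), 2))
print("trace rule    (eta=0.8):", np.round(train(0.8), 2))
```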
Affiliation(s)
- S M Stringer
- Oxford University, Department of Experimental Psychology, UK
1141
Torres-Mendez L, Ruiz-Suarez J, Sucar L, Gomez G. Translation, rotation, and scale-invariant object recognition. IEEE Trans Syst Man Cybern C 2000. [DOI: 10.1109/5326.827484] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
1142
Abstract
Visual processing in cortex is classically modeled as a hierarchy of increasingly sophisticated representations, naturally extending the model of simple to complex cells of Hubel and Wiesel. Surprisingly, little quantitative modeling has been done to explore the biological feasibility of this class of models to explain aspects of higher-level visual processing such as object recognition. We describe a new hierarchical model consistent with physiological data from inferotemporal cortex that accounts for this complex visual task and makes testable predictions. The model is based on a MAX-like operation applied to inputs to certain cortical neurons that may have a general role in cortical function.
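A minimal sketch of the two operations this class of model alternates, Gaussian template matching (S-type units) followed by a MAX over positions (C-type units), is shown below with toy 3x3 patches; it is not the published model's architecture or parameters, only an illustration of how the MAX pooling yields translation tolerance:

```python
import numpy as np

def s_layer(patches, templates, sigma=1.0):
    """S-type units: Gaussian tuning of every image patch to every stored template."""
    resp = np.zeros((len(templates), len(patches)))
    for i, t in enumerate(templates):
        for j, p in enumerate(patches):
            resp[i, j] = np.exp(-np.sum((p - t) ** 2) / (2 * sigma ** 2))
    return resp

def c_layer(s_responses):
    """C-type units: a MAX over positions gives tolerance to translation."""
    return s_responses.max(axis=1)

rng = np.random.default_rng(5)
templates = [rng.random(9) for _ in range(3)]            # learned 3x3 feature templates

feature = templates[1] + 0.05 * rng.standard_normal(9)   # noisy instance of template 1
background = [rng.random(9) for _ in range(7)]

patches_a = [feature] + background                       # feature at one position
patches_b = background[:4] + [feature] + background[4:]  # same feature, shifted

print("C responses, arrangement A:", np.round(c_layer(s_layer(patches_a, templates)), 3))
print("C responses, arrangement B:", np.round(c_layer(s_layer(patches_b, templates)), 3))
# The pooled (C-level) responses are identical although the feature moved.
```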
Affiliation(s)
- M Riesenhuber
- Department of Brain and Cognitive Sciences, Center for Biological and Computational Learning and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA
1143
Doya K. What are the computations of the cerebellum, the basal ganglia and the cerebral cortex? Neural Netw 1999; 12:961-974. [PMID: 12662639 DOI: 10.1016/s0893-6080(99)00046-5] [Citation(s) in RCA: 377] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
The classical notion that the cerebellum and the basal ganglia are dedicated to motor control is under dispute given increasing evidence of their involvement in non-motor functions. Is it then impossible to characterize the functions of the cerebellum, the basal ganglia and the cerebral cortex in a simplistic manner? This paper presents a novel view that their computational roles can be characterized not by asking what are the "goals" of their computation, such as motor or sensory, but by asking what are the "methods" of their computation, specifically, their learning algorithms. There is currently enough anatomical, physiological, and theoretical evidence to support the hypotheses that the cerebellum is a specialized organ for supervised learning, the basal ganglia are for reinforcement learning, and the cerebral cortex is for unsupervised learning. This paper investigates how the learning modules specialized for these three kinds of learning can be assembled into goal-oriented behaving systems. In general, supervised learning modules in the cerebellum can be utilized as "internal models" of the environment. Reinforcement learning modules in the basal ganglia enable action selection by an "evaluation" of environmental states. Unsupervised learning modules in the cerebral cortex can provide statistically efficient representation of the states of the environment and the behaving system. Two basic action selection architectures are shown, namely, reactive action selection and predictive action selection. They can be implemented within the anatomical constraint of the network linking these structures. Furthermore, the use of the cerebellar supervised learning modules for state estimation, behavioral simulation, and encapsulation of learned skill is considered. Finally, the usefulness of such theoretical frameworks in interpreting brain imaging data is demonstrated in the paradigm of procedural learning.
Affiliation(s)
- K Doya
- Kawato Dynamic Brain Project, ERATO, Japan Science and Technology Corporation, 2-2 Hikaridai, Seika, Soraku, Kyoto, Japan
1144
Abstract
A fundamental capacity of the perceptual systems and the brain in general is to deal with the novel and the unexpected. In vision, we can effortlessly recognize a familiar object under novel viewing conditions, or recognize a new object as a member of a familiar class, such as a house, a face, or a car. This ability to generalize and deal efficiently with novel stimuli has long been considered a challenging example of brain-like computation that proved extremely difficult to replicate in artificial systems. In this paper we present an approach to generalization and invariant recognition. We focus our discussion on the problem of invariance to position in the visual field, but also sketch how similar principles could apply to other domains. The approach is based on the use of a large repertoire of partial generalizations that are built upon past experience. In the case of shift invariance, visual patterns are described as the conjunction of multiple overlapping image fragments. The invariance to the more primitive fragments is built into the system by past experience. Shift invariance of complex shapes is obtained from the invariance of their constituent fragments. We study by simulations aspects of this shift invariance method and then consider its extensions to invariant perception and classification by brain-like structures.
Affiliation(s)
- S Ullman
- Department of Applied Mathematics & Computer Science, The Weizmann Institute of Science, Rehovot, Israel
1145
Affiliation(s)
- A Treisman
- Psychology Department, Princeton University, New Jersey 08544, USA.
1146
Affiliation(s)
- J M Wolfe
- Center for Ophthalmic Research, Harvard Medical School, Boston, Massachusetts 02115, USA.
1147
Affiliation(s)
- W Singer
- Max-Planck-Institute for Brain Research, Frankfurt, Federal Republic of Germany.
1148
Affiliation(s)
- M N Shadlen
- Department of Physiology and Biophysics, University of Washington, Seattle 98195, USA
1149
Affiliation(s)
- C von der Malsburg
- Institut für Neuroinformatik, Ruhr-Universität Bochum, Federal Republic of Germany.
1150
Affiliation(s)
- G M Ghose
- Division of Neuroscience, Baylor College of Medicine, Houston, Texas 77030, USA