1. Stecher R, Kaiser D. Representations of imaginary scenes and their properties in cortical alpha activity. Sci Rep 2024; 14:12796. PMID: 38834699. DOI: 10.1038/s41598-024-63320-4.
Abstract
Imagining natural scenes enables us to engage with a myriad of simulated environments. How do our brains generate such complex mental images? Recent research suggests that cortical alpha activity carries information about individual objects during visual imagery. However, it remains unclear whether more complex imagined contents, such as natural scenes, are similarly represented in alpha activity. Here, we answer this question by decoding the contents of imagined scenes from rhythmic cortical activity patterns. In an EEG experiment, participants imagined natural scenes based on detailed written descriptions that conveyed four complementary scene properties: openness, naturalness, clutter level and brightness. By conducting classification analyses on EEG power patterns across neural frequencies, we were able to decode both individual imagined scenes and their properties from the alpha band, showing that the contents of complex visual images are also represented in alpha rhythms. A cross-classification analysis between alpha power patterns during the imagery task and during a perception task, in which participants were presented with images of the described scenes, showed that scene representations in the alpha band are partly shared between imagery and late stages of perception. This suggests that alpha activity mediates the top-down re-activation of scene-related visual contents during imagery.
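The analysis described here, decoding scene contents from band-limited EEG power, can be illustrated with a minimal sketch. This is not the authors' code: the data are simulated, and the specific choices (Welch power estimation, LDA classifier, 5-fold cross-validation) are assumptions for illustration only.

```python
# Hedged sketch: decoding a binary scene property (e.g., open vs. closed)
# from alpha-band (8-12 Hz) EEG power patterns. All data are simulated.
import numpy as np
from scipy.signal import welch
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_channels, n_samples, fs = 80, 32, 500, 250  # 2 s epochs at 250 Hz
labels = rng.integers(0, 2, n_trials)                   # scene property per trial

# Simulate EEG: a 10 Hz oscillation whose amplitude depends on the label.
t = np.arange(n_samples) / fs
eeg = rng.standard_normal((n_trials, n_channels, n_samples))
gain = 1.0 + 0.5 * labels[:, None, None]                # stronger alpha for class 1
eeg += gain * np.sin(2 * np.pi * 10 * t)

# Alpha-band power per channel via Welch's method -> one feature vector per trial.
freqs, psd = welch(eeg, fs=fs, nperseg=250, axis=-1)
alpha = (freqs >= 8) & (freqs <= 12)
features = np.log(psd[:, :, alpha].mean(axis=-1))       # shape (n_trials, n_channels)

acc = cross_val_score(LinearDiscriminantAnalysis(), features, labels, cv=5).mean()
print(f"decoding accuracy: {acc:.2f}")  # well above the 0.5 chance level here
```

The same pattern extends to cross-classification: train the classifier on imagery-epoch features and score it on perception-epoch features instead of cross-validating within one task.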
Affiliation(s)
- Rico Stecher
- Mathematical Institute, Department of Mathematics and Computer Science, Physics, Geography, Justus Liebig University Gießen, 35392, Gießen, Germany.
- Daniel Kaiser
- Mathematical Institute, Department of Mathematics and Computer Science, Physics, Geography, Justus Liebig University Gießen, 35392, Gießen, Germany
- Center for Mind, Brain and Behavior (CMBB), Philipps-University Marburg and Justus Liebig University Gießen, 35032, Marburg, Germany
2. Milne GA, Lisi M, McLean A, Zheng R, Groen II, Dekker TM. Perceptual reorganization from prior knowledge emerges late in childhood. iScience 2024; 27:108787. PMID: 38303715. PMCID: PMC10831247. DOI: 10.1016/j.isci.2024.108787.
Abstract
Human vision relies heavily on prior knowledge. Here, we show for the first time that prior-knowledge-induced reshaping of visual inputs emerges gradually in late childhood. To isolate the effects of prior knowledge on perception, we presented 4- to 12-year-olds and adults with two-tone images: hard-to-recognize degraded photos. In adults, seeing the original photo triggers perceptual reorganization, causing mandatory recognition of the two-tone version. This involves top-down signaling from higher-order brain areas to early visual cortex. We show that children younger than 7-9 years do not experience this knowledge-guided shift, despite viewing the original photo immediately before each two-tone image. To assess the computations underlying this development, we compared human performance to three neural networks with varying architectures. The best-performing model behaved much like 4- to 5-year-olds, displaying feature-based rather than holistic processing strategies. The reconciliation of prior knowledge with sensory input undergoes a striking age-related shift, which may underpin the development of many perceptual abilities.
Affiliation(s)
- Georgia A. Milne
- Institute of Ophthalmology, University College London, EC1V 9EL London, UK
- Division of Psychology and Language Sciences, University College London, WC1H 0AP London, UK
- Matteo Lisi
- Department of Psychology, Royal Holloway, University of London, TW20 0EX London, UK
- Aisha McLean
- Institute of Ophthalmology, University College London, EC1V 9EL London, UK
- Rosie Zheng
- Informatics Institute, University of Amsterdam, 1098 XH Amsterdam, the Netherlands
- Iris I.A. Groen
- Informatics Institute, University of Amsterdam, 1098 XH Amsterdam, the Netherlands
- Tessa M. Dekker
- Institute of Ophthalmology, University College London, EC1V 9EL London, UK
- Division of Psychology and Language Sciences, University College London, WC1H 0AP London, UK
3. Karapetian A, Boyanova A, Pandaram M, Obermayer K, Kietzmann TC, Cichy RM. Empirically Identifying and Computationally Modeling the Brain-Behavior Relationship for Human Scene Categorization. J Cogn Neurosci 2023; 35:1879-1897. PMID: 37590093. PMCID: PMC10586810. DOI: 10.1162/jocn_a_02043.
Abstract
Humans effortlessly make quick and accurate perceptual decisions about the nature of their immediate visual environment, such as the category of the scene they face. Previous research has revealed a rich set of cortical representations potentially underlying this feat. However, it remains unknown which of these representations are suitably formatted for decision-making. Here, we approached this question empirically and computationally, using neuroimaging and computational modeling. For the empirical part, we collected EEG data and RTs from human participants during a scene categorization task (natural vs. man-made). We then related the EEG data to behavior using a multivariate extension of signal detection theory. We observed a correlation between neural data and behavior specifically between ∼100 msec and ∼200 msec after stimulus onset, suggesting that the neural scene representations in this time period are suitably formatted for decision-making. For the computational part, we evaluated a recurrent convolutional neural network (RCNN) as a model of brain and behavior. Unifying our previous observations in an image-computable model, the RCNN predicted the neural representations, the behavioral scene categorization data, and the relationship between them well. Our results identify and computationally characterize the neural and behavioral correlates of scene categorization in humans.
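A multivariate signal-detection-style link between neural patterns and RTs can be sketched as follows. This is not the authors' pipeline: the data, the use of an LDA decision axis, and the evidence-to-RT relationship are illustrative assumptions.

```python
# Hedged sketch: project trial-wise neural patterns onto a discriminant axis
# and relate each trial's distance from the decision boundary to its RT.
# All data are simulated.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n_trials, n_features = 200, 20
labels = rng.integers(0, 2, n_trials)            # e.g., natural vs. man-made

# Simulated neural patterns: class-dependent mean shift plus noise.
X = rng.standard_normal((n_trials, n_features))
X[labels == 1, :5] += 1.0

lda = LinearDiscriminantAnalysis().fit(X, labels)
dist = lda.decision_function(X)                  # signed distance to the boundary

# Simulate RTs that shrink as neural "evidence" grows (plus noise).
rt = 0.6 - 0.05 * np.abs(dist) + 0.05 * rng.standard_normal(n_trials)

r, p = pearsonr(np.abs(dist), rt)
print(f"evidence-RT correlation: r = {r:.2f}")   # negative: more evidence, faster
```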
Affiliation(s)
- Agnessa Karapetian
- Freie Universität Berlin, Germany
- Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Germany
- Klaus Obermayer
- Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Germany
- Technische Universität Berlin, Germany
- Humboldt-Universität zu Berlin, Germany
- Radoslaw M Cichy
- Freie Universität Berlin, Germany
- Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Germany
- Humboldt-Universität zu Berlin, Germany
4. Orima T, Motoyoshi I. Spatiotemporal cortical dynamics for visual scene processing as revealed by EEG decoding. Front Neurosci 2023; 17:1167719. PMID: 38027518. PMCID: PMC10646306. DOI: 10.3389/fnins.2023.1167719.
Abstract
The human visual system rapidly recognizes the categories and global properties of complex natural scenes. The present study investigated the spatiotemporal dynamics of neural signals involved in visual scene processing using electroencephalography (EEG) decoding. We recorded visual evoked potentials from 11 human observers for 232 natural scenes, each of which belonged to one of 13 natural scene categories (e.g., a bedroom or open country) and had three global properties (naturalness, openness, and roughness). We trained a deep convolutional classification model of the natural scene categories and global properties using EEGNet. Having confirmed that the model successfully classified natural scene categories and the three global properties, we applied Grad-CAM to the EEGNet model to visualize the EEG channels and time points that contributed to the classification. The analysis showed that EEG signals at occipital electrodes at short latencies (approximately 80 ms) contributed to the classifications, whereas those at frontal electrodes at relatively long latencies (approximately 200 ms) contributed to the classification of naturalness and of the individual scene category. These results suggest that different global properties are encoded in different cortical areas with different timings, and that the combination of the EEGNet model and Grad-CAM can serve as a tool to investigate both the temporal and spatial distribution of natural scene processing in the human brain.
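Grad-CAM requires a trained deep network, but the underlying question, which channels and time points drive the classifier, can be illustrated with a much simpler occlusion analysis. This is explicitly a stand-in, not the paper's EEGNet/Grad-CAM pipeline; the data and the informative channel (index 2) are simulated.

```python
# Hedged sketch: estimate channel importance for an EEG decoder by zeroing
# one channel at a time and measuring the drop in test accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n_trials, n_channels, n_times = 200, 8, 50
y = rng.integers(0, 2, n_trials)

X = rng.standard_normal((n_trials, n_channels, n_times))
X[y == 1, 2, 20:30] += 1.5          # class signal in channel 2, samples 20-30

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(Xtr.reshape(len(Xtr), -1), ytr)
base = clf.score(Xte.reshape(len(Xte), -1), yte)

importance = np.empty(n_channels)
for ch in range(n_channels):
    Xocc = Xte.copy()
    Xocc[:, ch, :] = 0.0            # occlude one channel entirely
    importance[ch] = base - clf.score(Xocc.reshape(len(Xte), -1), yte)

print("most informative channel:", int(importance.argmax()))
```

The same loop over time windows instead of channels yields a temporal importance profile, analogous in spirit to the paper's channel-by-latency visualization.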
Affiliation(s)
- Taiki Orima
- Department of Life Sciences, The University of Tokyo, Tokyo, Japan
- Japan Society for the Promotion of Science, Tokyo, Japan
- Isamu Motoyoshi
- Department of Life Sciences, The University of Tokyo, Tokyo, Japan
5. Wang T, Zhao Y, Jia J. Nonadditive integration of visual information in ensemble processing. iScience 2023; 26:107988. PMID: 37822498. PMCID: PMC10562869. DOI: 10.1016/j.isci.2023.107988.
Abstract
Statistically summarizing information from a stimulus array into an ensemble representation (e.g., the mean) improves the efficiency of visual processing. However, little is known about how the brain computes the ensemble statistics. Here, we propose that ensemble processing is realized by nonadditive integration, rather than linear averaging, of individual items. We used a linear regression model approach to extract EEG responses to three levels of information: the individual items, their local interactions, and their global interaction. The local and global interactions, representing nonadditive integration of individual items, elicited rapid and independent neural responses. Critically, only the neural representation of the global interaction predicted the precision of the ensemble perception at the behavioral level. Furthermore, spreading attention over the global pattern to enhance ensemble processing directly promoted rapid neural representation of the global interaction. Taken together, these findings advocate a global, nonadditive mechanism of ensemble processing in the brain.
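The regression logic described here, separating responses to individual items from responses to their interactions, can be sketched minimally. This is not the authors' model: the "neural response", the feature choices, and the coefficient values are simulated assumptions.

```python
# Hedged sketch: regress a simulated neural response on two item features
# and their interaction term. A reliable interaction coefficient indicates
# nonadditive integration beyond linear averaging of the items.
import numpy as np

rng = np.random.default_rng(3)
n_trials = 500
item1 = rng.standard_normal(n_trials)      # e.g., feature value of item 1
item2 = rng.standard_normal(n_trials)      # e.g., feature value of item 2

# Ground truth: the response carries both items AND their interaction.
response = (0.8 * item1 + 0.8 * item2 + 0.5 * item1 * item2
            + 0.3 * rng.standard_normal(n_trials))

# Design matrix: intercept, the individual items, and their interaction.
design = np.column_stack([np.ones(n_trials), item1, item2, item1 * item2])
coef, *_ = np.linalg.lstsq(design, response, rcond=None)

print(f"interaction coefficient: {coef[3]:.2f}")  # recovers ~0.5
```

Fitting the same model without the interaction column and comparing residuals gives a direct test of whether the nonadditive term is needed.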
Affiliation(s)
- Tongyu Wang
- Department of Psychology, Hangzhou Normal University, Hangzhou 311121, Zhejiang, China
- Yuqing Zhao
- Department of Psychology, Hangzhou Normal University, Hangzhou 311121, Zhejiang, China
- Jianrong Jia
- Department of Psychology, Hangzhou Normal University, Hangzhou 311121, Zhejiang, China
- Zhejiang Philosophy and Social Science Laboratory for Research in Early Development and Childcare, Hangzhou Normal University, Hangzhou 311121, Zhejiang, China
6. Wencheng W, Ge Y, Zuo Z, Chen L, Qin X, Zuxiang L. Visual number sense for real-world scenes shared by deep neural networks and humans. Heliyon 2023; 9:e18517. PMID: 37560656. PMCID: PMC10407052. DOI: 10.1016/j.heliyon.2023.e18517.
Abstract
Recently, visual number sense has been identified in deep neural networks (DNNs). However, whether DNNs have the same capacity for real-world scenes, rather than the simple geometric figures that are often tested, is unclear. In this study, we explore the number perception of scenes using AlexNet and find that numerosity can be represented by the pattern of group activation of the category-layer units. The global activation of these units increases with the number of objects in the scene, and the variations in their activation decrease accordingly. By decoding numerosity from this pattern, we reveal that the embedding coefficient of a scene determines how likely its potential objects are to contribute to numerical perception. This was demonstrated by better performance for pictures with relatively high embedding coefficients in both DNNs and humans. This study shows for the first time that a distinct feature of visual environments, revealed by DNNs, can modulate human perception, supported by a group-coding mechanism.
Affiliation(s)
- Wu Wencheng
- AHU-IAI AI Joint Laboratory, Anhui University, Hefei, 230601, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China
- Yingxi Ge
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, 15 Datun Road, Beijing, 100101, China
- CAS Center for Excellence in Brain Science and Intelligence Technology, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing, 100049, China
- Zhentao Zuo
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, 15 Datun Road, Beijing, 100101, China
- CAS Center for Excellence in Brain Science and Intelligence Technology, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing, 100049, China
- Lin Chen
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, 15 Datun Road, Beijing, 100101, China
- CAS Center for Excellence in Brain Science and Intelligence Technology, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing, 100049, China
- Xu Qin
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Hefei, 230601, China
- Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University, Hefei, 230601, China
- School of Computer Science and Technology, Anhui University, Hefei 230601, China
- Liu Zuxiang
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, 15 Datun Road, Beijing, 100101, China
- CAS Center for Excellence in Brain Science and Intelligence Technology, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing, 100049, China
7. Woo T, Liang X, Evans DA, Fernandez O, Kretschmer F, Reiter S, Laurent G. The dynamics of pattern matching in camouflaging cuttlefish. Nature 2023. PMID: 37380772. PMCID: PMC10322717. DOI: 10.1038/s41586-023-06259-2.
Abstract
Many cephalopods escape detection using camouflage [1]. This behaviour relies on a visual assessment of the surroundings, on an interpretation of visual-texture statistics [2-4] and on matching these statistics using millions of skin chromatophores that are controlled by motoneurons located in the brain [5-7]. Analysis of cuttlefish images proposed that camouflage patterns are low-dimensional and categorizable into three pattern classes, built from a small repertoire of components [8-11]. Behavioural experiments also indicated that, although camouflage requires vision, its execution does not require feedback [5,12,13], suggesting that motion within skin-pattern space is stereotyped and lacks the possibility of correction. Here, using quantitative methods [14], we studied camouflage in the cuttlefish Sepia officinalis as behavioural motion towards background matching in skin-pattern space. An analysis of hundreds of thousands of images over natural and artificial backgrounds revealed that the space of skin patterns is high-dimensional and that pattern matching is not stereotyped: each search meanders through skin-pattern space, decelerating and accelerating repeatedly before stabilizing. Chromatophores could be grouped into pattern components on the basis of their covariation during camouflaging. These components varied in shape and size, and overlay one another. However, their identities varied even across transitions between identical skin-pattern pairs, indicating flexibility of implementation and an absence of stereotypy. Components could also be differentiated by their sensitivity to spatial frequency. Finally, we compared camouflage to blanching, a skin-lightening reaction to threatening stimuli. Pattern motion during blanching was direct and fast, consistent with open-loop motion in low-dimensional pattern space, in contrast to that observed during camouflage.
Affiliation(s)
- Theodosia Woo
- Max Planck Institute for Brain Research, Frankfurt, Germany
- Xitong Liang
- Max Planck Institute for Brain Research, Frankfurt, Germany
- School of Life Sciences, Peking University, Beijing, China
- Sam Reiter
- Max Planck Institute for Brain Research, Frankfurt, Germany.
- Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan.
- Gilles Laurent
- Max Planck Institute for Brain Research, Frankfurt, Germany.
8. Bracci S, Mraz J, Zeman A, Leys G, Op de Beeck H. The representational hierarchy in human and artificial visual systems in the presence of object-scene regularities. PLoS Comput Biol 2023; 19:e1011086. PMID: 37115763. PMCID: PMC10171658. DOI: 10.1371/journal.pcbi.1011086.
Abstract
Human vision is still largely unexplained. Computer vision has made impressive progress on this front, but it is still unclear to what extent artificial neural networks approximate human object vision at the behavioral and neural levels. Here, we investigated whether machine object vision mimics the representational hierarchy of human object vision, using an experimental design that allows testing within-domain representations for animals and scenes, as well as across-domain representations reflecting their real-world contextual regularities, such as animal-scene pairs that often co-occur in the visual environment. We found that DCNNs trained on object recognition acquire representations, in their late processing stages, that closely capture human conceptual judgements about the co-occurrence of animals and their typical scenes. Likewise, the DCNNs' representational hierarchy shows surprising similarities to the representational transformations emerging in domain-specific ventrotemporal areas up to domain-general frontoparietal areas. Despite these remarkable similarities, the underlying information processing differs. The ability of neural networks to learn a human-like high-level conceptual representation of object-scene co-occurrence depends upon the amount of object-scene co-occurrence present in the image set, highlighting the fundamental role of training history. Further, although mid/high-level DCNN layers represent the category division between animals and scenes as observed in VTC, their information content shows reduced domain-specific representational richness. To conclude, by testing within- and between-domain selectivity while manipulating contextual regularities, we reveal previously unknown similarities and differences in the information processing strategies employed by human and artificial visual systems.
Affiliation(s)
- Stefania Bracci
- Center for Mind/Brain Sciences-CIMeC, University of Trento, Rovereto, Italy
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Jakob Mraz
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Astrid Zeman
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Gaëlle Leys
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Hans Op de Beeck
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
9. Effects of Natural Scene Inversion on Visual-evoked Brain Potentials and Pupillary Responses: A Matter of Effortful Processing of Unfamiliar Configurations. Neuroscience 2023; 509:201-209. PMID: 36462569. DOI: 10.1016/j.neuroscience.2022.11.025.
Abstract
The inversion of a picture of a face hampers the accuracy and speed at which observers can perceptually process it. Event-related potentials and pupillary responses, successfully used as biomarkers of face inversion in the past, suggest that the perception of visual features that are organized in an unfamiliar manner recruits demanding additional processes. However, it remains unclear whether such inversion effects generalize beyond face stimuli and whether more mental effort is indeed needed to process inverted images. Here we aimed to study the effects of natural scene inversion on visual evoked potentials and pupil dilations. We simultaneously measured responses of 47 human participants to presentations of images showing upright or inverted natural scenes. For inverted scenes, we observed stronger occipito-temporo-parietal N1 peak amplitudes and larger pupil dilations (on top of an initial orienting response) than for upright scenes. This study revealed neural and physiological markers of natural scene inversion that are in line with inversion effects for other stimulus types, and it demonstrates the robustness and generalizability of the phenomenon that unfamiliar configurations of visual content require increased processing effort.
10. Ellmore TM, Reichert Plaska C, Ng K, Mei N. Visual continuous recognition reveals behavioral and neural differences for short- and long-term scene memory. Front Behav Neurosci 2022; 16:958609. PMID: 36187377. PMCID: PMC9520405. DOI: 10.3389/fnbeh.2022.958609.
Abstract
Humans have a remarkably high-capacity and long-duration memory for complex scenes. Previous research documents the neural substrates that allow for efficient categorization of scenes from other complex stimuli like objects and faces, but the spatiotemporal neural dynamics underlying scene memory at timescales relevant to working and longer-term memory are less well understood. In the present study, we used high-density EEG during a visual continuous recognition task in which new, old, and scrambled scenes consisting of color outdoor photographs were presented at an average rate of 0.26 Hz. Old scenes were single repeated presentations occurring within either a short-term interval (< 20 s) or longer-term intervals of between 30 s and 3 min or 4 and 10 min. Overall recognition was far above chance, with better performance at shorter- than longer-term intervals. Sensor-level ANOVA and post hoc pairwise comparisons of event-related potentials (ERPs) revealed three main findings: (1) occipital and parietal amplitudes distinguishing new and old from scrambled scenes; (2) frontal amplitudes distinguishing old from new scenes, with a central positivity highest for hits compared with misses, false alarms and correct rejections; and (3) frontal and parietal changes from ∼300 to ∼600 ms distinguishing among old scenes previously encountered at short- and longer-term retention intervals. These findings reveal how distributed spatiotemporal neural changes evolve to support short- and longer-term recognition of complex scenes.
Affiliation(s)
- Timothy M. Ellmore
- Department of Psychology, The City College of the City University of New York, New York, NY, United States
- Behavioral and Cognitive Neuroscience, The Graduate Center of the City University of New York, New York, NY, United States
- Chelsea Reichert Plaska
- Behavioral and Cognitive Neuroscience, The Graduate Center of the City University of New York, New York, NY, United States
- Kenneth Ng
- Department of Psychology, The City College of the City University of New York, New York, NY, United States
- Ning Mei
- Department of Psychology, The City College of the City University of New York, New York, NY, United States
11. Ramanoël S, Durteste M, Bizeul A, Ozier-Lafontaine A, Bécu M, Sahel J, Habas C, Arleo A. Selective neural coding of object, feature, and geometry spatial cues in humans. Hum Brain Mapp 2022; 43:5281-5295. PMID: 35776524. PMCID: PMC9812241. DOI: 10.1002/hbm.26002.
Abstract
Orienting in space requires the processing of visual spatial cues. The dominant hypothesis about the brain structures mediating the coding of spatial cues stipulates the existence of a hippocampal-dependent system for the representation of geometry and a striatal-dependent system for the representation of landmarks. However, this dual-system hypothesis is based on paradigms that presented spatial cues conveying either conflicting or ambiguous spatial information and that used the term landmark to refer to both discrete three-dimensional objects and wall features. Here, we test the hypothesis of complex activation patterns in the hippocampus and the striatum during visual coding. We also postulate that object-based and feature-based navigation are not equivalent instances of landmark-based navigation. We examined how the neural networks associated with geometry-, object-, and feature-based spatial navigation compared with a control condition in a two-choice behavioral paradigm using fMRI. We showed that the hippocampus was involved in all three types of cue-based navigation, whereas the striatum was more strongly recruited in the presence of geometric cues than object or feature cues. We also found that unique, specific neural signatures were associated with each spatial cue. Object-based navigation elicited a widespread pattern of activity in temporal and occipital regions relative to feature-based navigation. These findings extend the current view of a dual, juxtaposed hippocampal-striatal system for visual spatial coding in humans. They also provide novel insights into the neural networks mediating object versus feature spatial coding, suggesting a need to distinguish these two types of landmarks in the context of human navigation.
Affiliation(s)
- Stephen Ramanoël
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Université Côte d'Azur, LAMHESS, Nice, France
- Marion Durteste
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Alice Bizeul
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Marcia Bécu
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- José-Alain Sahel
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- CHNO des Quinze-Vingts, INSERM-DGOS CIC 1423, Paris, France
- Fondation Ophtalmologique Rothschild, Paris, France
- Department of Ophthalmology, The University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
- Christophe Habas
- CHNO des Quinze-Vingts, INSERM-DGOS CIC 1423, Paris, France
- Université Versailles St Quentin en Yvelines, Paris, France
- Angelo Arleo
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
12. Functional recursion of orientation cues in figure-ground separation. Vision Res 2022; 197:108047. PMID: 35691090. PMCID: PMC9262819. DOI: 10.1016/j.visres.2022.108047.
Abstract
Visual texture is an important cue to figure-ground organization. While processing of texture differences is a prerequisite for the use of this cue to extract figure-ground organization, these stages are distinct processes. One potential indicator of this distinction is the possibility that texture statistics play a different role in the figure vs. in the ground. To determine whether this is the case, we probed figure-ground processing with a family of local image statistics that specified textures that varied in the strength and spatial scale of structure, and the extent to which features are oriented. For image statistics that generated approximately isotropic textures, the threshold for identification of figure-ground structure was determined by the difference in correlation strength in figure vs. ground, independent of whether the correlations were present in figure, ground, or both. However, for image statistics with strong orientation content, thresholds were up to two times higher for correlations in the ground, vs. the figure. This held equally for texture-defined objects with convex or concave boundaries, indicating that these threshold differences are driven by border ownership, not boundary shape. Similar threshold differences were found for presentation times ranging from 125 to 500 ms. These findings identify a qualitative difference in how texture is used for figure-ground analysis, vs. texture discrimination. Additionally, it reveals a functional recursion: texture differences are needed to identify tentative boundaries and consequent scene organization into figure and ground, but then scene organization modifies sensitivity to texture differences according to the figure-ground assignment.
13. Harel A, Nador JD, Bonner MF, Epstein RA. Early Electrophysiological Markers of Navigational Affordances in Scenes. J Cogn Neurosci 2021; 34:397-410. PMID: 35015877. DOI: 10.1162/jocn_a_01810.
Abstract
Scene perception and spatial navigation are interdependent cognitive functions, and there is increasing evidence that cortical areas that process perceptual scene properties also carry information about the potential for navigation in the environment (navigational affordances). However, the temporal stages by which visual information is transformed into navigationally relevant information are not yet known. We hypothesized that navigational affordances are encoded during perceptual processing and therefore should modulate early visually evoked ERPs, especially the scene-selective P2 component. To test this idea, we recorded ERPs from participants while they passively viewed computer-generated room scenes matched in visual complexity. By simply changing the number of doors (no doors, 1 door, 2 doors, 3 doors), we were able to systematically vary the number of pathways that afford movement in the local environment, while keeping the overall size and shape of the environment constant. We found that rooms with no doors evoked a higher P2 response than rooms with three doors, consistent with prior research reporting higher P2 amplitude to closed relative to open scenes. Moreover, we found P2 amplitude scaled linearly with the number of doors in the scenes. Navigability effects on the ERP waveform were also observed in a multivariate analysis, which showed significant decoding of the number of doors and their location at earlier time windows. Together, our results suggest that navigational affordances are represented in the early stages of scene perception. This complements research showing that the occipital place area automatically encodes the structure of navigable space and strengthens the link between scene perception and navigation.
14
Cermeño-Aínsa S. The perception/cognition distinction: Challenging the representational account. Conscious Cogn 2021; 95:103216. [PMID: 34649065] [DOI: 10.1016/j.concog.2021.103216]
Abstract
A central goal for cognitive science and philosophy of mind is to distinguish between perception and cognition. The representational approach has emerged as a prominent candidate for drawing such a distinction. The idea is that perception and cognition differ in the content and the format in which information is represented: perceptual representations are nonconceptual in content and iconic in format, whereas cognitive representations are conceptual in content and discursive in format. This paper argues against this view. I argue that both perception and cognition can use conceptual and nonconceptual contents and can be conveyed in both iconic and discursive formats. If correct, the representational strategy for distinguishing perception from cognition fails.
Affiliation(s)
- Sergio Cermeño-Aínsa
- Autonomous University of Barcelona, Cognitive Science and Language (CCiL), Edifici B, Campus de la UAB, 08193 Bellaterra, (Cerdanyola del Vallès), Spain.
15
Hansen BC, Greene MR, Field DJ. Dynamic Electrode-to-Image (DETI) mapping reveals the human brain's spatiotemporal code of visual information. PLoS Comput Biol 2021; 17:e1009456. [PMID: 34570753] [PMCID: PMC8496831] [DOI: 10.1371/journal.pcbi.1009456]
Abstract
A number of neuroimaging techniques have been employed to understand how visual information is transformed along the visual pathway. Although each technique has spatial and temporal limitations, they can each provide important insights into the visual code. While the BOLD signal of fMRI can be quite informative, the visual code is not static and this can be obscured by fMRI’s poor temporal resolution. In this study, we leveraged the high temporal resolution of EEG to develop an encoding technique based on the distribution of responses generated by a population of real-world scenes. This approach maps neural signals to each pixel within a given image and reveals location-specific transformations of the visual code, providing a spatiotemporal signature for the image at each electrode. Our analyses of the mapping results revealed that scenes undergo a series of nonuniform transformations that prioritize different spatial frequencies at different regions of scenes over time. This mapping technique offers a potential avenue for future studies to explore how dynamic feedforward and recurrent processes inform and refine high-level representations of our visual world. The visual information that we sample from our environment undergoes a series of neural modifications, with each modification state (or visual code) consisting of a unique distribution of responses across neurons along the visual pathway. However, current noninvasive neuroimaging techniques provide an account of that code that is coarse with respect to time or space. Here, we present dynamic electrode-to-image (DETI) mapping, an analysis technique that capitalizes on the high temporal resolution of EEG to map neural signals to each pixel within a given image to reveal location-specific modifications of the visual code. The DETI technique reveals maps of features that are associated with the neural signal at each pixel and at each time point. 
DETI mapping shows that real-world scenes undergo a series of nonuniform modifications over both space and time. Specifically, we find that the visual code varies in a location-specific manner, likely reflecting that neural processing prioritizes different features at different image locations over time. DETI mapping therefore offers a potential avenue for future studies to explore how each modification state informs and refines the conceptual meaning of our visual world.
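As a rough illustration of the electrode-to-image idea described above, one can correlate a single electrode's amplitude with a per-pixel image feature across a stimulus set, yielding a map of which image locations track the neural signal. The feature, the informative region, and the noise model below are invented for illustration and are not the authors' pipeline:

```python
import numpy as np

rng = np.random.default_rng(6)
n_images, h, w = 200, 16, 16

# Hypothetical per-pixel feature values (e.g., local contrast) for each scene.
features = rng.standard_normal((n_images, h, w))

# Simulated single-electrode amplitude at one time point: driven by the
# feature content of a small "informative" image region, plus noise.
region = (slice(5, 7), slice(5, 7))
amplitude = features[:, region[0], region[1]].mean(axis=(1, 2))
amplitude += 0.1 * rng.standard_normal(n_images)

# Electrode-to-image map: correlation between the electrode signal and the
# feature value at each pixel, computed across images.
f = features.reshape(n_images, -1)
fz = (f - f.mean(0)) / f.std(0)
az = (amplitude - amplitude.mean()) / amplitude.std()
corr_map = (fz * az[:, None]).mean(0).reshape(h, w)

print(corr_map[5, 5], corr_map[0, 0])
```

Pixels inside the simulated informative region correlate with the electrode signal while other pixels do not, which is the kind of location-specific signature the mapping technique recovers.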
Affiliation(s)
- Bruce C. Hansen
- Colgate University, Department of Psychological & Brain Sciences, Neuroscience Program, Hamilton, New York, United States of America
- Michelle R. Greene
- Bates College, Neuroscience Program, Lewiston, Maine, United States of America
- David J. Field
- Cornell University, Department of Psychology, Ithaca, New York, United States of America
16
Curvature and Entropy Statistics-Based Blind Multi-Exposure Fusion Image Quality Assessment. Symmetry (Basel) 2021. [DOI: 10.3390/sym13081446]
Abstract
The multi-exposure fusion (MEF) technique offers a new way to represent natural scenes, and the associated quality assessment issues must be addressed to validate the effectiveness of these techniques. In this paper, a curvature and entropy statistics-based blind MEF image quality assessment (CE-BMIQA) method is proposed to perceive quality degradation objectively. The transformation from multiple images with different exposure levels to the final MEF image leads to a loss of structure and detail information, so curvature statistics features and entropy statistics features are used to characterize this distortion. The former are extracted from the histogram statistics of a surface type map calculated from the mean curvature and Gaussian curvature of the MEF image, with contrast energy weighting attached to account for contrast variation. The latter comprise spatial entropy and spectral entropy. All features, extracted in a multi-scale scheme, are aggregated by training a quality regression model via random forest. Since the MEF image and its feature representation are spatially symmetric, the final predicted quality is representative of the image distortion. Experimental results on a public MEF image database demonstrate that the proposed CE-BMIQA method outperforms state-of-the-art blind image quality assessment methods.
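The spatial- and spectral-entropy features mentioned above are standard summary statistics. A minimal sketch of how such features might be computed (function names are illustrative, not from the paper):

```python
import numpy as np

def spatial_entropy(img, bins=256):
    """Shannon entropy (bits) of the pixel-intensity histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def spectral_entropy(img):
    """Shannon entropy (bits) of the normalized Fourier power spectrum."""
    power = np.abs(np.fft.fft2(img)) ** 2
    p = power / power.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
flat = np.full((64, 64), 0.5)   # uniform patch: minimal entropy
noisy = rng.random((64, 64))    # white noise: high entropy
print(spatial_entropy(flat), spatial_entropy(noisy))
print(spectral_entropy(flat), spectral_entropy(noisy))
```

A textureless patch yields near-zero entropy on both measures, while a noisy patch scores high, which is why such statistics can index loss of structure and detail.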
17
Seijdel N, Loke J, van de Klundert R, van der Meer M, Quispel E, van Gaal S, de Haan EHF, Scholte HS. On the Necessity of Recurrent Processing during Object Recognition: It Depends on the Need for Scene Segmentation. J Neurosci 2021; 41:6281-6289. [PMID: 34088797] [PMCID: PMC8287993] [DOI: 10.1523/jneurosci.2851-20.2021]
Abstract
Although feedforward activity may suffice for recognizing objects in isolation, additional visual operations that aid object recognition might be needed for real-world scenes. One such additional operation is figure-ground segmentation, extracting the relevant features and locations of the target object while ignoring irrelevant features. In this study of 60 human participants (female and male), we show objects on backgrounds of increasing complexity to investigate whether recurrent computations are increasingly important for segmenting objects from more complex backgrounds. Three lines of evidence show that recurrent processing is critical for recognition of objects embedded in complex scenes. First, behavioral results indicated a greater reduction in performance after masking objects presented on more complex backgrounds, with the degree of impairment increasing with increasing background complexity. Second, electroencephalography (EEG) measurements showed clear differences in the evoked response potentials between conditions around time points beyond feedforward activity, and exploratory object decoding analyses based on the EEG signal indicated later decoding onsets for objects embedded in more complex backgrounds. Third, deep convolutional neural network performance confirmed this interpretation. Feedforward and less deep networks showed a higher degree of impairment in recognition for objects in complex backgrounds compared with recurrent and deeper networks. Together, these results support the notion that recurrent computations drive figure-ground segmentation of objects in complex scenes.SIGNIFICANCE STATEMENT The incredible speed of object recognition suggests that it relies purely on a fast feedforward buildup of perceptual activity. However, this view is contradicted by studies showing that disruption of recurrent processing leads to decreased object recognition performance. 
Here, we resolve this issue by showing that how object recognition is resolved, and whether recurrent processing is crucial, depends on the context in which the object is presented. For objects presented in isolation or in simple environments, feedforward activity may be sufficient for successful object recognition. However, when the environment is more complex, additional processing appears necessary to select the elements that belong to the object and thereby segregate them from the background.
Affiliation(s)
- Noor Seijdel
- Department of Psychology, University of Amsterdam, 1018 WS Amsterdam, The Netherlands
- Amsterdam Brain and Cognition Center, University of Amsterdam, 1018 WS Amsterdam, The Netherlands
- Jessica Loke
- Department of Psychology, University of Amsterdam, 1018 WS Amsterdam, The Netherlands
- Amsterdam Brain and Cognition Center, University of Amsterdam, 1018 WS Amsterdam, The Netherlands
- Ron van de Klundert
- Department of Psychology, University of Amsterdam, 1018 WS Amsterdam, The Netherlands
- Matthew van der Meer
- Department of Psychology, University of Amsterdam, 1018 WS Amsterdam, The Netherlands
- Eva Quispel
- Department of Psychology, University of Amsterdam, 1018 WS Amsterdam, The Netherlands
- Simon van Gaal
- Department of Psychology, University of Amsterdam, 1018 WS Amsterdam, The Netherlands
- Amsterdam Brain and Cognition Center, University of Amsterdam, 1018 WS Amsterdam, The Netherlands
- Edward H F de Haan
- Department of Psychology, University of Amsterdam, 1018 WS Amsterdam, The Netherlands
- Amsterdam Brain and Cognition Center, University of Amsterdam, 1018 WS Amsterdam, The Netherlands
- H Steven Scholte
- Department of Psychology, University of Amsterdam, 1018 WS Amsterdam, The Netherlands
- Amsterdam Brain and Cognition Center, University of Amsterdam, 1018 WS Amsterdam, The Netherlands
18
Hunt C, Meinhardt G. Synergy of spatial frequency and orientation bandwidth in texture segregation. J Vis 2021; 21:5. [PMID: 33560290] [PMCID: PMC7873498] [DOI: 10.1167/jov.21.2.5]
Abstract
Defining target textures by increased bandwidths in spatial frequency and orientation, we observed strong cue combination effects in a combined texture figure detection and discrimination task. Performance for double-cue targets was better than predicted by independent processing of either cue and even better than predicted from linear cue integration. Application of a texture-processing model revealed that the oversummative cue combination effect is captured by calculating a low-level summary statistic (\(\Delta CE_m\)), which describes the differential contrast energy to target and reference textures, from multiple scales and orientations, and integrating this statistic across channels with a winner-take-all rule. Modeling detection performance using a signal detection theory framework showed that the observers' sensitivity to single-cue and double-cue texture targets, measured in \(d^{\prime }\) units, could be reproduced with plausible settings for filter and noise parameters. These results challenge models assuming separate channeling of elementary features and their later integration, since oversummative cue combination effects appear as an inherent property of local energy mechanisms, at least for spatial frequency and orientation bandwidth-modulated textures.
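The winner-take-all integration of the differential contrast energy \(\Delta CE_m\) across channels, followed by conversion to a \(d^{\prime }\) sensitivity, can be sketched as follows. The channel layout, perturbation, and noise level are assumed values for illustration, not the paper's fitted parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical contrast energies for 4 scales x 6 orientation channels,
# measured on target and reference textures (arbitrary units).
ce_reference = np.ones((4, 6))
ce_target = ce_reference.copy()
ce_target[1, 2] += 0.8                         # target differs strongly in one channel
ce_target += 0.05 * rng.standard_normal(ce_target.shape)

# Differential contrast energy per channel, then winner-take-all integration.
delta_ce = np.abs(ce_target - ce_reference)
signal = delta_ce.max()

# Signal detection theory: sensitivity is signal over assumed internal noise.
noise_sd = 0.2                                 # free parameter in this sketch
d_prime = signal / noise_sd
print(round(d_prime, 2))
```

Because the maximum is taken across channels, a strong difference in any single scale-orientation channel dominates the decision variable, which is how a local-energy mechanism can produce oversummative cue-combination effects.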
Affiliation(s)
- Cordula Hunt
- Department of Psychology, Methods Section, Johannes Gutenberg-Universität, Mainz, Germany
- Günter Meinhardt
- Department of Psychology, Methods Section, Johannes Gutenberg-Universität, Mainz, Germany
19
Kaiser D, Inciuraite G, Cichy RM. Rapid contextualization of fragmented scene information in the human visual system. Neuroimage 2020; 219:117045. [PMID: 32540354] [DOI: 10.1016/j.neuroimage.2020.117045]
Abstract
Real-world environments are extremely rich in visual information. At any given moment in time, only a fraction of this information is available to the eyes and the brain, rendering naturalistic vision a collection of incomplete snapshots. Previous research suggests that in order to successfully contextualize this fragmented information, the visual system sorts inputs according to spatial schemata, that is, knowledge about the typical composition of the visual world. Here, we used a large set of 840 different natural scene fragments to investigate whether this sorting mechanism can operate across the diverse visual environments encountered during real-world vision. We recorded brain activity using electroencephalography (EEG) while participants viewed incomplete scene fragments at fixation. Using representational similarity analysis on the EEG data, we tracked the fragments' cortical representations across time. We found that the fragments' typical vertical location within the environment (top or bottom) predicted their cortical representations, indexing a sorting of information according to spatial schemata. The fragments' cortical representations were most strongly organized by their vertical location at around 200 ms after image onset, suggesting rapid perceptual sorting of information according to spatial schemata. In control analyses, we show that this sorting is flexible with respect to visual features: it is neither explained by commonalities between visually similar indoor and outdoor scenes, nor by the feature organization emerging from a deep neural network trained on scene categorization. Demonstrating such a flexible sorting across a wide range of visually diverse scenes suggests a contextualization mechanism suitable for complex and variable real-world environments.
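Representational similarity analysis of the kind used here can be sketched in a few lines: build a neural RDM from pairwise pattern correlations and compare it to a model RDM coding the fragments' vertical location. The simulated EEG patterns below are purely illustrative:

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between condition patterns."""
    return 1.0 - np.corrcoef(patterns)

def rdm_similarity(rdm_a, rdm_b):
    """Pearson correlation between the two RDMs' upper triangles
    (Spearman is common in RSA; Pearson keeps the sketch short)."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1]

rng = np.random.default_rng(2)
n_fragments, n_channels = 8, 32
location = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # 0 = top, 1 = bottom fragments

# Simulated EEG patterns at one time point: fragments sharing a vertical
# location get a shared signal component plus independent noise.
signal = rng.standard_normal((2, n_channels))
eeg = signal[location] + 0.5 * rng.standard_normal((n_fragments, n_channels))

# Model RDM: dissimilar (1) iff fragments come from different locations.
model = (location[:, None] != location[None, :]).astype(float)

print(rdm_similarity(rdm(eeg), model))
```

Repeating this comparison at each time point yields the time course over which vertical location organizes the cortical representations.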
Affiliation(s)
- Daniel Kaiser
- Department of Psychology, University of York, York, UK.
- Gabriele Inciuraite
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Radoslaw M Cichy
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany; Berlin School of Mind and Brain, Humboldt-Universität Berlin, Berlin, Germany; Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany
20
Robust Single-Image Haze Removal Using Optimal Transmission Map and Adaptive Atmospheric Light. Remote Sensing 2020. [DOI: 10.3390/rs12142233]
Abstract
Haze removal is an ill-posed problem that has attracted much scientific interest due to its various practical applications. Existing methods are usually founded upon various priors; consequently, they demonstrate poor performance in circumstances in which the priors do not hold. By examining hazy and haze-free images, we determined that haze density is highly correlated with image features such as contrast energy, entropy, and sharpness. Then, we proposed an iterative algorithm to accurately estimate the extinction coefficient of the transmission medium via direct optimization of the objective function taking into account all of the features. Furthermore, to address the heterogeneity of the lightness, we devised adaptive atmospheric light to replace the homogeneous light generally used in haze removal. A comparative evaluation against other state-of-the-art approaches demonstrated the superiority of the proposed method. The source code and data sets used in this paper are made publicly available to facilitate further research.
21
Seijdel N, Tsakmakidis N, de Haan EHF, Bohte SM, Scholte HS. Depth in convolutional neural networks solves scene segmentation. PLoS Comput Biol 2020; 16:e1008022. [PMID: 32706770] [PMCID: PMC7406083] [DOI: 10.1371/journal.pcbi.1008022]
Abstract
Feed-forward deep convolutional neural networks (DCNNs) are, under specific conditions, matching and even surpassing human performance in object recognition in natural scenes. This performance suggests that the analysis of a loose collection of image features could support the recognition of natural object categories, without dedicated systems to solve specific visual subtasks. Research in humans however suggests that while feedforward activity may suffice for sparse scenes with isolated objects, additional visual operations ('routines') that aid the recognition process (e.g. segmentation or grouping) are needed for more complex scenes. Linking human visual processing to performance of DCNNs with increasing depth, we here explored if, how, and when object information is differentiated from the backgrounds they appear on. To this end, we controlled the information in both objects and backgrounds, as well as the relationship between them by adding noise, manipulating background congruence and systematically occluding parts of the image. Results indicate that with an increase in network depth, there is an increase in the distinction between object- and background information. For more shallow networks, results indicated a benefit of training on segmented objects. Overall, these results indicate that, de facto, scene segmentation can be performed by a network of sufficient depth. We conclude that the human brain could perform scene segmentation in the context of object identification without an explicit mechanism, by selecting or "binding" features that belong to the object and ignoring other features, in a manner similar to a very deep convolutional neural network.
Affiliation(s)
- Noor Seijdel
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Amsterdam Brain & Cognition (ABC) Center, University of Amsterdam, Amsterdam, The Netherlands
- Nikos Tsakmakidis
- Machine Learning Group, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Edward H. F. de Haan
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Amsterdam Brain & Cognition (ABC) Center, University of Amsterdam, Amsterdam, The Netherlands
- Sander M. Bohte
- Machine Learning Group, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- H. Steven Scholte
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Amsterdam Brain & Cognition (ABC) Center, University of Amsterdam, Amsterdam, The Netherlands
22
Seijdel N, Jahfari S, Groen IIA, Scholte HS. Low-level image statistics in natural scenes influence perceptual decision-making. Sci Rep 2020; 10:10573. [PMID: 32601499] [PMCID: PMC7324621] [DOI: 10.1038/s41598-020-67661-8]
Abstract
A fundamental component of interacting with our environment is the gathering and interpretation of sensory information. When investigating how perceptual information influences decision-making, most researchers have relied on manipulated or unnatural information as perceptual input, resulting in findings that may not generalize to real-world scenes. Unlike simplified, artificial stimuli, real-world scenes contain low-level regularities that are informative about structural complexity and that the brain could exploit. In this study, participants performed an animal detection task on low, medium or high complexity scenes as determined by two biologically plausible natural scene statistics, contrast energy (CE) and spatial coherence (SC). In experiment 1, stimuli were sampled such that CE and SC both influenced scene complexity. Diffusion modelling showed that the speed of information processing was affected by low-level scene complexity. Experiments 2a and 2b refined these observations by showing that isolated manipulation of SC resulted in weaker but comparable effects, with an additional change in response boundary, whereas manipulation of CE alone had no effect. Overall, performance was best for scenes with intermediate complexity. Our systematic definition quantifies how natural scene complexity interacts with decision-making. We speculate that CE and SC serve as an indication to adjust perceptual decision-making based on the complexity of the input.
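Contrast energy (CE) and spatial coherence (SC) in this line of work are commonly derived from a Weibull fit to the distribution of local contrast values, with the scale parameter tracking CE and the shape parameter tracking SC. The sketch below works under that assumption and is not the authors' exact pipeline:

```python
import numpy as np

def fit_weibull(x, lo=0.05, hi=20.0, iters=60):
    """Maximum-likelihood Weibull fit by bisection on the shape equation.
    Returns (scale, shape), interpreted here as (CE-like, SC-like) statistics."""
    x = x[x > 0]
    log_x = np.log(x)

    def g(k):  # profile log-likelihood derivative w.r.t. shape k (increasing in k)
        xk = x ** k
        return np.sum(xk * log_x) / np.sum(xk) - 1.0 / k - log_x.mean()

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if g(mid) > 0 else (mid, hi)
    k = 0.5 * (lo + hi)
    scale = np.mean(x ** k) ** (1.0 / k)
    return scale, k

rng = np.random.default_rng(4)
img = rng.random((128, 128))
# Local contrast magnitudes via simple finite-difference gradients.
gy, gx = np.gradient(img)
contrast = np.hypot(gx, gy).ravel()
ce, sc = fit_weibull(contrast)
print(ce, sc)
```

On real scenes the fitted scale and shape vary systematically with perceived complexity, which is what makes them usable as complexity indices for stimulus selection.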
Affiliation(s)
- Noor Seijdel
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands; Amsterdam Brain and Cognition (ABC) Center, University of Amsterdam, Amsterdam, The Netherlands
- Sara Jahfari
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands; Spinoza Centre for Neuroimaging, Royal Netherlands Academy of Arts and Sciences (KNAW), Amsterdam, The Netherlands
- Iris I A Groen
- Department of Psychology, New York University, New York, USA
- H Steven Scholte
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands; Amsterdam Brain and Cognition (ABC) Center, University of Amsterdam, Amsterdam, The Netherlands
23
Prediction of PM2.5 concentration based on multi-source data and self-organizing fuzzy neural network. SN Applied Sciences 2020. [DOI: 10.1007/s42452-020-2380-5]
24
Harel A, Mzozoyana MW, Al Zoubi H, Nador JD, Noesen BT, Lowe MX, Cant JS. Artificially-generated scenes demonstrate the importance of global scene properties for scene perception. Neuropsychologia 2020; 141:107434. [PMID: 32179102] [DOI: 10.1016/j.neuropsychologia.2020.107434]
Abstract
Recent electrophysiological research highlights the significance of global scene properties (GSPs) for scene perception. However, since real-world scenes span a range of low-level stimulus properties and high-level contextual semantics, GSP effects may also reflect additional processing of such non-global factors. We examined this question by asking whether Event-Related Potentials (ERPs) to GSPs will still be observed when specific low- and high-level scene properties are absent from the scene. We presented participants with computer-based artificially-manipulated scenes varying in two GSPs (spatial expanse and naturalness) which minimized other sources of scene information (color and semantic object detail). We found that the peak amplitude of the P2 component was sensitive to the spatial expanse and naturalness of the artificially-generated scenes: P2 amplitude was higher to closed than open scenes, and in response to manmade than natural scenes. A control experiment showed that the effect of Naturalness on the P2 is not driven by local texture information, while earlier effects of naturalness, expressed as a modulation of the P1 and N1 amplitudes, are sensitive to texture information. Our results demonstrate that GSPs are processed robustly around 220 ms and that P2 can be used as an index of global scene perception.
Affiliation(s)
- Assaf Harel
- Department of Psychology, Wright State University, Dayton, OH, USA.
- Mavuso W Mzozoyana
- Department of Neuroscience, Cell Biology and Physiology, Wright State University, Dayton, OH, USA
- Hamada Al Zoubi
- Department of Neuroscience, Cell Biology and Physiology, Wright State University, Dayton, OH, USA
- Jeffrey D Nador
- Department of Psychology, Wright State University, Dayton, OH, USA
- Birken T Noesen
- Department of Psychology, Wright State University, Dayton, OH, USA
- Matthew X Lowe
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Jonathan S Cant
- Department of Psychology, University of Toronto Scarborough, Toronto, ON, Canada
25
Scene Representations Conveyed by Cortical Feedback to Early Visual Cortex Can Be Described by Line Drawings. J Neurosci 2019; 39:9410-9423. [PMID: 31611306] [PMCID: PMC6867807] [DOI: 10.1523/jneurosci.0852-19.2019]
Abstract
Human behavior is dependent on the ability of neuronal circuits to predict the outside world. Neuronal circuits in early visual areas make these predictions based on internal models that are delivered via non-feedforward connections. Despite our extensive knowledge of the feedforward sensory features that drive cortical neurons, we have a limited grasp on the structure of the brain's internal models. Progress in neuroscience therefore depends on our ability to replicate the models that the brain creates internally. Here we record human fMRI data while presenting partially occluded visual scenes. Visual occlusion allows us to experimentally control sensory input to subregions of visual cortex while internal models continue to influence activity in these regions. Because the observed activity is dependent on internal models, but not on sensory input, we have the opportunity to map visual features conveyed by the brain's internal models. Our results show that activity related to internal models in early visual cortex is more related to scene-specific features than to categorical or depth features. We further demonstrate that behavioral line drawings provide a good description of the internal model structure representing scene-specific features. These findings extend our understanding of internal models, showing that line drawings provide a window into our brains' internal models of vision. SIGNIFICANCE STATEMENT We find that fMRI activity patterns corresponding to occluded visual information in early visual cortex fill in scene-specific features. Line drawings of the missing scene information correlate with our recorded activity patterns, and thus with internal models. Despite our extensive knowledge of the sensory features that drive cortical neurons, we have a limited grasp on the structure of our brains' internal models.
These results therefore constitute an advance to the field of neuroscience by extending our knowledge about the models that our brains construct to efficiently represent and predict the world. Moreover, they link a behavioral measure to these internal models, which play an active role in many components of human behavior, including visual predictions, action planning, and decision making.
26
Hansen BC, Field DJ, Greene MR, Olson C, Miskovic V. Towards a state-space geometry of neural responses to natural scenes: A steady-state approach. Neuroimage 2019; 201:116027. [PMID: 31325643] [DOI: 10.1016/j.neuroimage.2019.116027]
Abstract
Our understanding of information processing by the mammalian visual system has come through a variety of techniques ranging from psychophysics and fMRI to single unit recording and EEG. Each technique provides unique insights into the processing framework of the early visual system. Here, we focus on the nature of the information that is carried by steady state visual evoked potentials (SSVEPs). To study the information provided by SSVEPs, we presented human participants with a population of natural scenes and measured the relative SSVEP response. Rather than focus on particular features of this signal, we focused on the full state-space of possible responses and investigated how the evoked responses are mapped onto this space. Our results show that it is possible to map the relatively high-dimensional signal carried by SSVEPs onto a 2-dimensional space with little loss. We also show that a simple biologically plausible model can account for a high proportion of the explainable variance (~73%) in that space. Finally, we describe a technique for measuring the mutual information that is available about images from SSVEPs. The techniques introduced here represent a new approach to understanding the nature of the information carried by SSVEPs. Crucially, this approach is general and can provide a means of comparing results across different neural recording methods. Altogether, our study sheds light on the encoding principles of early vision and provides a much needed reference point for understanding subsequent transformations of the early visual response space to deeper knowledge structures that link different visual environments.
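The low-dimensional mapping described above can be approximated with a plain principal-component projection, which also quantifies "little loss" as the fraction of variance retained. The simulated response vectors below are illustrative, not SSVEP data:

```python
import numpy as np

def pca_project(responses, n_components=2):
    """Project condition-by-feature responses onto the top principal components;
    return the 2-D coordinates and the fraction of variance retained."""
    centered = responses - responses.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    projected = centered @ vt[:n_components].T
    return projected, var[:n_components].sum() / var.sum()

rng = np.random.default_rng(5)
n_scenes, n_features = 100, 40
# Simulated response vectors that, by construction, vary mostly along
# two latent dimensions plus small noise.
latent = rng.standard_normal((n_scenes, 2))
mixing = rng.standard_normal((2, n_features))
responses = latent @ mixing + 0.1 * rng.standard_normal((n_scenes, n_features))

coords2d, retained = pca_project(responses)
print(coords2d.shape, round(retained, 3))
```

When the underlying response space is effectively two-dimensional, the retained-variance fraction approaches 1, mirroring the paper's finding that the high-dimensional signal maps onto a 2-D space with little loss.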
Affiliation(s)
- Bruce C Hansen
- Colgate University, Department of Psychological & Brain Sciences, Neuroscience Program, Hamilton, NY, USA.
- David J Field
- Cornell University, Department of Psychology, Ithaca, NY, USA
- Cassady Olson
- Colgate University, Department of Psychological & Brain Sciences, Neuroscience Program, Hamilton, NY, USA; Current Address: University of Chicago, Committee on Computational Neuroscience, Chicago, IL, USA
- Vladimir Miskovic
- State University of New York at Binghamton, Department of Psychology, Binghamton, NY, USA
27
Abstract
Humans are remarkably adept at perceiving and understanding complex real-world scenes. Uncovering the neural basis of this ability is an important goal of vision science. Neuroimaging studies have identified three cortical regions that respond selectively to scenes: parahippocampal place area, retrosplenial complex/medial place area, and occipital place area. Here, we review what is known about the visual and functional properties of these brain areas. Scene-selective regions exhibit retinotopic properties and sensitivity to low-level visual features that are characteristic of scenes. They also mediate higher-level representations of layout, objects, and surface properties that allow individual scenes to be recognized and their spatial structure ascertained. Challenges for the future include developing computational models of information processing in scene regions, investigating how these regions support scene perception under ecologically realistic conditions, and understanding how they operate in the context of larger brain networks.
Affiliation(s)
- Russell A Epstein
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA;
- Chris I Baker
- Section on Learning and Plasticity, Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, Maryland 20892, USA;
28
De Cesarei A, Cavicchi S, Micucci A, Codispoti M. Categorization Goals Modulate the Use of Natural Scene Statistics. J Cogn Neurosci 2019; 31:109-125. [DOI: 10.1162/jocn_a_01333]
Abstract
Understanding natural scenes involves the contribution of bottom–up analysis and top–down modulatory processes. However, the interaction of these processes during the categorization of natural scenes is not well understood. In the current study, we approached this issue using ERPs and behavioral and computational data. We presented pictures of natural scenes and asked participants to categorize them in response to different questions (Is it an animal/vehicle? Is it indoors/outdoors? Are there one/two foreground elements?). ERPs for target scenes requiring a "yes" response began to differ from those of nontarget scenes at 250 msec from picture onset, and this ERP difference was unmodulated by the categorization questions. Earlier ERPs showed category-specific differences (e.g., between animals and vehicles), which were associated with the processing of scene statistics. From 180 msec after scene onset, these category-specific ERP differences were modulated by the categorization question that was asked. Categorization goals therefore modulate not only later stages associated with the target/nontarget decision but also earlier perceptual stages, which are involved in the processing of scene statistics.
29
Greene MR. The information content of scene categories. Psychology of Learning and Motivation 2019. [DOI: 10.1016/bs.plm.2019.03.004]
30
Groen IIA, Jahfari S, Seijdel N, Ghebreab S, Lamme VAF, Scholte HS. Scene complexity modulates degree of feedback activity during object detection in natural scenes. PLoS Comput Biol 2018; 14:e1006690. [PMID: 30596644] [PMCID: PMC6329519] [DOI: 10.1371/journal.pcbi.1006690]
Abstract
Selective brain responses to objects arise within a few hundred milliseconds of neural processing, suggesting that visual object recognition is mediated by rapid feed-forward activations. Yet disruption of neural responses in early visual cortex beyond feed-forward processing stages affects object recognition performance. Here, we unite these discrepant findings by reporting that object recognition involves enhanced feedback activity (recurrent processing within early visual cortex) when target objects are embedded in natural scenes that are characterized by high complexity. Human participants performed an animal target detection task on natural scenes with low, medium or high complexity as determined by a computational model of low-level contrast statistics. Three converging lines of evidence indicate that feedback was selectively enhanced for high complexity scenes. First, functional magnetic resonance imaging (fMRI) activity in early visual cortex (V1) was enhanced for target objects in scenes with high, but not low or medium complexity. Second, event-related potentials (ERPs) evoked by target objects were selectively enhanced at feedback stages of visual processing (from ~220 ms onwards) for high complexity scenes only. Third, behavioral performance for high complexity scenes deteriorated when participants were pressed for time and thus less able to incorporate the feedback activity. Modeling of the reaction time distributions using drift diffusion revealed that object information accumulated more slowly for high complexity scenes, with evidence accumulation being coupled to trial-to-trial variation in the EEG feedback response. Together, these results suggest that while feed-forward activity may suffice to recognize isolated objects, the brain employs recurrent processing more adaptively in naturalistic settings, using minimal feedback for simple scenes and increasing feedback for complex scenes.
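The drift-diffusion account in this abstract can be sketched with a minimal one-boundary simulation (not the authors' fitted model): a lower drift rate, as posited for high-complexity scenes, yields slower evidence accumulation and longer mean decision times. The drift values and the reflecting floor at zero are simplifying assumptions:

```python
import numpy as np

def simulate_ddm(drift, n_trials=500, threshold=1.0, dt=0.001, noise=1.0, seed=1):
    """Simulate mean decision times of a one-boundary drift-diffusion
    process: evidence accumulates from 0 with the given drift plus
    Gaussian noise until it crosses the threshold. The reflecting floor
    at 0 is a simplification that guarantees quick termination."""
    rng = np.random.default_rng(seed)
    rts = []
    for _ in range(n_trials):
        x, t = 0.0, 0.0
        while x < threshold:
            x = max(x + drift * dt + noise * np.sqrt(dt) * rng.standard_normal(), 0.0)
            t += dt
        rts.append(t)
    return float(np.mean(rts))

# Hypothetical drift rates: fast accumulation for low-complexity scenes,
# slow accumulation for high-complexity scenes (values are illustrative).
rt_low_complexity = simulate_ddm(drift=3.0)
rt_high_complexity = simulate_ddm(drift=1.0)
```

With the slower drift, first-passage times lengthen, reproducing the qualitative pattern the study reports (slower information accumulation for complex scenes).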
Affiliation(s)
- Iris I. A. Groen
- New York University, Department of Psychology, New York, New York, United States of America
- Sara Jahfari
- Spinoza Centre for Neuroimaging, Royal Netherlands Academy of Arts and Sciences (KNAW), Amsterdam, The Netherlands
- University of Amsterdam, Department of Psychology, Section Brain and Cognition, Amsterdam, The Netherlands
- Noor Seijdel
- University of Amsterdam, Department of Psychology, Section Brain and Cognition, Amsterdam, The Netherlands
- Sennay Ghebreab
- University of Amsterdam, Department of Psychology, Section Brain and Cognition, Amsterdam, The Netherlands
- University of Amsterdam, Department of Informatics, Intelligent Systems Lab, Amsterdam, The Netherlands
- Victor A. F. Lamme
- University of Amsterdam, Department of Psychology, Section Brain and Cognition, Amsterdam, The Netherlands
- H. Steven Scholte
- University of Amsterdam, Department of Psychology, Section Brain and Cognition, Amsterdam, The Netherlands
31
Affiliation(s)
- Peter A. White
- School of Psychology, Cardiff University, Cardiff, Wales, UK
32
Dima DC, Perry G, Singh KD. Spatial frequency supports the emergence of categorical representations in visual cortex during natural scene perception. Neuroimage 2018; 179:102-116. [PMID: 29902586] [PMCID: PMC6057270] [DOI: 10.1016/j.neuroimage.2018.06.033]
Abstract
In navigating our environment, we rapidly process and extract meaning from visual cues. However, the relationship between visual features and categorical representations in natural scene perception is still not well understood. Here, we used natural scene stimuli from different categories and filtered at different spatial frequencies to address this question in a passive viewing paradigm. Using representational similarity analysis (RSA) and cross-decoding of magnetoencephalography (MEG) data, we show that categorical representations emerge in human visual cortex at ∼180 ms and are linked to spatial frequency processing. Furthermore, dorsal and ventral stream areas reveal temporally and spatially overlapping representations of low- and high-level layer activations extracted from a feedforward neural network. Our results suggest that neural patterns from extrastriate visual cortex switch from low-level to categorical representations within 200 ms, highlighting the rapid cascade of processing stages essential in human visual perception.
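The RSA machinery mentioned here has a compact generic form: build a representational dissimilarity matrix (RDM) from per-condition response patterns, then correlate the lower triangles of two RDMs. The simulated patterns below are a stand-in for the study's MEG sensor data; the condition and feature counts are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between every pair of condition response patterns (rows)."""
    return 1.0 - np.corrcoef(patterns)

# 8 scene conditions x 100 "sensor" features; the second measurement is
# a noisy copy of the first, standing in for another time point/model.
patterns_a = rng.normal(size=(8, 100))
patterns_b = patterns_a + 0.3 * rng.normal(size=(8, 100))

rdm_a, rdm_b = rdm(patterns_a), rdm(patterns_b)

# Compare RDMs on their lower triangles (the standard RSA comparison;
# the diagonal is zero by construction and is excluded).
tri = np.tril_indices(8, k=-1)
rsa_similarity = float(np.corrcoef(rdm_a[tri], rdm_b[tri])[0, 1])
```

Correlating RDMs rather than raw patterns is what lets RSA compare representations across measurements with different native spaces (e.g., MEG sensors vs. network-layer activations).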
Affiliation(s)
- Diana C Dima
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff, CF24 4HQ, United Kingdom.
- Gavin Perry
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff, CF24 4HQ, United Kingdom
- Krish D Singh
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff, CF24 4HQ, United Kingdom
33
Dmochowski JP, Ki JJ, DeGuzman P, Sajda P, Parra LC. Extracting multidimensional stimulus-response correlations using hybrid encoding-decoding of neural activity. Neuroimage 2018; 180:134-146. [DOI: 10.1016/j.neuroimage.2017.05.037]
34
The temporal evolution of conceptual object representations revealed through models of behavior, semantics and deep neural networks. Neuroimage 2018; 178:172-182. [DOI: 10.1016/j.neuroimage.2018.05.037]
35
Lowe MX, Rajsic J, Ferber S, Walther DB. Discriminating scene categories from brain activity within 100 milliseconds. Cortex 2018; 106:275-287. [PMID: 30037637] [DOI: 10.1016/j.cortex.2018.06.006]
Abstract
Humans have the ability to make sense of the world around them in only a single glance. This astonishing feat requires the visual system to extract information from our environment with remarkable speed. How quickly does this process unfold across time, and what visual information contributes to our understanding of the visual world? We address these questions by directly measuring the temporal dynamics of the perception of colour photographs and line drawings of scenes with electroencephalography (EEG) during a scene-memorization task. Within a fraction of a second, event-related potentials (ERPs) show dissociable response patterns for global scene properties of content (natural versus manmade) and layout (open versus closed). Subsequent detailed analyses of within-category versus between-category discriminations found significant dissociations of basic-level scene categories (e.g., forest; city) within the first 100 msec of perception. The similarity of this neural activity with feature-based discriminations suggests low-level image statistics may be foundational for this rapid categorization. Interestingly, our results also suggest that the structure preserved in line drawings may form a primary and necessary basis for visual processing, whereas surface information may further enhance category selectivity in later-stage processing. Critically, these findings provide evidence that the distinction of both basic-level categories and global properties of scenes from neural signals occurs within 100 msec.
Affiliation(s)
- Jason Rajsic
- Psychology Department, University of Toronto, Canada
- Susanne Ferber
- Psychology Department, University of Toronto, Canada; Rotman Research Institute, Baycrest, Toronto, Canada
- Dirk B Walther
- Psychology Department, University of Toronto, Canada; Rotman Research Institute, Baycrest, Toronto, Canada
36
Establishing reference scales for scene naturalness and openness. Behav Res Methods 2018; 51:1179-1186. [PMID: 29845553] [DOI: 10.3758/s13428-018-1053-4]
Abstract
A key question in the field of scene perception is what information people use when making decisions about images of scenes. A significant body of evidence has indicated the importance of global properties of a scene image. Ideally, well-controlled, real-world images would be used to examine the influence of these properties on perception. Unfortunately, real-world images are generally complex and impractical to control. In the current research, we elicited ratings of naturalness and openness from a large number of subjects using Amazon Mechanical Turk. Subjects were asked to indicate which of a randomly chosen pair of scene images was more representative of a global property. A score and rank for each image were then estimated from those comparisons using the Bradley-Terry-Luce model. These ranked images offer the opportunity to exercise control over global scene properties in a stimulus set drawn from complex real-world images. This will allow a deeper exploration of the relationship between global scene properties and behavioral and neural responses.
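Fitting a Bradley-Terry model to pairwise "which image is more natural?" judgments can be sketched with the standard minorization-maximization update. The win counts below are invented for illustration; the authors' actual data and fitting procedure may differ:

```python
import numpy as np

def fit_bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i, j] = number of times image i was chosen over image j.
    Uses the standard MM update p_i <- W_i / sum_j n_ij / (p_i + p_j),
    renormalizing each pass; higher strength = more of the property."""
    total = wins + wins.T                 # n_ij: comparisons per pair
    w = wins.sum(axis=1)                  # W_i: total wins per image
    p = np.ones(wins.shape[0])
    for _ in range(n_iter):
        denom = (total / (p[:, None] + p[None, :])).sum(axis=1)
        p = w / denom
        p /= p.sum()                      # fix the arbitrary scale
    return p

# Hypothetical judgments for 3 images, where image 0 dominates image 1,
# and image 1 dominates image 2.
wins = np.array([[0, 9, 10],
                 [1, 0,  8],
                 [2, 0,  0]])
scores = fit_bradley_terry(wins)
ranking = np.argsort(scores)[::-1]        # most to least of the property
```

The recovered strengths induce exactly the kind of per-image score and rank the abstract describes, which can then be used to select stimuli at controlled levels of naturalness or openness.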
37
Hansen NE, Noesen BT, Nador JD, Harel A. The influence of behavioral relevance on the processing of global scene properties: An ERP study. Neuropsychologia 2018; 114:168-180. [PMID: 29729276] [DOI: 10.1016/j.neuropsychologia.2018.04.040]
Abstract
Recent work studying the temporal dynamics of visual scene processing (Harel et al., 2016) has found that global scene properties (GSPs) modulate the amplitude of early Event-Related Potentials (ERPs). It is still not clear, however, to what extent the processing of these GSPs is influenced by their behavioral relevance, determined by the goals of the observer. To address this question, we investigated how behavioral relevance, operationalized by the task context, impacts the electrophysiological responses to GSPs. In two experiments we recorded ERPs while participants viewed images of real-world scenes varying along two GSPs, naturalness (manmade/natural) and spatial expanse (open/closed). In Experiment 1, very little attention to scene content was required as participants viewed the scenes while performing an orthogonal fixation-cross task. In Experiment 2, participants saw the same scenes but now had to actively categorize them, based either on their naturalness or their spatial expanse. We found that task context had very little impact on the early ERP responses to the naturalness and spatial expanse of the scenes: P1, N1, and P2 could distinguish between open and closed scenes and between manmade and natural scenes across both experiments. Further, the specific effects of naturalness and spatial expanse on the ERP components were largely unaffected by their relevance for the task. A task effect was found at the N1 and P2 level, but this effect was manifest across all scene dimensions, indicating a general effect rather than an interaction between task context and GSPs. Together, these findings suggest that the extraction of global scene information reflected in the early ERP components is rapid and largely uninfluenced by top-down, observer-based goals.
Affiliation(s)
- Natalie E Hansen
- Department of Psychology, Wright State University, Dayton, OH, United States
- Birken T Noesen
- Department of Psychology, Wright State University, Dayton, OH, United States
- Jeffrey D Nador
- Department of Psychology, Wright State University, Dayton, OH, United States
- Assaf Harel
- Department of Psychology, Wright State University, Dayton, OH, United States.
38
Gu K, Tao D, Qiao JF, Lin W. Learning a No-Reference Quality Assessment Model of Enhanced Images With Big Data. IEEE Trans Neural Netw Learn Syst 2018; 29:1301-1313. [PMID: 28287984] [DOI: 10.1109/tnnls.2017.2649101]
Abstract
In this paper, we investigate the problem of image quality assessment (IQA) and enhancement via machine learning. This issue has long attracted a wide range of attention in the computational intelligence and image processing communities, since, for many practical applications, e.g., object detection and recognition, raw images usually need to be appropriately enhanced to raise the visual quality (e.g., visibility and contrast). In fact, proper enhancement can noticeably improve the quality of input images, even beyond that of the originally captured images, which are generally thought to be of the best quality. In this paper, we present two main contributions. The first contribution is to develop a new no-reference (NR) IQA model. Given an image, our quality measure first extracts 17 features through analysis of contrast, sharpness, brightness and more, and then yields a measure of visual quality using a regression module, which is learned from big-data training samples far larger than existing image quality data sets. The results of experiments on nine data sets validate the superiority and efficiency of our blind metric compared with typical state-of-the-art full-reference, reduced-reference and NR IQA methods. The second contribution is that a robust image enhancement framework is established based on quality optimization. For an input image, guided by the proposed NR-IQA measure, we conduct histogram modification to successively rectify image brightness and contrast to a proper level. Thorough tests demonstrate that our framework can well enhance natural images, low-contrast images, low-light images, and dehazed images. The source code will be released at https://sites.google.com/site/guke198701/publications.
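The 17 hand-crafted features are not reproduced here, but the flavor of such contrast/sharpness/brightness descriptors can be sketched with generic formulations (mean luminance, RMS contrast, and a gradient-based sharpness proxy); these are not the paper's exact feature definitions:

```python
import numpy as np

def quality_features(img):
    """Toy no-reference descriptors for a grayscale image in [0, 1]:
    mean brightness, RMS contrast (pixel standard deviation), and a
    sharpness proxy (mean magnitude of horizontal/vertical differences)."""
    brightness = float(img.mean())
    contrast = float(img.std())
    gx = np.diff(img, axis=1)
    gy = np.diff(img, axis=0)
    sharpness = float(np.abs(gx).mean() + np.abs(gy).mean())
    return brightness, contrast, sharpness

rng = np.random.default_rng(0)
flat = np.full((64, 64), 0.5)                              # uniform gray patch
textured = np.clip(0.5 + 0.2 * rng.standard_normal((64, 64)), 0, 1)

b_flat, c_flat, s_flat = quality_features(flat)
b_tex, c_tex, s_tex = quality_features(textured)
```

In the paper's pipeline, a vector of such features feeds a learned regression module; here they merely illustrate how low-level statistics separate a flat patch from a textured one.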
39
Song Y, Luo H, Ma J, Hui B, Chang Z. Sky Detection in Hazy Image. Sensors 2018; 18:1060. [PMID: 29614778] [PMCID: PMC5948826] [DOI: 10.3390/s18041060]
Abstract
Sky detection plays an essential role in various computer vision applications. Most existing sky detection approaches, trained on idealized datasets, may lose efficacy under unfavorable conditions such as adverse weather and lighting. In this paper, a novel algorithm for sky detection in hazy images is proposed from the perspective of probing the density of haze. We address the problem with image segmentation followed by region-level classification. To characterize the sky of hazy scenes, we introduce several haze-relevant features that reflect the perceptual haze density and the scene depth. Based on these features, the sky is separated by two imbalanced SVM classifiers and a similarity measurement. Moreover, a sky dataset (named HazySky) with 500 annotated hazy images is built for model training and performance evaluation. To evaluate the performance of our method, we conducted extensive experiments both on our HazySky dataset and the SkyFinder dataset. The results demonstrate that our method performs better on detection accuracy than previous methods, not only under hazy scenes, but also under other weather conditions.
Affiliation(s)
- Yingchao Song
- Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China.
- University of Chinese Academy of Sciences, Beijing 100049, China.
- Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Sciences, Shenyang 110016, China.
- Haibo Luo
- Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China.
- Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Sciences, Shenyang 110016, China.
- Junkai Ma
- Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China.
- University of Chinese Academy of Sciences, Beijing 100049, China.
- Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Sciences, Shenyang 110016, China.
- Bin Hui
- Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China.
- Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Sciences, Shenyang 110016, China.
- Zheng Chang
- Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China.
- Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Sciences, Shenyang 110016, China.
40
Duan Y, Yakovleva A, Norcia AM. Determinants of neural responses to disparity in natural scenes. J Vis 2018; 18:21. [PMID: 29677337] [PMCID: PMC6097643] [DOI: 10.1167/18.3.21]
Abstract
We studied disparity-evoked responses in natural scenes using high-density electroencephalography (EEG) in an event-related design. Thirty natural scenes that mainly included outdoor settings with trees and buildings were used. Twenty-four subjects viewed a series of trials composed of sequential two-alternative temporal forced-choice presentation of two different versions (two-dimensional [2D] vs. three-dimensional [3D]) of the same scene interleaved by a scrambled image with the same power spectrum. Scenes were viewed orthostereoscopically at 3 m through a pair of shutter glasses. After each trial, participants indicated with a key press which version of the scene was 3D. Performance on the discrimination was >90%. Participants who were more accurate also tended to respond faster; scenes that were reported more accurately as 3D also led to faster reaction times. We compared visual evoked potentials elicited by scrambled, 2D, and 3D scenes using reliable component analysis to reduce dimensionality. The disparity-evoked response to natural scene stimuli, measured from the difference potential between 2D and 3D scenes, comprised a sustained relative negativity in the dominant response component. The magnitude of the disparity-specific response was correlated with the observer's stereoacuity. Scenes with more homogeneous depth maps also tended to elicit large disparity-specific responses. Finally, the magnitude of the disparity-specific response was correlated with the magnitude of the differential response between scrambled and 2D scenes, suggesting that monocular higher-order scene statistics modulate disparity-specific responses.
Affiliation(s)
- Yiran Duan
- Department of Psychology, Stanford University, Stanford, CA, USA
- Anthony M Norcia
- Department of Psychology, Stanford University, Stanford, CA, USA
41
Zuiderbaan W, van Leeuwen J, Dumoulin SO. Change Blindness Is Influenced by Both Contrast Energy and Subjective Importance within Local Regions of the Image. Front Psychol 2017; 8:1718. [PMID: 29046655] [PMCID: PMC5632668] [DOI: 10.3389/fpsyg.2017.01718]
Abstract
Our visual system receives an enormous amount of information, but not all information is retained. This is exemplified by the fact that subjects fail to detect large changes in a visual scene, i.e., change blindness. Current theories propose that our ability to detect these changes is influenced by the gist or interpretation of an image. On the other hand, stimulus-driven image features such as contrast energy dominate the representation in early visual cortex (De Valois and De Valois, 1988; Boynton et al., 1999; Olman et al., 2004; Mante and Carandini, 2005; Dumoulin et al., 2008). Here we investigated whether contrast energy contributes to our ability to detect changes within a visual scene. We compared the ability to detect changes in contrast energy with changes to subjectively important aspects of the image, which we took as a measure of the image's interpretation. We measured reaction times while manipulating contrast energy and subjectively important properties using the change blindness paradigm. Our results suggest that our ability to detect changes in a visual scene is influenced not only by subjective importance, but also by contrast energy. Also, we find that contrast energy and subjective importance interact. We speculate that contrast energy and subjectively important properties are not independently represented in the visual system. Thus, our results suggest that the information that is retained of a visual scene is influenced both by stimulus-driven information and by the interpretation of a scene.
Affiliation(s)
- Wietske Zuiderbaan
- Department of Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, Netherlands
- Jonathan van Leeuwen
- Department of Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, Netherlands; Department of Experimental and Applied Psychology, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Serge O Dumoulin
- Department of Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, Netherlands; Department of Experimental and Applied Psychology, Vrije Universiteit Amsterdam, Amsterdam, Netherlands; Spinoza Centre for Neuroimaging, Amsterdam, Netherlands
42
Object detection in natural scenes: Independent effects of spatial and category-based attention. Atten Percept Psychophys 2017; 79:738-752. [PMID: 28138945] [PMCID: PMC5352795] [DOI: 10.3758/s13414-017-1279-8]
Abstract
Humans are remarkably efficient in detecting highly familiar object categories in natural scenes, with evidence suggesting that such object detection can be performed in the (near) absence of attention. Here we systematically explored the influences of both spatial attention and category-based attention on the accuracy of object detection in natural scenes. Manipulating both types of attention additionally allowed for addressing how these factors interact: whether the requirement for spatial attention depends on the extent to which observers are prepared to detect a specific object category-that is, on category-based attention. The results showed that the detection of targets from one category (animals or vehicles) was better than the detection of targets from two categories (animals and vehicles), demonstrating the beneficial effect of category-based attention. This effect did not depend on the semantic congruency of the target object and the background scene, indicating that observers attended to visual features diagnostic of the foreground target objects from the cued category. Importantly, in three experiments the detection of objects in scenes presented in the periphery was significantly impaired when observers simultaneously performed an attentionally demanding task at fixation, showing that spatial attention affects natural scene perception. In all experiments, the effects of category-based attention and spatial attention on object detection performance were additive rather than interactive. Finally, neither spatial nor category-based attention influenced metacognitive ability for object detection performance. These findings demonstrate that efficient object detection in natural scenes is independently facilitated by spatial and category-based attention.
43
Mensen A, Marshall W, Tononi G. EEG Differentiation Analysis and Stimulus Set Meaningfulness. Front Psychol 2017; 8:1748. [PMID: 29056921] [PMCID: PMC5635725] [DOI: 10.3389/fpsyg.2017.01748]
Abstract
A set of images can be considered as meaningfully different for an observer if they can be distinguished phenomenally from one another. Each phenomenal difference must be supported by some neurophysiological differences. Differentiation analysis aims to quantify neurophysiological differentiation evoked by a given set of stimuli to assess its meaningfulness to the individual observer. As a proof of concept using high-density EEG, we show increased neurophysiological differentiation for a set of natural, meaningfully different images in contrast to another set of artificially generated, meaninglessly different images in nine participants. Stimulus-evoked neurophysiological differentiation (over 257 channels, 800 ms) was systematically greater for meaningful vs. meaningless stimulus categories both at the group level and for individual subjects. Spatial breakdown showed a central-posterior peak of differentiation, consistent with the visual nature of the stimulus sets. Temporal breakdown revealed an early peak of differentiation around 110 ms, prominent in the central-posterior region; and a later, longer-lasting peak at 300-500 ms that was spatially more distributed. The early peak of differentiation was not accompanied by changes in mean ERP amplitude, whereas the later peak was associated with a higher amplitude ERP for meaningful images. An ERP component similar to visual-awareness-negativity occurred during the nadir of differentiation across all image types. Control stimulus sets and further analysis indicate that changes in neurophysiological differentiation between meaningful and meaningless stimulus sets could not be accounted for by spatial properties of the stimuli or by stimulus novelty and predictability.
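The differentiation measure described above can be illustrated in a simplified form: quantify how spread out the stimulus-evoked response patterns are, e.g., as the mean pairwise Euclidean distance between them. This is a generic stand-in for the authors' exact metric, computed on invented data:

```python
import numpy as np

def differentiation(responses):
    """Mean pairwise Euclidean distance between stimulus-evoked
    response patterns (rows). Higher = more differentiated set."""
    d = np.linalg.norm(responses[:, None, :] - responses[None, :, :], axis=-1)
    iu = np.triu_indices(len(responses), k=1)
    return float(d[iu].mean())

rng = np.random.default_rng(0)

# "Meaningful" set: 10 distinct evoked patterns (hypothetical data).
meaningful = rng.normal(size=(10, 50))
# "Meaningless" set: near-identical responses to every stimulus.
meaningless = rng.normal(size=(1, 50)) + 0.05 * rng.normal(size=(10, 50))

diff_meaningful = differentiation(meaningful)
diff_meaningless = differentiation(meaningless)
```

Under this toy metric, a stimulus set that evokes distinct responses scores much higher than one whose responses are nearly identical, mirroring the meaningful-versus-meaningless contrast the study reports.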
Affiliation(s)
- Armand Mensen
- Center for Sleep and Consciousness, University of Wisconsin-Madison, Madison, WI, United States; Department of Neurology, Inselspital Bern, Bern, Switzerland
- William Marshall
- Center for Sleep and Consciousness, University of Wisconsin-Madison, Madison, WI, United States
- Giulio Tononi
- Center for Sleep and Consciousness, University of Wisconsin-Madison, Madison, WI, United States

44
Cichy RM, Pantazis D. Multivariate pattern analysis of MEG and EEG: A comparison of representational structure in time and space. Neuroimage 2017; 158:441-454. [DOI: 10.1016/j.neuroimage.2017.07.023]

45
Van der Jagt APN, Craig T, Brewer MJ, Pearson DG. A view not to be missed: Salient scene content interferes with cognitive restoration. PLoS One 2017; 12:e0169997. [PMID: 28723975] [PMCID: PMC5516974] [DOI: 10.1371/journal.pone.0169997]
Abstract
Attention Restoration Theory (ART) states that built scenes place greater load on attentional resources than natural scenes. This is explained in terms of "hard" and "soft" fascination of built and natural scenes. Given a lack of direct empirical evidence for this assumption, we propose that perceptual saliency of scene content can function as an empirically derived indicator of fascination. Saliency levels were established by measuring speed of scene category detection using a Go/No-Go detection paradigm. Experiment 1 shows that built scenes are more salient than natural scenes. Experiment 2 replicates these findings using greyscale images, ruling out a colour-based response strategy, and additionally shows that built objects in natural scenes affect saliency to a greater extent than the reverse. Experiment 3 demonstrates that the saliency of scene content is directly linked to cognitive restoration using an established restoration paradigm. Overall, these findings demonstrate an important link between the saliency of scene content and related cognitive restoration.
Affiliation(s)
- Tony Craig
- The James Hutton Institute, Aberdeen, United Kingdom
- Mark J. Brewer
- Biomathematics and Statistics Scotland, Aberdeen, United Kingdom

46
Oosterwijk S. Choosing the negative: A behavioral demonstration of morbid curiosity. PLoS One 2017; 12:e0178399. [PMID: 28683147] [PMCID: PMC5500011] [DOI: 10.1371/journal.pone.0178399]
Abstract
This paper examined, with a behavioral paradigm, to what extent people choose to view stimuli that portray death, violence or harm. Based on briefly presented visual cues, participants made choices between highly arousing, negative images and positive or negative alternatives. The negative images displayed social scenes that involved death, violence or harm (e.g., war scene), or decontextualized, close-ups of physical harm (e.g., mutilated face) or natural threat (e.g., attacking shark). The results demonstrated that social negative images were chosen significantly more often than other negative categories. Furthermore, participants preferred social negative images over neutral images. Physical harm images and natural threat images were not preferred over neutral images, but were chosen in about thirty-five percent of the trials. These results were replicated across three different studies, including a study that presented verbal descriptions of images as pre-choice cues. Together, these results show that people deliberately subject themselves to negative images. With this, the present paper demonstrates a dynamic relationship between negative information and behavior and advances new insights into the phenomenon of morbid curiosity.
Affiliation(s)
- Suzanne Oosterwijk
- Department of Social Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Amsterdam Brain and Cognition Centre, Amsterdam, The Netherlands

47
Goddard E, Klein C, Solomon SG, Hogendoorn H, Carlson TA. Interpreting the dimensions of neural feature representations revealed by dimensionality reduction. Neuroimage 2017; 180:41-67. [PMID: 28663068] [DOI: 10.1016/j.neuroimage.2017.06.068]
Abstract
Recent progress in understanding the structure of neural representations in the cerebral cortex has centred around the application of multivariate classification analyses to measurements of brain activity. These analyses have proved a sensitive test of whether given brain regions provide information about specific perceptual or cognitive processes. An exciting extension of this approach is to infer the structure of this information, thereby drawing conclusions about the underlying neural representational space. These approaches rely on exploratory data-driven dimensionality reduction to extract the natural dimensions of neural spaces, including natural visual object and scene representations, semantic and conceptual knowledge, and working memory. However, the efficacy of these exploratory methods is unknown, because they have only been applied to representations in brain areas for which we have little or no secondary knowledge. One of the best-understood areas of the cerebral cortex is area MT of primate visual cortex, which is known to be important in motion analysis. To assess the effectiveness of dimensionality reduction for recovering neural representational space we applied several dimensionality reduction methods to multielectrode measurements of spiking activity obtained from area MT of marmoset monkeys, made while systematically varying the motion direction and speed of moving stimuli. Despite robust tuning at individual electrodes, and high classifier performance, dimensionality reduction rarely revealed dimensions for direction and speed. We use this example to illustrate important limitations of these analyses, and suggest a framework for how to best apply such methods to data where the structure of the neural representation is unknown.
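The pipeline the authors probe can be illustrated with synthetic MT-like data: simulated electrodes with cosine direction tuning, reduced by PCA. This is a schematic reconstruction under assumed tuning and noise parameters, not the authors' analysis code:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated spike rates: 12 motion directions x 20 electrodes, each
# electrode cosine-tuned to a random preferred direction.
directions = np.linspace(0.0, 2 * np.pi, 12, endpoint=False)
preferred = rng.uniform(0.0, 2 * np.pi, size=20)
rates = 1.0 + np.cos(directions[:, None] - preferred[None, :])
rates += 0.1 * rng.normal(size=rates.shape)  # measurement noise

# PCA via SVD of the mean-centred direction-by-electrode matrix.
centred = rates - rates.mean(axis=0)
_, s, _ = np.linalg.svd(centred, full_matrices=False)
explained = s**2 / np.sum(s**2)

# Cosine tuning on a circle spans a two-dimensional subspace
# (cos and sin components), so two PCs carry most of the variance.
print(explained[:3])
```

In this idealized case the recovered dimensions are interpretable; the paper's point is that with real multielectrode data such clean, labelled dimensions rarely emerge, even when decoding succeeds.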
Affiliation(s)
- Erin Goddard
- McGill Vision Research, Dept of Ophthalmology, McGill University, Montreal, QC, H3G 1A4, Canada; School of Psychology, University of Sydney, Sydney, NSW, 2006, Australia; ARC Centre of Excellence in Cognition and Its Disorders (CCD), Macquarie University, Sydney, NSW, 2109, Australia
- Colin Klein
- ARC Centre of Excellence in Cognition and Its Disorders (CCD), Macquarie University, Sydney, NSW, 2109, Australia; Department of Philosophy, Macquarie University, Sydney, NSW, 2109, Australia
- Samuel G Solomon
- Department of Experimental Psychology, University College London, Gower Street, London, WC1E 6BT, United Kingdom
- Hinze Hogendoorn
- School of Psychology, University of Sydney, Sydney, NSW, 2006, Australia; Helmholtz Institute, Neuroscience & Cognition Utrecht, Experimental Psychology Division, Utrecht University, Utrecht, The Netherlands
- Thomas A Carlson
- School of Psychology, University of Sydney, Sydney, NSW, 2006, Australia; ARC Centre of Excellence in Cognition and Its Disorders (CCD), Macquarie University, Sydney, NSW, 2109, Australia

48
De Cesarei A, Loftus GR, Mastria S, Codispoti M. Understanding natural scenes: Contributions of image statistics. Neurosci Biobehav Rev 2017; 74:44-57. [DOI: 10.1016/j.neubiorev.2017.01.012]

49
Güçlü U, van Gerven MAJ. Modeling the Dynamics of Human Brain Activity with Recurrent Neural Networks. Front Comput Neurosci 2017; 11:7. [PMID: 28232797] [PMCID: PMC5299026] [DOI: 10.3389/fncom.2017.00007]
Abstract
Encoding models are used for predicting brain activity in response to sensory stimuli with the objective of elucidating how sensory information is represented in the brain. Encoding models typically comprise a nonlinear transformation of stimuli to features (feature model) and a linear convolution of features to responses (response model). While there has been extensive work on developing better feature models, the work on developing better response models has been rather limited. Here, we investigate the extent to which recurrent neural network models can use their internal memories for nonlinear processing of arbitrary feature sequences to predict feature-evoked response sequences as measured by functional magnetic resonance imaging. We show that the proposed recurrent neural network models can significantly outperform established response models by accurately estimating long-term dependencies that drive hemodynamic responses. The results open a new window into modeling the dynamics of brain activity in response to sensory stimuli.
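The response-model idea can be sketched as a small recurrent network mapping a feature sequence to a predicted response sequence, with the hidden state carrying long-term dependencies across time. All sizes, weights, and the leak rate below are arbitrary placeholders, not the authors' architecture:

```python
import numpy as np

def rnn_response_model(features, w_in, w_rec, w_out, alpha=0.3):
    """Map a stimulus feature sequence (T x n_features) to a predicted
    response sequence (T x n_voxels) via a leaky recurrent hidden state."""
    h = np.zeros(w_rec.shape[0])
    out = []
    for x in features:
        # Leaky integration lets past features influence current responses,
        # unlike a fixed-length linear convolution of features.
        h = (1 - alpha) * h + alpha * np.tanh(w_in @ x + w_rec @ h)
        out.append(w_out @ h)
    return np.array(out)

rng = np.random.default_rng(2)
T, n_feat, n_hidden, n_vox = 50, 8, 16, 4
w_in = 0.5 * rng.normal(size=(n_hidden, n_feat))
w_rec = 0.3 * rng.normal(size=(n_hidden, n_hidden))
w_out = 0.5 * rng.normal(size=(n_vox, n_hidden))

pred = rnn_response_model(rng.normal(size=(T, n_feat)), w_in, w_rec, w_out)
print(pred.shape)
```

In the paper's setting such a recurrent response model is trained on measured fMRI responses, replacing the fixed hemodynamic convolution of conventional encoding models.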
Affiliation(s)
- Umut Güçlü
- Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Netherlands
- Marcel A J van Gerven
- Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Netherlands

50
Groen IIA, Silson EH, Baker CI. Contributions of low- and high-level properties to neural processing of visual scenes in the human brain. Philos Trans R Soc Lond B Biol Sci 2017; 372:20160102. [PMID: 28044013] [DOI: 10.1098/rstb.2016.0102]
Abstract
Visual scene analysis in humans has been characterized by the presence of regions in extrastriate cortex that are selectively responsive to scenes compared with objects or faces. While these regions have often been interpreted as representing high-level properties of scenes (e.g. category), they also exhibit substantial sensitivity to low-level (e.g. spatial frequency) and mid-level (e.g. spatial layout) properties, and it is unclear how these disparate findings can be united in a single framework. In this opinion piece, we suggest that this problem can be resolved by questioning the utility of the classical low- to high-level framework of visual perception for scene processing, and discuss why low- and mid-level properties may be particularly diagnostic for the behavioural goals specific to scene perception as compared to object recognition. In particular, we highlight the contributions of low-level vision to scene representation by reviewing (i) retinotopic biases and receptive field properties of scene-selective regions and (ii) the temporal dynamics of scene perception that demonstrate overlap of low- and mid-level feature representations with those of scene category. We discuss the relevance of these findings for scene perception and suggest a more expansive framework for visual scene analysis. This article is part of the themed issue 'Auditory and visual scene analysis'.
Affiliation(s)
- Iris I A Groen
- Laboratory of Brain and Cognition, National Institutes of Health, 10 Center Drive 10-3N228, Bethesda, MD, USA
- Edward H Silson
- Laboratory of Brain and Cognition, National Institutes of Health, 10 Center Drive 10-3N228, Bethesda, MD, USA
- Chris I Baker
- Laboratory of Brain and Cognition, National Institutes of Health, 10 Center Drive 10-3N228, Bethesda, MD, USA