1. Morgenstern Y, Storrs KR, Schmidt F, Hartmann F, Tiedemann H, Wagemans J, Fleming RW. High-level aftereffects reveal the role of statistical features in visual shape encoding. Curr Biol 2024; 34:1098-1106.e5. PMID: 38218184; PMCID: PMC10931819; DOI: 10.1016/j.cub.2023.12.039.
Abstract
Visual shape perception is central to many everyday tasks, from object recognition to grasping and handling tools.1,2,3,4,5,6,7,8,9,10 Yet how shape is encoded in the visual system remains poorly understood. Here, we probed shape representations using visual aftereffects: perceptual distortions that occur following extended exposure to a stimulus.11,12,13,14,15,16,17 Such effects are thought to be caused by adaptation in neural populations that encode both simple, low-level stimulus characteristics17,18,19,20 and more abstract, high-level object features.21,22,23 To tease these two contributions apart, we used machine-learning methods to synthesize novel shapes in a multidimensional shape space, derived from a large database of natural shapes.24 Stimuli were carefully selected such that low-level and high-level adaptation models made distinct predictions about the shapes that observers would perceive following adaptation. We found that adaptation along vector trajectories in the high-level shape space predicted shape aftereffects better than simple low-level processes. Our findings reveal the central role of high-level statistical features in the visual representation of shape. The findings also hint that human vision is attuned to the distribution of shapes experienced in the natural environment.
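The high-level account can be illustrated as repulsion along a vector trajectory in the shape space: after adaptation, a test shape is perceived as displaced away from the adaptor. A minimal sketch; the coordinates and gain below are hypothetical, not the paper's fitted model:

```python
def predicted_aftereffect(adaptor, test, gain=0.2):
    """High-level adaptation sketch: after prolonged exposure to `adaptor`,
    a test shape's coordinates in a multidimensional shape space are
    repelled from the adaptor along the adaptor -> test vector."""
    return [t + gain * (t - a) for t, a in zip(test, adaptor)]

# Hypothetical 3D shape-space coordinates: the perceived shape ends up
# further from the adaptor than the physical test shape.
percept = predicted_aftereffect(adaptor=[0.0, 0.0, 0.0], test=[1.0, 0.5, 0.0])
```

A low-level account would instead operate on image-level features (e.g., local contour positions) rather than on coordinates in a learned shape space.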
Affiliation(s)
- Yaniv Morgenstern
- Erasmus University Rotterdam, Department of Psychology, Burgemeester Oudlaan 50, 3062PA Rotterdam, the Netherlands; University of Leuven (KU Leuven), Brain and Cognition, Tiensestraat 102, 3000 Leuven, Belgium.
- Katherine R Storrs
- Justus Liebig University Giessen, Department of Psychology, Otto-Behaghel-Str. 10, 3000 Giessen, Germany; University of Auckland, School of Psychology, 23 Symonds Street, Auckland 1010, New Zealand
- Filipp Schmidt
- Justus Liebig University Giessen, Department of Psychology, Otto-Behaghel-Str. 10, 3000 Giessen, Germany; University of Marburg and Justus Liebig University Giessen, Center for Mind, Brain and Behavior (CMBB), Hans-Meerwein-Str. 6, 35032 Marburg, Germany
- Frieder Hartmann
- Justus Liebig University Giessen, Department of Psychology, Otto-Behaghel-Str. 10, 3000 Giessen, Germany
- Henning Tiedemann
- Justus Liebig University Giessen, Department of Psychology, Otto-Behaghel-Str. 10, 3000 Giessen, Germany
- Johan Wagemans
- University of Leuven (KU Leuven), Brain and Cognition, Tiensestraat 102, 3000 Leuven, Belgium
- Roland W Fleming
- Justus Liebig University Giessen, Department of Psychology, Otto-Behaghel-Str. 10, 3000 Giessen, Germany; University of Marburg and Justus Liebig University Giessen, Center for Mind, Brain and Behavior (CMBB), Hans-Meerwein-Str. 6, 35032 Marburg, Germany
2. Han S, Rezanejad M, Walther DB. Memorability of line drawings of scenes: the role of contour properties. Mem Cognit 2023. PMID: 37903987; DOI: 10.3758/s13421-023-01478-4.
Abstract
Why are some images more likely to be remembered than others? Previous work focused on the influence of global, low-level visual features as well as image content on memorability. To better understand the role of local, shape-based contours, we here investigate the memorability of photographs and line drawings of scenes. We find that the memorability of photographs and line drawings of the same scenes is correlated. We quantitatively measure the role of contour properties and their spatial relationships for scene memorability using a Random Forest analysis. To determine whether this relationship is merely correlational or if manipulating these contour properties causes images to be remembered better or worse, we split each line drawing into two half-images, one with high and the other with low predicted memorability according to the trained Random Forest model. In a new memorability experiment, we find that the half-images predicted to be more memorable were indeed remembered better, confirming a causal role of shape-based contour features, and, in particular, T junctions in scene memorability. We performed a categorization experiment on half-images to test for differential access to scene content. We found that half-images predicted to be more memorable were categorized more accurately. However, categorization accuracy for individual images was not correlated with their memorability. These results demonstrate that we can measure the contributions of individual contour properties to scene memorability and verify their causal involvement with targeted image manipulations, thereby bridging the gap between low-level features and scene semantics in our understanding of memorability.
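The half-image manipulation described above can be sketched as follows; the per-contour memorability scores stand in for the trained Random Forest's predictions and are hypothetical:

```python
def split_by_predicted_memorability(contours, scores):
    """Partition a line drawing's contours into two half-images: one with the
    contours a model scores as most memorable, one with the least memorable,
    keeping roughly half of the total contour length (here, point count) in each."""
    ranked = sorted(zip(scores, contours), key=lambda pair: pair[0], reverse=True)
    total = sum(len(c) for c in contours)
    high, low, kept = [], [], 0
    for score, contour in ranked:
        if kept < total / 2:          # fill the high-memorability half first
            high.append(contour)
            kept += len(contour)
        else:
            low.append(contour)
    return high, low
```

In the study, the two halves were then tested in a new memorability experiment; this sketch only reproduces the splitting step, not the Random Forest itself.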
Affiliation(s)
- Seohee Han
- Department of Psychology, University of Toronto, 100 St. George Street, Toronto, Canada.
- Morteza Rezanejad
- Department of Psychology, University of Toronto, 100 St. George Street, Toronto, Canada
- Dirk B Walther
- Department of Psychology, University of Toronto, 100 St. George Street, Toronto, Canada
3. Farzanfar D, Walther DB. Changing What You Like: Modifying Contour Properties Shifts Aesthetic Valuations of Scenes. Psychol Sci 2023; 34:1101-1120. PMID: 37669066; DOI: 10.1177/09567976231190546.
Abstract
To what extent do aesthetic experiences arise from the human ability to perceive and extract meaning from visual features? Ordinary scenes, such as a beach sunset, can elicit a sense of beauty in most observers. Although it appears that aesthetic responses can be shared among humans, little is known about the cognitive mechanisms that underlie this phenomenon. We developed a contour model of aesthetics that assigns values to visual properties in scenes, allowing us to predict aesthetic responses in adults from around the world. Through a series of experiments, we manipulate contours to increase or decrease aesthetic value while preserving scene semantic identity. Contour manipulations directly shift subjective aesthetic judgments. This provides the first experimental evidence for a causal relationship between contour properties and aesthetic valuation. Our findings support the notion that visual regularities underlie the human capacity to derive pleasure from visual information.
4. Ayzenberg V, Lourenco S. Perception of an object's global shape is best described by a model of skeletal structure in human infants. eLife 2022; 11:e74943. PMID: 35612898; PMCID: PMC9132572; DOI: 10.7554/eLife.74943.
Abstract
Categorization of everyday objects requires that humans form representations of shape that are tolerant to variations among exemplars. Yet, how such invariant shape representations develop remains poorly understood. By comparing human infants (6-12 months; N=82) to computational models of vision using comparable procedures, we shed light on the origins and mechanisms underlying object perception. Following habituation to a never-before-seen object, infants classified other novel objects across variations in their component parts. Comparisons to several computational models of vision, including models of high-level and low-level vision, revealed that infants' performance was best described by a model of shape based on the skeletal structure. Interestingly, infants outperformed a range of artificial neural network models, selected for their massive object experience and biological plausibility, under the same conditions. Altogether, these findings suggest that robust representations of shape can be formed with little language or object experience by relying on the perceptually invariant skeletal structure.
Affiliation(s)
- Stella Lourenco
- Department of Psychology, Emory University, Atlanta, United States
5. Tiedemann H, Morgenstern Y, Schmidt F, Fleming RW. One-shot generalization in humans revealed through a drawing task. eLife 2022; 11:e75485. PMID: 35536739; PMCID: PMC9090327; DOI: 10.7554/eLife.75485.
Abstract
Humans have the amazing ability to learn new visual concepts from just a single exemplar. How we achieve this remains mysterious. State-of-the-art theories suggest observers rely on internal 'generative models', which not only describe observed objects, but can also synthesize novel variations. However, compelling evidence for generative models in human one-shot learning remains sparse. In most studies, participants merely compare candidate objects created by the experimenters, rather than generating their own ideas. Here, we overcame this key limitation by presenting participants with 2D 'Exemplar' shapes and asking them to draw their own 'Variations' belonging to the same class. The drawings reveal that participants inferred, and synthesized, genuine novel categories that were far more varied than mere copies. Yet, there was striking agreement between participants about which shape features were most distinctive, and these tended to be preserved in the drawn Variations. Indeed, swapping distinctive parts caused objects to swap apparent category. Our findings suggest that internal generative models are key to how humans generalize from single exemplars. When observers see a novel object for the first time, they identify its most distinctive features and infer a generative model of its shape, allowing them to mentally synthesize plausible variants.
Affiliation(s)
- Henning Tiedemann
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
- Yaniv Morgenstern
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany; Laboratory of Experimental Psychology, University of Leuven (KU Leuven), Leuven, Belgium
- Filipp Schmidt
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany; Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Giessen, Germany
- Roland W Fleming
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany; Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Giessen, Germany
6. Son G, Walther DB, Mack ML. Scene wheels: Measuring perception and memory of real-world scenes with a continuous stimulus space. Behav Res Methods 2022; 54:444-456. PMID: 34244986; DOI: 10.3758/s13428-021-01630-5.
Abstract
Precisely characterizing mental representations of visual experiences requires careful control of experimental stimuli. Recent work leveraging such stimulus control has led to important insights; however, these findings are constrained to simple visual properties like color and line orientation. There remains a critical methodological barrier to characterizing perceptual and mnemonic representations of realistic visual experiences. Here, we introduce a novel method to systematically control visual properties of natural scene stimuli. Using generative adversarial networks (GANs), a state-of-the-art deep learning technique for creating highly realistic synthetic images, we generated scene wheels in which continuously changing visual properties smoothly transition between meaningful realistic scenes. To validate the efficacy of scene wheels, we conducted two behavioral experiments that assess perceptual and mnemonic representations attained from the scene wheels. In the perceptual validation experiment, we tested whether the continuous transition of scene images along the wheel is reflected in human perceptual similarity judgment. The perceived similarity of the scene images correspondingly decreased as distances between the images increased on the wheel. In the memory experiment, participants reconstructed to-be-remembered scenes from the scene wheels. Reconstruction errors for these scenes resemble error distributions observed in prior studies using simple stimulus properties. Importantly, perceptual similarity judgment and memory precision varied systematically with scene wheel radius. These findings suggest that our novel approach offers a window into the mental representations of naturalistic visual experiences.
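The wheel construction amounts to sampling a circle in the GAN's latent space and decoding each point into an image; a sketch under the assumption of plain list-valued latent vectors (the actual generator and its latent layout are not part of this sketch):

```python
import math

def scene_wheel(center, axis_u, axis_v, radius, n_points=360):
    """Sample a circular trajectory in a GAN latent space. `center` is the
    wheel's hub; `axis_u`/`axis_v` are two orthonormal latent directions
    spanning the wheel's plane; `radius` controls how far wheel images deviate
    from the hub (larger radius -> more distinct scenes, as in the paper)."""
    wheel = []
    for i in range(n_points):
        theta = 2 * math.pi * i / n_points
        z = [c + radius * (math.cos(theta) * u + math.sin(theta) * v)
             for c, u, v in zip(center, axis_u, axis_v)]
        wheel.append(z)  # each z would be decoded by the generator into a scene
    return wheel
```

Because angular distance on the wheel maps onto latent-space distance, reconstruction error can be scored as an angle, just as in classic continuous-report tasks with color wheels.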
Affiliation(s)
- Gaeun Son
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada.
- Dirk B Walther
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada
- Michael L Mack
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada
7. Wilder J, Rezanejad M, Dickinson S, Siddiqi K, Jepson A, Walther DB. Neural correlates of local parallelism during naturalistic vision. PLoS One 2022; 17:e0260266. PMID: 35061699; PMCID: PMC8782314; DOI: 10.1371/journal.pone.0260266.
Abstract
Human observers can rapidly perceive complex real-world scenes. Grouping visual elements into meaningful units is an integral part of this process. Yet, so far, the neural underpinnings of perceptual grouping have only been studied with simple lab stimuli. We here uncover the neural mechanisms of one important perceptual grouping cue, local parallelism. Using a new, image-computable algorithm for detecting local symmetry in line drawings and photographs, we manipulated the local parallelism content of real-world scenes. We decoded scene categories from patterns of brain activity obtained via functional magnetic resonance imaging (fMRI) in 38 human observers while they viewed the manipulated scenes. Decoding was significantly more accurate for scenes containing strong local parallelism compared to weak local parallelism in the parahippocampal place area (PPA), indicating a central role of parallelism in scene perception. To investigate the origin of the parallelism signal, we performed a model-based fMRI analysis of the public BOLD5000 dataset, looking for voxels whose activation time course matches that of the locally parallel content of the 4916 photographs viewed by the participants in the experiment. We found a strong relationship with average local symmetry in visual areas V1-4, PPA, and retrosplenial cortex (RSC). Notably, the parallelism-related signal peaked first in V4, suggesting V4 as the site for extracting parallelism from the visual input. We conclude that local parallelism is a perceptual grouping cue that influences neuronal activity throughout the visual hierarchy, presumably starting at V4. Parallelism plays a key role in the representation of scene categories in PPA.
Affiliation(s)
- John Wilder
- University of Toronto, Toronto, Canada
- Morteza Rezanejad
- University of Toronto, Toronto, Canada; McGill University, Montreal, Canada
- Sven Dickinson
- University of Toronto, Toronto, Canada; Samsung Toronto AI Research Center, Toronto, Canada; Vector Institute, Toronto, Canada
- Allan Jepson
- University of Toronto, Toronto, Canada; Samsung Toronto AI Research Center, Toronto, Canada
8. Ayzenberg V, Kamps FS, Dilks DD, Lourenco SF. Skeletal representations of shape in the human visual cortex. Neuropsychologia 2022; 164:108092. PMID: 34801519; PMCID: PMC9840386; DOI: 10.1016/j.neuropsychologia.2021.108092.
Abstract
Shape perception is crucial for object recognition. However, it remains unknown exactly how shape information is represented and used by the visual system. Here, we tested the hypothesis that the visual system represents object shape via a skeletal structure. Using functional magnetic resonance imaging (fMRI) and representational similarity analysis (RSA), we found that a model of skeletal similarity explained significant unique variance in the response profiles of V3 and LO. Moreover, the skeletal model remained predictive in these regions even when controlling for other models of visual similarity that approximate low- to high-level visual features (i.e., Gabor-jet, GIST, HMAX, and AlexNet), and across different surface forms, a manipulation that altered object contours while preserving the underlying skeleton. Together, these findings shed light on shape processing in human vision, as well as the computational properties of V3 and LO. We discuss how these regions may support two putative roles of shape skeletons: namely, perceptual organization and object recognition.
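The RSA step, building a representational dissimilarity matrix (RDM) per candidate model or brain region and correlating their off-diagonal entries, can be sketched with toy vectors (not the study's fMRI data; published RSA work often uses rank correlation rather than the Pearson correlation shown here):

```python
def rdm(vectors, dist):
    """Representational dissimilarity matrix: pairwise distances between
    condition-wise response vectors (voxel patterns or model features)."""
    n = len(vectors)
    return [[dist(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries, the values RSA correlates."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# Toy example: a 'skeletal model' RDM that perfectly predicts a 'region' RDM.
brain = [[0.0], [1.0], [3.0]]
model = [[0.0], [2.0], [6.0]]
similarity = pearson(upper_triangle(rdm(brain, euclidean)),
                     upper_triangle(rdm(model, euclidean)))
```

The "unique variance" claim in the abstract additionally requires partialling out the competing models' RDMs, which this fragment does not attempt.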
Affiliation(s)
- Vladislav Ayzenberg
- Department of Psychology, Carnegie Mellon University, USA (corresponding author)
- Frederik S. Kamps
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, USA
- Stella F. Lourenco
- Department of Psychology, Emory University, USA (corresponding author)
9. Pramod RT, Arun SP. Improving Machine Vision Using Human Perceptual Representations: The Case of Planar Reflection Symmetry for Object Classification. IEEE Trans Pattern Anal Mach Intell 2022; 44:228-241. PMID: 32750809; PMCID: PMC7611439; DOI: 10.1109/TPAMI.2020.3008107.
Abstract
Achieving human-like visual abilities is a holy grail for machine vision, yet precisely how insights from human vision can improve machines has remained unclear. Here, we demonstrate two key conceptual advances: First, we show that most machine vision models are systematically different from human object perception. To do so, we collected a large dataset of perceptual distances between isolated objects in humans and asked whether these perceptual data can be predicted by many common machine vision algorithms. We found that while the best algorithms explain ~70 percent of the variance in the perceptual data, all the algorithms we tested make systematic errors on several types of objects. In particular, machine algorithms underestimated distances between symmetric objects compared to human perception. Second, we show that fixing these systematic biases can lead to substantial gains in classification performance. In particular, augmenting a state-of-the-art convolutional neural network with planar/reflection symmetry scores along multiple axes produced significant improvements in classification accuracy (1-10 percent) across categories. These results show that machine vision can be improved by discovering and fixing systematic differences from human vision.
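The proposed augmentation, appending planar reflection-symmetry scores to a network's feature vector before classification, can be sketched with a toy left-right score; the CNN features below are hypothetical, and image intensities are assumed normalized to [0, 1]:

```python
def reflection_symmetry_score(image):
    """Left-right reflection symmetry of a 2D intensity grid, in [0, 1]:
    1 minus the mean absolute difference between the image and its mirror.
    Assumes intensities are already normalized to [0, 1]."""
    diffs = [abs(row[j] - row[len(row) - 1 - j])
             for row in image for j in range(len(row))]
    return 1.0 - sum(diffs) / len(diffs)

def augment_features(cnn_features, image):
    """Concatenate a symmetry score onto a (hypothetical) CNN feature
    vector; the extended vector then feeds the classifier."""
    return list(cnn_features) + [reflection_symmetry_score(image)]
```

The paper computes such scores along multiple candidate symmetry axes; a single left-right axis is shown here for brevity.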
10. Contour features predict valence and threat judgements in scenes. Sci Rep 2021; 11:19405. PMID: 34593933; PMCID: PMC8484627; DOI: 10.1038/s41598-021-99044-y.
Abstract
Quickly scanning an environment to determine relative threat is an essential part of survival. Scene gist extracted rapidly from the environment may help people detect threats. Here, we probed this link between emotional judgements and features of visual scenes. We first extracted curvature, length, and orientation statistics of all images in the International Affective Picture System image set and related them to emotional valence scores. Images containing angular contours were rated as negative, and images containing long contours as positive. We then composed new abstract line drawings with specific combinations of length, angularity, and orientation values and asked participants to rate them as positive or negative, and as safe or threatening. Smooth, long, horizontal contour scenes were rated as positive/safe, while short angular contour scenes were rated as negative/threatening. Our work shows that particular combinations of image features help people make judgements about potential threat in the environment.
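Two of the contour statistics involved, length and angularity (here taken as the mean absolute turning angle), are easy to compute from a polyline; this is a simplified stand-in for the full curvature, length, and orientation statistics used in the study:

```python
import math

def contour_stats(points):
    """Length and angularity (mean absolute turning angle, in radians)
    of an open polyline given as a list of (x, y) points."""
    length = sum(math.dist(points[i], points[i + 1])
                 for i in range(len(points) - 1))
    turns = []
    for i in range(1, len(points) - 1):
        ax, ay = points[i][0] - points[i - 1][0], points[i][1] - points[i - 1][1]
        bx, by = points[i + 1][0] - points[i][0], points[i + 1][1] - points[i][1]
        cos_t = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
        turns.append(math.acos(max(-1.0, min(1.0, cos_t))))
    angularity = sum(turns) / len(turns) if turns else 0.0
    return length, angularity
```

On these measures, a long smooth contour (low angularity) would fall toward the positive/safe end of the reported ratings, and a short sharply angled one toward the negative/threatening end.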
11. Hayes TR, Henderson JM. Deep saliency models learn low-, mid-, and high-level features to predict scene attention. Sci Rep 2021; 11:18434. PMID: 34531484; PMCID: PMC8445969; DOI: 10.1038/s41598-021-97879-z.
Abstract
Deep saliency models represent the current state-of-the-art for predicting where humans look in real-world scenes. However, for deep saliency models to inform cognitive theories of attention, we need to know how deep saliency models prioritize different scene features to predict where people look. Here we open the black box of three prominent deep saliency models (MSI-Net, DeepGaze II, and SAM-ResNet) using an approach that models the association between attention, deep saliency model output, and low-, mid-, and high-level scene features. Specifically, we measured the association between each deep saliency model and low-level image saliency, mid-level contour symmetry and junctions, and high-level meaning by applying a mixed effects modeling approach to a large eye movement dataset. We found that all three deep saliency models were most strongly associated with high-level and low-level features, but exhibited qualitatively different feature weightings and interaction patterns. These findings suggest that prominent deep saliency models are primarily learning image features associated with high-level scene meaning and low-level image saliency and highlight the importance of moving beyond simply benchmarking performance.
Affiliation(s)
- Taylor R Hayes
- Center for Mind and Brain, University of California, Davis, 95618, USA
- John M Henderson
- Center for Mind and Brain, University of California, Davis, 95618, USA; Department of Psychology, University of California, Davis, 95616, USA
12. Shigene K, Hiasa Y, Otake Y, Soufi M, Janewanthanakul S, Nishimura T, Sato Y, Suetsugu S. Translation of Cellular Protein Localization Using Convolutional Networks. Front Cell Dev Biol 2021; 9:635231. PMID: 34422790; PMCID: PMC8375474; DOI: 10.3389/fcell.2021.635231.
Abstract
Protein localization in cells has been analyzed by fluorescent labeling using indirect immunofluorescence and fluorescent protein tagging. However, the relationships between the localization of different proteins have not been analyzed using artificial intelligence. Here, we applied convolutional networks for the prediction of localization of the cytoskeletal proteins from the localization of the other proteins. Lamellipodia are one of the actin-dependent subcellular structures involved in cell migration and are mainly generated by the Wiskott-Aldrich syndrome protein (WASP)-family verprolin homologous protein 2 (WAVE2) and the membrane remodeling I-BAR domain protein IRSp53. Focal adhesion is another actin-based structure that contains vinculin protein and promotes lamellipodia formation and cell migration. In contrast, microtubules are not directly related to actin filaments. The convolutional network was trained using images of actin filaments paired with WAVE2, IRSp53, vinculin, and microtubules. The generated images of WAVE2, IRSp53, and vinculin were highly similar to their real images. In contrast, the microtubule images generated from actin filament images were inferior, failing to reproduce the filamentous structures, suggesting that microscopic images of actin filaments provide more information about actin-related protein localization. Collectively, this study suggests that image translation by the convolutional network can predict the localization of functionally related proteins, and the convolutional network might be used to describe the relationships between the proteins by their localization.
Affiliation(s)
- Kei Shigene
- Division of Biological Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Yuta Hiasa
- Division of Information Science, Nara Institute of Science and Technology, Ikoma, Japan
- Yoshito Otake
- Division of Information Science, Nara Institute of Science and Technology, Ikoma, Japan
- Mazen Soufi
- Division of Information Science, Nara Institute of Science and Technology, Ikoma, Japan
- Suphamon Janewanthanakul
- Division of Biological Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Tamako Nishimura
- Division of Biological Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Yoshinobu Sato
- Division of Information Science, Nara Institute of Science and Technology, Ikoma, Japan; Data Science Center, Nara Institute of Science and Technology, Ikoma, Japan
- Shiro Suetsugu
- Division of Biological Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan; Data Science Center, Nara Institute of Science and Technology, Ikoma, Japan; Center for Digital Green-Innovation, Nara Institute of Science and Technology, Ikoma, Japan
13. Baker N, Kellman PJ. Constant curvature modeling of abstract shape representation. PLoS One 2021; 16:e0254719. PMID: 34339436; PMCID: PMC8328290; DOI: 10.1371/journal.pone.0254719.
Abstract
How abstract shape is perceived and represented poses crucial unsolved problems in human perception and cognition. Recent findings suggest that the visual system may encode contours as sets of connected constant curvature segments. Here we describe a model for how the visual system might recode a set of boundary points into a constant curvature representation. The model includes two free parameters that relate to the degree to which the visual system encodes shapes with high fidelity vs. the importance of simplicity in shape representations. We conducted two experiments to estimate these parameters empirically. Experiment 1 tested the limits of observers’ ability to discriminate a contour made up of two constant curvature segments from one made up of a single constant curvature segment. Experiment 2 tested observers’ ability to discriminate contours generated from cubic splines (which, mathematically, have no constant curvature segments) from constant curvature approximations of the contours, generated at various levels of precision. Results indicated a clear transition point at which discrimination becomes possible. The results were used to fix the two parameters in our model. In Experiment 3, we tested whether outputs from our parameterized model were predictive of perceptual performance in a shape recognition task. We generated shape pairs that had matched physical similarity but differed in representational similarity (i.e., the number of segments needed to describe the shapes) as assessed by our model. We found that pairs of shapes that were more representationally dissimilar were also easier to discriminate in a forced choice, same/different task. The results of these studies provide evidence for constant curvature shape representation in human visual perception and provide a testable model for how abstract shape descriptions might be encoded.
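The recoding idea can be sketched by estimating local curvature from point triples (via the circumscribed circle) and greedily starting a new segment wherever curvature departs from the current run's mean; the tolerance below is arbitrary, standing in for the model's two empirically fitted parameters:

```python
import math

def curvature(p, q, r):
    """Curvature (1 / circumradius) of the circle through three points;
    0 for collinear points. Assumes the three points are distinct."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    a, b, c = math.dist(p, q), math.dist(q, r), math.dist(p, r)
    return 2.0 * abs(cross) / (a * b * c)

def constant_curvature_segments(points, tol=0.05):
    """Greedily group successive local curvature estimates into runs of
    approximately constant curvature; returns one list of curvatures per run."""
    ks = [curvature(points[i - 1], points[i], points[i + 1])
          for i in range(1, len(points) - 1)]
    segments, current = [], [ks[0]]
    for k in ks[1:]:
        if abs(k - sum(current) / len(current)) > tol:
            segments.append(current)  # curvature changed: start a new segment
            current = [k]
        else:
            current.append(k)
    segments.append(current)
    return segments
```

In the paper's terms, the number of segments needed to describe a shape serves as a measure of representational complexity, so shape pairs with different segment counts are representationally dissimilar even when physically similar.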
Affiliation(s)
- Nicholas Baker
- Department of Psychology, University of California Los Angeles, Los Angeles, California, United States of America
- Philip J. Kellman
- Department of Psychology, University of California Los Angeles, Los Angeles, California, United States of America
14. Dvoeglazova M, Koshmanova E, Sawada T. Visual sensitivity to parallel configurations of contours compared with sensitivity to other configurations. Vision Res 2021; 188:149-161. PMID: 34333200; DOI: 10.1016/j.visres.2021.07.006.
Abstract
People can perceive 3D information from contour drawings, and some types of configurations of contours in such drawings are important for 3D perception. We know that our visual system is sensitive to these configurations. Koshmanova & Sawada (2019, Vision Research, 154, 97-104) showed that the sensitivity is higher to a parallel configuration of contours than to a perpendicular configuration of contours. In this study, two psychophysical experiments compared sensitivity to a parallel configuration with sensitivity to two other configurations. In Experiment 1, orientation thresholds were measured with parallel and converging configurations composed of three contours. In Experiment 2, orientation thresholds of configurations composed of two contours were measured with parallel, collinear, and perpendicular configurations. The results of Experiment 1 showed that the visual system is more sensitive to parallel configurations than to converging configurations. The results of Experiment 2 showed that the sensitivity to the parallel configuration is analogous to the sensitivity to the collinear configuration, and it is higher than the sensitivity to the perpendicular configuration. The role that the parallel configuration plays in the 3D perception of contour drawings is discussed.
Collapse
|
15
|
Morgenstern Y, Hartmann F, Schmidt F, Tiedemann H, Prokott E, Maiello G, Fleming RW. An image-computable model of human visual shape similarity. PLoS Comput Biol 2021; 17:e1008981. [PMID: 34061825 PMCID: PMC8195351 DOI: 10.1371/journal.pcbi.1008981] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2020] [Revised: 06/11/2021] [Accepted: 04/19/2021] [Indexed: 11/19/2022] Open
Abstract
Shape is a defining feature of objects, and human observers can effortlessly compare shapes to determine how similar they are. Yet, to date, no image-computable model can predict how visually similar or different shapes appear. Such a model would be an invaluable tool for neuroscientists and could provide insights into computations underlying human shape perception. To address this need, we developed a model (‘ShapeComp’), based on over 100 shape features (e.g., area, compactness, Fourier descriptors). When trained to capture the variance in a database of >25,000 animal silhouettes, ShapeComp accurately predicts human shape similarity judgments between pairs of shapes without fitting any parameters to human data. To test the model, we created carefully selected arrays of complex novel shapes using a Generative Adversarial Network trained on the animal silhouettes, which we presented to observers in a wide range of tasks. Our findings show that incorporating multiple ShapeComp dimensions facilitates the prediction of human shape similarity across a small number of shapes, and also captures much of the variance in the multiple arrangements of many shapes. ShapeComp outperforms both conventional pixel-based metrics and state-of-the-art convolutional neural networks, and can also be used to generate perceptually uniform stimulus sets, making it a powerful tool for investigating shape and object representations in the human brain. The ability to describe and compare shapes is crucial in many scientific domains from visual object recognition to computational morphology and computer graphics. Across disciplines, considerable effort has been devoted to the study of shape and its influence on object recognition, yet an important stumbling block is the quantitative characterization of shape similarity. 
Here we develop a psychophysically validated model that takes an object's shape boundary as input and provides a high-dimensional output that can be used to predict visual shape similarity. With this precise control of shape similarity, the model's description of shape is a powerful tool that can be used across the neurosciences and artificial intelligence to test the role of shape in perception and the brain.
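The feature-based approach the abstract describes can be illustrated with a minimal sketch. This is not the authors' actual ShapeComp implementation (which uses over 100 features trained on a large silhouette database); the function names and the tiny three-feature vector here are illustrative assumptions. The idea is simply: compute descriptors from a closed shape boundary, then treat shape dissimilarity as distance in the resulting feature space.

```python
import numpy as np

def shape_features(boundary):
    """Compute a small feature vector (area, perimeter, compactness)
    from a closed 2D boundary given as an (N, 2) array of vertices."""
    x, y = boundary[:, 0], boundary[:, 1]
    # Shoelace formula for polygon area
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    # Perimeter: sum of edge lengths around the closed contour
    edges = np.roll(boundary, -1, axis=0) - boundary
    perimeter = np.sum(np.linalg.norm(edges, axis=1))
    # Compactness: 1.0 for a circle, smaller for elongated/irregular shapes
    compactness = 4 * np.pi * area / perimeter**2
    return np.array([area, perimeter, compactness])

def shape_distance(b1, b2, scale):
    """Dissimilarity as Euclidean distance between feature vectors,
    with each dimension normalized by `scale` (so no feature dominates)."""
    return np.linalg.norm((shape_features(b1) - shape_features(b2)) / scale)
```

For example, a circle yields compactness near 1.0 and a square yields exactly pi/4, so the two shapes sit at different points in feature space; a richer, perceptually weighted feature set is what lets a model of this kind track human similarity judgments.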
Collapse
Affiliation(s)
- Yaniv Morgenstern
- Department of Experimental Psychology, Justus-Liebig University Giessen, Giessen, Germany
- * E-mail:
| | - Frieder Hartmann
- Department of Experimental Psychology, Justus-Liebig University Giessen, Giessen, Germany
| | - Filipp Schmidt
- Department of Experimental Psychology, Justus-Liebig University Giessen, Giessen, Germany
| | - Henning Tiedemann
- Department of Experimental Psychology, Justus-Liebig University Giessen, Giessen, Germany
| | - Eugen Prokott
- Department of Experimental Psychology, Justus-Liebig University Giessen, Giessen, Germany
| | - Guido Maiello
- Department of Experimental Psychology, Justus-Liebig University Giessen, Giessen, Germany
| | - Roland W. Fleming
- Department of Experimental Psychology, Justus-Liebig University Giessen, Giessen, Germany
- Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Giessen, Germany
| |
Collapse
|
16
|
Harel A, Mzozoyana MW, Al Zoubi H, Nador JD, Noesen BT, Lowe MX, Cant JS. Artificially-generated scenes demonstrate the importance of global scene properties for scene perception. Neuropsychologia 2020; 141:107434. [PMID: 32179102 DOI: 10.1016/j.neuropsychologia.2020.107434] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2019] [Revised: 03/04/2020] [Accepted: 03/09/2020] [Indexed: 10/24/2022]
Abstract
Recent electrophysiological research highlights the significance of global scene properties (GSPs) for scene perception. However, since real-world scenes span a range of low-level stimulus properties and high-level contextual semantics, GSP effects may also reflect additional processing of such non-global factors. We examined this question by asking whether Event-Related Potentials (ERPs) to GSPs would still be observed when specific low- and high-level scene properties are absent from the scene. We presented participants with computer-based, artificially manipulated scenes varying in two GSPs (spatial expanse and naturalness) which minimized other sources of scene information (color and semantic object detail). We found that the peak amplitude of the P2 component was sensitive to the spatial expanse and naturalness of the artificially generated scenes: P2 amplitude was higher in response to closed than to open scenes, and to manmade than to natural scenes. A control experiment showed that the effect of naturalness on the P2 is not driven by local texture information, while earlier effects of naturalness, expressed as a modulation of the P1 and N1 amplitudes, are sensitive to texture information. Our results demonstrate that GSPs are processed robustly around 220 ms and that the P2 can be used as an index of global scene perception.
Collapse
Affiliation(s)
- Assaf Harel
- Department of Psychology, Wright State University, Dayton, OH, USA
| | - Mavuso W Mzozoyana
- Department of Neuroscience, Cell Biology and Physiology, Wright State University, Dayton, OH, USA
| | - Hamada Al Zoubi
- Department of Neuroscience, Cell Biology and Physiology, Wright State University, Dayton, OH, USA
| | - Jeffrey D Nador
- Department of Psychology, Wright State University, Dayton, OH, USA
| | - Birken T Noesen
- Department of Psychology, Wright State University, Dayton, OH, USA
| | - Matthew X Lowe
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Jonathan S Cant
- Department of Psychology, University of Toronto Scarborough, Toronto, ON, Canada
| |
Collapse
|
17
|
Wallis TS, Funke CM, Ecker AS, Gatys LA, Wichmann FA, Bethge M. Image content is more important than Bouma's Law for scene metamers. eLife 2019; 8:42512. [PMID: 31038458 PMCID: PMC6491040 DOI: 10.7554/elife.42512] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2018] [Accepted: 03/09/2019] [Indexed: 11/16/2022] Open
Abstract
We subjectively perceive our visual field with high fidelity, yet peripheral distortions can go unnoticed and peripheral objects can be difficult to identify (crowding). Prior work showed that humans could not discriminate images synthesised to match the responses of a mid-level ventral visual stream model when information was averaged in receptive fields with a scaling of about half their retinal eccentricity. This result implicated ventral visual area V2, approximated ‘Bouma’s Law’ of crowding, and has subsequently been interpreted as a link between crowding zones, receptive field scaling, and our perceptual experience. However, this experiment never assessed natural images. We find that humans can easily discriminate real and model-generated images at V2 scaling, requiring scales at least as small as V1 receptive fields to generate metamers. We speculate that explaining why scenes look as they do may require incorporating segmentation and global organisational constraints in addition to local pooling. As you read this digest, your eyes move to follow the lines of text. But now try to hold your eyes in one position, while reading the text on either side and below: it soon becomes clear that peripheral vision is not as good as we tend to assume. It is not possible to read text far away from the center of your line of vision, but you can see ‘something’ out of the corner of your eye. You can see that there is text there, even if you cannot read it, and you can see where your screen or page ends. So how does the brain generate peripheral vision, and why does it differ from what you see when you look straight ahead? One idea is that the visual system averages information over areas of the peripheral visual field. This gives rise to texture-like patterns, as opposed to images made up of fine details. Imagine looking at an expanse of foliage, gravel or fur, for example. Your eyes cannot make out the individual leaves, pebbles or hairs. 
Instead, you perceive an overall pattern in the form of a texture. Our peripheral vision may also consist of such textures, created when the brain averages information over areas of space. Wallis, Funke et al. have now tested this idea using an existing computer model that averages visual input in this way. By giving the model a series of photographs to process, Wallis, Funke et al. obtained images that should in theory simulate peripheral vision. If the model mimics the mechanisms that generate peripheral vision, then healthy volunteers should be unable to distinguish the processed images from the original photographs. But in fact, the participants could easily discriminate the two sets of images. This suggests that the visual system does not solely use textures to represent information in the peripheral visual field. Wallis, Funke et al. propose that other factors, such as how the visual system separates and groups objects, may instead determine what we see in our peripheral vision. This knowledge could ultimately benefit patients with eye diseases such as macular degeneration, a condition that causes loss of vision in the center of the visual field and forces patients to rely on their peripheral vision.
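The pooling idea tested above can be sketched in one dimension. This is a simplified illustration, not the texture-synthesis model used in the study (which pools rich texture statistics over 2D regions): each "image" value is replaced by the mean over a window whose radius grows with eccentricity, analogous to receptive fields that scale with distance from fixation. The function name and the linear scaling factor are illustrative assumptions.

```python
import numpy as np

def eccentricity_pooled(image, fixation, scaling=0.5):
    """Average a 1D 'image' in pooling windows whose radius grows
    linearly with distance from fixation (radius = scaling * eccentricity)."""
    out = np.empty(len(image), dtype=float)
    for i in range(len(image)):
        ecc = abs(i - fixation)            # distance from fixation, in samples
        radius = int(scaling * ecc)        # pooling radius scales with eccentricity
        lo, hi = max(0, i - radius), min(len(image), i + radius + 1)
        out[i] = image[lo:hi].mean()       # local average within the pooling window
    return out
```

Running this on a fine alternating pattern preserves the detail at fixation but washes it out toward a uniform average in the periphery, which is exactly the texture-like loss of detail the model family predicts; the study's point is that human observers can still tell such pooled images apart from the originals at V2-sized pooling scales.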
Collapse
Affiliation(s)
- Thomas SA Wallis
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany
| | - Christina M Funke
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany
| | - Alexander S Ecker
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany; Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, United States; Institute for Theoretical Physics, Eberhard Karls Universität Tübingen, Tübingen, Germany
| | - Leon A Gatys
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany
| | - Felix A Wichmann
- Neural Information Processing Group, Faculty of Science, Eberhard Karls Universität Tübingen, Tübingen, Germany
| | - Matthias Bethge
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, United States; Institute for Theoretical Physics, Eberhard Karls Universität Tübingen, Tübingen, Germany; Max Planck Institute for Biological Cybernetics, Tübingen, Germany
| |
Collapse
|