1.
Wang G, Foxwell MJ, Cichy RM, Pitcher D, Kaiser D. Individual differences in internal models explain idiosyncrasies in scene perception. Cognition 2024; 245:105723. PMID: 38262271. DOI: 10.1016/j.cognition.2024.105723.
Abstract
According to predictive processing theories, vision is facilitated by predictions derived from our internal models of what the world should look like. However, the contents of these models, and how they vary across people, remain unclear. Here, we use drawing as a behavioral readout of the contents of the internal models of individual participants. Participants were first asked to draw typical versions of scene categories, as descriptors of their internal models. These drawings were converted into standardized 3D renders, which we used as stimuli in subsequent scene categorization experiments. Across two experiments, participants' scene categorization was more accurate for renders tailored to their own drawings than for renders based on others' drawings or on copies of scene photographs, suggesting that scene perception is determined by a match with idiosyncratic internal models. Using a deep neural network to computationally evaluate similarities between scene renders, we further demonstrate that graded similarity to the render based on participants' own typical drawings (and thus to their internal model) predicts categorization performance across a range of candidate scenes. Together, our results showcase the potential of a new method for understanding individual differences, starting from participants' personal expectations about the structure of real-world scenes.
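The DNN-based similarity analysis can be illustrated with a minimal sketch: given feature vectors extracted from scene renders (here plain NumPy arrays standing in for actual network activations), candidate renders are ranked by cosine similarity to the render derived from a participant's own typical drawing. The function names and data are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_by_model_similarity(own_render_features, candidate_features):
    """Rank candidate scene renders by feature similarity to the render
    derived from a participant's own typical drawing (a proxy for their
    internal model). Returns (ranked indices, similarity scores)."""
    sims = [cosine_similarity(own_render_features, f) for f in candidate_features]
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return order, sims
```

In the study's logic, candidate scenes ranked closer to the participant's own render would be expected to be categorized more accurately.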
Affiliation(s)
- Gongting Wang
- Department of Education and Psychology, Freie Universität Berlin, Germany; Department of Mathematics and Computer Science, Physics, Geography, Justus-Liebig-Universität Gießen, Germany
- Radoslaw M Cichy
- Department of Education and Psychology, Freie Universität Berlin, Germany
| | | | - Daniel Kaiser
- Department of Mathematics and Computer Science, Physics, Geography, Justus-Liebig-Universität Gießen, Germany; Center for Mind, Brain and Behavior (CMBB), Philipps-Universität Marburg and Justus-Liebig-Universität Gießen, Germany.
2.
Hall EH, Geng JJ. Object-based attention during scene perception elicits boundary contraction in memory. Mem Cognit 2024. PMID: 38530622. DOI: 10.3758/s13421-024-01540-9.
Abstract
Boundary contraction and extension are two types of scene transformations that occur in memory. In extension, viewers extrapolate information beyond the edges of the image, whereas in contraction, viewers forget information near the edges. Recent work suggests that image composition influences the direction and magnitude of boundary transformation. We hypothesized that selective attention at encoding is an important driver of boundary transformation effects, with selective attention to specific objects at encoding leading to boundary contraction. In this study, one group of participants (N = 36) memorized 15 scenes while searching for targets, while a separate group (N = 36) simply memorized the scenes. Both groups then drew the scenes from memory with as much object and spatial detail as they could remember. We asked online workers to rate the boundary transformations in the drawings, as well as how many objects they contained and the precision of remembered object size and location. We found that search-condition drawings showed significantly greater boundary contraction than drawings of the same scenes in the memorize condition. Search drawings were significantly more likely to contain target objects, and the likelihood of recalling other objects in the scene decreased as a function of their distance from the target. These findings suggest that selective attention to a specific object due to a search task at encoding leads to significant boundary contraction.
Affiliation(s)
- Elizabeth H Hall
- Department of Psychology, University of California Davis, Davis, CA, 95616, USA.
- Center for Mind and Brain, University of California Davis, Davis, CA, 95618, USA.
- Joy J Geng
- Department of Psychology, University of California Davis, Davis, CA, 95616, USA
- Center for Mind and Brain, University of California Davis, Davis, CA, 95618, USA
3.
Kennedy B, Malladi SN, Tootell RBH, Nasr S. A previously undescribed scene-selective site is the key to encoding ego-motion in naturalistic environments. Res Sq 2024 (preprint). PMID: 38260553. PMCID: PMC10802707. DOI: 10.21203/rs.3.rs-3378081/v2.
Abstract
Current models of scene processing in the human brain include three scene-selective areas: the Parahippocampal Place Area (or temporal place area; PPA/TPA), the retrosplenial cortex (or medial place area; RSC/MPA), and the transverse occipital sulcus (or occipital place area; TOS/OPA). Here, we challenged this model by showing that at least one other scene-selective site can also be detected within the human posterior intraparietal gyrus. Despite the smaller size of this site compared to the other scene-selective areas, the posterior intraparietal gyrus scene-selective (PIGS) site was detected consistently in a large pool of subjects (n = 59; 33 females). The reproducibility of this finding was tested against multiple criteria, including comparing results across sessions, scanners (3T and 7T), and stimulus sets. Furthermore, we found that this site (but not the other three scene-selective areas) is significantly sensitive to ego-motion in scenes, thus distinguishing the role of PIGS in scene perception relative to other scene-selective areas. These results highlight the importance of including finer-scale scene-selective sites in models of scene processing, a crucial step toward a more comprehensive understanding of how scenes are encoded under dynamic conditions.
Affiliation(s)
- Bryan Kennedy
- Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, United States
- Sarala N. Malladi
- Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, United States
- Roger B. H. Tootell
- Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, United States
- Department of Radiology, Harvard Medical School, Boston, MA, United States
- Shahin Nasr
- Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, United States
- Department of Radiology, Harvard Medical School, Boston, MA, United States
4.
Peacock CE, Hall EH, Henderson JM. Objects are selected for attention based upon meaning during passive scene viewing. Psychon Bull Rev 2023; 30:1874-1886. PMID: 37095319. DOI: 10.3758/s13423-023-02286-2.
Abstract
While object meaning has been demonstrated to guide attention during active scene viewing, and object salience guides attention during passive viewing, it is unknown whether object meaning predicts attention in passive viewing tasks and whether attention during passive viewing is more strongly related to meaning or to salience. To answer this question, we used a mixed modeling approach in which we computed the average meaning and physical salience of objects in scenes while statistically controlling for object size and eccentricity. Using eye-movement data from aesthetic judgment and memorization tasks, we then tested whether fixations are more likely to land on high-meaning objects than on low-meaning objects while controlling for object salience, size, and eccentricity. The results demonstrated that fixations are more likely to be directed to high-meaning objects than to low-meaning objects regardless of these other factors. Further analyses revealed that fixation durations were positively associated with object meaning irrespective of the other object properties. Overall, these findings provide the first evidence that objects are selected for attention, in part, on the basis of their meaning during passive scene viewing.
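The logic of estimating the effect of meaning while statistically controlling covariates can be sketched in miniature. The study used mixed models; as a simplified stand-in, the fragment below fits an ordinary multiple regression on simulated object-level data, recovering the partial effect of meaning with salience, size, and eccentricity held constant. All variables and weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hypothetical number of objects pooled across scenes

# Simulated, standardized object-level predictors; meaning and salience
# are deliberately correlated, as they tend to be in real scenes.
meaning = rng.normal(size=n)
salience = 0.5 * meaning + rng.normal(size=n)
size = rng.normal(size=n)
eccentricity = rng.normal(size=n)

# Simulated fixation tendency driven by meaning (true weight 0.8),
# with no independent contribution of salience.
fixation = 0.8 * meaning + rng.normal(scale=0.5, size=n)

# Multiple regression: does meaning predict fixations once salience,
# size, and eccentricity are statistically controlled?
X = np.column_stack([np.ones(n), meaning, salience, size, eccentricity])
beta, *_ = np.linalg.lstsq(X, fixation, rcond=None)
# beta[1] is the partial effect of meaning; beta[2] that of salience.
```

With this setup the regression attributes the fixation variance to meaning rather than to the correlated salience predictor, which is the kind of dissociation the covariate-controlled analysis is after.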
Affiliation(s)
- Candace E Peacock
- Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA, 95618, USA.
- Department of Psychology, University of California, Davis, CA, USA.
- Elizabeth H Hall
- Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA, 95618, USA
- Department of Psychology, University of California, Davis, CA, USA
- John M Henderson
- Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA, 95618, USA
- Department of Psychology, University of California, Davis, CA, USA
5.
Aldegheri G, Gayet S, Peelen MV. Scene context automatically drives predictions of object transformations. Cognition 2023; 238:105521. PMID: 37354785. DOI: 10.1016/j.cognition.2023.105521.
Abstract
As our viewpoint changes, the whole scene around us rotates coherently. This allows us to predict how one part of a scene (e.g., an object) will change by observing other parts (e.g., the scene background). While human object perception is known to be strongly context-dependent, previous research has largely focused on how scene context can disambiguate fixed object properties, such as identity (e.g., a car is easier to recognize on a road than on a beach). It remains an open question whether object representations are updated dynamically based on the surrounding scene context, for example across changes in viewpoint. Here, we tested whether human observers dynamically and automatically predict the appearance of objects based on the orientation of the background scene. In three behavioral experiments (N = 152), we temporarily occluded objects within scenes that rotated. Upon the objects' reappearance, participants had to perform a perceptual discrimination task, which did not require taking the scene rotation into account. Performance on this orthogonal task strongly depended on whether objects reappeared rotated coherently with the surrounding scene or not. This effect persisted even when a majority of trials violated this real-world contingency between scene and object, showcasing the automaticity of these scene-based predictions. These findings indicate that contextual information plays an important role in predicting object transformations in structured real-world environments.
Affiliation(s)
- Giacomo Aldegheri
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Thomas van Aquinostraat 4, Nijmegen 6525 GD, the Netherlands; Department of Psychology, Amsterdam Brain & Cognition Center, University of Amsterdam, Nieuwe Achtergracht 129-B, Amsterdam 1018 WS, the Netherlands.
- Surya Gayet
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Thomas van Aquinostraat 4, Nijmegen 6525 GD, the Netherlands; Department of Experimental Psychology, Helmholtz Institute, Utrecht University, Heidelberglaan 1, Utrecht 3584 CS, the Netherlands
- Marius V Peelen
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Thomas van Aquinostraat 4, Nijmegen 6525 GD, the Netherlands
6.
Klever L, Islam J, Võ MLH, Billino J. Aging attenuates the memory advantage for unexpected objects in real-world scenes. Heliyon 2023; 9:e20241. PMID: 37809883. PMCID: PMC10560015. DOI: 10.1016/j.heliyon.2023.e20241.
Abstract
Across the adult lifespan, memory processes are subject to pronounced changes. Prior knowledge and expectations might critically shape functional differences; however, corresponding findings have remained ambiguous so far. Here, we chose a tailored approach to scrutinize how schema (in-)congruencies affect older and younger adults' memory for objects embedded in real-world scenes, a scenario close to everyday memory demands. A sample of 23 older (52-81 years) and 23 younger adults (18-38 years) freely viewed 60 photographs of scenes that included target objects either congruent or incongruent with the given context. After a delay, recognition performance for those objects was determined. In addition, recognized objects had to be matched to the scene context in which they were previously presented. While we found schema violations to benefit object recognition across age groups, the advantage was significantly less pronounced in older adults. We moreover observed an age-related congruency bias for matching objects to their original scene context. Our findings support a critical role of predictive processes in age-related memory differences and indicate an enhanced weighting of predictions with age. We suggest that recent predictive processing theories provide a particularly useful framework for elaborating on age-related functional vulnerabilities as well as stability.
Affiliation(s)
- Lena Klever
- Experimental Psychology, Justus Liebig University Giessen, Germany
- Center for Mind, Brain, and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Germany
- Jasmin Islam
- Experimental Psychology, Justus Liebig University Giessen, Germany
- Melissa Le-Hoa Võ
- Department of Psychology, Goethe University Frankfurt, Frankfurt am Main, Germany
- Jutta Billino
- Experimental Psychology, Justus Liebig University Giessen, Germany
- Center for Mind, Brain, and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Germany
7.
Kang J, Park S. Combined representation of visual features in the scene-selective cortex. bioRxiv 2023 (preprint). PMID: 37546776. PMCID: PMC10402097. DOI: 10.1101/2023.07.24.550280.
Abstract
Visual features of separable dimensions like color and shape conjoin to represent an integrated entity. We investigated how visual features bind to form a complex visual scene. Specifically, we focused on features important for visually guided navigation: direction and distance. Previous work has shown separately that the directions and distances of navigable paths are coded in the occipital place area (OPA). Using functional magnetic resonance imaging (fMRI), we tested how these features are concurrently represented in the OPA. Participants saw eight types of scenes, four with one path and four with two paths. In single-path scenes, the path direction was either to the left or to the right. In double-path scenes, both directions were present. Each path contained a glass wall located either near or far, changing the navigational distance. To test how the OPA represents paths in terms of direction and distance features, we took three approaches. First, the independent-features approach examined whether the OPA codes directions and distances independently in single-path scenes. Second, the integrated-features approach explored how directions and distances are integrated into path units, as compared to pooled features, using double-path scenes. Finally, the integrated-paths approach asked how separate paths are combined into a scene. Using multi-voxel pattern similarity analysis, we found that the OPA's representations of single-path scenes were similar to those of other single-path scenes sharing either the same direction or the same distance. Representations of double-path scenes were similar to the combination of their two constituent single paths, as combined units of direction and distance rather than a pooled representation of all features. These results show that the OPA combines the two features to form path units, which are then used to build multiple-path scenes. Altogether, these results suggest that visually guided navigation may be supported by the OPA, which automatically and efficiently combines multiple features relevant for navigation and represents them as a navigation file.
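The core of the multi-voxel pattern similarity logic, comparing a double-path scene's response pattern to the combination of its constituent single-path patterns, can be sketched with simulated voxel data (hypothetical numbers, not the study's fMRI responses):

```python
import numpy as np

def pattern_similarity(p1, p2):
    """Pearson correlation between two voxel response patterns."""
    return float(np.corrcoef(p1, p2)[0, 1])

rng = np.random.default_rng(1)
n_voxels = 200  # hypothetical OPA voxel count

# Simulated single-path patterns (e.g., a left/near and a right/far path).
left_near = rng.normal(size=n_voxels)
right_far = rng.normal(size=n_voxels)

# Simulate a double-path scene whose pattern combines its constituent
# single paths, plus measurement noise.
double_path = (left_near + right_far) / 2 + rng.normal(scale=0.3, size=n_voxels)

# The double-path pattern should resemble the combined single paths
# more than either single path alone.
sim_combined = pattern_similarity(double_path, (left_near + right_far) / 2)
sim_single = pattern_similarity(double_path, left_near)
```

The study's comparison is between this "combination of path units" account and a pooled-features account; in the sketch above only the combination model is simulated.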
Affiliation(s)
- Jisu Kang
- Department of Psychology, Yonsei University, 50, Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea
- Soojin Park
- Department of Psychology, Yonsei University, 50, Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea
8.
Tanabe-Ishibashi A, Ishibashi R, Hatori Y. Control of bottom-up attention in scene cognition contributes to visual working memory performance. Atten Percept Psychophys 2023. PMID: 37337017. DOI: 10.3758/s13414-023-02740-2.
Abstract
Several studies have investigated the relationship between working memory and attention. However, most of the relevant studies have investigated top-down attention; only a few have examined possible interactions between bottom-up attention and visual working memory. In the present study, we used the visual saliency of different parts of pictures as an index of the degree to which each part draws bottom-up attention. We administered the Picture Span Test (PST) to investigate whether salient parts of pictures influence visual working memory performance. The task required participants to judge the semantic congruency of objects in pictures and to remember specific parts of the pictures. In Experiment 1, we calculated a saliency map for the PST stimuli and found that salient but task-irrelevant parts of pictures could evoke intrusion errors. In Experiment 2, we demonstrated that longer gazing time at target areas results in a higher probability of correct recognition. In addition, frequent gaze fixations and high normalized scan-path saliency values in task-irrelevant areas were associated with intrusion errors. These results suggest that visual information processed by bottom-up attention may affect working memory.
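Normalized scan-path saliency (NSS), the metric mentioned above, is conventionally computed by z-scoring the saliency map and averaging its values at fixated locations. A generic sketch with toy data (not the authors' implementation):

```python
import numpy as np

def normalized_scanpath_saliency(saliency_map, fixations):
    """NSS: z-score the saliency map, then average its values at the
    fixated pixel coordinates (row, col). Positive values mean fixations
    landed on regions more salient than the image average."""
    s = np.asarray(saliency_map, dtype=float)
    z = (s - s.mean()) / s.std()
    return float(np.mean([z[r, c] for r, c in fixations]))
```

In the study's terms, a high NSS in task-irrelevant areas indicates that gaze was captured by salient but task-irrelevant content, the pattern associated with intrusion errors.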
Affiliation(s)
- Azumi Tanabe-Ishibashi
- International Research Institute of Disaster Science, Tohoku University, Tohoku, Miyagi, Japan.
- Institute of Development, Aging and Cancer, Tohoku University, Miyagi, Japan.
- Ryo Ishibashi
- Center for Information and Neural Networks, National Institute of Information and Communications Technology, Osaka, Japan
- Yasuhiro Hatori
- Research Institute of Electrical Communication, Tohoku University, Miyagi, Japan
9.
Cheng A, Chen Z, Dilks DD. A stimulus-driven approach reveals vertical luminance gradient as a stimulus feature that drives human cortical scene selectivity. Neuroimage 2023; 269:119935. PMID: 36764369. PMCID: PMC10044493. DOI: 10.1016/j.neuroimage.2023.119935.
Abstract
Human neuroimaging studies have revealed a dedicated cortical system for visual scene processing. But what is a "scene"? Here, we use a stimulus-driven approach to identify a stimulus feature that selectively drives cortical scene processing. Specifically, using fMRI data from BOLD5000, we examined the images that elicited the greatest response in the cortical scene processing system and found a common "vertical luminance gradient" (VLG), with the top half of a scene image brighter than the bottom half; moreover, across the entire set of images, VLG systematically increases with the neural response in the scene-selective regions (Study 1). We therefore hypothesized that VLG is a stimulus feature that selectively engages cortical scene processing, and directly tested the role of VLG in driving cortical scene selectivity using tightly controlled VLG stimuli (Study 2). Consistent with our hypothesis, we found that the scene-selective cortical regions, but not an object-selective region or early visual cortex, responded significantly more to images with VLG than to control stimuli with minimal VLG. Interestingly, such selectivity was also found for images with an "inverted" VLG, resembling the luminance gradient in night scenes. Finally, we tested the behavioral relevance of VLG for visual scene recognition (Study 3): participants categorized tightly controlled stimuli with both upright and inverted VLG as places more than as objects, indicating that VLG is also used for behavioral scene recognition. Taken together, these results reveal that VLG is a stimulus feature that selectively engages cortical scene processing, and they provide evidence for a recent proposal that visual scenes can be characterized by a set of common and unique visual features.
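The VLG feature itself is simple to compute. A minimal sketch, assuming a grayscale image stored as a 2-D array (the exact preprocessing in the study may differ):

```python
import numpy as np

def vertical_luminance_gradient(image):
    """VLG: mean luminance of the top half minus that of the bottom half.
    Positive values mean the image is brighter on top (typical of daytime
    scenes with sky above ground); negative values correspond to the
    'inverted' gradient of night scenes."""
    img = np.asarray(image, dtype=float)
    h = img.shape[0] // 2  # for odd heights, the middle row goes to the bottom half
    return float(img[:h].mean() - img[h:].mean())
```

Flipping an image vertically negates its VLG, which is the manipulation behind the upright vs. inverted comparison described above.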
Affiliation(s)
- Annie Cheng
- Department of Psychology, Emory University, Atlanta, GA, USA; Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
- Zirui Chen
- Department of Psychology, Emory University, Atlanta, GA, USA; Department of Cognitive Science, Johns Hopkins University, Baltimore, MD, USA
- Daniel D Dilks
- Department of Psychology, Emory University, Atlanta, GA, USA.
10.
Odic D, Oppenheimer DM. Visual numerosity perception shows no advantage in real-world scenes compared to artificial displays. Cognition 2023; 230:105291. PMID: 36183630. DOI: 10.1016/j.cognition.2022.105291.
Abstract
While the human visual system is sensitive to numerosity, the mechanisms that allow perception to extract and represent the number of objects in a scene remain unknown. Prominent theoretical approaches posit that numerosity perception emerges from passive experience with visual scenes throughout development, and that unsupervised deep neural network models mirror all characteristic behavioral features observed in participants. Here, we derive and test a novel prediction: if the visual number sense emerges from exposure to real-world scenes, then the closer a stimulus aligns with the natural statistics of the real world, the better number perception should be. In contrast to this prediction, we observed no such advantage (and sometimes even a notable impairment) in number perception for natural scenes compared to artificial dot displays in college-aged adults. These findings are not accounted for by difficulty of object identification, visual clutter, the parsability of objects from the rest of the scene, or increased occlusion. This pattern of results represents a fundamental challenge to recent models of numerosity perception based on experiential learning of statistical regularities, and instead suggests that the visual number sense is attuned to the abstract number of objects, independent of its correlation with non-numeric features. We discuss our results in the context of recent proposals that object complexity and entropy may play a role in number perception.
11.
Abassi E, Papeo L. Behavioral and neural markers of visual configural processing in social scene perception. Neuroimage 2022; 260:119506. PMID: 35878724. DOI: 10.1016/j.neuroimage.2022.119506.
Abstract
Research on face perception has revealed highly specialized visual mechanisms, such as configural processing, and has provided markers of interindividual differences (including disease risks and alterations) in the visuo-perceptual abilities that support social cognition. Is face perception unique in the degree or kind of its mechanisms, and in its relevance for social cognition? Combining functional MRI and behavioral methods, we address the processing of an uncharted class of socially relevant stimuli: minimal social scenes involving configurations of two bodies spatially close and face-to-face as if interacting (hereafter, facing dyads). We report category-specific activity for facing (vs. non-facing) dyads in visual cortex. That activity shows face-like signatures of configural processing, namely a stronger response to facing (vs. non-facing) dyads and greater susceptibility to stimulus inversion for facing (vs. non-facing) dyads, and is predicted by performance-based measures of configural processing in visual perception of body dyads. Moreover, we observe that individual performance in body-dyad perception is reliable, stable over time, and correlated with individual social sensitivity, coarsely captured by the Autism-Spectrum Quotient. Further analyses clarify the relationship between single-body and body-dyad perception. We propose that facing dyads are processed through highly specialized mechanisms (and brain areas), analogously to other biologically and socially relevant stimuli such as faces. Like face perception, facing-dyad perception can reveal basic (visual) processes that lay the foundations for understanding others, their relationships, and their interactions.
Affiliation(s)
- Etienne Abassi
- Institut des Sciences Cognitives-Marc Jeannerod, UMR5229, Centre National de la Recherche Scientifique (CNRS) and Université Claude Bernard Lyon 1, 67 Bd. Pinel, 69675 Bron France.
- Liuba Papeo
- Institut des Sciences Cognitives-Marc Jeannerod, UMR5229, Centre National de la Recherche Scientifique (CNRS) and Université Claude Bernard Lyon 1, 67 Bd. Pinel, 69675 Bron France
12.
Baror S, Bar M, Aminoff E. How associative thinking influences scene perception. Conscious Cogn 2022; 103:103377. PMID: 35841841. DOI: 10.1016/j.concog.2022.103377.
Abstract
Perception of our external environment is not isolated from the influence of our internal thoughts, and past evidence points to a possible common associative mechanism underlying both the perception of scenes and internal thought. Here, we investigated the nature of the interaction between an associative mindset and scene perception, hypothesizing a functional advantage of an associative thought pattern in the perception of scenes. Experiments 1 and 2 showed that associative thinking facilitates scene perception, a facilitation that developed over the course of the experiments. In contrast, Experiment 3 showed that associative thinking hinders the perception of mundane objects, for which associative information is minimal. Nevertheless, object perception was facilitated when associative thinking was reduced. This double dissociation suggests that an associative mind is more receptive to externally perceived associative information, and that a match between the orientation of internal and external processing may be key for perception.
14
Hutson JP, Chandran P, Magliano JP, Smith TJ, Loschky LC. Narrative Comprehension Guides Eye Movements in the Absence of Motion. Cogn Sci 2022; 46:e13131. [PMID: 35579883; DOI: 10.1111/cogs.13131]
Abstract
Viewers' attentional selection while looking at scenes is affected by both top-down and bottom-up factors. However, when watching film, viewers typically attend to the movie similarly irrespective of top-down factors, a phenomenon we call the tyranny of film. A key difference between still pictures and film is that film contains motion, which is a strong attractor of attention and highly predictive of gaze during film viewing. The goal of the present study was to test whether the tyranny of film is driven by motion. To do this, we created a slideshow presentation of the opening scene of Touch of Evil. Participants in the context condition watched the full slideshow; participants in the no-context condition did not see the opening portion of the scene, which showed someone placing a time bomb into the trunk of a car. In prior research, we showed that despite producing very different understandings of the clip, this manipulation did not affect viewers' attention (i.e., the tyranny of film): both context and no-context participants were equally likely to fixate on the car with the bomb when the scene was presented as a film. The current study found that when the scene was shown as a slideshow, the context manipulation produced differences in attentional selection (i.e., it attenuated attentional synchrony). We discuss these results in the context of the Scene Perception and Event Comprehension Theory, which specifies the relationship between event comprehension and attentional selection in the context of visual narratives.
Affiliation(s)
- John P Hutson
- Department of Learning Sciences, Georgia State University
- Tim J Smith
- Department of Psychological Sciences, Birkbeck, University of London
15
Hayes TR, Henderson JM. Meaning maps detect the removal of local semantic scene content but deep saliency models do not. Atten Percept Psychophys 2022; 84:647-654. [PMID: 35138579; DOI: 10.3758/s13414-021-02395-x]
Abstract
Meaning mapping uses human raters to estimate different semantic features in scenes and has been a useful tool for demonstrating the important role semantics play in guiding attention. However, recent work has argued that meaning maps do not capture semantic content but, like deep learning models of scene attention, represent only semantically neutral image features. In the present study, we directly tested this hypothesis using a diffeomorphic image transformation designed to remove the meaning of an image region while preserving its image features. Specifically, we tested whether meaning maps and three state-of-the-art deep learning models were sensitive to the loss of semantic content in this critical diffeomorphed scene region. The results were clear: meaning maps generated by human raters showed a large decrease in the diffeomorphed scene regions, while all three deep saliency models showed a moderate increase in those regions. These results demonstrate that meaning maps reflect local semantic content in scenes while deep saliency models do something else. We conclude that the meaning-mapping approach is an effective tool for estimating semantic content in scenes.
16
Son G, Walther DB, Mack ML. Scene wheels: Measuring perception and memory of real-world scenes with a continuous stimulus space. Behav Res Methods 2022; 54:444-456. [PMID: 34244986; DOI: 10.3758/s13428-021-01630-5]
Abstract
Precisely characterizing mental representations of visual experiences requires careful control of experimental stimuli. Recent work leveraging such stimulus control has led to important insights; however, these findings are constrained to simple visual properties like color and line orientation. There remains a critical methodological barrier to characterizing perceptual and mnemonic representations of realistic visual experiences. Here, we introduce a novel method to systematically control visual properties of natural scene stimuli. Using generative adversarial networks (GANs), a state-of-the-art deep learning technique for creating highly realistic synthetic images, we generated scene wheels in which continuously changing visual properties smoothly transition between meaningful realistic scenes. To validate the efficacy of scene wheels, we conducted two behavioral experiments that assess perceptual and mnemonic representations attained from the scene wheels. In the perceptual validation experiment, we tested whether the continuous transition of scene images along the wheel is reflected in human perceptual similarity judgment. The perceived similarity of the scene images correspondingly decreased as distances between the images increased on the wheel. In the memory experiment, participants reconstructed to-be-remembered scenes from the scene wheels. Reconstruction errors for these scenes resemble error distributions observed in prior studies using simple stimulus properties. Importantly, perceptual similarity judgment and memory precision varied systematically with scene wheel radius. These findings suggest that our approach offers a window into the mental representations of naturalistic visual experiences.
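The scene-wheel design implies a circular stimulus space, so distances and reconstruction errors wrap around the wheel. A minimal sketch of that geometry (the function names and 360-step wheel size are illustrative assumptions, not the authors' code):

```python
def wheel_distance(a_deg, b_deg, wheel_size=360):
    """Shortest distance between two positions on a circular scene wheel."""
    d = abs(a_deg - b_deg) % wheel_size
    return min(d, wheel_size - d)

def reconstruction_error(target_deg, response_deg, wheel_size=360):
    """Signed memory error, wrapped into [-wheel_size/2, wheel_size/2)."""
    return (response_deg - target_deg + wheel_size / 2) % wheel_size - wheel_size / 2

# A response at 350 deg for a target at 10 deg is only 20 deg away on the wheel,
# and its signed error is -20 (counter-clockwise).
assert wheel_distance(350, 10) == 20
assert reconstruction_error(10, 350) == -20
```

Wheel distance computed this way can then be related to perceived similarity, and the spread of signed errors characterizes memory precision, in the spirit of the validation experiments described above.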
Affiliation(s)
- Gaeun Son
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada.
- Dirk B Walther
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada
- Michael L Mack
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada
17
Abstract
During scene viewing, semantic information in the scene has been shown to play a dominant role in guiding fixations compared to visual salience (e.g., Henderson & Hayes, 2017). However, scene viewing is sometimes disrupted by cognitive processes unrelated to the scene. For example, viewers sometimes engage in mind-wandering, that is, having thoughts unrelated to the current task. How do meaning and visual salience account for fixation allocation when the viewer is mind-wandering, and does it differ from when the viewer is on-task? We asked participants to study a series of real-world scenes in preparation for a later memory test. Thought probes occasionally occurred after a subset of scenes to assess whether participants were on-task or mind-wandering. We used salience maps (Graph-Based Visual Saliency; Harel, Koch, & Perona, 2007) and meaning maps (Henderson & Hayes, 2017) to represent the distribution of visual salience and semantic richness in the scene, respectively. Because visual salience and meaning were represented similarly, we could directly compare how well they predicted fixation allocation. Our results indicate that fixations prioritized meaningful over visually salient regions in the scene during mind-wandering just as during attentive viewing. These results held across the entire viewing time. A re-analysis of an independent study (Krasich, Huffman, Faber, & Brockmole, Journal of Vision, 20(9), 10, 2020) showed similar results. Therefore, viewers appear to prioritize meaningful regions over visually salient regions in real-world scenes even during mind-wandering.
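Because salience and meaning are represented over the same scene regions, their predictive power can be compared directly, for instance by correlating each map with the fixation distribution. A toy illustration with made-up numbers (not the study's data or analysis code):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Flattened toy maps over six scene regions (hypothetical values)
fixations = [9, 7, 1, 0, 3, 1]   # fixation counts per region
meaning   = [8, 6, 2, 1, 3, 1]   # rated semantic richness per region
salience  = [2, 1, 9, 8, 1, 2]   # model salience per region

r_meaning = pearson(fixations, meaning)    # near +1 with these toy values
r_salience = pearson(fixations, salience)  # negative with these toy values
```

Comparing `r_meaning` and `r_salience` for on-task versus mind-wandering trials is the kind of map-versus-fixation comparison the abstract describes, though the actual study used full spatial maps rather than a handful of regions.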
18
Henderson JM, Hayes TR, Peacock CE, Rehrig G. Meaning maps capture the density of local semantic features in scenes: A reply to Pedziwiatr, Kümmerer, Wallis, Bethge & Teufel (2021). Cognition 2021; 214:104742. [PMID: 33892912; DOI: 10.1016/j.cognition.2021.104742]
Abstract
Pedziwiatr, Kümmerer, Wallis, Bethge, & Teufel (2021) contend that Meaning Maps do not represent the spatial distribution of semantic features in scenes. We argue that Pedziwiatr et al. provide neither logical nor empirical support for that claim, and we conclude that Meaning Maps do what they were designed to do: represent the spatial distribution of meaning in scenes.
Affiliation(s)
- John M Henderson
- Center for Mind and Brain, University of California, Davis, USA; Department of Psychology, University of California, Davis, USA.
- Taylor R Hayes
- Center for Mind and Brain, University of California, Davis, USA
- Candace E Peacock
- Center for Mind and Brain, University of California, Davis, USA; Department of Psychology, University of California, Davis, USA
19
Pedziwiatr MA, Kümmerer M, Wallis TSA, Bethge M, Teufel C. There is no evidence that meaning maps capture semantic information relevant to gaze guidance: Reply to Henderson, Hayes, Peacock, and Rehrig (2021). Cognition 2021; 214:104741. [PMID: 33941376; DOI: 10.1016/j.cognition.2021.104741]
Abstract
The concerns raised by Henderson, Hayes, Peacock, and Rehrig (2021) are based on misconceptions of our work. We show that Meaning Maps (MMs) do not predict gaze guidance better than a state-of-the-art saliency model that is based on semantically neutral, high-level features. We argue that there is therefore no evidence to date that MMs index anything beyond these features. Furthermore, we show that although alterations in meaning cause changes in gaze guidance, MMs fail to capture these alterations. We agree that semantic information is important in the guidance of eye movements, but the contribution of MMs to understanding its role remains elusive.
Affiliation(s)
- Marek A Pedziwiatr
- Cardiff University, Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff, United Kingdom; Queen Mary University of London, Department of Biological and Experimental Psychology, London, United Kingdom.
- Thomas S A Wallis
- Technical University Darmstadt, Institute for Psychology and Centre for Cognitive Science, Darmstadt, Germany
- Christoph Teufel
- Cardiff University, Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff, United Kingdom
20
Abstract
The eye movement analysis with hidden Markov models (EMHMM) method provides quantitative measures of individual differences in eye-movement patterns. However, it is limited to tasks where stimuli have the same feature layout (e.g., faces). Here we propose combining EMHMM with the data-mining technique of co-clustering to discover participant groups with consistent eye-movement patterns across stimuli in tasks involving stimuli with different feature layouts. Applying this method to eye movements in scene perception, we discovered explorative (switching between foreground and background information, or between different regions of interest) and focused (mainly looking at the foreground, with less switching) eye-movement patterns among Asian participants. Higher similarity to the explorative pattern predicted better foreground-object recognition performance, whereas higher similarity to the focused pattern was associated with better feature integration in the flanker task. These results have important implications for using eye tracking as a window into individual differences in cognitive abilities and styles. Thus, EMHMM with co-clustering provides quantitative assessments of eye-movement patterns across stimuli and tasks. It can be applied to many other real-life visual tasks, making a significant impact on the use of eye tracking to study cognitive behavior across disciplines.
21
Suzuki S, Kamps FS, Dilks DD, Treadway MT. Two scene navigation systems dissociated by deliberate versus automatic processing. Cortex 2021; 140:199-209. [PMID: 33992908; DOI: 10.1016/j.cortex.2021.03.027]
Abstract
Successfully navigating the world requires avoiding boundaries and obstacles in one's immediately visible environment, as well as finding one's way to distant places in the broader environment. Recent neuroimaging studies suggest that these two navigational processes involve distinct cortical scene processing systems, with the occipital place area (OPA) supporting navigation through the local visual environment, and the retrosplenial complex (RSC) supporting navigation through the broader spatial environment. Here we hypothesized that these systems are distinguished not only by the scene information they represent (i.e., the local visual versus broader spatial environment), but also by the automaticity of the process they involve, with navigation through the broader environment (including RSC) operating deliberately, and navigation through the local visual environment (including OPA) operating automatically. We tested this hypothesis using fMRI and a maze-navigation paradigm, where participants navigated two maze structures (complex or simple, testing representation of the broader spatial environment) under two conditions (active or passive, testing deliberate versus automatic processing). Consistent with the hypothesis that RSC supports deliberate navigation through the broader environment, RSC responded significantly more to complex than simple mazes during active, but not passive, navigation. By contrast, consistent with the hypothesis that OPA supports automatic navigation through the local visual environment, OPA responded strongly even during passive navigation and did not differentiate between active and passive conditions. Taken together, these findings suggest the novel hypothesis that navigation through the broader spatial environment is deliberate, whereas navigation through the local visual environment is automatic, shedding new light on the dissociable functions of these systems.
Affiliation(s)
- Shosuke Suzuki
- Department of Psychology, Emory University, Atlanta, GA, United States
- Frederik S Kamps
- Department of Psychology, Emory University, Atlanta, GA, United States; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Daniel D Dilks
- Department of Psychology, Emory University, Atlanta, GA, United States
- Michael T Treadway
- Department of Psychology, Emory University, Atlanta, GA, United States; Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, GA, United States.
22
Melcher D, Huber-Huber C, Wutz A. Enumerating the forest before the trees: The time courses of estimation-based and individuation-based numerical processing. Atten Percept Psychophys 2021; 83:1215-1229. [PMID: 33000437; PMCID: PMC8049909; DOI: 10.3758/s13414-020-02137-5]
Abstract
Ensemble perception refers to the ability to report attributes of a group of objects, rather than focusing on only one or a few individuals. An everyday example of ensemble perception is the ability to estimate the numerosity of a large number of items. The time course of ensemble processing, including that of numerical estimation, remains a matter of debate, with some studies arguing for rapid, "preattentive" processing and other studies suggesting that ensemble perception improves with longer presentation durations. We used a forward-simultaneous masking procedure that effectively controls stimulus durations to directly measure the temporal dynamics of ensemble estimation and compared it with more precise enumeration of individual objects. Our main finding was that object individuation within the subitizing range (one to four items) took about 100-150 ms to reach its typical capacity limits, whereas estimation (six or more items) showed a temporal resolution of 50 ms or less. Estimation accuracy did not improve over time. Instead, there was an increasing tendency, with longer effective durations, to underestimate the number of targets for larger set sizes (11-35 items). Overall, the time course of enumeration for one or a few single items was dramatically different from that of estimating numerosity of six or more items. These results are consistent with the idea that the temporal resolution of ensemble processing may be as rapid as, or even faster than, individuation of individual items, and support a basic distinction between the mechanisms underlying exact enumeration of small sets (one to four items) and those underlying estimation.
Affiliation(s)
- David Melcher
- Center for Mind/Brain Sciences and Department of Psychology and Cognitive Sciences, University of Trento, Corso Bettini 31, 38068, Rovereto, Italy.
- Psychology Program, Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE.
- Christoph Huber-Huber
- Center for Mind/Brain Sciences and Department of Psychology and Cognitive Sciences, University of Trento, Corso Bettini 31, 38068, Rovereto, Italy
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Andreas Wutz
- Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- Picower Institute for Learning and Memory, MIT, Cambridge, MA, USA
23
Ran T, Yuan L, Zhang JB. Scene perception based visual navigation of mobile robot in indoor environment. ISA Trans 2021; 109:389-400. [PMID: 33069374; PMCID: PMC7550175; DOI: 10.1016/j.isatra.2020.10.023]
Abstract
Vision-only navigation is key to reducing the cost and broadening the application of indoor mobile robots. Considering the unpredictable nature of artificial environments, deep learning techniques can be used to perform navigation, given their strong ability to abstract image features. In this paper, we propose a low-cost, vision-only approach to indoor mobile robot navigation that converts the visual navigation problem into scene classification. Existing work based on deep scene-classification networks achieves lower accuracy and incurs a greater computational burden; additionally, the navigation system has not been fully assessed in previous work. We therefore designed a shallow convolutional neural network (CNN) with higher scene-classification accuracy and efficiency to process images captured by a monocular camera. We also propose an adaptive weighted control (AWC) algorithm, combined with regular control (RC), to improve the robot's motion performance. We demonstrate the capability and robustness of the proposed navigation method through extensive experiments in both static and dynamic unknown environments. The qualitative and quantitative results show that the system performs better than previous related work in unknown environments.
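As a rough illustration of how scene-classification output might drive motion commands, the sketch below blends per-class steering angles by classifier confidence. The class names, angles, and blending rule are invented for illustration; this is not the paper's AWC/RC algorithm:

```python
# Hypothetical steering angle (degrees) associated with each scene class
CLASS_ACTIONS = {"corridor": 0.0, "left_turn": -30.0, "right_turn": 30.0}

def weighted_steering(class_probs):
    """Blend per-class steering commands, weighted by classification confidence.

    class_probs: dict mapping scene class -> probability (e.g., CNN softmax output).
    """
    total = sum(class_probs.values())
    return sum(CLASS_ACTIONS[c] * p for c, p in class_probs.items()) / total

# Classifier fairly confident the robot faces a left turn:
cmd = weighted_steering({"corridor": 0.2, "left_turn": 0.7, "right_turn": 0.1})
# (0.0*0.2 + -30.0*0.7 + 30.0*0.1) / 1.0 = -18.0 degrees
```

A confidence-weighted blend like this degrades gracefully when the classifier is uncertain, which is one plausible motivation for weighting control commands adaptively rather than switching discretely between them.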
Affiliation(s)
- T Ran
- School of Mechanical Engineering, Xinjiang University, Urumqi, China.
- L Yuan
- School of Mechanical Engineering, Xinjiang University, Urumqi, China; Beijing Advanced Innovation Center for Soft Matter Science and Engineering, Beijing University of Chemical Technology, Beijing, China.
- J B Zhang
- School of Mechanical Engineering, Xinjiang University, Urumqi, China.
24
Abstract
Despite over two decades of research on the neural mechanisms underlying human visual scene, or place, processing, it remains unknown what exactly a “scene” is. Intuitively, we are always inside a scene, whereas we interact with objects from the outside. Hence, we hypothesize that one diagnostic feature of a scene may be concavity, portraying “inside”, and predict that if concavity is a scene-diagnostic feature, then: 1) images that depict concavity, even non-scene images (e.g., the “inside” of an object, i.e., a concave object), will be behaviorally categorized as scenes more often than those that depict convexity, and 2) the cortical scene-processing system will respond more to concave images than to convex images. As predicted, participants categorized concave objects as scenes more often than convex objects, and, using functional magnetic resonance imaging (fMRI), two scene-selective cortical regions (the parahippocampal place area, PPA, and the occipital place area, OPA) responded significantly more to concave than to convex objects. Surprisingly, we found no behavioral or neural differences between images of concave versus convex buildings. However, in a follow-up experiment using tightly controlled images, we unmasked a selective sensitivity to concavity over convexity of scene boundaries (i.e., walls) in PPA and OPA. Furthermore, we found that even highly impoverished line drawings of concave shapes are behaviorally categorized as scenes more often than convex shapes. Together, these results provide converging behavioral and neural evidence that concavity is a diagnostic feature of visual scenes.
Affiliation(s)
- Annie Cheng
- Department of Psychology, Emory University, Atlanta, GA 30322, USA
- Dirk B Walther
- Department of Psychology, University of Toronto, Toronto, ON, Canada
- Soojin Park
- Department of Psychology, Yonsei University, Seoul, Republic of Korea.
- Daniel D Dilks
- Department of Psychology, Emory University, Atlanta, GA 30322, USA.
25
McCormick C, Dalton MA, Zeidman P, Maguire EA. Characterising the hippocampal response to perception, construction and complexity. Cortex 2021; 137:1-17. [PMID: 33571913; PMCID: PMC8048772; DOI: 10.1016/j.cortex.2020.12.018]
Abstract
The precise role played by the hippocampus in supporting cognitive functions such as episodic memory and future thinking is debated, but there is general agreement that it involves constructing representations comprised of numerous elements. Visual scenes have been deployed extensively in cognitive neuroscience because they are paradigmatic multi-element stimuli. However, questions remain about the specificity and nature of the hippocampal response to scenes. Here, we devised a paradigm in which we had participants search pairs of images for either colour or layout differences, thought to be associated with perceptual or spatial constructive processes respectively. Importantly, images depicted either naturalistic scenes or phase-scrambled versions of the same scenes, and were either simple or complex. Using this paradigm during functional MRI scanning, we addressed three questions: 1. Is the hippocampus recruited specifically during scene processing? 2. If the hippocampus is more active in response to scenes, does searching for colour or layout differences influence its activation? 3. Does the complexity of the scenes affect its response? We found that, compared to phase-scrambled versions of the scenes, the hippocampus was more responsive to scene stimuli. Moreover, a clear anatomical distinction was evident, with colour detection in scenes engaging the posterior hippocampus whereas layout detection in scenes recruited the anterior hippocampus. The complexity of the scenes did not influence hippocampal activity. These findings seem to align with perspectives that propose the hippocampus is especially attuned to scenes, and its involvement occurs irrespective of the cognitive process or the complexity of the scenes.
Affiliation(s)
- Cornelia McCormick
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, University College London, London, WC1N 3AR, UK
- Marshall A Dalton
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, University College London, London, WC1N 3AR, UK
- Peter Zeidman
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, University College London, London, WC1N 3AR, UK
- Eleanor A Maguire
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, University College London, London, WC1N 3AR, UK.
26
Abstract
The study of visual memory is typically concerned with an image's content: How well, and with what precision, we can recall which objects, people, or features we have seen in the past. But images also vary in their quality: The same object or scene may appear in an image that is sharp and highly resolved, or it may appear in an image that is blurry and faded. How do we remember those properties? Here six experiments demonstrate a new phenomenon of "vividness extension": a tendency to (mis)remember images as though they are "enhanced" versions of themselves - that is, sharper and higher quality than they actually appeared at the time of encoding. Subjects briefly saw images of scenes that varied in how blurry they were, and then adjusted a new image to be as blurry as the original. Unlike an old photograph that fades and blurs, subjects misremembered scenes as more vivid (i.e., less blurry) than those scenes had actually appeared moments earlier. Follow-up experiments extended this phenomenon to saturation and pixelation - with subjects recalling scenes as more colorful and resolved - and ruled out various forms of response bias. We suggest that memory misrepresents the quality of what we have seen, such that the world is remembered as more vivid than it is.
27
Abstract
We live in a rich, three-dimensional world with complex arrangements of meaningful objects. For decades, however, theories of visual attention and perception have been based on findings generated from lines and color patches. While these theories have been indispensable for our field, the time has come to move on from this rather impoverished view of the world and (at least try to) get closer to the real thing. After all, our visual environment consists of objects that we not only look at, but constantly interact with. Incorporating the meaning and structure of scenes, i.e., their "grammar", then allows us to easily understand objects and scenes we have never encountered before. Studying this grammar provides us with the fascinating opportunity to gain new insights into the complex workings of attention, perception, and cognition. In this review, I will discuss how the meaning and the complex, yet predictive, structure of real-world scenes influence attention allocation, search, and object identification.
Affiliation(s)
- Melissa Le-Hoa Võ
- Department of Psychology, Johann Wolfgang-Goethe-Universität, Frankfurt, Germany. https://www.scenegrammarlab.com/
28
Abstract
How many pleasures can you track? In a previous study, we showed that people can simultaneously track the pleasure they experience from two images. Here, we push further, probing the individual and combined pleasures felt from seeing four images in one glimpse. Participants (N = 25) viewed 36 images spanning the entire range of pleasure. Each trial presented an array of four images, one in each quadrant of the screen, for 200 ms. On 80% of the trials, a central line cue pointed, randomly, at some screen corner either before (precue) or after (postcue) the images were shown. The cue indicated which image (the target) to rate while ignoring the others (distractors). On the other 20% of trials, an X cue requested a rating of the combined pleasure of all four images. Later, for baseline reference, we obtained a single-pleasure rating for each image shown alone. When precued, participants faithfully reported the pleasure of the target. When postcued, however, the mean ratings of images that are intensely pleasurable when seen alone (pleasure >4.5 on a 1-9 scale) dropped below baseline. Regardless of cue timing, the rating of the combined pleasure of four images was a linear transform of the average baseline pleasures of all four images. Thus, while people can faithfully track two pleasures, they cannot track four. Instead, the pleasure of otherwise above-medium-pleasure images is diminished, mimicking the effect of a distracting task.
29
Abstract
Ensemble perception refers to awareness of average properties, e.g., size, of “noisy” elements that often comprise visual arrays in natural scenes. Here, we asked how ensemble perception might be influenced when some, but not all, array elements are associated with monetary reward. Previous studies show that reward associations can speed object processing, facilitate selection, and enhance working-memory maintenance, suggesting they may bias ensemble judgments. To investigate, participants reported the average element size of brief arrays of different-sized circles. In the learning phase, all circles had the same color, but different colors produced high or low performance-contingent rewards. Then, in an unrewarded test phase, arrays comprised three spatially intermixed subsets, each with a different color, including the high-reward color. In different trials, the mean size of the subset with the high-reward color was smaller than, larger than, or the same as the ensemble mean. Ensemble size estimates were significantly biased by the high-reward-associated subset, showing that value associations modulate ensemble perception. In the test phase of a second experiment, a pattern mask appeared immediately after array presentation to limit top-down processing. Not only was the value bias eliminated, but ensemble accuracy also improved, suggesting that value associations distort the consciously available ensemble representation via late, high-level processing.
30
Sulpizio V, Galati G, Fattori P, Galletti C, Pitzalis S. A common neural substrate for processing scenes and egomotion-compatible visual motion. Brain Struct Funct 2020; 225:2091-2110. [PMID: 32647918; PMCID: PMC7473967; DOI: 10.1007/s00429-020-02112-8]
Abstract
Neuroimaging studies have revealed two separate classes of category-selective regions specialized in optic flow (egomotion-compatible) processing and in scene/place perception. Despite the importance of both optic flow and scene/place recognition for estimating changes in position and orientation within the environment during self-motion, a possible functional link between egomotion- and scene-selective regions has not yet been established. Here we reanalyzed functional magnetic resonance images from a large sample of participants performing two well-known "localizer" fMRI experiments, consisting of passive viewing of navigationally relevant stimuli such as buildings and places (scene/place stimulus) and coherently moving fields of dots simulating the visual stimulation during self-motion (flow fields). After interrogating the egomotion-selective areas with respect to the scene/place stimulus and the scene-selective areas with respect to flow fields, we found that the egomotion-selective areas V6+ and pIPS/V3A responded bilaterally more to scenes/places than to faces, and that all the scene-selective areas (parahippocampal place area or PPA, retrosplenial complex or RSC, and occipital place area or OPA) responded more to egomotion-compatible optic flow than to random motion. A conjunction analysis between the scene/place and flow field stimuli revealed that the most prominent focus of common activation was in the dorsolateral parieto-occipital cortex, spanning the scene-selective OPA and the egomotion-selective pIPS/V3A. Individual inspection of the relative locations of these two regions revealed a partial overlap and a similar response profile to an independent low-level visual motion stimulus, suggesting that OPA and pIPS/V3A may be part of a single motion-selective complex specialized in encoding both egomotion- and scene-relevant information, likely for the control of navigation in a structured environment.
Affiliation(s)
- Valentina Sulpizio
- Department of Biomedical and Neuromotor Sciences-DIBINEM, University of Bologna, Piazza di Porta San Donato 2, 40126, Bologna, Italy; Department of Cognitive and Motor Rehabilitation and Neuroimaging, Santa Lucia Foundation (IRCCS Fondazione Santa Lucia), Rome, Italy
- Gaspare Galati
- Department of Cognitive and Motor Rehabilitation and Neuroimaging, Santa Lucia Foundation (IRCCS Fondazione Santa Lucia), Rome, Italy; Brain Imaging Laboratory, Department of Psychology, Sapienza University, Rome, Italy
- Patrizia Fattori
- Department of Biomedical and Neuromotor Sciences-DIBINEM, University of Bologna, Piazza di Porta San Donato 2, 40126, Bologna, Italy
- Claudio Galletti
- Department of Biomedical and Neuromotor Sciences-DIBINEM, University of Bologna, Piazza di Porta San Donato 2, 40126, Bologna, Italy
- Sabrina Pitzalis
- Department of Cognitive and Motor Rehabilitation and Neuroimaging, Santa Lucia Foundation (IRCCS Fondazione Santa Lucia), Rome, Italy; Department of Movement, Human and Health Sciences, University of Rome "Foro Italico", Rome, Italy
31
Rosenholtz R. Demystifying visual awareness: Peripheral encoding plus limited decision complexity resolve the paradox of rich visual experience and curious perceptual failures. Atten Percept Psychophys 2020; 82:901-925. [PMID: 31970709] [PMCID: PMC7303063] [DOI: 10.3758/s13414-019-01968-1]
Abstract
Human beings subjectively experience a rich visual percept. However, when behavioral experiments probe the details of that percept, observers perform poorly, suggesting that vision is impoverished. What can explain this awareness puzzle? Is the rich percept a mere illusion? How does vision work as well as it does? This paper argues for two important pieces of the solution. First, peripheral vision encodes its inputs using a scheme that preserves a great deal of useful information, while losing the information necessary to perform certain tasks. The tasks rendered difficult by the peripheral encoding include many of those used to probe the details of visual experience. Second, many tasks used to probe attentional and working memory limits are, arguably, inherently difficult, and poor performance on these tasks may indicate limits on decision complexity. Two assumptions are critical to making sense of this hypothesis: (1) All visual perception, conscious or not, results from performing some visual task; and (2) all visual tasks face the same limit on decision complexity. Together, peripheral encoding plus decision complexity can explain a wide variety of phenomena, including vision's marvelous successes, its quirky failures, and our rich subjective impression of the visual world.
Affiliation(s)
- Ruth Rosenholtz
- MIT Department of Brain & Cognitive Sciences, CSAIL, Cambridge, MA, 02139, USA.
32
Abstract
How do we determine where to focus our attention in real-world scenes? Image saliency theory proposes that our attention is 'pulled' to scene regions that differ in low-level image features. However, models that formalize image saliency theory often contain significant scene-independent spatial biases. In the present studies, three different viewing tasks were used to evaluate whether image saliency models account for variance in scene fixation density based primarily on scene-dependent, low-level feature contrast, or on their scene-independent spatial biases. For comparison, fixation density was also compared to semantic feature maps (Meaning Maps; Henderson & Hayes, Nature Human Behaviour, 1, 743-747, 2017) that were generated using human ratings of isolated scene patches. The squared correlations (R2) between scene fixation density and each image saliency model's center bias, each full image saliency model, and meaning maps were computed. The results showed that in tasks that produced observer center bias, the image saliency models on average explained 23% less variance in scene fixation density than their center biases alone. In comparison, meaning maps explained on average 10% more variance than center bias alone. We conclude that image saliency theory generalizes poorly to real-world scenes.
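The variance-partitioning logic in this abstract, comparing the R² of a full saliency model against the R² of its center bias alone, can be sketched with synthetic maps (every array below is a hypothetical stand-in, not the study's stimuli, models, or data):

```python
import numpy as np

# Synthetic comparison of a "fixation density" map against a scene-independent
# center-bias map and a "full saliency" map built on the same center bias.
rng = np.random.default_rng(1)
h, w = 32, 32
yy, xx = np.mgrid[0:h, 0:w]

# Gaussian center bias peaking at the image center.
center_bias = np.exp(-(((yy - h / 2) ** 2 + (xx - w / 2) ** 2) / (2 * 8.0 ** 2)))

# Fake fixation density: mostly center bias plus scene-specific structure.
fixation_density = 0.8 * center_bias + 0.2 * rng.random((h, w))

# Fake full saliency model: same center bias plus unrelated feature contrast.
saliency = 0.8 * center_bias + 0.2 * rng.random((h, w))

def r_squared(a, b):
    """Squared Pearson correlation between two flattened maps."""
    r = np.corrcoef(a.ravel(), b.ravel())[0, 1]
    return r ** 2

r2_center = r_squared(fixation_density, center_bias)
r2_full = r_squared(fixation_density, saliency)
print(f"R^2 center bias alone: {r2_center:.3f}, full model: {r2_full:.3f}")
```

Whether the full model beats its own center bias depends on whether its feature-contrast term tracks the scene-specific structure in the fixations; the abstract's finding is that, on average, the tested saliency models explained less variance than their center biases alone.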
Affiliation(s)
- Taylor R Hayes
- Center for Mind and Brain, University of California, Davis, CA, USA
- John M Henderson
- Center for Mind and Brain, University of California, Davis, CA, USA
- Department of Psychology, University of California, Davis, CA, USA
33
Abstract
How are outliers in an otherwise homogeneous object ensemble represented by our visual system? Are outliers ignored because they are the minority? Or do outliers alter our perception of an otherwise homogeneous ensemble? We have previously demonstrated ensemble representation in human anterior-medial ventral visual cortex (overlapping the scene-selective parahippocampal place area; PPA). In this study we investigated how outliers impact object-ensemble representation in this human brain region as well as visual representation throughout posterior brain regions. We presented a homogeneous ensemble followed by an ensemble containing either identical elements or a majority of identical elements with a few outliers. Human participants ignored the outliers and made a same/different judgment between the two ensembles. In PPA, fMRI adaptation was observed when the outliers in the second ensemble matched the items in the first, even though the majority of the elements in the second ensemble were distinct from those in the first; conversely, release from fMRI adaptation was observed when the outliers in the second ensemble were distinct from the items in the first, even though the majority of the elements in the second ensemble were identical to those in the first. A similarly robust outlier effect was also found in other brain regions, including a shape-processing region in lateral occipital cortex (LO) and task-processing fronto-parietal regions. These brain regions likely work in concert to flag the presence of outliers during visual perception and then weigh the outliers appropriately in subsequent behavioral decisions. To our knowledge, this is the first time the neural mechanisms involved in outlier processing have been systematically documented in the human brain. Such an outlier effect could well provide the neural basis mediating our perceptual experience in situations like "one bad apple spoils the whole bushel".
Affiliation(s)
- Jonathan S Cant
- Department of Psychology, University of Toronto Scarborough, Toronto, ON, M1C 1A4, Canada
- Yaoda Xu
- Department of Psychology, Yale University, New Haven, CT, 06477, USA
34
Trouilloud A, Kauffmann L, Roux-Sibilon A, Rossel P, Boucart M, Mermillod M, Peyrin C. Rapid scene categorization: From coarse peripheral vision to fine central vision. Vision Res 2020; 170:60-72. [PMID: 32259648] [DOI: 10.1016/j.visres.2020.02.008]
Abstract
Studies on scene perception have shown that the rapid extraction of low spatial frequencies (LSF) allows a coarse parsing of the scene, prior to the analysis of high spatial frequencies (HSF) containing details. Many studies suggest that scene gist recognition can be achieved with only the low resolution of peripheral vision. Our study investigated the advantage of peripheral vision over central vision during a scene categorization task (indoor vs. outdoor). In Experiment 1, we used large scene photographs from which we built one central disk and four circular rings of different eccentricities. The central disk either did or did not contain an object semantically related to the scene category. Results showed better categorization performance for the peripheral rings than for the central disk, even though a semantically related object in central vision, when present, significantly improved categorization performance. In Experiment 2, the central disk and rings were assembled from central to peripheral vision (CtP sequence) or from peripheral to central vision (PtC sequence). Results revealed better performance for PtC than for CtP sequences, except when no central object was present under rapid categorization constraints. As Experiment 3 suggested that the PtC advantage was not explained by the surrounding peripheral rings reducing the visibility of the object in the central disk (CtP sequence), the results are interpreted in the context of predominant coarse-to-fine processing during scene categorization, with greater efficiency and utility of coarse peripheral vision relative to fine central vision during rapid scene categorization.
Affiliation(s)
- Audrey Trouilloud
- Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France
- Louise Kauffmann
- Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France; Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France
- Alexia Roux-Sibilon
- Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France
- Pauline Rossel
- Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France
- Muriel Boucart
- SCALab, University of Lille, Centre National de la Recherche Scientifique, Lille, France
- Martial Mermillod
- Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France
- Carole Peyrin
- Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France
35
van Renswoude DR, Raijmakers MEJ, Visser I. Looking (for) patterns: Similarities and differences between infant and adult free scene-viewing patterns. J Eye Mov Res 2020; 13:10.16910/jemr.13.1.2. [PMID: 33828784] [PMCID: PMC7881888] [DOI: 10.16910/jemr.13.1.2]
Abstract
Systematic tendencies such as the center and horizontal bias are known to have a large influence on how and where we move our eyes during static onscreen free scene viewing. However, it is unknown whether these tendencies are learned viewing strategies or more default tendencies in the way we move our eyes. To gain insight into the origin of these tendencies, we explored the systematic tendencies of infants (3- to 20-month-olds, N = 157) and adults (N = 88) in three different scene-viewing data sets. We replicated common findings, such as longer fixation durations and shorter saccade amplitudes in infants compared to adults. The leftward bias had never been studied in infants, and our results indicate that it is not present in infants, while we did replicate the leftward bias in adults. The general pattern of results highlights the similarity between infant and adult eye movements. Similar to adults, infants' fixation durations increase with viewing time, and the dependencies between successive fixations and saccades show very similar patterns. A straightforward conclusion to draw from this set of studies is that infant and adult eye movements are mainly driven by similar underlying basic processes.
36
Harel A, Mzozoyana MW, Al Zoubi H, Nador JD, Noesen BT, Lowe MX, Cant JS. Artificially-generated scenes demonstrate the importance of global scene properties for scene perception. Neuropsychologia 2020; 141:107434. [PMID: 32179102] [DOI: 10.1016/j.neuropsychologia.2020.107434]
Abstract
Recent electrophysiological research highlights the significance of global scene properties (GSPs) for scene perception. However, since real-world scenes span a range of low-level stimulus properties and high-level contextual semantics, GSP effects may also reflect additional processing of such non-global factors. We examined this question by asking whether Event-Related Potentials (ERPs) to GSPs would still be observed when specific low- and high-level scene properties are absent from the scene. We presented participants with computer-based, artificially-manipulated scenes varying in two GSPs (spatial expanse and naturalness) that minimized other sources of scene information (color and semantic object detail). We found that the peak amplitude of the P2 component was sensitive to the spatial expanse and naturalness of the artificially-generated scenes: P2 amplitude was higher for closed than for open scenes, and for manmade than for natural scenes. A control experiment showed that the effect of naturalness on the P2 is not driven by local texture information, while earlier effects of naturalness, expressed as a modulation of the P1 and N1 amplitudes, are sensitive to texture information. Our results demonstrate that GSPs are processed robustly around 220 ms and that the P2 can be used as an index of global scene perception.
Affiliation(s)
- Assaf Harel
- Department of Psychology, Wright State University, Dayton, OH, USA
- Mavuso W Mzozoyana
- Department of Neuroscience, Cell Biology and Physiology, Wright State University, Dayton, OH, USA
- Hamada Al Zoubi
- Department of Neuroscience, Cell Biology and Physiology, Wright State University, Dayton, OH, USA
- Jeffrey D Nador
- Department of Psychology, Wright State University, Dayton, OH, USA
- Birken T Noesen
- Department of Psychology, Wright State University, Dayton, OH, USA
- Matthew X Lowe
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Jonathan S Cant
- Department of Psychology, University of Toronto Scarborough, Toronto, ON, Canada
37
Abstract
A growing number of studies suggest that semantic knowledge can influence the control of gaze in scenes. For example, observers are more likely to look toward objects that are semantically related to the currently fixated object. Recent evidence also suggests that an object's functional orientation can bias gaze direction. However, it is unknown whether these semantic and functional relationships can interact to determine gaze control. To address this issue, the present study assessed whether the functional arrangement of multiple objects can influence gaze control. Participants fixated a central object (e.g., a key) flanked by two peripheral objects. After a brief delay, participants were free to shift their gaze toward the peripheral object of their choice. One of the peripheral objects was semantically related to the central object (e.g., a lock), and the objects were arranged to depict a functional or non-functional interaction (e.g., a key pointing toward or away from a lock). When the orientation of the central object was manipulated, participants were more likely to look in the direction this object was pointing. Moreover, the functional arrangement of objects modulated this central orienting bias. However, when the orientation of the peripheral objects was manipulated, only the peripheral objects' semantic relationships influenced gaze control. Together, these findings suggest that functional relationships play an important role in the allocation of gaze, and can interact with semantic relationships to determine gaze control.
38
Owens JW, Chaparro BS, Palmer EM. Exploring website gist through rapid serial visual presentation. Cogn Res Princ Implic 2019; 4:44. [PMID: 31748970] [PMCID: PMC6868081] [DOI: 10.1186/s41235-019-0192-1]
Abstract
Background: Users can make judgments about web pages in a glance. Little research has explored what semantic information can be extracted from a web page within a single fixation, or what mental representations users have of web pages, but the scene perception literature provides a framework for understanding how viewers can extract and represent diverse semantic information from scenes in a glance. The purpose of this research was (1) to explore whether semantic information about a web page can be extracted within a single fixation and (2) to explore the effects of size and resolution on extracting this information. Using a rapid serial visual presentation (RSVP) paradigm, Experiment 1 explored whether certain semantic categories of websites (i.e., news, search, shopping, and social networks/blogs) could be detected within an RSVP stream of web page stimuli. Natural scenes, which the literature shows are detectable within a single fixation, served as a baseline for comparison. Experiment 2 examined the effects of stimulus size and resolution on observers' ability to detect the presence of website categories using similar methods.
Results: Findings from this research demonstrate that users have conceptual models of websites that allow detection of web pages from a fixation's worth of stimulus exposure, when provided additional time for processing. For website categories other than search, detection performance decreased significantly when web elements were no longer discernible due to decreases in size and/or resolution. The implication of this research is that website conceptual models rely more on page elements and less on the spatial relationships between these elements.
Conclusions: Participants could detect websites accurately when the websites were displayed for less than a fixation's duration and the participants were allowed additional processing time. Subjective comments and stimulus onset asynchrony data suggested that participants likely relied on local features to detect website targets for several website categories. This notion was supported when the size and/or resolution of stimuli were decreased to the extent that web elements were indistinguishable. Together, these findings demonstrate that schemas or conceptualizations of websites provide information sufficient to detect websites from approximately 140 ms of stimulus exposure.
Affiliation(s)
- Justin W Owens
- Department of Psychology, Wichita State University, Wichita, KS, USA; Google, Inc., Mountain View, CA, USA
- Barbara S Chaparro
- Department of Psychology, Wichita State University, Wichita, KS, USA; Department of Human Factors and Behavioral Neurobiology, Embry Riddle Aeronautical University, Daytona Beach, FL, USA
- Evan M Palmer
- Department of Psychology, Wichita State University, Wichita, KS, USA; Department of Psychology, San José State University, San Jose, CA, USA
39
Turk-Browne NB. The hippocampus as a visual area organized by space and time: A spatiotemporal similarity hypothesis. Vision Res 2019; 165:123-130. [PMID: 31734633] [DOI: 10.1016/j.visres.2019.10.007]
Abstract
The hippocampus is the canonical memory system in the brain and is not typically considered part of the visual system. Yet, it sits atop the ventral visual stream and has been implicated in certain aspects of vision. Here I review the place of the hippocampal memory system in vision science. After a brief primer on the local circuitry, external connectivity, and computational functions of the hippocampus, I explore what can be learned from each field about the other. I first present four areas of vision science (scene perception, imagery, eye movements, attention) that challenge our current understanding of the hippocampus in terms of its role in episodic memory. In the reverse direction, I leverage this understanding to inform vision science in other ways, presenting a working hypothesis about a unique form of visual representation. This spatiotemporal similarity hypothesis states that the hippocampus represents objects according to whether they co-occur in space and/or time, and not according to whether they look alike, as elsewhere in the visual system. This tuning may reflect hippocampal mechanisms of pattern separation, relational binding, and statistical learning, allowing the hippocampus to generate visual expectations that facilitate search and recognition.
40
Abstract
Comics are complex documents whose reception engages cognitive processes such as scene perception, language processing, and narrative understanding. Possibly because of their complexity, they have rarely been studied in cognitive science. Modeling the stimulus ideally requires a formal description, which can be provided by feature descriptors from computer vision and computational linguistics. With a focus on document analysis, here we review work on the computational modeling of comics. We argue that the development of modern feature descriptors based on deep learning techniques has made sufficient progress to allow the investigation of complex material such as comics for reception studies, including experimentation and computational modeling of cognitive processes.
Affiliation(s)
- Alexander Dunst
- Department of English and American Studies, University of Paderborn
41
Summerfield C, Luyckx F, Sheahan H. Structure learning and the posterior parietal cortex. Prog Neurobiol 2019; 184:101717. [PMID: 31669186] [DOI: 10.1016/j.pneurobio.2019.101717]
Abstract
We propose a theory of structure learning in the primate brain. We argue that the parietal cortex is critical for learning about relations among the objects and categories that populate a visual scene. We suggest that current deep learning models exhibit poor global scene understanding because they fail to perform the relational inferences that occur in the primate dorsal stream. We review studies of neural coding in primate posterior parietal cortex (PPC), drawing the conclusion that neurons in this brain area represent potentially high-dimensional inputs on a low-dimensional manifold that encodes the relative position of objects or features in physical space, and relations among entities in abstract conceptual space. We argue that this low-dimensional code supports generalisation of relational information, even in nonspatial domains. Finally, we propose that structure learning is grounded in the actions that primates take when they reach for objects or fixate them with their eyes. We sketch a model of how this might occur in neural circuits.
Affiliation(s)
- Christopher Summerfield
- Department of Experimental Psychology, University of Oxford, Anna Watts Building, Radcliffe Observatory Quarter, Woodstock Road, Oxford OX2 6GG, UK
- Fabrice Luyckx
- Department of Experimental Psychology, University of Oxford, Anna Watts Building, Radcliffe Observatory Quarter, Woodstock Road, Oxford OX2 6GG, UK
- Hannah Sheahan
- Department of Experimental Psychology, University of Oxford, Anna Watts Building, Radcliffe Observatory Quarter, Woodstock Road, Oxford OX2 6GG, UK
42
Ferrara K, Landau B, Park S. Impaired behavioral and neural representation of scenes in Williams syndrome. Cortex 2019; 121:264-276. [PMID: 31655392] [DOI: 10.1016/j.cortex.2019.09.001]
Abstract
Boundaries are crucial to our representation of the geometric shape of scenes, which can be used to reorient in space. Behavioral research has shown that children and adults share exquisite sensitivity to a defining feature of a boundary: its vertical extent. Imaging studies have shown that this boundary property is represented in the parahippocampal place area (PPA) among typically developed (TD) adults. Here, we show that sensitivity to the vertical extent of scene boundaries is impaired at both the behavioral and neural level in people with Williams syndrome (WS), a genetic deficit that results in severely impaired spatial functions. Behavioral reorientation was tested in three boundary conditions: a flat Mat, a 5 cm high Curb, and full Walls. Adults with WS could reorient in a rectangular space defined by Wall boundaries, but not Curb or Mat boundaries. In contrast, TD age-matched controls could reorient by all three boundary types and TD 4-year-olds could reorient by either Wall or Curb boundaries. Using fMRI, we find that the WS behavioral deficit is echoed in their neural representation of boundaries. While TD age-matched controls showed distinct neural responses to scenes depicting Mat, Curb, and Wall boundaries in the PPA, people with WS showed only a distinction between the Wall and Mat or Curb, but no distinction between the Mat and Curb. Taken together, these results reveal a close coupling between the representation of boundaries as they are used in behavioral reorientation and neural encoding, suggesting that damage to this key element of spatial representation may have a genetic foundation.
Affiliation(s)
- Katrina Ferrara
- Department of Cognitive Science, Johns Hopkins University, USA; Center for Brain Plasticity and Recovery, Georgetown University, USA
- Barbara Landau
- Department of Cognitive Science, Johns Hopkins University, USA
- Soojin Park
- Department of Cognitive Science, Johns Hopkins University, USA; Department of Psychology, Yonsei University, South Korea
43
Zeni S, Laudanna I, Baruffaldi F, Heimler B, Melcher D, Pavani F. Increased overt attention to objects in early deaf adults: An eye-tracking study of complex naturalistic scenes. Cognition 2019; 194:104061. [PMID: 31514103] [DOI: 10.1016/j.cognition.2019.104061]
Abstract
The study of selective attention in people with profound deafness has repeatedly documented enhanced attention to the peripheral regions of the visual field compared to hearing controls. This finding emerged from covert attention studies (i.e., without eye movements) involving extremely simplified visual scenes comprising few visual items. In this study, we aimed to test whether this key finding also extends to overt attention, using a more ecologically valid experimental context in which complex naturalistic images were presented for 3 s. In Experiment 1 (N = 35), all images contained a single central object superimposed on a congruent naturalistic background (e.g., a tiger in the woods). At the end of the visual exploration phase, an incidental memory task probed the participants' recollection of the seen central objects and image backgrounds. Results showed that hearing controls explored and remembered the image backgrounds more than deaf participants, who lingered on the central object to a greater extent. In Experiment 2, we aimed to disentangle whether this behaviour of deaf participants reflected a bias in overt space-based attention towards the centre of the image or, instead, enhanced object-centred attention. We tested new participants (N = 42) in the visual exploration task, adding images with lateralized objects, images with multiple objects, and images without any object. Results confirmed increased exploration of objects in deaf participants. Taken together, our novel findings show limitations of the well-known peripheral attention bias of deaf people and suggest that visual object-centred attention may also change after prolonged auditory deprivation.
Affiliation(s)
- Silvia Zeni
- Center for Mind Brain Sciences, CIMeC, University of Trento, Italy; School of Psychology, University of Nottingham, UK
- Irene Laudanna
- Center for Mind Brain Sciences, CIMeC, University of Trento, Italy; Dep. of Psychology and Cognitive Science, University of Trento, Italy
- Benedetta Heimler
- The Edmond and Lily Safra Center for Brain Research, Hebrew University of Jerusalem Hadassah Ein-Kerem, Jerusalem, Israel; Department of Medical Neurobiology, Institute for Medical Research Israel-Canada, Faculty of Medicine, Hebrew University of Jerusalem, Hadassah Ein-Kerem, Jerusalem, Israel
- David Melcher
- Center for Mind Brain Sciences, CIMeC, University of Trento, Italy; Dep. of Psychology and Cognitive Science, University of Trento, Italy
- Francesco Pavani
- Center for Mind Brain Sciences, CIMeC, University of Trento, Italy; Dep. of Psychology and Cognitive Science, University of Trento, Italy; Integrative Multisensory Perception Action & Cognition Team, CRNL, France
Collapse
|
44
|
Loschky LC, Larson AM, Smith TJ, Magliano JP. The Scene Perception & Event Comprehension Theory (SPECT) Applied to Visual Narratives. Top Cogn Sci 2019; 12:311-351. [PMID: 31486277] [PMCID: PMC9328418] [DOI: 10.1111/tops.12455]
Abstract
Understanding how people comprehend visual narratives (including picture stories, comics, and film) requires combining traditionally separate theories that span the initial sensory and perceptual processing of complex visual scenes, the perception of events over time, and the comprehension of narratives. Existing piecemeal approaches fail to capture the interplay between these levels of processing. Here, we propose the Scene Perception & Event Comprehension Theory (SPECT), as applied to visual narratives, which distinguishes between front-end and back-end cognitive processes. Front-end processes occur during single eye fixations and comprise attentional selection and information extraction. Back-end processes occur across multiple fixations and support the construction of event models, which reflect understanding of what is happening now in a narrative (stored in working memory) and over the course of the entire narrative (stored in long-term episodic memory). We describe relationships between front-end and back-end processes, as well as medium-specific differences that likely produce variation in these processes across media (e.g., picture stories vs. film). We describe several novel research questions derived from SPECT that we have explored. By addressing these questions, we provide greater insight into how attention, information extraction, and event model processes are dynamically coordinated to perceive and understand complex naturalistic visual events in narratives and the real world. Comprehension of visual narratives such as comics, picture stories, and films involves both decoding the visual content and construing the meaningful events it represents. SPECT proposes a framework for understanding how a comprehender perceptually negotiates the surface of a visual representation and integrates its meaning into a growing mental model.
Affiliation(s)
- Tim J Smith
- Department of Psychological Sciences, Birkbeck, University of London
45
Avivi-Reich M, Fifield B, Schneider BA. Can the diffuseness of sound sources in an auditory scene alter speech perception? Atten Percept Psychophys 2020; 82:1443-1458. [PMID: 31410762] [DOI: 10.3758/s13414-019-01808-2]
Abstract
When amplification is used, sound sources are often presented over multiple loudspeakers, which can alter their timbre. Increasing the diffuseness of a sound by presenting it over spatially separated loudspeakers might affect the listener's ability to form a coherent auditory image of it, alter its perceived spatial position, and may even affect the extent to which it competes for the listener's attention. In addition, it can lead to comb-filtering effects that alter the spectral profiles of sounds arriving at the ears. It is important to understand how these changes affect speech perception. In this study, young adults were asked to repeat nonsense sentences presented in noise, babble, or speech. Participants were divided into two groups: (1) a Compact-Target Timbre group, in which the target sentences were presented over a single loudspeaker (compact target) while the masker was presented over either three loudspeakers (diffuse) or a single loudspeaker (compact); and (2) a Diffuse-Target Timbre group, in which the target sentences were diffuse while the masker was either compact or diffuse. Timbre had no significant effect in the absence of a timbre contrast between target and masker. However, when there was a timbre contrast, the signal-to-noise ratios needed for 50% correct recognition of the target speech were higher (worse) when the masker was compact and lower (better) when the target was compact. These results were consistent with the expected effects of comb filtering and could also reflect a tendency for attention to be drawn towards compact sound sources.
46
Bilalić M, Lindig T, Turella L. Parsing rooms: the role of the PPA and RSC in perceiving object relations and spatial layout. Brain Struct Funct 2019; 224:2505-2524. [PMID: 31317256] [DOI: 10.1007/s00429-019-01901-0]
Abstract
The perception of a scene involves grasping its global space, usually called the spatial layout, as well as the objects in the scene and the relations between them. The main brain areas involved in scene perception, the parahippocampal place area (PPA) and retrosplenial cortex (RSC), are thought to primarily support the processing of spatial layout. Here, we manipulated the objects and their relations by either arranging objects within rooms in a common way or scattering them randomly. The rooms were then varied for spatial layout by keeping or removing the walls of the room, a typical layout manipulation. We combined a visual search paradigm, in which participants actively searched for an object within the room, with multivariate pattern analysis (MVPA). Both the left and right PPA were sensitive to layout properties, but the right PPA was also sensitive to object relations, even when information about objects and their relations was used in a cross-categorization procedure on novel stimuli. The left and right RSC were sensitive to both spatial layout and object relations, but could only use information about object relations for cross-categorization to novel stimuli. These effects were restricted to the PPA and RSC, as other control brain areas did not display the same pattern of results. Our results underline the importance of employing paradigms that require participants to explicitly engage domain-specific processes, and indicate that objects and their relations are processed in the scene areas to a larger extent than previously assumed.
47
Green DM, Wilcock JA, Takarangi MKT. The role of arousal in boundary judgement errors. Mem Cognit 2019; 47:968-982. [PMID: 30888643] [DOI: 10.3758/s13421-019-00914-8]
Abstract
Eyewitnesses to a crime rely heavily on their visual memory; however, details of visual scenes can be missed or distorted in many ways. In particular, for emotional scenes, the "boundaries" are narrowed at retrieval, whereas central details, such as a weapon, are remembered in greater detail. This phenomenon is known as boundary restriction, the reverse of boundary extension, whereby people tend to expand the boundaries of a neutral scene at retrieval. In the present series of experiments, we investigated whether arousal is the element of an emotional scene that leads to increased boundary restriction or reduced boundary extension. We presented neutral images to participants either with or without a stress-inducing noise. In Experiments 1a and 1b, at test, participants viewed the image they had originally seen next to the same image with narrower or wider boundaries, and selected which of the two images they had originally viewed. In Experiment 2, at test, participants viewed the identical image they had originally seen but were told the boundaries had been changed; they then rated the extent to which the test images had restricted or extended boundaries compared to their memory of the original image. When the noise stressor was present, participants made more boundary restriction errors (selecting the image with narrower boundaries than the original) and fewer boundary extension errors than when the noise was absent. Our data suggest that arousal plays a key role in boundary judgements.
48
Peacock CE, Hayes TR, Henderson JM. The role of meaning in attentional guidance during free viewing of real-world scenes. Acta Psychol (Amst) 2019; 198:102889. [PMID: 31302302] [DOI: 10.1016/j.actpsy.2019.102889]
Abstract
In real-world vision, humans prioritize the most relevant visual information at the expense of other information via attentional selection. The current study sought to understand the roles of semantic features and image features in attentional selection during free viewing of real-world scenes. We compared the ability of meaning maps, generated from ratings of isolated, context-free image patches, and saliency maps, generated from the Graph-Based Visual Saliency model, to predict the spatial distribution of attention in scenes as measured by eye movements. Additionally, we introduce new contextualized meaning maps in which scene patches were rated based on how informative or recognizable they were in the context of the scene from which they were derived. We found that both context-free and contextualized meaning explained significantly more of the overall variance in the spatial distribution of attention than image salience. Furthermore, meaning explained early attention to a significantly greater extent than image salience, contrary to the predictions of the 'saliency first' hypothesis. Finally, context-free and contextualized meaning predicted attention equivalently. These results support theories in which meaning plays a dominant role in attentional guidance during free viewing of real-world scenes.
49
Wolfe B, Sawyer BD, Kosovicheva A, Reimer B, Rosenholtz R. Detection of brake lights while distracted: Separating peripheral vision from cognitive load. Atten Percept Psychophys 2019; 81:2798-2813. [PMID: 31222659] [DOI: 10.3758/s13414-019-01795-4]
Abstract
Drivers rarely focus exclusively on driving, even with the best of intentions. They are distracted by passengers, navigation systems, smartphones, and driver assistance systems. Driving itself requires performing simultaneous tasks, including lane keeping, looking for signs, and avoiding pedestrians. The dangers of multitasking while driving, and efforts to combat it, often focus on the distraction itself, rather than on how a distracting task can change what the driver can perceive. Critically, some distracting tasks require the driver to look away from the road, which forces the driver to use peripheral vision to detect driving-relevant events. As a consequence, both looking away and being distracted may degrade driving performance. To assess the relative contributions of these factors, we conducted a laboratory experiment in which we separately varied cognitive load and point of gaze. Subjects performed a visual 0-back or 1-back task at one of four fixation locations superimposed on a real-world driving video, while simultaneously monitoring for brake lights in their lane of travel. Subjects were able to detect brake lights in all conditions, but once the eccentricity of the brake lights increased, they responded more slowly and missed more braking events. However, our cognitive load manipulation had minimal effects on detection performance, reaction times, or miss rates for brake lights. These results suggest that, for tasks that require the driver to look off-road, the decrements observed may be due to the need to use peripheral vision to monitor the road, rather than due to the distraction itself.
50
Carrigan AJ, Wardle SG, Rich AN. Do target detection and target localization always go together? Extracting information from briefly presented displays. Atten Percept Psychophys 2019; 81:2685-2699. [PMID: 31218599] [DOI: 10.3758/s13414-019-01782-9]
Abstract
The human visual system is capable of processing an enormous amount of information in a short time. Although rapid target detection has been explored extensively, less is known about target localization. Here, we used natural scenes to explore the relationship between being able to detect a target (present vs. absent) and being able to localize it. Across four presentation durations (~33-199 ms), participants viewed scenes taken from two superordinate categories (natural and manmade), each containing exemplars from four basic scene categories. In a two-interval forced-choice task, observers were asked to detect a Gabor target inserted in one of the two scenes. This was followed by one of two localization tasks: participants either discriminated whether the target was on the left or the right side of the display, or clicked on the exact location where they had seen the target. Targets could be detected and localized at our shortest exposure duration (~33 ms), with a predictable improvement in performance as exposure duration increased. At this shortest duration, we saw some evidence of detection without localization, but further analyses demonstrated that these trials typically reflected coarse or imprecise localization information rather than its complete absence. Experiment 2 replicated our main findings while exploring the effect of the level of "openness" of the scene. Our results are consistent with the notion that when we can extract what objects are present in a scene, we also have information about where each object is, which provides crucial guidance for our goal-directed actions.