1. Salsano I, Petro NM, Picci G, Petts AJ, Glesinger RJ, Horne LK, Coutant AT, Ende GC, John JA, Rice DL, Garrison GM, Kress KA, Santangelo V, Coco MI, Wilson TW. Blending into naturalistic scenes: Cortical regions serving visual search are more strongly activated in congruent contexts. Neuroimage 2025; 311:121214. PMID: 40222499; PMCID: PMC12036007; DOI: 10.1016/j.neuroimage.2025.121214.
Abstract
Visual attention allows us to navigate complex environments by selecting behaviorally relevant stimuli while suppressing distractors, through a dynamic balance between top-down and bottom-up mechanisms. Extensive attention research has examined the object-context relationship. Some studies have shown that incongruent object-context associations are processed faster, likely due to semantic mismatch-related attentional capture, while others have suggested that schema-driven facilitation may enhance object recognition when the object and context are congruent. Beyond the conflicting findings, translation of this work to real-world contexts has been difficult due to the use of non-ecological scenes and stimuli when investigating the object-context congruency relationship. To address this, we employed a goal-directed visual search task and naturalistic indoor scenes during functional MRI (fMRI). Seventy-one healthy adults searched for a target object, either congruent or incongruent with the scene context, following a word cue. We collected accuracy and response time behavioral data, and all fMRI data were processed following standard pipelines, with statistical maps thresholded at p < .05 following multiple comparisons correction. Our results indicated faster response times for incongruent relative to congruent trials, likely reflecting the so-called pop-out effect of schema violations in the incongruent condition. Our neural results indicated that congruent trials elicited greater activation than incongruent trials in the dorsal frontoparietal attention network and the precuneus, likely reflecting sustained top-down attentional control to locate targets that blend more seamlessly into the context. These findings highlight the flexible interplay between top-down and bottom-up mechanisms in real-world visual search, emphasizing the dominance of schema-guided top-down processes in congruent contexts and rapid attentional capture in incongruent contexts.
Affiliation(s)
- Ilenia Salsano
- Institute for Human Neuroscience, Boys Town National Research Hospital, Boys Town, NE, United States; Center for Pediatric Brain Health, Boys Town National Research Hospital, Boys Town, NE, United States
- Nathan M Petro
- Institute for Human Neuroscience, Boys Town National Research Hospital, Boys Town, NE, United States; Center for Pediatric Brain Health, Boys Town National Research Hospital, Boys Town, NE, United States
- Giorgia Picci
- Institute for Human Neuroscience, Boys Town National Research Hospital, Boys Town, NE, United States; Center for Pediatric Brain Health, Boys Town National Research Hospital, Boys Town, NE, United States; Department of Pharmacology & Neuroscience, Creighton University, Omaha, NE, United States
- Aubrie J Petts
- Institute for Human Neuroscience, Boys Town National Research Hospital, Boys Town, NE, United States; Center for Pediatric Brain Health, Boys Town National Research Hospital, Boys Town, NE, United States
- Ryan J Glesinger
- Institute for Human Neuroscience, Boys Town National Research Hospital, Boys Town, NE, United States; Center for Pediatric Brain Health, Boys Town National Research Hospital, Boys Town, NE, United States
- Lucy K Horne
- Institute for Human Neuroscience, Boys Town National Research Hospital, Boys Town, NE, United States; Center for Pediatric Brain Health, Boys Town National Research Hospital, Boys Town, NE, United States
- Anna T Coutant
- Institute for Human Neuroscience, Boys Town National Research Hospital, Boys Town, NE, United States; Center for Pediatric Brain Health, Boys Town National Research Hospital, Boys Town, NE, United States
- Grace C Ende
- Institute for Human Neuroscience, Boys Town National Research Hospital, Boys Town, NE, United States; Center for Pediatric Brain Health, Boys Town National Research Hospital, Boys Town, NE, United States
- Jason A John
- Institute for Human Neuroscience, Boys Town National Research Hospital, Boys Town, NE, United States; Center for Pediatric Brain Health, Boys Town National Research Hospital, Boys Town, NE, United States
- Danielle L Rice
- Institute for Human Neuroscience, Boys Town National Research Hospital, Boys Town, NE, United States; Center for Pediatric Brain Health, Boys Town National Research Hospital, Boys Town, NE, United States
- Grant M Garrison
- Institute for Human Neuroscience, Boys Town National Research Hospital, Boys Town, NE, United States; Center for Pediatric Brain Health, Boys Town National Research Hospital, Boys Town, NE, United States
- Kennedy A Kress
- Institute for Human Neuroscience, Boys Town National Research Hospital, Boys Town, NE, United States; Center for Pediatric Brain Health, Boys Town National Research Hospital, Boys Town, NE, United States
- Valerio Santangelo
- Santa Lucia Foundation IRCCS, Rome, Italy; Department of Philosophy, Social Sciences & Education, University of Perugia, Perugia, Italy
- Moreno I Coco
- Santa Lucia Foundation IRCCS, Rome, Italy; Sapienza University of Rome, Rome, Italy
- Tony W Wilson
- Institute for Human Neuroscience, Boys Town National Research Hospital, Boys Town, NE, United States; Center for Pediatric Brain Health, Boys Town National Research Hospital, Boys Town, NE, United States; Department of Pharmacology & Neuroscience, Creighton University, Omaha, NE, United States.
2. Clark DPA, Donnelly N. An exploration of the influence of animal and object categories on recall of item location following an incidental learning task. Q J Exp Psychol (Hove) 2025; 78:474-489. PMID: 38426458; PMCID: PMC11874500; DOI: 10.1177/17470218241238737.
Abstract
The current study explores the role of attention in location memory for animals and objects. Participants completed an incidental learning task in which they rated animals and objects with regard to either their ease of collection to win a scavenger hunt (Experiments 1a and 1b) or their distance from the centre of the computer screen (Experiment 2). The images of animals and objects were pseudo-randomly positioned on the screen in both experiments. After completing the incidental learning task (and a reverse counting distractor task), participants were given a surprise location memory recall task, in which items were shown in the centre of the screen and participants used the mouse to indicate the position at which each item had been shown during incidental encoding. The results of both experiments show that location memory for objects was more accurate than for animals. While we cannot definitively identify the mechanism responsible for this difference, we propose that differences in the influence of object-based attention at encoding affect location memory when tested at recall.
Affiliation(s)
- Dan PA Clark
- Department of Psychology, Liverpool Hope University, Liverpool, UK
- Nick Donnelly
- Department of Psychology, Liverpool Hope University, Liverpool, UK
3. Reger M, Vrabie O, Volberg G, Lingnau A. Actions at a glance: The time course of action, object, and scene recognition in a free recall paradigm. Cogn Affect Behav Neurosci 2025. PMID: 40011402; DOI: 10.3758/s13415-025-01272-6.
Abstract
Being able to quickly recognize other people's actions lies at the heart of our ability to efficiently interact with our environment. Action recognition has been suggested to rely on the analysis and integration of information from different perceptual subsystems, e.g., for the processing of objects and scenes. However, to our knowledge, the stimulus presentation times required to extract information about actions, objects, and scenes have not yet been directly compared. To address this gap in the literature, we compared the recognition thresholds for actions, objects, and scenes. First, 30 participants were presented with grayscale images depicting different actions at variable presentation times (33-500 ms) and provided written descriptions of each image. Next, ten naïve raters evaluated these descriptions with respect to the presence and accuracy of information related to actions, objects, scenes, and sensory information. Comparing thresholds across presentation times, we found that recognizing actions required shorter presentation times (from 60 ms onwards) than recognizing objects (68 ms) or scenes (84 ms). More specific actions required presentation times of approximately 100 ms. Moreover, thresholds were modulated by action category, with the lowest thresholds for locomotion and the highest for food-related actions. Together, our data suggest that perceptual evidence for actions, objects, and scenes is gathered in parallel when these are presented in the same scene, but accumulates faster for actions, reflecting recognition of static body postures, than for objects and scenes.
Affiliation(s)
- Maximilian Reger
- Faculty of Human Sciences, University of Regensburg, Universitätsstraße 31, 93053, Regensburg, Germany
- Oleg Vrabie
- Faculty of Human Sciences, University of Regensburg, Universitätsstraße 31, 93053, Regensburg, Germany
- Gregor Volberg
- Faculty of Human Sciences, University of Regensburg, Universitätsstraße 31, 93053, Regensburg, Germany
- Angelika Lingnau
- Faculty of Human Sciences, University of Regensburg, Universitätsstraße 31, 93053, Regensburg, Germany.
4. Duarte SE, Yonelinas AP, Ghetti S, Geng JJ. Multisensory processing impacts memory for objects and their sources. Mem Cognit 2025; 53:646-665. PMID: 38831161; PMCID: PMC11868352; DOI: 10.3758/s13421-024-01592-x.
Abstract
Multisensory object processing improves recognition memory for individual objects, but its impact on memory for neighboring visual objects and scene context remains largely unknown. It is therefore unclear how multisensory processing impacts episodic memory for information outside of the object itself. We conducted three experiments to test the prediction that the presence of audiovisual objects at encoding would improve memory for nearby visual objects, and improve memory for the environmental context in which they occurred. In Experiments 1a and 1b, participants viewed audiovisual-visual object pairs or visual-visual object pairs with a control sound during encoding and were subsequently tested on their memory for each object individually. In Experiment 2, objects were paired with semantically congruent or meaningless control sounds and appeared within four different scene environments. Memory for the environment was tested. Results from Experiments 1a and 1b showed that encoding a congruent audiovisual object did not significantly benefit memory for neighboring visual objects, but Experiment 2 showed that encoding a congruent audiovisual object did improve memory for the environments in which those objects were encoded. These findings suggest that multisensory processing can influence memory beyond the objects themselves and that it has a unique role in episodic memory formation. This is particularly important for understanding how memories and associations are formed in real-world situations, in which objects and their surroundings are often multimodal.
Affiliation(s)
- Shea E Duarte
- Department of Psychology, University of California, Davis, CA, 95616, USA.
- Center for Mind and Brain, University of California, Davis, CA, 95618, USA.
- Andrew P Yonelinas
- Department of Psychology, University of California, Davis, CA, 95616, USA
- Center for Neuroscience, University of California, Davis, CA, 95618, USA
- Simona Ghetti
- Department of Psychology, University of California, Davis, CA, 95616, USA
- Center for Mind and Brain, University of California, Davis, CA, 95618, USA
- Joy J Geng
- Department of Psychology, University of California, Davis, CA, 95616, USA
- Center for Mind and Brain, University of California, Davis, CA, 95618, USA
5. Persaud K, Hemmer P. The influence of functional components of natural scenes on episodic memory. Sci Rep 2024; 14:30313. PMID: 39639108; PMCID: PMC11621360; DOI: 10.1038/s41598-024-81900-2.
Abstract
Prior expectation for the structure of natural scenes is perhaps the most influential contributor to episodic memory for objects in scenes. While the influence of functional components of natural scenes on scene perception and visual search has been well studied, far less is known about the independent contributions of these components to episodic memory. In this investigation, we systematically removed three functional components of natural scenes: global-background, local spatial, and local associative information, to evaluate their impact on episodic memory. Results revealed that [partially] removing the global-background negatively impacted recall accuracy following short encoding times but had relatively little impact on memory after longer times. In contrast, systematically removing local spatial and associative relationships of scene objects negatively impacted recall accuracy following short and longer encoding times. These findings suggest that scene background, object spatial arrangements, and object relationships facilitate not only scene perception and object recognition, but also episodic memory. Interestingly, the impact of these components depends on how much encoding time is available to store information in episodic memory. This work has important implications for understanding how the inherent structure and function of the natural world interacts with memory and cognition in naturalistic contexts.
Affiliation(s)
- Kimele Persaud
- Department of Psychology, Rutgers University, Newark, USA.
- Pernille Hemmer
- Department of Psychology, Rutgers University, New Brunswick, USA
6. Simpson MW, Wu J, Ye Z. Subliminal priming modulates motor sequence learning. Mem Cognit 2024. PMID: 39570541; DOI: 10.3758/s13421-024-01668-8.
Abstract
Sequential behaviour is underpinned by the selection and inhibition of movement at appropriate points in space and time. Sequences embedded among movement patterns must be learnt, yet the contribution of response selection and inhibition to the acquisition of motor sequences remains poorly understood. We addressed this issue by overlaying the serial reaction time task (SRTT) with subliminal masked primes that differentially weighted response tendencies. Twenty-four healthy young adults (Experiment 1) and thirty-six participants (Experiment 2) performed the SRTT with congruent (same position), incongruent (different position), or neutral (no prime) subliminal masked primes. Each condition featured an embedded eight-digit (Experiment 1) or ten-digit (Experiment 2) second-order sequence, with conditions presented in counterbalanced order during a single session. Sequence-specific learning was observed under neutral and congruent prime conditions. Independent of sequence awareness, congruent primes reduced initial response latency and led to greater sequence-specific learning compared with neutral primes. However, incongruent primes appeared to attenuate learning (Experiment 1). These results demonstrate that prime congruency modulates sequence-specific learning below the threshold of conscious awareness. Congruent primes may elevate the salience of stimulus-response compounds and accentuate learning, but at the cost of increased awareness. Incongruent primes, and the induction of response conflict, attenuate sequence-specific learning (Experiment 1) and may prevent the formation of the cross-temporal contingencies necessary for implicit motor sequence learning.
Affiliation(s)
- Michael William Simpson
- Institute of Neuroscience, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
- Jing Wu
- Institute of Neuroscience, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Zheng Ye
- Institute of Neuroscience, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China.
7. Hatori Y, Yuan ZX, Tseng CH, Kuriki I, Shioiri S. Modeling the dynamics of contextual cueing effect by reinforcement learning. J Vis 2024; 24:11. PMID: 39560623; DOI: 10.1167/jov.24.12.11.
Abstract
Humans use environmental context to facilitate object search, but this benefit of context for visual search requires learning. Modeling the learning process of context for efficient processing is vital to understanding visual function in everyday environments. We proposed a model that accounts for the contextual cueing effect, which refers to the learning of scene context to identify the location of a target item. The model extracted the global feature of a scene and gradually strengthened the association between that global feature and the target location over repeated observations. We compared model and human performance in two visual search experiments (letter arrangements on a gray background or a natural scene). The proposed model successfully simulated the faster reduction in the number of saccades required before target detection for the natural scene background compared with the uniform gray background. We further tested whether the model replicated known characteristics of the contextual cueing effect in terms of local learning around the target, the effect of the ratio of repeated to novel stimuli, and the superiority of natural scenes.
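For readers who want the gist of the learning rule: the mechanism described above, gradually strengthening an association between a global scene feature and a target location over repeated exposures, can be sketched as a simple delta-rule learner. The sketch below is a minimal illustration, not the authors' implementation; the discretized locations, softmax read-out, learning rate, and feature key are all assumptions.

```python
import numpy as np

class ContextualCueingSketch:
    """Toy learner linking a global scene feature to a target location.

    Hypothetical stand-in for the paper's model: a delta rule strengthens
    the feature-to-location association on every repeated observation.
    """

    def __init__(self, n_locations, learning_rate=0.2):
        self.n_locations = n_locations
        self.lr = learning_rate
        self.assoc = {}  # global scene feature -> weight vector over locations

    def predict(self, scene_feature):
        """Probability distribution over candidate target locations."""
        w = self.assoc.get(scene_feature, np.zeros(self.n_locations))
        e = np.exp(w - w.max())
        return e / e.sum()

    def update(self, scene_feature, target_location):
        """Strengthen the association after the target is found."""
        w = self.assoc.setdefault(scene_feature, np.zeros(self.n_locations))
        one_hot = np.zeros(self.n_locations)
        one_hot[target_location] = 1.0
        w += self.lr * (one_hot - self.predict(scene_feature))  # delta rule

def expected_saccades(p, target_location):
    """Expected fixations if locations are sampled in proportion to p."""
    return 1.0 / p[target_location]

# Repeated exposure to the same scene/target pairing shortens simulated search.
model = ContextualCueingSketch(n_locations=12)
for epoch in range(15):
    model.update("kitchen_layout_3", target_location=5)
print(expected_saccades(model.predict("kitchen_layout_3"), 5))  # well below 12
```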
Affiliation(s)
- Yasuhiro Hatori
- Research Institute of Electrical Communication, Tohoku University, Sendai, Japan
- National Institute of Occupational Safety and Health, Japan, Tokyo, Japan
- Zheng-Xiong Yuan
- Research Institute of Electrical Communication, Tohoku University, Sendai, Japan
- Chia-Huei Tseng
- Research Institute of Electrical Communication, Tohoku University, Sendai, Japan
- Ichiro Kuriki
- Research Institute of Electrical Communication, Tohoku University, Sendai, Japan
- Graduate School of Science and Engineering, Saitama University, Saitama, Japan
- Satoshi Shioiri
- Research Institute of Electrical Communication, Tohoku University, Sendai, Japan
8. Chong LL, Beck DM. Real-world Statistical Regularity Impacts Inattentional Blindness. Conscious Cogn 2024; 125:103768. PMID: 39447236; DOI: 10.1016/j.concog.2024.103768.
Abstract
Does the likelihood of experiencing inattentional blindness depend on whether scenes are statistically regular (e.g., probable) or not? Previous studies have shown that observers find it harder to perceive real-world statistical irregularities, such as improbable (statistically irregular) scenes (e.g., scenes unlikely to take place in the real world), as opposed to probable (statistically regular) scenes. Moreover, high inattentional blindness rates have been observed for improbable stimuli (e.g., a gorilla on a college campus). However, no one has directly compared noticing rates for probable and improbable scenes. Here we ask whether people are more likely to experience inattentional blindness for improbable than for probable scenes. In two large-scale experiments in which one group of participants was presented, on the critical trial, with a probable scene and the other group with an improbable scene, we observed higher rates of inattentional blindness for participants receiving improbable scenes than for those receiving probable scenes.
Affiliation(s)
- Ling Lee Chong
- Department of Psychology, University of Illinois, 603 E. Daniel Street, Champaign, IL 61820, United States.
- Diane M Beck
- Department of Psychology, University of Illinois, 603 E. Daniel Street, Champaign, IL 61820, United States
9. Liesefeld HR, Lamy D, Gaspelin N, Geng JJ, Kerzel D, Schall JD, Allen HA, Anderson BA, Boettcher S, Busch NA, Carlisle NB, Colonius H, Draschkow D, Egeth H, Leber AB, Müller HJ, Röer JP, Schubö A, Slagter HA, Theeuwes J, Wolfe J. Terms of debate: Consensus definitions to guide the scientific discourse on visual distraction. Atten Percept Psychophys 2024; 86:1445-1472. PMID: 38177944; PMCID: PMC11552440; DOI: 10.3758/s13414-023-02820-3.
Abstract
Hypothesis-driven research rests on clearly articulated scientific theories. The building blocks for communicating these theories are scientific terms. Obviously, communication - and thus, scientific progress - is hampered if the meaning of these terms varies idiosyncratically across (sub)fields and even across individual researchers within the same subfield. We have formed an international group of experts representing various theoretical stances with the goal of homogenizing the use of the terms that are most relevant to fundamental research on visual distraction in visual search. Our discussions revealed striking heterogeneity, and we had to invest much time and effort to increase our mutual understanding of each other's use of central terms, which turned out to be strongly related to our respective theoretical positions. We present the outcomes of these discussions in a glossary and provide some context in several essays. Specifically, we explicate how central terms are used in the distraction literature and consensually sharpen their definitions in order to enable communication across theoretical standpoints. Where applicable, we also explain how the respective constructs can be measured. We believe that this novel type of adversarial collaboration can serve as a model for other fields of psychological research that strive to build a solid groundwork for theorizing and communicating by establishing a common language. For the field of visual distraction, the present paper should facilitate communication across theoretical standpoints and may serve as an introduction and reference text for newcomers.
Affiliation(s)
- Heinrich R Liesefeld
- Department of Psychology, University of Bremen, Hochschulring 18, D-28359, Bremen, Germany.
- Dominique Lamy
- The School of Psychology Sciences and The Sagol School of Neuroscience, Tel Aviv University, Ramat Aviv 69978, POB 39040, Tel Aviv, Israel.
- Joy J Geng
- University of California Davis, Davis, CA, USA
- Hans Colonius
- Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Anna Schubö
- Philipps University Marburg, Marburg, Germany
- Jeremy Wolfe
- Harvard Medical School, Boston, MA, USA
- Brigham & Women's Hospital, Boston, MA, USA
10. Zinchenko A, Geyer T, Zang X, Shi Z, Müller HJ, Conci M. When experience with scenes foils attentional orienting: ERP evidence against flexible target-context mapping in visual search. Cortex 2024; 175:41-53. PMID: 38703715; DOI: 10.1016/j.cortex.2024.04.001.
Abstract
Visual search is speeded when a target is repeatedly presented in an invariant scene context of nontargets (contextual cueing), demonstrating observers' capability to use statistical long-term memory (LTM) to make predictions about upcoming sensory events, thus improving attentional orienting. In the current study, we investigated whether expectations arising from individual, learned environmental structures can encompass multiple target locations. We recorded event-related potentials (ERPs) while participants performed a contextual cueing search task with repeated and non-repeated spatial item configurations. Notably, a given search display could be associated with either a single target location (standard contextual cueing) or two possible target locations. Our results showed that LTM-guided attention was always limited to only one target position, in single-target but also in dual-target displays, as evidenced by expedited reaction times (RTs) and enhanced N1pc and N2pc deflections contralateral to one ("dominant") target of the up to two repeating target locations. This contrasts with the processing of non-learned ("minor") target positions (in dual-target displays), which revealed slowed RTs alongside an initial N1pc "misguidance" signal that then vanished in the subsequent N2pc. This RT slowing was accompanied by enhanced N200 and N400 waveforms over fronto-central electrodes, suggesting that control mechanisms regulate the competition between dominant and minor targets. Our study thus reveals a dissociation in processing dominant versus minor targets: while LTM templates guide attention to dominant targets, minor targets necessitate control processes to overcome the automatic bias towards previously learned, dominant target locations.
Affiliation(s)
- Artyom Zinchenko
- Department Psychologie, Ludwig-Maximilians-Universität München, München, Germany.
- Thomas Geyer
- Department Psychologie, Ludwig-Maximilians-Universität München, München, Germany; NICUM - Neuro Imaging Core Unit, LMU Munich, Germany; MCN - Munich Center for Neurosciences - Brain & Mind, LMU Munich, Germany
- Xuelian Zang
- Center for Cognition and Brain Disorders, Affiliated Hospital of Hangzhou Normal University, China; Institutes of Psychological Sciences, College of Education, Hangzhou Normal University, China
- Zhuanghua Shi
- Department Psychologie, Ludwig-Maximilians-Universität München, München, Germany; NICUM - Neuro Imaging Core Unit, LMU Munich, Germany
- Hermann J Müller
- Department Psychologie, Ludwig-Maximilians-Universität München, München, Germany; MCN - Munich Center for Neurosciences - Brain & Mind, LMU Munich, Germany
- Markus Conci
- Department Psychologie, Ludwig-Maximilians-Universität München, München, Germany; MCN - Munich Center for Neurosciences - Brain & Mind, LMU Munich, Germany
11. Hanson SJ, Yadav V, Hanson C. Dense Sample Deep Learning. Neural Comput 2024; 36:1228-1244. PMID: 38669696; DOI: 10.1162/neco_a_01666.
Abstract
Deep learning (DL), a variant of the neural network algorithms originally proposed in the 1980s (Rumelhart et al., 1986), has made surprising progress in artificial intelligence (AI), ranging from language translation and protein folding (Jumper et al., 2021) to autonomous cars and, more recently, human-like language models (chatbots) - all of which seemed intractable until very recently. Despite the growing use of DL networks, little is understood about the learning mechanisms and representations that make these networks effective across such a diverse range of applications. Part of the answer must be the huge scale of the architecture and, of course, the large scale of the data, since not much has changed since 1986. But the nature of deep learned representations remains largely unknown. Unfortunately, training sets with millions or billions of tokens have unknown combinatorics, and networks with millions or billions of hidden units cannot easily be visualized, so their mechanisms cannot be easily revealed. In this letter, we explore these challenges with a large (1.24-million-weight VGG) DL network in a novel high-density sample task (five unique tokens with more than 500 exemplars per token), which allows us to more carefully follow the emergence of category structure and feature construction. We use various visualization methods to follow the emergence of the classification and the development of the coupling of feature detectors and structures that provide a type of graphical bootstrapping. From these results, we harvest some basic observations of the learning dynamics of DL and propose a new theory of complex feature construction based on our results.
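The "dense sample" design (very few unique tokens, hundreds of exemplars per token, probed with a small VGG-style network) is easy to mock up. Below is a minimal PyTorch sketch under assumed shapes (32 x 32 grayscale inputs, five classes); the architecture, sizes, and random stand-in data are illustrative and far smaller than the 1.24-million-weight network used in the letter.

```python
import torch
import torch.nn as nn

class TinyVGG(nn.Module):
    """Small VGG-style network for a dense-sample task (5 tokens, many exemplars)."""

    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = TinyVGG()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Dense sampling: >500 exemplars per token; random tensors stand in here.
x = torch.randn(64, 1, 32, 32)   # one minibatch of exemplars
y = torch.randint(0, 5, (64,))   # their class tokens
for step in range(10):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()  # hidden activations could be snapshotted here, step by step
```

Snapshotting the hidden-layer activations during training, rather than only the final accuracy, is what lets a dense-sample design expose how category structure and feature detectors emerge.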
Affiliation(s)
- Stephen José Hanson
- Rutgers Brain Imaging Center and Psychology Department, Rutgers University, Newark, NJ 07102, U.S.A.
- Vivek Yadav
- Rutgers Brain Imaging Center, Rutgers University, Newark, NJ 07102, U.S.A.
- Catherine Hanson
- Center for Molecular and Behavioral Neuroscience and Rutgers Brain Imaging Center, Rutgers University, Newark, NJ 07102, U.S.A.
12. Wang G, Foxwell MJ, Cichy RM, Pitcher D, Kaiser D. Individual differences in internal models explain idiosyncrasies in scene perception. Cognition 2024; 245:105723. PMID: 38262271; DOI: 10.1016/j.cognition.2024.105723.
Abstract
According to predictive processing theories, vision is facilitated by predictions derived from our internal models of what the world should look like. However, the contents of these models, and how they vary across people, remain unclear. Here, we use drawing as a behavioral readout of the contents of the internal models of individual participants. Participants were first asked to draw typical versions of scene categories, as descriptors of their internal models. These drawings were converted into standardized 3D renders, which we used as stimuli in subsequent scene categorization experiments. Across two experiments, participants' scene categorization was more accurate for renders tailored to their own drawings compared to renders based on others' drawings or copies of scene photographs, suggesting that scene perception is determined by a match with idiosyncratic internal models. Using a deep neural network to computationally evaluate similarities between scene renders, we further demonstrate that graded similarity to the render based on participants' own typical drawings (and thus to their internal model) predicts categorization performance across a range of candidate scenes. Together, our results showcase the potential of a new method for understanding individual differences - starting from participants' personal expectations about the structure of real-world scenes.
Affiliation(s)
- Gongting Wang
- Department of Education and Psychology, Freie Universität Berlin, Germany; Department of Mathematics and Computer Science, Physics, Geography, Justus-Liebig-Universität Gießen, Germany
- Radoslaw M Cichy
- Department of Education and Psychology, Freie Universität Berlin, Germany
- Daniel Kaiser
- Department of Mathematics and Computer Science, Physics, Geography, Justus-Liebig-Universität Gießen, Germany; Center for Mind, Brain and Behavior (CMBB), Philipps-Universität Marburg and Justus-Liebig-Universität Gießen, Germany.
13. Li R, Li J, Wang C, Liu H, Liu T, Wang X, Zou T, Huang W, Yan H, Chen H. Multi-Semantic Decoding of Visual Perception with Graph Neural Networks. Int J Neural Syst 2024; 34:2450016. PMID: 38372016; DOI: 10.1142/s0129065724500163.
Abstract
Constructing computational decoding models to account for the cortical representation of semantic information plays a crucial role in understanding visual perception. The human visual system processes interactive relationships among different objects when perceiving the semantic content of natural visual scenes. However, existing semantic decoding models commonly regard categories as completely separate and independent, both visually and semantically, and rarely consider relationships derived from prior information. In this work, a novel semantic graph learning model was proposed to decode multiple semantic categories of perceived natural images from brain activity. The proposed model was validated on functional magnetic resonance imaging data collected from five normal subjects while viewing 2750 natural images comprising 52 semantic categories. The results showed that the Graph Neural Network-based decoding model achieved higher accuracies than other deep neural network models. Moreover, the co-occurrence probability among semantic categories showed a significant correlation with decoding accuracy. Additionally, the results suggested that semantic content was organized hierarchically, with higher visual areas more closely related to the internal visual experience. Together, this study provides a superior computational framework for multi-semantic decoding that supports the visual integration mechanism of semantic processing.
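The core idea, letting category co-occurrence structure inform a multi-label decoder, can be sketched as one message-passing step over a co-occurrence graph. Everything below (shapes, row normalization, a single linear readout from voxel features) is an assumed toy setup in NumPy, not the authors' architecture or data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, n_voxels, n_categories = 200, 2000, 52

# Toy stand-ins: voxel responses per image and multi-hot category labels.
X = rng.standard_normal((n_images, n_voxels))
Y = (rng.random((n_images, n_categories)) < 0.1).astype(float)

# Build a category co-occurrence graph and row-normalize it (self-loops kept).
A = Y.T @ Y
np.fill_diagonal(A, A.diagonal() + 1.0)
A = A / A.sum(axis=1, keepdims=True)

# Linear readout from brain activity to per-category scores (untrained here;
# in practice W would be fit with a multi-label loss).
W = 0.01 * rng.standard_normal((n_voxels, n_categories))
logits = X @ W

# One message-passing step: each category's score is smoothed by the scores
# of the categories it frequently co-occurs with.
smoothed = logits @ A.T
predictions = smoothed > 0.0   # multi-hot prediction per image
print(predictions.shape)       # (200, 52)
```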
Affiliation(s)
- Rong Li
- The Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- MOE Key Lab for Neuroinformation, High-Field Magnetic Resonance Brain Imaging, Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- Jiyi Li
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- MOE Key Lab for Neuroinformation, High-Field Magnetic Resonance Brain Imaging, Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- Chong Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- MOE Key Lab for Neuroinformation, High-Field Magnetic Resonance Brain Imaging, Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- Haoxiang Liu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- MOE Key Lab for Neuroinformation, High-Field Magnetic Resonance Brain Imaging, Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- Tao Liu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- MOE Key Lab for Neuroinformation, High-Field Magnetic Resonance Brain Imaging, Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- Xuyang Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- MOE Key Lab for Neuroinformation, High-Field Magnetic Resonance Brain Imaging, Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- Ting Zou
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- MOE Key Lab for Neuroinformation, High-Field Magnetic Resonance Brain Imaging, Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- Wei Huang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- MOE Key Lab for Neuroinformation, High-Field Magnetic Resonance Brain Imaging, Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- Hongmei Yan
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- MOE Key Lab for Neuroinformation, High-Field Magnetic Resonance Brain Imaging, Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- Huafu Chen
- The Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
- MOE Key Lab for Neuroinformation, High-Field Magnetic Resonance Brain Imaging, Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
14. Westebbe L, Liang Y, Blaser E. The Accuracy and Precision of Memory for Natural Scenes: A Walk in the Park. Open Mind (Camb) 2024; 8:131-147. PMID: 38435706; PMCID: PMC10898787; DOI: 10.1162/opmi_a_00122.
Abstract
It is challenging to quantify the accuracy and precision of scene memory because it is unclear what 'space' scenes occupy (how can we quantify error when misremembering a natural scene?). To address this, we exploited the ecologically valid, metric space in which scenes occur and are represented: routes. In a delayed estimation task, participants briefly saw a target scene drawn from a video of an outdoor 'route loop', then used a continuous report wheel of the route to pinpoint the scene. Accuracy was high and unbiased, indicating there was no net boundary extension/contraction. Interestingly, precision was higher for routes that were more self-similar (as characterized by the half-life, in meters, of a route's Multiscale Structural Similarity index), consistent with previous work finding a 'similarity advantage' where memory precision is regulated according to task demands. Overall, scenes were remembered to within a few meters of their actual location.
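The half-life measure used here can be approximated directly: compute pairwise image similarity as a function of separation along the route and fit an exponential decay. The sketch below uses single-scale SSIM from scikit-image as a stand-in for the Multiscale Structural Similarity index and assumes each frame is tagged with its distance in meters; it is an illustration of the idea, not the authors' pipeline.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def route_half_life(frames, positions_m):
    """Estimate the half-life (in meters) of a route's self-similarity.

    frames: 2D grayscale arrays (values in [0, 1]) sampled along the route.
    positions_m: distance along the route, in meters, for each frame.
    """
    seps, sims = [], []
    for i in range(len(frames)):
        for j in range(i + 1, len(frames)):
            seps.append(positions_m[j] - positions_m[i])
            sims.append(ssim(frames[i], frames[j], data_range=1.0))
    seps = np.array(seps)
    sims = np.clip(np.array(sims), 1e-6, None)
    # Fit similarity ~ exp(-lam * separation) by regression in log space.
    lam = -np.polyfit(seps, np.log(sims), 1)[0]
    return np.log(2) / lam  # separation at which similarity halves

# Example with synthetic frames that drift apart every 2 m along the route.
rng = np.random.default_rng(1)
base = rng.random((64, 64))
frames = [np.clip(base + 0.05 * k * rng.standard_normal((64, 64)), 0, 1)
          for k in range(10)]
print(route_half_life(frames, positions_m=[2.0 * k for k in range(10)]))
```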
Affiliation(s)
- Leo Westebbe
- Department of Psychology, University of Massachusetts Boston, Boston, MA, USA
- Yibiao Liang
- Department of Psychology, University of Massachusetts Boston, Boston, MA, USA
- Erik Blaser
- Department of Psychology, University of Massachusetts Boston, Boston, MA, USA
15. Faurite C, Aprile E, Kauffmann L, Mermillod M, Gallice M, Chiquet C, Cottereau BR, Peyrin C. Interaction between central and peripheral vision: Influence of distance and spatial frequencies. J Vis 2024; 24:3. PMID: 38190145; PMCID: PMC10777871; DOI: 10.1167/jov.24.1.3.
Abstract
Visual scene perception is based on reciprocal interactions between central and peripheral information. Such interactions are commonly investigated through the semantic congruence effect, which usually reveals a congruence effect of central vision on peripheral vision as strong as the reverse. The aim of the present study was to further investigate the mechanisms underlying central-peripheral visual interactions using a central-peripheral congruence paradigm across three behavioral experiments. We simultaneously presented a central and a peripheral stimulus, which could be either semantically congruent or incongruent. To assess the congruence effect of central vision on peripheral vision, participants had to categorize the peripheral target stimulus while ignoring the central distractor stimulus. To assess the congruence effect of peripheral vision on central vision, they had to categorize the central target stimulus while ignoring the peripheral distractor stimulus. Experiment 1 revealed that the physical distance between central and peripheral stimuli influences central-peripheral visual interactions: the congruence effect of central vision was stronger when the distance between the target and the distractor was shortest. Experiments 2 and 3 revealed that the spatial frequency content of distractors also influences central-peripheral interactions: a congruence effect of central vision was observed only when the distractor contained high spatial frequencies, while a congruence effect of peripheral vision was observed only when the distractor contained low spatial frequencies. These results raise the question of how these influences are exerted (bottom-up vs. top-down) and are discussed based on the retinocortical properties of the visual system and the predictive brain hypothesis.
Affiliation(s)
- Cynthia Faurite
- Université Grenoble Alpes, Univ. Savoie Mont Blanc, Grenoble, France
- Eva Aprile
- Université Grenoble Alpes, Univ. Savoie Mont Blanc, Grenoble, France
- Louise Kauffmann
- Université Grenoble Alpes, Univ. Savoie Mont Blanc, Grenoble, France
- Martial Mermillod
- Université Grenoble Alpes, Univ. Savoie Mont Blanc, Grenoble, France
- Mathilde Gallice
- Department of Ophthalmology, Grenoble Alpes University Hospital, Grenoble, France
- Christophe Chiquet
- Department of Ophthalmology, Grenoble Alpes University Hospital, Grenoble, France
- Benoit R Cottereau
- Centre de Recherche Cerveau et Cognition, Université Toulouse III-Paul Sabatier, Toulouse, France
- Centre National de la Recherche Scientifique, Toulouse, France
- Carole Peyrin
- Université Grenoble Alpes, Univ. Savoie Mont Blanc, Grenoble, France
16. Peelen MV, Berlot E, de Lange FP. Predictive processing of scenes and objects. Nat Rev Psychol 2024; 3:13-26. PMID: 38989004; PMCID: PMC7616164; DOI: 10.1038/s44159-023-00254-0.
Abstract
Real-world visual input consists of rich scenes that are meaningfully composed of multiple objects which interact in complex, but predictable, ways. Despite this complexity, we recognize scenes, and objects within these scenes, from a brief glance at an image. In this review, we synthesize recent behavioral and neural findings that elucidate the mechanisms underlying this impressive ability. First, we review evidence that visual object and scene processing is partly implemented in parallel, allowing for a rapid initial gist of both objects and scenes concurrently. Next, we discuss recent evidence for bidirectional interactions between object and scene processing, with scene information modulating the visual processing of objects, and object information modulating the visual processing of scenes. Finally, we review evidence that objects also combine with each other to form object constellations, modulating the processing of individual objects within the object pathway. Altogether, these findings can be understood by conceptualizing object and scene perception as the outcome of a joint probabilistic inference, in which "best guesses" about objects act as priors for scene perception and vice versa, in order to concurrently optimize visual inference of objects and scenes.
Affiliation(s)
- Marius V Peelen
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Eva Berlot
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Floris P de Lange
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
17. Bowers JS, Malhotra G, Dujmović M, Montero ML, Tsvetkov C, Biscione V, Puebla G, Adolfi F, Hummel JE, Heaton RF, Evans BD, Mitchell J, Blything R. Clarifying status of DNNs as models of human vision. Behav Brain Sci 2023; 46:e415. PMID: 38054298; DOI: 10.1017/s0140525x23002777.
Abstract
On several key issues we agree with the commentators. Perhaps most importantly, everyone seems to agree that psychology has an important role to play in building better models of human vision, and (most) everyone agrees (including us) that deep neural networks (DNNs) will play an important role in modelling human vision going forward. But there are also disagreements about what models are for, how DNN-human correspondences should be evaluated, the value of alternative modelling approaches, and the impact of marketing hype in the literature. In our view, these latter issues are contributing to many unjustified claims regarding DNN-human correspondences in vision and other domains of cognition. We explore all these issues in this response.
Affiliation(s)
- Jeffrey S Bowers
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Gaurav Malhotra
- School of Psychological Science, University of Bristol, Bristol, UK
- Marin Dujmović
- School of Psychological Science, University of Bristol, Bristol, UK
- Milton L Montero
- School of Psychological Science, University of Bristol, Bristol, UK
- Christian Tsvetkov
- School of Psychological Science, University of Bristol, Bristol, UK
- Valerio Biscione
- School of Psychological Science, University of Bristol, Bristol, UK
- Federico Adolfi
- Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
- John E Hummel
- Psychology Department, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Rachel F Heaton
- Psychology Department, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Benjamin D Evans
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Jeffrey Mitchell
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Ryan Blything
- School of Psychology, Aston University, Birmingham, UK
18. Wiese H, Schipper M, Popova T, Burton AM, Young AW. Personal familiarity of faces, animals, objects, and scenes: Distinct perceptual and overlapping conceptual representations. Cognition 2023; 241:105625. PMID: 37769520; DOI: 10.1016/j.cognition.2023.105625.
Abstract
While face, object, and scene recognition are often studied at a basic categorization level (e.g. "a face", "a car", "a kitchen"), we frequently recognise individual items of these categories as unique entities (e.g. "my mother", "my car", "my kitchen"). This recognition of individual identity is essential to appropriate behaviour in our world. However, relatively little is known about how we recognise individually familiar visual stimuli. Using event-related brain potentials, the present study examined whether and to what extent the underlying neural representations of personally familiar items are similar or different across different categories. In three experiments, we examined the recognition of personally highly familiar faces, animals, indoor scenes, and objects. We observed relatively distinct familiarity effects in an early time window (200-400 ms), with a clearly right-lateralized occipito-temporal scalp distribution for human faces and more bilateral and posterior distributions for other stimulus categories, presumably reflecting access to at least partly discrete visual long-term representations. In contrast, we found clearly overlapping familiarity effects in a later time window (starting 400 to 500 ms after stimulus onset), again with a mainly right occipito-temporal scalp distribution, for all stimulus categories. These later effects appear to reflect the sustained activation of conceptual properties relevant to any potential interaction. We conclude that familiarity for items from the various visual stimulus categories tested here is represented differently at the perceptual level, while relatively overlapping conceptual mechanisms allow for the preparation of impending potential interaction with the environment.
19. Li C, Ficco L, Trapp S, Rostalski SM, Korn L, Kovács G. The effect of context congruency on fMRI repetition suppression for objects. Neuropsychologia 2023; 188:108603. PMID: 37270029; DOI: 10.1016/j.neuropsychologia.2023.108603.
Abstract
The recognition of objects is strongly facilitated when they are presented in the context of other objects (Biederman, 1972). Such contexts facilitate perception and induce expectations of context-congruent objects (Trapp and Bar, 2015). The neural mechanisms underlying these facilitatory effects of context on object processing, however, are not yet fully understood. In the present study, we investigated how context-induced expectations affect subsequent object processing. We used functional magnetic resonance imaging and measured repetition suppression (RS) as a proxy for prediction error processing. Participants viewed pairs of alternating or repeated object images which were preceded by context-congruent, context-incongruent, or neutral cues. We found stronger RS following congruent as compared to incongruent or neutral cues in the object-sensitive lateral occipital cortex. Interestingly, this stronger effect was driven by enhanced responses to alternating stimulus pairs in congruent contexts, rather than by suppressed responses to repeated stimulus pairs, which emphasizes the contribution of surprise-related response enhancement to the contextual modulation of RS when expectations are violated. In addition, in the congruent condition, we discovered significant functional connectivity between object-responsive and frontal cortical regions, as well as between object-responsive regions and the fusiform gyrus. Our findings indicate that prediction errors, reflected in enhanced brain responses to violated contextual expectations, underlie the facilitating effect of context during object perception.
Affiliation(s)
- Chenglin Li
- School of Psychology, Zhejiang Normal University, China; Department of Biological Psychology and Cognitive Neurosciences, Institute of Psychology, Friedrich-Schiller-Universität Jena, Germany
- Linda Ficco
- Department of General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich-Schiller-Universität Jena, Germany; Department of Linguistics and Cultural Evolution, International Max Planck Research School for the Science of Human History, Jena, Germany
- Sabrina Trapp
- Macromedia University of Applied Sciences, Munich, Germany
- Sophie-Marie Rostalski
- Department of Biological Psychology and Cognitive Neurosciences, Institute of Psychology, Friedrich-Schiller-Universität Jena, Germany
- Lukas Korn
- Department of Biological Psychology and Cognitive Neurosciences, Institute of Psychology, Friedrich-Schiller-Universität Jena, Germany
- Gyula Kovács
- Department of Biological Psychology and Cognitive Neurosciences, Institute of Psychology, Friedrich-Schiller-Universität Jena, Germany.
20. Bracci S, Mraz J, Zeman A, Leys G, Op de Beeck H. The representational hierarchy in human and artificial visual systems in the presence of object-scene regularities. PLoS Comput Biol 2023; 19:e1011086. PMID: 37115763; PMCID: PMC10171658; DOI: 10.1371/journal.pcbi.1011086.
Abstract
Human vision is still largely unexplained. Computer vision has made impressive progress on this front, but it is still unclear to what extent artificial neural networks approximate human object vision at the behavioral and neural levels. Here, we investigated whether machine object vision mimics the representational hierarchy of human object vision with an experimental design that allows testing within-domain representations for animals and scenes, as well as across-domain representations reflecting their real-world contextual regularities, such as animal-scene pairs that often co-occur in the visual environment. We found that DCNNs trained on object recognition acquire representations, in their late processing stage, that closely capture human conceptual judgements about the co-occurrence of animals and their typical scenes. Likewise, the DCNNs' representational hierarchy shows surprising similarities with the representational transformations emerging in domain-specific ventrotemporal areas up to domain-general frontoparietal areas. Despite these remarkable similarities, the underlying information processing differs. The ability of neural networks to learn a human-like high-level conceptual representation of object-scene co-occurrence depends upon the amount of object-scene co-occurrence present in the image set, thus highlighting the fundamental role of training history. Further, although mid/high-level DCNN layers represent the category division for animals and scenes as observed in VTC, their information content shows reduced domain-specific representational richness. To conclude, by testing within- and between-domain selectivity while manipulating contextual regularities, we reveal unknown similarities and differences in the information processing strategies employed by human and artificial visual systems.
Affiliation(s)
- Stefania Bracci
- Center for Mind/Brain Sciences-CIMeC, University of Trento, Rovereto, Italy; KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Jakob Mraz
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Astrid Zeman
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Gaëlle Leys
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Hans Op de Beeck
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium

21
Yu X, Zhou Z, Becker SI, Boettcher SEP, Geng JJ. Good-enough attentional guidance. Trends Cogn Sci 2023; 27:391-403. [PMID: 36841692 DOI: 10.1016/j.tics.2023.01.007]
Abstract
Theories of attention posit that attentional guidance operates on information held in a target template within memory. The template is often thought to contain veridical target features, akin to a photograph, and to guide attention to objects that match the exact target features. However, recent evidence suggests that attentional guidance is highly flexible and often guided by non-veridical features, a subset of features, or only associated features. We integrate these findings and propose that attentional guidance maximizes search efficiency based on a 'good-enough' principle to rapidly localize candidate target objects. Candidates are then serially interrogated to make target-match decisions using more precise information. We suggest that good-enough guidance optimizes the speed-accuracy-effort trade-offs inherent in each stage of visual search.
Affiliation(s)
- Xinger Yu
- Center for Mind and Brain, University of California Davis, Davis, CA, USA; Department of Psychology, University of California Davis, Davis, CA, USA
- Zhiheng Zhou
- Center for Mind and Brain, University of California Davis, Davis, CA, USA
- Stefanie I Becker
- School of Psychology, University of Queensland, Brisbane, QLD, Australia
- Joy J Geng
- Center for Mind and Brain, University of California Davis, Davis, CA, USA; Department of Psychology, University of California Davis, Davis, CA, USA

22
Niimi R, Saiki T, Yokosawa K. Auditory scene context facilitates visual recognition of objects in consistent visual scenes. Atten Percept Psychophys 2023; 85:1267-1275. [PMID: 36977906 DOI: 10.3758/s13414-023-02699-0]
Abstract
Visual object recognition is facilitated by contextually consistent scenes in which the object is embedded. Scene-gist representations extracted from the background scenery yield this scene consistency effect. Here we examined whether the scene consistency effect is specific to the visual domain or whether it is crossmodal. Across four experiments, we assessed naming accuracy for briefly presented visual objects. In each trial, a 4-s sound clip was presented, and a visual scene containing the target object was shown briefly at the end of the sound clip. In a consistent-sound condition, an environmental sound associated with the scene in which the target object typically appears was presented (e.g., forest noise for a bear target object). In an inconsistent-sound condition, a sound clip contextually inconsistent with the target object was presented (e.g., city noise for a bear). In a control-sound condition, a nonsensical sound (a sawtooth wave) was presented. When target objects were embedded in contextually consistent visual scenes (Experiment 1: a bear against a forest background), consistent sounds increased object-naming accuracy. In contrast, sound condition had no significant effect when target objects were embedded in contextually inconsistent visual scenes (Experiment 2: a bear at a pedestrian crossing) or in a blank background (Experiments 3 and 4). These results suggest that auditory scene context has weak or no direct influence on visual object recognition. It seems likely that consistent auditory scenes indirectly facilitate visual object recognition by promoting visual scene processing.
23
Jérémie JN, Perrinet LU. Ultrafast Image Categorization in Biology and Neural Models. Vision (Basel) 2023; 7:29. [PMID: 37092462 PMCID: PMC10123664 DOI: 10.3390/vision7020029]
Abstract
Humans can categorize images very efficiently; in particular, they detect the presence of an animal very quickly. Recently, deep learning algorithms based on convolutional neural networks (CNNs) have achieved higher-than-human accuracy for a wide range of visual categorization tasks. However, the tasks on which these artificial networks are typically trained and evaluated tend to be highly specialized and do not generalize well; for example, accuracy drops after image rotation. In this respect, biological visual systems are more flexible and efficient than artificial systems for more general tasks, such as recognizing an animal. To further the comparison between biological and artificial neural networks, we re-trained the standard VGG-16 CNN on two independent tasks that are ecologically relevant to humans: detecting the presence of an animal or of an artifact. We show that re-training the network achieves a human-like level of performance, comparable to that reported in psychophysical tasks. In addition, we show that categorization is better when the outputs of the two models are combined. Indeed, animals (e.g., lions) tend to be less present in photographs that contain artifacts (e.g., buildings). Furthermore, these re-trained models were able to reproduce some unexpected behavioral observations from human psychophysics, such as robustness to rotation (e.g., an upside-down or tilted image) or to a grayscale transformation. Finally, we quantified the number of CNN layers required to achieve such performance and showed that good accuracy for ultrafast image categorization can be achieved with only a few layers, challenging the belief that image recognition requires deep sequential analysis of visual objects. We hope to extend this framework to biomimetic deep neural architectures designed for ecological tasks, but also to guide future model-based psychophysical experiments that would deepen our understanding of biological vision.
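Re-training a pretrained backbone on a new binary task amounts to freezing the convolutional features and replacing the final read-out layer. A minimal transfer-learning sketch with torchvision's pretrained VGG-16 (the two-class setup, data, and hyperparameters are illustrative assumptions, not the authors' training code):

```python
# Minimal VGG-16 re-training sketch: freeze features, retrain the last layer.
import torch
import torch.nn as nn
from torchvision import models

# Pretrained weights are downloaded on first use.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in model.features.parameters():
    p.requires_grad = False           # keep convolutional features fixed

model.classifier[6] = nn.Linear(4096, 2)  # new 2-way read-out (e.g., animal vs. not)

optimizer = torch.optim.Adam(model.classifier[6].parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One optimization step on a batch of (N, 3, 224, 224) images."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random data
print(train_step(torch.randn(4, 3, 224, 224), torch.tensor([0, 1, 0, 1])))
```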
Affiliation(s)
- Jean-Nicolas Jérémie
- Institut de Neurosciences de la Timone (UMR 7289), Aix Marseille University, CNRS, 13005 Marseille, France
- Laurent U. Perrinet
- Institut de Neurosciences de la Timone (UMR 7289), Aix Marseille University, CNRS, 13005 Marseille, France

24
Schüz S, Gatt A, Zarrieß S. Rethinking symbolic and visual context in Referring Expression Generation. Front Artif Intell 2023; 6:1067125. [PMID: 37026020 PMCID: PMC10072327 DOI: 10.3389/frai.2023.1067125]
Abstract
Situational context is crucial for linguistic reference to visible objects, since the same description can refer unambiguously to an object in one context but be ambiguous or misleading in others. This also applies to Referring Expression Generation (REG), where the production of identifying descriptions is always dependent on a given context. Research in REG has long represented visual domains through symbolic information about objects and their properties, to determine identifying sets of target features during content determination. In recent years, research in visual REG has turned to neural modeling and recast the REG task as an inherently multimodal problem, looking at more natural settings such as generating descriptions for objects in photographs. Characterizing the precise ways in which context influences generation is challenging in both paradigms, as context notoriously lacks precise definitions and categorization. In multimodal settings, however, these problems are further exacerbated by the increased complexity and low-level representation of perceptual inputs. The main goal of this article is to provide a systematic review of the types and functions of visual context across various approaches to REG so far, and to argue for integrating and extending different perspectives on visual context that currently co-exist in research on REG. By analyzing the ways in which symbolic REG integrates context in rule-based approaches, we derive a set of categories of contextual integration, including the distinction between positive and negative semantic forces exerted by context during reference generation. Using this as a framework, we show that existing work in visual REG has so far considered only some of the ways in which visual context can facilitate end-to-end reference generation. Connecting with preceding research in related areas, we highlight, as possible directions for future research, some additional ways in which contextual integration can be incorporated into REG and other multimodal generation tasks.
Affiliation(s)
- Simeon Schüz
- Faculty of Linguistics and Literary Studies, Bielefeld University, Bielefeld, Germany
- Albert Gatt
- Natural Language Processing Group, Department of Information and Computing Sciences, Utrecht University, Utrecht, Netherlands
- Sina Zarrieß
- Faculty of Linguistics and Literary Studies, Bielefeld University, Bielefeld, Germany

25
Abstract
Research has recently shown that efficient selection relies on the implicit extraction of environmental regularities, known as statistical learning. Although this has been demonstrated for scenes, similar learning arguably also occurs for objects. To test this, we developed a paradigm that allowed us to track attentional priority at specific object locations irrespective of the object's orientation in three experiments with young adults (all Ns = 80). Experiments 1a and 1b established within-object statistical learning by demonstrating increased attentional priority at relevant object parts (e.g., hammerhead). Experiment 2 extended this finding by demonstrating that learned priority generalized to viewpoints in which learning never took place. Together, these findings demonstrate that as a function of statistical learning, the visual system not only is able to tune attention relative to specific locations in space but also can develop preferential biases for specific parts of an object independently of the viewpoint of that object.
Affiliation(s)
- Dirk van Moorselaar
- Department of Experimental and Applied Psychology, Vrije Universiteit Amsterdam; Institute of Brain and Behaviour Amsterdam (iBBA), The Netherlands
- Jan Theeuwes
- Department of Experimental and Applied Psychology, Vrije Universiteit Amsterdam; Institute of Brain and Behaviour Amsterdam (iBBA), The Netherlands; William James Center for Research, ISPA-Instituto Universitario

26
Bracci S, Op de Beeck HP. Understanding Human Object Vision: A Picture Is Worth a Thousand Representations. Annu Rev Psychol 2023; 74:113-135. [PMID: 36378917 DOI: 10.1146/annurev-psych-032720-041031]
Abstract
Objects are the core meaningful elements in our visual environment. Classic theories of object vision focus upon object recognition and are elegant and simple. Some of their proposals still stand, yet the simplicity is gone. Recent evolutions in behavioral paradigms, neuroscientific methods, and computational modeling have allowed vision scientists to uncover the complexity of the multidimensional representational space that underlies object vision. We review these findings and propose that the key to understanding this complexity is to relate object vision to the full repertoire of behavioral goals that underlie human behavior, running far beyond object recognition. There might be no such thing as core object recognition, and if it exists, then its importance is more limited than traditionally thought.
Affiliation(s)
- Stefania Bracci
- Center for Mind/Brain Sciences, University of Trento, Rovereto, Italy
- Hans P Op de Beeck
- Leuven Brain Institute, Research Unit Brain & Cognition, KU Leuven, Leuven, Belgium

27
Do chimpanzees see a face on Mars? A search for face pareidolia in chimpanzees. Anim Cogn 2022; 26:885-905. [PMID: 36583802 DOI: 10.1007/s10071-022-01739-w]
Abstract
We sometimes perceive meaningful patterns or images in random arrangements of colors and shapes. This phenomenon is called pareidolia and has recently been studied intensively, especially face pareidolia. In contrast, there are few comparative-cognitive studies of face pareidolia in nonhuman primates. This study explored behavioral evidence for face pareidolia in chimpanzees using visual search and matching tasks. Faces are processed in a configural manner, and their perception and recognition are hampered by inversion and by misalignment of the top and bottom halves. We investigated whether the same effects occur in visual search for face-like objects. The results showed an effect of misalignment. On the other hand, consistent results were not obtained with photographs of fruits. When only the top or the bottom half of a face-like object was presented, chimpanzees performed better in the top-half condition, suggesting the importance of the eye region in face pareidolia. In positive-control experiments, chimpanzees performed the same task with human faces, and human participants performed it with face-like objects and fruits. Chimpanzees searched inefficiently for inverted and misaligned faces, and humans did so for manipulated face-like objects. Finally, to examine the role of face awareness, we tested chimpanzees on matching a human face to a face-like object but obtained no substantial evidence that they saw the face-like object as a "face." Based on these results, we discuss the extent and limits of face pareidolia in chimpanzees.
28
Hayes TR, Henderson JM. Scene inversion reveals distinct patterns of attention to semantically interpreted and uninterpreted features. Cognition 2022; 229:105231. [DOI: 10.1016/j.cognition.2022.105231]
29
Niu J, Zhang Z, Sun Y, Wang X, Ni J, Qin H. The driver's instantaneous situation awareness when the alarm rings during the take-over of vehicle control in automated driving. Traffic Inj Prev 2022; 23:478-482. [PMID: 36170041 DOI: 10.1080/15389588.2022.2122714]
Abstract
OBJECTIVE: The driver's instantaneous situation awareness during the take-over of vehicle control in automated driving has not yet been thoroughly investigated. The proposed research can provide a better understanding of drivers' perceptual characteristics and identify the most urgent information requirements of the on-site scenario at the moment a driver's gaze returns from a distractor to the driving scene.
METHODS: We conducted a simulated automated-driving experiment to study participants' instantaneous hazard perception and judgment. Scene pictures, displayed for durations on the order of milliseconds, were used to imitate what distracted drivers would see when returning their gaze from a distracting activity to the road.
RESULTS: The results show that driving state, scene presentation time, and hazard level affect drivers' instantaneous situation awareness. In addition, scene-perception accuracy was much lower in the group that played games during automated driving than in the group that chatted with the copilot. Longer picture durations decreased the accuracy of hazard identification, whereas shorter picture durations increased the accuracy of hazard perception and the hazard rating score.
CONCLUSIONS: Distraction reduces the accuracy of drivers' instantaneous scene perception, and drivers behave more cautiously in decision making when driving situations are more hazardous. This study provides a theoretical basis for the design of hazard warning information for automated driving.
Affiliation(s)
- Jianwei Niu
- School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, China
- Zhen Zhang
- School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, China
- Yipin Sun
- School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, China
- Xiai Wang
- School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, China
- Jie Ni
- School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, China
- Hua Qin
- School of Mechanical-electronic and Automobile Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China

30
Zhang X, Zhao X, Dang J, Liu L. Physical segregation impedes psychological integration: scene inconsistency increases prejudice against minority groups. Curr Psychol 2022. [DOI: 10.1007/s12144-020-01085-5]
31
Helbing J, Draschkow D, L-H Võ M. Auxiliary Scene-Context Information Provided by Anchor Objects Guides Attention and Locomotion in Natural Search Behavior. Psychol Sci 2022; 33:1463-1476. [PMID: 35942922 DOI: 10.1177/09567976221091838]
Abstract
Successful adaptive behavior requires efficient attentional and locomotive systems. Previous research has thoroughly investigated how we achieve this efficiency during natural behavior by exploiting prior knowledge related to targets of our actions (e.g., attending to metallic targets when looking for a pot) and to the environmental context (e.g., looking for the pot in the kitchen). Less is known about whether and how individual nontarget components of the environment support natural behavior. In our immersive virtual reality task, 24 adult participants searched for objects in naturalistic scenes in which we manipulated the presence and arrangement of large, static objects that anchor predictions about targets (e.g., the sink provides a prediction for the location of the soap). Our results show that gaze and body movements in this naturalistic setting are strongly guided by these anchors. These findings demonstrate that objects auxiliary to the target are incorporated into the representations guiding attention and locomotion.
Affiliation(s)
- Jason Helbing
- Scene Grammar Lab, Department of Psychology, Goethe University Frankfurt
- Dejan Draschkow
- Brain and Cognition Laboratory, Department of Experimental Psychology, University of Oxford; Oxford Centre for Human Brain Activity, Wellcome Centre for Integrative Neuroimaging, Department of Psychiatry, University of Oxford
- Melissa L-H Võ
- Scene Grammar Lab, Department of Psychology, Goethe University Frankfurt

32
Theeuwes J, Bogaerts L, van Moorselaar D. What to expect where and when: how statistical learning drives visual selection. Trends Cogn Sci 2022; 26:860-872. [PMID: 35840476 DOI: 10.1016/j.tics.2022.06.001]
Abstract
While the visual environment contains massive amounts of information, we should not and cannot pay attention to all events. Instead, we need to direct attention to those events that have proven to be important in the past and suppress those that were distracting and irrelevant. Experiences molded through a learning process enable us to extract and adapt to the statistical regularities in the world. While previous studies have shown that visual statistical learning (VSL) is critical for representing higher order units of perception, here we review the role of VSL in attentional selection. Evidence suggests that through VSL, attentional priority settings are optimally adjusted to regularities in the environment, without intention and without conscious awareness.
Affiliation(s)
- Jan Theeuwes
- Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; Institute Brain and Behavior (iBBA), Amsterdam, the Netherlands; William James Center for Research, ISPA-Instituto Universitario, Lisbon, Portugal
- Louisa Bogaerts
- Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; Institute Brain and Behavior (iBBA), Amsterdam, the Netherlands; Ghent University, Ghent, Belgium
- Dirk van Moorselaar
- Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; Institute Brain and Behavior (iBBA), Amsterdam, the Netherlands

33
Wolfe B, Sawyer BD, Rosenholtz R. Toward a Theory of Visual Information Acquisition in Driving. Hum Factors 2022; 64:694-713. [PMID: 32678682 PMCID: PMC9136385 DOI: 10.1177/0018720820939693]
Abstract
OBJECTIVE: The aim of this study is to describe information acquisition theory, explaining how drivers acquire and represent the information they need.
BACKGROUND: While questions of what drivers are aware of underlie many questions in driver behavior, existing theories do not directly address how drivers in particular, and observers in general, acquire visual information. Understanding the mechanisms of information acquisition is necessary to build predictive models of drivers' representation of the world and can be applied beyond driving to a wide variety of visual tasks.
METHOD: We describe our theory of information acquisition, looking to questions in driver behavior and results from vision science research that speak to its constituent elements. We focus on the intersection of peripheral vision, visual attention, and eye movement planning, and identify how an understanding of these visual mechanisms and processes in the context of information acquisition can inform more complete models of driver knowledge and state.
RESULTS: We set forth our theory of information acquisition, describing the gap in understanding that it fills and how existing questions in this space can be better understood using it.
CONCLUSION: Information acquisition theory provides a new and powerful way to study, model, and predict what drivers know about the world, reflecting our current understanding of visual mechanisms and enabling new theories, models, and applications.
APPLICATION: Using information acquisition theory to understand how drivers acquire, lose, and update their representation of the environment will aid development of driver assistance systems, semiautonomous vehicles, and road safety overall.
34
Contextual cueing in co-active visual search: Joint action allows acquisition of task-irrelevant context. Atten Percept Psychophys 2022; 84:1114-1129. [PMID: 35437702 DOI: 10.3758/s13414-022-02470-x]
Abstract
Repeatedly presenting a target within a stable search array facilitates visual search, an effect termed contextual cueing. Previous solo-performance studies have shown that successful acquisition of contextual memories requires explicit allocation of attentional resources to the task-relevant repeated contexts. By contrast, repeated but task-irrelevant contexts could not be learned when presented together with repeated task-relevant contexts, due to a blocking effect. Here we investigated whether such blocking of context learning is diminished in a social setting, when the task-irrelevant context is task-relevant for a co-actor in a joint-action search mode. We adopted the contextual cueing paradigm and extended it to a co-active search mode. Participants learned a context-cued, color-defined subset of the search displays in the training phase, and their search performance was tested in a transfer phase in which the previously irrelevant and relevant subsets were swapped. The experiments were conducted either in a solo search mode (Experiments 1 and 3) or in a co-active search mode (Experiment 2). Consistent with classical contextual cueing studies, contextual cueing was observed in the training phase of all three experiments. Importantly, however, in the "swapped" test session, a significant contextual cueing effect emerged only in the co-active search mode, not in the solo search mode. Our findings suggest that social context may widen the scope of attention, thus facilitating the acquisition of task-irrelevant contexts.
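The contextual cueing effect itself is conventionally quantified as the response-time benefit for repeated over novel displays. A minimal sketch of that computation on simulated data (subject counts, RT values, and variable names are assumptions, not the study's analysis code):

```python
# Minimal sketch: per-subject contextual cueing effect and a one-sample t test.
import numpy as np

rng = np.random.default_rng(2)
n_subj, n_trials = 24, 96
rt_repeated = rng.normal(720, 80, (n_subj, n_trials))  # ms, repeated contexts
rt_novel    = rng.normal(780, 80, (n_subj, n_trials))  # ms, novel contexts

cueing = rt_novel.mean(axis=1) - rt_repeated.mean(axis=1)   # per-subject benefit
t = cueing.mean() / (cueing.std(ddof=1) / np.sqrt(n_subj))  # t statistic vs. 0
print(f"contextual cueing: {cueing.mean():.1f} ms, t({n_subj - 1}) = {t:.2f}")
```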
35
Comparison of Object Detection in Head-Mounted and Desktop Displays for Congruent and Incongruent Environments. Big Data Cogn Comput 2022. [DOI: 10.3390/bdcc6010028]
Abstract
Virtual reality technologies, including head-mounted displays (HMDs), can benefit psychological research by combining high degrees of experimental control with improved ecological validity. This is due to the strong feeling of being in the displayed environment (presence) experienced by VR users. It is not yet fully explored how using HMDs impacts basic perceptual tasks, such as object perception. In traditional display setups, the congruency between the background environment and the object category has been shown to affect response times in object perception tasks. In this study, we investigated whether this well-established effect is comparable across desktop and HMD devices. Twenty-one participants used both desktop and HMD setups to perform an object identification task, and their subjective presence while experiencing two distinct virtual environments (a beach and a home environment) was subsequently evaluated. Participants were quicker to identify objects in the HMD condition, independent of object-environment congruency, while congruency effects were unaffected. Furthermore, participants reported significantly higher presence in the HMD condition.
36
Brau JM, Sugarman A, Rothlein D, DeGutis J, Esterman M, Fortenbaugh FC. The impact of image degradation and temporal dynamics on sustained attention. J Vis 2022; 22:8. [PMID: 35297998 PMCID: PMC8944397 DOI: 10.1167/jov.22.4.8]
Abstract
Many clinical populations that have sustained-attention deficits also have visual deficits. It is therefore necessary to understand how the quality of visual input and different forms of image degradation contribute to worse performance on sustained-attention tasks, particularly those with dynamic and complex visual stimuli. This study investigated the impact of image degradation on an adapted version of the gradual-onset continuous performance task (gradCPT), in which participants must discriminate between gradually fading city and mountain scenes. Thirty-six normal-vision participants completed the task, which featured two blocks of six resolution and contrast levels. Subjects completed a version with either gradually fading or static image presentations. The results show that decreases in image resolution impair performance under both types of temporal dynamics, whereas decreases in image contrast impair performance only under gradual temporal dynamics. Image-similarity analyses showed that performance is more strongly associated with an observer's ability to extract an image's global spatial layout (i.e., gist) than with local variations in pixel luminance, particularly under gradual image presentation. This work suggests that gradually fading attention paradigms are sensitive to deficits in primary visual function, potentially leading to these issues being misinterpreted as attentional failures.
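The two degradations manipulated here are easy to make concrete. A minimal sketch of how resolution and contrast reductions of this general kind can be implemented with Pillow and NumPy (the specific factors are illustrative assumptions, not the study's calibrated levels):

```python
# Minimal sketch of resolution and contrast degradation of a scene image.
import numpy as np
from PIL import Image

def reduce_resolution(img: Image.Image, factor: int) -> Image.Image:
    """Downsample by `factor`, then upsample back, discarding high frequencies."""
    w, h = img.size
    small = img.resize((w // factor, h // factor), Image.BILINEAR)
    return small.resize((w, h), Image.BILINEAR)

def reduce_contrast(img: Image.Image, scale: float) -> Image.Image:
    """Compress pixel values toward the image mean by `scale` in (0, 1]."""
    arr = np.asarray(img).astype(float)
    out = arr.mean() + scale * (arr - arr.mean())
    return Image.fromarray(np.clip(out, 0, 255).astype(np.uint8))

scene = Image.new("L", (256, 256))  # stand-in for a city/mountain photograph
degraded = reduce_contrast(reduce_resolution(scene, 4), 0.5)
```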
Affiliation(s)
- Julia M Brau
- Translational Research Center for TBI and Stress Disorders (TRACTS), VA Boston Healthcare System, Boston, MA, USA
- Alexander Sugarman
- Translational Research Center for TBI and Stress Disorders (TRACTS), VA Boston Healthcare System, Boston, MA, USA
- David Rothlein
- Translational Research Center for TBI and Stress Disorders (TRACTS), VA Boston Healthcare System, Boston, MA, USA; Boston Attention and Learning Lab (BALLAB), VA Boston Healthcare System, Boston, MA, USA; National Center for PTSD, VA Boston Healthcare System, Boston, MA, USA
- Joseph DeGutis
- Translational Research Center for TBI and Stress Disorders (TRACTS), VA Boston Healthcare System, Boston, MA, USA; Boston Attention and Learning Lab (BALLAB), VA Boston Healthcare System, Boston, MA, USA; Department of Psychiatry, Harvard Medical School, Cambridge, MA, USA
- Michael Esterman
- National Center for PTSD, VA Boston Healthcare System, Boston, MA, USA; Translational Research Center for TBI and Stress Disorders (TRACTS), VA Boston Healthcare System, Boston, MA, USA; Boston Attention and Learning Lab (BALLAB), VA Boston Healthcare System, Boston, MA, USA; Department of Psychiatry, Boston University School of Medicine, Boston, MA, USA
- Francesca C Fortenbaugh
- Translational Research Center for TBI and Stress Disorders (TRACTS), VA Boston Healthcare System, Boston, MA, USA; Department of Psychiatry, Harvard Medical School, Cambridge, MA, USA

37
Simulating background settings during spoken and written sentence comprehension. Psychon Bull Rev 2022; 29:1426-1439. [PMID: 35132579 PMCID: PMC8821844 DOI: 10.3758/s13423-022-02061-9]
Abstract
Previous findings from the sentence-picture verification task demonstrated that comprehenders simulate visual information about intrinsic attributes of described objects. Of interest is whether comprehenders also simulate the setting in which an event takes place, such as its lighting. To address this question, four experiments were conducted in which participants (total N = 412) either listened to (Experiment 1) or read (Experiment 3) sentences like "The sun is shining onto a bench," followed by a picture with the matching object (bench) and either the matching lighting condition of the scene (a sunlit bench against a sunlit background) or the mismatching one (a moonlit bench against a moonlit background). In both experiments, response times (RTs) were shorter when the lighting condition of the pictured scene matched the one implied by the sentence. However, no difference in RTs was observed when the processing of spoken sentences was interfered with by visual noise (Experiment 2). Specifically, the results showed that visual interference disrupted incongruent visual content activated by listening to the sentences, as evidenced by faster responses on mismatching trials. Similarly, no difference in RTs was observed when the lighting condition of the pictured scene matched the sentence context but the target object presented for verification mismatched it (Experiment 4). Thus, the locus of the simulation effect is the lighting representation of the target object rather than the lighting representation of the background. These findings support embodied and situated accounts of cognition, suggesting that comprehenders do not simulate objects independently of background settings.
38
Chen C, Zou X, Zeng Z, Cheng Z, Zhang L, Hoi SCH. Exploring Structural Knowledge for Automated Visual Inspection of Moving Trains. IEEE Trans Cybern 2022; 52:1233-1246. [PMID: 32559172 DOI: 10.1109/tcyb.2020.2998126]
Abstract
Deep learning methods are becoming the de facto standard for generic visual recognition. However, their adaptation to industrial scenarios, such as visual recognition for machines or production lines, which consist of countless components, has not yet been well investigated. Compared with generic object detection, these scenarios contain strong structural knowledge (e.g., fixed relative positions of components, component relationships). A case worth exploring is automated visual inspection for trains, where there are various correlated components. However, the dominant object detection paradigm is limited by treating the visual features of each object region separately, without considering common-sense knowledge among objects. In this article, we propose a novel automated visual inspection framework for trains that explores structural knowledge for train component detection, called SKTCD. SKTCD is an end-to-end trainable framework in which the visual features of train components and structural knowledge (including hierarchical scene contexts and spatial-aware component relationships) are jointly exploited for train component detection. We propose novel residual multiple gated recurrent units (Res-MGRUs) that can optimally fuse the visual features of train components and messages from the structural knowledge in a weighted-recurrent way. To verify the feasibility of SKTCD, a dataset of high-resolution images captured from moving trains has been collected, in which 18 590 critical train components are manually annotated. Extensive experiments on this dataset and on the PASCAL VOC dataset demonstrate that SKTCD significantly outperforms challenging existing baselines. The dataset as well as the source code can be downloaded online (https://github.com/smartprobe/SKCD).
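To give a feel for the general mechanism (gated recurrent fusion of a region's feature with messages from related components, wrapped in a residual connection), here is a minimal PyTorch sketch. It is an illustration of that idea only, not the paper's Res-MGRU implementation; all dimensions and the message-passing scheme are assumptions:

```python
# Minimal sketch: a GRU cell refines a component feature with context messages.
import torch
import torch.nn as nn

class ResidualGatedFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.cell = nn.GRUCell(input_size=dim, hidden_size=dim)

    def forward(self, feature: torch.Tensor, messages: list) -> torch.Tensor:
        h = feature
        for m in messages:          # one gated update per incoming message
            h = self.cell(m, h)
        return h + feature          # residual connection around the fusion

dim = 256
fuse = ResidualGatedFusion(dim)
component = torch.randn(1, dim)                       # region feature of one part
neighbours = [torch.randn(1, dim) for _ in range(3)]  # context from related parts
refined = fuse(component, neighbours)
print(refined.shape)  # torch.Size([1, 256])
```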
39
Spaak E, Peelen MV, de Lange FP. Scene Context Impairs Perception of Semantically Congruent Objects. Psychol Sci 2022; 33:299-313. [PMID: 35020519 DOI: 10.1177/09567976211032676]
Abstract
Visual scene context is well-known to facilitate the recognition of scene-congruent objects. Interestingly, however, according to predictive-processing accounts of brain function, scene congruency may lead to reduced (rather than enhanced) processing of congruent objects, compared with incongruent ones, because congruent objects elicit reduced prediction-error responses. We tested this counterintuitive hypothesis in two online behavioral experiments with human participants (N = 300). We found clear evidence for impaired perception of congruent objects, both in a change-detection task measuring response times and in a bias-free object-discrimination task measuring accuracy. Congruency costs were related to independent subjective congruency ratings. Finally, we show that the reported effects cannot be explained by low-level stimulus confounds, response biases, or top-down strategy. These results provide convincing evidence for perceptual congruency costs during scene viewing, in line with predictive-processing theory.
Affiliation(s)
- Eelke Spaak
- Donders Institute for Brain, Cognition and Behaviour, Radboud University
- Marius V Peelen
- Donders Institute for Brain, Cognition and Behaviour, Radboud University
- Floris P de Lange
- Donders Institute for Brain, Cognition and Behaviour, Radboud University

40
Abstract
During natural vision, our brains are constantly exposed to complex, but regularly structured environments. Real-world scenes are defined by typical part-whole relationships, where the meaning of the whole scene emerges from configurations of localized information present in individual parts of the scene. Such typical part-whole relationships suggest that information from individual scene parts is not processed independently, but that there are mutual influences between the parts and the whole during scene analysis. Here, we review recent research that used a straightforward, but effective approach to study such mutual influences: By dissecting scenes into multiple arbitrary pieces, these studies provide new insights into how the processing of whole scenes is shaped by their constituent parts and, conversely, how the processing of individual parts is determined by their role within the whole scene. We highlight three facets of this research: First, we discuss studies demonstrating that the spatial configuration of multiple scene parts has a profound impact on the neural processing of the whole scene. Second, we review work showing that cortical responses to individual scene parts are shaped by the context in which these parts typically appear within the environment. Third, we discuss studies demonstrating that missing scene parts are interpolated from the surrounding scene context. Bridging these findings, we argue that efficient scene processing relies on an active use of the scene's part-whole structure, where the visual brain matches scene inputs with internal models of what the world should look like.
Affiliation(s)
- Daniel Kaiser
- Justus-Liebig-Universität Gießen, Germany; Philipps-Universität Marburg, Germany; University of York, United Kingdom
- Radoslaw M Cichy
- Freie Universität Berlin, Germany; Humboldt-Universität zu Berlin, Germany; Bernstein Centre for Computational Neuroscience Berlin, Germany

41
Castellotti S, Scipioni L, Mastandrea S, Del Viva MM. Pupil responses to implied motion in figurative and abstract paintings. PLoS One 2021; 16:e0258490. [PMID: 34634092 PMCID: PMC8504727 DOI: 10.1371/journal.pone.0258490]
Abstract
Motion can be perceived in static images, such as photos and figurative paintings that depict realistic subjects in motion, with or without directional information (e.g., motion blur or speed lines). An impression of motion can be achieved even in non-realistic static images, such as motion illusions and abstract paintings. It has been shown that visual motion processing affects the diameter of the pupil, which responds differently to real, illusory, and implied motion (IM) in photographs; it has been suggested that these different effects might be due to top-down modulation from the different cortical areas underlying their processing. It is worthwhile to investigate pupillary responses to figurative paintings, since they require an even higher level of interpretation than photos of the same kinds of subjects, given the complexity of the cognitive processes involved in the aesthetic experience. Pupil responses to abstract paintings also allow us to study the effect of IM perception in representations devoid of real-life motion cues. We measured pupil responses to IM in figurative and abstract artworks depicting static and dynamic scenes, as rated by a large group of individuals who did not participate in the subsequent experiment. Since the pupillary response is modulated by subjective image interpretation, a motion rating test was used to correct individual pupil data according to whether participants actually perceived motion in the paintings. Pupil responses to movies showing figurative and abstract subjects, and to motion illusions, were also measured, to compare real and illusory motion with painted IM. Movies, both figurative and abstract, elicited the largest pupillary dilation of all stimuli, whereas motion illusions caused the smallest pupil size, as previously shown. Interestingly, pupil responses to IM depended on the painting's style. Figurative paintings depicting moving subjects caused more dilation than those representing static figures, and pupil size increased with the strength of IM, as already found with realistic photos. The opposite effect was obtained with abstract artworks: abstract paintings depicting motion produced less dilation than those depicting stillness. In either case, these results reflect the individual's subjective perception of dynamism, as the very same paintings could induce opposite responses in observers who interpreted them as static or dynamic. Overall, our data show that pupil size depends on high-level interpretation of motion in paintings, even when they do not represent real-world scenes. Our findings further suggest that the pupil is modulated by multiple top-down cortical mechanisms, involving the processing of motion, attention, memory, imagination, and other cognitive functions necessary for a complete aesthetic experience.
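Pupillometry analyses of this kind typically start from a standard preprocessing step: subtractive baseline correction of each trial's trace, followed by averaging within an analysis window. A minimal sketch on simulated traces (sampling rate, window choices, and the simulated response shape are assumptions, not the authors' pipeline):

```python
# Minimal sketch: baseline-corrected pupil traces and a mean dilation measure.
import numpy as np

fs = 60                                  # samples per second
t = np.arange(-0.5, 3.0, 1 / fs)         # time relative to stimulus onset (s)
rng = np.random.default_rng(3)

# 40 simulated trials: a slow dilation after ~0.5 s on top of noise
trials = 2.5 + 0.2 * (t > 0.5) * np.log1p(t.clip(0)) \
         + rng.normal(0, 0.05, (40, t.size))

baseline = trials[:, t < 0].mean(axis=1, keepdims=True)  # pre-stimulus mean
corrected = trials - baseline                             # change from baseline (mm)

window = (t > 1.0) & (t < 3.0)           # analysis window for dilation
print(f"mean dilation: {corrected[:, window].mean():.3f} mm")
```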
Affiliation(s)
- Lisa Scipioni
- Department of Neurofarba, University of Florence, Florence, Italy

42
Gronau N. To Grasp the World at a Glance: The Role of Attention in Visual and Semantic Associative Processing. J Imaging 2021; 7:jimaging7090191. [PMID: 34564117 PMCID: PMC8470651 DOI: 10.3390/jimaging7090191]
Abstract
Associative relations among words, concepts and percepts are the core building blocks of high-level cognition. When viewing the world ‘at a glance’, the associative relations between objects in a scene, or between an object and its visual background, are extracted rapidly. The extent to which such relational processing requires attentional capacity, however, has been heavily disputed over the years. In the present manuscript, I review studies investigating scene–object and object–object associative processing. I then present a series of studies in which I assessed the necessity of spatial attention to various types of visual–semantic relations within a scene. Importantly, in all studies, the spatial and temporal aspects of visual attention were tightly controlled in an attempt to minimize unintentional attention shifts from ‘attended’ to ‘unattended’ regions. Pairs of stimuli—either objects, scenes or a scene and an object—were briefly presented on each trial, while participants were asked to detect a pre-defined target category (e.g., an animal, a nonsense shape). Response times (RTs) to the target detection task were registered when visual attention spanned both stimuli in a pair vs. when attention was focused on only one of two stimuli. Among non-prioritized stimuli that were not defined as to-be-detected targets, findings consistently demonstrated rapid associative processing when stimuli were fully attended, i.e., shorter RTs to associated than unassociated pairs. Focusing attention on a single stimulus only, however, largely impaired this relational processing. Notably, prioritized targets continued to affect performance even when positioned at an unattended location, and their associative relations with the attended items were well processed and analyzed. Our findings portray an important dissociation between unattended task-irrelevant and task-relevant items: while the former require spatial attentional resources in order to be linked to stimuli positioned inside the attentional focus, the latter may influence high-level recognition and associative processes via feature-based attentional mechanisms that are largely independent of spatial attention.
Affiliation(s)
- Nurit Gronau
- Department of Psychology and Department of Cognitive Science Studies, The Open University of Israel, Raanana 4353701, Israel

43
Rolls ET. Learning Invariant Object and Spatial View Representations in the Brain Using Slow Unsupervised Learning. Front Comput Neurosci 2021; 15:686239. [PMID: 34366818 PMCID: PMC8335547 DOI: 10.3389/fncom.2021.686239]
Abstract
First, neurophysiological evidence for the learning of invariant representations in the inferior temporal visual cortex is described. This includes object and face representations with invariance for position, size, lighting, view and morphological transforms in the temporal lobe visual cortex; global object motion in the cortex in the superior temporal sulcus; and spatial view representations in the hippocampus that are invariant with respect to eye position, head direction, and place. Second, computational mechanisms that enable the brain to learn these invariant representations are proposed. For the ventral visual system, one key adaptation is the use of information available in the statistics of the environment in slow unsupervised learning to learn transform-invariant representations of objects. This contrasts with deep supervised learning in artificial neural networks, which uses training with thousands of exemplars forced into different categories by neuronal teachers. Similar slow learning principles apply to the learning of global object motion in the dorsal visual system leading to the cortex in the superior temporal sulcus. The learning rule that has been explored in VisNet is an associative rule with a short-term memory trace. The feed-forward architecture has four stages, with convergence from stage to stage. This type of slow learning is implemented in the brain in hierarchically organized competitive neuronal networks with convergence from stage to stage, with only 4-5 stages in the hierarchy. Slow learning is also shown to help the learning of coordinate transforms using gain modulation in the dorsal visual system extending into the parietal cortex and retrosplenial cortex. Representations are learned that are in allocentric spatial view coordinates of locations in the world and that are independent of eye position, head direction, and the place where the individual is located. This enables hippocampal spatial view cells to use idiothetic, self-motion, signals for navigation when the view details are obscured for short periods.
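The associative rule with a short-term memory trace mentioned here can be sketched compactly: the postsynaptic term in a Hebb-like update is replaced by a decaying trace of recent activity, so that weights come to bind the different views of an object that appear close together in time. A minimal NumPy illustration of a trace rule of this general kind (parameter values, shapes, and the weight normalization are assumptions, not the VisNet code):

```python
# Minimal sketch of a trace learning rule: w += alpha * y_trace * x,
# with y_trace a short-term memory of postsynaptic activity.
import numpy as np

rng = np.random.default_rng(4)
n_in, n_out = 100, 20
w = rng.uniform(0, 0.1, (n_out, n_in))
alpha, eta = 0.01, 0.8               # learning rate; trace persistence

def trace_update(x_seq, w):
    """Present a sequence of transforms (views) of one object and update w."""
    y_trace = np.zeros(n_out)
    for x in x_seq:                  # successive views, close in time
        y = w @ x                    # simple linear activation
        y_trace = (1 - eta) * y + eta * y_trace       # short-term memory trace
        w += alpha * np.outer(y_trace, x)             # associative update
        w /= np.linalg.norm(w, axis=1, keepdims=True) # keep weights bounded
    return w

views = [rng.random(n_in) for _ in range(5)]  # stand-ins for one object's views
w = trace_update(views, w)
```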
Affiliation(s)
- Edmund T Rolls
- Oxford Centre for Computational Neuroscience, Oxford, United Kingdom; Department of Computer Science, University of Warwick, Coventry, United Kingdom

44
Kaiser D, Häberle G, Cichy RM. Coherent natural scene structure facilitates the extraction of task-relevant object information in visual cortex. Neuroimage 2021; 240:118365. [PMID: 34233220 PMCID: PMC8456750 DOI: 10.1016/j.neuroimage.2021.118365] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2020] [Revised: 04/22/2021] [Accepted: 07/03/2021] [Indexed: 11/24/2022] Open
Abstract
Looking for objects within complex natural environments is a task everybody performs multiple times each day. In this study, we explore how the brain uses the typical composition of real-world environments to efficiently solve this task. We recorded fMRI activity while participants performed two different categorization tasks on natural scenes. In the object task, they indicated whether the scene contained a person or a car, while in the scene task, they indicated whether the scene depicted an urban or a rural environment. Critically, each scene was presented in an "intact" way, preserving its coherent structure, or in a "jumbled" way, with information swapped across quadrants. In both tasks, participants' categorization was more accurate and faster for intact scenes. These behavioral benefits were accompanied by stronger responses to intact than to jumbled scenes across high-level visual cortex. To track the amount of object information in visual cortex, we correlated multi-voxel response patterns during the two categorization tasks with response patterns evoked by people and cars in isolation. We found that object information in object- and body-selective cortex was enhanced when the object was embedded in an intact, rather than a jumbled scene. However, this enhancement was only found in the object task: When participants instead categorized the scenes, object information did not differ between intact and jumbled scenes. Together, these results indicate that coherent scene structure facilitates the extraction of object information in a task-dependent way, suggesting that interactions between the object and scene processing pathways adaptively support behavioral goals.
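The "jumbled" condition described here (information swapped across quadrants while local content stays intact) is straightforward to implement. A minimal sketch with NumPy; the diagonal swap pattern is an assumption for illustration, not necessarily the exact scheme used in the study:

```python
# Minimal sketch: swap image quadrants to break global scene structure.
import numpy as np

def jumble_quadrants(img: np.ndarray) -> np.ndarray:
    """Swap diagonally opposite quadrants of an (H, W, ...) image array."""
    h, w = img.shape[0] // 2, img.shape[1] // 2
    out = img.copy()
    out[:h, :w], out[h:2*h, w:2*w] = img[h:2*h, w:2*w].copy(), img[:h, :w].copy()
    out[:h, w:2*w], out[h:2*h, :w] = img[h:2*h, :w].copy(), img[:h, w:2*w].copy()
    return out

scene = np.arange(16).reshape(4, 4)   # toy "image"
print(jumble_quadrants(scene))
```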
Affiliation(s)
- Daniel Kaiser
- Department of Psychology, University of York, York, UK
- Greta Häberle
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany; Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Berlin, Germany; Humboldt-Universität zu Berlin, Faculty of Philosophy, Berlin School of Mind and Brain, Berlin, Germany
- Radoslaw M Cichy
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany; Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Berlin, Germany; Humboldt-Universität zu Berlin, Faculty of Philosophy, Berlin School of Mind and Brain, Berlin, Germany; Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany

45
Nuthmann A, Clayden AC, Fisher RB. The effect of target salience and size in visual search within naturalistic scenes under degraded vision. J Vis 2021; 21:2. [PMID: 33792616 PMCID: PMC8024777 DOI: 10.1167/jov.21.4.2]
Abstract
We address two questions concerning eye guidance during visual search in naturalistic scenes. First, search has been described as a task in which visual salience is unimportant. Here, we revisit this question by using a letter-in-scene search task that minimizes any confounding effects that may arise from scene guidance. Second, we investigate how important the different regions of the visual field are for different subprocesses of search (target localization, verification). In Experiment 1, we manipulated both the salience (low vs. high) and the size (small vs. large) of the target letter (a "T"), and we implemented a foveal scotoma (radius: 1°) in half of the trials. In Experiment 2, observers searched for high- and low-salience targets either with full vision or with a central or peripheral scotoma (radius: 2.5°). In both experiments, we found main effects of salience, with better performance for high-salience targets. In Experiment 1, search was faster for large than for small targets, and high salience helped more for small targets. When searching with a foveal scotoma, performance was relatively unimpaired regardless of the target's salience and size. In Experiment 2, both visual-field manipulations led to search-time costs, but the peripheral scotoma was much more detrimental than the central scotoma. Peripheral vision proved to be important for target localization, and central vision for target verification. Salience affected eye-movement guidance to the target in both central and peripheral vision. Collectively, the results lend support to search models that incorporate salience for predicting eye-movement behavior.
Affiliation(s)
- Antje Nuthmann
- Institute of Psychology, University of Kiel, Germany; Psychology Department, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, UK (http://orcid.org/0000-0003-3338-3434)
- Adam C Clayden
- School of Engineering, Arts, Science and Technology, University of Suffolk, UK; Psychology Department, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, UK

46
The interplay between gaze and consistency in scene viewing: Evidence from visual search by young and older adults. Atten Percept Psychophys 2021; 83:1954-1970. [PMID: 33748905 PMCID: PMC8213592 DOI: 10.3758/s13414-021-02242-z]
Abstract
Searching for an object in a complex scene is influenced by high-level factors such as how much the item would be expected in that setting (semantic consistency). There is also evidence that a person gazing at an object directs our attention towards it. However, there has been little previous research that has helped to understand how we integrate top-down cues such as semantic consistency and gaze to direct attention when searching for an object. Also, there are separate lines of evidence to suggest that older adults may be more influenced by semantic factors and less by gaze cues compared to younger counterparts, but this has not been investigated before in an integrated task. In the current study we analysed eye-movements of 34 younger and 30 older adults as they searched for a target object in complex visual scenes. Younger adults were influenced by semantic consistency in their attention to objects, but were more influenced by gaze cues. In contrast, older adults were more guided by semantic consistency in directing their attention, and showed less influence from gaze cues. These age differences in use of high-level cues were apparent early in processing (time to first fixation and probability of immediate fixation) but not in later processing (total time looking at objects and time to make a response). Overall, this pattern of findings indicates that people are influenced by both social cues and prior expectations when processing a complex scene, and the relative importance of these factors depends on age.
47
Theeuwes J. Self-explaining roads: What does visual cognition tell us about designing safer roads? Cogn Res Princ Implic 2021; 6:15. PMID: 33661408; PMCID: PMC8030273; DOI: 10.1186/s41235-021-00281-6.
Abstract
In 1995, Theeuwes and Godthelp published a paper called "self-explaining roads," in which they argued for the development of a new concept for approaching safe road design. Since this publication, self-explaining roads (SER) has become one of the leading principles in road design worldwide. The underlying notion is that roads should be designed in such a way that road users immediately know how to behave and what to expect on these roads. In other words, the environment should be designed such that it elicits adequate and safe behavior. The present paper describes in detail the theoretical basis for the idea of SER and explains why it has such a large effect on human behavior. It is argued that the notion is firmly rooted in the theoretical framework of statistical learning, subjective road categorization, and the associated expectations. The paper illustrates some successful implementations and describes recent developments worldwide.
Affiliation(s)
- Jan Theeuwes
- Department of Experimental and Applied Psychology, Vrije Universiteit Amsterdam, Van der Boechorststraat 7, 1081 BT, Amsterdam, The Netherlands.
- Institute Brain and Behavior Amsterdam (iBBA), Amsterdam, The Netherlands.
48
Bouwkamp FG, de Lange FP, Spaak E. No exploitation of temporal sequence context during visual search. R Soc Open Sci 2021; 8:201565. PMID: 33959327; PMCID: PMC8074974; DOI: 10.1098/rsos.201565.
Abstract
The human visual system can rapidly extract regularities from our visual environment, generating predictive context. It has been shown that spatial predictive context can be used during visual search. We tested whether observers can additionally exploit temporal predictive context based on sequence order, using an extended version of the contextual cueing paradigm. Though we replicated the contextual cueing effect, repeating search scenes in a structured order versus a random order yielded no additional behavioural benefit. This was also true when we looked specifically at participants who showed sensitivity to spatial predictive context. We argue that spatial predictive context during visual search is more readily learned, and subsequently exploited, than temporal predictive context, potentially rendering the latter redundant. In conclusion, unlike spatial context, temporal context is not automatically extracted and used during visual search.
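A minimal sketch of the two behavioural contrasts in a design like this one: the classic contextual-cueing effect (novel minus repeated response times) and the additional temporal-context benefit (repeated-in-random-order minus repeated-in-structured-order). The condition labels and the use of per-condition medians are assumptions for illustration, not the authors' analysis pipeline.

```python
# Hedged sketch of contextual-cueing contrasts; condition names are assumed.
import statistics

def cueing_effects(trials):
    """Return (spatial_benefit_ms, temporal_benefit_ms).

    `trials` is a list of (condition, rt_ms) tuples, where condition is one
    of 'novel', 'repeated_random', 'repeated_structured'.
    """
    by_cond = {}
    for cond, rt in trials:
        by_cond.setdefault(cond, []).append(rt)
    med = {c: statistics.median(rts) for c, rts in by_cond.items()}
    repeated = statistics.median(
        med[c] for c in ('repeated_random', 'repeated_structured'))
    spatial_benefit = med['novel'] - repeated            # classic cueing effect
    temporal_benefit = med['repeated_random'] - med['repeated_structured']
    return spatial_benefit, temporal_benefit

trials = [('novel', 980), ('novel', 1010),
          ('repeated_random', 900), ('repeated_random', 910),
          ('repeated_structured', 905), ('repeated_structured', 895)]
print(cueing_effects(trials))  # e.g. (92.5, 5): clear spatial, ~no temporal benefit
```

On the authors' account, the first difference should be reliably positive while the second hovers around zero.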
Affiliation(s)
- Floortje G. Bouwkamp
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Floris P. de Lange
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Eelke Spaak
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
49
Tatler BW. Searching in CCTV: effects of organisation in the multiplex. Cogn Res Princ Implic 2021; 6:11. PMID: 33599890; PMCID: PMC7892658; DOI: 10.1186/s41235-021-00277-2.
Abstract
CCTV plays a prominent role in public security, health and safety. Monitoring large arrays of CCTV camera feeds is a visually and cognitively demanding task. Arranging the scenes by geographical proximity in the surveilled environment has been recommended to reduce this demand, but empirical tests of this method have failed to find any benefit. The present study tests an alternative method for arranging scenes, based on psychological principles from the literature on visual search and scene perception: grouping scenes by semantic similarity. Searching for a particular scene in the array (a common task in reactive and proactive surveillance) was faster when scenes were arranged by semantic category. This effect was found only when scenes were separated by gaps for participants who were not made aware that scenes in the multiplex were grouped by semantics (Experiment 1), but irrespective of whether scenes were separated by gaps for participants who were made aware of this grouping (Experiment 2). When target frequency varied between scene categories, mirroring unequal distributions of crime over space, the benefit of organising scenes by semantic category was enhanced for scenes in the most frequently searched-for category, without any statistical evidence for a cost when searching for rarely searched-for categories (Experiment 3). The findings extend current understanding of the role of within-scene semantics in visual search to encompass between-scene semantic relationships. Furthermore, the findings suggest that arranging scenes in the CCTV control room by semantic category is likely to assist operators in finding specific scenes during surveillance.
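To illustrate the layout manipulation, here is a small sketch that arranges multiplex feeds so that scenes of the same semantic category sit together, with optional blank cells as the gaps between category blocks (the Experiment 1 manipulation). The category labels, grid width, and `None`-as-gap encoding are assumptions for the sketch, not the study's stimulus software.

```python
# Hedged sketch of a semantically grouped multiplex layout; labels are assumed.
from itertools import groupby

def semantic_layout(scenes, cols=6, gap=True):
    """Return a row-major grid of cell labels, grouping scenes by category.

    `scenes` is a list of (scene_id, category) tuples. A None cell marks a
    blank gap separating category blocks.
    """
    ordered = sorted(scenes, key=lambda s: s[1])  # groupby needs sorted input
    cells = []
    for category, group in groupby(ordered, key=lambda s: s[1]):
        cells.extend(scene_id for scene_id, _ in group)
        if gap:
            cells.append(None)  # visual gap after each category block
    if gap and cells:
        cells.pop()  # no trailing gap after the last category
    while len(cells) % cols:  # pad to full rows for a rectangular multiplex
        cells.append(None)
    return [cells[i:i + cols] for i in range(0, len(cells), cols)]

feeds = [('cam01', 'street'), ('cam02', 'car park'), ('cam03', 'street'),
         ('cam04', 'shop'), ('cam05', 'shop'), ('cam06', 'car park')]
for row in semantic_layout(feeds, cols=4):
    print(row)
```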
Affiliation(s)
- Benjamin W Tatler
- School of Psychology, University of Aberdeen, Aberdeen, AB24 3FX, Scotland, UK.
50
Tkachenko N, Procter R, Jarvis S. Quantifying people's experience during flood events with implications for hazard risk communication. PLoS One 2021; 16:e0244801. PMID: 33411829; PMCID: PMC7790401; DOI: 10.1371/journal.pone.0244801.
Abstract
Semantic drift is a well-known concept in distributional semantics, used to demonstrate gradual, long-term changes in the meanings and sentiments of words; it is largely detectable by studying the composition of large corpora. In our previous work, which used ontological relationships between words and phrases, we established that certain kinds of semantic micro-changes can be found in social media emerging around natural hazard events, such as floods. Our previous results confirmed that semantic drift in social media can be used for early detection of floods and to increase the volume of 'useful' geo-referenced data for event monitoring. In this work we use deep learning to determine whether images associated with 'semantically drifted' social media tags reflect changes in crowd navigation strategies during floods. Our results show that alternative tags can be used to differentiate naïve and experienced crowds witnessing flooding of various degrees of severity.
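For readers unfamiliar with how drift is quantified, here is a minimal sketch in the spirit of the distributional-semantics approach described above: it scores each word by how much its similarity profile to a fixed set of anchor words changes between two time windows (second-order similarity, which avoids aligning independently trained vector spaces). The anchor-word scheme, toy vectors, and scoring function are assumptions for illustration, not the authors' pipeline.

```python
# Hedged sketch of semantic-drift scoring; anchors and vectors are assumed.
import numpy as np

def drift_scores(vec_before, vec_after, anchors):
    """Map word -> 1 - cosine similarity of its anchor profile across periods.

    Each vec_* maps word -> embedding. Independently trained spaces are not
    aligned, so we compare each word's cosine-similarity profile against a
    fixed set of anchor words instead of the raw vectors.
    """
    def profile(vecs, word):
        w = vecs[word]
        return np.array([
            np.dot(w, vecs[a]) / (np.linalg.norm(w) * np.linalg.norm(vecs[a]))
            for a in anchors])
    scores = {}
    shared = set(vec_before) & set(vec_after)
    for word in shared - set(anchors):
        p1, p2 = profile(vec_before, word), profile(vec_after, word)
        cos = np.dot(p1, p2) / (np.linalg.norm(p1) * np.linalg.norm(p2))
        scores[word] = 1.0 - cos  # higher = more drift between the periods
    return scores

# Toy example with random 50-d vectors; real input would be embeddings
# trained on posts from before vs. during a flood event.
rng = np.random.default_rng(0)
vocab = ['flood', 'river', 'rain', 'street', 'safe']
before = {w: rng.normal(size=50) for w in vocab}
after = {w: v + rng.normal(scale=0.3, size=50) for w, v in before.items()}
print(drift_scores(before, after, anchors=['river', 'rain']))
```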
Affiliation(s)
- Nataliya Tkachenko
- Smith School of Enterprise and the Environment, School of Geography and the Environment, Oxford University Centre for the Environment, University of Oxford, Oxford, United Kingdom
- The Alan Turing Institute, The British Library, London, United Kingdom
- Rob Procter
- The Alan Turing Institute, The British Library, London, United Kingdom
- Department of Computer Science, University of Warwick, Coventry, United Kingdom
- Stephen Jarvis
- College of Engineering and Physical Sciences, University of Birmingham, Birmingham, United Kingdom