1
Bräutigam LC, Leuthold H, Mackenzie IG, Mittelstädt V. Exploring behavioral adjustments of proportion congruency manipulations in an Eriksen flanker task with visual and auditory distractor modalities. Mem Cognit 2024;52:91-114. PMID: 37548866; PMCID: PMC10806239; DOI: 10.3758/s13421-023-01447-x.
Abstract
The present study investigated global behavioral adaptation effects to conflict arising from different distractor modalities. Three experiments were conducted using an Eriksen flanker paradigm with constant visual targets but randomly varying auditory or visual distractors. In Experiment 1, the proportion of congruent to incongruent trials was varied for both distractor modalities, whereas in Experiments 2A and 2B, this proportion congruency (PC) manipulation was applied to trials with one distractor modality (inducer) to test potential behavioral transfer effects to trials with the other distractor modality (diagnostic). In all experiments, mean proportion congruency effects (PCEs) were present in trials with a PC manipulation, but there was no evidence of transfer to diagnostic trials in Experiments 2A and 2B. Distributional analyses (delta plots) provided further evidence for distractor modality-specific global behavioral adaptations by showing differences in the slope of delta plots with visual, but not auditory, distractors when the ratio of congruent trials was increased. Thus, it is suggested that distractor modalities constrain global behavioral adaptation effects through the learning of modality-specific memory traces (e.g., distractor-target associations) and/or modality-specific cognitive control processes (e.g., suppression of modality-specific distractor-based activation). Moreover, additional analyses revealed partial transfer of the congruency sequence effect across trials with different distractor modalities, suggesting that distractor modality may differentially affect local and global behavioral adaptations.
Affiliation(s)
- Linda C Bräutigam, Hartmut Leuthold, Ian G Mackenzie, Victor Mittelstädt: Department of Psychology, University of Tübingen, Schleichstrasse 4, 72076 Tübingen, Germany
2
Fu D, Abawi F, Carneiro H, Kerzel M, Chen Z, Strahl E, Liu X, Wermter S. A Trained Humanoid Robot can Perform Human-Like Crossmodal Social Attention and Conflict Resolution. Int J Soc Robot 2023;15:1-16. PMID: 37359433; PMCID: PMC10067521; DOI: 10.1007/s12369-023-00993-3.
Abstract
To enhance human-robot social interaction, it is essential for robots to process multiple social cues in a complex real-world environment. However, incongruency of input information across modalities is inevitable and can be challenging for robots to process. To tackle this challenge, our study adopted the neurorobotic paradigm of crossmodal conflict resolution to make a robot express human-like social attention. For the human study, a behavioural experiment was conducted with 37 participants. We designed a round-table meeting scenario with three animated avatars to improve ecological validity. Each avatar wore a medical mask to obscure the facial cues of the nose, mouth, and jaw. The central avatar shifted its eye gaze while the peripheral avatars generated sound. Gaze direction and sound locations were either spatially congruent or incongruent. We observed that the central avatar's dynamic gaze could trigger crossmodal social attention responses. In particular, human performance was better under the congruent audio-visual condition than under the incongruent condition. For the robot study, our saliency prediction model was trained to detect social cues, predict audio-visual saliency, and attend selectively. After mounting the trained model on the iCub, the robot was exposed to laboratory conditions similar to those of the human experiment. While human performance was superior overall, our trained model demonstrated that it could replicate human-like attention responses.
Affiliation(s)
- Di Fu: CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China; Department of Informatics, University of Hamburg, Hamburg, Germany
- Fares Abawi, Hugo Carneiro, Matthias Kerzel, Erik Strahl, Stefan Wermter: Department of Informatics, University of Hamburg, Hamburg, Germany
- Ziwei Chen, Xun Liu: CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
3
Influence of different cues on the color-flavor incongruency effect during packaging searching. Food Qual Prefer 2022. DOI: 10.1016/j.foodqual.2021.104521.
4
Cui J, Sawamura D, Sakuraba S, Saito R, Tanabe Y, Miura H, Sugi M, Yoshida K, Watanabe A, Tokikuni Y, Yoshida S, Sakai S. Effect of Audiovisual Cross-Modal Conflict during Working Memory Tasks: A Near-Infrared Spectroscopy Study. Brain Sci 2022;12(3):349. PMID: 35326305; PMCID: PMC8946709; DOI: 10.3390/brainsci12030349.
Abstract
Cognitive conflict effects are well characterized within a single modality. However, little is known about cross-modal conflicts and their neural bases. This study characterized two types of visual and auditory cross-modal conflict using working memory tasks and measurements of brain activity. The participants were 31 healthy, right-handed, young male adults. The Paced Auditory Serial Addition Test (PASAT) and the Paced Visual Serial Addition Test (PVSAT) were performed under distractor and no-distractor conditions. The distractor conditions comprised two conditions in which either the PASAT or the PVSAT was the target task and the other served as the distractor stimulus. Additionally, oxygenated hemoglobin (Oxy-Hb) concentration changes in the frontoparietal regions were measured during the tasks. The results showed significantly lower PASAT performance under distractor conditions than under no-distractor conditions, but no such effect for the PVSAT. Oxy-Hb changes in the bilateral ventrolateral prefrontal cortex (VLPFC) and inferior parietal cortex (IPC) increased significantly in the PASAT under distractor compared with no-distractor conditions, but not in the PVSAT. Furthermore, there were significant positive correlations between Δtask performance accuracy and ΔOxy-Hb in the bilateral IPC only in the PASAT. Visual cross-modal conflict significantly impairs auditory task performance, and the bilateral VLPFC and IPC are key regions in inhibiting visual cross-modal distractors.
Affiliation(s)
- Jiahong Cui, Ryuji Saito, Hiroshi Miura, Akihiro Watanabe, Yukina Tokikuni: Graduate School of Health Sciences, Hokkaido University, Sapporo 060-0812, Japan
- Daisuke Sawamura (corresponding author), Kazuki Yoshida, Shinya Sakai: Department of Rehabilitation Science, Faculty of Health Sciences, Hokkaido University, Sapporo 060-0812, Japan
- Satoshi Sakuraba, Susumu Yoshida: Department of Rehabilitation Sciences, Health Sciences University of Hokkaido, Sapporo 061-0293, Japan
- Yoshinobu Tanabe: Department of Rehabilitation, Shinsapporo Paulo Hospital, Sapporo 004-0002, Japan
- Masaaki Sugi: Department of Rehabilitation, Tokeidai Memorial Hospital, Sapporo 060-0031, Japan
5
Sun PW, Hines A. Listening Effort Informed Quality of Experience Evaluation. Front Psychol 2022;12:767840. PMID: 35069342; PMCID: PMC8766726; DOI: 10.3389/fpsyg.2021.767840.
Abstract
Perceived quality of experience for speech listening is influenced by cognitive processing and can affect a listener's comprehension, engagement, and responsiveness. Quality of Experience (QoE) is a paradigm used within the media technology community to assess media quality by linking quantifiable media parameters to perceived quality. The established QoE framework provides a general definition of QoE, categories of possible quality-influencing factors, and an identified QoE formation pathway. These assist researchers in implementing experiments and evaluating perceived quality for a range of applications. The QoE formation pathways in the current framework do not attempt to capture cognitive effort effects, and the standard experimental assessments of QoE minimize the influence of cognitive processes. The impact of cognitive processes, and how they can be captured within the QoE framework, has not been systematically studied by the QoE research community. This article reviews research from the fields of audiology and cognitive science on how cognitive processes influence the quality of the listening experience. Theories of cognitive listening mechanisms are compared with the QoE formation mechanism in terms of quality-contributing factors, experience formation pathways, and measures of experience. The review prompts a proposal to integrate mechanisms from audiology and cognitive science into the existing QoE framework in order to properly account for cognitive load in speech listening. The article concludes with a discussion of how an extended framework could facilitate measurement of QoE in broader and more realistic application scenarios where cognitive effort is a material consideration.
Affiliation(s)
- Pheobe Wenyi Sun, Andrew Hines: QxLab, School of Computer Science, University College Dublin, Dublin, Ireland
6
Attention-Inspired Artificial Neural Networks for Speech Processing: A Systematic Review. Symmetry (Basel) 2021. DOI: 10.3390/sym13020214.
Abstract
Artificial Neural Networks (ANNs) were inspired by the neural networks of the human brain and have been widely applied in speech processing. Application areas of ANNs include speech recognition, speech emotion recognition, language identification, speech enhancement, and speech separation, amongst others. Likewise, given that speech processing performed by humans involves complex cognitive processes known as auditory attention, there has been a growing number of papers proposing ANNs supported by deep learning algorithms in conjunction with some mechanism to achieve symmetry with the human attention process. However, while these ANN approaches include attention, there is no categorization of the attention mechanisms integrated into deep learning algorithms and their relation to human auditory attention. Therefore, we consider it necessary to review the different attention-inspired ANN approaches to show both academic and industry experts the available models for a wide variety of applications. Based on the PRISMA methodology, we present a systematic review of the literature published since 2000 in which deep learning algorithms are applied to diverse problems related to speech processing. In this paper, 133 research works are selected and the following aspects are described: (i) the most relevant features, (ii) the ways in which attention has been implemented, (iii) their hypothetical relationship with human attention, and (iv) the evaluation metrics used. Additionally, the four publications most closely related to human attention were analyzed, and their strengths and weaknesses were determined.
7
Higgen FL, Ruppel P, Görner M, Kerzel M, Hendrich N, Feldheim J, Wermter S, Zhang J, Gerloff C. Crossmodal Pattern Discrimination in Humans and Robots: A Visuo-Tactile Case Study. Front Robot AI 2020;7:540565. PMID: 33501309; PMCID: PMC7805622; DOI: 10.3389/frobt.2020.540565.
Abstract
The quality of crossmodal perception hinges on two factors: the accuracy of the independent unimodal percepts and the ability to integrate information from different sensory systems. In humans, the ability for cognitively demanding crossmodal perception diminishes from young to old age. Here, we propose a new approach to investigating the degree to which these factors contribute to crossmodal processing and its age-related decline by replicating a medical study on visuo-tactile crossmodal pattern discrimination using state-of-the-art tactile sensing technology and artificial neural networks (ANNs). We implemented two ANN models to specifically examine the relevance of early integration of sensory information during the crossmodal processing stream, a mechanism proposed to underlie efficient processing in the human brain. Applying an adaptive staircase procedure, we approached comparable unimodal classification performance for both modalities in the human participants as well as in the ANNs. This allowed us to compare crossmodal performance between and within the systems, independent of the underlying unimodal processes. Our data show that the unimodal classification accuracies of the tactile sensing technology are comparable to those of humans. For crossmodal discrimination by the ANNs, integrating high-level unimodal features at earlier stages of the crossmodal processing stream yields higher accuracies than late integration of independent unimodal classifications. Compared with humans, the ANNs achieve higher accuracies than older participants in both the unimodal and the crossmodal condition, but lower accuracies than younger participants in the crossmodal task. Taken together, we show that state-of-the-art tactile sensing technology can perform a complex tactile recognition task at levels comparable to humans. For crossmodal processing, human-inspired early sensory integration seems to improve the performance of artificial neural networks. Still, younger participants seem to employ more efficient crossmodal integration mechanisms than those modeled in the proposed ANNs. Our work demonstrates how collaborative research in neuroscience and embodied artificial neurocognitive models can help derive models to inform the design of future neurocomputational architectures.
Affiliation(s)
- Focko L. Higgen, Jan Feldheim, Christian Gerloff: Department of Neurology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Philipp Ruppel, Michael Görner, Matthias Kerzel, Norman Hendrich, Stefan Wermter, Jianwei Zhang: Department of Informatics, Universität Hamburg, Hamburg, Germany