1. Amaral P, Silva F, Santos V. Recognition of Grasping Patterns Using Deep Learning for Human-Robot Collaboration. Sensors (Basel) 2023;23:8989. [PMID: 37960688] [PMCID: PMC10650364] [DOI: 10.3390/s23218989]
Abstract
Recent advances in the field of collaborative robotics aim to endow industrial robots with prediction and anticipation abilities. In many shared tasks, the robot's ability to accurately perceive and recognize the objects being manipulated by the human operator is crucial to make predictions about the operator's intentions. In this context, this paper proposes a novel learning-based framework to enable an assistive robot to recognize the object grasped by the human operator based on the pattern of the hand and finger joints. The framework combines the strengths of the commonly available software MediaPipe in detecting hand landmarks in an RGB image with a deep multi-class classifier that predicts the manipulated object from the extracted keypoints. This study focuses on the comparison between two deep architectures, a convolutional neural network and a transformer, in terms of prediction accuracy, precision, recall and F1-score. We test the performance of the recognition system on a new dataset collected with different users and in different sessions. The results demonstrate the effectiveness of the proposed methods, while providing valuable insights into the factors that limit the generalization ability of the models.
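The pipeline the abstract describes, hand-landmark extraction followed by a multi-class classifier, can be sketched as below. MediaPipe's hands solution is the real API; the small MLP stands in for the paper's CNN/transformer classifiers, and the number of object classes, layer sizes, and the dummy image are assumptions.

```python
# Sketch: MediaPipe hand landmarks -> flat keypoint vector -> object classifier.
# The MLP is a stand-in for the paper's CNN/transformer models (assumption).
from typing import Optional
import cv2
import mediapipe as mp
import numpy as np
import torch
import torch.nn as nn

N_CLASSES = 5  # hypothetical number of manipulated objects

def extract_keypoints(bgr_image: np.ndarray) -> Optional[np.ndarray]:
    """Return a flat (21*3,) vector of hand-landmark coordinates, or None."""
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        results = hands.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    lm = results.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm], dtype=np.float32).ravel()

classifier = nn.Sequential(  # stand-in for the paper's deep classifiers
    nn.Linear(63, 128), nn.ReLU(),
    nn.Linear(128, N_CLASSES),
)

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # dummy frame for illustration
keypoints = extract_keypoints(frame)
if keypoints is not None:
    logits = classifier(torch.from_numpy(keypoints))
    print("predicted object id:", int(logits.argmax()))
```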
Affiliation(s)
- Pedro Amaral
- Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal
- Filipe Silva
- Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal
- Vítor Santos
- Department of Mechanical Engineering (DEM), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal
2. Wang X, Santos VJ. Gaze-Based Shared Autonomy Framework With Real-Time Action Primitive Recognition for Robot Manipulators. IEEE Trans Neural Syst Rehabil Eng 2023;31:4306-4317. [PMID: 37906485] [DOI: 10.1109/tnsre.2023.3328888]
Abstract
Robots capable of robust, real-time recognition of human intent during manipulation tasks could be used to enhance human-robot collaboration for activities of daily living. Eye gaze-based control interfaces offer a non-invasive way to infer intent and reduce the cognitive burden on operators of complex robots. Eye gaze is traditionally used for "gaze triggering" (GT) in which staring at an object, or sequence of objects, triggers pre-programmed robotic movements. We propose an alternative approach: a neural network-based "action prediction" (AP) mode that extracts gaze-related features to recognize, and often predict, an operator's intended action primitives. We integrated the AP mode into a shared autonomy framework capable of 3D gaze reconstruction, real-time intent inference, object localization, obstacle avoidance, and dynamic trajectory planning. Using this framework, we conducted a user study to directly compare the performance of the GT and AP modes using traditional subjective performance metrics, such as Likert scales, as well as novel objective performance metrics, such as the delay of recognition. Statistical analyses suggested that the AP mode resulted in more seamless robotic movement than the state-of-the-art GT mode, and that participants generally preferred the AP mode.
3. Higa S, Yamada K, Kamisato S. Intelligent Eye-Controlled Electric Wheelchair Based on Estimating Visual Intentions Using One-Dimensional Convolutional Neural Network and Long Short-Term Memory. Sensors (Basel) 2023;23:4028. [PMID: 37112369] [PMCID: PMC10145036] [DOI: 10.3390/s23084028]
Abstract
When an electric wheelchair is operated using gaze motion, eye movements such as checking the environment and observing objects are also incorrectly recognized as input operations. This phenomenon is called the "Midas touch problem", and classifying visual intentions is extremely important. In this paper, we develop a deep learning model that estimates the user's visual intention in real time and an electric wheelchair control system that combines intention estimation and the gaze dwell time method. The proposed model consists of a 1DCNN-LSTM that estimates visual intention from feature vectors of 10 variables, such as eye movement, head movement, and distance to the fixation point. The evaluation experiments classifying four types of visual intentions show that the proposed model has the highest accuracy compared to other models. In addition, the results of the driving experiments of the electric wheelchair implementing the proposed model show that the user's efforts to operate the wheelchair are reduced and that the operability of the wheelchair is improved compared to the traditional method. From these results, we concluded that visual intentions could be more accurately estimated by learning time series patterns from eye and head movement data.
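A minimal PyTorch sketch of a 1DCNN-LSTM of the kind the abstract describes: a Conv1d front end over the 10-variable feature sequence, an LSTM, and a 4-way intention head. Layer widths, window length, and pooling are assumptions, not the authors' configuration.

```python
# Sketch of a 1DCNN-LSTM visual-intention classifier (sizes are assumptions).
import torch
import torch.nn as nn

class CNN_LSTM(nn.Module):
    def __init__(self, n_features=10, n_classes=4, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                    # x: (batch, time, features)
        z = self.conv(x.transpose(1, 2))     # -> (batch, 32, time/2)
        out, _ = self.lstm(z.transpose(1, 2))
        return self.head(out[:, -1])         # logits over 4 visual intentions

model = CNN_LSTM()
logits = model(torch.randn(8, 30, 10))       # 8 windows of 30 samples each
print(logits.shape)                          # torch.Size([8, 4])
```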
Affiliation(s)
- Sho Higa
- Graduate School of Engineering and Science, University of the Ryukyus, Nishihara 903-0213, Japan
- Koji Yamada
- Department of Engineering, University of the Ryukyus, Nishihara 903-0213, Japan
- Shihoko Kamisato
- Department of Information and Communication Systems Engineering, National Institute of Technology, Okinawa College, Nago 905-2171, Japan
4. Gao Z, Wu S, Wan Z, Agaian S. A Hybrid Method for Implicit Intention Inference Based on Punished-Weighted Naïve Bayes. IEEE Trans Neural Syst Rehabil Eng 2023;31:1826-1836. [PMID: 37030670] [DOI: 10.1109/tnsre.2023.3259550]
Abstract
Gaze-based implicit intention inference offers a new mode of human-robot interaction that lets people with disabilities accomplish activities of daily living independently. Existing gaze-based intention inference is mainly data-driven and ignores prior object information in intention expression, which yields low inference accuracy. To improve accuracy, we propose a gaze-based hybrid method that integrates model-driven and data-driven intention inference, tailored to disability applications. Specifically, an intention is treated as a combination of a verb and nouns. The objects corresponding to the nouns are regarded as intention-interpreting objects and serve as prior knowledge, i.e., punished factors; a punished factor encodes object information, namely the priority in object selection. A class-specific attribute-weighted naïve Bayes model, learned from training data, represents the relationship between intentions and objects. An intention inference engine is developed by combining this human prior knowledge with the data-driven class-specific attribute-weighted naïve Bayes model. Computer simulations (i) verify the contribution of each critical component of the proposed model, (ii) evaluate its inference accuracy, and (iii) show that the proposed method outperforms state-of-the-art intention inference methods in accuracy.
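The combination of a weighted naïve Bayes posterior with a model-driven punished factor can be illustrated with a toy example. Everything below, the intentions, objects, probabilities, and weights, is an invented illustration of the scoring rule, not values or code from the paper.

```python
# Toy sketch: naive-Bayes posterior over intentions, with class-specific
# attribute weights and a "punished factor" encoding object-selection priority.
import numpy as np

intentions = ["drink water", "read book"]
log_prior = np.log(np.array([0.5, 0.5]))

# P(object observed | intention), per binary object attribute (cup, book)
likelihood = np.array([[0.9, 0.2],    # drink water
                       [0.1, 0.8]])   # read book
weights = np.array([[1.2, 0.8],       # class-specific attribute weights
                    [0.7, 1.3]])
punish = np.array([1.0, 0.6])         # priority of intention-interpreting objects

def infer(observed):                  # observed: 0/1 vector over objects
    p = likelihood * observed + (1 - likelihood) * (1 - observed)
    log_post = log_prior + (weights * np.log(p)).sum(axis=1) + np.log(punish)
    return intentions[int(np.argmax(log_post))]

print(infer(np.array([1, 0])))        # gazed at the cup -> "drink water"
```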
5. Johnson CM, Ruiz-Mendoza C, Schoenbeck C. Conspecific "gaze following" in bottlenose dolphins. Anim Cogn 2022;25:1219-1229. [PMID: 36063306] [PMCID: PMC9617818] [DOI: 10.1007/s10071-022-01665-x]
Abstract
"Gaze following"—when one individual witnesses another shift its orientation, and then re-orients in the same direction—has been observed in a wide range of species. Related work with dolphins has to date focused on human–dolphin interactions. In this conspecific study, we examined a group of dolphins orienting, in passing, to gateways between their pools, as opportunities for witnesses to demonstrate "gaze following". Seven bottlenose dolphins were synchronously videotaped on six underwater cameras, for 21 h over three days, and the recordings analyzed by trained observers. The identities of all animals present, their partner state, and whether and to what degree they had altered their access to the gate (e.g., from Monocular to Binocular, or Binocular to Visio-Echoic) was recorded. Compared to animals that did not witness such a change, witnesses of an increase in access by another dolphin were significantly more likely to also act to increase their own access. We observed 460 such cases of "gaze following" in these animals. Dolphins who were partnered (showed sustained swimming within 1 body length) were significantly more likely, than non-partnered animals, to "gaze follow". Dolphins also showed a significant tendency toward matching the kind of access they observed. No significant difference was found in the presence of animals in the back pools, during changes in orientation that were followed, versus in those that were not. These findings support adding bottlenose dolphins to the growing list of species that display conspecific "gaze following".
Affiliation(s)
- Christine M Johnson
- Department of Cognitive Science, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, USA
- Christina Ruiz-Mendoza
- Department of Cognitive Science, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, USA
- Clara Schoenbeck
- Marine Science Program, Scripps Institution of Oceanography, UCSD, 8622 Kennel Way, La Jolla, CA, USA
|
6
|
Yuan Y, Liu J, Wu Z, Zhou G, Sommer W, Yue Z. Does Eye Gaze Uniquely Trigger Spatial Orienting to Socially Relevant Information? A Behavioral and ERP Study. Brain Sci 2022; 12:brainsci12091133. [PMID: 36138869 PMCID: PMC9497197 DOI: 10.3390/brainsci12091133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2022] [Revised: 08/16/2022] [Accepted: 08/22/2022] [Indexed: 11/16/2022] Open
Abstract
Using behavioral and event-related potential (ERP) measures, the present study examined whether eye gaze triggers a unique form of attentional orienting toward threat-relevant targets. A threatening or neutral target was presented after a non-predictive gaze or an arrow cue. In Experiment 1, reaction times indicated that eye gaze and arrow cues triggered different attention orienting towards threatening targets, which was confirmed by target-elicited P3b latency in Experiment 2. Specifically, for targets preceded by arrow and gaze cues, P3b peak latency was shorter for neutral targets than threatening targets. However, the latency differences were significantly smaller for gaze cues than for arrow cues. Moreover, target-elicited N2 amplitude indicated a significantly stronger cue validity effect of eye gaze than that of arrows. These findings suggest that eye gaze uniquely triggers spatial attention orienting to socially threatening information.
Affiliation(s)
- Yichen Yuan
- Guangdong Provincial Key Laboratory of Social Cognitive Neuroscience and Mental Health, Department of Psychology, Sun Yat-sen University, Guangzhou 510006, China
- Jinqun Liu
- Guangdong Provincial Key Laboratory of Social Cognitive Neuroscience and Mental Health, Department of Psychology, Sun Yat-sen University, Guangzhou 510006, China
- Zehua Wu
- Guangdong Provincial Key Laboratory of Social Cognitive Neuroscience and Mental Health, Department of Psychology, Sun Yat-sen University, Guangzhou 510006, China
- Guomei Zhou
- Guangdong Provincial Key Laboratory of Social Cognitive Neuroscience and Mental Health, Department of Psychology, Sun Yat-sen University, Guangzhou 510006, China
- Werner Sommer (corresponding author)
- Institut für Psychologie, Humboldt-Universität zu Berlin, 10099 Berlin, Germany
- Department of Psychology, Zhejiang Normal University, Jinhua 321004, China
- Zhenzhu Yue (corresponding author)
- Guangdong Provincial Key Laboratory of Social Cognitive Neuroscience and Mental Health, Department of Psychology, Sun Yat-sen University, Guangzhou 510006, China
7.
Abstract
With the increasing need for eye tracking in head-mounted virtual reality displays, the gaze-based modality has the potential to predict user intention and unlock intuitive new interaction schemes. In the present work, we explore whether gaze-based data and hand-eye coordination data can predict a user's interaction intention with the digital world, which could be used to develop predictive interfaces. We validate the approach on eye-tracking data collected from 10 participants in item-selection and teleporting tasks in virtual reality. We demonstrate successful prediction of the onset of item selection and teleporting with a 0.943 F1-score using a Gradient Boosting Decision Tree, the best of the four classifiers compared, while the Support Vector Machine has the smallest model size. We also show that hand-eye-coordination-related features improve interaction intention recognition in virtual reality environments.
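A hedged sketch of the reported setup: a Gradient Boosting Decision Tree trained on combined gaze and hand-eye coordination features to predict interaction onset, evaluated by F1-score. The synthetic feature matrix and its column meanings are assumptions.

```python
# Sketch: GBDT onset-of-interaction classifier on gaze + hand-eye features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))  # e.g., gaze velocity, vergence, hand-gaze distance (assumed)
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("F1:", f1_score(y_te, clf.predict(X_te)))
```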
8. Peacock CE, Zhang T, David-John B, Murdison TS, Boring MJ, Benko H, Jonker TR. Gaze dynamics are sensitive to target orienting for working memory encoding in virtual reality. J Vis 2022;22:2. [PMID: 34982104] [PMCID: PMC8742516] [DOI: 10.1167/jov.22.1.2]
Abstract
Numerous studies have demonstrated that visuospatial attention is a requirement for successful working memory encoding. It is unknown, however, whether this established relationship manifests in consistent gaze dynamics as people orient their visuospatial attention toward an encoding target when searching for information in naturalistic environments. To test this hypothesis, participants' eye movements were recorded while they searched for and encoded objects in a virtual apartment (Experiment 1). We decomposed gaze into 61 features that capture gaze dynamics and trained a sliding-window logistic regression model, which has potential for use in real-time systems, to predict when participants found target objects for working memory encoding. A model trained on group data successfully predicted when people oriented to a target for encoding, both for the trained task (Experiment 1) and for a novel task (Experiment 2) in which a new set of participants found objects and encoded an associated nonword in a cluttered virtual kitchen. Six of these features were predictive of target orienting for encoding, even during the novel task, including decreased distances between subsequent fixation/saccade events, increased fixation probabilities, and slower saccade decelerations before encoding. This suggests that as people orient toward a target to encode new information at the end of search, they decrease task-irrelevant, exploratory sampling behaviors; this behavior was common across the two studies. Together, this research demonstrates how gaze dynamics can be used to capture target orienting for working memory encoding and has implications for real-world use in technology and special populations.
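The sliding-window logistic regression idea can be sketched as follows: pool the 61 gaze-dynamics features over a short window and score each window for target orienting. The window length, pooling choice, and synthetic data are assumptions.

```python
# Sketch: sliding-window logistic regression over gaze-dynamics features.
import numpy as np
from sklearn.linear_model import LogisticRegression

WIN = 10  # samples per window (assumed)

def windows(features, labels):
    """Mean-pool features over a trailing window; label = state at window end."""
    X, y = [], []
    for t in range(WIN, len(features)):
        X.append(features[t - WIN:t].mean(axis=0))
        y.append(labels[t])
    return np.array(X), np.array(y)

rng = np.random.default_rng(1)
feats = rng.normal(size=(5000, 61))      # 61 gaze-dynamics features per sample
labs = rng.integers(0, 2, size=5000)     # 1 = orienting to an encoding target

X, y = windows(feats, labs)
model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict_proba(X[:3])[:, 1])  # per-window orienting probability
```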
Affiliation(s)
- Ting Zhang
- Reality Labs Research, Redmond, WA, USA
9. Fang X, Sun Y, Zheng X, Wang X, Deng X, Wang M. Assessing Deception in Questionnaire Surveys With Eye-Tracking. Front Psychol 2021;12:774961. [PMID: 34880817] [PMCID: PMC8646095] [DOI: 10.3389/fpsyg.2021.774961]
Abstract
Deceit often occurs in questionnaire surveys, which leads to misreported data and poor reliability. The purpose of this study is to explore whether eye tracking can contribute to detecting deception in questionnaire surveys, and whether the eye behaviors that appear in instructed lying persist in spontaneous lying. Two studies were conducted to explore eye movement behaviors under instructed and spontaneous lying conditions. The results showed that pupil size and fixation behaviors are both reliable indicators for detecting lies in questionnaire surveys; blink and saccade behaviors do not seem to predict deception. Deception resulted in increased pupil size, fixation count, and fixation duration. Meanwhile, respondents focused on different areas of the questionnaire when lying versus telling the truth. Furthermore, in the actual deception situation, a linear support vector machine (SVM) deception classifier achieved an accuracy of 74.09%. In sum, this study indicates that the eye-tracking signatures of lying are not restricted to instructed deception, demonstrates the potential of eye tracking for detecting deception in questionnaire surveys, and informs questionnaire surveys on sensitive issues.
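A minimal sketch of the abstract's classifier: a linear SVM over pupil and fixation features separating lying from truthful responses. The three features and the synthetic data are assumptions for illustration.

```python
# Sketch: linear SVM deception classifier on pupil/fixation features.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
# columns: mean pupil size, fixation count, fixation duration (assumed)
X = rng.normal(size=(400, 3))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=1.0, size=400) > 0).astype(int)

clf = LinearSVC(dual=False)
print("accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```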
Affiliation(s)
- Xinyue Fang
- School of Mechanical Engineering, Sichuan University, Chengdu, China
- Yiteng Sun
- School of Design, South China University of Technology, Guangzhou, China
- Xinyi Zheng
- School of Mechanical Engineering, Sichuan University, Chengdu, China
- Xinrong Wang
- School of Mechanical Engineering, Sichuan University, Chengdu, China
- Xuemei Deng
- School of Mechanical Engineering, Sichuan University, Chengdu, China
- Mei Wang
- School of Mechanical Engineering, Sichuan University, Chengdu, China
10. Bolarinwa J, Eimontaite I, Mitchell T, Dogramadzi S, Caleb-Solly P. Assessing the Role of Gaze Tracking in Optimizing Humans-In-The-Loop Telerobotic Operation Using Multimodal Feedback. Front Robot AI 2021;8:578596. [PMID: 34671646] [PMCID: PMC8521448] [DOI: 10.3389/frobt.2021.578596]
Abstract
A key challenge in achieving effective robot teleoperation is minimizing teleoperators' cognitive workload and fatigue. We set out to investigate the extent to which gaze tracking data can reveal how teleoperators interact with a system. In this study, we present an analysis of gaze tracking, captured as participants completed a multi-stage task: grasping and emptying the contents of a jar into a container. The task was repeated with different combinations of visual, haptic, and verbal feedback. Our aim was to determine whether teleoperation workload can be inferred by combining the gaze duration, fixation count, task completion time, and complexity of robot motion (measured as the sum of robot joint steps) at different stages of the task. Visual information of the robot workspace was captured using four cameras positioned to capture the workspace from different angles. These camera views (aerial, right, eye-level, and left) were displayed through the four quadrants (top-left, top-right, bottom-left, and bottom-right) of participants' video feedback computer screen, respectively. We found that gaze duration and fixation count were highly dependent on the stage of the task and the feedback scenario utilized. The results revealed that combining feedback modalities reduced the cognitive workload (inferred by investigating the correlation between gaze duration, fixation count, task completion time, success or failure of task completion, and robot gripper trajectories), particularly in the task stages that require more precision. There was a significant positive correlation between gaze duration and the complexity of robot joint movements. Participants' gaze outside the areas of interest (distractions) was not influenced by the feedback scenarios. A learning effect was observed in the use of the controller for all participants as they repeated the task with different feedback combinations. For the design of teleoperation systems applicable in healthcare, we found that analyzing teleoperators' gaze can help us understand how teleoperators interact with the system, making it possible to develop the system from the teleoperators' standpoint.
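The reported gaze-duration/motion-complexity association is a plain Pearson correlation, which can be computed as below; the per-trial numbers are illustrative stand-ins, not the study's data.

```python
# Sketch: Pearson correlation between gaze duration and robot motion complexity.
import numpy as np
from scipy.stats import pearsonr

gaze_duration = np.array([2.1, 3.4, 2.8, 5.0, 4.2, 6.1])  # seconds per stage (assumed)
joint_steps = np.array([120, 180, 150, 260, 210, 300])    # summed joint steps (assumed)

r, p = pearsonr(gaze_duration, joint_steps)
print(f"r = {r:.2f}, p = {p:.4f}")  # a positive r echoes the reported finding
```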
Affiliation(s)
- Joseph Bolarinwa
- Bristol Robotics Laboratory, University of the West of England (UWE), Bristol, United Kingdom
- Iveta Eimontaite
- Bristol Robotics Laboratory, University of the West of England (UWE), Bristol, United Kingdom
- Tom Mitchell
- Creative Technologies Lab, University of the West of England (UWE), Bristol, United Kingdom
- Sanja Dogramadzi
- Bristol Robotics Laboratory, University of the West of England (UWE), Bristol, United Kingdom
- Praminda Caleb-Solly
- Bristol Robotics Laboratory, University of the West of England (UWE), Bristol, United Kingdom
11. Ghiglino D, Willemse C, De Tommaso D, Wykowska A. Mind the Eyes: Artificial Agents' Eye Movements Modulate Attentional Engagement and Anthropomorphic Attribution. Front Robot AI 2021;8:642796. [PMID: 34124174] [PMCID: PMC8192967] [DOI: 10.3389/frobt.2021.642796]
Abstract
Artificial agents are on their way to interacting with us daily. Thus, the design of embodied artificial agents that can easily cooperate with humans is crucial for their deployment in social scenarios. Endowing artificial agents with human-like behavior may boost individuals' engagement during the interaction. We tested this hypothesis in two screen-based experiments. In the first, we compared the attentional engagement displayed by participants while they observed the same set of behaviors performed by an avatar of a humanoid robot and by a human. In the second experiment, we assessed individuals' tendency to attribute anthropomorphic traits to the same agents displaying the same behaviors. The results of both experiments suggest that individuals need less effort to process and interpret an artificial agent's behavior when it closely resembles that of a human being. Our results support the idea that including subtle hints of human-likeness in artificial agents' behaviors would ease the communication between them and their human counterparts during interactive scenarios.
Affiliation(s)
- Davide Ghiglino
- Social Cognition in Human-Robot Interaction, Istituto Italiano di Tecnologia, Genova, Italy
- DIBRIS, Università Degli Studi di Genova, Genova, Italy
- Cesco Willemse
- Social Cognition in Human-Robot Interaction, Istituto Italiano di Tecnologia, Genova, Italy
- Davide De Tommaso
- Social Cognition in Human-Robot Interaction, Istituto Italiano di Tecnologia, Genova, Italy
- Agnieszka Wykowska
- Social Cognition in Human-Robot Interaction, Istituto Italiano di Tecnologia, Genova, Italy
12. Koochaki F, Najafizadeh L. A Data-Driven Framework for Intention Prediction via Eye Movement With Applications to Assistive Systems. IEEE Trans Neural Syst Rehabil Eng 2021;29:974-984. [PMID: 34038364] [DOI: 10.1109/tnsre.2021.3083815]
Abstract
Fast and accurate human intention prediction can significantly advance the performance of assistive devices for patients with limited motor or communication abilities. Among available modalities, eye movement can be valuable for inferring the user's intention, as it can be tracked non-invasively. However, existing limited studies in this domain do not provide the level of accuracy required for the reliable operation of assistive systems. By taking a data-driven approach, this paper presents a new framework that utilizes the spatial and temporal patterns of eye movement along with deep learning to predict the user's intention. In the proposed framework, the spatial patterns of gaze are identified by clustering the gaze points based on their density over displayed images in order to find the regions of interest (ROIs). The temporal patterns of gaze are identified via hidden Markov models (HMMs) to find the transition sequence between ROIs. Transfer learning is utilized to identify the objects of interest in the displayed images. Finally, models are developed to predict the user's intention after completing the task as well as at early stages of the task. The proposed framework is evaluated in an experiment involving predicting intended daily-life activities. Results indicate that an average classification accuracy of 97.42% is achieved, which is considerably higher than existing gaze-based intention prediction studies.
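The first two stages of the framework, density-based clustering of gaze points into ROIs and modeling the transition sequence between them, can be sketched as follows. Here DBSCAN plays the clustering role, and a simple transition-count matrix stands in for the paper's HMMs; the eps/min_samples values and the synthetic gaze points are assumptions.

```python
# Sketch: cluster gaze points into ROIs, then estimate ROI-to-ROI transitions.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(3)
gaze = np.concatenate([rng.normal([200, 150], 10, (50, 2)),
                       rng.normal([600, 400], 10, (50, 2))])  # pixel coordinates

roi = DBSCAN(eps=25, min_samples=5).fit_predict(gaze)  # ROI id per gaze point

seq = [r for r in roi if r != -1]          # drop DBSCAN noise points (label -1)
n = max(seq) + 1
T = np.zeros((n, n))
for a, b in zip(seq, seq[1:]):             # count consecutive ROI transitions
    T[a, b] += 1
T = T / np.maximum(T.sum(axis=1, keepdims=True), 1)  # row-normalized probabilities
print(T)
```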
13. Singh R, Miller T, Newn J, Velloso E, Vetere F, Sonenberg L. Combining gaze and AI planning for online human intention recognition. Artif Intell 2020. [DOI: 10.1016/j.artint.2020.103275]
14. Scrivner C, Choe KW, Henry J, Lyu M, Maestripieri D, Berman MG. Violence reduces attention to faces and draws attention to points of contact. Sci Rep 2019;9:17779. [PMID: 31780726] [PMCID: PMC6883035] [DOI: 10.1038/s41598-019-54327-3]
Abstract
Although violence is a frequently researched topic, little is known about how different social features influence information gathering from violent interactions. Regions of an interaction that provide contextual information should receive more attention. We predicted the most informative features of a violent social interaction would be faces, points of contact, and objects being held. To test this, we tracked the eyes of 90 participants as they viewed images of social interactions that varied with respect to violence. When viewing violent interactions, participants attended significantly less to faces and significantly more to points of contact. Moreover, first-fixation analysis suggests that some of these biases are present from the beginning of scene-viewing. These findings are the first to demonstrate the visual relevance of faces and contact points in gathering information from violent social interactions. These results also question the attentional dominance of faces in active social scenes, highlighting the importance of using a variety of stimuli and contexts in social cognition research.
Affiliation(s)
- Coltan Scrivner
- Department of Comparative Human Development, The University of Chicago, Chicago, IL, USA
- Institute for Mind and Biology, The University of Chicago, Chicago, IL, USA
- Kyoung Whan Choe
- Department of Psychology, The University of Chicago, Chicago, IL, USA
- Mansueto Institute for Urban Innovation, The University of Chicago, Chicago, IL, USA
- Joseph Henry
- Institute for Mind and Biology, The University of Chicago, Chicago, IL, USA
- Department of Psychology, The University of Chicago, Chicago, IL, USA
- Muxuan Lyu
- Department of Psychology, The University of Chicago, Chicago, IL, USA
- Dario Maestripieri
- Department of Comparative Human Development, The University of Chicago, Chicago, IL, USA
- Institute for Mind and Biology, The University of Chicago, Chicago, IL, USA
- Marc G Berman
- Department of Psychology, The University of Chicago, Chicago, IL, USA
- Grossman Institute for Neuroscience, Quantitative Biology, and Human Behavior, Chicago, IL, USA
15. Zhang M, Ma KT, Lim JH, Zhao Q, Feng J. Anticipating Where People will Look Using Adversarial Networks. IEEE Trans Pattern Anal Mach Intell 2019;41:1783-1796. [PMID: 30273143] [DOI: 10.1109/tpami.2018.2871688]
Abstract
We introduce a new problem of gaze anticipation on future frames, which extends the conventional gaze prediction problem beyond current frames. To solve this problem, we propose a new generative adversarial network based model, Deep Future Gaze (DFG), encompassing two pathways: DFG-P anticipates gaze prior maps conditioned on the input frame, which captures task influences, while DFG-G learns to model both semantic and motion information for future frame generation. DFG-P and DFG-G are then fused to anticipate future gazes. DFG-G consists of two networks: a generator and a discriminator. The generator uses a two-stream spatial-temporal convolution architecture (3D-CNN) that explicitly untangles the foreground and background to generate future frames; it then attaches another 3D-CNN for gaze anticipation based on these synthetic frames. The discriminator plays against the generator by distinguishing the generator's synthetic frames from real frames. Experimental results on publicly available egocentric and third-person video datasets show that DFG significantly outperforms all competitive baselines. We also demonstrate that DFG achieves better gaze prediction on current frames in egocentric and third-person videos than state-of-the-art methods.
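A heavily simplified, single-stream PyTorch skeleton of the DFG idea: a 3D-CNN generator produces future frames, a 3D-CNN discriminator judges them, and a further 3D-CNN head anticipates gaze on the synthetic frames. All layer shapes are assumptions, and the paper's two-stream foreground/background design and prior-map pathway (DFG-P) are omitted for brevity.

```python
# Simplified skeleton of a 3D-CNN GAN with a gaze-anticipation head.
import torch
import torch.nn as nn

class Generator(nn.Module):          # input frames -> synthetic future frames
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 3, 3, padding=1), nn.Tanh(),
        )
    def forward(self, x):            # x: (B, 3, T, H, W)
        return self.net(x)

class Discriminator(nn.Module):      # frames -> real/fake score
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(32, 1),
        )
    def forward(self, x):
        return self.net(x)

gaze_head = nn.Sequential(           # synthetic future frames -> gaze map
    nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 1, 3, padding=1),
)

frames = torch.randn(2, 3, 8, 64, 64)
fake = Generator()(frames)
print(Discriminator()(fake).shape, gaze_head(fake).shape)
# torch.Size([2, 1]) torch.Size([2, 1, 8, 64, 64])
```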
16. Koochaki F, Najafizadeh L. Eye Gaze-based Early Intent Prediction Utilizing CNN-LSTM. Annu Int Conf IEEE Eng Med Biol Soc 2019;2019:1310-1313. [PMID: 31946133] [DOI: 10.1109/embc.2019.8857054]
Abstract
In assistive technologies designed for patients with extremely limited motor or communication capabilities, it is of significant importance to accurately predict the intention of the user in a timely manner. This paper presents a new framework for the early prediction of the user's intent via their eye gaze. The objects seen in the displayed images, and the order of their selection, are identified from the spatial and temporal information of the gaze. By employing a combination of a convolutional neural network (CNN) and long short-term memory (LSTM), early prediction of the user's intention is enabled. The proposed framework is tested using experimental data obtained from eight subjects. Results demonstrate an average accuracy of 82.27% across all considered intended tasks for early prediction, confirming the effectiveness of the proposed method.
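A sketch of a per-frame CNN followed by an LSTM, in the spirit of the abstract: a small 2D CNN encodes each gaze-centered image crop, and the LSTM emits a task prediction at every step, so a prediction is available early, before the sequence completes. Crop size, feature width, and the number of tasks are assumptions.

```python
# Sketch: per-frame CNN encoder + LSTM for early intent prediction.
import torch
import torch.nn as nn

class FrameCNN(nn.Module):
    def __init__(self, out_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, out_dim),
        )
    def forward(self, x):
        return self.net(x)

class EarlyIntent(nn.Module):
    def __init__(self, n_tasks=6):
        super().__init__()
        self.cnn = FrameCNN()
        self.lstm = nn.LSTM(64, 64, batch_first=True)
        self.head = nn.Linear(64, n_tasks)
    def forward(self, clips):                # clips: (B, T, 3, H, W)
        B, T = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(B, T, -1)
        out, _ = self.lstm(feats)
        return self.head(out)                # logits at every step -> early call

logits = EarlyIntent()(torch.randn(2, 5, 3, 32, 32))
print(logits.shape)                          # torch.Size([2, 5, 6])
```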
17. Gramazio CC, Huang J, Laidlaw DH. An Analysis of Automated Visual Analysis Classification: Interactive Visualization Task Inference of Cancer Genomics Domain Experts. IEEE Trans Vis Comput Graph 2018;24:2270-2283. [PMID: 28783637] [DOI: 10.1109/tvcg.2017.2734659]
Abstract
We show how mouse interaction log classification can help visualization toolsmiths understand how their tools are used "in the wild" through an evaluation of MAGI, a cancer genomics visualization tool. Our primary contribution is an evaluation of twelve visual analysis task classifiers, which compares predictions to task inferences made by pairs of genomics and visualization experts. Our evaluation uses common classifiers that are accessible to most visualization evaluators: k-nearest neighbors, linear support vector machines, and random forests. By comparing classifier predictions to visual analysis task inferences made by experts, we show that simple automated task classification can reach up to 73 percent accuracy and can separate meaningful logs from "junk" logs with up to 91 percent accuracy. Our second contribution is an exploration of common MAGI interaction trends using classification predictions, which expands current knowledge about ecological cancer genomics visualization tasks. Our third contribution is a discussion of how automated task classification can inform iterative tool design. These contributions suggest that mouse interaction log analysis is a viable method for (1) evaluating task requirements of client-side-focused tools, (2) allowing researchers to study experts on larger scales than is typically possible with in-lab observation, and (3) highlighting potential tool evaluation bias.
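The three classifier families named in the abstract are all available in scikit-learn, so the comparison can be sketched directly; the synthetic log features and labels below are assumptions.

```python
# Sketch: cross-validated comparison of kNN, linear SVM, and random forest
# on featurized mouse-interaction logs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
X = rng.normal(size=(600, 20))    # e.g., event counts, dwell times per widget (assumed)
y = rng.integers(0, 4, size=600)  # four visual-analysis task labels (assumed)

for name, clf in [("kNN", KNeighborsClassifier()),
                  ("linear SVM", LinearSVC(dual=False)),
                  ("random forest", RandomForestClassifier())]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```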
18. Shukla D, Erkent Ö, Piater J. Learning Semantics of Gestural Instructions for Human-Robot Collaboration. Front Neurorobot 2018;12:7. [PMID: 29615888] [PMCID: PMC5868127] [DOI: 10.3389/fnbot.2018.00007]
Abstract
Designed to work safely alongside humans, collaborative robots need to be capable partners in human-robot teams. Besides having key capabilities like detecting gestures, recognizing objects, grasping them, and handing them over, these robots need to seamlessly adapt their behavior for efficient human-robot collaboration. In this context we present the fast, supervised Proactive Incremental Learning (PIL) framework for learning associations between human hand gestures and the intended robotic manipulation actions. With the proactive aspect, the robot is competent to predict the human's intent and perform an action without waiting for an instruction. The incremental aspect enables the robot to learn associations on the fly while performing a task. It is a probabilistic, statistically-driven approach. As a proof of concept, we focus on a table assembly task where the robot assists its human partner. We investigate how the accuracy of gesture detection affects the number of interactions required to complete the task. We also conducted a human-robot interaction study with non-roboticist users comparing a proactive with a reactive robot that waits for instructions.
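The probabilistic, incremental core of a PIL-style learner can be sketched as a conditional count table over gesture-action pairs, updated online, with a confidence threshold deciding when the robot acts proactively rather than asking. The gesture and action names and the threshold are assumptions, not the paper's implementation.

```python
# Sketch: incremental gesture -> action association with proactive execution.
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))
THRESHOLD = 0.8  # assumed confidence required to act without an instruction

def update(gesture: str, action: str) -> None:
    """Incremental step: record which action followed the gesture."""
    counts[gesture][action] += 1

def act(gesture: str) -> str:
    """Proactive step: predict the intended action, or ask when unsure."""
    acts = counts[gesture]
    total = sum(acts.values())
    if total == 0:
        return "ask human"
    action, n = max(acts.items(), key=lambda kv: kv[1])
    return action if n / total >= THRESHOLD else "ask human"

for _ in range(4):
    update("point_at_leg", "hand_over_leg")
update("point_at_leg", "hold_frame")
print(act("point_at_leg"))   # "hand_over_leg" (4/5 = 0.8 clears the threshold)
```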
Affiliation(s)
- Dadhichi Shukla
- Intelligent and Interactive Systems, Department of Computer Science, University of Innsbruck, Innsbruck, Austria
- Özgür Erkent
- Intelligent and Interactive Systems, Department of Computer Science, University of Innsbruck, Innsbruck, Austria
- Justus Piater
- Intelligent and Interactive Systems, Department of Computer Science, University of Innsbruck, Innsbruck, Austria
19. Nie QY, Ding X, Chen J, Conci M. Social attention directs working memory maintenance. Cognition 2017;171:85-94. [PMID: 29121587] [DOI: 10.1016/j.cognition.2017.10.025]
Abstract
Visual working memory (vWM) performance is enhanced when a memorized object is cued after encoding. This so-called retro-cue effect is typically observed with a predictive (80% valid) retrospective cue. The current study examined whether a nonpredictive (50% valid) retro-cue can similarly enhance internal memory representations when the cue conveys social signals. To this end, gaze cues, which can engender a mutual attentional focus of two individuals on one location, were presented during the retention interval of a change-detection task. In line with our prediction, Experiment 1 demonstrated that a polygon presented at the gazed-at location was remembered better than those at non-gazed and gazed-away locations. Experiments 2 and 3 showed that low-level motion cues did not elicit attentional orienting in a manner comparable to the gaze cue, and these differences in cuing were found to be reliable and independent of memory load. Furthermore, the gaze retro-cue effect disappeared when the face was inverted (Experiment 4). In sum, these results clearly show that sharing the focus of another individual establishes a point of reference from which visual information is restored with priority, suggesting that a gaze retro-cue leads to social attention, thus modulating vWM maintenance in a reflexive, automatic manner.
Affiliation(s)
- Qi-Yang Nie
- Department Psychologie, Ludwig-Maximilians-Universität München, Munich, Germany
- Xiaowei Ding
- Department of Psychology and Behavioral Sciences, Zhejiang University, Hangzhou, People's Republic of China
- Jianyong Chen
- Department of Psychology and Behavioral Sciences, Zhejiang University, Hangzhou, People's Republic of China
- Markus Conci
- Department Psychologie, Ludwig-Maximilians-Universität München, Munich, Germany
20. Baraglia J, Cakmak M, Nagai Y, Rao RPN, Asada M. Efficient human-robot collaboration: When should a robot take initiative? Int J Rob Res 2017. [DOI: 10.1177/0278364916688253]
Affiliation(s)
- Jimmy Baraglia
- Graduate School of Engineering, Department of Adaptive Machine Science, Osaka University, Japan
- Maya Cakmak
- Computer Science & Engineering, University of Washington, USA
- Yukie Nagai
- Graduate School of Engineering, Department of Adaptive Machine Science, Osaka University, Japan
- Rajesh PN Rao
- Computer Science & Engineering, University of Washington, USA
- Minoru Asada
- Graduate School of Engineering, Department of Adaptive Machine Science, Osaka University, Japan
21. Andrist S, Collier W, Gleicher M, Mutlu B, Shaffer D. Look together: analyzing gaze coordination with epistemic network analysis. Front Psychol 2015;6:1016. [PMID: 26257677] [PMCID: PMC4508484] [DOI: 10.3389/fpsyg.2015.01016]
Abstract
When conversing and collaborating in everyday situations, people naturally and interactively align their behaviors with each other across various communication channels, including speech, gesture, posture, and gaze. Having access to a partner's referential gaze behavior has been shown to be particularly important in achieving collaborative outcomes, but the process in which people's gaze behaviors unfold over the course of an interaction and become tightly coordinated is not well understood. In this paper, we present work to develop a deeper and more nuanced understanding of coordinated referential gaze in collaborating dyads. We recruited 13 dyads to participate in a collaborative sandwich-making task and used dual mobile eye tracking to synchronously record each participant's gaze behavior. We used a relatively new analysis technique—epistemic network analysis—to jointly model the gaze behaviors of both conversational participants. In this analysis, network nodes represent gaze targets for each participant, and edge strengths convey the likelihood of simultaneous gaze to the connected target nodes during a given time-slice. We divided collaborative task sequences into discrete phases to examine how the networks of shared gaze evolved over longer time windows. We conducted three separate analyses of the data to reveal (1) properties and patterns of how gaze coordination unfolds throughout an interaction sequence, (2) optimal time lags of gaze alignment within a dyad at different phases of the interaction, and (3) differences in gaze coordination patterns for interaction sequences that lead to breakdowns and repairs. In addition to contributing to the growing body of knowledge on the coordination of gaze behaviors in joint activities, this work has implications for the design of future technologies that engage in situated interactions with human users.
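The network-building step can be sketched simply: nodes are gaze targets, and each edge weight is the relative frequency with which the two partners fixate a given pair of targets in the same time slice. The targets and time slices below are illustrative assumptions.

```python
# Sketch: co-gaze network edges from two partners' per-slice gaze targets.
from collections import Counter

targets_a = ["bread", "knife", "bread", "partner", "cheese"]  # per time slice
targets_b = ["bread", "bread", "bread", "partner", "cheese"]

edges = Counter(zip(targets_a, targets_b))  # simultaneous-gaze target pairs
n = len(targets_a)
for (a, b), c in edges.items():
    print(f"{a:8s} -- {b:8s}: {c / n:.2f}")  # edge strength per target pair
```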
Affiliation(s)
- Sean Andrist
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Wesley Collier
- Department of Educational Psychology, University of Wisconsin-Madison, Madison, WI, USA
- Michael Gleicher
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Bilge Mutlu
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
- David Shaffer
- Department of Educational Psychology, University of Wisconsin-Madison, Madison, WI, USA