1
Robinson N, Tidd B, Campbell D, Kulić D, Corke P. Robotic Vision for Human-Robot Interaction and Collaboration: A Survey and Systematic Review. ACM Transactions on Human-Robot Interaction 2022. [DOI: 10.1145/3570731]
Abstract
Robotic vision for human-robot interaction and collaboration is a critical process by which robots collect and interpret detailed information about human actions, goals, and preferences, enabling them to provide more useful services to people. This survey and systematic review presents a comprehensive analysis of robotic vision in human-robot interaction and collaboration over the last 10 years. From a detailed search of 3850 articles, systematic extraction and evaluation were used to identify and explore 310 papers in depth. These papers described robots with some level of autonomy that use robotic vision for locomotion, manipulation, and/or visual communication to collaborate or interact with people. The paper provides an in-depth analysis of current trends, common domains, methods and procedures, technical processes, datasets and models, experimental testing, sample populations, performance metrics, and future challenges. The review found that robotic vision was often used for action and gesture recognition, robot movement in human spaces, object handover and collaborative actions, social communication, and learning from demonstration. Few high-impact and novel techniques from the computer vision field had been translated into human-robot interaction and collaboration. Overall, notable advancements have been made in how to develop and deploy robots to assist people.
Affiliation(s)
- Nicole Robinson
- Australian Research Council Centre of Excellence for Robotic Vision, School of Electrical Engineering & Robotics, QUT Centre for Robotics, Queensland University of Technology; Faculty of Engineering, Turner Institute for Brain and Mental Health, Monash University, Australia
- Brendan Tidd
- Australian Research Council Centre of Excellence for Robotic Vision, School of Electrical Engineering & Robotics, QUT Centre for Robotics, Queensland University of Technology, Australia
- Dylan Campbell
- Visual Geometry Group, Department of Engineering Science, University of Oxford, United Kingdom
- Dana Kulić
- Australian Research Council Centre of Excellence for Robotic Vision, Faculty of Engineering, Monash University, Australia
- Peter Corke
- Australian Research Council Centre of Excellence for Robotic Vision, School of Electrical Engineering & Robotics, QUT Centre for Robotics, Queensland University of Technology, Australia
2
Few-shot re-identification of the speaker by social robots. Auton Robots 2022. [DOI: 10.1007/s10514-022-10073-6]
Abstract
Advanced machine learning, computer vision, audio analysis, and natural language understanding systems can now be widely used to improve the perceptive and reasoning capabilities of social robots. In particular, artificial intelligence algorithms for speaker re-identification make the robot aware of its interlocutor and able to personalize the conversation according to information gathered in real time and in past interactions with the speaker. However, this kind of application requires training neural networks with only a few samples available per speaker. Within this context, this paper proposes a social robot equipped with a microphone sensor and a deep learning algorithm for few-shot speaker re-identification, able to run in real time on an embedded platform mounted on board the robot. The proposed system was experimentally evaluated on the VoxCeleb1 dataset, demonstrating remarkable re-identification accuracy while varying the number of samples per speaker, the number of known speakers, and the duration of the samples, and on the SpReW dataset, showing its robustness in real noisy environments. Finally, a quantitative evaluation of the processing time on the embedded platform shows that the processing pipeline is almost immediate, resulting in a pleasant user experience.
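The abstract does not detail the network itself; as a rough illustration, few-shot re-identification systems of this kind typically enroll a handful of utterances per speaker and match new audio against per-speaker prototypes. The sketch below assumes a generic embedding model and a hypothetical similarity threshold, both stand-ins rather than the paper's actual components.

```python
# Illustrative sketch of few-shot speaker re-identification by prototype
# matching over speaker embeddings. The `embed` function is a placeholder
# for a deep speaker-embedding network (not specified in the abstract);
# the 192-d size and 0.6 threshold are hypothetical.
import numpy as np

def embed(utterance: np.ndarray) -> np.ndarray:
    """Placeholder for a deep speaker-embedding network."""
    rng = np.random.default_rng(abs(hash(utterance.tobytes())) % 2**32)
    v = rng.standard_normal(192)          # e.g., an x-vector-like embedding
    return v / np.linalg.norm(v)

class FewShotReId:
    def __init__(self, threshold: float = 0.6):
        self.prototypes: dict[str, np.ndarray] = {}
        self.threshold = threshold        # below this, treat speaker as unknown

    def enroll(self, name: str, samples: list[np.ndarray]) -> None:
        # Average the few available embeddings into one prototype per speaker.
        proto = np.mean([embed(s) for s in samples], axis=0)
        self.prototypes[name] = proto / np.linalg.norm(proto)

    def identify(self, utterance: np.ndarray) -> str:
        q = embed(utterance)
        # Cosine similarity against every enrolled prototype.
        scores = {n: float(q @ p) for n, p in self.prototypes.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] >= self.threshold else "unknown"
```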
3
Belhassein K, Fernández Castro V, Mayima A, Clodic A, Pacherie E, Guidetti M, Alami R, Cochet H. Addressing joint action challenges in HRI: Insights from psychology and philosophy. Acta Psychol (Amst) 2022; 222:103476. [PMID: 34974283] [DOI: 10.1016/j.actpsy.2021.103476]
Abstract
The vast expansion of research in human-robot interaction (HRI) over recent decades has been accompanied by the design of increasingly skilled robots for engaging in joint actions with humans. However, these advances have encountered significant challenges in ensuring fluent interactions and sustaining human motivation through the different steps of joint action. After exploring the current literature on joint action in HRI, leading to a more precise definition of these challenges, this article proposes some perspectives borrowed from psychology and philosophy that show the key role of communication in human interactions. From mutual recognition between individuals to the expression of commitment and social expectations, we argue that communicative cues can facilitate coordination, prediction, and motivation in the context of joint action. The description of several notions thus suggests that some communicative capacities can be implemented in the context of joint action for HRI, leading to an integrated perspective on robotic communication.
Affiliation(s)
- Kathleen Belhassein
- CLLE, UMR5263, Toulouse University, CNRS, UT2J, France; LAAS-CNRS, UPR8001, Toulouse University, CNRS, France
- Rachid Alami
- LAAS-CNRS, UPR8001, Toulouse University, CNRS, France
- Hélène Cochet
- CLLE, UMR5263, Toulouse University, CNRS, UT2J, France
4
5
Design and Implementation of the Voice Command Recognition and the Sound Source Localization System for Human–Robot Interaction. Robotica 2021. [DOI: 10.1017/s0263574720001496]
Abstract
Human–robot interaction (HRI) is becoming increasingly important. In this paper, a low-cost communication system for HRI is designed and implemented on the Scout robot and a robotic face. A hidden Markov model-based voice command detection system is proposed, and a non-native database containing 10 target English commands has been collected from Persian speakers. The experimental results confirm that the proposed system is capable of recognizing the voice commands and properly performs the task or expresses the right answer. Compared with a system trained on the native Julius database, the proposed system achieves about 10% better true detection.
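The abstract names a hidden Markov model-based detector without implementation detail. Below is a minimal sketch of the standard pattern such systems follow: one discrete-observation HMM per command, classification by maximum forward-algorithm log-likelihood. The quantized-feature front end and model parameters are assumed, not taken from the paper.

```python
# Minimal HMM-based command recognition: score an observation sequence
# under each command's HMM and pick the best. Parameters (pi, A, B) are
# assumed to have been trained elsewhere (e.g., via Baum-Welch).
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of an observation sequence under one HMM.

    obs: sequence of discrete symbol indices (e.g., vector-quantized MFCC frames)
    pi:  (n_states,) initial state distribution
    A:   (n_states, n_states) transition matrix
    B:   (n_states, n_symbols) emission matrix
    """
    alpha = pi * B[:, obs[0]]
    log_scale = 0.0
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()                 # rescale to avoid numerical underflow
        alpha /= s
        log_scale += np.log(s)
    return log_scale + np.log(alpha.sum())

def recognize(obs, models):
    """models: {command_name: (pi, A, B)}; returns the best-scoring command."""
    return max(models, key=lambda cmd: forward_loglik(obs, *models[cmd]))
```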
6
Hernandez JD, Sobti S, Sciola A, Moll M, Kavraki LE. Increasing Robot Autonomy via Motion Planning and an Augmented Reality Interface. IEEE Robot Autom Lett 2020. [DOI: 10.1109/lra.2020.2967280]
7
Xing Z, Yu F, Du J, Walker JS, Paulson CB, Mani NS, Song L. Conversational Interfaces for Health: Bibliometric Analysis of Grants, Publications, and Patents. J Med Internet Res 2019; 21:e14672. [PMID: 31738171] [PMCID: PMC6887814] [DOI: 10.2196/14672]
Abstract
BACKGROUND Conversational interfaces (CIs) in different modalities have been developed for health purposes, such as health behavioral intervention, patient self-management, and clinical decision support. Despite growing research evidence supporting CIs' potential, CI-related research is still in its infancy. There is a lack of systematic investigation that goes beyond publication review and presents the state of the art from the perspectives of funding agencies, academia, and industry by incorporating CI-related public funding and patent activities. OBJECTIVE This study aimed to use data systematically extracted from multiple sources (ie, grant, publication, and patent databases) to investigate the development, research, and funding of health-related CIs and associated stakeholders (ie, countries, organizations, and collaborators). METHODS A multifaceted search query was executed to retrieve records from 9 databases. Bibliometric analysis, social network analysis, and term co-occurrence analysis were conducted on the screened records. RESULTS This review included 42 funded projects, 428 research publications, and 162 patents. The total dollar amount of grants awarded was US $30,297,932, of which US $13,513,473 was awarded by US funding agencies and US $16,784,459 was funded by the European Commission. The top 3 funding agencies in the United States were the National Science Foundation, National Institutes of Health, and Agency for Healthcare Research and Quality. Boston Medical Center was awarded the largest combined grant size (US $2,246,437) for 4 projects. The authors of the publications were from 58 countries and 566 organizations; the top 3 most productive organizations were Northeastern University (United States), Universiti Teknologi MARA (Malaysia), and the French National Center for Scientific Research (CNRS; France). US researchers produced 114 publications. Although 82.0% (464/566) of the organizations engaged in interorganizational collaboration, 2 organizational research-collaboration clusters were observed, with Northeastern University and CNRS as the central nodes. About 112 organizations from the United States and China filed 87.7% of the patents. IBM filed the most patents (N=17). Only 5 patents were co-owned by different organizations, and there was no cross-country collaboration on patenting activity. The terms patient, child, elderly, and robot were frequently discussed in the 3 record types. Terms related to mental and chronic issues were discussed mainly in grants and publications. Terms regarding multimodal interactions were widely mentioned as users' communication modes with CIs in the identified records. CONCLUSIONS Our findings provided an overview of the countries, organizations, and topic terms in funded projects, as well as the authorship, collaboration, content, and related information of research publications and patents. There is a lack of broad cross-sector partnerships among grant agencies, academia, and industry, particularly in the United States. Our results suggest a need to improve collaboration among public and private sectors and health care organizations in research and patent activities.
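For readers unfamiliar with the term co-occurrence analysis mentioned in the methods, the toy sketch below shows the core operation: counting how often two terms appear in the same record. The term sets and records are invented for illustration; the study's actual pipeline is not described in the abstract.

```python
# Toy term co-occurrence counting across bibliographic records.
from itertools import combinations
from collections import Counter

def cooccurrence(records: list[set[str]]) -> Counter:
    """Count pairwise co-occurrences of terms across records."""
    counts = Counter()
    for terms in records:
        for a, b in combinations(sorted(terms), 2):
            counts[(a, b)] += 1
    return counts

records = [
    {"patient", "robot", "chronic"},   # e.g., terms extracted from a grant
    {"patient", "child"},              # ... from a publication
    {"robot", "patient", "elderly"},   # ... from a patent
]
print(cooccurrence(records).most_common(3))
```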
Affiliation(s)
- Zhaopeng Xing
- Carolina Health Informatics Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Fei Yu
- Carolina Health Informatics Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Health Science Library, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Jian Du
- National Institute of Health Data Science, Peking University, Beijing, China
- Jennifer S Walker
- Health Science Library, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Claire B Paulson
- Carolina Health Informatics Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Nandita S Mani
- Health Science Library, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Lixin Song
- School of Nursing, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
8
Liu R, Zhang X. A review of methodologies for natural-language-facilitated human–robot cooperation. Int J Adv Robot Syst 2019. [DOI: 10.1177/1729881419851402]
Abstract
Natural-language-facilitated human–robot cooperation refers to the use of natural language to facilitate interactive information sharing and task execution, with a common goal constraint, between robots and humans. Research on this topic has recently received increasing attention. Typical scenarios include robotic daily assistance, robotic health caregiving, intelligent manufacturing, autonomous navigation, and robotic social accompaniment. However, a thorough review revealing the latest methodologies for using natural language to facilitate human–robot cooperation has been missing. In this review, we comprehensively investigate these methodologies, summarizing the research along three aspects: natural language instruction understanding, natural-language-based execution plan generation, and knowledge-world mapping. We also provide an in-depth analysis of theoretical methods, applications, and model advantages and disadvantages. Based on our review and perspective, future directions for natural-language-facilitated human–robot cooperation research are discussed.
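To make the three aspects concrete, here is a deliberately toy pipeline: understand an instruction, ground it against a world model (knowledge-world mapping), and emit a primitive-action plan. The grammar, world model, and action names are invented and far simpler than any surveyed method.

```python
# Toy natural-language-to-plan pipeline illustrating instruction
# understanding, knowledge-world mapping, and plan generation.
world = {"cup": "kitchen_table", "book": "shelf"}   # knowledge-world mapping

def understand(instruction: str) -> tuple[str, str]:
    """Extract (verb, object) from a 'bring me the <object>' style command."""
    tokens = instruction.lower().rstrip(".!").split()
    return tokens[0], tokens[-1]

def plan(verb: str, obj: str) -> list[str]:
    """Generate a primitive-action plan grounded in the world model."""
    if verb != "bring" or obj not in world:
        raise ValueError(f"cannot ground: {verb} {obj}")
    return [f"goto({world[obj]})", f"pick({obj})", "goto(user)", f"place({obj})"]

print(plan(*understand("Bring me the cup")))
# ['goto(kitchen_table)', 'pick(cup)', 'goto(user)', 'place(cup)']
```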
Affiliation(s)
- Rui Liu
- Robotics Institute (RI), Carnegie Mellon University, Pittsburgh, PA, USA
- Xiaoli Zhang
- Intelligent Robotics and Systems Lab (IRSL), Colorado School of Mines, Golden, CO, USA
9
Analyzing the software architectures supporting HCI/HMI processes through a systematic review of the literature. Telematics and Informatics 2019. [DOI: 10.1016/j.tele.2018.09.006]
10
Zhou D, Shi M, Chao F, Lin CM, Yang L, Shang C, Zhou C. Use of human gestures for controlling a mobile robot via adaptive CMAC network and fuzzy logic controller. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.12.016]
11
Bidirectional invariant representation of rigid body motions and its application to gesture recognition and reproduction. Auton Robots 2018. [DOI: 10.1007/s10514-017-9645-x]
12
Maurtua I, Fernández I, Tellaeche A, Kildal J, Susperregi L, Ibarguren A, Sierra B. Natural multimodal communication for human–robot collaboration. Int J Adv Robot Syst 2017. [DOI: 10.1177/1729881417716043]
Abstract
This article presents a semantic approach for multimodal interaction between humans and industrial robots to enhance the dependability and naturalness of the collaboration between them in real industrial settings. The fusion of several interaction mechanisms is particularly relevant in industrial applications in which adverse environmental conditions might affect the performance of vision-based interaction (e.g. poor or changing lighting) or voice-based interaction (e.g. environmental noise). Our approach relies on the recognition of speech and gestures for the processing of requests, dealing with information that can potentially be contradictory or complementary. For disambiguation, it uses semantic technologies that describe the robot characteristics and capabilities as well as the context of the scenario. Although the proposed approach is generic and applicable in different scenarios, this article explains in detail how it has been implemented in two real industrial cases in which a robot and a worker collaborate in assembly and deburring operations.
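The fusion of potentially contradictory or complementary speech and gesture inputs, as described above, can be sketched with a simple slot-filling rule: complementary inputs merge, contradictory ones are resolved by confidence. The slot structure and confidence scores below are hypothetical; the paper's semantic disambiguation layer, which reasons over robot capabilities and scenario context, is far richer than this.

```python
# Simplified speech+gesture fusion over (action, target) slots.
def fuse(speech: dict, gesture: dict) -> dict:
    """Each input: {'action': str|None, 'target': str|None, 'conf': float}."""
    fused = {}
    for slot in ("action", "target"):
        s, g = speech.get(slot), gesture.get(slot)
        if s is None or g is None or s == g:
            fused[slot] = s if s is not None else g      # complementary
        else:                                            # contradictory
            fused[slot] = s if speech["conf"] >= gesture["conf"] else g
    return fused

# Speech supplies the action, a pointing gesture supplies the target object.
print(fuse({"action": "pick", "target": None, "conf": 0.9},
           {"action": None, "target": "part_42", "conf": 0.8}))
# {'action': 'pick', 'target': 'part_42'}
```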
Affiliation(s)
- Basilio Sierra
- RSAIT Group, University of the Basque Country, Lejona, Vizcaya, Spain
13
Li S, Zhang X, Webb JD. 3-D-Gaze-Based Robotic Grasping Through Mimicking Human Visuomotor Function for People With Motion Impairments. IEEE Trans Biomed Eng 2017; 64:2824-2835. [PMID: 28278455] [DOI: 10.1109/tbme.2017.2677902]
Abstract
OBJECTIVE The goal of this paper is to achieve a novel 3-D-gaze-based human-robot-interaction modality, with which a user with motion impairment can intuitively express what tasks he/she wants the robot to do by directly looking at the object of interest in the real world. Toward this goal, we investigate 1) the technology to accurately sense where a person is looking in real environments and 2) the method to interpret the human gaze and convert it into an effective interaction modality. Looking at a specific object reflects what a person is thinking related to that object, and the gaze location contains essential information for object manipulation. METHODS A novel gaze vector method is developed to accurately estimate the 3-D coordinates of the object being looked at in real environments, and a novel interpretation framework that mimics human visuomotor functions is designed to increase the control capability of gaze in object grasping tasks. RESULTS High tracking accuracy was achieved using the gaze vector method. Participants successfully controlled a robotic arm for object grasping by directly looking at the target object. CONCLUSION Human 3-D gaze can be effectively employed as an intuitive interaction modality for robotic object manipulation. SIGNIFICANCE It is the first time that 3-D gaze is utilized in a real environment to command a robot for a practical application. Three-dimensional gaze tracking is promising as an intuitive alternative for human-robot interaction especially for disabled and elderly people who cannot handle the conventional interaction modalities.
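As an illustration of what estimating a 3-D gaze point can involve, the sketch below triangulates the left and right eye gaze rays and returns the midpoint of the shortest segment between them. This is a generic two-ray construction, not necessarily the paper's gaze vector method, and the eye positions and directions are assumed inputs from an eye tracker.

```python
# Generic 3-D gaze point estimation from two eye rays.
import numpy as np

def gaze_point_3d(o_l, d_l, o_r, d_r):
    """o_*: eye positions; d_*: unit gaze directions (all 3-vectors)."""
    w0 = o_l - o_r
    a, b, c = d_l @ d_l, d_l @ d_r, d_r @ d_r
    d, e = d_l @ w0, d_r @ w0
    denom = a * c - b * b                 # ~0 when the rays are parallel
    if abs(denom) < 1e-9:
        raise ValueError("gaze rays are (nearly) parallel")
    t_l = (b * e - c * d) / denom
    t_r = (a * e - b * d) / denom
    p_l, p_r = o_l + t_l * d_l, o_r + t_r * d_r
    return (p_l + p_r) / 2                # fixation point estimate

eyes_l, eyes_r = np.array([-0.03, 0.0, 0.0]), np.array([0.03, 0.0, 0.0])
target = np.array([0.1, 0.2, 0.5])        # hypothetical object location (m)
d_l = (target - eyes_l) / np.linalg.norm(target - eyes_l)
d_r = (target - eyes_r) / np.linalg.norm(target - eyes_r)
print(gaze_point_3d(eyes_l, d_l, eyes_r, d_r))   # ≈ [0.1, 0.2, 0.5]
```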
14
Jacob MG, Wachs JP. Optimal Modality Selection for Cooperative Human-Robot Task Completion. IEEE Transactions on Cybernetics 2016; 46:3388-3400. [PMID: 26731783] [DOI: 10.1109/tcyb.2015.2506985]
Abstract
Human-robot cooperation in complex environments must be fast, accurate, and resilient. This requires efficient communication channels where robots need to assimilate information using a plethora of verbal and nonverbal modalities such as hand gestures, speech, and gaze. However, even though hybrid human-robot communication frameworks and multimodal communication have been studied, a systematic methodology for designing multimodal interfaces does not exist. This paper addresses the gap by proposing a novel methodology to generate multimodal lexicons which maximizes multiple performance metrics over a wide range of communication modalities (i.e., lexicons). The metrics are obtained through a mixture of simulation and real-world experiments. The methodology is tested in a surgical setting where a robot cooperates with a surgeon to complete a mock abdominal incision and closure task by delivering surgical instruments. Experimental results show that predicted optimal lexicons significantly outperform predicted suboptimal lexicons (p < 0.05) in all metrics, validating the predictability of the methodology. The methodology is validated in two scenarios (with and without modeling the risk of a human-robot collision) and the differences in the lexicons are analyzed.
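Stripped to its core, the lexicon-selection idea can be sketched as scoring each candidate multimodal lexicon on several performance metrics and keeping the maximizer. The metric names, weights, and candidate lexicons below are invented; the paper derives its metrics from simulation and real experiments rather than from fixed numbers like these.

```python
# Toy optimal lexicon selection: maximize a weighted metric combination.
candidates = {
    ("speech", "gesture"): {"accuracy": 0.94, "speed": 0.70},
    ("speech", "gaze"):    {"accuracy": 0.90, "speed": 0.85},
    ("gesture", "gaze"):   {"accuracy": 0.80, "speed": 0.90},
}
weights = {"accuracy": 0.6, "speed": 0.4}   # hypothetical task priorities

def score(metrics: dict) -> float:
    return sum(weights[m] * v for m, v in metrics.items())

best = max(candidates, key=lambda lex: score(candidates[lex]))
print(best, round(score(candidates[best]), 3))   # ('speech', 'gaze') 0.88
```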
15
Hernandez-Belmonte UH, Ayala-Ramirez V. Real-Time Hand Posture Recognition for Human-Robot Interaction Tasks. Sensors 2016; 16:36. [PMID: 26742041] [PMCID: PMC4732069] [DOI: 10.3390/s16010036]
Abstract
In this work, we present a multiclass hand posture classifier useful for human-robot interaction tasks. The proposed system is based exclusively on visual sensors and achieves real-time performance whilst detecting and recognizing an alphabet of four hand postures. The approach is based on the real-time deformable detector, a boosting-trained classifier. We describe a methodology to design the ensemble of real-time deformable detectors (one for each hand posture to be classified). Given the lack of standard procedures for performance evaluation, we also propose the use of full-image evaluation for this purpose. Such an evaluation methodology provides a more realistic estimate of the method's performance. We have measured the performance of the proposed system and compared it to that obtained using only the sampled-window approach. We present detailed results of such tests using a benchmark dataset. Our results show that the system can operate in real time at a frame rate of about 10 fps.
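Schematically, the one-detector-per-posture ensemble reduces to running every detector on the image and taking the top response above a threshold. The stand-in detectors and the 0.5 threshold below are placeholders, not the authors' trained real-time deformable detectors.

```python
# Schematic multiclass decision from an ensemble of per-posture detectors.
from typing import Callable
import numpy as np

Detector = Callable[[np.ndarray], float]   # image -> confidence score

def classify(image: np.ndarray, detectors: dict[str, Detector],
             threshold: float = 0.5) -> str:
    scores = {name: det(image) for name, det in detectors.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "no_posture"

# Hypothetical stand-ins for the four-posture alphabet.
detectors = {
    "fist":  lambda img: float(img.mean() > 0.5),
    "palm":  lambda img: float(img.mean() <= 0.5),
    "point": lambda img: 0.1,
    "ok":    lambda img: 0.2,
}
print(classify(np.random.rand(64, 64), detectors))
```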
Affiliation(s)
- Victor Ayala-Ramirez
- Universidad de Guanajuato DICIS, Carr. Salamanca-Valle Km. 3.5 + 1.8, Palo Blanco, Salamanca, C.P. 36885, Mexico
16
17
Wongphati M, Osawa H, Imai M. User-defined gestures for controlling primitive motions of an end effector. Adv Robot 2015. [DOI: 10.1080/01691864.2014.978371]
18
Pradel KC, Wu W, Ding Y, Wang ZL. Solution-derived ZnO homojunction nanowire films on wearable substrates for energy conversion and self-powered gesture recognition. Nano Letters 2014; 14:6897-6905. [PMID: 25423258] [DOI: 10.1021/nl5029182]
Abstract
Emerging applications in wearable technology, pervasive computing, human-machine interfacing, and implantable biomedical devices demand an appropriate power source that can sustainably operate for extended periods of time with minimal intervention (Wang, Z. L.; et al. Angew. Chem., Int. Ed. 2012, 51, 11700). Self-powered nanosystems, which harvest operating energy from its host (i.e., the human body), may be feasible due to their extremely low power consumption (Tian, B. Z.; et al. Nature 2007, 449, 885. Javey, A.; et al. Nature 2003, 424, 654. Cui, Y.; et al. Science 2001, 291, 851). Here we report materials and designs for wearable-on-skin piezoelectric devices based on ultrathin (2 μm) solution-derived ZnO p-n homojunction films for the first time. The depletion region formed at the p-n homojunction effectively reduces internal screening of strain-induced polarization charges by free carriers in both n-ZnO and Sb-doped p-ZnO, resulting in significantly enhanced piezoelectric output compared to a single layer device. The p-n structure can be further grown on polymeric substrates conformable to a human wrist and used to convert movement of the flexor tendons into distinguishable electrical signals for gesture recognition. The ZnO homojunction piezoelectric devices may have applications in powering nanodevices, bioprobes, and self-powered human-machine interfacing.
Affiliation(s)
- Ken C Pradel
- School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0245, United States