1. Plabst L, Niebling F, Oberdorfer S, Ortega F. Order Up! Multimodal Interaction Techniques for Notifications in Augmented Reality. IEEE Transactions on Visualization and Computer Graphics 2025; 31:2258-2267. PMID: 40053630. DOI: 10.1109/tvcg.2025.3549186.
Abstract
As augmented reality (AR) headsets become increasingly integrated into professional and social settings, a critical challenge emerges: how can users effectively manage and interact with the frequent notifications they receive? With adults receiving nearly 200 notifications daily on their smartphones, which serve as primary computing devices for many, translating this interaction to AR systems is paramount. Unlike traditional devices, AR systems augment the physical world, requiring interaction techniques that blend seamlessly with real-world behaviors. This study explores the complexities of multimodal interaction with notifications in AR. We investigated user preferences, usability, workload, and performance during a virtual cooking task, where participants managed customer orders while interacting with notifications. Various interaction techniques were tested: Point and Pinch, Gaze and Pinch, Point and Voice, Gaze and Voice, and Touch. Our findings reveal significant impacts on workload, performance, and usability based on the interaction method used. We identify key issues in multimodal interaction and offer guidance for optimizing these techniques in AR environments.
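The five techniques can be read as combinations of a targeting modality (where the user's attention is directed) and a confirmation modality (how the action is committed). A minimal sketch of this pairing, using hypothetical names not taken from the paper:

```python
from dataclasses import dataclass
from enum import Enum, auto


class Target(Enum):
    """How a notification is selected (hypothetical names)."""
    POINT = auto()  # hand-ray pointing
    GAZE = auto()   # eye-tracking ray
    TOUCH = auto()  # direct touch on the notification panel


class Confirm(Enum):
    """How the selected notification is acted on (hypothetical names)."""
    PINCH = auto()  # thumb-index pinch gesture
    VOICE = auto()  # spoken command such as "dismiss"
    TOUCH = auto()  # the tap itself both selects and confirms


@dataclass(frozen=True)
class Technique:
    target: Target
    confirm: Confirm


# The five conditions from the study, expressed as (target, confirm) pairs.
CONDITIONS = {
    "Point and Pinch": Technique(Target.POINT, Confirm.PINCH),
    "Gaze and Pinch": Technique(Target.GAZE, Confirm.PINCH),
    "Point and Voice": Technique(Target.POINT, Confirm.VOICE),
    "Gaze and Voice": Technique(Target.GAZE, Confirm.VOICE),
    "Touch": Technique(Target.TOUCH, Confirm.TOUCH),
}
```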
2. Leon GM, Bezerianos A, Gladin O, Isenberg P. Talk to the Wall: The Role of Speech Interaction in Collaborative Visual Analytics. IEEE Transactions on Visualization and Computer Graphics 2025; 31:941-951. PMID: 39250400. DOI: 10.1109/tvcg.2024.3456335.
Abstract
We present the results of an exploratory study on how pairs interact with speech commands and touch gestures on a wall-sized display during a collaborative sensemaking task. Previous work has shown that speech commands, alone or in combination with other input modalities, can support visual data exploration by individuals. However, it is still unknown whether and how speech commands can be used in collaboration, and for what tasks. To answer these questions, we developed a functioning prototype that we used as a technology probe. We conducted an in-depth exploratory study with 10 participant pairs to analyze their interaction choices, the interplay between the input modalities, and their collaboration. While touch was the most used modality, we found that participants preferred speech commands for global operations, used them for distant interaction, and that speech interaction contributed to the awareness of the partner's actions. Furthermore, the likelihood of using speech commands during collaboration was related to the personality trait of agreeableness. Regarding collaboration styles, participants interacted with speech equally often whether they were in loosely or closely coupled collaboration. While the partners stood closer to each other during close collaboration, they did not distance themselves to use speech commands. From our findings, we derive and contribute a set of design considerations for collaborative and multimodal interactive data analysis systems. All supplemental materials are available at https://osf.io/8gpv2.
3. Filipov V, Arleo A, Miksch S. Are We There Yet? A Roadmap of Network Visualization from Surveys to Task Taxonomies. Computer Graphics Forum 2023; 42:e14794. PMID: 38505648. PMCID: PMC10947241. DOI: 10.1111/cgf.14794.
Abstract
Networks are abstract and ubiquitous data structures, defined as a set of data points and the relationships between them. Network visualization provides meaningful representations of these data, supporting researchers in understanding connections, gathering insights, and detecting unexpected patterns. Research in this field focuses on increasingly challenging problems, such as visualizing dynamic, complex, multivariate, and geospatial networked data. This ever-growing and widely varied body of research has led to several surveys, each covering one or more disciplines of network visualization. Despite this effort, the variety and complexity of the research remains an obstacle to surveying the domain and building a comprehensive overview of the literature. Furthermore, the terminology used across these surveys lacks uniformity, which requires additional effort when mapping and categorizing the plethora of visualization techniques and approaches. In this paper, we aim to provide researchers and practitioners alike with a "roadmap" detailing current research trends in the field of network visualization. We design our contribution as a meta-survey in which we discuss, summarize, and categorize recent surveys and task taxonomies published in the context of network visualization. We identify more and less saturated disciplines of research and consolidate the terminology used in the surveyed literature. We also survey the available task taxonomies, analyze how well they support each network visualization discipline, and establish and discuss a classification of the individual tasks. From this combined analysis of surveys and task taxonomies, we derive an overarching structure of the field, from which we extrapolate the current state of research and promising directions for future work.
4. Chojecki P, Strazdas D, Przewozny D, Gard N, Runde D, Hoerner N, Al-Hamadi A, Eisert P, Bosse S. Assessing the Value of Multimodal Interfaces: A Study on Human-Machine Interaction in Weld Inspection Workstations. Sensors (Basel) 2023; 23:5043. PMID: 37299770. PMCID: PMC10255088. DOI: 10.3390/s23115043.
Abstract
Multimodal user interfaces promise natural and intuitive human-machine interaction. However, is the extra effort of developing a complex multisensor system justified, or can users be satisfied with a single input modality? This study investigates interactions at an industrial weld inspection workstation. Three unimodal interfaces, including spatial interaction with buttons augmented on a workpiece or a worktable, and speech commands, were tested individually and in a multimodal combination. Within the unimodal conditions, users preferred the augmented worktable, but overall, the multimodal condition, with its interindividual usage of all input technologies, was ranked best. Our findings indicate that implementing multiple input modalities is valuable and that the usability of individual input modalities for complex systems is difficult to predict.
Affiliation(s)
- Paul Chojecki, David Przewozny, Niklas Gard, Detlef Runde, Niklas Hoerner, Peter Eisert, Sebastian Bosse: Fraunhofer HHI, 10587 Berlin, Germany
- Dominykas Strazdas, Ayoub Al-Hamadi: Neuro-Information Technology, Otto-von-Guericke-University Magdeburg, 39106 Magdeburg, Germany
5. Wang Y, Hou Z, Shen L, Wu T, Wang J, Huang H, Zhang H, Zhang D. Towards Natural Language-Based Visualization Authoring. IEEE Transactions on Visualization and Computer Graphics 2023; 29:1222-1232. PMID: 36197854. DOI: 10.1109/tvcg.2022.3209357.
Abstract
A key challenge in visualization authoring is getting familiar with the complex user interfaces of authoring tools. Natural language interfaces (NLIs) present promising benefits due to their learnability and usability. However, supporting NLIs in authoring tools requires expertise in natural language processing, and existing NLIs are mostly designed for visual analytic workflows. In this paper, we propose an authoring-oriented NLI pipeline by introducing a structured representation of users' visualization editing intents, called editing actions, based on a formative study and an extensive survey of visualization construction tools. The editing actions are executable and thus serve as an intermediate layer that decouples natural language interpretation from visualization applications. We implement a deep learning-based NL interpreter to translate NL utterances into editing actions. The interpreter is reusable and extensible across authoring tools; each authoring tool only needs to map the editing actions into tool-specific operations. To illustrate the usage of the NL interpreter, we implement an Excel chart editor and a proof-of-concept authoring tool, VisTalk. We conduct a user study with VisTalk to understand the usage patterns of NL-based authoring systems. Finally, we discuss observations on how users author charts with natural language, as well as implications for future research.
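The decoupling is the key architectural point: the interpreter emits a tool-agnostic editing action, and each authoring tool only maps those actions onto its own operations. A minimal sketch, assuming a hypothetical action schema (the paper defines its own format):

```python
from dataclasses import dataclass, field


@dataclass
class EditingAction:
    """Structured, executable representation of an editing intent.

    Hypothetical schema for illustration only; it is not the paper's
    actual action format.
    """
    operation: str                 # e.g. "set_color", "change_chart_type"
    target: str                    # e.g. "marks", "x-axis title"
    parameters: dict = field(default_factory=dict)


def interpret(utterance: str) -> EditingAction:
    """Stand-in for the deep learning-based interpreter in the paper."""
    # A real interpreter would be a trained model; this is a toy rule.
    if "red" in utterance.lower():
        return EditingAction("set_color", "marks", {"color": "red"})
    raise ValueError(f"cannot interpret: {utterance!r}")


class ChartEditor:
    """Tool-specific backend: maps editing actions onto its own operations."""

    def apply(self, action: EditingAction) -> None:
        if action.operation == "set_color":
            print(f"coloring {action.target} {action.parameters['color']}")
        else:
            raise NotImplementedError(action.operation)


# The interpreter is reusable across tools; only `apply` is tool-specific.
ChartEditor().apply(interpret("Make the bars red"))
```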
6. Ma C, Liu Q, Dang Y. Multimodal Art Pose Recognition and Interaction With Human Intelligence Enhancement. Frontiers in Psychology 2021; 12:769509. PMID: 34819900. PMCID: PMC8606411. DOI: 10.3389/fpsyg.2021.769509.
Abstract
This paper provides an in-depth study and analysis of human artistic poses through intelligently enhanced multimodal pose recognition. A complementary network model architecture for multimodal information based on motion energy is proposed. The network exploits both the rich appearance features provided by RGB data and the depth information provided by depth data, as well as their robustness to luminance and observation angle; multimodal fusion is accomplished through the complementary information characteristics of the two modalities. Moreover, to better model long-range temporal structure while accounting for action classes with shared sub-actions, an energy-guided video segmentation method is employed. In the feature fusion stage, a cross-modal cross-fusion approach is proposed, which enables the convolutional network not only to share local features of the two modalities in the shallow layers but also to fuse global features in the deep convolutional layers by connecting the feature maps of multiple convolutional layers. First, a Kinect camera is used to acquire color image data, depth image data, and 3D skeletal joint coordinates of the human body via the open-source OpenPose framework. Then, keyframes are automatically extracted based on the distance between the hand and the head; relative distance features are extracted from the keyframes to describe the action, local occupancy pattern features and HSV color space features are extracted to describe the object, and finally feature fusion is performed to complete the complex action recognition task. To solve the consistency problem of virtual-reality fusion, the mapping relationship between hand joint coordinates and the virtual scene is determined in the augmented reality scene, and a coordinate consistency model between the natural hand and the virtual model is established. Finally, real-time interaction between hand gestures and the virtual model is realized, with an average gesture recognition accuracy of 99.04%, improving the robustness and real-time performance of gesture-based interaction.
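As a rough illustration of the keyframe heuristic, the sketch below picks frames where the hand-head distance reaches a local minimum below a threshold; the joint indices and threshold value are illustrative assumptions, not parameters from the paper:

```python
import numpy as np


def extract_keyframes(joints: np.ndarray, hand: int, head: int,
                      threshold: float = 0.25) -> list[int]:
    """Pick keyframes where the hand approaches the head.

    joints: array of shape (n_frames, n_joints, 3) with 3D skeleton
    coordinates (e.g. from a Kinect/OpenPose pipeline). `hand` and
    `head` are joint indices and `threshold` is a distance in meters;
    all three are assumptions made for this sketch.
    """
    dists = np.linalg.norm(joints[:, hand] - joints[:, head], axis=1)
    # Keep frames at local minima of the hand-head distance that fall
    # below the threshold: the moments a distance-based heuristic
    # would treat as representative of the action.
    keyframes = []
    for t in range(1, len(dists) - 1):
        if dists[t] < threshold and dists[t] <= dists[t - 1] and dists[t] <= dists[t + 1]:
            keyframes.append(t)
    return keyframes
```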
Affiliation(s)
- Chengming Ma, Qian Liu, Yaqi Dang: College of Communication, Northwest Normal University, Lanzhou, China
7. Analyzing the Synergy between HCI and TRIZ in Product Innovation through a Systematic Review of the Literature. Advances in Human-Computer Interaction 2021. DOI: 10.1155/2021/6616962.
Abstract
The boundary between tangible and digital products is increasingly blurred, and rapidly evolving interactive systems require novel processes for quickly developing designs, evaluations, and interaction strategies that facilitate efficient and distinctive user interactions with computer systems. Accordingly, the literature suggests combining creativity enhancement tools or methods with human-computer interaction (HCI) design. The TRIZ base of knowledge appears to be one viable option, as suggested by fragmentary indications in well-established design textbooks. The goal of this paper is to present a systematic review of the literature that identifies and analyzes published approaches and recommendations supporting the synergy between HCI and TRIZ from the perspective of product innovation, with the aim of providing a first comprehensive classification and discussing observable differences and gaps. The review follows established guidelines for systematic literature reviews. Of 444 initial results, only 17 studies reported outcomes of the synergy between HCI and TRIZ. Seven of these studies explored the feasibility of combining HCI and TRIZ; the other ten attempted to combine and derive approaches from the two fields, and their outcomes define three different integration strategies between HCI and TRIZ. We conclude that generic solutions supporting the synergy between HCI and TRIZ remain rare in the literature, that extracting and combining different tools has led to inconsistent evaluation criteria, and that the performance of the proposals has not been comprehensively evaluated. Nevertheless, the findings can inform future developments and provide valuable information about the benefits and drawbacks of different approaches.
8. Srinivasan A, Stasko J, Keefe DF, Tory M. How to Ask What to Say?: Strategies for Evaluating Natural Language Interfaces for Data Visualization. IEEE Computer Graphics and Applications 2020; 40:96-103. PMID: 32544054. DOI: 10.1109/mcg.2020.2986902.
Abstract
In this article, we discuss challenges and strategies for evaluating natural language interfaces (NLIs) for data visualization. Through an examination of prior studies and reflection on our own experiences evaluating visualization NLIs, we highlight the benefits and considerations of three task framing strategies: Jeopardy-style facts, open-ended tasks, and target replication tasks. We hope the discussion in this article can guide future researchers working on visualization NLIs and help them avoid common challenges and pitfalls when evaluating these systems. Finally, to motivate future research, we highlight topics that call for further investigation, including the development of new evaluation metrics and consideration of the type of natural language input (spoken versus typed), among others.