1. Zhu J. Visual contextual perception and user emotional feedback in visual communication design. BMC Psychol 2025;13:313. PMID: 40156061; PMCID: PMC11951664; DOI: 10.1186/s40359-025-02615-1.
Abstract
BACKGROUND With the advent of the information era, visual communication design has become increasingly important in widespread network applications. Prevailing sentiment analysis approaches in visual communication design rely mainly on holistic image information, overlook the localized regions that accentuate emotion, and do not adequately mine the semantics of different channel features. METHODS To address these shortcomings, this paper introduces a dual-attention multilayer feature fusion method, DA-MLCNN. First, a multilayer convolutional neural network (CNN) feature extraction architecture fuses overall and localized information to extract both high-level and low-level image features. A spatial attention mechanism then strengthens the low-level features, while a channel attention mechanism strengthens the high-level features. Finally, the attention-enhanced features are fused to obtain semantically richer, more discriminative visual features for training sentiment classifiers. RESULTS The method attains classification accuracies of 79.8% and 55.8% on the Twitter 2017 and Emotion ROI datasets, respectively, and accuracies of 89%, 94%, and 91% for the sadness, surprise, and joy categories on the Emotion ROI dataset. CONCLUSIONS The performance on binary and multi-class emotion image datasets shows that the proposed approach learns more discriminative visual features and thereby advances visual sentiment analysis. Stronger visual sentiment analysis in turn supports innovation in visual communication design, offering designers broader prospects and possibilities.
Affiliation(s)
- Jiayi Zhu
- Academy of Arts, Qujing Normal University, Qujing, Yunnan, 655011, China.
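As a rough illustration of the dual-attention feature fusion described in entry 1, the sketch below applies spatial attention to a shallow (low-level) feature map and channel attention to a deeper (high-level) one before fusing both for classification. The layer sizes, the attention modules, and the fusion-by-concatenation step are assumptions made for this sketch, not the authors' released DA-MLCNN implementation.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Reweight a feature map spatially: one attention weight per location."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=7, padding=3)

    def forward(self, x):
        attn = torch.sigmoid(self.score(x))            # (B, 1, H, W)
        return x * attn                                # broadcast over channels

class ChannelAttention(nn.Module):
    """Reweight a feature map per channel (squeeze-and-excitation style)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)   # (B, C, 1, 1)
        return x * w

class DualAttentionFusionNet(nn.Module):
    """Fuse attention-enhanced low-level and high-level CNN features."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.low = nn.Sequential(                      # shallow block: low-level cues
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2))
        self.high = nn.Sequential(                     # deeper block: high-level semantics
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2))
        self.spatial_attn = SpatialAttention(64)
        self.channel_attn = ChannelAttention(256)
        self.classifier = nn.Linear(64 + 256, num_classes)

    def forward(self, x):
        low = self.low(x)                               # (B, 64, H/2, W/2)
        high = self.high(low)                           # (B, 256, H/8, W/8)
        low = self.spatial_attn(low).mean(dim=(2, 3))   # pooled low-level vector
        high = self.channel_attn(high).mean(dim=(2, 3)) # pooled high-level vector
        return self.classifier(torch.cat([low, high], dim=1))

# logits = DualAttentionFusionNet(num_classes=2)(torch.randn(1, 3, 224, 224))
```

A single-channel spatial gate and squeeze-and-excitation-style channel weighting are common realizations of these two attention types; the paper's exact formulations may differ.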
2. Chaudhury R, Chilana PK. Designing Visual and Interactive Self-Monitoring Interventions to Facilitate Learning: Insights From Informal Learners and Experts. IEEE Transactions on Visualization and Computer Graphics 2025;31:1542-1556. PMID: 38373125; DOI: 10.1109/tvcg.2024.3366469.
Abstract
Informal learners of computational skills often find it difficult to self-direct their learning pursuits, which may be spread across different mediums and study sessions. Inspired by self-monitoring interventions from domains such as health and productivity, we investigate key requirements for helping informal learners better self-reflect on their learning experiences. We carried out two elicitation studies with article-based and interactive probes to explore a range of manual, automatic, and semi-automatic design approaches for capturing and presenting a learner's data. We found that although automatically generated visual overviews of learning histories are initially promising for increasing awareness, learners prefer having controls to manipulate overviews through personally relevant filtering options to better reflect on their past, plan for future sessions, and communicate with others for feedback. To validate our findings and expand our understanding of designing self-monitoring tools for use in real settings, we gathered further insights from experts, who shed light on factors to consider in terms of data collection techniques, designing for reflections, and carrying out field studies. Our findings have several implications for designing learner-centered self-monitoring interventions that can be both useful and engaging for informal learners.
3. Liu Q, Jiang X, Jiang R. Classroom Behavior Recognition Using Computer Vision: A Systematic Review. Sensors (Basel, Switzerland) 2025;25:373. PMID: 39860742; PMCID: PMC11769068; DOI: 10.3390/s25020373.
Abstract
Behavioral computing based on visual cues has become increasingly important, as it can capture and annotate teachers' and students' classroom states on a large scale and in real time. However, there is a lack of consensus on the research status and future trends of computer vision-based classroom behavior recognition. The present study conducted a systematic literature review of 80 peer-reviewed journal articles following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Three research questions were addressed concerning goal orientation, recognition techniques, and research challenges. Results showed that: (1) computer vision-supported classroom behavior recognition focused on four categories: physical action, learning engagement, attention, and emotion, with physical actions and learning engagement as the primary recognition targets; (2) behavioral categorizations have been defined in various ways and lack connections to instructional content and events; (3) existing studies have focused on college students, especially in natural, traditional classroom settings; (4) deep learning was the main recognition method, and the YOLO series was applicable to multiple behavioral purposes; and (5) challenges remain in experimental design, recognition methods, practical applications, and pedagogical research in computer vision. This review not only informs the recognition and application of computer vision for classroom behavior but also provides insights for future research.
Affiliation(s)
- Qingtang Liu
- Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430079, China
- Hubei Research Center for Educational Informatization, Central China Normal University, Wuhan 430079, China
- Xinyu Jiang
- Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430079, China
- Hubei Research Center for Educational Informatization, Central China Normal University, Wuhan 430079, China
- Ruyi Jiang
- Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430079, China
- Hubei Research Center for Educational Informatization, Central China Normal University, Wuhan 430079, China
4. Cortinas-Lorenzo K, Lacey G. Toward Explainable Affective Computing: A Review. IEEE Transactions on Neural Networks and Learning Systems 2024;35:13101-13121. PMID: 37220061; DOI: 10.1109/tnnls.2023.3270027.
Abstract
Affective computing has unprecedented potential to change the way humans interact with technology. While the last decades have witnessed vast progress in the field, multimodal affective computing systems are generally black boxes by design. As affective systems start to be deployed in real-world scenarios, such as education or healthcare, a shift of focus toward improved transparency and interpretability is needed. In this context, how do we explain the output of affective computing models, and how do we do so without limiting predictive performance? In this article, we review affective computing work from an explainable AI (XAI) perspective, collecting and synthesizing relevant papers into three major XAI approaches: pre-model (applied before training), in-model (applied during training), and post-model (applied after training). We present and discuss the most fundamental challenges in the field, namely, how to relate explanations back to multimodal and time-dependent data; how to integrate context and inductive biases into explanations using mechanisms such as attention, generative modeling, or graph-based methods; and how to capture intramodal and cross-modal interactions in post hoc explanations. While explainable affective computing is still nascent, existing methods are promising, contributing not only to improved transparency but, in many cases, also surpassing state-of-the-art results. Based on these findings, we explore directions for future research and discuss the importance of data-driven XAI, explanation goals, the definition of explainee needs, and causability, that is, the extent to which a given method leads to human understanding.
5. Wu Y, Xu Y, Gao S, Wang X, Song W, Nie Z, Fan X, Li Q. LiveRetro: Visual Analytics for Strategic Retrospect in Livestream E-Commerce. IEEE Transactions on Visualization and Computer Graphics 2024;30:1117-1127. PMID: 37874716; DOI: 10.1109/tvcg.2023.3326911.
Abstract
Livestream e-commerce integrates live streaming and online shopping, allowing viewers to make purchases while watching. However, effective marketing strategies remain a challenge due to limited empirical research and subjective biases from the absence of quantitative data. Current tools fail to capture the interdependence between live performances and feedback. This study identified computational features, formulated design requirements, and developed LiveRetro, an interactive visual analytics system. It enables comprehensive retrospective analysis of livestream e-commerce for streamers, viewers, and merchandise. LiveRetro employs enhanced visualization and time-series forecasting models to align performance features and feedback, identifying influences at channel, merchandise, feature, and segment levels. Through case studies and expert interviews, the system provides deep insights into the relationship between live performance and streaming statistics, enabling efficient strategic analysis from multiple perspectives.
6. Huang Z, He Q, Maher K, Deng X, Lai YK, Ma C, Qin SF, Liu YJ, Wang H. SpeechMirror: A Multimodal Visual Analytics System for Personalized Reflection of Online Public Speaking Effectiveness. IEEE Transactions on Visualization and Computer Graphics 2024;30:606-616. PMID: 37871082; DOI: 10.1109/tvcg.2023.3326932.
Abstract
As communications are increasingly taking place virtually, the ability to present well online is becoming an indispensable skill. Online speakers are facing unique challenges in engaging with remote audiences. However, there has been a lack of evidence-based analytical systems for people to comprehensively evaluate online speeches and further discover possibilities for improvement. This paper introduces SpeechMirror, a visual analytics system facilitating reflection on a speech based on insights from a collection of online speeches. The system estimates the impact of different speech techniques on effectiveness and applies them to a speech to give users awareness of the performance of speech techniques. A similarity recommendation approach based on speech factors or script content supports guided exploration to expand knowledge of presentation evidence and accelerate the discovery of speech delivery possibilities. SpeechMirror provides intuitive visualizations and interactions for users to understand speech factors. Among them, SpeechTwin, a novel multimodal visual summary of speech, supports rapid understanding of critical speech factors and comparison of different speech samples, and SpeechPlayer augments the speech video by integrating visualization of the speaker's body language with interaction, for focused analysis. The system utilizes visualizations suited to the distinct nature of different speech factors for user comprehension. The proposed system and visualization techniques were evaluated with domain experts and amateurs, demonstrating usability for users with low visualization literacy and its efficacy in assisting users to develop insights for potential improvement.
7. Villegas-Ch. W, García-Ortiz J, Urbina-Camacho I, Mera-Navarrete A. Proposal for a System for the Identification of the Concentration of Students Who Attend Online Educational Models. Computers 2023. DOI: 10.3390/computers12040074.
Abstract
E-learning has revolutionized the way students learn by offering access to quality education in a model that does not depend on a specific place or time. However, because no tutor can directly supervise the group in an e-learning setting, students can become distracted for various reasons, which greatly affects their capacity to learn. Several scientific works try to improve the quality of online education, but a holistic approach is necessary to address this problem. Identifying students' attention spans is important for understanding how students process and retain information, since attention is a critical cognitive process that affects a student's ability to learn. It is therefore important to assess student attention with a variety of techniques and tools, such as standardized tests, behavioral observation, and assessment of academic achievement. This work proposes a system that uses devices such as cameras to monitor students' attention levels in real time during online classes. The results are fed back as a heuristic value for analyzing the performance of the students as well as the teaching standards of the teachers.
Affiliation(s)
- William Villegas-Ch.
- Escuela de Ingeniería en Tecnologías de la Información, FICA, Universidad de Las Américas, Quito 170125, Ecuador
- Joselin García-Ortiz
- Escuela de Ingeniería en Tecnologías de la Información, FICA, Universidad de Las Américas, Quito 170125, Ecuador
- Isabel Urbina-Camacho
- Facultad de Filosofía, Letras y Ciencias de la Educación, Universidad Central del Ecuador, Quito 170129, Ecuador
8. Afzal S, Ghani S, Hittawe MM, Rashid SF, Knio OM, Hadwiger M, Hoteit I. Visualization and Visual Analytics Approaches for Image and Video Datasets: A Survey. ACM Transactions on Interactive Intelligent Systems 2023. DOI: 10.1145/3576935.
Abstract
Image and video data analysis has become an increasingly important research area with applications in different domains such as security surveillance, healthcare, augmented and virtual reality, video and image editing, activity analysis and recognition, synthetic content generation, distance education, telepresence, remote sensing, sports analytics, art, non-photorealistic rendering, search engines, and social media. Recent advances in Artificial Intelligence (AI) and particularly deep learning have sparked new research challenges and led to significant advancements, especially in image and video analysis. These advancements have also resulted in significant research and development in other areas such as visualization and visual analytics, and have created new opportunities for future lines of research. In this survey paper, we present the current state of the art at the intersection of visualization and visual analytics, and image and video data analysis. We categorize the visualization papers included in our survey based on different taxonomies used in visualization and visual analytics research. We review these papers in terms of task requirements, tools, datasets, and application areas. We also discuss insights based on our survey results, trends and patterns, the current focus of visualization research, and opportunities for future research.
Affiliation(s)
- Shehzad Afzal
- King Abdullah University of Science & Technology, Saudi Arabia
- Sohaib Ghani
- King Abdullah University of Science & Technology, Saudi Arabia
- Omar M Knio
- King Abdullah University of Science & Technology, Saudi Arabia
- Markus Hadwiger
- King Abdullah University of Science & Technology, Saudi Arabia
- Ibrahim Hoteit
- King Abdullah University of Science & Technology, Saudi Arabia
9. Ye Z, Chen M. Visualizing Ensemble Predictions of Music Mood. IEEE Transactions on Visualization and Computer Graphics 2023;29:864-874. PMID: 36170399; DOI: 10.1109/tvcg.2022.3209379.
Abstract
Music mood classification has been a challenging problem in comparison with other music classification problems (e.g., genre, composer, or period). One solution for addressing this challenge is to use an ensemble of machine learning models. In this paper, we show that visualization techniques can effectively convey the popular prediction as well as uncertainty at different music sections along the temporal axis while enabling the analysis of individual ML models in conjunction with their application to different musical data. In addition to the traditional visual designs, such as stacked line graph, ThemeRiver, and pixel-based visualization, we introduce a new variant of ThemeRiver, called "dual-flux ThemeRiver", which allows viewers to observe and measure the most popular prediction more easily than stacked line graph and ThemeRiver. Together with pixel-based visualization, dual-flux ThemeRiver plots can also assist in model-development workflows, in addition to annotating music using ensemble model predictions.
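The data behind such ensemble-prediction visualizations can be sketched as follows. This is not the authors' dual-flux ThemeRiver design itself, only an illustrative plot with synthetic votes that places the most popular prediction's vote share above the axis and the combined share of all other predictions below it, which is the kind of reading the paper aims to make easy.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
moods = ["happy", "sad", "tense", "calm"]
n_models, n_sections = 9, 40

# Synthetic ensemble output: one mood label per model per music section.
votes = rng.integers(0, len(moods), size=(n_models, n_sections))

# Per-section vote distribution over moods, as fractions of the ensemble.
dist = np.stack([(votes == k).mean(axis=0) for k in range(len(moods))])  # (moods, sections)

top = dist.max(axis=0)   # share of the most popular prediction per section
rest = 1.0 - top         # combined share of all other predictions

t = np.arange(n_sections)
fig, ax = plt.subplots(figsize=(8, 3))
ax.fill_between(t, 0, top, alpha=0.7, label="most popular prediction")
ax.fill_between(t, 0, -rest, alpha=0.4, label="other predictions")
ax.axhline(0, color="black", linewidth=0.8)
ax.set_xlabel("music section")
ax.set_ylabel("ensemble vote share")
ax.legend(loc="upper right")
plt.tight_layout()
plt.show()
```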
10. Wu A, Deng D, Cheng F, Wu Y, Liu S, Qu H. In Defence of Visual Analytics Systems: Replies to Critics. IEEE Transactions on Visualization and Computer Graphics 2023;29:1026-1036. PMID: 36179000; DOI: 10.1109/tvcg.2022.3209360.
Abstract
The last decade has witnessed many visual analytics (VA) systems that have been successfully applied to wide-ranging domains such as urban analytics and explainable AI. However, their research rigor and contributions have been extensively challenged within the visualization community. We come in defence of VA systems by contributing two interview studies that gather criticisms and responses to those criticisms. First, we interview 24 researchers to collect the criticisms raised in review comments on their VA work. Through an iterative coding and refinement process, the interview feedback is summarized into a list of 36 common criticisms. Second, we interview 17 researchers to validate our list and collect their responses, thereby discussing implications for defending and improving the scientific values and rigor of VA systems. We highlight that the presented knowledge is deep and extensive, but also imperfect, provocative, and controversial, and thus recommend reading it with an inclusive and critical eye. We hope our work can provide thoughts and foundations for conducting VA research and spark discussions that move the field forward more rigorously and vibrantly.
11. Research on Classroom Emotion Recognition Algorithm Based on Visual Emotion Classification. Computational Intelligence and Neuroscience 2022;2022:6453499. PMID: 35978909; PMCID: PMC9377850; DOI: 10.1155/2022/6453499.
Abstract
In this paper, we construct a classroom emotion recognition algorithm based on visual emotion classification to improve the quality of classroom teaching. We assign weights to the training images through an attention mechanism network and add a purpose-designed loss function so that the model focuses on the unobscured parts of face images that characterize the target emotion, thereby improving the accuracy of facial emotion recognition under occlusion. We analyze the salient expression features of classroom students and establish classification criteria and a criteria library. Videos of classroom students' facial expressions are collected, a multi-task convolutional neural network (MTCNN) is used for face detection and image segmentation, and the images with better feature morphology are selected to build a standard database. A visual emotion analysis method that fuses overall and local image features is proposed. To validate the effectiveness of the designed MTCNN model, two mainstream classification networks, VGG16 and ResNet18, were trained on RAF-DB, a masked dataset, and the classroom dataset constructed in this paper and compared with MTCNN; the final accuracies were 78.26% for ResNet18 and 75.03% for VGG16. The results show that the MTCNN model proposed in this paper achieves better recognition. Tests of the loss function also show that it effectively improves recognition accuracy, and the MTCNN model reaches an accuracy of 93.53% for recognizing students' facial emotions. Finally, the dataset is extended with the expression-feature training method, and the experimental study shows that the method performs well and recognizes emotions effectively.
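A minimal sketch of the detect-then-classify pipeline the abstract outlines, assuming the facenet-pytorch implementation of MTCNN for face detection and an untrained placeholder classifier standing in for the paper's network and its occlusion-aware, attention-weighted loss:

```python
import torch
import torch.nn as nn
from PIL import Image
from facenet_pytorch import MTCNN   # third-party MTCNN face detector (assumed dependency)

EMOTIONS = ["neutral", "happy", "surprised", "sad", "angry"]  # illustrative label set

detector = MTCNN(keep_all=True, image_size=96, post_process=False)

# Placeholder emotion classifier; the paper trains its own network with an
# attention-weighted loss for occluded faces, which is not reproduced here.
classifier = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, len(EMOTIONS)))

def classroom_emotions(image_path: str):
    """Detect every face in a classroom frame and predict an emotion for each."""
    frame = Image.open(image_path).convert("RGB")
    faces = detector(frame)                 # tensor (n_faces, 3, 96, 96) or None
    if faces is None:
        return []
    with torch.no_grad():
        logits = classifier(faces.float())
    return [EMOTIONS[i] for i in logits.argmax(dim=1).tolist()]

# print(classroom_emotions("classroom_frame.jpg"))  # hypothetical input file
```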
12. Sharma A, Sharma K, Kumar A. Real-time emotional health detection using fine-tuned transfer networks with multimodal fusion. Neural Comput Appl 2022. DOI: 10.1007/s00521-022-06913-2.
13. Hierarchical scale convolutional neural network for facial expression recognition. Cogn Neurodyn 2022;16:847-858. PMID: 35847532; PMCID: PMC9279531; DOI: 10.1007/s11571-021-09761-3.
Abstract
Recognition of facial expressions plays an important role in understanding human behavior, classroom assessment, customer feedback, education, business, and many other human-machine interaction applications. Some researchers have realized that using features corresponding to different scales can improve recognition accuracy, but a systematic study of how to utilize scale information has been lacking. In this work, we propose a hierarchical scale convolutional neural network (HSNet) for facial expression recognition, which systematically enhances the information extracted at the kernel, network, and knowledge scales. First, inspired by the facts that facial expressions can be defined by facial action units of different sizes and that sparsity is powerful, we propose dilation Inception blocks to enhance kernel-scale information extraction. Second, to supervise relatively shallow layers in learning more discriminative features from feature maps of different sizes, we propose a feature-guided auxiliary learning approach that uses high-level semantic features to guide the learning of shallow layers. Last, since human cognitive ability can be progressively improved by learned knowledge, we mimic this ability through knowledge transfer from related tasks. Extensive experiments on lab-controlled, synthesized, and in-the-wild databases show that the proposed method substantially boosts performance and achieves state-of-the-art accuracy on most databases. Ablation studies confirm the effectiveness of the modules in the proposed method.
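The dilation Inception idea, covering facial action units of several sizes with sparse (dilated) kernels in parallel branches, might look roughly like the PyTorch block below; the branch widths and dilation rates are illustrative assumptions rather than the published HSNet configuration.

```python
import torch
import torch.nn as nn

class DilatedInceptionBlock(nn.Module):
    """Parallel 3x3 branches with increasing dilation, concatenated Inception-style,
    so a single block sees small and large facial regions at the same layer."""
    def __init__(self, in_ch, branch_ch=32, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True))
            for d in dilations])
        # 1x1 branch keeps purely local information alongside the dilated views.
        self.pointwise = nn.Sequential(
            nn.Conv2d(in_ch, branch_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(branch_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        outs = [b(x) for b in self.branches] + [self.pointwise(x)]
        return torch.cat(outs, dim=1)   # (B, branch_ch * (len(dilations) + 1), H, W)

# y = DilatedInceptionBlock(64)(torch.randn(2, 64, 56, 56))  # -> (2, 128, 56, 56)
```

With padding equal to the dilation rate, every branch preserves spatial resolution, so the outputs can be concatenated directly along the channel dimension.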
14. Savchenko AV, Makarov IA. Neural Network Model for Video-Based Analysis of Student’s Emotions in E-Learning. Optical Memory and Neural Networks 2022;31. PMCID: PMC9529160; DOI: 10.3103/s1060992x22030055.
Abstract
In this paper, we consider the problem of automatically analyzing the emotional state of students during online classes based on video surveillance data, a problem of practical relevance in e-learning. We propose a novel neural network model for recognizing students' emotions from video images of their faces and use it to construct an algorithm for classifying the individual and group emotions of students in video clips. In the first step, the algorithm detects faces, extracts their features, and groups the faces of each student. To increase accuracy, we propose matching students' names recognized with text-recognition algorithms. In the second step, specially trained efficient neural networks extract the emotional features of each selected person, aggregate them with statistical functions, and perform the subsequent classification. In the final step, fragments of the video lesson with the most pronounced student emotions can be visualized. Our experiments on several datasets from EmotiW (Emotion Recognition in the Wild) show that the accuracy of the developed algorithms is comparable with that of known analogues, while their computational performance in emotion classification is higher.
Affiliation(s)
- A. V. Savchenko
- Laboratory of Algorithms and Technologies for Network Analysis, Higher School of Economics (HSE) University, 603093 Nizhny Novgorod, Russia
- I. A. Makarov
- Artificial Intelligence Research Institute (AIRI), 117246 Moscow, Russia
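The per-student aggregation step described in this entry (frame-level emotional features pooled with statistical functions before classification) can be sketched as follows; the 16-dimensional features, the choice of mean/std/min/max statistics, and the logistic-regression classifier are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def aggregate_clip(frame_features: np.ndarray) -> np.ndarray:
    """Pool frame-level emotional features of one student's video clip into a
    fixed-length descriptor using simple statistical functions."""
    return np.concatenate([
        frame_features.mean(axis=0),
        frame_features.std(axis=0),
        frame_features.min(axis=0),
        frame_features.max(axis=0)])

# Toy example: three students' clips with different frame counts and
# 16-dimensional per-frame emotion embeddings (stand-ins for CNN outputs).
rng = np.random.default_rng(1)
clips = [rng.normal(size=(n_frames, 16)) for n_frames in (40, 55, 32)]
labels = [0, 1, 1]   # e.g. 0 = neutral, 1 = positive (illustrative labels)

X = np.stack([aggregate_clip(c) for c in clips])   # (3, 64)
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```

Pooling with order-independent statistics makes the clip descriptor length independent of the number of frames, which is what allows a single classifier to handle clips of varying duration.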
15. Chen Z, Ye S, Chu X, Xia H, Zhang H, Qu H, Wu Y. Augmenting Sports Videos with VisCommentator. IEEE Transactions on Visualization and Computer Graphics 2022;28:824-834. PMID: 34587045; DOI: 10.1109/tvcg.2021.3114806.
Abstract
Visualizing data in sports videos is gaining traction in sports analytics, given its ability to communicate insights and explicate player strategies engagingly. However, augmenting sports videos with such data visualizations is challenging, especially for sports analysts, as it requires considerable expertise in video editing. To ease the creation process, we present a design space that characterizes augmented sports videos at the element level (what the constituents are) and the clip level (how those constituents are organized). We do so by systematically reviewing 233 examples of augmented sports videos collected from TV channels, teams, and leagues. The design space guides the selection of data insights and visualizations for various purposes. Informed by the design space and close collaboration with domain experts, we design VisCommentator, a fast prototyping tool, to ease the creation of augmented table tennis videos by leveraging machine learning-based data extractors and design space-based visualization recommendations. With VisCommentator, sports analysts can create an augmented video by selecting the data to visualize instead of manually drawing the graphical marks. Our system can be generalized to other racket sports (e.g., tennis, badminton) once the underlying datasets and models are available. A user study with seven domain experts shows high satisfaction with our system, confirms that participants can reproduce augmented sports videos in a short period, and provides insightful implications for future improvements and opportunities.
16. Wang X, He J, Jin Z, Yang M, Wang Y, Qu H. M2Lens: Visualizing and Explaining Multimodal Models for Sentiment Analysis. IEEE Transactions on Visualization and Computer Graphics 2022;28:802-812. PMID: 34587037; DOI: 10.1109/tvcg.2021.3114794.
Abstract
Multimodal sentiment analysis aims to recognize people's attitudes from multiple communication channels such as verbal content (i.e., text), voice, and facial expressions. It has become a vibrant and important research topic in natural language processing. Much research focuses on modeling the complex intra- and inter-modal interactions between different communication channels. However, current multimodal models with strong performance are often deep-learning-based techniques and work like black boxes. It is not clear how models utilize multimodal information for sentiment predictions. Despite recent advances in techniques for enhancing the explainability of machine learning models, they often target unimodal scenarios (e.g., images, sentences), and little research has been done on explaining multimodal models. In this paper, we present an interactive visual analytics system, M2Lens, to visualize and explain multimodal models for sentiment analysis. M2Lens provides explanations of intra- and inter-modal interactions at the global, subset, and local levels. Specifically, it summarizes the influence of three typical interaction types (i.e., dominance, complement, and conflict) on the model predictions. Moreover, M2Lens identifies frequent and influential multimodal features and supports multi-faceted exploration of model behaviors across the language, acoustic, and visual modalities. Through two case studies and expert interviews, we demonstrate that our system can help users gain deep insights into multimodal models for sentiment analysis.
17. Sun G, Li T, Liang R. SurVizor: visualizing and understanding the key content of surveillance videos. J Vis (Tokyo) 2021. DOI: 10.1007/s12650-021-00803-w.
18. Assessment of Cognitive Student Engagement Using Heart Rate Data in Distance Learning during COVID-19. Education Sciences 2021. DOI: 10.3390/educsci11090540.
Abstract
Student engagement allows educational institutions to make better decisions regarding teaching methodologies, methods for evaluating the quality of education, and ways to provide timely feedback. Due to the COVID-19 pandemic, identifying cognitive student engagement in distance learning has been a challenge for higher education institutions. In this study, we implemented a non-self-report method that assesses students' heart rate data to identify cognitive engagement during active learning activities. Additionally, as a supplementary tool, we applied a previously validated self-report method. The study was performed in distance learning lessons with a group of university students in Bogota, Colombia. After data analysis, we validated five hypotheses and compared the results from both methods. The results confirmed that heart rate during active learning activities differed from the baseline in a statistically significant way, and this variation could be positive or negative. In addition, the results show that if students are advised in advance that they will have to complete a new task after a passive learning activity (such as a video projection), their heart rate tends to increase and, consequently, so does their cognitive engagement. We expect this study to provide input for future research assessing students' cognitive engagement using physiological parameters.
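The core comparison in this study, heart rate during an active learning activity versus each student's baseline, amounts to a paired test; the numbers below are synthetic placeholders, and the paired t-test is one reasonable choice rather than necessarily the exact analysis the authors used.

```python
import numpy as np
from scipy import stats

# Mean heart rate (bpm) per student: resting baseline vs. an active learning activity.
baseline = np.array([68.0, 72.5, 75.1, 64.3, 70.2, 66.8, 73.4, 69.9])
active   = np.array([74.2, 71.0, 81.3, 70.1, 69.5, 73.2, 79.8, 75.6])

t_stat, p_value = stats.ttest_rel(active, baseline)
delta = active - baseline

print(f"mean change: {delta.mean():+.1f} bpm (per-student change may be positive or negative)")
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.3f}")
```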
19. Predicting individual emotion from perception-based non-contact sensor big data. Sci Rep 2021;11:2317. PMID: 33504868; PMCID: PMC7840765; DOI: 10.1038/s41598-021-81958-2.
Abstract
This study proposes a system for estimating individual emotions based on indoor environment data collected for human participants. As the first step, we develop wireless sensor nodes that collect indoor environment data related to human perception for monitoring working environments. The developed system collects, as big data, the indoor environment data obtained from the sensor nodes and the emotion data obtained from pulse and skin temperature. The proposed system then estimates individual emotions from the collected indoor environment data. This study also investigates whether such sensory data are effective for estimating individual emotions. Indoor environment data obtained by the developed sensors and emotion data obtained from vital signs were logged over a period of 60 days, and emotions were estimated from the indoor environment data by machine learning. The experimental results show that the proposed system achieves about 80% or higher agreement in emotion estimation when multiple types of sensors are used, demonstrating its effectiveness. The finding that emotions can be determined with high accuracy from environmental data is useful for future research.
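The pipeline sketched below, predicting a discrete emotion label from indoor-environment readings with a machine learning model, mirrors the idea of this entry at a high level; the feature set, the three emotion labels, and the random-forest model are illustrative assumptions, and with synthetic random labels the printed accuracy will sit near chance rather than near the roughly 80% reported by the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 600   # logged samples over the monitoring period (synthetic stand-in)

# Indoor-environment features per sample: temperature (C), humidity (%),
# illuminance (lux), and noise level (dB), i.e., the kinds of signals
# perception-oriented sensor nodes could report.
X = np.column_stack([
    rng.normal(24, 2, n), rng.normal(45, 10, n),
    rng.normal(400, 120, n), rng.normal(50, 8, n)])

# Emotion labels derived elsewhere from vital data (pulse, skin temperature);
# here they are synthetic: 0 = negative, 1 = neutral, 2 = positive.
y = rng.integers(0, 3, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {accuracy_score(y_te, model.predict(X_te)):.2f}")
```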