1. Wyss A, Morgenshtern G, Hirsch-Husler A, Bernard J. DaedalusData: Exploration, Knowledge Externalization and Labeling of Particles in Medical Manufacturing - A Design Study. IEEE Transactions on Visualization and Computer Graphics 2025;31:54-64. [PMID: 39312428] [DOI: 10.1109/tvcg.2024.3456329]
Abstract
In medical diagnostics, for both early disease detection and routine patient care, particle-based contamination of in-vitro diagnostic consumables poses a significant threat to patients. Objective data-driven decision-making on the severity of contamination is key for reducing patient risk, while saving time and cost in quality assessment. Our collaborators introduced us to their quality control process, including particle data acquisition through image recognition, feature extraction, and attributes reflecting the production context of particles. Shortcomings of the current process include limited ability to explore thousands of images, limited data-driven decision-making, and ineffective knowledge externalization. Following the design study methodology, our contributions are a characterization of the problem space and requirements, the development and validation of DaedalusData, a comprehensive discussion of our study's learnings, and a generalizable framework for knowledge externalization. DaedalusData is a visual analytics system that enables domain experts to explore particle contamination patterns, label particles in label alphabets, and externalize knowledge through semi-supervised label-informed data projections. The results of our case study and user study show high usability of DaedalusData and its efficient support of experts in generating comprehensive overviews of thousands of particles, labeling large quantities of particles, and externalizing knowledge to further augment the dataset. Reflecting on our approach, we discuss insights on dataset augmentation via human knowledge externalization, and on the scalability and trade-offs that come with the adoption of this approach in practice.
2. Shi J, Chen X, Xie Y, Zhang H, Sun Y. Delicately Reinforced k-Nearest Neighbor Classifier Combined With Expert Knowledge Applied to Abnormity Forecast in Electrolytic Cell. IEEE Transactions on Neural Networks and Learning Systems 2024;35:3027-3037. [PMID: 37494170] [DOI: 10.1109/tnnls.2023.3280963]
Abstract
As profit and safety requirements rise, advanced intelligent analysis for abnormity forecast of the synthetical balance of material and energy (AF-SBME) on aluminum reduction cells (ARCs) becomes increasingly necessary. Without loss of generality, AF-SBME belongs to classification problems, so its advanced intelligent analysis can be realized by high-performance data-driven classifiers. However, AF-SBME poses several difficulties, including a high requirement for the interpretability of data-driven classifiers, a small number of training samples, and sample correctness that decreases over time. In this article, building on a preferable data-driven classifier called the reinforced k-nearest neighbor (R-KNN) classifier, a delicately reinforced k-NN combined with expert knowledge (DR-KNN/CE) is proposed. It improves R-KNN in two ways: using expert knowledge as external assistance, and enhancing its own ability to mine and synthesize data knowledge. Experiments on AF-SBME, with data sampled directly from practical production, demonstrate that the proposed DR-KNN/CE not only effectively improves on R-KNN but also outperforms other existing high-performance data-driven classifiers.
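A minimal sketch of the general idea of combining a data-driven k-NN classifier with expert knowledge (not the paper's DR-KNN/CE): a scikit-learn k-NN model makes predictions, and a hand-written expert rule overrides them whenever it fires. The feature semantics and the rule threshold are hypothetical.

```python
# Sketch only (NOT the paper's DR-KNN/CE): an expert rule overrides the k-NN
# prediction when it fires. Feature meanings and the threshold are made up.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))                         # hypothetical process features
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)   # 1 = abnormal, 0 = normal

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

def expert_rule(x):
    """Hypothetical expert knowledge: an extreme first feature always means abnormal."""
    return 1 if x[0] > 2.0 else None                         # None = rule abstains

def predict_with_expert(X):
    preds = knn.predict(X)
    return np.array([p if expert_rule(x) is None else expert_rule(x)
                     for x, p in zip(X, preds)])

print(predict_with_expert(rng.normal(size=(5, 3))))
```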
3. Zhou Z, Wang W, Guo M, Wang Y, Gotz D. A Design Space for Surfacing Content Recommendations in Visual Analytic Platforms. IEEE Transactions on Visualization and Computer Graphics 2023;29:84-94. [PMID: 36194706] [DOI: 10.1109/tvcg.2022.3209445]
Abstract
Recommendation algorithms have been leveraged in various ways within visualization systems to assist users as they perform a range of information tasks. One common focus for these techniques has been the recommendation of content, rather than visual form, as a means to assist users in identifying information that is relevant to their task context. A wide variety of techniques have been proposed to address this general problem, with a range of design choices in how these solutions surface relevant information to users. This paper reviews the state of the art in how visualization systems surface recommended content during visual analysis; introduces a four-dimensional design space for visual content recommendation based on a characterization of prior work; and discusses key observations regarding common patterns and future research opportunities.
4. Hoque MN, He W, Shekar AK, Gou L, Ren L. Visual Concept Programming: A Visual Analytics Approach to Injecting Human Intelligence at Scale. IEEE Transactions on Visualization and Computer Graphics 2023;29:74-83. [PMID: 36166533] [DOI: 10.1109/tvcg.2022.3209466]
Abstract
Data-centric AI has emerged as a new research area that systematically engineers data to make AI models work for real-world applications. As a core method for data-centric AI, data programming helps experts inject domain knowledge into data and label data at scale using carefully designed labeling functions (e.g., heuristic rules, logistics). Though data programming has shown great success in the NLP domain, programming image data is challenging because of (a) the difficulty of describing images with a visual vocabulary without human annotations and (b) the lack of efficient tools for data programming of images. We present Visual Concept Programming, a first-of-its-kind visual analytics approach that uses visual concepts to program image data at scale while requiring little human effort. Our approach is built upon three unique components. It first uses a self-supervised learning approach to learn visual representations at the pixel level and extract a dictionary of visual concepts from images without using any human annotations. The visual concepts serve as building blocks of labeling functions for experts to inject their domain knowledge. We then design interactive visualizations to explore and understand visual concepts and to compose labeling functions with concepts without writing code. Finally, with the composed labeling functions, users can label the image data at scale and use the labeled data to refine the pixel-wise visual representation and concept quality. We evaluate the learned pixel-wise visual representation on the downstream task of semantic segmentation to show the effectiveness and usefulness of our approach. In addition, we demonstrate how our approach tackles the real-world problem of image retrieval for autonomous driving.
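A minimal sketch of the data-programming idea behind labeling functions (not the Visual Concept Programming system): a few rules over per-instance "concept scores" vote on a label, and instances where no rule fires stay unlabeled. The concept names, scores, and thresholds are hypothetical stand-ins for the paper's visual concepts.

```python
# Sketch only (NOT the paper's system): labeling functions over made-up concept
# scores, combined by a simple vote; ABSTAIN marks instances no rule covers.
import numpy as np

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def lf_has_wheel_concept(x):
    return POSITIVE if x["wheel_score"] > 0.7 else ABSTAIN

def lf_mostly_sky(x):
    return NEGATIVE if x["sky_score"] > 0.8 else ABSTAIN

LABELING_FUNCTIONS = [lf_has_wheel_concept, lf_mostly_sky]

def label(instances):
    out = []
    for x in instances:
        votes = [v for v in (lf(x) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
        out.append(int(round(np.mean(votes))) if votes else ABSTAIN)
    return out

data = [{"wheel_score": 0.9, "sky_score": 0.1},
        {"wheel_score": 0.2, "sky_score": 0.9},
        {"wheel_score": 0.3, "sky_score": 0.3}]
print(label(data))   # [1, 0, -1]
```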
5. Liu H, Dai H, Chen J, Xu J, Tao Y, Lin H. Interactive similar patient retrieval for visual summary of patient outcomes. Journal of Visualization 2022. [DOI: 10.1007/s12650-022-00898-9]
6. Representation and analysis of time-series data via deep embedding and visual exploration. Journal of Visualization 2022. [DOI: 10.1007/s12650-022-00890-3]
7. Vajiac C, Chau DH, Olligschlaeger A, Mackenzie R, Nair P, Lee MC, Li Y, Park N, Rabbany R, Faloutsos C. TRAFFICVIS: Visualizing Organized Activity and Spatio-Temporal Patterns for Detecting and Labeling Human Trafficking. IEEE Transactions on Visualization and Computer Graphics 2022;PP:1-10. [PMID: 36201417] [DOI: 10.1109/tvcg.2022.3209403]
Abstract
Law enforcement and domain experts can detect human trafficking (HT) in online escort websites by analyzing suspicious clusters of connected ads. How can we explain clustering results intuitively and interactively, visualizing potential evidence for experts to analyze? We present TRAFFICVIS, the first interface for cluster-level HT detection and labeling. Developed through months of participatory design with domain experts, TRAFFICVIS provides coordinated views in conjunction with carefully chosen backend algorithms to effectively show spatio-temporal and text patterns to a wide variety of anti-HT stakeholders. We build upon state-of-the-art text clustering algorithms by incorporating shared metadata as a signal of connected and possibly suspicious activity, then visualize the results. Domain experts can use TRAFFICVIS to label clusters as HT or as other suspicious but non-HT activity such as spam and scams, quickly creating labeled datasets to enable further HT research. Through domain expert feedback and a usage scenario, we demonstrate TRAFFICVIS's efficacy. The feedback was overwhelmingly positive, with repeated high praise for the usability and explainability of our tool, the latter being vital for indicting possible criminals.
8.
9. Hinterreiter A, Ruch P, Stitz H, Ennemoser M, Bernard J, Strobelt H, Streit M. ConfusionFlow: A Model-Agnostic Visualization for Temporal Analysis of Classifier Confusion. IEEE Transactions on Visualization and Computer Graphics 2022;28:1222-1236. [PMID: 32746284] [DOI: 10.1109/tvcg.2020.3012063]
Abstract
Classifiers are among the most widely used supervised machine learning algorithms. Many classification models exist, and choosing the right one for a given task is difficult. During model selection and debugging, data scientists need to assess classifiers' performances, evaluate their learning behavior over time, and compare different models. Typically, this analysis is based on single-number performance measures such as accuracy. A more detailed evaluation of classifiers is possible by inspecting class errors. The confusion matrix is an established way for visualizing these class errors, but it was not designed with temporal or comparative analysis in mind. More generally, established performance analysis systems do not allow a combined temporal and comparative analysis of class-level information. To address this issue, we propose ConfusionFlow, an interactive, comparative visualization tool that combines the benefits of class confusion matrices with the visualization of performance characteristics over time. ConfusionFlow is model-agnostic and can be used to compare performances for different model types, model architectures, and/or training and test datasets. We demonstrate the usefulness of ConfusionFlow in a case study on instance selection strategies in active learning. We further assess the scalability of ConfusionFlow and present a use case in the context of neural network pruning.
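A minimal sketch of the kind of data ConfusionFlow visualizes (not the tool itself): one confusion matrix per training iteration, so class-level errors can be followed over time. The model, dataset, and number of iterations are stand-ins.

```python
# Sketch only (NOT ConfusionFlow): compute a confusion matrix after each
# incremental training step so class confusion can be tracked over "epochs".
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SGDClassifier(random_state=0)
classes = np.unique(y)
matrices = []                                    # one confusion matrix per "epoch"
for epoch in range(10):
    clf.partial_fit(X_tr, y_tr, classes=classes)
    matrices.append(confusion_matrix(y_te, clf.predict(X_te), labels=classes))

# e.g., how often the digit 3 is confused with the digit 8 at each epoch
print([int(m[3, 8]) for m in matrices])
```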
10. Jia S, Li Z, Chen N, Zhang J. Towards Visual Explainable Active Learning for Zero-Shot Classification. IEEE Transactions on Visualization and Computer Graphics 2022;28:791-801. [PMID: 34587036] [DOI: 10.1109/tvcg.2021.3114793]
Abstract
Zero-shot classification is a promising paradigm for problems in which the training classes and test classes are disjoint. Achieving this usually requires experts to externalize their domain knowledge by manually specifying a class-attribute matrix that defines which classes have which attributes. Designing a suitable class-attribute matrix is key to the subsequent procedure, but this design process is tedious and relies on trial and error with no guidance. This paper proposes a visual explainable active learning approach, with a design and implementation called semantic navigator, to solve these problems. The approach promotes human-AI teaming with four actions (ask, explain, recommend, respond) in each interaction loop. The machine asks contrastive questions to guide humans in thinking about attributes. A novel visualization called the semantic map explains the current status of the machine, so analysts can better understand why the machine misclassifies objects. Moreover, the machine recommends labels of classes for each attribute to ease the labeling burden. Finally, humans can steer the model by modifying the labels interactively, and the machine adjusts its recommendations. The visual explainable active learning approach improves humans' efficiency in building zero-shot classification models interactively compared with an unguided method. We justify our results with user studies using standard benchmarks for zero-shot classification.
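A minimal sketch of attribute-based zero-shot classification (not the paper's semantic navigator): each unseen class is a row of a class-attribute matrix, and an instance is assigned to the class whose attribute signature is most similar to the instance's predicted attribute scores. Classes, attributes, and matrix values are made up.

```python
# Sketch only: nearest class-attribute signature by cosine similarity.
import numpy as np

unseen_classes = ["zebra", "whale"]
# columns: has_stripes, has_hooves, lives_in_water (hypothetical attributes)
class_attributes = np.array([[1, 1, 0],    # zebra
                             [0, 0, 1]])   # whale

def predict_class(attr_scores):
    """attr_scores: per-attribute probabilities from models trained on seen classes."""
    sims = class_attributes @ attr_scores / (
        np.linalg.norm(class_attributes, axis=1) * np.linalg.norm(attr_scores) + 1e-12)
    return unseen_classes[int(np.argmax(sims))]

print(predict_class(np.array([0.9, 0.8, 0.1])))   # -> zebra
print(predict_class(np.array([0.1, 0.2, 0.95])))  # -> whale
```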
11. Bernard J, Hutter M, Sedlmair M, Zeppelzauer M, Munzner T. A Taxonomy of Property Measures to Unify Active Learning and Human-centered Approaches to Data Labeling. ACM Transactions on Interactive Intelligent Systems 2021. [DOI: 10.1145/3439333]
Abstract
Strategies for selecting the next data instance to label, in service of generating labeled data for machine learning, have been considered separately in the machine learning literature on active learning and in the visual analytics literature on human-centered approaches. We propose a unified design space for instance selection strategies to support detailed and fine-grained analysis covering both of these perspectives. We identify a concise set of 15 properties, namely measurable characteristics of datasets or of machine learning models applied to them, that cover most of the strategies in these literatures. To quantify these properties, we introduce Property Measures (PM) as fine-grained building blocks that can be used to formalize instance selection strategies. In addition, we present a taxonomy of PMs to support the description, evaluation, and generation of PMs across four dimensions: machine learning (ML) Model Output, Instance Relations, Measure Functionality, and Measure Valence. We also create computational infrastructure to support qualitative visual data analysis: a visual analytics explainer for PMs built around an implementation of PMs using cascades of eight atomic functions. It supports eight analysis tasks, covering the analysis of datasets and ML models using visual comparison within and between PMs and groups of PMs, and over time during the interactive labeling process. We iteratively refined the PM taxonomy, the explainer, and the task abstraction in parallel with each other during a two-year formative process, and show evidence of their utility through a summative evaluation with the same infrastructure. This research builds a formal baseline for the better understanding of the commonalities and differences of instance selection strategies, which can serve as a stepping stone for the synthesis of novel strategies in future work.
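A minimal sketch of one instance-selection property measure (not the paper's full PM taxonomy or explainer): least-confidence uncertainty computed from a model's class probabilities, used to rank which unlabeled instances to label next. The dataset and model are stand-ins.

```python
# Sketch only: least-confidence uncertainty as a single property measure for
# selecting the next instances to label in an active-learning loop.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_classes=3, n_informative=5, random_state=0)
labeled = np.arange(30)                           # small labeled seed set
unlabeled = np.arange(30, 300)

model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])

proba = model.predict_proba(X[unlabeled])
uncertainty = 1.0 - proba.max(axis=1)             # least-confidence property measure
query = unlabeled[np.argsort(uncertainty)[-5:]]   # five most uncertain instances
print("next instances to label:", query)
```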
12. Sevastjanova R, Jentner W, Sperrle F, Kehlbeck R, Bernard J, El-Assady M. QuestionComb: A Gamification Approach for the Visual Explanation of Linguistic Phenomena through Interactive Labeling. ACM Transactions on Interactive Intelligent Systems 2021. [DOI: 10.1145/3429448]
Abstract
Linguistic insight in the form of high-level relationships and rules in text builds the basis of our understanding of language. However, the data-driven generation of such structures often lacks labeled resources that can be used as training data for supervised machine learning. The creation of such ground-truth data is a time-consuming process that often requires domain expertise to resolve text ambiguities and characterize linguistic phenomena. Furthermore, the creation and refinement of machine learning models is often challenging for linguists, as the models are often complex, opaque, and difficult to understand. To tackle these challenges, we present a visual analytics technique for interactive data labeling that applies concepts from gamification and explainable Artificial Intelligence (XAI) to support complex classification tasks. The visual-interactive labeling interface promotes the creation of effective training data. Visual explanations of learned rules unveil the decisions of the machine learning model and support iterative and interactive optimization. The gamification-inspired design guides the user through the labeling process and provides feedback on model performance. As an instance of the proposed technique, we present QuestionComb, a workspace tailored to the task of question classification (i.e., into information-seeking vs. non-information-seeking questions). Our evaluation studies confirm that gamification concepts are beneficial for engaging users through continuous feedback, offering an effective visual analytics technique when combined with active learning and XAI.
13. Ali M, Borgo R, Jones MW. Concurrent time-series selections using deep learning and dimension reduction. Knowledge-Based Systems 2021. [DOI: 10.1016/j.knosys.2021.107507]
14. Construct boundaries and place labels for multi-class scatterplots. Journal of Visualization 2021. [DOI: 10.1007/s12650-021-00791-x]
15. Chen C, Wang Z, Wu J, Wang X, Guo LZ, Li YF, Liu S. Interactive Graph Construction for Graph-Based Semi-Supervised Learning. IEEE Transactions on Visualization and Computer Graphics 2021;27:3701-3716. [PMID: 34048346] [DOI: 10.1109/tvcg.2021.3084694]
Abstract
Semi-supervised learning (SSL) provides a way to improve the performance of prediction models (e.g., classifier) via the usage of unlabeled samples. An effective and widely used method is to construct a graph that describes the relationship between labeled and unlabeled samples. Practical experience indicates that graph quality significantly affects the model performance. In this paper, we present a visual analysis method that interactively constructs a high-quality graph for better model performance. In particular, we propose an interactive graph construction method based on the large margin principle. We have developed a river visualization and a hybrid visualization that combines a scatterplot, a node-link diagram, and a bar chart to convey the label propagation of graph-based SSL. Based on the understanding of the propagation, a user can select regions of interest to inspect and modify the graph. We conducted two case studies to showcase how our method facilitates the exploitation of labeled and unlabeled samples for improving model performance.
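A minimal sketch of graph-based semi-supervised learning (not the paper's interactive graph construction method): labels propagate over a k-NN graph from a handful of labeled samples to the unlabeled rest, with unlabeled points marked as -1.

```python
# Sketch only: label propagation over a k-NN graph with scikit-learn.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

X, y_true = make_moons(n_samples=200, noise=0.1, random_state=0)
labeled_idx = np.concatenate([np.flatnonzero(y_true == c)[:5] for c in (0, 1)])

y = np.full_like(y_true, -1)                      # -1 marks unlabeled samples
y[labeled_idx] = y_true[labeled_idx]

model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y)
accuracy = (model.transduction_ == y_true).mean()
print(f"transductive accuracy with {len(labeled_idx)} labels: {accuracy:.2f}")
```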
16. Krak I, Barmak O, Manziuk E. Using visual analytics to develop human and machine-centric models: A review of approaches and proposed information technology. Computational Intelligence 2020. [DOI: 10.1111/coin.12289]
Affiliation(s)
- Iurii Krak: Department of Theoretical Cybernetics, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
- Olexander Barmak: Department of Computer Science and Information Technologies, National University of Khmelnytskyi, Khmelnytskyi, Ukraine
- Eduard Manziuk: Department of Computer Science and Information Technologies, National University of Khmelnytskyi, Khmelnytskyi, Ukraine
17. Nadj M, Knaeble M, Li MX, Maedche A. Power to the Oracle? Design Principles for Interactive Labeling Systems in Machine Learning. KI - Künstliche Intelligenz 2020. [DOI: 10.1007/s13218-020-00634-1]
18. Khayat M, Karimzadeh M, Ebert DS, Ghafoor A. The Validity, Generalizability and Feasibility of Summative Evaluation Methods in Visual Analytics. IEEE Transactions on Visualization and Computer Graphics 2020;26:353-363. [PMID: 31425085] [DOI: 10.1109/tvcg.2019.2934264]
Abstract
Many evaluation methods have been used to assess the usefulness of Visual Analytics (VA) solutions. These methods stem from a variety of origins with different assumptions and goals, which cause confusion about their proofing capabilities. Moreover, the lack of discussion about the evaluation processes may limit our potential to develop new evaluation methods specialized for VA. In this paper, we present an analysis of evaluation methods that have been used to summatively evaluate VA solutions. We provide a survey and taxonomy of the evaluation methods that have appeared in the VAST literature in the past two years. We then analyze these methods in terms of validity and generalizability of their findings, as well as the feasibility of using them. We propose a new metric called summative quality to compare evaluation methods according to their ability to prove usefulness, and make recommendations for selecting evaluation methods based on their summative quality in the VA domain.
19. Snyder LS, Lin YS, Karimzadeh M, Goldwasser D, Ebert DS. Interactive Learning for Identifying Relevant Tweets to Support Real-time Situational Awareness. IEEE Transactions on Visualization and Computer Graphics 2020;26:558-568. [PMID: 31442995] [DOI: 10.1109/tvcg.2019.2934614]
Abstract
Various domain users are increasingly leveraging real-time social media data to gain rapid situational awareness. However, due to the high noise in the deluge of data, effectively determining semantically relevant information can be difficult, further complicated by the changing definition of relevancy by each end user for different events. The majority of existing methods for short text relevance classification fail to incorporate users' knowledge into the classification process. Existing methods that incorporate interactive user feedback focus on historical datasets. Therefore, classifiers cannot be interactively retrained for specific events or user-dependent needs in real-time. This limits real-time situational awareness, as streaming data that is incorrectly classified cannot be corrected immediately, permitting the possibility for important incoming data to be incorrectly classified as well. We present a novel interactive learning framework to improve the classification process in which the user iteratively corrects the relevancy of tweets in real-time to train the classification model on-the-fly for immediate predictive improvements. We computationally evaluate our classification model adapted to learn at interactive rates. Our results show that our approach outperforms state-of-the-art machine learning models. In addition, we integrate our framework with the extended Social Media Analytics and Reporting Toolkit (SMART) 2.0 system, allowing the use of our interactive learning framework within a visual analytics system tailored for real-time situational awareness. To demonstrate our framework's effectiveness, we provide domain expert feedback from first responders who used the extended SMART 2.0 system.
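A minimal sketch of interactive, on-the-fly retraining (not the paper's framework or its SMART 2.0 integration): whenever an analyst corrects a tweet's relevance label, an incremental classifier is updated immediately. The tweets and label scheme are made up.

```python
# Sketch only: immediate model updates from user-corrected labels on streaming text.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**16)
clf = SGDClassifier(random_state=0)
CLASSES = [0, 1]                                   # 0 = irrelevant, 1 = relevant

seed_texts = ["flooding on main street, need help", "great pizza downtown"]
clf.partial_fit(vectorizer.transform(seed_texts), [1, 0], classes=CLASSES)

def user_corrects(text, label):
    """Called each time the analyst relabels a streaming tweet."""
    clf.partial_fit(vectorizer.transform([text]), [label])   # immediate update

user_corrects("road closed due to flooding near the bridge", 1)
user_corrects("watching the game tonight", 0)
print(clf.predict(vectorizer.transform(["water rising fast, evacuation underway"])))
```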
20. Fan X, Li C, Yuan X, Dong X, Liang J. An interactive visual analytics approach for network anomaly detection through smart labeling. Journal of Visualization 2019. [DOI: 10.1007/s12650-019-00580-7]
21.
22. Liu S, Chen C, Lu Y, Ouyang F, Wang B. An Interactive Method to Improve Crowdsourced Annotations. IEEE Transactions on Visualization and Computer Graphics 2018;25:235-245. [PMID: 30130224] [DOI: 10.1109/tvcg.2018.2864843]
Abstract
In order to effectively infer correct labels from noisy crowdsourced annotations, learning-from-crowds models have introduced expert validation. However, little research has been done on facilitating the validation procedure. In this paper, we propose an interactive method to assist experts in verifying uncertain instance labels and unreliable workers. Given the instance labels and worker reliability inferred from a learning-from-crowds model, candidate instances and workers are selected for expert validation. The influence of verified results is propagated to relevant instances and workers through the learning-from-crowds model. To facilitate the validation of annotations, we have developed a confusion visualization to indicate the confusing classes for further exploration, a constrained projection method to show the uncertain labels in context, and a scatter-plot-based visualization to illustrate worker reliability. The three visualizations are tightly integrated with the learning-from-crowds model to provide an iterative and progressive environment for data validation. Two case studies were conducted that demonstrate our approach offers an efficient method for validating and improving crowdsourced annotations.
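A minimal sketch of validating noisy crowdsourced annotations (not the paper's learning-from-crowds model or visualizations): majority voting per instance, with low-agreement instances flagged for expert validation, whose verified label then overrides the crowd estimate. All data are made up.

```python
# Sketch only: majority vote plus expert override for low-agreement instances.
from collections import Counter

annotations = {                                   # instance -> labels from workers
    "img_01": ["cat", "cat", "dog"],
    "img_02": ["dog", "cat", "dog", "cat"],       # low agreement -> ask an expert
    "img_03": ["cat", "cat", "cat"],
}
expert_labels = {"img_02": "dog"}                 # labels verified by an expert

def aggregate(votes, agreement_threshold=0.7):
    label, n = Counter(votes).most_common(1)[0]
    return label, (n / len(votes)) < agreement_threshold   # (estimate, needs expert?)

final = {}
for inst, votes in annotations.items():
    label, uncertain = aggregate(votes)
    final[inst] = expert_labels.get(inst, label) if uncertain else label
print(final)   # {'img_01': 'cat', 'img_02': 'dog', 'img_03': 'cat'}
```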
23. Visually-Enabled Active Deep Learning for (Geo) Text and Image Classification: A Review. ISPRS International Journal of Geo-Information 2018. [DOI: 10.3390/ijgi7020065]