1. Cibulski L, May T, Schmidt J, Kohlhammer J. COMPO*SED: Composite Parallel Coordinates for Co-Dependent Multi-Attribute Choices. IEEE Transactions on Visualization and Computer Graphics 2023; 29:4047-4061. PMID: 35679374. DOI: 10.1109/tvcg.2022.3180899.
Abstract
We propose Composite Parallel Coordinates, a novel parallel coordinates technique to effectively represent the interplay of component alternatives in a system. It builds upon a dedicated data model that formally describes the interaction of components. Parallel coordinates can help decision-makers identify the most preferred solution among a number of alternatives. Multi-component systems require one such multi-attribute choice for each component. Each of these choices might have side effects on the system's operability and performance, making them co-dependent. Common approaches employ complex multi-component models or involve back-and-forth iterations between single components until an acceptable compromise is reached. A simultaneous visual exploration across independently modeled but connected components is needed to make system design more efficient. Using dedicated layout and interaction strategies, our Composite Parallel Coordinates allow analysts to explore both individual properties of components as well as their interoperability and joint performance. We showcase the effectiveness of Composite Parallel Coordinates for co-dependent multi-attribute choices by means of three real-world scenarios from distinct application areas. In addition to the case studies, we reflect on observing two domain experts collaboratively working with the proposed technique and communicating along the way.
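For readers unfamiliar with the base technique, the minimal Python sketch below draws standard parallel coordinates for a set of component alternatives with pandas and matplotlib. It illustrates only the conventional plot that COMPO*SED builds upon, not the composite layout or interaction strategies proposed in the paper; the column names and data are invented for illustration.

# Minimal sketch: standard parallel coordinates for component alternatives.
# Column names and data are hypothetical; this is NOT the COMPO*SED technique,
# only the conventional plot it extends.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

alternatives = pd.DataFrame({
    "cost":       [120, 95, 150, 110],
    "efficiency": [0.82, 0.75, 0.91, 0.88],
    "weight_kg":  [14.0, 12.5, 16.2, 13.1],
    "component":  ["pump A", "pump B", "pump C", "pump D"],  # class column
})

# Each polyline is one alternative; each vertical axis is one attribute.
parallel_coordinates(alternatives, class_column="component", colormap="viridis")
plt.title("Pump alternatives across attributes (illustrative data)")
plt.tight_layout()
plt.show()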
2. Younesy H, Pober J, Möller T, Karimi MM. ModEx: a general purpose computer model exploration system. Frontiers in Bioinformatics 2023; 3:1153800. PMID: 37304402. PMCID: PMC10249055. DOI: 10.3389/fbinf.2023.1153800.
Abstract
We present a general-purpose visual analysis system that can be used for exploring the parameters of a variety of computer models. Our proposed system offers the key components of a visual parameter analysis framework, including parameter sampling, the derivation of output summaries, and an exploration interface. It also provides an API for the rapid development of parameter space exploration solutions as well as the flexibility to support custom workflows for different application domains. We evaluate the effectiveness of our system by demonstrating it in three domains: data mining, machine learning, and a specific application in bioinformatics.
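The three framework components named in the abstract (parameter sampling, output summaries, an exploration interface) can be illustrated with a small, hypothetical sketch. The function names and the grid-sampling strategy below are assumptions made for illustration and do not reflect ModEx's actual API.

# Hypothetical sketch of a parameter-space exploration loop:
# sample parameter settings, run the model, derive scalar output summaries.
import itertools
import statistics

def run_model(rate, depth):
    # Stand-in for an arbitrary computer model; returns a list of outputs.
    return [rate * (i ** 0.5) / depth for i in range(1, 50)]

def summarize(outputs):
    # Derive compact summaries that an exploration UI could display per sample.
    return {"mean": statistics.mean(outputs), "max": max(outputs)}

# Full-factorial sampling of a small parameter grid.
grid = {"rate": [0.1, 0.5, 1.0], "depth": [2, 4, 8]}
samples = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]

results = [{**params, **summarize(run_model(**params))} for params in samples]
for row in results:
    print(row)  # a table like this would back the interactive exploration view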
Affiliation(s)
- Hamid Younesy: School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Torsten Möller: Research Network Data Science and Faculty of Computer Science, University of Vienna, Vienna, Austria
- Mohammad M. Karimi: Comprehensive Cancer Centre, School of Cancer and Pharmaceutical Sciences, Faculty of Life Sciences and Medicine, King's College London, London, United Kingdom
3. Tyagi A, Xie C, Mueller K. NAS-Navigator: Visual Steering for Explainable One-Shot Deep Neural Network Synthesis. IEEE Transactions on Visualization and Computer Graphics 2023; 29:299-309. PMID: 36166525. DOI: 10.1109/tvcg.2022.3209361.
Abstract
The success of deep learning (DL) can be attributed to hours of parameter and architecture tuning by human experts. Neural Architecture Search (NAS) techniques aim to solve this problem by automating the search procedure for DNN architectures, making it possible for non-experts to work with DNNs. One-shot NAS techniques in particular have recently gained popularity because they reduce the search time. One-shot NAS works by training a large template network through parameter sharing that includes all candidate networks; a ranking procedure then evaluates randomly chosen candidate architectures to score the template's components. However, as these search models become increasingly powerful and diverse, they become harder to understand. Consequently, even when the search results work well, it is hard to identify search biases and to control the search progression, hence the need for explainability and human-in-the-loop (HIL) one-shot NAS. To alleviate these problems, we present NAS-Navigator, a visual analytics (VA) system aiming to address three issues with one-shot NAS: explainability, HIL design, and performance improvements over existing state-of-the-art (SOTA) techniques. NAS-Navigator puts full control of NAS back in the hands of users while keeping the benefits of automated search, thus assisting non-expert users. Analysts can use their domain knowledge, aided by cues from the interface, to guide the search. Evaluation results confirm that the performance of our improved one-shot NAS algorithm is comparable to other SOTA techniques, while adding VA through NAS-Navigator yields further improvements in search time and performance. We designed our interface in collaboration with several deep learning researchers and evaluated NAS-Navigator through a controlled experiment and expert interviews.
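The supernet-with-parameter-sharing idea summarized above can be sketched in a few lines of PyTorch. The tiny search space, training budget, and random ranking procedure below are simplified assumptions for illustration, not NAS-Navigator's algorithm.

# Minimal one-shot NAS sketch: a shared "supernet" layer holds several candidate
# operations; each forward pass activates one of them, so all candidates share
# the surrounding weights. After training, candidates are ranked by sampling.
import random
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),
            nn.Sequential(nn.Linear(dim, dim), nn.Tanh()),
        ])

    def forward(self, x, choice):
        return self.ops[choice](x)

class SuperNet(nn.Module):
    def __init__(self, dim=16, n_layers=3, n_classes=2):
        super().__init__()
        self.layers = nn.ModuleList(MixedOp(dim) for _ in range(n_layers))
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x, arch):
        for layer, choice in zip(self.layers, arch):
            x = layer(x, choice)
        return self.head(x)

net = SuperNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x, y = torch.randn(256, 16), torch.randint(0, 2, (256,))

# Train the template network with a random architecture per step (weight sharing).
for _ in range(200):
    arch = [random.randrange(3) for _ in range(3)]
    loss = nn.functional.cross_entropy(net(x, arch), y)
    opt.zero_grad(); loss.backward(); opt.step()

# Rank randomly chosen candidate architectures using the shared weights.
with torch.no_grad():
    candidates = [[random.randrange(3) for _ in range(3)] for _ in range(10)]
    scores = [(net(x, a).argmax(1) == y).float().mean().item() for a in candidates]
print(max(zip(scores, candidates)))  # best-scoring candidate under shared weights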
4. Streeb D, Metz Y, Schlegel U, Schneider B, El-Assady M, Neth H, Chen M, Keim DA. Task-Based Visual Interactive Modeling: Decision Trees and Rule-Based Classifiers. IEEE Transactions on Visualization and Computer Graphics 2022; 28:3307-3323. PMID: 33439846. DOI: 10.1109/tvcg.2020.3045560.
Abstract
Visual analytics enables the coupling of machine learning models and humans in a tightly integrated workflow, addressing various analysis tasks. Each task poses distinct demands on analysts and decision-makers. In this survey, we focus on one canonical technique for rule-based classification, namely decision tree classifiers. We provide an overview of available visualizations for decision trees, focusing on how visualizations differ with respect to 16 tasks. Further, we investigate the types of visual designs employed and the quality measures presented. We find that (i) interactive visual analytics systems for classifier development offer a variety of visual designs, (ii) utilization tasks are sparsely covered, (iii) beyond classifier development, node-link diagrams are omnipresent, and (iv) even systems designed for machine learning experts rarely feature visual representations of quality measures other than accuracy. In conclusion, we see potential for integrating algorithmic techniques, mathematical quality measures, and tailored interactive visualizations to enable human experts to utilize their knowledge more effectively.
5. Heimerl F, Kralj C, Möller T, Gleicher M. embComp: Visual Interactive Comparison of Vector Embeddings. IEEE Transactions on Visualization and Computer Graphics 2022; 28:2953-2969. PMID: 33347410. DOI: 10.1109/tvcg.2020.3045918.
Abstract
This article introduces embComp, a novel approach for comparing two embeddings that capture the similarity between objects, such as word and document embeddings. We survey scenarios where comparing these embedding spaces is useful. From those scenarios, we derive common tasks, introduce visual analysis methods that support these tasks, and combine them into a comprehensive system. Central to embComp are overview visualizations based on metrics that measure differences in the local structure around objects. Summarizing these local metrics over the embeddings provides global overviews of similarities and differences. Detail views allow analysts to compare the local structure around selected objects and to relate this local information to the global views. Integrating and connecting all of these components, embComp supports a range of analysis workflows that help users understand the similarities and differences between embedding spaces. We assess our approach by applying it in several use cases, including understanding corpus differences via word vector embeddings and understanding algorithmic differences in generating embeddings.
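One family of "local structure" metrics of the kind alluded to above can be sketched as k-nearest-neighbor overlap between the two embedding spaces. The code below is an illustrative implementation of that generic idea with scikit-learn, not embComp's exact metric set; the data is synthetic.

# Illustrative local-structure comparison of two embeddings of the same objects:
# for each object, compute the overlap of its k-nearest-neighbor sets in space A
# and space B. Averaging these per-object scores gives a global overview value.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_sets(emb, k):
    # k+1 neighbors because the nearest neighbor of a point is the point itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(emb)
    _, idx = nn.kneighbors(emb)
    return [set(row[1:]) for row in idx]  # drop self

def neighborhood_overlap(emb_a, emb_b, k=10):
    sets_a, sets_b = knn_sets(emb_a, k), knn_sets(emb_b, k)
    return np.array([len(a & b) / k for a, b in zip(sets_a, sets_b)])

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(500, 50))                 # e.g., word vectors from corpus A
emb_b = emb_a + 0.1 * rng.normal(size=(500, 50))   # a slightly perturbed embedding

per_object = neighborhood_overlap(emb_a, emb_b, k=10)
print("mean overlap:", per_object.mean())                    # global summary
print("least stable objects:", np.argsort(per_object)[:5])   # candidates for detail views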
6. Roslan MHB, Chen CJ. Predicting students' performance in English and Mathematics using data mining techniques. Education and Information Technologies 2022; 28:1427-1453. PMID: 35919875. PMCID: PMC9334550. DOI: 10.1007/s10639-022-11259-2.
Abstract
This study attempts to predict secondary school students' performance in English and Mathematics using data mining (DM) techniques. The learning of English and Mathematics is a concern in many countries. The study aims to provide insights into predictors of students' performance in English and Mathematics, the characteristics of students with different levels of performance, the most effective DM technique for performance prediction, and the relationship between the two subjects. It employed archival data of students who were 16 years old in 2019 and sat for the Malaysian Certificate of Examination (MCE) in 2021. Three main factors, namely students' past academic performance, demographics, and psychological attributes, were scrutinized to identify their impact on the prediction. The study utilized the Orange software for the DM process and employed decision tree (DT) rules to determine the characteristics of students with low, moderate, and high performance in English and Mathematics. The DT and Naïve Bayes (NB) techniques show the best predictive performance for English and Mathematics, respectively. Such characteristics and predictions may cue appropriate interventions to improve students' performance in these subjects. The study revealed students' past academic performance as the most critical predictor, along with a few demographic and psychological attributes. By examining the top predictors derived using four different classifier types, it found that students' past Mathematics performance predicts their MCE English performance and that their past English performance predicts their MCE Mathematics performance, showing that performance in the two subjects is interrelated.
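As a purely hypothetical illustration of the modelling step (the study itself used the Orange toolkit and its own archival data), the sketch below compares a decision tree and a Naïve Bayes classifier with scikit-learn on synthetic stand-in features; all feature names, data, and the pass threshold are invented.

# Hypothetical sketch: compare a decision tree and Naive Bayes for predicting a
# pass/fail outcome from past-performance and psychological features.
# Feature names and data are synthetic stand-ins, not the study's dataset.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(42)
n = 400
past_math = rng.uniform(0, 100, n)        # past Mathematics score
past_english = rng.uniform(0, 100, n)     # past English score
motivation = rng.uniform(1, 5, n)         # psychological attribute (Likert scale)
X = np.column_stack([past_math, past_english, motivation])
y = (0.5 * past_english + 0.3 * past_math + 5 * motivation
     + rng.normal(0, 10, n) > 70).astype(int)   # synthetic "pass" label

for name, clf in [("Decision Tree", DecisionTreeClassifier(max_depth=4)),
                  ("Naive Bayes", GaussianNB())]:
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.3f}")

# Decision-tree rules can then be inspected to characterise low/moderate/high performers.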
Affiliation(s)
- Muhammad Haziq Bin Roslan: Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Malaysia
- Chwen Jen Chen: Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Malaysia
7. Shen S, Fan J. Emotion Analysis of Ideological and Political Education Using a GRU Deep Neural Network. Front Psychol 2022; 13:908154. PMID: 35959025. PMCID: PMC9361775. DOI: 10.3389/fpsyg.2022.908154.
Abstract
Theoretical research into the emotional attributes of ideological and political education can improve our ability to understand human emotion and solve socio-emotional problems. To that end, this study analyzed emotion in ideological and political education by integrating a gated recurrent unit (GRU) with an attention mechanism. Building on the strong results BERT achieves on downstream tasks, we use a focused attention mechanism assisted by a bidirectional GRU to extract, respectively, locally relevant information and global information for the emotion analysis of ideological and political education. The two kinds of information complement each other, and combining them within the neural network model further improves the accuracy of the extracted emotion information. Finally, the validity and domain adaptability of the model were verified on several publicly available, fine-grained emotion datasets.
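A minimal PyTorch sketch of the described architecture follows: contextual token features (BERT hidden states in the paper, a plain embedding layer here as a self-contained stand-in) feed a bidirectional GRU whose outputs are pooled with an attention mechanism before classification. Layer sizes, class count, and hyperparameters are illustrative assumptions, not the authors' configuration.

# Sketch of a bidirectional GRU with attention pooling for emotion classification.
# In the paper the inputs are BERT hidden states; here a plain nn.Embedding stands
# in so the example stays self-contained. Sizes are illustrative.
import torch
import torch.nn as nn

class GRUAttentionClassifier(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=128, hidden=64, n_classes=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)          # stand-in for BERT
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)                    # attention scores
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_ids):
        h, _ = self.gru(self.embed(token_ids))        # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # focus on relevant tokens
        context = (weights * h).sum(dim=1)            # weighted global summary
        return self.out(context)

model = GRUAttentionClassifier()
tokens = torch.randint(0, 5000, (8, 40))              # a batch of 8 token sequences
logits = model(tokens)
print(logits.shape)                                    # torch.Size([8, 6])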
Affiliation(s)
- Shoucheng Shen: School of Marxism, Dalian Maritime University, Dalian, China
- Jinling Fan: Office of Academic Affairs, Hebei University of Science and Technology, Shijiazhuang, China
- Correspondence: Jinling Fan
8. Knittel J, Lalama A, Koch S, Ertl T. Visual Neural Decomposition to Explain Multivariate Data Sets. IEEE Transactions on Visualization and Computer Graphics 2021; 27:1374-1384. PMID: 33048724. DOI: 10.1109/tvcg.2020.3030420.
Abstract
Investigating relationships between variables in multi-dimensional data sets is a common task for data analysts and engineers. More specifically, it is often valuable to understand which ranges of which input variables lead to particular values of a given target variable. Unfortunately, with an increasing number of independent variables, this process can become cumbersome and time-consuming because of the many possible combinations that have to be explored. In this paper, we propose a novel approach, scaling to hundreds of variables, for visualizing correlations between input variables and a target output variable. We developed a visual model based on neural networks that can be explored in a guided way to help analysts find and understand such correlations. First, we train a neural network to predict the target from the input variables. Then, we visualize the inner workings of the resulting model to help understand relations within the data set. We further introduce a new regularization term for the backpropagation algorithm that encourages the neural network to learn representations that are easier to interpret visually. We apply our method to artificial and real-world data sets to show its utility.
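The paper's custom regularization term is not reproduced here; as a rough, hypothetical stand-in, the sketch below adds an L1 penalty on the first-layer weights to a standard training loss, showing how an interpretability-oriented term can be folded into backpropagation. Model, data, and the penalty strength are illustrative assumptions.

# Sketch: train a small regression network with an extra regularization term added
# to the loss. The L1 penalty on first-layer weights is a hypothetical stand-in
# for the paper's interpretability-oriented term, not its actual formula.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(100, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

X = torch.randn(1024, 100)                    # many input variables
y = X[:, 3:4] * 2.0 - X[:, 57:58] + 0.1 * torch.randn(1024, 1)  # target depends on few

lam = 1e-3                                    # strength of the extra term
for _ in range(300):
    task_loss = nn.functional.mse_loss(net(X), y)
    reg = net[0].weight.abs().sum()           # encourages sparse, easier-to-read weights
    loss = task_loss + lam * reg
    opt.zero_grad(); loss.backward(); opt.step()

# Inspecting first-layer weights then hints at which inputs drive the target.
importance = net[0].weight.abs().sum(dim=0)
print(importance.topk(5).indices)             # indices of the most influential inputs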
9. Ono JP, Castelo S, Lopez R, Bertini E, Freire J, Silva C. PipelineProfiler: A Visual Analytics Tool for the Exploration of AutoML Pipelines. IEEE Transactions on Visualization and Computer Graphics 2021; 27:390-400. PMID: 33048694. DOI: 10.1109/tvcg.2020.3030361.
Abstract
In recent years, a wide variety of automated machine learning (AutoML) methods have been proposed to generate end-to-end ML pipelines. While these techniques facilitate the creation of models, their black-box nature, the complexity of the underlying algorithms, and the large number of pipelines they derive make them difficult for developers to debug. It is also challenging for machine learning experts to select an AutoML system that is well suited for a given problem. In this paper, we present PipelineProfiler, an interactive visualization tool that allows the exploration and comparison of the solution space of machine learning (ML) pipelines produced by AutoML systems. PipelineProfiler is integrated with Jupyter Notebook and can be combined with common data science tools to enable a rich set of analyses of the ML pipelines, providing users a better understanding of the algorithms that generated them as well as insights into how they can be improved. We demonstrate the utility of our tool through use cases where PipelineProfiler is used to better understand and improve a real-world AutoML system. Furthermore, we validate our approach by presenting a detailed analysis of a think-aloud experiment with six data scientists who develop and evaluate AutoML tools.
10. Wang YG, Huang SY, Wang LN, Zhou ZY, Qiu JD. Accurate prediction of species-specific 2-hydroxyisobutyrylation sites based on machine learning frameworks. Anal Biochem 2020; 602:113793. DOI: 10.1016/j.ab.2020.113793.
11. Krak I, Barmak O, Manziuk E. Using visual analytics to develop human and machine-centric models: A review of approaches and proposed information technology. Comput Intell 2020. DOI: 10.1111/coin.12289.
Affiliation(s)
- Iurii Krak: Department of Theoretical Cybernetics, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
- Olexander Barmak: Department of Computer Science and Information Technologies, National University of Khmelnytskyi, Khmelnytskyi, Ukraine
- Eduard Manziuk: Department of Computer Science and Information Technologies, National University of Khmelnytskyi, Khmelnytskyi, Ukraine
12. Cashman D, Perer A, Chang R, Strobelt H. Ablate, Variate, and Contemplate: Visual Analytics for Discovering Neural Architectures. IEEE Transactions on Visualization and Computer Graphics 2020; 26:863-873. PMID: 31502978. DOI: 10.1109/tvcg.2019.2934261.
Abstract
The performance of deep learning models depends on the precise configuration of many layers and parameters. However, there are currently few systematic guidelines for how to configure a successful model. This means model builders often have to experiment with different configurations by manually programming different architectures (which is tedious and time-consuming) or rely on purely automated approaches to generate and train the architectures (which is expensive). In this paper, we present Rapid Exploration of Model Architectures and Parameters, or REMAP, a visual analytics tool that allows a model builder to discover a deep learning model quickly via exploration and rapid experimentation of neural network architectures. In REMAP, the user explores the large and complex parameter space for neural network architectures using a combination of global inspection and local experimentation. Through a visual overview of a set of models, the user identifies interesting clusters of architectures. Based on their findings, the user can run ablation and variation experiments to identify the effects of adding, removing, or replacing layers in a given architecture and generate new models accordingly. They can also handcraft new models using a simple graphical interface. As a result, a model builder can build deep learning models quickly, efficiently, and without manual programming. We inform the design of REMAP through a design study with four deep learning model builders. Through a use case, we demonstrate that REMAP allows users to discover performant neural network architectures efficiently using visual exploration and user-defined semi-automated searches through the model space.
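The ablation idea mentioned above (remove one layer at a time, retrain, and compare) can be sketched generically. The tiny fully connected model, synthetic data, and scoring below are illustrative assumptions rather than REMAP's implementation, which targets convolutional architectures inside an interactive tool.

# Generic ablation sketch: drop one hidden block at a time from a base architecture,
# retrain briefly, and compare accuracy. Accuracy is measured on the training data
# here purely for brevity; a held-out split would be used in practice.
import torch
import torch.nn as nn

def build(hidden_sizes, in_dim=20, n_classes=2):
    layers, d = [], in_dim
    for h in hidden_sizes:
        layers += [nn.Linear(d, h), nn.ReLU()]
        d = h
    layers.append(nn.Linear(d, n_classes))
    return nn.Sequential(*layers)

def train_and_score(model, X, y, steps=200):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(steps):
        loss = nn.functional.cross_entropy(model(X), y)
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        return (model(X).argmax(1) == y).float().mean().item()

X, y = torch.randn(512, 20), torch.randint(0, 2, (512,))
base = [64, 32, 16]
print("base", train_and_score(build(base), X, y))
for i in range(len(base)):                      # ablate one hidden block at a time
    variant = base[:i] + base[i + 1:]
    print("without layer", i, variant, train_and_score(build(variant), X, y))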
14. Xiong X, Fu M, Zhu M, Liang J. Visual potential expert prediction in question and answering communities. Journal of Visual Languages and Computing 2018. DOI: 10.1016/j.jvlc.2018.03.001.
15. Zhao X, Wu Y, Lee DL, Cui W. iForest: Interpreting Random Forests via Visual Analytics. IEEE Transactions on Visualization and Computer Graphics 2018; 25:407-416. PMID: 30188822. DOI: 10.1109/tvcg.2018.2864475.
Abstract
As an ensemble model consisting of many independent decision trees, a random forest generates predictions by feeding the input to its internal trees and summarizing their outputs. The ensemble nature of the model helps random forests outperform any individual decision tree. However, it also leads to poor model interpretability, which significantly hinders the model from being used in fields that require transparent and explainable predictions, such as medical diagnosis and financial fraud detection. The interpretation challenges stem from the variety and complexity of the contained decision trees. Each decision tree has its own structure and properties, such as the features used in the tree and the feature threshold at each tree node. Thus, a single input may follow a variety of decision paths. To understand how a final prediction is reached, it is necessary to understand and compare all decision paths in the context of all tree structures, which is a huge challenge for any user. In this paper, we propose a visual analytics system aimed at interpreting random forest models and predictions. In addition to providing users with all the tree information, we summarize the decision paths in random forests, which eventually reflects the working mechanism of the model and reduces users' mental burden of interpretation. To demonstrate the effectiveness of our system, we conducted two usage scenarios and a qualitative user study.
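scikit-learn exposes the raw ingredients that such a system summarizes. The sketch below extracts, for one input, the decision path through every tree of a random forest and tallies how often each feature is used; it is a simplified, generic stand-in for the path summarization iForest provides, not the system itself.

# Sketch: inspect decision paths of a random forest for a single input with
# scikit-learn. Counting which features appear on those paths is a crude
# stand-in for the path summarization an interactive system like iForest offers.
from collections import Counter
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y = data.data, data.target
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

sample = X[:1]                                   # one prediction to explain
feature_counts = Counter()
for tree in forest.estimators_:
    node_indicator = tree.decision_path(sample)  # sparse matrix of visited nodes
    for node in node_indicator.indices:
        f = tree.tree_.feature[node]
        if f >= 0:                               # negative values mark leaf nodes
            feature_counts[data.feature_names[f]] += 1

print("prediction:", forest.predict(sample)[0])
for name, count in feature_counts.most_common(5):
    print(f"{name}: used in {count} decision-path nodes across the 50 trees")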
16. Sacha D, Kraus M, Keim DA, Chen M. VIS4ML: An Ontology for Visual Analytics Assisted Machine Learning. IEEE Transactions on Visualization and Computer Graphics 2018; 25:385-395. PMID: 30130221. DOI: 10.1109/tvcg.2018.2864838.
Abstract
While many visual analytics (VA) workflows make use of machine-learned models to support analytical tasks, VA workflows have also become increasingly important in understanding and improving Machine Learning (ML) processes. In this paper, we propose an ontology, VIS4ML, for a subarea of VA, namely "VA-assisted ML". The purpose of VIS4ML is to describe and understand existing VA workflows used in ML as well as to detect gaps in ML processes and the potential for introducing advanced VA techniques into such processes. Ontologies have been widely used to map out the scope of a topic in biology, medicine, and many other disciplines. We adopt the scholarly methodologies for constructing VIS4ML, including the specification, conceptualization, formalization, implementation, and validation of ontologies. In particular, we reinterpret the traditional VA pipeline to encompass model-development workflows. We introduce the necessary definitions, rules, syntaxes, and visual notations for formulating VIS4ML and make use of semantic web technologies to implement it in the Web Ontology Language (OWL). VIS4ML captures high-level knowledge about previous workflows in which VA is used to assist ML. It is consistent with established VA concepts and will continue to evolve along with future developments in VA and ML. While this ontology is an effort toward building the theoretical foundation of VA, it can be used by practitioners in real-world applications to optimize model-development workflows by systematically examining the potential benefits that either machine or human capabilities can bring. Meanwhile, VIS4ML is intended to be extensible and will continue to be updated to reflect future advancements in using VA for building high-quality data-analytical models, or for building such models rapidly.
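For illustration only, the snippet below defines a tiny OWL fragment with the owlready2 library and saves it in RDF/XML; the class and property names are invented placeholders and are not taken from the actual VIS4ML ontology.

# Illustrative only: a minimal OWL ontology fragment built with owlready2.
# Class and property names are invented placeholders, not VIS4ML's actual terms.
from owlready2 import get_ontology, Thing, ObjectProperty

onto = get_ontology("http://example.org/vis4ml-sketch.owl")

with onto:
    class MLProcess(Thing): pass          # e.g., model training or evaluation
    class VATechnique(Thing): pass        # e.g., an interactive visualization
    class HumanCapability(Thing): pass    # e.g., domain knowledge, judgement

    class assists(ObjectProperty):        # a VA technique assists an ML process
        domain = [VATechnique]
        range = [MLProcess]

    class leverages(ObjectProperty):      # a VA technique leverages a human capability
        domain = [VATechnique]
        range = [HumanCapability]

onto.save(file="vis4ml_sketch.owl", format="rdfxml")
print(list(onto.classes()))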