1
|
Chowdhury S, Joel-Edgar S, Dey PK, Bhattacharya S, Kharlamov A. Embedding transparency in artificial intelligence machine learning models: managerial implications on predicting and explaining employee turnover. INTERNATIONAL JOURNAL OF HUMAN RESOURCE MANAGEMENT 2022. [DOI: 10.1080/09585192.2022.2066981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Affiliation(s)
- Soumyadeb Chowdhury
- Information, Operations and Management Sciences Department, TBS Business School, Toulouse, France
| | - Sian Joel-Edgar
- New College of the Humanities Northeastern University, London, United Kingdom
| | - Prasanta Kumar Dey
- Operations and Information Management, Aston Business School, College of Business and Social Science, Aston University, Birmingham, United Kingdom
| | - Sudeshna Bhattacharya
- Work and Organisation Department, Aston Business School, College of Business and Social Science, Aston University, Birmingham, United Kingdom
| | - Alexander Kharlamov
- Operations and Information Management, Aston Business School, College of Business and Social Science, Aston University, Birmingham, United Kingdom
| |
Collapse
|
2
|
Jia S, Li Z, Chen N, Zhang J. Towards Visual Explainable Active Learning for Zero-Shot Classification. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:791-801. [PMID: 34587036 DOI: 10.1109/tvcg.2021.3114793] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Zero-shot classification is a promising paradigm to solve an applicable problem when the training classes and test classes are disjoint. Achieving this usually needs experts to externalize their domain knowledge by manually specifying a class-attribute matrix to define which classes have which attributes. Designing a suitable class-attribute matrix is the key to the subsequent procedure, but this design process is tedious and trial-and-error with no guidance. This paper proposes a visual explainable active learning approach with its design and implementation called semantic navigator to solve the above problems. This approach promotes human-AI teaming with four actions (ask, explain, recommend, respond) in each interaction loop. The machine asks contrastive questions to guide humans in the thinking process of attributes. A novel visualization called semantic map explains the current status of the machine. Therefore analysts can better understand why the machine misclassifies objects. Moreover, the machine recommends the labels of classes for each attribute to ease the labeling burden. Finally, humans can steer the model by modifying the labels interactively, and the machine adjusts its recommendations. The visual explainable active learning approach improves humans' efficiency of building zero-shot classification models interactively, compared with the method without guidance. We justify our results with user studies using the standard benchmarks for zero-shot classification.
Collapse
|
3
|
Neto MP, Paulovich FV. Explainable Matrix - Visualization for Global and Local Interpretability of Random Forest Classification Ensembles. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1427-1437. [PMID: 33048689 DOI: 10.1109/tvcg.2020.3030354] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/23/2023]
Abstract
Over the past decades, classification models have proven to be essential machine learning tools given their potential and applicability in various domains. In these years, the north of the majority of the researchers had been to improve quantitative metrics, notwithstanding the lack of information about models' decisions such metrics convey. This paradigm has recently shifted, and strategies beyond tables and numbers to assist in interpreting models' decisions are increasing in importance. Part of this trend, visualization techniques have been extensively used to support classification models' interpretability, with a significant focus on rule-based models. Despite the advances, the existing approaches present limitations in terms of visual scalability, and the visualization of large and complex models, such as the ones produced by the Random Forest (RF) technique, remains a challenge. In this paper, we propose Explainable Matrix (ExMatrix), a novel visualization method for RF interpretability that can handle models with massive quantities of rules. It employs a simple yet powerful matrix-like visual metaphor, where rows are rules, columns are features, and cells are rules predicates, enabling the analysis of entire models and auditing classification results. ExMatrix applicability is confirmed via different examples, showing how it can be used in practice to promote RF models interpretability.
Collapse
|
4
|
Chen S, Andrienko N, Andrienko G, Adilova L, Barlet J, Kindermann J, Nguyen PH, Thonnard O, Turkay C. LDA Ensembles for Interactive Exploration and Categorization of Behaviors. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:2775-2792. [PMID: 30869622 DOI: 10.1109/tvcg.2019.2904069] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
We define behavior as a set of actions performed by some actor during a period of time. We consider the problem of analyzing a large collection of behaviors by multiple actors, more specifically, identifying typical behaviors and spotting anomalous behaviors. We propose an approach leveraging topic modeling techniques - LDA (Latent Dirichlet Allocation) Ensembles - to represent categories of typical behaviors by topics that are obtained through topic modeling a behavior collection. When such methods are applied to text in natural languages, the quality of the extracted topics are usually judged based on the semantic relatedness of the terms pertinent to the topics. This criterion, however, is not necessarily applicable to topics extracted from non-textual data, such as action sets, since relationships between actions may not be obvious. We have developed a suite of visual and interactive techniques supporting the construction of an appropriate combination of topics based on other criteria, such as distinctiveness and coverage of the behavior set. Two case studies on analyzing operation behaviors in the security management system and visiting behaviors in an amusement park, and the expert evaluation of the first case study demonstrate the effectiveness of our approach.
Collapse
|
5
|
Zhang J, Wang Y, Molino P, Li L, Ebert DS. Manifold: A Model-Agnostic Framework for Interpretation and Diagnosis of Machine Learning Models. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:364-373. [PMID: 30130197 DOI: 10.1109/tvcg.2018.2864499] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Interpretation and diagnosis of machine learning models have gained renewed interest in recent years with breakthroughs in new approaches. We present Manifold, a framework that utilizes visual analysis techniques to support interpretation, debugging, and comparison of machine learning models in a more transparent and interactive manner. Conventional techniques usually focus on visualizing the internal logic of a specific model type (i.e., deep neural networks), lacking the ability to extend to a more complex scenario where different model types are integrated. To this end, Manifold is designed as a generic framework that does not rely on or access the internal logic of the model and solely observes the input (i.e., instances or features) and the output (i.e., the predicted result and probability distribution). We describe the workflow of Manifold as an iterative process consisting of three major phases that are commonly involved in the model development and diagnosis process: inspection (hypothesis), explanation (reasoning), and refinement (verification). The visual components supporting these tasks include a scatterplot-based visual summary that overviews the models' outcome and a customizable tabular view that reveals feature discrimination. We demonstrate current applications of the framework on the classification and regression tasks and discuss other potential machine learning use scenarios where Manifold can be applied.
Collapse
|
6
|
Li H, Fang S, Mukhopadhyay S, Saykin AJ, Shen L. Interactive Machine Learning by Visualization: A Small Data Solution. PROCEEDINGS : ... IEEE INTERNATIONAL CONFERENCE ON BIG DATA. IEEE INTERNATIONAL CONFERENCE ON BIG DATA 2018; 2018:3513-3521. [PMID: 31061990 DOI: 10.1109/bigdata.2018.8621952] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Machine learning algorithms and traditional data mining process usually require a large volume of data to train the algorithm-specific models, with little or no user feedback during the model building process. Such a "big data" based automatic learning strategy is sometimes unrealistic for applications where data collection or processing is very expensive or difficult, such as in clinical trials. Furthermore, expert knowledge can be very valuable in the model building process in some fields such as biomedical sciences. In this paper, we propose a new visual analytics approach to interactive machine learning and visual data mining. In this approach, multi-dimensional data visualization techniques are employed to facilitate user interactions with the machine learning and mining process. This allows dynamic user feedback in different forms, such as data selection, data labeling, and data correction, to enhance the efficiency of model building. In particular, this approach can significantly reduce the amount of data required for training an accurate model, and therefore can be highly impactful for applications where large amount of data is hard to obtain. The proposed approach is tested on two application problems: the handwriting recognition (classification) problem and the human cognitive score prediction (regression) problem. Both experiments show that visualization supported interactive machine learning and data mining can achieve the same accuracy as an automatic process can with much smaller training data sets.
Collapse
Affiliation(s)
- Huang Li
- Department of Computer & Information Science, Indiana University Purdue University Indianapolis
| | - Shiaofen Fang
- Department of Computer & Information Science, Indiana University Purdue University Indianapolis
| | - Snehasis Mukhopadhyay
- Department of Computer & Information Science, Indiana University Purdue University Indianapolis
| | | | - Li Shen
- University of Pennsylvania Perelman School of Medicine
| |
Collapse
|
7
|
|
8
|
Rauber PE, Falcão AX, Telea AC. Projections as visual aids for classification system design. INFORMATION VISUALIZATION 2018; 17:282-305. [PMID: 30263012 PMCID: PMC6131729 DOI: 10.1177/1473871617713337] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Dimensionality reduction is a compelling alternative for high-dimensional data visualization. This method provides insight into high-dimensional feature spaces by mapping relationships between observations (high-dimensional vectors) to low (two or three) dimensional spaces. These low-dimensional representations support tasks such as outlier and group detection based on direct visualization. Supervised learning, a subfield of machine learning, is also concerned with observations. A key task in supervised learning consists of assigning class labels to observations based on generalization from previous experience. Effective development of such classification systems depends on many choices, including features descriptors, learning algorithms, and hyperparameters. These choices are not trivial, and there is no simple recipe to improve classification systems that perform poorly. In this context, we first propose the use of visual representations based on dimensionality reduction (projections) for predictive feedback on classification efficacy. Second, we propose a projection-based visual analytics methodology, and supportive tooling, that can be used to improve classification systems through feature selection. We evaluate our proposal through experiments involving four datasets and three representative learning algorithms.
Collapse
Affiliation(s)
- Paulo E Rauber
- Department of Mathematics and Computing
Science, University of Groningen, Groningen, The Netherlands
- University of Campinas, Campinas,
Brazil
| | | | - Alexandru C Telea
- Department of Mathematics and Computing
Science, University of Groningen, Groningen, The Netherlands
| |
Collapse
|
9
|
|
10
|
Liu S, Chen C, Lu Y, Ouyang F, Wang B. An Interactive Method to Improve Crowdsourced Annotations. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:235-245. [PMID: 30130224 DOI: 10.1109/tvcg.2018.2864843] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
In order to effectively infer correct labels from noisy crowdsourced annotations, learning-from-crowds models have introduced expert validation. However, little research has been done on facilitating the validation procedure. In this paper, we propose an interactive method to assist experts in verifying uncertain instance labels and unreliable workers. Given the instance labels and worker reliability inferred from a learning-from-crowds model, candidate instances and workers are selected for expert validation. The influence of verified results is propagated to relevant instances and workers through the learning-from-crowds model. To facilitate the validation of annotations, we have developed a confusion visualization to indicate the confusing classes for further exploration, a constrained projection method to show the uncertain labels in context, and a scatter-plot-based visualization to illustrate worker reliability. The three visualizations are tightly integrated with the learning-from-crowds model to provide an iterative and progressive environment for data validation. Two case studies were conducted that demonstrate our approach offers an efficient method for validating and improving crowdsourced annotations.
Collapse
|
11
|
Liu S, Xiao J, Liu J, Wang X, Wu J, Zhu J. Visual Diagnosis of Tree Boosting Methods. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:163-173. [PMID: 28866545 DOI: 10.1109/tvcg.2017.2744378] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/14/2023]
Abstract
Tree boosting, which combines weak learners (typically decision trees) to generate a strong learner, is a highly effective and widely used machine learning method. However, the development of a high performance tree boosting model is a time-consuming process that requires numerous trial-and-error experiments. To tackle this issue, we have developed a visual diagnosis tool, BOOSTVis, to help experts quickly analyze and diagnose the training process of tree boosting. In particular, we have designed a temporal confusion matrix visualization, and combined it with a t-SNE projection and a tree visualization. These visualization components work together to provide a comprehensive overview of a tree boosting model, and enable an effective diagnosis of an unsatisfactory training process. Two case studies that were conducted on the Otto Group Product Classification Challenge dataset demonstrate that BOOSTVis can provide informative feedback and guidance to improve understanding and diagnosis of tree boosting algorithms.
Collapse
|
12
|
Liu M, Shi J, Li Z, Li C, Zhu J, Liu S. Towards Better Analysis of Deep Convolutional Neural Networks. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:91-100. [PMID: 27576252 DOI: 10.1109/tvcg.2016.2598831] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Deep convolutional neural networks (CNNs) have achieved breakthrough performance in many pattern recognition tasks such as image classification. However, the development of high-quality deep models typically relies on a substantial amount of trial-and-error, as there is still no clear understanding of when and why a deep model works. In this paper, we present a visual analytics approach for better understanding, diagnosing, and refining deep CNNs. We formulate a deep CNN as a directed acyclic graph. Based on this formulation, a hybrid visualization is developed to disclose the multiple facets of each neuron and the interactions between them. In particular, we introduce a hierarchical rectangle packing algorithm and a matrix reordering algorithm to show the derived features of a neuron cluster. We also propose a biclustering-based edge bundling method to reduce visual clutter caused by a large number of connections between neurons. We evaluated our method on a set of CNNs and the results are generally favorable.
Collapse
|
13
|
Dinkla K, Strobelt H, Genest B, Reiling S, Borowsky M, Pfister H. Screenit: Visual Analysis of Cellular Screens. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:591-600. [PMID: 27875174 DOI: 10.1109/tvcg.2016.2598587] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
High-throughput and high-content screening enables large scale, cost-effective experiments in which cell cultures are exposed to a wide spectrum of drugs. The resulting multivariate data sets have a large but shallow hierarchical structure. The deepest level of this structure describes cells in terms of numeric features that are derived from image data. The subsequent level describes enveloping cell cultures in terms of imposed experiment conditions (exposure to drugs). We present Screenit, a visual analysis approach designed in close collaboration with screening experts. Screenit enables the navigation and analysis of multivariate data at multiple hierarchy levels and at multiple levels of detail. Screenit integrates the interactive modeling of cell physical states (phenotypes) and the effects of drugs on cell cultures (hits). In addition, quality control is enabled via the detection of anomalies that indicate low-quality data, while providing an interface that is designed to match workflows of screening experts. We demonstrate analyses for a real-world data set, CellMorph, with 6 million cells across 20,000 cell cultures.
Collapse
|
14
|
Ren D, Amershi S, Lee B, Suh J, Williams JD. Squares: Supporting Interactive Performance Analysis for Multiclass Classifiers. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:61-70. [PMID: 27875134 DOI: 10.1109/tvcg.2016.2598828] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Performance analysis is critical in applied machine learning because it influences the models practitioners produce. Current performance analysis tools suffer from issues including obscuring important characteristics of model behavior and dissociating performance from data. In this work, we present Squares, a performance visualization for multiclass classification problems. Squares supports estimating common performance metrics while displaying instance-level distribution information necessary for helping practitioners prioritize efforts and access data. Our controlled study shows that practitioners can assess performance significantly faster and more accurately with Squares than a confusion matrix, a common performance analysis tool in machine learning.
Collapse
|
15
|
Zhang KB, Orgun MA, Shankaran R, Zhang D. Classifying high dimensional data by interactive visual analysis. JOURNAL OF VISUAL LANGUAGES AND COMPUTING 2016. [DOI: 10.1016/j.jvlc.2015.11.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|