1. Eckelt K, Hinterreiter A, Adelberger P, Walchshofer C, Dhanoa V, Humer C, Heckmann M, Steinparz C, Streit M. Visual Exploration of Relationships and Structure in Low-Dimensional Embeddings. IEEE Transactions on Visualization and Computer Graphics 2023; 29:3312-3326. [PMID: 35254984 DOI: 10.1109/tvcg.2022.3156760] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/27/2023]
Abstract
In this work, we propose an interactive visual approach for the exploration and formation of structural relationships in embeddings of high-dimensional data. These structural relationships, such as item sequences, associations of items with groups, and hierarchies between groups of items, are defining properties of many real-world datasets. Nevertheless, most existing methods for the visual exploration of embeddings treat these structures as second-class citizens or do not take them into account at all. In our proposed analysis workflow, users explore enriched scatterplots of the embedding, in which relationships between items and/or groups are visually highlighted. The original high-dimensional data for single items, groups of items, or differences between connected items and groups are accessible through additional summary visualizations. We carefully tailored these summary and difference visualizations to the various data types and semantic contexts. During their exploratory analysis, users can externalize their insights by setting up additional groups and relationships between items and/or groups. We demonstrate the utility and potential impact of our approach by means of two use cases and multiple examples from various domains.
2. Collaris D, van Wijk JJ. StrategyAtlas: Strategy Analysis for Machine Learning Interpretability. IEEE Transactions on Visualization and Computer Graphics 2023; 29:2996-3008. [PMID: 35085084 DOI: 10.1109/tvcg.2022.3146806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Businesses in high-risk environments have been reluctant to adopt modern machine learning approaches due to their complex and uninterpretable nature. Most current solutions provide local, instance-level explanations, but this is insufficient for understanding the model as a whole. In this work, we show that strategy clusters (i.e., groups of data instances that are treated distinctly by the model) can be used to understand the global behavior of a complex ML model. To support effective exploration and understanding of these clusters, we introduce StrategyAtlas, a system designed to analyze and explain model strategies. Furthermore, it supports multiple ways to utilize these strategies for simplifying and improving the reference model. In collaboration with a large insurance company, we present a use case in automatic insurance acceptance, and show how professional data scientists were enabled to understand a complex model and improve the production model based on these insights.
3. Zhang J, Li Y, Wang B, Song J, Li M, Chen P, Shen Z, Wu Y, Mao C, Cao H, Wang X, Zhang W, Lu T. Rapid evaluation of Radix Paeoniae Alba and its processed products by near-infrared spectroscopy combined with multivariate algorithms. Anal Bioanal Chem 2023; 415:1719-1732. [PMID: 36763106 DOI: 10.1007/s00216-023-04570-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/14/2022] [Revised: 01/07/2023] [Accepted: 01/25/2023] [Indexed: 02/11/2023]
Abstract
It is well known that the processing method of herbal medicine has a complex impact on the active components and clinical efficacy, which is difficult to measure. As a representative herbal medicine with diverse processing methods, Radix Paeoniae Alba (RPA) and its processed products differ greatly in clinical efficacy. However, in some cases, different processed products are confused for use in clinical practice. Therefore, it is necessary to strictly control the quality of RPA and its processed products. Given the time-consuming and laborious operation of traditional quality control methods, a comprehensive strategy of near-infrared (NIR) spectroscopy combined with multivariate algorithms was proposed. This strategy has the advantages of being rapid and non-destructive, not only qualitatively distinguishing RPA and various processed products but also enabling quantitative prediction of five bioactive components. Qualitatively, the subspace clustering algorithm successfully differentiated RPA and three processed products, with an accuracy rate of 97.1%; quantitatively, interval combination optimization (ICO), competitive adaptive reweighted sampling (CARS), and competitive adaptive reweighted sampling combined with successive projections algorithm (CARS-SPA) were used to optimize the PLS model, and satisfactory results were obtained in terms of wavelength selection. In conclusion, it is feasible to use NIR spectroscopy to rapidly evaluate the effect of processing methods on the quality of RPA, which provides a meaningful reference for quality control of other herbal medicines with numerous processing methods.
Affiliation(s)
- Jiuba Zhang
  - College of Pharmacy, Nanjing University of Chinese Medicine, 138 Xianlin Rd, Nanjing, 210023, People's Republic of China
- Yu Li
  - College of Pharmacy, Nanjing University of Chinese Medicine, 138 Xianlin Rd, Nanjing, 210023, People's Republic of China
- Bin Wang
  - College of Pharmacy, Nanjing University of Chinese Medicine, 138 Xianlin Rd, Nanjing, 210023, People's Republic of China
- Jiantao Song
  - College of Pharmacy, Nanjing University of Chinese Medicine, 138 Xianlin Rd, Nanjing, 210023, People's Republic of China
- Mingxuan Li
  - College of Pharmacy, Nanjing University of Chinese Medicine, 138 Xianlin Rd, Nanjing, 210023, People's Republic of China
- Peng Chen
  - College of Pharmacy, Nanjing University of Chinese Medicine, 138 Xianlin Rd, Nanjing, 210023, People's Republic of China
- Zheyuan Shen
  - College of Pharmacy, Nanjing University of Chinese Medicine, 138 Xianlin Rd, Nanjing, 210023, People's Republic of China
- Yi Wu
  - College of Pharmacy, Nanjing University of Chinese Medicine, 138 Xianlin Rd, Nanjing, 210023, People's Republic of China
- Chunqin Mao
  - College of Pharmacy, Nanjing University of Chinese Medicine, 138 Xianlin Rd, Nanjing, 210023, People's Republic of China
- Hui Cao
  - Research Center for Traditional Chinese Medicine of Lingnan (Southern China), Jinan University, Guangzhou, 510632, China
- Xiachang Wang
  - College of Pharmacy, Nanjing University of Chinese Medicine, 138 Xianlin Rd, Nanjing, 210023, People's Republic of China
- Wei Zhang
  - College of Pharmacy, Nanjing University of Chinese Medicine, 138 Xianlin Rd, Nanjing, 210023, People's Republic of China
  - College of Pharmacy, Anhui University of Chinese Medicine, Hefei, 230038, China
  - Anhui Province Key Laboratory of Traditional Chinese Medicine Decoction Pieces of New Manufacturing Technology, Hefei, 230038, China
- Tulin Lu
  - College of Pharmacy, Nanjing University of Chinese Medicine, 138 Xianlin Rd, Nanjing, 210023, People's Republic of China
4. Espadoto M, Appleby G, Suh A, Cashman D, Li M, Scheidegger C, Anderson EW, Chang R, Telea AC. UnProjection: Leveraging Inverse-Projections for Visual Analytics of High-Dimensional Data. IEEE Transactions on Visualization and Computer Graphics 2023; 29:1559-1572. [PMID: 34748493 DOI: 10.1109/tvcg.2021.3125576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Abstract
Projection techniques are often used to visualize high-dimensional data, allowing users to better understand the overall structure of multi-dimensional spaces on a 2D screen. Although many such methods exist, comparatively little work has been done on generalizable methods of inverse-projection: the process of mapping the projected points, or more generally, the projection space back to the original high-dimensional space. In this article we present NNInv, a deep learning technique with the ability to approximate the inverse of any projection or mapping. NNInv learns to reconstruct high-dimensional data from any arbitrary point on a 2D projection space, giving users the ability to interact with the learned high-dimensional representation in a visual analytics system. We provide an analysis of the parameter space of NNInv, and offer guidance in selecting these parameters. We extend validation of the effectiveness of NNInv through a series of quantitative and qualitative analyses. We then demonstrate the method's utility by applying it to three visualization tasks: interactive instance interpolation, classifier agreement, and gradient visualization.
5. Bibal A, Delchevalerie V, Frénay B. DT-SNE: t-SNE Discrete Visualizations as Decision Tree Structures. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.01.073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
6. Li J, Zhou CQ. Incorporation of Human Knowledge into Data Embeddings to Improve Pattern Significance and Interpretability. IEEE Transactions on Visualization and Computer Graphics 2023; 29:723-733. [PMID: 36155441 DOI: 10.1109/tvcg.2022.3209382] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/04/2023]
Abstract
Embedding is a common technique for analyzing multi-dimensional data. However, the embedding projection cannot always form significant and interpretable visual structures that foreshadow underlying data patterns. We propose an approach that incorporates human knowledge into data embeddings to improve pattern significance and interpretability. The core idea is (1) externalizing tacit human knowledge as explicit sample labels and (2) adding a classification loss in the embedding network to encode samples' classes. The approach pulls samples of the same class with similar data features closer in the projection, leading to more compact (significant) and class-consistent (interpretable) visual structures. We give an embedding network with a customized classification loss to implement the idea and integrate the network into a visualization system to form a workflow that supports flexible class creation and pattern exploration. Patterns found on open datasets in case studies, subjects' performance in a user study, and quantitative experiment results illustrate the general usability and effectiveness of the approach.
7. Liu Z, Wang Y, Bernard J, Munzner T. Visualizing Graph Neural Networks With CorGIE: Corresponding a Graph to Its Embedding. IEEE Transactions on Visualization and Computer Graphics 2022; 28:2500-2516. [PMID: 35120005 DOI: 10.1109/tvcg.2022.3148197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Graph neural networks (GNNs) are a class of powerful machine learning tools that model node relations for making predictions of nodes or links. GNN developers rely on quantitative metrics of the predictions to evaluate a GNN, but similar to many other neural networks, it is difficult for them to understand if the GNN truly learns characteristics of a graph as expected. We propose an approach to corresponding an input graph to its node embedding (aka latent space), a common component of GNNs that is later used for prediction. We abstract the data and tasks, and develop an interactive multi-view interface called CorGIE to instantiate the abstraction. As the key function in CorGIE, we propose the K-hop graph layout to show topological neighbors in hops and their clustering structure. To evaluate the functionality and usability of CorGIE, we present how to use CorGIE in two usage scenarios, and conduct a case study with five GNN experts. Availability: Open-source code at https://github.com/zipengliu/corgie-ui/, supplemental materials & video at https://osf.io/tr3sb/.
8. Sohns JT, Schmitt M, Jirasek F, Hasse H, Leitte H. Attribute-based Explanation of Non-Linear Embeddings of High-Dimensional Data. IEEE Transactions on Visualization and Computer Graphics 2022; 28:540-550. [PMID: 34587086 DOI: 10.1109/tvcg.2021.3114870] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/13/2023]
Abstract
Embeddings of high-dimensional data are widely used to explore data, to verify analysis results, and to communicate information. Their explanation, in particular with respect to the input attributes, is often difficult. With linear projections like PCA the axes can still be annotated meaningfully. With non-linear projections this is no longer possible and alternative strategies such as attribute-based color coding are required. In this paper, we review existing augmentation techniques and discuss their limitations. We present the Non-Linear Embeddings Surveyor (NoLiES) that combines a novel augmentation strategy for projected data (rangesets) with interactive analysis in a small multiples setting. Rangesets use a set-based visualization approach for binned attribute values that enables the user to quickly observe structure and detect outliers. We detail the link between algebraic topology and rangesets and demonstrate the utility of NoLiES in case studies with various challenges (complex attribute value distribution, many attributes, many data points) and a real-world application to understand latent features of matrix completion in thermodynamics.
9. Lensen A, Xue B, Zhang M. Genetic Programming for Evolving a Front of Interpretable Models for Data Visualization. IEEE Transactions on Cybernetics 2021; 51:5468-5482. [PMID: 32092030 DOI: 10.1109/tcyb.2020.2970198] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Data visualization is a key tool in data mining for understanding big datasets. Many visualization methods have been proposed, including the well-regarded state-of-the-art method t-distributed stochastic neighbor embedding. However, the most powerful visualization methods have a significant limitation: the manner in which they create their visualization from the original features of the dataset is completely opaque. Many domains require an understanding of the data in terms of the original features; there is hence a need for powerful visualization methods which use understandable models. In this article, we propose a genetic programming (GP) approach called GP-tSNE for evolving interpretable mappings from the dataset to high-quality visualizations. A multiobjective approach is designed that produces a variety of visualizations in a single run which gives different tradeoffs between visual quality and model complexity. Testing against baseline methods on a variety of datasets shows the clear potential of GP-tSNE to allow deeper insight into data than that provided by existing visualization methods. We further highlight the benefits of a multiobjective approach through an in-depth analysis of a candidate front, which shows how multiple models can be analyzed jointly to give increased insight into the dataset.
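To make the notion of a front of tradeoffs concrete, the following is a minimal sketch of a non-dominated (Pareto) filter over candidate visualizations scored by visual quality and model complexity. It is not the GP-tSNE algorithm from the article; the function name `pareto_front`, the tuple layout, and the example scores are illustrative assumptions.

```python
def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    Each candidate is a hypothetical (name, quality, complexity) tuple.
    Assumption for this sketch: higher quality is better, lower complexity is better.
    """
    front = []
    for name, quality, complexity in candidates:
        dominated = any(
            # The other candidate is at least as good on both objectives ...
            other_q >= quality and other_c <= complexity
            # ... and strictly better on at least one of them.
            and (other_q > quality or other_c < complexity)
            for _, other_q, other_c in candidates
        )
        if not dominated:
            front.append((name, quality, complexity))
    return front


# Made-up scores, purely for illustration.
models = [("a", 0.9, 50), ("b", 0.8, 10), ("c", 0.7, 20), ("d", 0.5, 3)]
for name, quality, complexity in pareto_front(models):
    print(name, quality, complexity)
```

In this made-up example, model "c" is dropped because "b" has both higher quality and lower complexity; the remaining models each represent a different tradeoff between the two objectives.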
10. Bian R, Xue Y, Zhou L, Zhang J, Chen B, Weiskopf D, Wang Y. Implicit Multidimensional Projection of Local Subspaces. IEEE Transactions on Visualization and Computer Graphics 2021; 27:1558-1568. [PMID: 33048698 DOI: 10.1109/tvcg.2020.3030368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
We propose a visualization method to understand the effect of multidimensional projection on local subspaces, using implicit function differentiation. Here, we understand the local subspace as the multidimensional local neighborhood of data points. Existing methods focus on the projection of multidimensional data points, and the neighborhood information is ignored. Our method is able to analyze the shape and directional information of the local subspace to gain more insights into the global structure of the data through the perception of local structures. Local subspaces are fitted by multidimensional ellipses that are spanned by basis vectors. An accurate and efficient vector transformation method is proposed based on analytical differentiation of multidimensional projections formulated as implicit functions. The results are visualized as glyphs and analyzed using a full set of specifically-designed interactions supported in our efficient web-based visualization tool. The usefulness of our method is demonstrated using various multi- and high-dimensional benchmark datasets. Our implicit differentiation vector transformation is evaluated through numerical comparisons; the overall method is evaluated through exploration examples and use cases.
11. Kang B, García García D, Lijffijt J, Santos-Rodríguez R, De Bie T. Conditional t-SNE: more informative t-SNE embeddings. Mach Learn 2020; 110:2905-2940. [PMID: 34840420 PMCID: PMC8599264 DOI: 10.1007/s10994-020-05917-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/28/2020] [Revised: 07/28/2020] [Accepted: 09/19/2020] [Indexed: 11/25/2022]
Abstract
Dimensionality reduction and manifold learning methods such as t-distributed stochastic neighbor embedding (t-SNE) are frequently used to map high-dimensional data into a two-dimensional space to visualize and explore that data. Going beyond the specifics of t-SNE, there are two substantial limitations of any such approach: (1) not all information can be captured in a single two-dimensional embedding, and (2) to well-informed users, the salient structure of such an embedding is often already known, preventing any real new insights from being obtained. Currently, it is not known how to extract the remaining information in a similarly effective manner. We introduce conditional t-SNE (ct-SNE), a generalization of t-SNE that discounts prior information in the form of labels. This enables obtaining more informative and more relevant embeddings. To achieve this, we propose a conditioned version of the t-SNE objective, obtaining an elegant method with a single integrated objective. We show how to efficiently optimize the objective and study the effects of the extra parameter that ct-SNE has over t-SNE. Qualitative and quantitative empirical results on synthetic and real data show ct-SNE is scalable, effective, and achieves its goal: it allows complementary structure to be captured in the embedding and provides new insights into real data.
Affiliation(s)
- Bo Kang
  - Department of Electronics and Information Systems, IDLab, Ghent University, Ghent, Belgium
- Jefrey Lijffijt
  - Department of Electronics and Information Systems, IDLab, Ghent University, Ghent, Belgium
- Tijl De Bie
  - Department of Electronics and Information Systems, IDLab, Ghent University, Ghent, Belgium
12. Chatzimparmpas A, Martins RM, Kerren A. t-viSNE: Interactive Assessment and Interpretation of t-SNE Projections. IEEE Transactions on Visualization and Computer Graphics 2020; 26:2696-2714. [PMID: 32305922 DOI: 10.1109/tvcg.2020.2986996] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 06/11/2023]
Abstract
t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of multidimensional data has proven to be a popular approach, with successful applications in a wide range of domains. Despite their usefulness, t-SNE projections can be hard to interpret or even misleading, which hurts the trustworthiness of the results. Understanding the details of t-SNE itself and the reasons behind specific patterns in its output may be a daunting task, especially for non-experts in dimensionality reduction. In this article, we present t-viSNE, an interactive tool for the visual exploration of t-SNE projections that enables analysts to inspect different aspects of their accuracy and meaning, such as the effects of hyper-parameters, distance and neighborhood preservation, densities and costs of specific neighborhoods, and the correlations between dimensions and visual patterns. We propose a coherent, accessible, and well-integrated collection of different views for the visualization of t-SNE projections. The applicability and usability of t-viSNE are demonstrated through hypothetical usage scenarios with real data sets. Finally, we present the results of a user study where the tool's effectiveness was evaluated. By bringing to light information that would normally be lost after running t-SNE, we hope to support analysts in using t-SNE and in making its results more understandable.
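As a rough illustration of what "neighborhood preservation" can mean for a projection, the sketch below computes one common variant of such a score: the mean fraction of each point's k nearest neighbors in the original space that remain among its k nearest neighbors in the 2D layout. This is an assumption about the general idea, not the measure implemented in t-viSNE; the function names and the toy coordinates are hypothetical.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i itself."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return {j for _, j in dists[:k]}


def neighborhood_preservation(high_dim, low_dim, k):
    """Mean share of each point's k high-dimensional neighbors kept in the projection."""
    n = len(high_dim)
    if n != len(low_dim):
        raise ValueError("both layouts must contain the same number of points")
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    total = 0.0
    for i in range(n):
        kept = k_nearest(high_dim, i, k) & k_nearest(low_dim, i, k)
        total += len(kept) / k
    return total / n


# Toy data (hypothetical): four 3D points and two candidate 2D layouts.
original = [(0, 0, 0), (1, 0, 0), (5, 0, 0), (6, 0, 0)]
faithful = [(0, 0), (1, 0), (5, 0), (6, 0)]   # same arrangement as the original
scrambled = [(0, 0), (5, 0), (1, 0), (6, 0)]  # points 1 and 2 swapped
print(neighborhood_preservation(original, faithful, k=1))
print(neighborhood_preservation(original, scrambled, k=1))
```

A score near 1 suggests that local neighborhoods survive the projection, while a score near 0 suggests that nearby points in the plot were not neighbors in the original data.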