1
Bernard J, Barth CM, Cuba E, Meier A, Peiris Y, Shneiderman B. IVESA - Visual Analysis of Time-Stamped Event Sequences. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2025; 31:2235-2256. [PMID: 38587948 DOI: 10.1109/tvcg.2024.3382760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/10/2024]
Abstract
Time-stamped event sequences (TSEQs) are time-oriented data without value information, shifting the focus of users to the exploration of temporal event occurrences. TSEQs exist in application domains, such as sleeping behavior, earthquake aftershocks, and stock market crashes. Domain experts face four challenges, for which they could use interactive and visual data analysis methods. First, TSEQs can be large with respect to both the number of sequences and events, often leading to millions of events. Second, domain experts need validated metrics and features to identify interesting patterns. Third, after identifying interesting patterns, domain experts contextualize the patterns to foster sensemaking. Finally, domain experts seek to reduce data complexity by data simplification and machine learning support. We present IVESA, a visual analytics approach for TSEQs. It supports the analysis of TSEQs at the granularities of sequences and events, supported with metrics and feature analysis tools. IVESA has multiple linked views that support overview, sort+filter, comparison, details-on-demand, and metadata relation-seeking tasks, as well as data simplification through feature analysis, interactive clustering, filtering, and motif detection and simplification. We evaluated IVESA with three case studies and a user study with six domain experts working with six different datasets and applications. Results demonstrate the usability and generalizability of IVESA across applications and cases that had up to 1,000,000 events.
2
Zhao J, Liu X, Tang H, Wang X, Yang S, Liu D, Chen Y, Chen YV. Mesoscopic structure graphs for interpreting uncertainty in non-linear embeddings. Comput Biol Med 2024; 182:109105. [PMID: 39265479 DOI: 10.1016/j.compbiomed.2024.109105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2024] [Revised: 07/06/2024] [Accepted: 09/01/2024] [Indexed: 09/14/2024]
Abstract
Probabilistic-based non-linear dimensionality reduction (PB-NL-DR) methods, such as t-SNE and UMAP, are effective in unfolding complex high-dimensional manifolds, allowing users to explore and understand the structural patterns of data. However, due to the trade-off between global and local structure preservation and the randomness during computation, these methods may introduce false neighborhood relationships, known as distortion errors, and misleading visualizations. To address this issue, we first conduct a detailed survey to illustrate the design space of prior layout enrichment visualizations for interpreting DR results, and then propose a node-link visualization technique, ManiGraph. This technique rethinks the neighborhood fidelity between the high- and low-dimensional spaces by constructing dynamic mesoscopic structure graphs and measuring region-adapted trustworthiness. ManiGraph also addresses the overplotting issue in scatterplot visualization for large-scale datasets and supports examination in unsupervised scenarios. We demonstrate the effectiveness of ManiGraph in different analytical cases, including generic machine learning using 3D toy data illustrations and fashion-MNIST, a computational biology study using a single-cell RNA sequencing dataset, and a deep learning-enabled colorectal cancer study with histopathology-MNIST.
Affiliation(s)
- Junhan Zhao
- Harvard Medical School, Boston, 02114, MA, USA; Harvard T.H.Chan School of Public Health, Boston, 02114, MA, USA; Purdue University, West Lafayette, 47907, IN, USA.
- Xiang Liu
- Purdue University, West Lafayette, 47907, IN, USA; Indiana University School of Medicine, Indianapolis, 46202, IN, USA.
- Hongping Tang
- Shenzhen Maternity and Child Healthcare Hospital, Shenzhen, 518048, China.
- Xiyue Wang
- Stanford University School of Medicine, Stanford, 94304, CA, USA.
- Sen Yang
- Stanford University School of Medicine, Stanford, 94304, CA, USA.
- Donfang Liu
- Rochester Institute of Technology, Rochester, 14623, NY, USA.
- Yijiang Chen
- Stanford University School of Medicine, Stanford, 94304, CA, USA.
3
Hilasaca GM, Marcilio-Jr WE, Eler DM, Martins RM, Paulovich FV. A Grid-Based Method for Removing Overlaps of Dimensionality Reduction Scatterplot Layouts. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:5733-5749. [PMID: 37647195 DOI: 10.1109/tvcg.2023.3309941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
Dimensionality Reduction (DR) scatterplot layouts have become a ubiquitous visualization tool for analyzing multidimensional datasets. Despite their popularity, such scatterplots suffer from occlusion, especially when informative glyphs are used to represent data instances, potentially obfuscating critical information for the analysis under execution. Different strategies have been devised to address this issue, either producing overlap-free layouts that lack the powerful capabilities of contemporary DR techniques in uncovering interesting data patterns or eliminating overlaps as a post-processing strategy. Despite the good results of post-processing techniques, most of the best methods typically expand or distort the scatterplot area, thus reducing glyphs' size (sometimes) to unreadable dimensions, defeating the purpose of removing overlaps. This article presents Distance Grid (DGrid), a novel post-processing strategy to remove overlaps from DR layouts that faithfully preserves the original layout's characteristics and bounds the minimum glyph sizes. We show that DGrid surpasses the state-of-the-art in overlap removal (through an extensive comparative evaluation considering multiple different metrics) while also being one of the fastest techniques, especially for large datasets. A user study with 51 participants also shows that DGrid is consistently ranked among the top techniques for preserving the original scatterplots' visual characteristics and the aesthetics of the final results.
4
Piccolotto N, Bogl M, Muehlmann C, Nordhausen K, Filzmoser P, Schmidt J, Miksch S. Data Type Agnostic Visual Sensitivity Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; PP:1-11. [PMID: 37922175 DOI: 10.1109/tvcg.2023.3327203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2023]
Abstract
Modern science and industry rely on computational models for simulation, prediction, and data analysis. Spatial blind source separation (SBSS) is a model used to analyze spatial data. Designed explicitly for spatial data analysis, it is superior to popular non-spatial methods, like PCA. However, a challenge to its practical use is setting two complex tuning parameters, which requires parameter space analysis. In this paper, we focus on sensitivity analysis (SA). SBSS parameters and outputs are spatial data, which makes SA difficult as few SA approaches in the literature assume such complex data on both sides of the model. Based on the requirements in our design study with statistics experts, we developed a visual analytics prototype for data type agnostic visual sensitivity analysis that fits SBSS and other contexts. The main advantage of our approach is that it requires only dissimilarity measures for parameter settings and outputs (Fig. 1). We evaluated the prototype heuristically with visualization experts and through interviews with two SBSS experts. In addition, we show the transferability of our approach by applying it to microclimate simulations. Study participants could confirm suspected and known parameter-output relations, find surprising associations, and identify parameter subspaces to examine in the future. During our design study and evaluation, we identified challenging future research opportunities.
5
Eckelt K, Hinterreiter A, Adelberger P, Walchshofer C, Dhanoa V, Humer C, Heckmann M, Steinparz C, Streit M. Visual Exploration of Relationships and Structure in Low-Dimensional Embeddings. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:3312-3326. [PMID: 35254984 DOI: 10.1109/tvcg.2022.3156760] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/27/2023]
Abstract
In this work, we propose an interactive visual approach for the exploration and formation of structural relationships in embeddings of high-dimensional data. These structural relationships, such as item sequences, associations of items with groups, and hierarchies between groups of items, are defining properties of many real-world datasets. Nevertheless, most existing methods for the visual exploration of embeddings treat these structures as second-class citizens or do not take them into account at all. In our proposed analysis workflow, users explore enriched scatterplots of the embedding, in which relationships between items and/or groups are visually highlighted. The original high-dimensional data for single items, groups of items, or differences between connected items and groups are accessible through additional summary visualizations. We carefully tailored these summary and difference visualizations to the various data types and semantic contexts. During their exploratory analysis, users can externalize their insights by setting up additional groups and relationships between items and/or groups. We demonstrate the utility and potential impact of our approach by means of two use cases and multiple examples from various domains.
6
Younesy H, Pober J, Möller T, Karimi MM. ModEx: a general purpose computer model exploration system. FRONTIERS IN BIOINFORMATICS 2023; 3:1153800. [PMID: 37304402 PMCID: PMC10249055 DOI: 10.3389/fbinf.2023.1153800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 05/09/2023] [Indexed: 06/13/2023] Open
Abstract
We present a general purpose visual analysis system that can be used for exploring parameters of a variety of computer models. Our proposed system offers key components of a visual parameter analysis framework including parameter sampling, deriving output summaries, and an exploration interface. It also provides an API for rapid development of parameter space exploration solutions as well as the flexibility to support custom workflows for different application domains. We evaluate the effectiveness of our system by demonstrating it in three domains: data mining, machine learning and specific application in bioinformatics.
Affiliation(s)
- Hamid Younesy
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Torsten Möller
- Research Network Data Science and Faculty of Computer Science, University of Vienna, Vienna, Austria
- Mohammad M. Karimi
- Comprehensive Cancer Centre, School of Cancer and Pharmaceutical Sciences, Faculty of Life Sciences and Medicine, King's College London, London, United Kingdom
7
Hier DB, Yelugam R, Carrithers MD, Wunsch DC. The visualization of Orphadata neurology phenotypes. Front Digit Health 2023; 5:1064936. [PMID: 36778102 PMCID: PMC9911440 DOI: 10.3389/fdgth.2023.1064936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Accepted: 01/10/2023] [Indexed: 01/28/2023] Open
Abstract
Disease phenotypes are characterized by signs (what a physician observes during the examination of a patient) and symptoms (the complaints of a patient to a physician). Large repositories of disease phenotypes are accessible through the Online Mendelian Inheritance in Man, Human Phenotype Ontology, and Orphadata initiatives. Many of the diseases in these datasets are neurologic. For each repository, the phenotype of neurologic disease is represented as a list of concepts of variable length where the concepts are selected from a restricted ontology. Visualizations of these concept lists are not provided. We address this limitation by using subsumption to reduce the number of descriptive features from 2,946 classes to thirty superclasses. Phenotype feature lists of variable lengths were converted into fixed-length vectors. Phenotype vectors were aggregated into matrices and visualized as heat maps that allowed side-by-side disease comparisons. Individual diseases (each representing a row in the matrix) were visualized as word clouds. We illustrate the utility of this approach by visualizing the neuro-phenotypes of 32 dystonic diseases from Orphadata. Subsumption can collapse phenotype features into superclasses, phenotype lists can be vectorized, and phenotype vectors can be visualized as heat maps and word clouds.
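The subsumption and vectorization steps described in this abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: the toy ontology, the three superclasses, and the disease names are invented for illustration and are not taken from Orphadata, and the use of per-superclass counts (rather than, say, binary flags) is an assumption, since the abstract does not say how the vector entries are defined.

```python
from collections import Counter

# Hypothetical toy ontology: each class maps to its parent (None marks a root).
PARENT = {
    "dystonia": "movement_disorder",
    "tremor": "movement_disorder",
    "movement_disorder": None,
    "ataxia": "coordination_disorder",
    "coordination_disorder": None,
    "seizure": "epileptic_disorder",
    "epileptic_disorder": None,
}

# Hypothetical superclasses; their order fixes the vector layout.
SUPERCLASSES = ["movement_disorder", "coordination_disorder", "epileptic_disorder"]


def subsume(concept):
    """Walk up the parent chain until a designated superclass is reached.

    Returns None when the concept is unknown or has no superclass ancestor.
    """
    while concept is not None and concept not in SUPERCLASSES:
        concept = PARENT.get(concept)
    return concept


def vectorize(features):
    """Turn a variable-length feature list into a fixed-length count vector."""
    counts = Counter(subsume(f) for f in features)
    return [counts.get(s, 0) for s in SUPERCLASSES]


# Variable-length phenotype lists for two made-up diseases.
diseases = {
    "disease_a": ["dystonia", "tremor", "seizure"],
    "disease_b": ["ataxia"],
}

# One row per disease; stacking the rows gives the matrix a heat map would draw.
matrix = {name: vectorize(feats) for name, feats in diseases.items()}

for name, row in matrix.items():
    print(name, row)
```

Because every row has the same length regardless of how many features a disease lists, the rows can be stacked and compared side by side, which is the property the heat-map view relies on.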
Affiliation(s)
- Daniel B Hier
- Applied Computational Intelligence Laboratory, Department of Electrical & Computer Engineering, Missouri University of Science & Technology, Rolla, MO, United States; Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Raghu Yelugam
- Applied Computational Intelligence Laboratory, Department of Electrical & Computer Engineering, Missouri University of Science & Technology, Rolla, MO, United States
- Michael D Carrithers
- Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Donald C Wunsch
- National Institute of Diabetes and Digestive and Kidney Diseases, Liver Diseases Branch, Bethesda, MD, United States
8
Guo Y, Guo S, Jin Z, Kaul S, Gotz D, Cao N. Survey on Visual Analysis of Event Sequence Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:5091-5112. [PMID: 34314358 DOI: 10.1109/tvcg.2021.3100413] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 06/13/2023]
Abstract
Event sequence data record series of discrete events in the time order of occurrence. They are commonly observed in a variety of applications ranging from electronic health records to network logs, and are typically large-scale, high-dimensional, and heterogeneous. This high complexity of event sequence data makes it difficult for analysts to manually explore and find patterns, resulting in ever-increasing needs for computational and perceptual aids from visual analytics techniques to extract and communicate insights from event sequence datasets. In this paper, we review the state-of-the-art visual analytics approaches, characterize them with our proposed design space, and categorize them based on analytical tasks and applications. From our review of relevant literature, we have also identified several remaining research challenges and future research opportunities.
9
Cheng F, Liu D, Du F, Lin Y, Zytek A, Li H, Qu H, Veeramachaneni K. VBridge: Connecting the Dots Between Features and Data to Explain Healthcare Models. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:378-388. [PMID: 34596543 DOI: 10.1109/tvcg.2021.3114836] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/13/2023]
Abstract
Machine learning (ML) is increasingly applied to Electronic Health Records (EHRs) to solve clinical prediction tasks. Although many ML models perform promisingly, issues with model transparency and interpretability limit their adoption in clinical practice. Directly using existing explainable ML techniques in clinical settings can be challenging. Through literature surveys and collaborations with six clinicians with an average of 17 years of clinical experience, we identified three key challenges, including clinicians' unfamiliarity with ML features, lack of contextual information, and the need for cohort-level evidence. Following an iterative design process, we further designed and developed VBridge, a visual analytics tool that seamlessly incorporates ML explanations into clinicians' decision-making workflow. The system includes a novel hierarchical display of contribution-based feature explanations and enriched interactions that connect the dots between ML features, explanations, and data. We demonstrated the effectiveness of VBridge through two case studies and expert interviews with four clinicians, showing that visually associating model explanations with patients' situational records can help clinicians better interpret and use model predictions when making clinical decisions. We further derived a list of design implications for developing future explainable ML tools to support clinical decision-making.
10
Das S, Saket B, Kwon BC, Endert A. Geono-Cluster: Interactive Visual Cluster Analysis for Biologists. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:4401-4412. [PMID: 32746262 DOI: 10.1109/tvcg.2020.3002166] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Biologists often perform clustering analysis to derive meaningful patterns, relationships, and structures from data instances and attributes. Though clustering plays a pivotal role in biologists' data exploration, it takes non-trivial effort for biologists to find the best grouping in their data using existing tools. Visual cluster analysis is currently performed either programmatically or through menus and dialogues in many tools, which require parameter adjustments over several steps of trial-and-error. In this article, we introduce Geono-Cluster, a novel visual analysis tool designed to support cluster analysis for biologists who do not have formal data science training. Geono-Cluster enables biologists to apply their domain expertise to clustering results by visually demonstrating, with a small sample of data instances, what their expected clustering outputs should look like. The system then predicts users' intentions and generates potential clustering results. Our study follows the design study protocol to derive biologists' tasks and requirements, design the system, and evaluate the system with experts on their own dataset. Results of our study with six biologists provide initial evidence that Geono-Cluster enables biologists to create, refine, and evaluate clustering results to effectively analyze their data and gain data-driven insights. Finally, we discuss lessons learned and implications of our study.
11
Sun L, Zhang X, Pan X, Liu Y, Yu W, Xu T, Liu F, Chen W, Wang Y, Su W, Zhou Z. Visual analytics of genealogy with attribute-enhanced topological clustering. J Vis (Tokyo) 2021. [DOI: 10.1007/s12650-021-00802-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
12
Yang W, Wang X, Lu J, Dou W, Liu S. Interactive Steering of Hierarchical Clustering. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:3953-3967. [PMID: 32746252 DOI: 10.1109/tvcg.2020.2995100] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/11/2023]
Abstract
Hierarchical clustering is an important technique to organize big data for exploratory data analysis. However, existing one-size-fits-all hierarchical clustering methods often fail to meet the diverse needs of different users. To address this challenge, we present an interactive steering method to visually supervise constrained hierarchical clustering by utilizing both public knowledge (e.g., Wikipedia) and private knowledge from users. The novelty of our approach includes 1) automatically constructing constraints for hierarchical clustering using knowledge (knowledge-driven) and intrinsic data distribution (data-driven), and 2) enabling the interactive steering of clustering through a visual interface (user-driven). Our method first maps each data item to the most relevant items in a knowledge base. An initial constraint tree is then extracted using the ant colony optimization algorithm. The algorithm balances the tree width and depth and covers the data items with high confidence. Given the constraint tree, the data items are hierarchically clustered using evolutionary Bayesian rose tree. To clearly convey the hierarchical clustering results, an uncertainty-aware tree visualization has been developed to enable users to quickly locate the most uncertain sub-hierarchies and interactively improve them. The quantitative evaluation and case study demonstrate that the proposed approach facilitates the building of customized clustering trees in an efficient and effective manner.
13
Knittel J, Lalama A, Koch S, Ertl T. Visual Neural Decomposition to Explain Multivariate Data Sets. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1374-1384. [PMID: 33048724 DOI: 10.1109/tvcg.2020.3030420] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Investigating relationships between variables in multi-dimensional data sets is a common task for data analysts and engineers. More specifically, it is often valuable to understand which ranges of which input variables lead to particular values of a given target variable. Unfortunately, with an increasing number of independent variables, this process may become cumbersome and time-consuming due to the many possible combinations that have to be explored. In this paper, we propose a novel approach to visualize correlations between input variables and a target output variable that scales to hundreds of variables. We developed a visual model based on neural networks that can be explored in a guided way to help analysts find and understand such correlations. First, we train a neural network to predict the target from the input variables. Then, we visualize the inner workings of the resulting model to help understand relations within the data set. We further introduce a new regularization term for the backpropagation algorithm that encourages the neural network to learn representations that are easier to interpret visually. We apply our method to artificial and real-world data sets to show its utility.
14
Lin Y, Wong K, Wang Y, Zhang R, Dong B, Qu H, Zheng Q. TaxThemis: Interactive Mining and Exploration of Suspicious Tax Evasion Groups. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:849-859. [PMID: 33048699 DOI: 10.1109/tvcg.2020.3030370] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Tax evasion is a serious economic problem for many countries, as it can undermine the government's tax system and lead to an unfair business competition environment. Recent research has applied data analytics techniques to analyze and detect tax evasion behaviors of individual taxpayers. However, they have failed to support the analysis and exploration of the related party transaction tax evasion (RPTTE) behaviors (e.g., transfer pricing), where a group of taxpayers is involved. In this paper, we present TaxThemis, an interactive visual analytics system to help tax officers mine and explore suspicious tax evasion groups through analyzing heterogeneous tax-related data. A taxpayer network is constructed and fused with the respective trade network to detect suspicious RPTTE groups. Rich visualizations are designed to facilitate the exploration and investigation of suspicious transactions between related taxpayers with profit and topological data analysis. Specifically, we propose a calendar heatmap with a carefully-designed encoding scheme to intuitively show the evidence of transferring revenue through related party transactions. We demonstrate the usefulness and effectiveness of TaxThemis through two case studies on real-world tax-related data and interviews with domain experts.
15
Ma Y, Fan A, He J, Nelakurthi AR, Maciejewski R. A Visual Analytics Framework for Explaining and Diagnosing Transfer Learning Processes. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1385-1395. [PMID: 33035164 DOI: 10.1109/tvcg.2020.3028888] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Many statistical learning models hold an assumption that the training data and the future unlabeled data are drawn from the same distribution. However, this assumption is difficult to fulfill in real-world scenarios and creates barriers in reusing existing labels from similar application domains. Transfer Learning is intended to relax this assumption by modeling relationships between domains, and is often applied in deep learning applications to reduce the demand for labeled data and training time. Despite recent advances in exploring deep learning models with visual analytics tools, little work has explored the issue of explaining and diagnosing the knowledge transfer process between deep learning models. In this paper, we present a visual analytics framework for the multi-level exploration of the transfer learning processes when training deep neural networks. Our framework establishes a multi-aspect design to explain how the learned knowledge from the existing model is transferred into the new learning task when training deep neural networks. Based on a comprehensive requirement and task analysis, we employ descriptive visualization with performance measures and detailed inspections of model behaviors from the statistical, instance, feature, and model structure levels. We demonstrate our framework through two case studies on image classification by fine-tuning AlexNets to illustrate how analysts can utilize our framework.
16
Ostropolets A, Zhang L, Hripcsak G. A scoping review of clinical decision support tools that generate new knowledge to support decision making in real time. J Am Med Inform Assoc 2020; 27:1968-1976. [PMID: 33120430 PMCID: PMC7824048 DOI: 10.1093/jamia/ocaa200] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 04/06/2020] [Revised: 07/24/2020] [Accepted: 08/04/2020] [Indexed: 12/19/2022] Open
Abstract
OBJECTIVE: A growing body of observational data has enabled its secondary use to facilitate clinical care for complex cases not covered by the existing evidence. We conducted a scoping review to characterize clinical decision support systems (CDSSs) that generate new knowledge to provide guidance for such cases in real time. MATERIALS AND METHODS: PubMed, Embase, ProQuest, and IEEE Xplore were searched up to May 2020. The abstracts were screened by 2 reviewers. Full texts of the relevant articles were reviewed by the first author and approved by the second reviewer, accompanied by the screening of articles' references. The details of design, implementation, and evaluation of included CDSSs were extracted. RESULTS: Our search returned 3427 articles, of which 53, describing 25 CDSSs, were selected. We identified 8 expert-based and 17 data-driven tools. Sixteen (64%) tools were developed in the United States, with the others mostly in Europe. Most of the tools (n = 16, 64%) were implemented in 1 site, with only 5 being actively used in clinical practice. Patient or quality outcomes were assessed for 3 (18%) CDSSs, 4 (16%) underwent user acceptance or usage testing and 7 (28%) functional testing. CONCLUSIONS: We found a number of CDSSs that generate new knowledge, although only 1 addressed confounding and bias. Overall, the tools lacked demonstration of their utility. Improvements in clinical and quality outcomes were shown only for a few CDSSs, while the benefits of the others remain unclear. This review suggests a need for further testing of such CDSSs and, if appropriate, their dissemination.
Affiliation(s)
- Anna Ostropolets
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, USA
- Linying Zhang
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, USA
- George Hripcsak
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, USA
- NewYork-Presbyterian Hospital, New York, New York, USA
17
Yang D, Yu S, Hao Y. Visual Analysis of Sorting and Classification of Multidimensional Data. INT J PATTERN RECOGN 2020. [DOI: 10.1142/s021800142155003x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
An important task in data analysis is to identify correlation structures in unlabeled high-dimensional data and to classify that data, which usually requires iterative experiments on clustering parameters, attribute weights, and instances. For a large dataset, the number of clusters may be huge, and exploring this huge space is a great challenge. People usually have an overall understanding of some of the data. For example, they think that data A is better than data B, but they do not know which attributes are important. Therefore, a powerful interactive analysis tool can greatly improve the effectiveness of exploratory clustering analysis. This paper provides a visual analysis method for sorting and classifying multivariate data. It determines the weight of each attribute through the user's interaction, generates a sorting from those weights, and then completes the classification according to the sorting results. Through visual display, users can understand the characteristics of the data and of each category intuitively and quickly, which helps them improve the sorting and classification results.
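The sort-then-classify idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the weights are supplied directly rather than derived from user interaction, the score is a plain weighted sum over min-max scaled attributes, and classes are equal-sized slices of the ranking. The function names, item names, and numbers are all hypothetical.

```python
def normalize(rows):
    """Min-max scale each attribute column to [0, 1] (constant columns become 0)."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_cols.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(r) for r in zip(*scaled_cols)]


def rank_items(names, rows, weights):
    """Score each item by a weighted sum of scaled attributes, best first."""
    scores = [sum(w * v for w, v in zip(weights, r)) for r in normalize(rows)]
    return sorted(zip(names, scores), key=lambda p: p[1], reverse=True)


def classify(ranked, n_classes):
    """Cut the ranking into n_classes consecutive groups (class 0 is the top group)."""
    size = -(-len(ranked) // n_classes)  # ceiling division
    return {name: i // size for i, (name, _) in enumerate(ranked)}


# Hypothetical data: four items, two attributes, higher values assumed better.
names = ["A", "B", "C", "D"]
rows = [[9.0, 2.0], [7.0, 8.0], [3.0, 5.0], [1.0, 1.0]]
weights = [0.7, 0.3]  # assumed to come from the user's interaction

ranked = rank_items(names, rows, weights)
classes = classify(ranked, 2)
print([name for name, _ in ranked])
print(classes)
```

Changing the weights changes the ranking and therefore the class boundaries, which is the feedback loop an interactive tool of this kind would expose to the user.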
Affiliation(s)
- Dongsheng Yang
- Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, P. R. China
- Shidong Yu
- Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, P. R. China
- Department of Electrical Engineering, Yingkou Institute of Technology, Yingkou 115014, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Ying Hao
- Department of Electrical Engineering, Yingkou Institute of Technology, Yingkou 115014, P. R. China
- Department of Information Science, Dalian Maritime University, Dalian 116026, P. R. China
18
Interactive clustering: a scoping review. Artif Intell Rev 2020. [DOI: 10.1007/s10462-020-09913-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
19
|
Feller DJ, Burgermaster M, Levine ME, Smaldone A, Davidson PG, Albers DJ, Mamykina L. A visual analytics approach for pattern-recognition in patient-generated data. J Am Med Inform Assoc 2019; 25:1366-1374. [PMID: 29905826 PMCID: PMC6188507 DOI: 10.1093/jamia/ocy054] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 10/20/2017] [Accepted: 04/18/2018] [Indexed: 11/30/2022] Open
Abstract
Objective: To develop and test a visual analytics tool to help clinicians identify systematic and clinically meaningful patterns in patient-generated data (PGD) while decreasing perceived information overload. Methods: Participatory design was used to develop Glucolyzer, an interactive tool featuring hierarchical clustering and a heatmap visualization to help registered dietitians (RDs) identify associative patterns between blood glucose levels and per-meal macronutrient composition for individuals with type 2 diabetes (T2DM). Ten RDs participated in a within-subjects experiment to compare Glucolyzer to a static logbook format. For each representation, participants had 25 minutes to examine 1 month of diabetes self-monitoring data captured by an individual with T2DM and identify clinically meaningful patterns. We compared the quality and accuracy of the observations generated using each representation. Results: Participants generated 50% more observations when using Glucolyzer (98) than when using the logbook format (64) without any loss in accuracy (69% accuracy vs 62%, respectively, p = .17). Participants identified more observations that included ingredients other than carbohydrates using Glucolyzer (36% vs 16%, p = .027). Fewer RDs reported feelings of information overload using Glucolyzer compared to the logbook format. Study participants displayed variable acceptance of hierarchical clustering. Conclusions: Visual analytics have the potential to mitigate provider concerns about the volume of self-monitoring data. Glucolyzer helped dietitians identify meaningful patterns in self-monitoring data without incurring perceived information overload. Future studies should assess whether similar tools can support clinicians in personalizing behavioral interventions that improve patient outcomes.
Affiliation(s)
- Daniel J Feller
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | | | - Matthew E Levine
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Arlene Smaldone
- Columbia University School of Nursing and College of Dental Medicine, Columbia University Medical Center, New York, NY, USA
| | | | - David J Albers
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Lena Mamykina
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| |
|
20
|
Wu H, Shi D, Chen N, Shi Y, Jin Z, Cao N. VisAct: a visualization design system based on semantic actions. J Vis (Tokyo) 2019. [DOI: 10.1007/s12650-019-00617-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/01/2022]
|
21
|
VINCENT: A visual analytics system for investigating the online vaccine debate. Online J Public Health Inform 2019; 11:e5. [PMID: 31632599 DOI: 10.5210/ojphi.v11i2.10114] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/24/2022] Open
Abstract
This paper reports and describes VINCENT, a visual analytics system that is designed to help public health stakeholders (i.e., users) make sense of data from websites involved in the online debate about vaccines. VINCENT allows users to explore visualizations of data from a group of 37 vaccine-focused websites. These websites differ in their position on vaccines, topics of focus about vaccines, geographic location, and sentiment towards the efficacy and morality of vaccines, specific and general ones. By integrating webometrics, natural language processing of website text, data visualization, and human-data interaction, VINCENT helps users explore complex data that would be difficult to understand, and, if at all possible, to analyze without the aid of computational tools. The objectives of this paper are to explore A) the feasibility of developing a visual analytics system that integrates webometrics, natural language processing of website text, data visualization, and human-data interaction in a seamless manner; B) how a visual analytics system can help with the investigation of the online vaccine debate; and C) what needs to be taken into consideration when developing such a system. This paper demonstrates that visual analytics systems can integrate different computational techniques; that such systems can help with the exploration of online public health debates that are distributed across a set of websites; and that care should go into the design of the different components of such systems.
|
22
|
Behrisch M, Streeb D, Stoffel F, Seebacher D, Matejek B, Weber SH, Mittelstadt S, Pfister H, Keim D. Commercial Visual Analytics Systems-Advances in the Big Data Analytics Field. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:3011-3031. [PMID: 30059307 DOI: 10.1109/tvcg.2018.2859973] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 06/08/2023]
Abstract
Five years after the first state-of-the-art report on Commercial Visual Analytics Systems we present a reevaluation of the Big Data Analytics field. We build on the success of the 2012 survey, which was influential even beyond the boundaries of the InfoVis and Visual Analytics (VA) community. While the field has matured significantly since the original survey, we find that innovation and research-driven development are increasingly sacrificed to satisfy a wide range of user groups. We evaluate new product versions on established evaluation criteria, such as available features, performance, and usability, to extend on and assure comparability with the previous survey. We also investigate previously unavailable products to paint a more complete picture of the commercial VA landscape. Furthermore, we introduce novel measures, like suitability for specific user groups and the ability to handle complex data types, and undertake a new case study to highlight innovative features. We explore the achievements in the commercial sector in addressing VA challenges and propose novel developments that should be on systems' roadmaps in the coming years.
|
23
|
Nonato LG, Aupetit M. Multidimensional Projection for Visual Analytics: Linking Techniques with Distortions, Tasks, and Layout Enrichment. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:2650-2673. [PMID: 29994258 DOI: 10.1109/tvcg.2018.2846735] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Indexed: 06/08/2023]
Abstract
Visual analysis of multidimensional data requires expressive and effective ways to reduce data dimensionality to encode them visually. Multidimensional projections (MDP) figure among the most important visualization techniques in this context, transforming multidimensional data into scatter plots whose visual patterns reflect some notion of similarity in the original data. However, MDP come with distortions that make these visual patterns not trustworthy, hindering users to infer actual data characteristics. Moreover, the patterns present in the scatter plots might not be enough to allow a clear understanding of multidimensional data, motivating the development of layout enrichment methodologies to operate together with MDP. This survey attempts to cover the main aspects of MDP as a visualization and visual analytic tool. It provides detailed analysis and taxonomies as to the organization of MDP techniques according to their main properties and traits, discussing the impact of such properties for visual perception and other human factors. The survey also approaches the different types of distortions that can result from MDP mappings and it overviews existing mechanisms to quantitatively evaluate such distortions. A qualitative analysis of the impact of distortions on the different analytic tasks performed by users when exploring multidimensional data through MDP is also presented. Guidelines for choosing the best MDP for an intended task are also provided as a result of this analysis. Finally, layout enrichment schemes to debunk MDP distortions and/or reveal relevant information not directly inferable from the scatter plot are reviewed and discussed in the light of new taxonomies. We conclude the survey providing future research axes to fill discovered gaps in this domain.
|
24
|
Stopar L, Skraba P, Grobelnik M, Mladenic D. StreamStory: Exploring Multivariate Time Series on Multiple Scales. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:1788-1802. [PMID: 29993637 DOI: 10.1109/tvcg.2018.2825424] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 06/08/2023]
Abstract
This paper presents an approach for the interactive visualization, exploration and interpretation of large multivariate time series. Interesting patterns in such datasets usually appear as periodic or recurrent behavior often caused by the interaction between variables. To identify such patterns, we summarize the data as conceptual states, modeling temporal dynamics as transitions between the states. This representation can visualize large datasets with potentially billions of examples. We extend the representation to multiple spatial granularities allowing the user to find patterns on multiple scales. The result is an interactive web-based tool called StreamStory. StreamStory couples the abstraction with several tools that map the abstractions back to domain-specific concepts using techniques from statistics and machine learning. It is aimed at users who are not experts in data analytics, minimizing the number of parameters to configure out-of-the-box. We use three real-world datasets to demonstrate how StreamStory can be used to perform three main visual analytics tasks: identify the main states of a complex system and map them back to data-specific concepts, find high-level and long-term periodic behavior and traverse the scales to identify which scales exhibit interesting phenomena. We find and interpret several known, as well as previously unknown patterns in these datasets.
|
25
|
Bernard J, Sessler D, Kohlhammer J, Ruddle RA. Using Dashboard Networks to Visualize Multiple Patient Histories: A Design Study on Post-Operative Prostate Cancer. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:1615-1628. [PMID: 29994364 DOI: 10.1109/tvcg.2018.2803829] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/08/2023]
Abstract
In this design study, we present a visualization technique that segments patients' histories instead of treating them as raw event sequences, aggregates the segments using criteria such as the whole history or treatment combinations, and then visualizes the aggregated segments as static dashboards that are arranged in a dashboard network to show longitudinal changes. The static dashboards were developed in nine iterations, to show 15 important attributes from the patients' histories. The final design was evaluated with five non-experts, five visualization experts and four medical experts, who successfully used it to gain an overview of a 2,000 patient dataset, and to make observations about longitudinal changes and differences between two cohorts. The research represents a step-change in the detail of large-scale data that may be successfully visualized using dashboards, and provides guidance about how the approach may be generalized.
|
26
|
Wu Y, Chen Z, Sun G, Xie X, Cao N, Liu S, Cui W. StreamExplorer: A Multi-Stage System for Visually Exploring Events in Social Streams. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:2758-2772. [PMID: 29053452 DOI: 10.1109/tvcg.2017.2764459] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 06/07/2023]
Abstract
Analyzing social streams is important for many applications, such as crisis management. However, the considerable diversity, increasing volume, and high dynamics of social streams of large events continue to be significant challenges that must be overcome to ensure effective exploration. We propose a novel framework by which to handle complex social streams on a budget PC. This framework features two components: 1) an online method to detect important time periods (i.e., subevents), and 2) a tailored GPU-assisted Self-Organizing Map (SOM) method, which clusters the tweets of subevents stably and efficiently. Based on the framework, we present StreamExplorer to facilitate the visual analysis, tracking, and comparison of a social stream at three levels. At a macroscopic level, StreamExplorer uses a new glyph-based timeline visualization, which presents a quick multi-faceted overview of the ebb and flow of a social stream. At a mesoscopic level, a map visualization is employed to visually summarize the social stream from either a topical or geographical aspect. At a microscopic level, users can employ interactive lenses to visually examine and explore the social stream from different perspectives. Two case studies and a task-based evaluation are used to demonstrate the effectiveness and usefulness of StreamExplorer.
Affiliation(s)
- Yingcai Wu
- Computer Science, Zhejiang University, 12377 Hangzhou, Zhejiang China 310058
| | - Zhutian Chen
- Department of Computer Science and Engineering, Hong Kong University of Science and Technology, 58207 Kowloon, Hong Kong Hong Kong
| | - Guodao Sun
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, Zhejiang China 310023
| | - Xiao Xie
- State Key Lab of CAD&CG, Zhejiang University, 12377 Hangzhou, Zhejiang China
| | - Nan Cao
- College of Design and Innovation, Tongji University, 12476 Shanghai, Shanghai China
| | - Shixia Liu
- School of Software, Tsinghua University, Beijing, Beijing China
| | - Weiwei Cui
- Internet Graphics, Microsoft Research Asia, Beijing, Beijing China
| |
|
27
|
Cavallo M, Demiralp C. Clustrophile 2: Guided Visual Clustering Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:267-276. [PMID: 30130194 DOI: 10.1109/tvcg.2018.2864477] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
Data clustering is a common unsupervised learning method frequently used in exploratory data analysis. However, identifying relevant structures in unlabeled, high-dimensional data is nontrivial, requiring iterative experimentation with clustering parameters as well as data features and instances. The number of possible clusterings for a typical dataset is vast, and navigating in this vast space is also challenging. The absence of ground-truth labels makes it impossible to define an optimal solution, thus requiring user judgment to establish what can be considered a satisfiable clustering result. Data scientists need adequate interactive tools to effectively explore and navigate the large clustering space so as to improve the effectiveness of exploratory clustering analysis. We introduce Clustrophile 2, a new interactive tool for guided clustering analysis. Clustrophile 2 guides users in clustering-based exploratory analysis, adapts user feedback to improve user guidance, facilitates the interpretation of clusters, and helps quickly reason about differences between clusterings. To this end, Clustrophile 2 contributes a novel feature, the Clustering Tour, to help users choose clustering parameters and assess the quality of different clustering results in relation to current analysis goals and user expectations. We evaluate Clustrophile 2 through a user study with 12 data scientists, who used our tool to explore and interpret sub-cohorts in a dataset of Parkinson's disease patients. Results suggest that Clustrophile 2 improves the speed and effectiveness of exploratory clustering analysis for both experts and non-experts.
|
28
|
Law PM, Basole RC, Wu Y. Duet: Helping Data Analysis Novices Conduct Pairwise Comparisons by Minimal Specification. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:427-437. [PMID: 30130204 DOI: 10.1109/tvcg.2018.2864526] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/08/2023]
Abstract
Data analysis novices often encounter barriers in executing low-level operations for pairwise comparisons. They may also run into barriers in interpreting the artifacts (e.g., visualizations) created as a result of the operations. We developed Duet, a visual analysis system designed to help data analysis novices conduct pairwise comparisons by addressing execution and interpretation barriers. To reduce the barriers in executing low-level operations during pairwise comparison, Duet employs minimal specification: when one object group (i.e. a group of records in a data table) is specified, Duet recommends object groups that are similar to or different from the specified one; when two object groups are specified, Duet recommends similar and different attributes between them. To lower the barriers in interpreting its recommendations, Duet explains the recommended groups and attributes using both visualizations and textual descriptions. We conducted a qualitative evaluation with eight participants to understand the effectiveness of Duet. The results suggest that minimal specification is easy to use and Duet's explanations are helpful for interpreting the recommendations despite some usability issues.
|
29
|
Sacha D, Kraus M, Bernard J, Behrisch M, Schreck T, Asano Y, Keim DA. SOMFlow: Guided Exploratory Cluster Analysis with Self-Organizing Maps and Analytic Provenance. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:120-130. [PMID: 28866559 DOI: 10.1109/tvcg.2017.2744805] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 06/07/2023]
Abstract
Clustering is a core building block for data analysis, aiming to extract otherwise hidden structures and relations from raw datasets, such as particular groups that can be effectively related, compared, and interpreted. A plethora of visual-interactive cluster analysis techniques has been proposed to date, however, arriving at useful clusterings often requires several rounds of user interactions to fine-tune the data preprocessing and algorithms. We present a multi-stage Visual Analytics (VA) approach for iterative cluster refinement together with an implementation (SOMFlow) that uses Self-Organizing Maps (SOM) to analyze time series data. It supports exploration by offering the analyst a visual platform to analyze intermediate results, adapt the underlying computations, iteratively partition the data, and to reflect previous analytical activities. The history of previous decisions is explicitly visualized within a flow graph, allowing to compare earlier cluster refinements and to explore relations. We further leverage quality and interestingness measures to guide the analyst in the discovery of useful patterns, relations, and data partitions. We conducted two pair analytics experiments together with a subject matter expert in speech intonation research to demonstrate that the approach is effective for interactive data analysis, supporting enhanced understanding of clustering results as well as the interactive process itself.
|
30
|
Kwon BC, Eysenbach B, Verma J, Ng K, De Filippi C, Stewart WF, Perer A. Clustervision: Visual Supervision of Unsupervised Clustering. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:142-151. [PMID: 28866567 DOI: 10.1109/tvcg.2017.2745085] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 06/07/2023]
Abstract
Clustering, the process of grouping together similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exist a large body of algorithms designed to reveal different patterns. While having access to a wide variety of algorithms is helpful, in practice, it is quite difficult for data scientists to choose and parameterize algorithms to get the clustering results relevant for their dataset and analytical tasks. To alleviate this problem, we built Clustervision, a visual analytics tool that helps ensure data scientists find the right clustering among the large amount of techniques and parameters available. Our system clusters data using a variety of clustering techniques and parameters and then ranks clustering results utilizing five quality metrics. In addition, users can guide the system to produce more relevant results by providing task-relevant constraints on the data. Our visual user interface allows users to find high quality clustering results, explore the clusters using several coordinated visualization techniques, and select the cluster result that best suits their task. We demonstrate this novel approach using a case study with a team of researchers in the medical domain and showcase that our system empowers users to choose an effective representation of their complex data.
|
31
|
Guo S, Xu K, Zhao R, Gotz D, Zha H, Cao N. EventThread: Visual Summarization and Stage Analysis of Event Sequence Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:56-65. [PMID: 28866586 DOI: 10.1109/tvcg.2017.2745320] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Indexed: 06/07/2023]
Abstract
Event sequence data such as electronic health records, a person's academic records, or car service records, are ordered series of events which have occurred over a period of time. Analyzing collections of event sequences can reveal common or semantically important sequential patterns. For example, event sequence analysis might reveal frequently used care plans for treating a disease, typical publishing patterns of professors, and the patterns of service that result in a well-maintained car. It is challenging, however, to visually explore large numbers of event sequences, or sequences with large numbers of event types. Existing methods focus on extracting explicitly matching patterns of events using statistical analysis to create stages of event progression over time. However, these methods fail to capture latent clusters of similar but not identical evolutions of event sequences. In this paper, we introduce a novel visualization system named EventThread which clusters event sequences into threads based on tensor analysis and visualizes the latent stage categories and evolution patterns by interactively grouping the threads by similarity into time-specific clusters. We demonstrate the effectiveness of EventThread through usage scenarios in three different application domains and via interviews with an expert user.
|
32
|
Gotz D, Sun S, Cao N, Kundu R, Meyer AM. Adaptive Contextualization Methods for Combating Selection Bias during High-Dimensional Visualization. ACM T INTERACT INTEL 2017. [DOI: 10.1145/3009973] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/18/2022]
Abstract
Large and high-dimensional real-world datasets are being gathered across a wide range of application disciplines to enable data-driven decision making. Interactive data visualization can play a critical role in allowing domain experts to select and analyze data from these large collections. However, there is a critical mismatch between the very large number of dimensions in complex real-world datasets and the much smaller number of dimensions that can be concurrently visualized using modern techniques. This gap in dimensionality can result in high levels of selection bias that go unnoticed by users. The bias can in turn threaten the very validity of any subsequent insights. This article describes Adaptive Contextualization (AC), a novel approach to interactive visual data selection that is specifically designed to combat the invisible introduction of selection bias. The AC approach (1) monitors and models a user’s visual data selection activity, (2) computes metrics over that model to quantify the amount of selection bias after each step, (3) visualizes the metric results, and (4) provides interactive tools that help users assess and avoid bias-related problems. This article expands on an earlier article presented at ACM IUI 2016 [16] by providing a more detailed review of the AC methodology and additional evaluation results.
Affiliation(s)
- David Gotz
- University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Shun Sun
- University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Nan Cao
- Tongji University, Shanghai, P.R. China
| | - Rita Kundu
- University of North Carolina at Chapel Hill, Chapel Hill, NC
| | | |
|
33
|
|
34
|
Liu S, Maljovec D, Wang B, Bremer PT, Pascucci V. Visualizing High-Dimensional Data: Advances in the Past Decade. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:1249-1268. [PMID: 28113321 DOI: 10.1109/tvcg.2016.2640960] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Indexed: 05/20/2023]
Abstract
Massive simulations and arrays of sensing devices, in combination with increasing computing resources, have generated large, complex, high-dimensional datasets used to study phenomena across numerous fields of study. Visualization plays an important role in exploring such datasets. We provide a comprehensive survey of advances in high-dimensional data visualization that focuses on the past decade. We aim at providing guidance for data practitioners to navigate through a modular view of the recent advances, inspiring the creation of new visualizations along the enriched visualization pipeline, and identifying future opportunities for visualization research.
|
35
|
Shen Q, Wu T, Yang H, Wu Y, Qu H, Cui W. NameClarifier: A Visual Analytics System for Author Name Disambiguation. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:141-150. [PMID: 27514051 DOI: 10.1109/tvcg.2016.2598465] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 06/06/2023]
Abstract
In this paper, we present a novel visual analytics system called NameClarifier to interactively disambiguate author names in publications by keeping humans in the loop. Specifically, NameClarifier quantifies and visualizes the similarities between ambiguous names and those that have been confirmed in digital libraries. The similarities are calculated using three key factors, namely, co-authorships, publication venues, and temporal information. Our system estimates all possible allocations, and then provides visual cues to users to help them validate every ambiguous case. By looping users in the disambiguation process, our system can achieve more reliable results than general data mining models for highly ambiguous cases. In addition, once an ambiguous case is resolved, the result is instantly added back to our system and serves as additional cues for all the remaining unidentified names. In this way, we open up the black box in traditional disambiguation processes, and help intuitively and comprehensively explain why the corresponding classifications should hold. We conducted two use cases and an expert review to demonstrate the effectiveness of NameClarifier.
|
36
|
Pu J, Teng Z, Gong R, Wen C, Xu Y. Sci-Fin: Visual Mining Spatial and Temporal Behavior Features from Social Media. SENSORS 2016; 16:s16122194. [PMID: 27999398 PMCID: PMC5191173 DOI: 10.3390/s16122194] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/16/2016] [Revised: 12/05/2016] [Accepted: 12/12/2016] [Indexed: 11/19/2022]
Abstract
Check-in records are usually available in social services, which offer us the opportunity to capture and analyze users’ spatial and temporal behaviors. Mining such behavior features is essential to social analysis and business intelligence. However, the complexity and incompleteness of check-in records make this task challenging. Unlike previous work on social behavior analysis, in this paper we present a visual analytics system, Social Check-in Fingerprinting (Sci-Fin), to facilitate the analysis and visualization of social check-in data. We focus on three major components of user check-in data: location, activity, and profile. Visual fingerprints for location, activity, and profile are designed to intuitively represent the high-dimensional attributes. To visually mine and demonstrate the behavior features, we integrate WorldMapper and Voronoi Treemap into our glyph-like designs. Such visual fingerprint designs offer us the opportunity to summarize the interesting features and patterns from different check-in locations, activities and users (groups). We demonstrate the effectiveness and usability of our system by conducting extensive case studies on real check-in data collected from a popular microblogging service. Finally, interesting findings are reported and discussed.
Affiliation(s)
- Jiansu Pu
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.
- Zhiyao Teng
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.
- Rui Gong
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.
- Changjiang Wen
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.
- Yang Xu
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.
|
37
|
Ioannidis D, Tropios P, Krinidis S, Stavropoulos G, Tzovaras D, Likothanasis S. Occupancy driven building performance assessment. ACTA ACUST UNITED AC 2016. [DOI: 10.1016/j.jides.2016.10.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 12/01/2022]
|
38
|
Chen Q, Chen Y, Liu D, Shi C, Wu Y, Qu H. PeakVizor: Visual Analytics of Peaks in Video Clickstreams from Massive Open Online Courses. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2016; 22:2315-2330. [PMID: 26661473 DOI: 10.1109/tvcg.2015.2505305] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 06/05/2023]
Abstract
Massive open online courses (MOOCs) aim to facilitate open-access and massive-participation education. These courses have attracted millions of learners recently. At present, most MOOC platforms record the web log data of learner interactions with course videos. Such large amounts of multivariate data pose a new challenge in terms of analyzing online learning behaviors. Previous studies have mainly focused on the aggregate behaviors of learners from a summative view; however, few attempts have been made to conduct a detailed analysis of such behaviors. To determine complex learning patterns in MOOC video interactions, this paper introduces a comprehensive visualization system called PeakVizor. This system enables course instructors and education experts to analyze the "peaks" or the video segments that generate numerous clickstreams. The system features three views at different levels: the overview with glyphs to display valuable statistics regarding the peaks detected; the flow view to present spatio-temporal information regarding the peaks; and the correlation view to show the correlation between different learner groups and the peaks. Case studies and interviews conducted with domain experts have demonstrated the usefulness and effectiveness of PeakVizor, and new findings about learning behaviors in MOOC platforms have been reported.
|
39
|
Cao N, Lin YR, Gotz D. UnTangle Map: Visual Analysis of Probabilistic Multi-Label Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2016; 22:1149-1163. [PMID: 26731458 DOI: 10.1109/tvcg.2015.2424878] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 06/05/2023]
Abstract
Data with multiple probabilistic labels are common in many situations. For example, a movie may be associated with multiple genres with different levels of confidence. Despite their ubiquity, the problem of visualizing probabilistic labels has not been adequately addressed. Existing approaches often either discard the probabilistic information, or map the data to a low-dimensional subspace where their associations with original labels are obscured. In this paper, we propose a novel visual technique, UnTangle Map, for visualizing probabilistic multi-labels. In our proposed visualization, data items are placed inside a web of connected triangles, with labels assigned to the triangle vertices such that nearby labels are more relevant to each other. The positions of the data items are determined based on the probabilistic associations between items and labels. UnTangle Map provides both (a) an automatic label placement algorithm, and (b) adaptive interactions that allow users to control the label positioning for different information needs. Our work makes a unique contribution by providing an effective way to investigate the relationship between data items and their probabilistic labels, as well as the relationships among labels. Our user study suggests that the visualization effectively helps users discover emergent patterns and compare the nuances of probabilistic information in the data labels.
|