1. Bernard J, Barth CM, Cuba E, Meier A, Peiris Y, Shneiderman B. IVESA - Visual Analysis of Time-Stamped Event Sequences. IEEE Transactions on Visualization and Computer Graphics 2025; 31:2235-2256. [PMID: 38587948] [DOI: 10.1109/tvcg.2024.3382760]
Abstract
Time-stamped event sequences (TSEQs) are time-oriented data without value information, shifting the focus of users to the exploration of temporal event occurrences. TSEQs exist in application domains such as sleeping behavior, earthquake aftershocks, and stock market crashes. Domain experts face four challenges, for which they could use interactive and visual data analysis methods. First, TSEQs can be large with respect to both the number of sequences and events, often leading to millions of events. Second, domain experts need validated metrics and features to identify interesting patterns. Third, after identifying interesting patterns, domain experts contextualize the patterns to foster sensemaking. Finally, domain experts seek to reduce data complexity by data simplification and machine learning support. We present IVESA, a visual analytics approach for TSEQs. It supports the analysis of TSEQs at the granularities of sequences and events, complemented by metrics and feature analysis tools. IVESA has multiple linked views that support overview, sort+filter, comparison, details-on-demand, and metadata relation-seeking tasks, as well as data simplification through feature analysis, interactive clustering, filtering, and motif detection and simplification. We evaluated IVESA with three case studies and a user study with six domain experts working with six different datasets and applications. Results demonstrate the usability and generalizability of IVESA across applications and cases with up to 1,000,000 events.

2. Braun D, Chang R, Gleicher M, von Landesberger T. Beware of Validation by Eye: Visual Validation of Linear Trends in Scatterplots. IEEE Transactions on Visualization and Computer Graphics 2025; 31:787-797. [PMID: 39255144] [DOI: 10.1109/tvcg.2024.3456305]
Abstract
Visual validation of regression models in scatterplots is a common practice for assessing model quality, yet its efficacy remains unquantified. We conducted two empirical experiments to investigate individuals' ability to visually validate linear regression models (linear trends) and to examine the impact of common visualization designs on validation quality. The first experiment showed that the level of accuracy for visual estimation of slope (i.e., fitting a line to data) is higher than for visual validation of slope (i.e., accepting a shown line). Notably, we found bias toward slopes that are "too steep" in both cases. This led to the novel insight that participants naturally assessed regression with orthogonal distances between the points and the line (i.e., ODR regression) rather than the common vertical distances (OLS regression). In the second experiment, we investigated whether incorporating common designs for regression visualization (error lines, bounding boxes, and confidence intervals) would improve visual validation. Even though error lines reduced validation bias, the results failed to show the desired improvements in accuracy for any design. Overall, our findings suggest caution in using visual model validation for linear trends in scatterplots.
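The OLS/ODR distinction behind this finding is easy to reproduce. Below is a minimal sketch on synthetic data (not the study's stimuli): the orthogonal-distance slope is taken from the first principal component of the centered points, and with noise in y it comes out steeper than the OLS slope.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.6, size=200)   # true slope 0.5, noisy y

# OLS slope: minimizes vertical distances from points to the line.
b_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# ODR/TLS slope: minimizes orthogonal distances; it is the direction of
# the first principal component of the centered point cloud.
pts = np.column_stack([x - x.mean(), y - y.mean()])
_, _, vt = np.linalg.svd(pts, full_matrices=False)
b_odr = vt[0, 1] / vt[0, 0]

# With noise in y, ODR yields the steeper line, matching the reported
# "too steep" bias if viewers judge fit by orthogonal distances.
print(f"OLS slope: {b_ols:.3f}, ODR slope: {b_odr:.3f}")
```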

3. Wyss A, Morgenshtern G, Hirsch-Husler A, Bernard J. DaedalusData: Exploration, Knowledge Externalization and Labeling of Particles in Medical Manufacturing - A Design Study. IEEE Transactions on Visualization and Computer Graphics 2025; 31:54-64. [PMID: 39312428] [DOI: 10.1109/tvcg.2024.3456329]
Abstract
In medical diagnostics, for both early disease detection and routine patient care, particle-based contamination of in-vitro diagnostics consumables poses a significant threat to patients. Objective, data-driven decision-making on the severity of contamination is key for reducing patient risk while saving time and cost in quality assessment. Our collaborators introduced us to their quality control process, including particle data acquisition through image recognition, feature extraction, and attributes reflecting the production context of particles. Shortcomings in the current process are limitations in exploring thousands of images, data-driven decision making, and ineffective knowledge externalization. Following the design study methodology, our contributions are a characterization of the problem space and requirements, the development and validation of DaedalusData, a comprehensive discussion of our study's learnings, and a generalizable framework for knowledge externalization. DaedalusData is a visual analytics system that enables domain experts to explore particle contamination patterns, label particles in label alphabets, and externalize knowledge through semi-supervised label-informed data projections. The results of our case study and user study show the high usability of DaedalusData and its efficient support of experts in generating comprehensive overviews of thousands of particles, labeling large quantities of particles, and externalizing knowledge to further augment the dataset. Reflecting on our approach, we discuss insights on dataset augmentation via human knowledge externalization and on the scalability and trade-offs that come with the adoption of this approach in practice.

4. Lan J, Zhou Z, Xie X, Wu Y, Zhang H, Wu Y. MediVizor: Visual Mediation Analysis of Nominal Variables. IEEE Transactions on Visualization and Computer Graphics 2024; 30:4853-4866. [PMID: 37276102] [DOI: 10.1109/tvcg.2023.3282801]
Abstract
Mediation analysis is crucial for diagnosing indirect causal relations in many scientific fields. However, mediation analysis of nominal variables requires examining and comparing multiple total effects and their corresponding direct/indirect causal effects derived from mediation models. This process is tedious and challenging to achieve with classical analysis tools such as Excel tables. In this study, we worked closely with experts from two scientific domains to design MediVizor, a visualization system that enables experts to conduct visual mediation analysis of nominal variables. The visualization design allows users to browse and compare multiple total effects together with the direct/indirect effects that compose them. The design also allows users to examine to what extent the positive and negative direct/indirect effects contribute to and reduce the total effects, respectively. We conducted two case studies separately with the experts from the two domains, sports and communication science, and a user study with common users to evaluate the system and design. The positive feedback from experts and common users demonstrates the effectiveness and generalizability of the system.
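For readers unfamiliar with the decomposition being visualized, here is a minimal sketch of the product-of-coefficients mediation arithmetic on synthetic data with a single continuous mediator. MediVizor targets nominal variables, which would be dummy-coded into multiple such effects; this sketch only shows how total, direct, and indirect effects relate.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)                        # exposure
m = 0.6 * x + rng.normal(size=n)              # mediator
y = 0.4 * m + 0.3 * x + rng.normal(size=n)    # outcome

def ols(columns, target):
    """Least-squares fit with an intercept; returns coefficients."""
    A = np.column_stack([np.ones(len(target))] + columns)
    return np.linalg.lstsq(A, target, rcond=None)[0]

a = ols([x], m)[1]                # path x -> m
_, b, direct = ols([m, x], y)     # path m -> y, and direct x -> y
total = ols([x], y)[1]            # total effect of x on y

# For linear models, total == direct + indirect (up to noise).
print(f"total={total:.3f}  direct={direct:.3f}  indirect={a * b:.3f}")
```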

5. Wentzel A, Floricel C, Canahuate G, Naser MA, Mohamed AS, Fuller CD, van Dijk L, Marai GE. DASS Good: Explainable Data Mining of Spatial Cohort Data. Computer Graphics Forum 2023; 42:283-295. [PMID: 37854026] [PMCID: PMC10583718] [DOI: 10.1111/cgf.14830]
Abstract
Developing applicable clinical machine learning models is a difficult task when the data includes spatial information, for example, radiation dose distributions across adjacent organs at risk. We describe the co-design of a modeling system, DASS, to support the hybrid human-machine development and validation of predictive models for estimating long-term toxicities related to radiotherapy doses in head and neck cancer patients. Developed in collaboration with domain experts in oncology and data mining, DASS incorporates human-in-the-loop visual steering, spatial data, and explainable AI to augment domain knowledge with automatic data mining. We demonstrate DASS with the development of two practical clinical stratification models and report feedback from domain experts. Finally, we describe the design lessons learned from this collaborative experience.
Affiliation(s)
- A Wentzel: University of Illinois Chicago, Electronic Visualization Lab
- C Floricel: University of Illinois Chicago, Electronic Visualization Lab
- M A Naser: University of Texas MD Anderson Cancer Center
- A S Mohamed: University of Texas MD Anderson Cancer Center
- C D Fuller: University of Texas MD Anderson Cancer Center
- L van Dijk: University of Texas MD Anderson Cancer Center
- G E Marai: University of Illinois Chicago, Electronic Visualization Lab

6. Tyagi A, Xie C, Mueller K. NAS-Navigator: Visual Steering for Explainable One-Shot Deep Neural Network Synthesis. IEEE Transactions on Visualization and Computer Graphics 2023; 29:299-309. [PMID: 36166525] [DOI: 10.1109/tvcg.2022.3209361]
Abstract
The success of deep learning (DL) can be attributed to hours of parameter and architecture tuning by human experts. Neural Architecture Search (NAS) techniques aim to solve this problem by automating the search procedure for DNN architectures, making it possible for non-experts to work with DNNs. Specifically, one-shot NAS techniques have recently gained popularity as they are known to reduce the search time of NAS. One-shot NAS works by training a large template network, which includes all the candidate NNs, through parameter sharing. This is followed by a procedure that ranks its components by evaluating randomly chosen candidate architectures. However, as these search models become increasingly powerful and diverse, they become harder to understand. Consequently, even when the search results work well, it is hard to identify search biases and control the search progression; hence the need for explainability and human-in-the-loop (HIL) one-shot NAS. To alleviate these problems, we present NAS-Navigator, a visual analytics (VA) system aiming to solve three problems with one-shot NAS: explainability, HIL design, and performance improvements compared to existing state-of-the-art (SOTA) techniques. NAS-Navigator puts full control of NAS back in the hands of users while keeping the perks of automated search, thus assisting non-expert users. Analysts can use their domain knowledge, aided by cues from the interface, to guide the search. Evaluation results confirm that the performance of our improved one-shot NAS algorithm is comparable to other SOTA techniques, and adding visual analytics through NAS-Navigator yields further improvements in search time and performance. We designed our interface in collaboration with several deep learning researchers and evaluated NAS-Navigator through a control experiment and expert interviews.

7. Dunne M, Mohammadi H, Challenor P, Borgo R, Porphyre T, Vernon I, Firat EE, Turkay C, Torsney-Weir T, Goldstein M, Reeve R, Fang H, Swallow B. Complex model calibration through emulation, a worked example for a stochastic epidemic model. Epidemics 2022; 39:100574. [PMID: 35617882] [PMCID: PMC9109972] [DOI: 10.1016/j.epidem.2022.100574]
Abstract
Uncertainty quantification is a formal paradigm of statistical estimation that aims to account for all uncertainties inherent in the modelling process of real-world complex systems. The methods are directly applicable to stochastic models in epidemiology; however, they have thus far not been widely used in this context. In this paper, we provide a tutorial on uncertainty quantification of stochastic epidemic models, aiming to facilitate the use of the uncertainty quantification paradigm for practitioners with other complex stochastic simulators of applied systems. We provide a formal workflow including the important decisions and considerations that need to be taken, and illustrate the methods over a simple stochastic epidemic model of UK SARS-CoV-2 transmission and patient outcome. We also present new approaches to visualisation of outputs from sensitivity analyses and uncertainty quantification more generally in high input and/or output dimensions.
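A minimal sketch of the core emulation idea follows: run a toy stochastic SIR simulator at a few design points, then fit a Gaussian process surrogate whose white-noise kernel absorbs the run-to-run stochasticity. The simulator and kernel choice are illustrative assumptions; the paper's workflow (calibration, sensitivity analysis) goes well beyond this.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)

def sir_attack_rate(beta, gamma=0.1, n=1000, i0=5, days=150):
    """Toy stochastic SIR simulator: fraction ever infected."""
    s, i, r = n - i0, i0, 0
    for _ in range(days):
        new_inf = rng.binomial(s, 1 - np.exp(-beta * i / n))
        new_rec = rng.binomial(i, gamma)
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
    return (n - s) / n

# Design: a handful of expensive simulator runs over the input space.
betas = np.linspace(0.05, 0.6, 15)
y = np.array([sir_attack_rate(b) for b in betas])

# Emulator: GP regression; WhiteKernel models the simulator's stochasticity.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(betas.reshape(-1, 1), y)

# Cheap surrogate prediction with uncertainty at an untried input.
m, s = gp.predict(np.array([[0.30]]), return_std=True)
print(f"emulated attack rate at beta=0.30: {m[0]:.2f} +/- {s[0]:.2f}")
```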
Affiliation(s)
- Michael Dunne: College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter, UK
- Hossein Mohammadi: College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter, UK
- Peter Challenor: College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter, UK
- Rita Borgo: Department of Informatics, King's College London, London, UK
- Thibaud Porphyre: Laboratoire de Biométrie et Biologie Evolutive, VetAgro Sup, Marcy l'Etoile, France
- Ian Vernon: Department of Mathematical Sciences, Durham University, Durham, UK
- Elif E Firat: Department of Computer Science, University of Nottingham, Nottingham, UK
- Cagatay Turkay: Centre for Interdisciplinary Methodologies, University of Warwick, Coventry, UK
- Thomas Torsney-Weir: VRVis Zentrum für Virtual Reality und Visualisierung Forschungs-GmbH, Vienna, Austria
- Richard Reeve: Boyd Orr Centre for Population and Ecosystem Health, Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- Hui Fang: Department of Computer Science, Loughborough University, Loughborough, UK
- Ben Swallow: School of Mathematics and Statistics, University of Glasgow, Glasgow, UK

8. Chatzimparmpas A, Martins RM, Kucher K, Kerren A. FeatureEnVi: Visual Analytics for Feature Engineering Using Stepwise Selection and Semi-Automatic Extraction Approaches. IEEE Transactions on Visualization and Computer Graphics 2022; 28:1773-1791. [PMID: 34990365] [DOI: 10.1109/tvcg.2022.3141040]
Abstract
The machine learning (ML) life cycle involves a series of iterative steps, from the effective gathering and preparation of the data, including complex feature engineering processes, to the presentation and improvement of results, with various algorithms to choose from in every step. Feature engineering in particular can be very beneficial for ML, leading to numerous improvements such as boosting the predictive results, decreasing computational times, reducing excessive noise, and increasing the transparency behind the decisions taken during the training. While several visual analytics tools exist to monitor and control the different stages of the ML life cycle (especially those related to data and algorithms), feature engineering support remains inadequate. In this paper, we present FeatureEnVi, a visual analytics system specifically designed to assist with the feature engineering process. Our proposed system helps users to choose the most important feature, to transform the original features into powerful alternatives, and to experiment with different feature generation combinations. Additionally, data space slicing allows users to explore the impact of features on both local and global scales. FeatureEnVi utilizes multiple automatic feature selection techniques; furthermore, it visually guides users with statistical evidence about the influence of each feature (or subsets of features). The final outcome is the extraction of heavily engineered features, evaluated by multiple validation metrics. The usefulness and applicability of FeatureEnVi are demonstrated with two use cases and a case study. We also report feedback from interviews with two ML experts and a visualization researcher who assessed the effectiveness of our system.
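The stepwise-selection ingredient has a direct library analogue. Below is a sketch using scikit-learn's SequentialFeatureSelector on a public dataset; this is a generic stand-in, not FeatureEnVi's implementation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Forward stepwise selection: greedily add the feature that most improves
# cross-validated performance until five features are chosen.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
sfs = SequentialFeatureSelector(model, n_features_to_select=5,
                                direction="forward", cv=5)
sfs.fit(X, y)
print(list(X.columns[sfs.get_support()]))
```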

10. Palenik J, Spengler T, Hauser H. IsoTrotter: Visually Guided Empirical Modelling of Atmospheric Convection. IEEE Transactions on Visualization and Computer Graphics 2021; 27:775-784. [PMID: 33079665] [DOI: 10.1109/tvcg.2020.3030389]
Abstract
Empirical models, fitted to data from observations, are often used in natural sciences to describe physical behaviour and support discoveries. However, with more complex models, the regression of parameters quickly becomes insufficient, requiring a visual parameter space analysis to understand and optimize the models. In this work, we present a design study for building a model describing atmospheric convection. We present a mixed-initiative approach to visually guided modelling, integrating an interactive visual parameter space analysis with partial automatic parameter optimization. Our approach includes a new, semi-automatic technique called IsoTrotting, where we optimize the procedure by navigating along isocontours of the model. We evaluate the model with unique observational data of atmospheric convection based on flight trajectories of paragliders.

11. Chatzimparmpas A, Martins RM, Kucher K, Kerren A. StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics. IEEE Transactions on Visualization and Computer Graphics 2021; 27:1547-1557. [PMID: 33048687] [DOI: 10.1109/tvcg.2020.3030352]
Abstract
In machine learning (ML), ensemble methods such as bagging, boosting, and stacking are widely established approaches that regularly achieve top-notch predictive performance. Stacking (also called "stacked generalization") is an ensemble method that combines heterogeneous base models, arranged in at least one layer, and then employs another metamodel to summarize the predictions of those models. Although it may be a highly effective approach for increasing the predictive performance of ML, generating a stack of models from scratch can be a cumbersome trial-and-error process. This challenge stems from the enormous space of available solutions, with different sets of data instances and features that could be used for training, several algorithms to choose from, and instantiations of these algorithms using diverse parameters (i.e., models) that perform differently according to various metrics. In this work, we present a knowledge generation model, which supports ensemble learning with the use of visualization, and a visual analytics system for stacked generalization. Our system, StackGenVis, assists users in dynamically adapting performance metrics, managing data instances, selecting the most important features for a given data set, choosing a set of top-performant and diverse algorithms, and measuring the predictive performance. Consequently, our proposed tool helps users to decide between distinct models and to reduce the complexity of the resulting stack by removing overpromising and underperforming models. The applicability and effectiveness of StackGenVis are demonstrated with two use cases: a real-world healthcare data set and a collection of data related to sentiment/stance detection in texts. Finally, the tool has been evaluated through interviews with three ML experts.
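Stacked generalization itself is available off the shelf; the sketch below builds a minimal two-model stack with a logistic-regression metamodel, standing in for the stacks StackGenVis helps users compose and prune.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Heterogeneous base layer; a metamodel summarizes their predictions.
base = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
]
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(max_iter=1000))
print(cross_val_score(stack, X, y, cv=5, scoring="f1").mean())
```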

12. Knittel J, Lalama A, Koch S, Ertl T. Visual Neural Decomposition to Explain Multivariate Data Sets. IEEE Transactions on Visualization and Computer Graphics 2021; 27:1374-1384. [PMID: 33048724] [DOI: 10.1109/tvcg.2020.3030420]
Abstract
Investigating relationships between variables in multi-dimensional data sets is a common task for data analysts and engineers. More specifically, it is often valuable to understand which ranges of which input variables lead to particular values of a given target variable. Unfortunately, with an increasing number of independent variables, this process may become cumbersome and time-consuming due to the many possible combinations that have to be explored. In this paper, we propose a novel approach to visualize correlations between input variables and a target output variable that scales to hundreds of variables. We developed a visual model based on neural networks that can be explored in a guided way to help analysts find and understand such correlations. First, we train a neural network to predict the target from the input variables. Then, we visualize the inner workings of the resulting model to help understand relations within the data set. We further introduce a new regularization term for the backpropagation algorithm that encourages the neural network to learn representations that are easier to interpret visually. We apply our method to artificial and real-world data sets to show its utility.
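The idea of steering training toward visually interpretable weights can be sketched with a sparsity penalty on the first layer, so each hidden node attends to few input variables. Note that the L1 term below is an assumed stand-in, not the paper's actual regularizer. A compact PyTorch sketch on synthetic data:

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(1024, 50)                         # 50 input variables
y = (X[:, 3] * X[:, 17]).tanh() + 0.1 * X[:, 8]   # target uses few inputs

net = nn.Sequential(nn.Linear(50, 16), nn.Tanh(), nn.Linear(16, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for step in range(500):
    opt.zero_grad()
    pred = net(X).squeeze(1)
    mse = nn.functional.mse_loss(pred, y)
    # Stand-in interpretability term: sparse first-layer weights make the
    # node-to-variable attribution view easier to read.
    sparsity = net[0].weight.abs().mean()
    (mse + 0.05 * sparsity).backward()
    opt.step()

# Which input does each hidden node weight most? With the penalty, the
# strongest connections concentrate on the informative variables.
print(net[0].weight.detach().abs().argmax(dim=1))
```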

13. Visual performance improvement analytics of predictive model for unbalanced panel data. J Vis (Tokyo) 2021. [DOI: 10.1007/s12650-020-00716-0]

14. Chishtie JA, Marchand JS, Turcotte LA, Bielska IA, Babineau J, Cepoiu-Martin M, Irvine M, Munce S, Abudiab S, Bjelica M, Hossain S, Imran M, Jeji T, Jaglal S. Visual Analytic Tools and Techniques in Population Health and Health Services Research: Scoping Review. J Med Internet Res 2020; 22:e17892. [PMID: 33270029] [PMCID: PMC7716797] [DOI: 10.2196/17892]
Abstract
BACKGROUND: Visual analytics (VA) promotes the understanding of data with visual, interactive techniques, using analytic and visual engines. The analytic engine includes automated techniques, whereas common visual outputs include flow maps and spatiotemporal hot spots.
OBJECTIVE: This scoping review aims to address a gap in the literature, with the specific objective to synthesize literature on the use of VA tools, techniques, and frameworks in the interrelated health care areas of population health and health services research (HSR).
METHODS: Using the 2018 PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines, the review focuses on peer-reviewed journal articles and full conference papers from 2005 to March 2019. Two researchers were involved at each step, and another researcher arbitrated disagreements. A comprehensive abstraction platform captured data from diverse bodies of the literature, primarily from the computer and health sciences.
RESULTS: After screening 11,310 articles, findings from 55 articles were synthesized under the major headings of visual and analytic engines, visual presentation characteristics, tools used and their capabilities, application to health care areas, data types and sources, VA frameworks, frameworks used for VA applications, availability and innovation, and co-design initiatives. We found extensive application of VA methods in epidemiology, surveillance and modeling, and in analyses of health services access, use, and cost. All articles included a distinct analytic and visualization engine, with varying levels of detail provided. Most tools were prototypes, with 5 in use at the time of publication. Seven articles presented methodological frameworks. Toward consistent reporting, we present a checklist, with an expanded definition for VA applications in health care, to assist researchers in sharing research for greater replicability. We summarized the results in a Tableau dashboard.
CONCLUSIONS: With the increasing availability and generation of big health care data, VA is a fast-growing method applied to complex health care data. What makes VA innovative is its capability to process multiple, varied data sources to demonstrate trends and patterns for exploratory analysis, leading to knowledge generation and decision support. This is the first review to bridge a critical gap in the literature on VA methods applied to the areas of population health and HSR, which further indicates possible avenues for the adoption of these methods in the future. This review is especially important in the wake of COVID-19 surveillance and response initiatives, where many VA products have taken center stage.
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.2196/14019
Affiliation(s)
- Jawad Ahmed Chishtie: Rehabilitation Sciences Institute, Faculty of Medicine, University of Toronto, Toronto, ON, Canada; Advanced Analytics, Canadian Institute for Health Information, Toronto, ON, Canada; Ontario Neurotrauma Foundation, Toronto, ON, Canada; Toronto Rehabilitation Institute, University Health Network, Toronto, ON, Canada
- Luke A Turcotte: Advanced Analytics, Canadian Institute for Health Information, Toronto, ON, Canada; School of Public Health and Health Systems, University of Waterloo, Waterloo, ON, Canada
- Iwona Anna Bielska: Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, ON, Canada; Centre for Health Economics and Policy Analysis, McMaster University, Hamilton, ON, Canada
- Jessica Babineau: Library & Information Services, University Health Network, Toronto, ON, Canada
- Monica Cepoiu-Martin: Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Michael Irvine: Department of Mathematics, University of British Columbia, Vancouver, BC, Canada; British Columbia Centre for Disease Control, Vancouver, BC, Canada
- Sarah Munce: Rehabilitation Sciences Institute, Faculty of Medicine, University of Toronto, Toronto, ON, Canada; Toronto Rehabilitation Institute, University Health Network, Toronto, ON, Canada; Department of Occupational Science and Occupational Therapy, University of Toronto, Toronto, ON, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
- Sally Abudiab: Rehabilitation Sciences Institute, Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Marko Bjelica: Rehabilitation Sciences Institute, Faculty of Medicine, University of Toronto, Toronto, ON, Canada; Toronto Rehabilitation Institute, University Health Network, Toronto, ON, Canada
- Saima Hossain: Department of Physical Therapy, Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Muhammad Imran: Department of Epidemiology and Public Health, Health Services Academy, Islamabad, Pakistan
- Tara Jeji: Ontario Neurotrauma Foundation, Toronto, ON, Canada
- Susan Jaglal: Department of Physical Therapy, Faculty of Medicine, University of Toronto, Toronto, ON, Canada

15.

Abstract
Feature analysis has become a critical task in data analysis and visualization. Graph structures are very flexible in terms of representation and may encode important information about features, but they are challenging to lay out in a way that is adequate for analysis tasks. In this study, we propose and develop similarity-based graph layouts with the purpose of locating relevant patterns in sets of features, thus supporting feature analysis and selection. We apply a tree layout in the first step of the strategy to accomplish node placement and overview based on feature similarity. By drawing the remainder of the graph edges on demand, further grouping and relationships among features are revealed. We evaluate those groups and relationships in terms of their effectiveness in exploring feature sets for data analysis. Correlation of features with a target categorical attribute and feature ranking are added to support the task. Multidimensional projections are employed to plot the dataset based on selected attributes to reveal the effectiveness of the feature set. Our results have shown that the tree-graph layout framework allows for a number of observations that are very important in user-centric feature selection, and not easy to obtain with any other available tool. It provides a way of finding relevant and irrelevant features, spurious sets of noisy features, groups of similar features, and opposite features, all of which are essential tasks in different scenarios of data analysis. Case studies in application areas centered on documents, images, and sound data demonstrate the ability of the framework to quickly reach a satisfactory compact representation from a larger feature set.
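The first step, placing feature nodes by similarity, can be approximated with hierarchical clustering of a feature-correlation matrix; the leaf order then drives a tree layout. A sketch under that assumption (this is not the paper's layout algorithm):

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from sklearn.datasets import load_wine

X, _ = load_wine(return_X_y=True)

# Feature-feature similarity from absolute Pearson correlation.
corr = np.abs(np.corrcoef(X, rowvar=False))
dist = 1.0 - corr                      # similar features -> small distance

# Condensed upper-triangle distances, then a hierarchical tree.
iu = np.triu_indices_from(dist, k=1)
tree = linkage(dist[iu], method="average")

# Leaf order groups similar features next to each other, which is the
# node placement a similarity-based tree layout needs.
print(leaves_list(tree))
```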

16. Li Q, Wu Z, Yi L, Seann K, Qu H, Ma X. WeSeer: Visual Analysis for Better Information Cascade Prediction of WeChat Articles. IEEE Transactions on Visualization and Computer Graphics 2020; 26:1399-1412. [PMID: 30176600] [DOI: 10.1109/tvcg.2018.2867776]
Abstract
Social media, such as Facebook and WeChat, empowers millions of users to create, consume, and disseminate online information on an unprecedented scale. The abundant information on social media intensifies the competition of WeChat Public Official Articles (i.e., posts) for gaining user attention due to the zero-sum nature of attention. Therefore, only a small portion of information tends to become extremely popular while the rest remains unnoticed or quickly disappears. Such a typical "long-tail" phenomenon is very common in social media. Thus, recent years have witnessed a growing interest in predicting the future trend in the popularity of social media posts and understanding the factors that influence the popularity of the posts. Nevertheless, existing predictive models either rely on cumbersome feature engineering or sophisticated parameter tuning, which are difficult to understand and improve. In this paper, we study and enhance a point process-based model by incorporating visual reasoning to support communication between the users and the predictive model for a better prediction result. The proposed system supports users to uncover the working mechanism behind the model and improve the prediction accuracy accordingly based on the insights gained. We use real-world WeChat articles to demonstrate the effectiveness of the system and verify the improved model at scale on WeChat articles. We also elicit and summarize the feedback from WeChat domain experts.

17. Krueger R, Beyer J, Jang WD, Kim NW, Sokolov A, Sorger PK, Pfister H. Facetto: Combining Unsupervised and Supervised Learning for Hierarchical Phenotype Analysis in Multi-Channel Image Data. IEEE Transactions on Visualization and Computer Graphics 2020; 26:227-237. [PMID: 31514138] [PMCID: PMC7045445] [DOI: 10.1109/tvcg.2019.2934547]
Abstract
Facetto is a scalable visual analytics application that is used to discover single-cell phenotypes in high-dimensional multi-channel microscopy images of human tumors and tissues. Such images represent the cutting edge of digital histology and promise to revolutionize how diseases such as cancer are studied, diagnosed, and treated. Highly multiplexed tissue images are complex, comprising 10⁹ or more pixels, 60-plus channels, and millions of individual cells. This makes manual analysis challenging and error-prone. Existing automated approaches are also inadequate, in large part, because they are unable to effectively exploit the deep knowledge of human tissue biology available to anatomic pathologists. To overcome these challenges, Facetto enables a semi-automated analysis of cell types and states. It integrates unsupervised and supervised learning into the image and feature exploration process and offers tools for analytical provenance. Experts can cluster the data to discover new types of cancer and immune cells and use clustering results to train a convolutional neural network that classifies new cells accordingly. Likewise, the output of classifiers can be clustered to discover aggregate patterns and phenotype subsets. We also introduce a new hierarchical approach to keep track of analysis steps and data subsets created by users; this assists in the identification of cell types. Users can build phenotype trees and interact with the resulting hierarchical structures of both high-dimensional feature and image spaces. We report on use cases in which domain scientists explore various large-scale fluorescence imaging datasets. We demonstrate how Facetto assists users in steering the clustering and classification process, inspecting analysis results, and gaining new scientific insights into cancer biology.
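The cluster-then-classify loop can be sketched generically: accepted clusters become training labels for a classifier applied to newly arriving cells. A logistic regression on a public dataset stands in for the paper's CNN on cell features:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, _ = load_digits(return_X_y=True)   # stand-in for per-cell feature vectors
X_explore, X_new = train_test_split(X, random_state=0)

# 1) Unsupervised pass: the expert explores and (here) accepts k-means
#    clusters as candidate phenotypes.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X_explore)

# 2) Supervised pass: train a classifier on the curated cluster labels and
#    assign phenotypes to newly arriving cells.
clf = LogisticRegression(max_iter=2000).fit(X_explore, labels)
print(np.bincount(clf.predict(X_new)))   # phenotype counts among new cells
```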

18. Cashman D, Perer A, Chang R, Strobelt H. Ablate, Variate, and Contemplate: Visual Analytics for Discovering Neural Architectures. IEEE Transactions on Visualization and Computer Graphics 2020; 26:863-873. [PMID: 31502978] [DOI: 10.1109/tvcg.2019.2934261]
Abstract
The performance of deep learning models is dependent on the precise configuration of many layers and parameters. However, there are currently few systematic guidelines for how to configure a successful model. This means model builders often have to experiment with different configurations, either by manually programming different architectures (which is tedious and time-consuming) or by relying on purely automated approaches to generate and train the architectures (which is expensive). In this paper, we present Rapid Exploration of Model Architectures and Parameters, or REMAP, a visual analytics tool that allows a model builder to discover a deep learning model quickly via exploration and rapid experimentation of neural network architectures. In REMAP, the user explores the large and complex parameter space for neural network architectures using a combination of global inspection and local experimentation. Through a visual overview of a set of models, the user identifies interesting clusters of architectures. Based on their findings, the user can run ablation and variation experiments to identify the effects of adding, removing, or replacing layers in a given architecture and generate new models accordingly. They can also handcraft new models using a simple graphical interface. As a result, a model builder can build deep learning models quickly, efficiently, and without manual programming. We inform the design of REMAP through a design study with four deep learning model builders. Through a use case, we demonstrate that REMAP allows users to discover performant neural network architectures efficiently using visual exploration and user-defined semi-automated searches through the model space.

19. Behrisch M, Streeb D, Stoffel F, Seebacher D, Matejek B, Weber SH, Mittelstadt S, Pfister H, Keim D. Commercial Visual Analytics Systems-Advances in the Big Data Analytics Field. IEEE Transactions on Visualization and Computer Graphics 2019; 25:3011-3031. [PMID: 30059307] [DOI: 10.1109/tvcg.2018.2859973]
Abstract
Five years after the first state-of-the-art report on Commercial Visual Analytics Systems we present a reevaluation of the Big Data Analytics field. We build on the success of the 2012 survey, which was influential even beyond the boundaries of the InfoVis and Visual Analytics (VA) community. While the field has matured significantly since the original survey, we find that innovation and research-driven development are increasingly sacrificed to satisfy a wide range of user groups. We evaluate new product versions on established evaluation criteria, such as available features, performance, and usability, extending the previous survey and ensuring comparability with it. We also investigate previously unavailable products to paint a more complete picture of the commercial VA landscape. Furthermore, we introduce novel measures, like suitability for specific user groups and the ability to handle complex data types, and undertake a new case study to highlight innovative features. We explore the achievements in the commercial sector in addressing VA challenges and propose novel developments that should be on systems' roadmaps in the coming years.

20. Bernard J, Sessler D, Kohlhammer J, Ruddle RA. Using Dashboard Networks to Visualize Multiple Patient Histories: A Design Study on Post-Operative Prostate Cancer. IEEE Transactions on Visualization and Computer Graphics 2019; 25:1615-1628. [PMID: 29994364] [DOI: 10.1109/tvcg.2018.2803829]
Abstract
In this design study, we present a visualization technique that segments patients' histories instead of treating them as raw event sequences, aggregates the segments using criteria such as the whole history or treatment combinations, and then visualizes the aggregated segments as static dashboards that are arranged in a dashboard network to show longitudinal changes. The static dashboards were developed in nine iterations, to show 15 important attributes from the patients' histories. The final design was evaluated with five non-experts, five visualization experts and four medical experts, who successfully used it to gain an overview of a 2,000 patient dataset, and to make observations about longitudinal changes and differences between two cohorts. The research represents a step-change in the detail of large-scale data that may be successfully visualized using dashboards, and provides guidance about how the approach may be generalized.

21. Zhang J, Wang Y, Molino P, Li L, Ebert DS. Manifold: A Model-Agnostic Framework for Interpretation and Diagnosis of Machine Learning Models. IEEE Transactions on Visualization and Computer Graphics 2019; 25:364-373. [PMID: 30130197] [DOI: 10.1109/tvcg.2018.2864499]
Abstract
Interpretation and diagnosis of machine learning models have gained renewed interest in recent years with breakthroughs in new approaches. We present Manifold, a framework that utilizes visual analysis techniques to support interpretation, debugging, and comparison of machine learning models in a more transparent and interactive manner. Conventional techniques usually focus on visualizing the internal logic of a specific model type (i.e., deep neural networks), lacking the ability to extend to a more complex scenario where different model types are integrated. To this end, Manifold is designed as a generic framework that does not rely on or access the internal logic of the model and solely observes the input (i.e., instances or features) and the output (i.e., the predicted result and probability distribution). We describe the workflow of Manifold as an iterative process consisting of three major phases that are commonly involved in the model development and diagnosis process: inspection (hypothesis), explanation (reasoning), and refinement (verification). The visual components supporting these tasks include a scatterplot-based visual summary that overviews the models' outcome and a customizable tabular view that reveals feature discrimination. We demonstrate current applications of the framework on the classification and regression tasks and discuss other potential machine learning use scenarios where Manifold can be applied.

22. Dingen D, Veer MV, Houthuizen P, Mestrom EHJ, Korsten EHHM, Bouwman ARA, Wijk JV. RegressionExplorer: Interactive Exploration of Logistic Regression Models with Subgroup Analysis. IEEE Transactions on Visualization and Computer Graphics 2018; 25:246-255. [PMID: 30222573] [DOI: 10.1109/tvcg.2018.2865043]
Abstract
We present RegressionExplorer, a Visual Analytics tool for the interactive exploration of logistic regression models. Our application domain is Clinical Biostatistics, where models are derived from patient data with the aim to obtain clinically meaningful insights and consequences. Development and interpretation of a proper model requires domain expertise and insight into model characteristics. Because of time constraints, often a limited number of candidate models is evaluated. RegressionExplorer enables experts to quickly generate, evaluate, and compare many different models, taking the workflow for model development as starting point. Global patterns in parameter values of candidate models can be explored effectively. In addition, experts are enabled to compare candidate models across multiple subpopulations. The insights obtained can be used to formulate new hypotheses or to steer model development. The effectiveness of the tool is demonstrated for two use cases: prediction of a cardiac conduction disorder in patients after receiving a heart valve implant and prediction of hypernatremia in critically ill patients.
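The core operation, refitting the same logistic model per subgroup and comparing coefficients, is easy to sketch with statsmodels on synthetic data where the age effect deliberately differs between sexes:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
age = rng.normal(65, 10, n)
male = rng.integers(0, 2, n)
x = (age - 65) / 10                     # age in decades, centered

# Synthetic outcome whose age effect differs by sex subgroup.
logit = -1 + 0.8 * x * male + 0.2 * x * (1 - male)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

# Fit the same model within each subpopulation and compare odds ratios.
for sex, name in [(1, "male"), (0, "female")]:
    mask = male == sex
    fit = sm.Logit(y[mask], sm.add_constant(x[mask])).fit(disp=0)
    print(name, "odds ratio per decade of age:",
          round(float(np.exp(fit.params[1])), 2))
```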

23. Turkay C, Slingsby A, Lahtinen K, Butt S, Dykes J. Supporting theoretically-grounded model building in the social sciences through interactive visualisation. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.11.087]

24. What you see is what you can change: Human-centered machine learning by interactive visualization. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.01.105]

26. Torsney-Weir T, Bergner S, Bingham D, Moller T. Predicting the Interactive Rendering Time Threshold of Gaussian Process Models With HyperSlice. IEEE Transactions on Visualization and Computer Graphics 2017; 23:1111-1123. [PMID: 26915126] [DOI: 10.1109/tvcg.2016.2532333]
Abstract
In this paper we present a method for predicting the rendering time to display multi-dimensional data for the analysis of computer simulations using the HyperSlice method with Gaussian process model reconstruction. Our method relies on a theoretical understanding of how the data points are drawn on slices and then fits the formula to a user's machine using practical experiments. We also describe the typical characteristics of data when analyzing deterministic computer simulations, as described by the statistics community. We then show the advantage of carefully considering how many data points can be drawn in real time by proposing two approaches for how this predictive formula can be used in a real-world system.
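The fit-the-formula-to-a-machine idea can be sketched as: time a few workloads, fit a cost model, invert it for an interactive budget. The linear form t(n) = a + b·n and the rendering stub are assumptions for illustration; the paper derives a more detailed formula.

```python
import time
import numpy as np

def render(n_points):
    """Stand-in for drawing n_points on a slice (here: plain array math)."""
    pts = np.random.rand(n_points, 2)
    return (pts ** 2).sum()

# Practical experiments: time several workloads on *this* machine.
sizes = np.array([1_000, 5_000, 10_000, 50_000, 100_000, 500_000])
times = []
for n in sizes:
    t0 = time.perf_counter()
    render(n)
    times.append(time.perf_counter() - t0)

# Fit t(n) = a + b*n to the measurements, then invert for the largest n
# that stays inside an interactive budget (e.g., 50 ms per frame).
b, a = np.polyfit(sizes, times, deg=1)
budget = 0.050
print("max points at 50 ms:", int((budget - a) / b))
```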

27. Turkay C, Kaya E, Balcisoy S, Hauser H. Designing Progressive and Interactive Analytics Processes for High-Dimensional Data Analysis. IEEE Transactions on Visualization and Computer Graphics 2017; 23:131-140. [PMID: 27514056] [DOI: 10.1109/tvcg.2016.2598470]
Abstract
In interactive data analysis processes, the dialogue between the human and the computer is the enabling mechanism that can lead to actionable observations about the phenomena being investigated. It is of paramount importance that this dialogue is not interrupted by slow computational mechanisms that do not consider any known temporal human-computer interaction characteristics that prioritize the perceptual and cognitive capabilities of the users. In cases where the analysis involves an integrated computational method, for instance to reduce the dimensionality of the data or to perform clustering, such interruptions are likely. To remedy this, progressive computations, where results are iteratively improved, are getting increasing interest in visual analytics. In this paper, we present techniques and design considerations to incorporate progressive methods within interactive analysis processes that involve high-dimensional data. We define methodologies to facilitate processes that adhere to the perceptual characteristics of users and describe how online algorithms can be incorporated within these. A set of design recommendations and corresponding methods to support analysts in accomplishing high-dimensional data analysis tasks are then presented. Our arguments and decisions here are informed by observations gathered over a series of analysis sessions with analysts from finance. We document observations and recommendations from this study and present evidence on how our approach contributes to the efficiency and productivity of interactive visual analysis sessions involving high-dimensional data.
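One common way to realize such progressiveness is chunked partial fitting, where each chunk yields an improved intermediate result that a view can redraw. A sketch with scikit-learn's IncrementalPCA as a stand-in for the integrated computational method:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(4)
X = rng.normal(size=(100_000, 200))      # high-dimensional table

ipca = IncrementalPCA(n_components=2)
for chunk in np.array_split(X, 20):
    ipca.partial_fit(chunk)              # cheap update per chunk
    # After each chunk, the current 2-D embedding can be redrawn, so the
    # analyst sees a refining projection instead of waiting for the end.
    preview = ipca.transform(X[:1_000])

print(ipca.explained_variance_ratio_)
```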

29. Goodwin S, Dykes J, Slingsby A, Turkay C. Visualizing Multiple Variables Across Scale and Geography. IEEE Transactions on Visualization and Computer Graphics 2016; 22:599-608. [PMID: 26390471] [DOI: 10.1109/tvcg.2015.2467199]
Abstract
Comparing multiple variables to select those that effectively characterize complex entities is important in a wide variety of domains - geodemographics for example. Identifying variables that correlate is a common practice to remove redundancy, but correlation varies across space, with scale and over time, and the frequently used global statistics hide potentially important differentiating local variation. For more comprehensive and robust insights into multivariate relations, these local correlations need to be assessed through various means of defining locality. We explore the geography of this issue, and use novel interactive visualization to identify interdependencies in multivariate data sets to support geographically informed multivariate analysis. We offer terminology for considering scale and locality, visual techniques for establishing the effects of scale on correlation and a theoretical framework through which variation in geographic correlation with scale and locality are addressed explicitly. Prototype software demonstrates how these contributions act together. These techniques enable multiple variables and their geographic characteristics to be considered concurrently as we extend visual parameter space analysis (vPSA) to the spatial domain. We find variable correlations to be sensitive to scale and geography to varying degrees in the context of energy-based geodemographics. This sensitivity depends upon the calculation of locality as well as the geographical and statistical structure of the variable.
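The global-versus-local contrast is easy to demonstrate: with locality defined by a radius, the same variable pair can show near-zero global correlation yet strong local correlations of opposite sign. A synthetic sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2_000
coords = rng.uniform(0, 100, size=(n, 2))          # point locations
# Two variables whose relationship flips across space (east vs west).
sign = np.where(coords[:, 0] > 50, 1.0, -1.0)
a = rng.normal(size=n)
b = sign * a + rng.normal(scale=0.5, size=n)

def local_corr(i, radius):
    """Pearson correlation of a, b within `radius` of point i."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    m = d < radius
    return np.corrcoef(a[m], b[m])[0, 1]

print("global r:", round(np.corrcoef(a, b)[0, 1], 2))
for radius in (10, 30, 80):                        # locality = scale knob
    print(f"r near point 0 at radius {radius}:",
          round(local_corr(0, radius), 2))
```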

30. Klemm P, Lawonn K, Glaßer S, Niemann U, Hegenscheid K, Völzke H, Preim B. 3D Regression Heat Map Analysis of Population Study Data. IEEE Transactions on Visualization and Computer Graphics 2016; 22:81-90. [PMID: 26529689] [DOI: 10.1109/tvcg.2015.2468291]
Abstract
Epidemiological studies comprise heterogeneous data about a subject group to define disease-specific risk factors. These data contain information (features) about a subject's lifestyle, medical status as well as medical image data. Statistical regression analysis is used to evaluate these features and to identify feature combinations indicating a disease (the target feature). We propose an analysis approach of epidemiological data sets by incorporating all features in an exhaustive regression-based analysis. This approach combines all independent features w.r.t. a target feature. It provides a visualization that reveals insights into the data by highlighting relationships. The 3D Regression Heat Map, a novel 3D visual encoding, acts as an overview of the whole data set. It shows all combinations of two to three independent features with a specific target disease. Slicing through the 3D Regression Heat Map allows for the detailed analysis of the underlying relationships. Expert knowledge about disease-specific hypotheses can be included into the analysis by adjusting the regression model formulas. Furthermore, the influences of features can be assessed using a difference view comparing different calculation results. We applied our 3D Regression Heat Map method to a hepatic steatosis data set to reproduce results from a data mining-driven analysis. A qualitative analysis was conducted on a breast density data set. We were able to derive new hypotheses about relations between breast density and breast lesions with breast cancer. With the 3D Regression Heat Map, we present a visual overview of epidemiological data that allows for the first time an interactive regression-based analysis of large feature sets with respect to a disease.
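The exhaustive sweep can be sketched as scoring a model per feature combination against the target, each score becoming one cell of the heat map. Logistic regression and AUC below are generic stand-ins for the paper's regression setup:

```python
from itertools import combinations

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = X[:, :8]                      # keep the sweep small for the sketch

scores = {}
for pair in combinations(range(X.shape[1]), 2):
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    auc = cross_val_score(model, X[:, pair], y, cv=5,
                          scoring="roc_auc").mean()
    scores[pair] = auc            # one cell of the regression heat map

best = max(scores, key=scores.get)
print("best pair:", best, "AUC:", round(scores[best], 3))
```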

31. Sedlmair M, Heinzl C, Bruckner S, Piringer H, Möller T. Visual Parameter Space Analysis: A Conceptual Framework. IEEE Transactions on Visualization and Computer Graphics 2014; 20:2161-2170. [PMID: 26356930] [DOI: 10.1109/tvcg.2014.2346321]
Abstract
Various case studies in different application domains have shown the great potential of visual parameter space analysis to support validating and using simulation models. In order to guide and systematize research endeavors in this area, we provide a conceptual framework for visual parameter space analysis problems. The framework is based on our own experience and a structured analysis of the visualization literature. It contains three major components: (1) a data flow model that helps to abstractly describe visual parameter space analysis problems independent of their application domain; (2) a set of four navigation strategies of how parameter space analysis can be supported by visualization tools; and (3) a characterization of six analysis tasks. Based on our framework, we analyze and classify the current body of literature, and identify three open research gaps in visual parameter space analysis. The framework and its discussion are meant to support visualization designers and researchers in characterizing parameter space analysis problems and to guide their design and evaluation processes.

32. Mühlbacher T, Piringer H, Gratzl S, Sedlmair M, Streit M. Opening the Black Box: Strategies for Increased User Involvement in Existing Algorithm Implementations. IEEE Transactions on Visualization and Computer Graphics 2014; 20:1643-1652. [PMID: 26356878] [DOI: 10.1109/tvcg.2014.2346578]
Abstract
An increasing number of interactive visualization tools stress the integration with computational software like MATLAB and R to access a variety of proven algorithms. In many cases, however, the algorithms are used as black boxes that run to completion in isolation, which contradicts the needs of interactive data exploration. This paper structures, formalizes, and discusses possibilities to enable user involvement in ongoing computations. Based on a structured characterization of needs regarding intermediate feedback and control, the main contribution is a formalization and comparison of strategies for achieving user involvement for algorithms with different characteristics. In the context of integration, we describe considerations for implementing these strategies either as part of the visualization tool or as part of the algorithm, and we identify requirements and guidelines for the design of algorithmic APIs. To assess the practical applicability, we provide a survey of frequently used algorithm implementations within R regarding the fulfillment of these guidelines. While echoing previous calls for analysis modules that support data exploration more directly, we conclude that a range of pragmatic options for enabling user involvement in ongoing computations exists on both the visualization and algorithm side and should be used.
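One of the strategies discussed, intermediate feedback with cancellation, falls out naturally when an algorithm is restructured as a generator that yields partial results. A progressive k-means sketch (illustrative, not from the paper):

```python
import numpy as np

def kmeans_progressive(X, k, iters=50, seed=0):
    """K-means as a generator: yields intermediate centroids so a caller
    (e.g., a visualization tool) can show progress, steer, or cancel."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(X[:, None] - centroids[None], axis=2), axis=1)
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        yield centroids, labels          # intermediate feedback point

X = np.random.default_rng(6).normal(size=(3_000, 2))
for step, (centroids, labels) in enumerate(kmeans_progressive(X, k=4)):
    if step == 5:    # control: the user is satisfied and cancels early
        break
```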

33. Malik A, Maciejewski R, Towers S, McCullough S, Ebert DS. Proactive Spatiotemporal Resource Allocation and Predictive Visual Analytics for Community Policing and Law Enforcement. IEEE Transactions on Visualization and Computer Graphics 2014; 20:1863-1872. [PMID: 26356900] [DOI: 10.1109/tvcg.2014.2346926]
Abstract
In this paper, we present a visual analytics approach that provides decision makers with a proactive and predictive environment in order to assist them in making effective resource allocation and deployment decisions. The challenges involved in such predictive analytics processes include ensuring end-users' understanding and applying the underlying statistical algorithms at the right spatiotemporal granularity levels so that good prediction estimates can be established. In our approach, we provide analysts with a suite of natural scale templates and methods that enable them to focus and drill down to appropriate geospatial and temporal resolution levels. Our forecasting technique is based on the Seasonal Trend decomposition based on Loess (STL) method, which we apply in a spatiotemporal visual analytics context to provide analysts with predicted levels of future activity. We also present a novel kernel density estimation technique we have developed, in which the prediction process is influenced by the spatial correlation of recent incidents at nearby locations. We demonstrate our techniques by applying our methodology to Criminal, Traffic and Civil (CTC) incident datasets.
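STL itself is available in statsmodels. Below is a sketch that decomposes a synthetic daily incident series and makes a naive trend-plus-season forecast; the paper's predictor and its kernel-density extension are more involved:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(7)
days = pd.date_range("2013-01-01", periods=365 * 2, freq="D")
weekly = np.tile([5, 4, 3, 3, 4, 8, 9], len(days) // 7 + 1)[: len(days)]
counts = weekly + np.linspace(0, 3, len(days)) + rng.poisson(2, len(days))
series = pd.Series(counts, index=days)

res = STL(series, period=7).fit()   # trend + weekly seasonality + remainder

# Naive forecast for next week: extend the trend linearly and repeat the
# last full seasonal cycle.
trend_slope = res.trend.iloc[-1] - res.trend.iloc[-2]
horizon = np.arange(1, 8)
forecast = (res.trend.iloc[-1] + trend_slope * horizon
            + res.seasonal.iloc[-7:].to_numpy())
print(forecast.round(1))
```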

34. Krause J, Perer A, Bertini E. INFUSE: Interactive Feature Selection for Predictive Modeling of High Dimensional Data. IEEE Transactions on Visualization and Computer Graphics 2014; 20:1614-1623. [PMID: 26356875] [DOI: 10.1109/tvcg.2014.2346482]
Abstract
Predictive modeling techniques are increasingly being used by data scientists to understand the probability of predicted outcomes. However, for data that is high-dimensional, a critical step in predictive modeling is determining which features should be included in the models. Feature selection algorithms are often used to remove non-informative features from models. However, there are many different classes of feature selection algorithms. Deciding which one to use is problematic as the algorithmic output is often not amenable to user interpretation. This limits the ability for users to utilize their domain expertise during the modeling process. To improve on this limitation, we developed INFUSE, a novel visual analytics system designed to help analysts understand how predictive features are being ranked across feature selection algorithms, cross-validation folds, and classifiers. We demonstrate how our system can lead to important insights in a case study involving clinical researchers predicting patient outcomes from electronic medical records.
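The data INFUSE visualizes, feature ranks across selection algorithms and cross-validation folds, can be produced with a few lines; the three selectors below are generic stand-ins:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.model_selection import KFold

X, y = load_breast_cancer(return_X_y=True)

def rank(scores):
    """Higher score -> better rank (0 = best)."""
    return np.argsort(np.argsort(-scores))

ranks = []
for train, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    Xt, yt = X[train], y[train]
    ranks.append(rank(f_classif(Xt, yt)[0]))
    ranks.append(rank(mutual_info_classif(Xt, yt, random_state=0)))
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xt, yt)
    ranks.append(rank(rf.feature_importances_))

mean_rank = np.mean(ranks, axis=0)   # aggregate across folds x algorithms
print("most consistently predictive features:", np.argsort(mean_rank)[:5])
```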

35. Lu Y, Wang F, Maciejewski R. Business intelligence from social media: a study from the VAST Box Office Challenge. IEEE Computer Graphics and Applications 2014; 34:58-69. [PMID: 25248200] [DOI: 10.1109/mcg.2014.61]
Abstract
With over 16 million tweets per hour, 600 new blog posts per minute, and 400 million active users on Facebook, businesses have begun searching for ways to turn real-time consumer-based posts into actionable intelligence. The goal is to extract information from this noisy, unstructured data and use it for trend analysis and prediction. Current practices support the idea that visual analytics (VA) can help enable the effective analysis of such data. However, empirical evidence demonstrating the effectiveness of a VA solution is still lacking. A proposed VA toolkit extracts data from Bitly and Twitter to predict movie revenue and ratings. Results from the 2013 VAST Box Office Challenge demonstrate the benefit of an interactive environment for predictive analysis, compared to a purely statistical modeling approach. The VA approach used by the toolkit is generalizable to other domains involving social media data, such as sales forecasting and advertisement analysis.

36. Turkay C, Jeanquartier F, Holzinger A, Hauser H. On Computationally-Enhanced Visual Analysis of Heterogeneous Data and Its Application in Biomedical Informatics. Interactive Knowledge Discovery and Data Mining in Biomedical Informatics 2014. [DOI: 10.1007/978-3-662-43968-5_7]

37. Kehrer J, Piringer H, Berger W, Gröller ME. A model for structure-based comparison of many categories in small-multiple displays. IEEE Transactions on Visualization and Computer Graphics 2013; 19:2287-2296. [PMID: 24051795] [DOI: 10.1109/tvcg.2013.122]
Abstract
Many application domains deal with multi-variate data that consist of both categorical and numerical information. Small-multiple displays are a powerful concept for comparing such data by juxtaposition. For comparison by overlay or by explicit encoding of computed differences, however, a specification of references is necessary. In this paper, we present a formal model for defining semantically meaningful comparisons between many categories in a small-multiple display. Based on pivotized data that are hierarchically partitioned by the categories assigned to the x and y axes of the display, we propose two alternatives for structure-based comparison within this hierarchy. With an absolute reference specification, categories are compared to a fixed reference category. With a relative reference specification, in contrast, a semantic ordering of the categories is considered when comparing each category to its previous or subsequent one. Both reference specifications can be defined at multiple levels of the hierarchy (including aggregated summaries), enabling a multitude of useful comparisons. We demonstrate the general applicability of our model in several application examples using different visualizations that compare data by overlay or explicit encoding of differences.
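In pivotized tabular form, the two reference specifications map onto two one-liners: subtracting a fixed reference column (absolute) versus differencing along the semantic ordering (relative). A pandas sketch on made-up data:

```python
import pandas as pd

# Pivotized counts: rows = region, columns = semantically ordered quarters.
df = pd.DataFrame(
    {"Q1": [120, 80, 60], "Q2": [130, 82, 75], "Q3": [125, 90, 90]},
    index=["North", "South", "West"])

# Absolute reference: compare every category to a fixed reference (Q1).
absolute = df.sub(df["Q1"], axis=0)

# Relative reference: compare each category to its predecessor in the
# semantic ordering.
relative = df.diff(axis=1)

print(absolute, relative, sep="\n\n")
```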
Affiliation(s)
- Johannes Kehrer: VRVis Research Center, Vienna, and the Institute of Computer Graphics and Algorithms, Vienna University of Technology