1
|
Li G, Wang J, Wang Y, Shan G, Zhao Y. An In-Situ Visual Analytics Framework for Deep Neural Networks. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:6770-6786. [PMID: 38051629 DOI: 10.1109/tvcg.2023.3339585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/07/2023]
Abstract
The past decade has witnessed the superior power of deep neural networks (DNNs) in applications across various domains. However, training a high-quality DNN remains a non-trivial task due to its massive number of parameters. Visualization has shown great potential in addressing this situation, as evidenced by numerous recent visualization works that aid in DNN training and interpretation. These works commonly employ a strategy of logging training-related data and conducting post-hoc analysis. Based on the results of offline analysis, the model can be further trained or fine-tuned. This strategy, however, does not cope with the increasing complexity of DNNs, because (1) the time-series data collected over the training are usually too large to be stored entirely; (2) the huge I/O overhead significantly impacts the training efficiency; (3) post-hoc analysis does not allow rapid human-interventions (e.g., stop training with improper hyper-parameter settings to save computational resources). To address these challenges, we propose an in-situ visualization and analysis framework for the training of DNNs. Specifically, we employ feature extraction algorithms to reduce the size of training-related data in-situ and use the reduced data for real-time visual analytics. The states of model training are disclosed to model designers in real-time, enabling human interventions on demand to steer the training. Through concrete case studies, we demonstrate how our in-situ framework helps deep learning experts optimize DNNs and improve their analysis efficiency.
Collapse
|
2
|
GPU-based adaptive data reconstruction for large-scale statistical visualization. J Vis (Tokyo) 2023. [DOI: 10.1007/s12650-022-00892-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/08/2023]
|
3
|
Neuroth T, Rieth M, Aditya K, Lee M, Chen JH, Ma KL. Level Set Restricted Voronoi Tessellation for Large scale Spatial Statistical Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:548-558. [PMID: 36166541 DOI: 10.1109/tvcg.2022.3209473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Spatial statistical analysis of multivariate volumetric data can be challenging due to scale, complexity, and occlusion. Advances in topological segmentation, feature extraction, and statistical summarization have helped overcome the challenges. This work introduces a new spatial statistical decomposition method based on level sets, connected components, and a novel variation of the restricted centroidal Voronoi tessellation that is better suited for spatial statistical decomposition and parallel efficiency. The resulting data structures organize features into a coherent nested hierarchy to support flexible and efficient out-of-core region-of-interest extraction. Next, we provide an efficient parallel implementation. Finally, an interactive visualization system based on this approach is designed and then applied to turbulent combustion data. The combined approach enables an interactive spatial statistical analysis workflow for large-scale data with a top-down approach through multiple-levels-of-detail that links phase space statistics with spatial features.
Collapse
|
4
|
Hu R, Ye Z, Chen B, van Kaick O, Huang H. Self-Supervised Color-Concept Association via Image Colorization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:247-256. [PMID: 36166543 DOI: 10.1109/tvcg.2022.3209481] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
The interpretation of colors in visualizations is facilitated when the assignments between colors and concepts in the visualizations match human's expectations, implying that the colors can be interpreted in a semantic manner. However, manually creating a dataset of suitable associations between colors and concepts for use in visualizations is costly, as such associations would have to be collected from humans for a large variety of concepts. To address the challenge of collecting this data, we introduce a method to extract color-concept associations automatically from a set of concept images. While the state-of-the-art method extracts associations from data with supervised learning, we developed a self-supervised method based on colorization that does not require the preparation of ground truth color-concept associations. Our key insight is that a set of images of a concept should be sufficient for learning color-concept associations, since humans also learn to associate colors to concepts mainly from past visual input. Thus, we propose to use an automatic colorization method to extract statistical models of the color-concept associations that appear in concept images. Specifically, we take a colorization model pre-trained on ImageNet and fine-tune it on the set of images associated with a given concept, to predict pixel-wise probability distributions in Lab color space for the images. Then, we convert the predicted probability distributions into color ratings for a given color library and aggregate them for all the images of a concept to obtain the final color-concept associations. We evaluate our method using four different evaluation metrics and via a user study. Experiments show that, although the state-of-the-art method based on supervised learning with user-provided ratings is more effective at capturing relative associations, our self-supervised method obtains overall better results according to metrics like Earth Mover's Distance (EMD) and Entropy Difference (ED), which are closer to human perception of color distributions.
Collapse
|
5
|
Biswas A, Dutta S, Lawrence E, Patchett J, Calhoun JC, Ahrens J. Probabilistic Data-Driven Sampling via Multi-Criteria Importance Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:4439-4454. [PMID: 32746272 DOI: 10.1109/tvcg.2020.3006426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Although supercomputers are becoming increasingly powerful, their components have thus far not scaled proportionately. Compute power is growing enormously and is enabling finely resolved simulations that produce never-before-seen features. However, I/O capabilities lag by orders of magnitude, which means only a fraction of the simulation data can be stored for post hoc analysis. Prespecified plans for saving features and quantities of interest do not work for features that have not been seen before. Data-driven intelligent sampling schemes are needed to detect and save important parts of the simulation while it is running. Here, we propose a novel sampling scheme that reduces the size of the data by orders-of-magnitude while still preserving important regions. The approach we develop selects points with unusual data values and high gradients. We demonstrate that our approach outperforms traditional sampling schemes on a number of tasks.
Collapse
|
6
|
Rapp T, Peters C, Dachsbacher C. Visual Analysis of Large Multivariate Scattered Data using Clustering and Probabilistic Summaries. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1580-1590. [PMID: 33048705 DOI: 10.1109/tvcg.2020.3030379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Rapidly growing data sizes of scientific simulations pose significant challenges for interactive visualization and analysis techniques. In this work, we propose a compact probabilistic representation to interactively visualize large scattered datasets. In contrast to previous approaches that represent blocks of volumetric data using probability distributions, we model clusters of arbitrarily structured multivariate data. In detail, we discuss how to efficiently represent and store a high-dimensional distribution for each cluster. We observe that it suffices to consider low-dimensional marginal distributions for two or three data dimensions at a time to employ common visual analysis techniques. Based on this observation, we represent high-dimensional distributions by combinations of low-dimensional Gaussian mixture models. We discuss the application of common interactive visual analysis techniques to this representation. In particular, we investigate several frequency-based views, such as density plots in 1D and 2D, density-based parallel coordinates, and a time histogram. We visualize the uncertainty introduced by the representation, discuss a level-of-detail mechanism, and explicitly visualize outliers. Furthermore, we propose a spatial visualization by splatting anisotropic 3D Gaussians for which we derive a closed-form solution. Lastly, we describe the application of brushing and linking to this clustered representation. Our evaluation on several large, real-world datasets demonstrates the scaling of our approach.
Collapse
|
7
|
Biswas A, Ahrens JP, Dutta S, Musser JM, Almgren AS, Turton TL. Feature Analysis, Tracking, and Data Reduction: An Application to Multiphase Reactor Simulation MFiX-Exa for In-Situ Use Case. Comput Sci Eng 2021. [DOI: 10.1109/mcse.2020.3016927] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Affiliation(s)
- Ayan Biswas
- Los Alamos National Laboratory, Los Alamos, NM, USA
| | | | - Soumya Dutta
- Los Alamos National Laboratory, Los Alamos, NM, USA
| | | | - Ann S. Almgren
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | | |
Collapse
|
8
|
Wang KC, Wei TH, Shareef N, Shen HW. Ray-Based Exploration of Large Time-Varying Volume Data Using Per-Ray Proxy Distributions. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:3299-3313. [PMID: 31170075 DOI: 10.1109/tvcg.2019.2920130] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
The analysis and visualization of data created from simulations on modern supercomputers is a daunting challenge because the incredible compute power of modern supercomputers allow scientists to generate datasets with very high spatial and temporal resolutions. The limited bandwidth and capacity of networking and storage devices connecting supercomputers to analysis machines become the major bottleneck for data analysis such that simply moving the whole dataset from the supercomputer to a data analysis machine is infeasible. A common approach to visualize high temporal resolution simulation datasets under constrained I/O is to reduce the sampling rate in the temporal domain while preserving the original spatial resolution at the time steps. Data interpolation between the sampled time steps alone may not be a viable option since it may suffer from large errors, especially when using a lower sampling rate. We present a novel ray-based representation storing ray based histograms and depth information that recovers the evolution of volume data between sampled time steps. Our view-dependent proxy allows for a good trade off between compactly representing the time-varying data and leveraging temporal coherence within the data by utilizing interpolation between time steps, ray histograms, depth information, and codebooks. Our approach is able to provide fast rendering in the context of transfer function exploration to support visualization of feature evolution in time-varying data.
Collapse
|
9
|
Rapp T, Peters C, Dachsbacher C. Void-and-Cluster Sampling of Large Scattered Data and Trajectories. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:780-789. [PMID: 31425103 DOI: 10.1109/tvcg.2019.2934335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
We propose a data reduction technique for scattered data based on statistical sampling. Our void-and-cluster sampling technique finds a representative subset that is optimally distributed in the spatial domain with respect to the blue noise property. In addition, it can adapt to a given density function, which we use to sample regions of high complexity in the multivariate value domain more densely. Moreover, our sampling technique implicitly defines an ordering on the samples that enables progressive data loading and a continuous level-of-detail representation. We extend our technique to sample time-dependent trajectories, for example pathlines in a time interval, using an efficient and iterative approach. Furthermore, we introduce a local and continuous error measure to quantify how well a set of samples represents the original dataset. We apply this error measure during sampling to guide the number of samples that are taken. Finally, we use this error measure and other quantities to evaluate the quality, performance, and scalability of our algorithm.
Collapse
|
10
|
He W, Wang J, Guo H, Wang KC, Shen HW, Raj M, Nashed YSG, Peterka T. InSituNet: Deep Image Synthesis for Parameter Space Exploration of Ensemble Simulations. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:23-33. [PMID: 31425097 DOI: 10.1109/tvcg.2019.2934312] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
We propose InSituNet, a deep learning based surrogate model to support parameter space exploration for ensemble simulations that are visualized in situ. In situ visualization, generating visualizations at simulation time, is becoming prevalent in handling large-scale simulations because of the I/O and storage constraints. However, in situ visualization approaches limit the flexibility of post-hoc exploration because the raw simulation data are no longer available. Although multiple image-based approaches have been proposed to mitigate this limitation, those approaches lack the ability to explore the simulation parameters. Our approach allows flexible exploration of parameter space for large-scale ensemble simulations by taking advantage of the recent advances in deep learning. Specifically, we design InSituNet as a convolutional regression model to learn the mapping from the simulation and visualization parameters to the visualization results. With the trained model, users can generate new images for different simulation parameters under various visualization settings, which enables in-depth analysis of the underlying ensemble simulations. We demonstrate the effectiveness of InSituNet in combustion, cosmology, and ocean simulations through quantitative and qualitative evaluations.
Collapse
|
11
|
Dutta S, Biswas A, Ahrens J. Multivariate Pointwise Information-Driven Data Sampling and Visualization. ENTROPY 2019; 21:e21070699. [PMID: 33267413 PMCID: PMC7515213 DOI: 10.3390/e21070699] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/08/2019] [Revised: 06/25/2019] [Accepted: 07/06/2019] [Indexed: 12/05/2022]
Abstract
With increasing computing capabilities of modern supercomputers, the size of the data generated from the scientific simulations is growing rapidly. As a result, application scientists need effective data summarization techniques that can reduce large-scale multivariate spatiotemporal data sets while preserving the important data properties so that the reduced data can answer domain-specific queries involving multiple variables with sufficient accuracy. While analyzing complex scientific events, domain experts often analyze and visualize two or more variables together to obtain a better understanding of the characteristics of the data features. Therefore, data summarization techniques are required to analyze multi-variable relationships in detail and then perform data reduction such that the important features involving multiple variables are preserved in the reduced data. To achieve this, in this work, we propose a data sub-sampling algorithm for performing statistical data summarization that leverages pointwise information theoretic measures to quantify the statistical association of data points considering multiple variables and generates a sub-sampled data that preserves the statistical association among multi-variables. Using such reduced sampled data, we show that multivariate feature query and analysis can be done effectively. The efficacy of the proposed multivariate association driven sampling algorithm is presented by applying it on several scientific data sets.
Collapse
|
12
|
Zhou F, Lin X, Liu C, Zhao Y, Xu P, Ren L, Xue T, Ren L. A survey of visualization for smart manufacturing. J Vis (Tokyo) 2018. [DOI: 10.1007/s12650-018-0530-2] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
13
|
Hazarika S, Dutta S, Shen HW, Chen JP. CoDDA: A Flexible Copula-based Distribution Driven Analysis Framework for Large-Scale Multivariate Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:1214-1224. [PMID: 30130206 DOI: 10.1109/tvcg.2018.2864801] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
CoDDA (Copula-based Distribution Driven Analysis) is a flexible framework for large-scale multivariate datasets. A common strategy to deal with large-scale scientific simulation data is to partition the simulation domain and create statistical data summaries. Instead of storing the high-resolution raw data from the simulation, storing the compact statistical data summaries results in reduced storage overhead and alleviated I/O bottleneck. Such summaries, often represented in the form of statistical probability distributions, can serve various post-hoc analysis and visualization tasks. However, for multivariate simulation data using standard multivariate distributions for creating data summaries is not feasible. They are either storage inefficient or are computationally expensive to be estimated in simulation time (in situ) for large number of variables. In this work, using copula functions, we propose a flexible multivariate distribution-based data modeling and analysis framework that offers significant data reduction and can be used in an in situ environment. The framework also facilitates in storing the associated spatial information along with the multivariate distributions in an efficient representation. Using the proposed multivariate data summaries, we perform various multivariate post-hoc analyses like query-driven visualization and sampling-based visualization. We evaluate our proposed method on multiple real-world multivariate scientific datasets. To demonstrate the efficacy of our framework in an in situ environment, we apply it on a large-scale flow simulation.
Collapse
|
14
|
Zhou F, Lin X, Luo X, Zhao Y, Chen Y, Chen N, Gui W. Visually enhanced situation awareness for complex manufacturing facility monitoring in smart factories. JOURNAL OF VISUAL LANGUAGES AND COMPUTING 2018. [DOI: 10.1016/j.jvlc.2017.11.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
15
|
Hazarika S, Biswas A, Shen HW. Uncertainty Visualization Using Copula-Based Analysis in Mixed Distribution Models. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:934-943. [PMID: 28866523 DOI: 10.1109/tvcg.2017.2744099] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Distributions are often used to model uncertainty in many scientific datasets. To preserve the correlation among the spatially sampled grid locations in the dataset, various standard multivariate distribution models have been proposed in visualization literature. These models treat each grid location as a univariate random variable which models the uncertainty at that location. Standard multivariate distributions (both parametric and nonparametric) assume that all the univariate marginals are of the same type/family of distribution. But in reality, different grid locations show different statistical behavior which may not be modeled best by the same type of distribution. In this paper, we propose a new multivariate uncertainty modeling strategy to address the needs of uncertainty modeling in scientific datasets. Our proposed method is based on a statistically sound multivariate technique called Copula, which makes it possible to separate the process of estimating the univariate marginals and the process of modeling dependency, unlike the standard multivariate distributions. The modeling flexibility offered by our proposed method makes it possible to design distribution fields which can have different types of distribution (Gaussian, Histogram, KDE etc.) at the grid locations, while maintaining the correlation structure at the same time. Depending on the results of various standard statistical tests, we can choose an optimal distribution representation at each location, resulting in a more cost efficient modeling without significantly sacrificing on the analysis quality. To demonstrate the efficacy of our proposed modeling strategy, we extract and visualize uncertain features like isocontours and vortices in various real world datasets. We also study various modeling criterion to help users in the task of univariate model selection.
Collapse
|