1
|
Huang T, Niu S, Zhang F, Wang B, Wang J, Liu G, Yao M. Correlating gene expression levels with transcription factor binding sites facilitates identification of key transcription factors from transcriptome data. Front Genet 2024; 15:1511456. [PMID: 39678374 PMCID: PMC11638204 DOI: 10.3389/fgene.2024.1511456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2024] [Accepted: 11/18/2024] [Indexed: 12/17/2024] Open
Abstract
Identification of key transcription factors from transcriptome data by correlating gene expression levels with transcription factor binding sites is important for transcriptome data analysis. In a typical scenario, we always set a threshold to filter the top ranked differentially expressed genes and top ranked transcription factor binding sites. However, correlation analysis of filtered data can often result in spurious correlations. In this study, we tested four methods for creating the gene expression inputs (ranked gene list) in the correlation analysis: star coordinate map transformation (START), expression differential score (ED), preferential expression measure (PEM), and the specificity measure (SPM). Then, Kendall's tau correlation statistical algorithms implementing the standard (STD), LINEAR, MIX-LINEAR, DENSITY-CURVE, and MIXED-DENSITY-CURVE weighting methods were used to identify key transcription factors. ED was identified as the optimal method for creating a ranked gene list from filtered expression data, which can address the "unable to detect negative correlation" fallacy presented by other methods. The MIXED-DENSITY-CURVE was the most sensitive for identifying transcription factors from the gene set and list in which only the top proportion was correlated. Ultimately, 644 transcription factor candidates were identified from the transcriptome data of 1,206 cell lines, six of which were validated by wet lab experiments. The Jinzer and Flaver software implementing these methods can be obtained from http://www.thua45/cn/flaver under a free academic license.
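For orientation, the unweighted Kendall's tau that the abstract calls the standard (STD) variant can be sketched as below. This is only an illustrative tau-a computation over two equally long score lists; the function name `kendall_tau_a` is hypothetical, and the sketch is not the Jinzer or Flaver implementation, nor does it cover the LINEAR or DENSITY-CURVE weightings.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Unweighted Kendall's tau-a for two equally long sequences of scores.

    Hypothetical helper for illustration; ties count as neither
    concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = 0
    discordant = 0
    # Compare every unordered pair of observations once.
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs
```

A value near 1 means the two rankings agree, and a value near -1 means they are reversed, which is the negative-correlation case the abstract says some gene-ranking methods fail to detect.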
Affiliation(s)
- Tinghua Huang
  - College of Animal Science and Technology, Yangtze University, Jingzhou, China
- Siqi Niu
  - College of Animal Science and Technology, Yangtze University, Jingzhou, China
- Fanghong Zhang
  - College of Animal Science and Technology, Yangtze University, Jingzhou, China
- Binyu Wang
  - College of Animal Science and Technology, Yangtze University, Jingzhou, China
- Jianwu Wang
  - College of Agriculture, Yangtze University, Jingzhou, China
- Guoping Liu
  - College of Animal Science and Technology, Yangtze University, Jingzhou, China
- Min Yao
  - College of Animal Science and Technology, Yangtze University, Jingzhou, China
|
2
|
Cibulski L, May T, Schmidt J, Kohlhammer J. COMPO*SED: Composite Parallel Coordinates for Co-Dependent Multi-Attribute Choices. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:4047-4061. [PMID: 35679374 DOI: 10.1109/tvcg.2022.3180899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
We propose Composite Parallel Coordinates, a novel parallel coordinates technique to effectively represent the interplay of component alternatives in a system. It builds upon a dedicated data model that formally describes the interaction of components. Parallel coordinates can help decision-makers identify the most preferred solution among a number of alternatives. Multi-component systems require one such multi-attribute choice for each component. Each of these choices might have side effects on the system's operability and performance, making them co-dependent. Common approaches employ complex multi-component models or involve back-and-forth iterations between single components until an acceptable compromise is reached. A simultaneous visual exploration across independently modeled but connected components is needed to make system design more efficient. Using dedicated layout and interaction strategies, our Composite Parallel Coordinates allow analysts to explore both individual properties of components as well as their interoperability and joint performance. We showcase the effectiveness of Composite Parallel Coordinates for co-dependent multi-attribute choices by means of three real-world scenarios from distinct application areas. In addition to the case studies, we reflect on observing two domain experts collaboratively working with the proposed technique and communicating along the way.
|
3
|
Vander Plas S, Ge Y, Unwin A, Hofmann H. Penguins Go Parallel: a grammar of graphics framework for generalized parallel coordinate plots. J Comput Graph Stat 2023. [DOI: 10.1080/10618600.2023.2195462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
|
4
|
Athawale TM, Johnson CR, Sane S, Pugmire D. Fiber Uncertainty Visualization for Bivariate Data With Parametric and Nonparametric Noise Models. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:613-623. [PMID: 36155460 DOI: 10.1109/tvcg.2022.3209424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Visualization and analysis of multivariate data and their uncertainty are top research challenges in data visualization. Constructing fiber surfaces is a popular technique for multivariate data visualization that generalizes the idea of level-set visualization for univariate data to multivariate data. In this paper, we present a statistical framework to quantify positional probabilities of fibers extracted from uncertain bivariate fields. Specifically, we extend the state-of-the-art Gaussian models of uncertainty for bivariate data to other parametric distributions (e.g., uniform and Epanechnikov) and more general nonparametric probability distributions (e.g., histograms and kernel density estimation) and derive corresponding spatial probabilities of fibers. In our proposed framework, we leverage Green's theorem for closed-form computation of fiber probabilities when bivariate data are assumed to have independent parametric and nonparametric noise. Additionally, we present a nonparametric approach combined with numerical integration to study the positional probability of fibers when bivariate data are assumed to have correlated noise. For uncertainty analysis, we visualize the derived probability volumes for fibers via volume rendering and extracting level sets based on probability thresholds. We present the utility of our proposed techniques via experiments on synthetic and simulation datasets.
|
5
|
Toward a taxonomy for 2D non-paired General Line Coordinates: a comprehensive survey. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2022. [DOI: 10.1007/s41060-022-00361-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
|
6
|
Bok J, Kim B, Seo J. Augmenting Parallel Coordinates Plots With Color-Coded Stacked Histograms. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:2563-2576. [PMID: 33201820 DOI: 10.1109/tvcg.2020.3038446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
We introduce Parallel Histogram Plot (PHP), a technique that overcomes the innate limitations of parallel coordinates plot (PCP) by attaching stacked-bar histograms with discrete color schemes to PCP. The color-coded histograms enable users to see an overview of the whole data without cluttering or scalability issues. Each rectangle in the PHP histograms is color coded according to the data ranking by a selected attribute. This color-coding scheme allows users to visually examine relationships between attributes, even between those that are displayed far apart, without repositioning or reordering axes. We adopt the Visual Information Seeking Mantra so that the polylines of the original PCP can be used to show details of a small number of selected items when the cluttering problem subsides. We also design interactions, such as a focus+context technique, to help users investigate small regions of interest in a space-efficient manner. We provide a real-world example in which PHP is effectively utilized compared with other visualizations, and we perform a controlled user study to evaluate the performance of PHP in helping users estimate the correlation between attributes. The results demonstrate that the performance of PHP was consistent in the estimation of correlations between two attributes regardless of the distance between them.
|
7
|
Weiskopf D. Uncertainty Visualization: Concepts, Methods, and Applications in Biological Data Visualization. FRONTIERS IN BIOINFORMATICS 2022; 2:793819. [PMID: 36304261 PMCID: PMC9580861 DOI: 10.3389/fbinf.2022.793819] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/12/2021] [Accepted: 01/14/2022] [Indexed: 11/23/2022] Open
Abstract
This paper provides an overview of uncertainty visualization in general, along with specific examples of applications in bioinformatics. Starting from a processing and interaction pipeline of visualization, components are discussed that are relevant for handling and visualizing uncertainty introduced with the original data and at later stages in the pipeline, which shows the importance of making the stages of the pipeline aware of uncertainty and allowing them to propagate uncertainty. We detail concepts and methods for visual mappings of uncertainty, distinguishing between explicit and implicit representations of distributions, different ways to show summary statistics, and combined or hybrid visualizations. The basic concepts are illustrated for several examples of graph visualization under uncertainty. Finally, this review paper discusses implications for the visualization of biological data and future research directions.
|
8
|
Zheng B, Sadlo F. Uncertainty in Continuous Scatterplots, Continuous Parallel Coordinates, and Fibers. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1819-1828. [PMID: 33048747 DOI: 10.1109/tvcg.2020.3030466] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
In this paper, we introduce uncertainty to continuous scatterplots and continuous parallel coordinates. We derive respective models, validate them with sampling-based brute-force schemes, and present acceleration strategies for their computation. At the same time, we show that our approach also lends itself to introducing uncertainty into the definition of fibers in bivariate data. Finally, we demonstrate the properties and the utility of our approach using specifically designed synthetic cases and simulated data.
|
9
|
Rapp T, Peters C, Dachsbacher C. Visual Analysis of Large Multivariate Scattered Data using Clustering and Probabilistic Summaries. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1580-1590. [PMID: 33048705 DOI: 10.1109/tvcg.2020.3030379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Rapidly growing data sizes of scientific simulations pose significant challenges for interactive visualization and analysis techniques. In this work, we propose a compact probabilistic representation to interactively visualize large scattered datasets. In contrast to previous approaches that represent blocks of volumetric data using probability distributions, we model clusters of arbitrarily structured multivariate data. In detail, we discuss how to efficiently represent and store a high-dimensional distribution for each cluster. We observe that it suffices to consider low-dimensional marginal distributions for two or three data dimensions at a time to employ common visual analysis techniques. Based on this observation, we represent high-dimensional distributions by combinations of low-dimensional Gaussian mixture models. We discuss the application of common interactive visual analysis techniques to this representation. In particular, we investigate several frequency-based views, such as density plots in 1D and 2D, density-based parallel coordinates, and a time histogram. We visualize the uncertainty introduced by the representation, discuss a level-of-detail mechanism, and explicitly visualize outliers. Furthermore, we propose a spatial visualization by splatting anisotropic 3D Gaussians for which we derive a closed-form solution. Lastly, we describe the application of brushing and linking to this clustered representation. Our evaluation on several large, real-world datasets demonstrates the scaling of our approach.
|
10
|
Masood TB, Hotz I. Continuous Histograms for Anisotropy of 2D Symmetric Piece-Wise Linear Tensor Fields. MATHEMATICS AND VISUALIZATION 2021:39-70. [DOI: 10.1007/978-3-030-56215-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
In this chapter we present an accurate derivation of the distribution of scalar invariants with quadratic behavior represented as continuous histograms. The anisotropy field, computed from a two-dimensional piece-wise linear tensor field, is used as an example and is discussed in all details. Histograms visualizing an approximation of the distribution of scalar values play an important role in visualization. They are used as an interface for the design of transfer-functions for volume rendering or feature selection in interactive interfaces. While there are standard algorithms to compute continuous histograms for piece-wise linear scalar fields, they are not directly applicable to tensor invariants with non-linear, often even non-convex behavior in cells when applying linear tensor interpolation. Our derivation is based on a sub-division of the mesh in triangles that exhibit a monotonic behavior. We compare the results to a naïve approach based on linear interpolation on the original mesh or the subdivision.
|
11
|
Zhou L, Rivinius M, Johnson CR, Weiskopf D. Photographic High-Dynamic-Range Scalar Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:2156-2167. [PMID: 32175863 PMCID: PMC8500312 DOI: 10.1109/tvcg.2020.2970522] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/10/2023]
Abstract
We propose a photographic method to show scalar values of high dynamic range (HDR) by color mapping for 2D visualization. We combine (1) tone-mapping operators that transform the data to the display range of the monitor while preserving perceptually important features, based on a systematic evaluation, and (2) simulated glares that highlight high-value regions. Simulated glares are effective for highlighting small areas (of a few pixels) that may not be visible with conventional visualizations; through a controlled perception study, we confirm that glare is preattentive. The usefulness of our overall photographic HDR visualization is validated through the feedback of expert users.
|
12
|
He W, Guo H, Shen HW, Peterka T. eFESTA: Ensemble Feature Exploration with Surface Density Estimates. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1716-1731. [PMID: 30418881 DOI: 10.1109/tvcg.2018.2879866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
We propose surface density estimate (SDE) to model the spatial distribution of surface features (isosurfaces, ridge surfaces, and streamsurfaces) in 3D ensemble simulation data. The inputs of SDE computation are surface features represented as polygon meshes, and no field datasets are required (e.g., scalar fields or vector fields). The SDE is defined as the kernel density estimate of the infinite set of points on the input surfaces and is approximated by accumulating the surface densities of triangular patches. We also propose an algorithm to guide the selection of a proper kernel bandwidth for SDE computation. An ensemble Feature Exploration method based on Surface densiTy EstimAtes (eFESTA) is then proposed to extract and visualize the major trends of ensemble surface features. For an ensemble of surface features, each surface is first transformed into a density field based on its contribution to the SDE, and the resulting density fields are organized into a hierarchical representation based on the pairwise distances between them. The hierarchical representation is then used to guide visual exploration of the density fields as well as the underlying surface features. We demonstrate the application of our method using isosurfaces in ensemble scalar fields, Lagrangian coherent structures in uncertain unsteady flows, and streamsurfaces in ensemble fluid flows.
|
13
|
Liu J, Gao Y, Shan G, Chi X. VASEM: visual analytics system for electron microscopy data bank. J Vis (Tokyo) 2019. [DOI: 10.1007/s12650-019-00597-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
14
|
Wang Y, Wang Z, Fu CW, Schmauder H, Deussen O, Weiskopf D. Image-Based Aspect Ratio Selection. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:840-849. [PMID: 30137008 DOI: 10.1109/tvcg.2018.2865266] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/08/2023]
Abstract
Selecting a good aspect ratio is crucial for effective 2D diagrams. There are several aspect ratio selection methods for function plots and line charts, but only few can handle general, discrete diagrams such as 2D scatter plots. However, these methods either lack a perceptual foundation or heavily rely on intermediate isoline representations, which depend on choosing the right isovalues and are time-consuming to compute. This paper introduces a general image-based approach for selecting aspect ratios for a wide variety of 2D diagrams, ranging from scatter plots and density function plots to line charts. Our approach is derived from Federer's co-area formula and a line integral representation that enable us to directly construct image-based versions of existing selection methods using density fields. In contrast to previous methods, our approach bypasses isoline computation, so it is faster to compute, while following the perceptual foundation to select aspect ratios. Furthermore, this approach is complemented by an anisotropic kernel density estimation to construct density fields, allowing us to more faithfully characterize data patterns, such as the subgroups in scatterplots or dense regions in time series. We demonstrate the effectiveness of our approach by quantitatively comparing to previous methods and revisiting a prior user study. Finally, we present extensions for ROI banking, multi-scale banking, and the application to image data.
|
15
|
Zhou L, Weiskopf D. Indexed-Points Parallel Coordinates Visualization of Multivariate Correlations. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:1997-2010. [PMID: 28459690 DOI: 10.1109/tvcg.2017.2698041] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/07/2023]
Abstract
We address the problem of visualizing multivariate correlations in parallel coordinates. We focus on multivariate correlation in the form of linear relationships between multiple variables. Traditional parallel coordinates are well prepared to show negative correlations between two attributes by distinct visual patterns. However, it is difficult to recognize positive correlations in parallel coordinates. Furthermore, there is no support to highlight multivariate correlations in parallel coordinates. In this paper, we exploit the indexed point representation of p-flats (planes in multidimensional data) to visualize local multivariate correlations in parallel coordinates. Our method yields clear visual signatures for negative and positive correlations alike, and it supports large datasets. All information is shown in a unified parallel coordinates framework, which leads to easy and familiar user interactions for analysts who have experience with traditional parallel coordinates. The usefulness of our method is demonstrated through examples of typical multidimensional datasets.
|
16
|
Nguyen H, Rosen P. DSPCP: A Data Scalable Approach for Identifying Relationships in Parallel Coordinates. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:1301-1315. [PMID: 28166499 DOI: 10.1109/tvcg.2017.2661309] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/06/2023]
Abstract
Parallel coordinates plots (PCPs) are a well-studied technique for exploring multi-attribute datasets. In many situations, users find them a flexible method to analyze and interact with data. Unfortunately, using PCPs becomes challenging as the number of data items grows large or multiple trends within the data mix in the visualization. The resulting overdraw can obscure important features. A number of modifications to PCPs have been proposed, including using color, opacity, smooth curves, frequency, density, and animation to mitigate this problem. However, these modified PCPs tend to have their own limitations in the kinds of relationships they emphasize. We propose a new data scalable design for representing and exploring data relationships in PCPs. The approach exploits the point/line duality property of PCPs and a local linear assumption of data to extract and represent relationship summarizations. This approach simultaneously shows relationships in the data and the consistency of those relationships. Our approach supports various visualization tasks, including mixed linear and nonlinear pattern identification, noise detection, and outlier detection, all in large data. We demonstrate these tasks on multiple synthetic and real-world datasets.
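The point/line duality mentioned in this abstract can be illustrated with a small sketch. Assuming two vertical axes placed at x = 0 and x = d (a common textbook convention, not taken from the paper), every data point (a, b) is drawn as a segment from (0, a) to (d, b), and all points on the line b = m·a + c with m ≠ 1 produce segments that pass through the single point (d / (1 − m), c / (1 − m)). The function names below are hypothetical, and this is not the DSPCP implementation.

```python
def dual_point(m, c, d=1.0):
    """Point in the plot plane shared by all segments of the line b = m*a + c.

    Hypothetical helper; assumes axes at x = 0 and x = d, and m != 1
    (for m == 1 the segments are parallel and never meet).
    """
    if m == 1:
        raise ValueError("segments are parallel when the slope is 1")
    return (d / (1 - m), c / (1 - m))


def segment_y(a, b, x, d=1.0):
    """Height at horizontal position x of the segment drawn for data point (a, b)."""
    return a + (x / d) * (b - a)


# Three data points on the line b = -a + 2 (a negative correlation).
m, c = -1.0, 2.0
px, py = dual_point(m, c)
heights = [segment_y(a, m * a + c, px) for a in (0.0, 1.0, 3.0)]
```

With a negative slope the shared point lies between the two axes, which matches the familiar crossing pattern of negatively correlated attributes; for 0 ≤ m < 1 or m > 1 it falls outside the axis pair, one reason positive correlations are harder to read.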
|
17
|
Dunn W, Burgun A, Krebs MO, Rance B. Exploring and visualizing multidimensional data in translational research platforms. Brief Bioinform 2017; 18:1044-1056. [PMID: 27585944 PMCID: PMC5862238 DOI: 10.1093/bib/bbw080] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 05/16/2016] [Revised: 07/30/2016] [Accepted: 08/03/2016] [Indexed: 01/20/2023] Open
Abstract
The unprecedented advances in technology and scientific research over the past few years have provided the scientific community with new and more complex forms of data. Large data sets collected from single groups or cross-institution consortiums containing hundreds of omic and clinical variables corresponding to thousands of patients are becoming increasingly commonplace in the research setting. Before any core analyses are performed, visualization often plays a key role in the initial phases of research, especially for projects where no initial hypotheses are dominant. Proper visualization of data at a high level facilitates researchers' ability to find trends, identify outliers and perform quality checks. In addition, research has uncovered the important role of visualization in data analysis and its implied benefits facilitating our understanding of disease and ultimately improving patient care. In this work, we present a review of the current landscape of existing tools designed to facilitate the visualization of multidimensional data in translational research platforms. Specifically, we reviewed the biomedical literature for translational platforms allowing the visualization and exploration of clinical and omics data, and identified 11 platforms: cBioPortal, interactive genomics patient stratification explorer, Igloo-Plot, The Georgetown Database of Cancer Plus, tranSMART, an unnamed data-cube-based model supporting heterogeneous data, Papilio, Caleydo Domino, Qlucore Omics, Oracle Health Sciences Translational Research Center and OmicsOffice® powered by TIBCO Spotfire. In a health sector continuously witnessing an increase in data from multifarious sources, visualization tools used to better grasp these data will grow in their importance, and we believe our work will be useful in guiding investigators in similar situations.
Affiliation(s)
- William Dunn
  - Inserm University Paris Descartes UMR_S894 Centre de Psychiatrie et Neurosciences Laboratoire de Physiopathologie des maladies Psychiatriques, Paris, France
- Anita Burgun
  - University Hospital Georges Pompidou (HEGP); AP-HP, Paris, France; INSERM; UMRS1138, Paris Descartes University, Paris, France
- Marie-Odile Krebs
  - Inserm University Paris Descartes UMR_S894 Centre de Psychiatrie et Neurosciences Laboratoire de Physiopathologie des maladies Psychiatriques, Paris, France
  - Université Paris Descartes, Faculté de Médecine Paris Descartes, Service Hospitalo Universitaire, Centre Hospitalier Sainte-Anne, CNRS GDR 3557 – Institut de Psychiatrie, Paris, France
- Bastien Rance
  - University Hospital Georges Pompidou (HEGP); AP-HP, Paris, France; INSERM; UMRS1138, Paris Descartes University, Paris, France
|
18
|
Liu S, Maljovec D, Wang B, Bremer PT, Pascucci V. Visualizing High-Dimensional Data: Advances in the Past Decade. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:1249-1268. [PMID: 28113321 DOI: 10.1109/tvcg.2016.2640960] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Indexed: 05/20/2023]
Abstract
Massive simulations and arrays of sensing devices, in combination with increasing computing resources, have generated large, complex, high-dimensional datasets used to study phenomena across numerous fields of study. Visualization plays an important role in exploring such datasets. We provide a comprehensive survey of advances in high-dimensional data visualization that focuses on the past decade. We aim at providing guidance for data practitioners to navigate through a modular view of the recent advances, inspiring the creation of new visualizations along the enriched visualization pipeline, and identifying future opportunities for visualization research.
|
19
|
Zhou L, Hansen CD. A Survey of Colormaps in Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2016; 22:2051-69. [PMID: 26513793 PMCID: PMC4959790 DOI: 10.1109/tvcg.2015.2489649] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Indexed: 05/23/2023]
Abstract
Colormaps are a vital method for users to gain insights into data in a visualization. With a good choice of colormaps, users are able to acquire information in the data more effectively and efficiently. In this survey, we attempt to provide readers with a comprehensive review of colormap generation techniques and provide readers a taxonomy which is helpful for finding appropriate techniques to use for their data and applications. Specifically, we first briefly introduce the basics of color spaces including color appearance models. In the core of our paper, we survey colormap generation techniques, including the latest advances in the field by grouping these techniques into four classes: procedural methods, user-study based methods, rule-based methods, and data-driven methods; we also include a section on methods that are beyond pure data comprehension purposes. We then classify colormapping techniques into a taxonomy for readers to quickly identify the appropriate techniques they might use. Furthermore, a representative set of visualization techniques that explicitly discuss the use of colormaps is reviewed and classified based on the nature of the data in these applications. Our paper is also intended to be a reference of colormap choices for readers when they are faced with similar data and/or tasks.
Affiliation(s)
- Liang Zhou
  - Visualisierungsinstitut, Universität Stuttgart (VISUS), Stuttgart, Germany
- Charles D. Hansen
  - Scientific Computing and Imaging Institute and the School of Computing, University of Utah, Salt Lake City, UT 84112
|
20
|
Palomo C, Guo Z, Silva CT, Freire J. Visually Exploring Transportation Schedules. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2016; 22:170-179. [PMID: 26529697 DOI: 10.1109/tvcg.2015.2467592] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
Public transportation schedules are designed by agencies to optimize service quality under multiple constraints. However, real service usually deviates from the plan. Therefore, transportation analysts need to identify, compare and explain both eventual and systemic performance issues that must be addressed so that better timetables can be created. The purely statistical tools commonly used by analysts pose many difficulties due to the large number of attributes at trip- and station-level for planned and real service. Also challenging is the need for models at multiple scales to search for patterns at different times and stations, since analysts do not know exactly where or when relevant patterns might emerge and need to compute statistical summaries for multiple attributes at different granularities. To aid in this analysis, we worked in close collaboration with a transportation expert to design TR-EX, a visual exploration tool developed to identify, inspect and compare spatio-temporal patterns for planned and real transportation service. TR-EX combines two new visual encodings inspired by Marey's Train Schedule: Trips Explorer for trip-level analysis of frequency, deviation and speed; and Stops Explorer for station-level study of delay, wait time, reliability and performance deficiencies such as bunching. To tackle overplotting and to provide a robust representation for large numbers of trips and stops at multiple scales, the system supports variable kernel bandwidths to achieve the level of detail required by users for different tasks. We justify our design decisions based on specific analysis needs of transportation analysts. We provide anecdotal evidence of the efficacy of TR-EX through a series of case studies that explore NYC subway service, which illustrate how TR-EX can be used to confirm hypotheses and derive new insights through visual exploration.
|
21
|
Stolte C, Sabir KS, Heinrich J, Hammang CJ, Schafferhans A, O'Donoghue SI. Integrated visual analysis of protein structures, sequences, and feature data. BMC Bioinformatics 2015; 16 Suppl 11:S7. [PMID: 26329268 PMCID: PMC4547178 DOI: 10.1186/1471-2105-16-s11-s7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND To understand the molecular mechanisms that give rise to a protein's function, biologists often need to (i) find and access all related atomic-resolution 3D structures, and (ii) map sequence-based features (e.g., domains, single-nucleotide polymorphisms, post-translational modifications) onto these structures. RESULTS To streamline these processes we recently developed Aquaria, a resource offering unprecedented access to protein structure information based on an all-against-all comparison of SwissProt and PDB sequences. In this work, we provide a requirements analysis for several frequently occurring tasks in molecular biology and describe how design choices in Aquaria meet these requirements. Finally, we show how the interface can be used to explore features of a protein and gain biologically meaningful insights in two case studies conducted by domain experts. CONCLUSIONS The user interface design of Aquaria enables biologists to gain unprecedented access to molecular structures and simplifies the generation of insight. The tasks involved in mapping sequence features onto structures can be conducted more easily and quickly using Aquaria.
Affiliation(s)
- Kenneth S Sabir
- The Garvan Institute of Medical Research, Sydney, Australia
- The University of Sydney, Sydney, Australia
- Seán I O'Donoghue
- CSIRO, Sydney, Australia
- The Garvan Institute of Medical Research, Sydney, Australia
- The University of Sydney, Sydney, Australia
|
22
|
Heinrich J, Weiskopf D. Parallel Coordinates for Multidimensional Data Visualization: Basic Concepts. Comput Sci Eng 2015. [DOI: 10.1109/mcse.2015.55] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/07/2022]
|
23
|
Trimm D, Rheingans P, desJardins M. Visualizing Student Histories Using Clustering and Composition. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2012; 18:2809-2818. [PMID: 26357190 DOI: 10.1109/tvcg.2012.288] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/05/2023]
Abstract
While intuitive time-series visualizations exist for common datasets, student course history data is difficult to represent using traditional visualization techniques due to its concurrent nature. A visual composition process is developed and applied to reveal trends across various groupings. By working closely with educators, analytic strategies and techniques are developed to leverage the visualization composition to reveal unknown trends in the data. Furthermore, clustering algorithms are developed to group common course-grade histories for further analysis. Lastly, variations of the composition process are implemented to reveal subtle differences in the underlying data. These analytic tools and techniques enabled educators to confirm expected trends and to discover new ones.
Affiliation(s)
- D Trimm
- University of Maryland, Baltimore County (UMBC), USA.
|
24
|
Harter JM, Wu X, Alabi OS, Phadke M, Pinto L, Dougherty D, Petersen H, Bass S, Taylor RM. Increasing the perceptual salience of relationships in parallel coordinate plots. PROCEEDINGS OF SPIE--THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING 2012; 8294:82940T. [PMID: 23145217 DOI: 10.1117/12.907486] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022]
Abstract
We present three extensions to parallel coordinates that increase the perceptual salience of relationships between axes in multivariate data sets: (1) luminance modulation maintains the ability to preattentively detect patterns in the presence of overplotting, (2) adding a one-vs.-all variable display highlights relationships between one variable and all others, and (3) adding a scatter plot within the parallel-coordinates display preattentively highlights clusters and spatial layouts without strongly interfering with the parallel-coordinates display. These techniques can be combined with one another and with existing extensions to parallel coordinates, and two of them generalize beyond cases with known-important axes. We applied these techniques to two real-world data sets (relativistic heavy-ion collision hydrodynamics and weather observations with statistical principal component analysis) as well as the popular car data set. We present relationships discovered in the data sets using these methods.
Affiliation(s)
- Jonathan M Harter
- Computer Science, University of North Carolina, Chapel Hill, NC, USA
|
25
|
Guo H, Xiao H, Yuan X. Scalable Multivariate Volume Visualization and Analysis Based on Dimension Projection and Parallel Coordinates. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2012; 18:1397-1410. [PMID: 22411886 DOI: 10.1109/tvcg.2012.80] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 05/31/2023]
Abstract
In this paper, we present an effective and scalable system for multivariate volume data visualization and analysis with a novel transfer function interface design that tightly couples parallel coordinates plots (PCP) and MDS-based dimension projection plots. In our system, the PCP visualizes the data distribution of each variate (dimension) and the MDS plots project features. They are integrated seamlessly to provide flexible feature classification without context switching between different data presentations during the user interaction. The proposed interface enables users to identify relevant correlation clusters and assign optical properties with lassos, magic wand, and other tools. Furthermore, direct sketching on the volume rendered images has been implemented to probe and edit features. With our system, users can interactively analyze multivariate volumetric data sets by navigating and exploring feature spaces in unified PCP and MDS plots. To further support large-scale multivariate volume data visualization and analysis, Scalable Pivot MDS (SPMDS), parallel adaptive continuous PCP rendering, as well as parallel rendering techniques are developed and integrated into our visualization system. Our experiments show that the system is effective in multivariate volume data visualization and its performance is highly scalable for data sets with different sizes and number of variates.
|
26
|
Hasenauer J, Heinrich J, Doszczak M, Scheurich P, Weiskopf D, Allgöwer F. A visual analytics approach for models of heterogeneous cell populations. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2012; 2012:4. [PMID: 22651376 PMCID: PMC3403928 DOI: 10.1186/1687-4153-2012-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/08/2012] [Accepted: 05/31/2012] [Indexed: 01/26/2023]
Abstract
In recent years, cell population models have become increasingly common. In contrast to classic single cell models, population models allow for the study of cell-to-cell variability, a crucial phenomenon in most populations of primary cells, cancer cells, and stem cells. Unfortunately, tools for in-depth analysis of population models are still missing. This problem originates from the complexity of population models. Particularly important are methods to determine the source of heterogeneity (e.g., genetic or epigenetic differences) and to select potential (bio-)markers. We propose an analysis based on visual analytics to tackle this problem. Our approach combines parallel-coordinates plots, used for a visual assessment of the high-dimensional dependencies, and nonlinear support vector machines, for the quantification of effects. The method can be employed to study qualitative and quantitative differences among cells. To illustrate the different components, we perform a case study using the proapoptotic signal transduction pathway involved in cellular apoptosis.
Affiliation(s)
- Jan Hasenauer
- Institute for Systems Theory and Automatic Control, University of Stuttgart, Pfaffenwaldring 9, 70569 Stuttgart, Germany.
|
27
|
Burch M, Vehlow C, Beck F, Diehl S, Weiskopf D. Parallel edge splatting for scalable dynamic graph visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2011; 17:2344-2353. [PMID: 22034355 DOI: 10.1109/tvcg.2011.226] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 05/31/2023]
Abstract
We present a novel dynamic graph visualization technique based on node-link diagrams. The graphs are drawn side by side from left to right as a sequence of narrow stripes that are placed perpendicular to the horizontal time line. The hierarchically organized vertices of the graphs are arranged on vertical, parallel lines that bound the stripes; directed edges connect these vertices from left to right. To address massive overplotting of edges in huge graphs, we employ a splatting approach that transforms the edges to a pixel-based scalar field. This field represents the edge densities in a scalable way and is depicted by non-linear color mapping. The visualization method is complemented by interaction techniques that support data exploration by aggregation, filtering, brushing, and selective data zooming. Furthermore, we formalize graph patterns so that they can be interactively highlighted on demand. A case study on software releases explores the evolution of call graphs extracted from the JUnit open source software project. In a second application, we demonstrate the scalability of our approach by applying it to a bibliography dataset containing more than 1.5 million paper titles from 60 years of research history producing a vast amount of relations between title words.
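The splatting idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names `splat_edges` and `log_density`, the grid size, the one-sample-per-column rasterization, and the logarithmic mapping (standing in for "non-linear color mapping") are all assumptions made for illustration. Each directed edge runs from a vertex on the left axis of one stripe to a vertex on the right axis, and every pixel it crosses has its density incremented.

```python
import math


def splat_edges(edges, n_vertices, width=32, height=64):
    """Accumulate straight edges of one stripe into a pixel density field.

    edges: iterable of (source_index, target_index) pairs with indices in
    [0, n_vertices - 1]; sources sit on the left axis (column 0) and targets
    on the right axis (last column). width must be at least 2.
    Returns a height x width list of lists of edge counts.
    """
    field = [[0.0] * width for _ in range(height)]
    # Map vertex indices onto pixel rows (hypothetical even spacing).
    scale = (height - 1) / max(n_vertices - 1, 1)
    for src, dst in edges:
        for col in range(width):
            t = col / (width - 1)  # 0 on the left axis, 1 on the right axis
            y = ((1 - t) * src + t * dst) * scale
            field[int(round(y))][col] += 1.0  # one sample per column
    return field


def log_density(field):
    """Normalize densities to [0, 1] with a logarithmic (non-linear) mapping."""
    peak = max(max(row) for row in field)
    if peak == 0:
        return [row[:] for row in field]
    return [[math.log1p(v) / math.log1p(peak) for v in row] for row in field]
```

Because overlapping edges add up instead of occluding each other, the resulting field stays readable at edge counts where individually drawn lines would saturate the stripe; the logarithmic step keeps sparse regions visible next to very dense ones.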
|
28
|
Lehmann DJ, Theisel H. Features in Continuous Parallel Coordinates. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2011; 17:1912-1921. [PMID: 22034308 DOI: 10.1109/tvcg.2011.200] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/31/2023]
Abstract
Continuous Parallel Coordinates (CPC) are a contemporary visualization technique for combining several scalar fields given over a common domain. They facilitate a continuous view for parallel coordinates by considering a smooth scalar field instead of a finite number of straight lines. We show that there are feature curves in CPC which appear to be the dominant structures of a CPC. We present methods to extract and classify them and demonstrate their usefulness to enhance the visualization of CPCs. In particular, we show that these feature curves are related to discontinuities in Continuous Scatterplots (CSP). We show this by exploiting a curve-curve duality between parallel and Cartesian coordinates, which is a generalization of the well-known point-line duality. Furthermore, we illustrate the theoretical considerations. Finally, we discuss relations and aspects of the CPC's/CSP's features concerning the data analysis.
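The point-line duality that this abstract generalizes can be checked numerically. The sketch below covers only the classical two-axis case, not the curve-curve duality of the paper; the function names and the axis spacing `d` are assumptions chosen for illustration. A Cartesian point (x1, x2) becomes the segment from (0, x1) on the first axis to (d, x2) on the second, and all points on a line x2 = m*x1 + b with m != 1 produce segments whose supporting lines meet at the single point (d / (1 - m), b / (1 - m)).

```python
def to_parallel_segment(x1, x2, d=1.0):
    """Map a Cartesian point to its parallel-coordinates segment.

    The first axis is placed at horizontal position 0, the second at d.
    """
    return (0.0, x1), (d, x2)


def height_at(segment, t):
    """Height of the supporting line of a segment at horizontal position t."""
    (t0, y0), (t1, y1) = segment
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)


def dual_point(m, b, d=1.0):
    """Parallel-coordinates point dual to the Cartesian line x2 = m*x1 + b."""
    if m == 1:
        # Segments of slope-1 lines are parallel, so they never intersect.
        raise ValueError("lines with slope 1 have no finite dual point")
    return d / (1 - m), b / (1 - m)


# Usage: every point on x2 = -x1 + 2 passes through the same dual point.
t_star, y_star = dual_point(m=-1.0, b=2.0)
for x1 in (0.0, 1.0, 5.0):
    segment = to_parallel_segment(x1, -x1 + 2.0)
    assert abs(height_at(segment, t_star) - y_star) < 1e-12
```

For negative slopes the dual point falls between the two axes, which is why negatively correlated variables show the familiar crossing pattern in a parallel-coordinates plot; for slopes above 1 it lies to the left of the first axis.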
Affiliation(s)
- Dirk J Lehmann
- Department of Simulation and Graphics, University of Magdeburg, Germany.
|
29
|
Feng D, Kwock L, Lee Y, Taylor RM. Matching visual saliency to confidence in plots of uncertain data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2010; 16:980-989. [PMID: 20975135 PMCID: PMC3179257 DOI: 10.1109/tvcg.2010.176] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 05/29/2023]
Abstract
Conveying data uncertainty in visualizations is crucial for preventing viewers from drawing conclusions based on untrustworthy data points. This paper proposes a methodology for efficiently generating density plots of uncertain multivariate data sets that draws viewers to preattentively identify values of high certainty while not calling attention to uncertain values. We demonstrate how to augment scatter plots and parallel coordinates plots to incorporate statistically modeled uncertainty and show how to integrate them with existing multivariate analysis techniques, including outlier detection and interactive brushing. Computing high quality density plots can be expensive for large data sets, so we also describe a probabilistic plotting technique that summarizes the data without requiring explicit density plot computation. These techniques have been useful for identifying brain tumors in multivariate magnetic resonance spectroscopy data and we describe how to extend them to visualize ensemble data sets.
Affiliation(s)
- David Feng
- University of North Carolina at Chapel Hill
- Yueh Lee
- UNC Hospital Department of Radiology
|
30
|
Paulovich FV, Silva CT, Nonato LG. Two-phase mapping for projecting massive data sets. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2010; 16:1281-1290. [PMID: 20975168 DOI: 10.1109/tvcg.2010.207] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 05/30/2023]
Abstract
Most multidimensional projection techniques rely on distance (dissimilarity) information between data instances to embed high-dimensional data into a visual space. When data are endowed with Cartesian coordinates, an extra computational effort is necessary to compute the needed distances, making multidimensional projection prohibitive in applications dealing with interactivity and massive data. The novel multidimensional projection technique proposed in this work, called Part-Linear Multidimensional Projection (PLMP), has been tailored to handle multivariate data represented in Cartesian high-dimensional spaces, requiring only distance information between pairs of representative samples. This characteristic renders PLMP faster than previous methods when processing large data sets while still being competitive in terms of precision. Moreover, knowing the range of variation for data instances in the high-dimensional space, we can make PLMP a truly streaming data projection technique, a trait absent in previous methods.
|
31
|
Lehmann DJ, Theisel H. Discontinuities in continuous scatter plots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2010; 16:1291-1300. [PMID: 20975169 DOI: 10.1109/tvcg.2010.146] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/30/2023]
Abstract
The concept of continuous scatterplot (CSP) is a modern visualization technique. The idea is to define a scalar density value based on the map between an n-dimensional spatial domain and an m-dimensional data domain, which describe the CSP space. Usually the data domain is two-dimensional to visually convey the underlying, density coded, data. In this paper we investigate kinds of map-based discontinuities, especially for the practical cases n = m = 2 and n = 3 | m = 2, and we depict relations between them and attributes of the resulting CSP itself. Additionally, we show that discontinuities build critical line structures, and we introduce algorithms to detect them. Further, we introduce a discontinuity-based visualization approach—called contribution map (CM)—which establishes a relationship between the CSP's data domain and the number of connected components in the spatial domain. We show that CMs enhance the CSP-based linking & brushing interaction. Finally, we apply our approaches to a number of synthetic as well as real data sets.
|
32
|
Dang TN, Wilkinson L, Anand A. Stacking graphic elements to avoid over-plotting. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2010; 16:1044-1052. [PMID: 20975142 DOI: 10.1109/tvcg.2010.197] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 05/30/2023]
Abstract
An ongoing challenge for information visualization is how to deal with over-plotting forced by ties or the relatively limited visual field of display devices. A popular solution is to represent local data density with area (bubble plots, treemaps), color (heatmaps), or aggregation (histograms, kernel densities, pixel displays). All of these methods have at least one of three deficiencies: 1) magnitude judgments are biased because area and color have convex downward perceptual functions, 2) area, hue, and brightness have relatively restricted ranges of perceptual intensity compared to length representations, and/or 3) it is difficult to brush or link to individual cases when viewing aggregations. In this paper, we introduce a new technique for visualizing and interacting with datasets that preserves density information by stacking overlapping cases. The overlapping data can be points or lines or other geometric elements, depending on the type of plot. We show real-dataset applications of this stacking paradigm and compare them to other techniques that deal with over-plotting in high-dimensional displays.
Affiliation(s)
- Tuan Nhon Dang
- Department of Computer Science, University of Illinois at Chicago, USA.
|