1. Han D, Parsad G, Kim H, Shim J, Kwon OS, Son KA, Lee J, Cho I, Ko S. HisVA: A Visual Analytics System for Studying History. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:4344-4359. [PMID: 34086573 DOI: 10.1109/tvcg.2021.3086414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Studying history involves many difficult tasks. Examples include searching for proper data in a large event space, understanding stories of historical events by time and space, and finding relationships among events that may not be apparent. Instructors who extensively use well-organized and well-argued materials (e.g., textbooks and online resources) can lead students to a narrow perspective in understanding history and prevent spontaneous investigation of historical events, with the students asking their own questions. In this article, we propose HisVA, a visual analytics system that allows the efficient exploration of historical events from Wikipedia using three views: event, map, and resource. HisVA provides an effective event exploration space, where users can investigate relationships among historical events by reviewing and linking them in terms of space and time. To evaluate our system, we present two usage scenarios, a user study with a qualitative analysis of user exploration strategies, and in-class deployment results.
2. Meinecke C, Wrisley DJ, Janicke S. Explaining Semi-Supervised Text Alignment Through Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:4797-4809. [PMID: 34406941 DOI: 10.1109/tvcg.2021.3105899] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
The analysis of variance in complex text traditions is an arduous task when carried out manually. Text alignment algorithms provide domain experts with a robust alternative to such repetitive tasks. Existing white-box approaches allow the digital humanities to establish syntax-based metrics taking into account the spelling, morphology and order of words. However, they produce limited results, as semantic meanings are typically not taken into account. Our interdisciplinary collaboration between visualization and digital humanities combined a semi-supervised text alignment approach based on word embeddings that take not only syntactic but also semantic text features into account, thereby improving the overall quality of the alignment. In our collaboration, we developed different visual interfaces that communicate the word distribution in high-dimensional vector space generated by the underlying neural network for increased transparency, assessment of the tool's reliability and overall improved hypothesis generation. We further offer visual means to enable the expert reader to feed domain knowledge into the system at multiple levels with the aim of improving both the product and the process of text alignment. This ultimately illustrates how visualization can engage with and augment complex modes of reading in the humanities.
3. Heimerl F, Kralj C, Moller T, Gleicher M. embComp: Visual Interactive Comparison of Vector Embeddings. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:2953-2969. [PMID: 33347410 DOI: 10.1109/tvcg.2020.3045918] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
This article introduces embComp, a novel approach for comparing two embeddings that capture the similarity between objects, such as word and document embeddings. We survey scenarios where comparing these embedding spaces is useful. From those scenarios, we derive common tasks, introduce visual analysis methods that support these tasks, and combine them into a comprehensive system. Among embComp's central features are overview visualizations based on metrics that measure differences in the local structure around objects. Summarizing these local metrics over the embeddings provides global overviews of similarities and differences. Detail views allow comparing the local structure around selected objects and relating this local information to the global views. Integrating and connecting all of these components, embComp supports a range of analysis workflows that help users understand similarities and differences between embedding spaces. We assess our approach by applying it in several use cases, including understanding corpus differences via word vector embeddings and understanding algorithmic differences in generating embeddings.
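The abstract does not spell out which local-structure metrics embComp uses. As a minimal sketch under that caveat, one common choice is the Jaccard overlap of an object's k nearest neighbours in the two spaces, averaged over all shared objects for a global overview. The function names, the cosine similarity, the value of k, and the toy 2-D vectors below are all assumptions for illustration, not embComp's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def knn(space, word, k):
    """Set of the k objects closest to `word` (excluding itself) by cosine similarity."""
    others = sorted((w for w in space if w != word),
                    key=lambda w: cosine(space[word], space[w]), reverse=True)
    return set(others[:k])


def neighborhood_overlap(space_a, space_b, word, k=2):
    """Hypothetical local metric: Jaccard overlap of `word`'s k-NN sets in both spaces."""
    shared = [w for w in space_a if w in space_b]  # compare only the common vocabulary
    na = knn({w: space_a[w] for w in shared}, word, k)
    nb = knn({w: space_b[w] for w in shared}, word, k)
    return len(na & nb) / len(na | nb)


def global_similarity(space_a, space_b, k=2):
    """Hypothetical global overview: mean local overlap across all shared objects."""
    shared = [w for w in space_a if w in space_b]
    return sum(neighborhood_overlap(space_a, space_b, w, k) for w in shared) / len(shared)


# Toy 2-D "embeddings"; in space_b, "prince" has drifted toward the fruit words.
space_a = {"king": (1.0, 0.1), "queen": (0.9, 0.2), "prince": (0.8, 0.3),
           "apple": (0.1, 1.0), "pear": (0.2, 0.9)}
space_b = {"king": (1.0, 0.1), "queen": (0.9, 0.2), "prince": (0.1, 0.9),
           "apple": (0.2, 1.0), "pear": (0.3, 0.9)}

print(neighborhood_overlap(space_a, space_b, "king"))  # 1/3: only "queen" is shared
print(global_similarity(space_a, space_b))
```

A score of 1.0 means an object keeps the same neighbourhood in both spaces; low scores flag objects worth inspecting in a detail view.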
4. Hybrid multi-document text summarization via categorization based on BERT deep learning models. Int J Health Sci (Qassim) 2022. [DOI: 10.53730/ijhs.v6ns1.6095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Text summarization is the process of employing a system to shorten a document or a collection of documents into brief paragraphs or sentences using various approaches. This paper presents text categorization using BERT, a state-of-the-art deep learning language processing model that performs significantly better than previous language models, to improve the summarization task. Multi-document summarization (MDS) is bottlenecked by a lack of training data and by the varied categories of documents. To address this, the proposed novel hybrid summarization framework, B-HEATS (BERT-based Hybrid Extractive Abstractive Text Summarization), combines an extractive summary via categorization with an abstractive summary that uses an RNN-LSTM-CNN deep learning architecture to fine-tune BERT; this yields high-quality summaries for multiple documents and overcomes the out-of-vocabulary (OOV) problem. The output layer of BERT is replaced with the RNN-LSTM-CNN architecture for fine-tuning, which improves the summarization model. Compared with existing models on ROUGE metrics, the proposed automatic text summarization achieves high scores on the benchmark DUC datasets: an R1 score of 43.61, an R2 score of 22.64, an R3 score of 44.95, and an RL score of 44.27.
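The reported scores come from the ROUGE family of metrics, which count n-gram overlap between a generated summary and a reference summary. The sketch below is a simplified ROUGE-N, assuming whitespace tokenization, lowercasing, and a single reference; the official ROUGE toolkit adds options such as stemming and multiple references, so this sketch is not expected to reproduce the paper's numbers.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N against one reference; returns (recall, precision, f1)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter '&' keeps the minimum count of each n-gram, i.e. clipped matches.
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * recall * precision / (recall + precision)
    return recall, precision, f1


print(rouge_n("the cat sat on the mat", "the cat sat on a mat", n=1))
print(rouge_n("the cat sat on the mat", "the cat sat on a mat", n=2))
```

Clipping matters: the candidate's two occurrences of "the" match only the single "the" in the reference, so repeating words cannot inflate the score.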
5. Narechania A, Karduni A, Wesslen R, Wall E. VITALITY: Promoting Serendipitous Discovery of Academic Literature with Transformers & Visual Analytics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:486-496. [PMID: 34587054 DOI: 10.1109/tvcg.2021.3114820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
There are a few prominent practices for conducting reviews of academic literature, including searching for specific keywords on Google Scholar or checking citations from some initial seed paper(s). These approaches serve a critical purpose for academic literature reviews, yet there remain challenges in identifying relevant literature when similar work may utilize different terminology (e.g., mixed-initiative visual analytics papers may not use the same terminology as papers on model-steering, yet the two topics are relevant to one another). In this paper, we introduce a system, VITALITY, intended to complement existing practices. In particular, VITALITY promotes serendipitous discovery of relevant literature using transformer language models, allowing users to find semantically similar papers in a word embedding space given (1) a list of input paper(s) or (2) a working abstract. VITALITY visualizes this document-level embedding space in an interactive 2-D scatterplot using dimension reduction. VITALITY also summarizes meta information about the document corpus or search query, including keywords and co-authors, and allows users to save and export papers for use in a literature review. We present qualitative findings from an evaluation of VITALITY, suggesting it can be a promising complementary technique for conducting academic literature reviews. Furthermore, we contribute data from 38 popular data visualization publication venues in VITALITY, and we provide scrapers for the open-source community to continue to grow the list of supported venues.
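The two query modes described above can be pictured as a nearest-neighbour search over document embeddings. The following is a minimal sketch, assuming the embeddings are already computed (VITALITY derives them from transformer language models, which are not reproduced here), that several seed papers are merged by averaging their vectors, and that ranking uses cosine similarity. All function names, paper ids, and vectors are hypothetical, and the averaging step is an assumption rather than VITALITY's documented behaviour.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_vector(vectors):
    """Element-wise mean, used here to merge several seed papers into one query."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def rank_similar(corpus, query_vec, top_k=3, exclude=()):
    """Ids of the top_k corpus papers most cosine-similar to query_vec."""
    scored = [(cosine(query_vec, vec), pid)
              for pid, vec in corpus.items() if pid not in exclude]
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:top_k]]


# Hypothetical precomputed document embeddings.
corpus = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.9, 0.1, 0.0],
    "paper_c": [0.0, 1.0, 0.0],
    "paper_d": [0.7, 0.0, 0.7],
}

# Mode (1): seed papers -> averaged query; the seeds themselves are excluded.
seeds = ["paper_a"]
print(rank_similar(corpus, mean_vector([corpus[s] for s in seeds]), top_k=2, exclude=seeds))
# Mode (2): an already-embedded working abstract used directly as the query.
print(rank_similar(corpus, [0.1, 0.9, 0.0], top_k=1))
```

Because similarity is computed in embedding space rather than by keyword match, a paper can rank highly even if it uses different terminology from the seeds, which is the serendipity the abstract targets.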
6. Knittel J, Koch S, Tang T, Chen W, Wu Y, Liu S, Ertl T. Real-Time Visual Analysis of High-Volume Social Media Posts. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:879-889. [PMID: 34587041 DOI: 10.1109/tvcg.2021.3114800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Breaking news and first-hand reports often trend on social media platforms before traditional news outlets cover them. The real-time analysis of posts on such platforms can reveal valuable and timely insights for journalists, politicians, business analysts, and first responders, but the high number and diversity of new posts pose a challenge. In this work, we present an interactive system that enables the visual analysis of streaming social media data at large scale in real time. We propose an efficient and explainable dynamic clustering algorithm that powers a continuously updated visualization of the current thematic landscape as well as detailed visual summaries of specific topics of interest. Our parallel clustering strategy provides an adaptive stream with a digestible but diverse selection of recent posts related to relevant topics. We also integrate familiar, highly interlinked visual metaphors that enable both exploratory and more focused monitoring tasks. Analysts can gradually increase the resolution to dive deeper into particular topics. In contrast to previous work, our system also works with non-geolocated posts and avoids extensive preprocessing such as event detection. We evaluate our dynamic clustering algorithm and discuss several use cases that show the utility of our system.
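The abstract does not detail the authors' dynamic clustering algorithm. As a generic illustration of clustering a stream incrementally (explicitly not the paper's method), the sketch below uses threshold-based online clustering: each incoming post vector joins the most similar existing cluster if the cosine similarity clears a threshold, otherwise it starts a new cluster, and centroids are updated as running means. It assumes posts have already been converted to non-zero vectors; the class name and threshold value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


class OnlineClusterer:
    """Hypothetical threshold-based online clustering of post vectors."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.centroids = []  # running-mean vector per cluster
        self.sizes = []      # number of posts assigned to each cluster

    def add(self, vec):
        """Assign one incoming vector to a cluster and return the cluster index."""
        best, best_sim = None, -1.0
        for i, c in enumerate(self.centroids):
            sim = cosine(vec, c)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < self.threshold:
            # Nothing similar enough yet: the post seeds a new cluster.
            self.centroids.append(list(vec))
            self.sizes.append(1)
            return len(self.centroids) - 1
        # Incremental mean update, so no past posts need to be stored.
        n = self.sizes[best] + 1
        self.centroids[best] = [c + (x - c) / n for c, x in zip(self.centroids[best], vec)]
        self.sizes[best] = n
        return best


clusterer = OnlineClusterer(threshold=0.8)
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
labels = [clusterer.add(v) for v in stream]
print(labels, clusterer.sizes)
```

Each post costs one pass over the current centroids, which keeps per-post work bounded. A real-time system would also need to retire stale posts or clusters (for example with a sliding window or decay), which this sketch omits.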
7.
Abstract
Stability in social, technical, and financial systems, as well as the capacity of organizations to work across borders, requires consistency in public policy across jurisdictions. The diffusion of laws and regulations across political boundaries can reduce the tension that arises between innovation and consistency. Policy diffusion has been a topic of focus across the social sciences for several decades, but due to limitations of data and computational capacity, researchers have not taken a comprehensive and data-intensive look at the aggregate, cross-policy patterns of diffusion. This work combines visual analytics and text and network analyses to help understand how policies, as represented in digitized text, spread across states. As a result, our approach can quickly guide analysts to progressively gain insights into policy adoption data. We evaluate the effectiveness of our system via case studies with a real-world policy dataset and qualitative interviews with domain experts.
Affiliation(s)
- Yongsu Ahn, University of Pittsburgh, Pittsburgh, PA
- Yu-Ru Lin, University of Pittsburgh, Pittsburgh, PA
8. Hosseini S, Najafipour S, Cheung NM, Yin H, Kangavari MR, Zhou X. TEAGS: time-aware text embedding approach to generate subgraphs. Data Min Knowl Discov 2020. [DOI: 10.1007/s10618-020-00688-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
9. Lexifield: a system for the automatic building of lexicons by semantic expansion of short word lists. Knowl Inf Syst 2020. [DOI: 10.1007/s10115-020-01451-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]
10. El-Assady M, Kehlbeck R, Collins C, Keim D, Deussen O. Semantic Concept Spaces: Guided Topic Model Refinement using Word-Embedding Projections. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1001-1011. [PMID: 31443000 DOI: 10.1109/tvcg.2019.2934654] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/10/2023]
Abstract
We present a framework that allows users to incorporate the semantics of their domain knowledge for topic model refinement while remaining model-agnostic. Our approach enables users to (1) understand the semantic space of the model, (2) identify regions of potential conflicts and problems, and (3) readjust the semantic relation of concepts based on their understanding, directly influencing the topic modeling. These tasks are supported by an interactive visual analytics workspace that uses word-embedding projections to define concept regions which can then be refined. The user-refined concepts are independent of a particular document collection and can be transferred to related corpora. All user interactions within the concept space directly affect the semantic relations of the underlying vector space model, which, in turn, change the topic modeling. In addition to direct manipulation, our system guides the users' decision-making process through recommended interactions that point out potential improvements. This targeted refinement aims at minimizing the feedback required for an efficient human-in-the-loop process. We confirm the improvements achieved through our approach in two user studies that show topic model quality improvements through our visual knowledge externalization and learning process.
11. Khayat M, Karimzadeh M, Ebert DS, Ghafoor A. The Validity, Generalizability and Feasibility of Summative Evaluation Methods in Visual Analytics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:353-363. [PMID: 31425085 DOI: 10.1109/tvcg.2019.2934264] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Many evaluation methods have been used to assess the usefulness of Visual Analytics (VA) solutions. These methods stem from a variety of origins with different assumptions and goals, which causes confusion about their capability to prove usefulness. Moreover, the lack of discussion about the evaluation processes may limit our potential to develop new evaluation methods specialized for VA. In this paper, we present an analysis of evaluation methods that have been used to summatively evaluate VA solutions. We provide a survey and taxonomy of the evaluation methods that have appeared in the VAST literature in the past two years. We then analyze these methods in terms of validity and generalizability of their findings, as well as the feasibility of using them. We propose a new metric called summative quality to compare evaluation methods according to their ability to prove usefulness, and make recommendations for selecting evaluation methods based on their summative quality in the VA domain.
13. Strobelt H, Gehrmann S, Behrisch M, Perer A, Pfister H, Rush AM. SEQ2SEQ-VIS: A Visual Debugging Tool for Sequence-to-Sequence Models. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:353-363. [PMID: 30334796 DOI: 10.1109/tvcg.2018.2865044] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 06/08/2023]
Abstract
Neural sequence-to-sequence models have proven to be accurate and robust for many sequence prediction tasks, and have become the standard approach for automatic translation of text. The models work with a five-stage black-box pipeline that begins with encoding a source sequence to a vector space and then decoding out to a new target sequence. This process is now standard, but like many deep learning methods remains quite difficult to understand or debug. In this work, we present a visual analysis tool that allows interaction and "what if"-style exploration of trained sequence-to-sequence models through each stage of the translation process. The aim is to identify which patterns have been learned, to detect model errors, and to probe the model with counterfactual scenarios. We demonstrate the utility of our tool through several real-world sequence-to-sequence use cases on large-scale models.
14. Visual exploration and comparison of word embeddings. JOURNAL OF VISUAL LANGUAGES AND COMPUTING 2018. [DOI: 10.1016/j.jvlc.2018.08.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/21/2022]
15. Sultanum N, Singh D, Brudno M, Chevalier F. Doccurate: A Curation-Based Approach for Clinical Text Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:142-151. [PMID: 30136959 DOI: 10.1109/tvcg.2018.2864905] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 06/08/2023]
Abstract
Before seeing a patient, physicians seek to obtain an overview of the patient's medical history. Text plays a major role in this activity since it represents the bulk of the clinical documentation, but reviewing it quickly becomes onerous when patient charts grow too large. Text visualization methods have been widely explored to manage this large scale through visual summaries that rely on information retrieval algorithms to structure text and make it amenable to visualization. However, the integration with such automated approaches comes with a number of limitations, including significant error rates and the need for healthcare providers to fine-tune algorithms without expert knowledge of their inner mechanics. In addition, several of these approaches obscure or substitute the original clinical text and therefore fail to leverage qualitative and rhetorical flavours of the clinical notes. These drawbacks have limited the adoption of text visualization and other summarization technologies in clinical practice. In this work we present Doccurate, a novel system embodying a curation-based approach for the visualization of large clinical text datasets. Our approach offers automation auditing and customizability to physicians while also preserving and extensively linking to the original text. We discuss findings of a formal qualitative evaluation conducted with 6 domain experts, shedding light on physicians' information needs, perceived strengths and limitations of automated tools, and the importance of customization while balancing efficiency. We also present use case scenarios to showcase Doccurate's envisioned usage in practice.
16. Hohman FM, Kahng M, Pienta R, Chau DH. Visual Analytics in Deep Learning: An Interrogative Survey for the Next Frontiers. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:10.1109/TVCG.2018.2843369. [PMID: 29993551 PMCID: PMC6703958 DOI: 10.1109/tvcg.2018.2843369] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Indexed: 05/20/2023]
Abstract
Deep learning has recently seen rapid development and received significant attention due to its state-of-the-art performance on problems previously thought to be hard. However, because of the internal complexity and nonlinear structure of deep neural networks, the underlying decision-making processes for why these models are achieving such performance are challenging and sometimes mystifying to interpret. As deep learning spreads across domains, it is of paramount importance that we equip users of deep learning with tools for understanding when a model works correctly, when it fails, and ultimately how to improve its performance. Standardized toolkits for building neural networks have helped democratize deep learning; visual analytics systems have now been developed to support model explanation, interpretation, debugging, and improvement. We present a survey of the role of visual analytics in deep learning research, which highlights its short yet impactful history and thoroughly summarizes the state-of-the-art using a human-centered interrogative framework, focusing on the Five W's and How (Why, Who, What, How, When, and Where). We conclude by highlighting research directions and open research problems. This survey helps researchers and practitioners in both visual analytics and deep learning to quickly learn key aspects of this young and rapidly growing body of research, whose impact spans a diverse range of domains.