1. Bernard J, Barth CM, Cuba E, Meier A, Peiris Y, Shneiderman B. IVESA - Visual Analysis of Time-Stamped Event Sequences. IEEE Transactions on Visualization and Computer Graphics 2025; 31:2235-2256. PMID: 38587948. DOI: 10.1109/tvcg.2024.3382760.
Abstract
Time-stamped event sequences (TSEQs) are time-oriented data without value information, shifting the focus of users to the exploration of temporal event occurrences. TSEQs exist in application domains such as sleeping behavior, earthquake aftershocks, and stock market crashes. Domain experts face four challenges, for which they could use interactive and visual data analysis methods. First, TSEQs can be large with respect to both the number of sequences and events, often reaching millions of events. Second, domain experts need validated metrics and features to identify interesting patterns. Third, after identifying interesting patterns, domain experts contextualize the patterns to foster sensemaking. Finally, domain experts seek to reduce data complexity through data simplification and machine learning support. We present IVESA, a visual analytics approach for TSEQs. It supports the analysis of TSEQs at the granularities of sequences and events, complemented by metrics and feature analysis tools. IVESA has multiple linked views that support overview, sort+filter, comparison, details-on-demand, and metadata relation-seeking tasks, as well as data simplification through feature analysis, interactive clustering, filtering, and motif detection and simplification. We evaluated IVESA with three case studies and a user study with six domain experts working with six different datasets and applications. Results demonstrate the usability and generalizability of IVESA across applications and cases with up to 1,000,000 events.
2. Yin J, Jia H, Zhou B, Tang T, Ying L, Ye S, Peng TQ, Wu Y. Blowing Seeds Across Gardens: Visualizing Implicit Propagation of Cross-Platform Social Media Posts. IEEE Transactions on Visualization and Computer Graphics 2025; 31:185-195. PMID: 39255156. DOI: 10.1109/tvcg.2024.3456181.
Abstract
Propagation analysis refers to studying how information spreads on social media, a pivotal endeavor for understanding social sentiment and public opinions. Numerous studies contribute to visualizing information spread, but few have considered the implicit and complex diffusion patterns among multiple platforms. To bridge the gap, we summarize cross-platform diffusion patterns with experts and identify significant factors that dissect the mechanisms of cross-platform information spread. Based on that, we propose an information diffusion model that estimates the likelihood of a topic/post spreading among different social media platforms. Moreover, we propose a novel visual metaphor that encapsulates cross-platform propagation in a manner analogous to the spread of seeds across gardens. Specifically, we visualize platforms, posts, implicit cross-platform routes, and salient instances as elements of a virtual ecosystem - gardens, flowers, winds, and seeds, respectively. We further develop a visual analytic system, namely BloomWind, that enables users to quickly identify the cross-platform diffusion patterns and investigate the relevant social media posts. Ultimately, we demonstrate the usage of BloomWind through two case studies and validate its effectiveness using expert interviews.
3. Tomassi A, Falegnami A, Romano E. Mapping automatic social media information disorder. The role of bots and AI in spreading misleading information in society. PLoS One 2024; 19:e0303183. PMID: 38820281. PMCID: PMC11142451. DOI: 10.1371/journal.pone.0303183.
Abstract
This paper presents an analysis of information disorder in social media platforms. The study employed methods such as Natural Language Processing, Topic Modeling, and Knowledge Graph building to gain new insights into the phenomenon of fake news and its impact on critical thinking and knowledge management. The analysis focused on four research questions: 1) the distribution of misinformation, disinformation, and malinformation across different platforms; 2) recurring themes in fake news and their visibility; 3) the role of artificial intelligence as an authoritative and/or spreader agent; and 4) strategies for combating information disorder. The role of AI was highlighted, both as a tool for fact-checking and for building truthiness identification bots, and as a potential amplifier of false narratives. Strategies proposed for combating information disorder include improving digital literacy skills and promoting critical thinking among social media users.
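For readers who want to reproduce the topic-modeling step mentioned in this abstract, a minimal sketch with scikit-learn's LDA is given below; the toy corpus, preprocessing, and number of topics are illustrative assumptions, not the configuration used by the authors.
```python
# Minimal LDA topic-modeling sketch (illustrative; not the authors' pipeline).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = [
    "vaccine misinformation spreads fast on social platforms",
    "bots amplify false election claims in coordinated campaigns",
    "fact checkers debunk viral health rumors",
]  # placeholder corpus of social media posts

# Bag-of-words representation with English stop words removed.
vectorizer = CountVectorizer(stop_words="english", max_features=5000)
X = vectorizer.fit_transform(posts)

# Fit LDA; the number of topics is an assumption chosen for illustration.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Print the top words per topic to help label recurring themes.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```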
Affiliation(s)
- Andrea Tomassi: Engineering Faculty, Uninettuno International Telematic University, Rome, Italy
- Andrea Falegnami: Engineering Faculty, Uninettuno International Telematic University, Rome, Italy
- Elpidio Romano: Engineering Faculty, Uninettuno International Telematic University, Rome, Italy
4. Chen AT, Komi M, Bessler S, Mikles SP, Zhang Y. Integrating statistical and visual analytic methods for bot identification of health-related survey data. J Biomed Inform 2023; 144:104439. PMID: 37419375. DOI: 10.1016/j.jbi.2023.104439.
Abstract
OBJECTIVE
In recent years, we have increasingly observed issues concerning the quality of online information due to misinformation and disinformation. Aside from social media, there is growing awareness that questionnaire data collected using online recruitment methods may include suspect data provided by bots. Issues with data quality can be particularly problematic in health and/or biomedical contexts; thus, developing robust methods for suspect data identification and removal is of paramount importance in informatics. In this study, we describe an interactive visual analytics approach to suspect data identification and removal and demonstrate the application of this approach on questionnaire data pertaining to COVID-19 derived from different recruitment venues, including listservs and social media.
METHODS
We developed a pipeline for data cleaning, pre-processing, analysis, and automated ranking of data to address data quality issues. We then employed the ranking in conjunction with manual review to identify suspect data and remove them from subsequent analyses. Last, we compared differences in the data before and after removal.
RESULTS
We performed data cleaning, pre-processing, and exploratory analysis on a survey dataset (N = 4,163) collected using multiple recruitment mechanisms via the Qualtrics survey platform. Based on these results, we identified suspect features and used them to generate a suspect feature indicator for each survey response. We excluded survey responses that did not fit the inclusion criteria for the study (n = 29) and then performed manual review of the remaining responses, triangulating with the suspect feature indicator. Based on this review, we excluded 2,921 responses. Additional responses were excluded based on a spam classification by Qualtrics (n = 13) and the percentage of survey completion (n = 328), resulting in a final sample size of 872. We performed additional analyses to demonstrate the extent to which the suspect feature indicator was congruent with eventual inclusion, as well as compared the characteristics of the included and excluded data.
CONCLUSION
Our main contributions are: 1) a proposed framework for data quality assessment, including suspect data identification and removal; 2) an analysis of potential consequences in terms of representation bias in the dataset; and 3) recommendations for implementing this approach in practice.
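As a rough illustration of a suspect feature indicator of the kind described in this abstract, the sketch below counts how many suspect signals each survey response triggers and ranks responses for manual review; the column names, flags, and thresholds are hypothetical and do not reproduce the study's actual features.
```python
# Hedged sketch of a suspect-feature indicator for survey responses
# (hypothetical columns and thresholds; the study's actual features differ).
import pandas as pd

responses = pd.DataFrame({
    "response_id": [1, 2, 3],
    "duration_sec": [35, 820, 610],           # very fast completion is suspect
    "open_text": ["good", "I joined the study because of my family history", "n/a"],
    "ip_duplicated": [True, False, False],    # repeated IP address
    "straightlined": [True, False, False],    # identical answers across scales
})

suspect_flags = pd.DataFrame({
    "too_fast": responses["duration_sec"] < 120,
    "short_text": responses["open_text"].str.len() < 10,
    "dup_ip": responses["ip_duplicated"],
    "straightline": responses["straightlined"],
})

# Suspect-feature indicator: the number of flags each response triggers.
responses["suspect_score"] = suspect_flags.sum(axis=1)

# Rank responses so that manual review can start with the most suspect ones.
print(responses.sort_values("suspect_score", ascending=False))
```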
Affiliation(s)
- Annie T Chen: Department of Biomedical Informatics and Medical Education, University of Washington School of Medicine, 850 Republican St., Box 358047, Seattle, WA 98195, United States
- Midori Komi: University of Washington, Department of Mathematics, Box 354350, Seattle, WA 98195-4350, United States
- Sierrah Bessler: University of Washington, Department of Applied Mathematics, 4182 W Stevens Way NE, Seattle, WA 98105, United States
- Sean P Mikles: Lineberger Comprehensive Cancer Outcomes Program, Lineberger Comprehensive Cancer Center, UNC School of Medicine, 450 West Drive, Chapel Hill, NC 27514, United States
- Yan Zhang: School of Information, The University of Texas at Austin, 1616 Guadalupe Suite #5.202, Austin, TX 78701-1213, United States
5. Ying L, Shu X, Deng D, Yang Y, Tang T, Yu L, Wu Y. MetaGlyph: Automatic Generation of Metaphoric Glyph-based Visualization. IEEE Transactions on Visualization and Computer Graphics 2023; 29:331-341. PMID: 36179002. DOI: 10.1109/tvcg.2022.3209447.
Abstract
Glyph-based visualization achieves an impressive graphic design when associated with comprehensive visual metaphors, which help audiences effectively grasp the conveyed information by revealing data semantics. However, creating such metaphoric glyph-based visualization (MGV) is not an easy task, as it requires not only a deep understanding of data but also professional design skills. This paper proposes MetaGlyph, an automatic system for generating MGVs from a spreadsheet. To develop MetaGlyph, we first conduct a qualitative analysis to understand the design of current MGVs from the perspectives of metaphor embodiment and glyph design. Based on the results, we introduce a novel framework for generating MGVs through metaphoric image selection and MGV construction. Specifically, MetaGlyph automatically selects metaphors with corresponding images from online resources based on the input data semantics. We then integrate a Monte Carlo tree search algorithm that explores the design of an MGV by associating visual elements with data dimensions given the data importance, semantic relevance, and glyph non-overlap. The system also provides editing feedback that allows users to customize the MGVs according to their design preferences. We demonstrate the use of MetaGlyph through a set of examples and a usage scenario, and validate its effectiveness through a series of expert interviews.
6. Guo Y, Guo S, Jin Z, Kaul S, Gotz D, Cao N. Survey on Visual Analysis of Event Sequence Data. IEEE Transactions on Visualization and Computer Graphics 2022; 28:5091-5112. PMID: 34314358. DOI: 10.1109/tvcg.2021.3100413.
Abstract
Event sequence data record series of discrete events in the time order of occurrence. They are commonly observed in a variety of applications ranging from electronic health records to network logs, and are typically large-scale, high-dimensional, and heterogeneous. This high complexity of event sequence data makes it difficult for analysts to manually explore and find patterns, resulting in ever-increasing needs for computational and perceptual aids from visual analytics techniques to extract and communicate insights from event sequence datasets. In this paper, we review the state-of-the-art visual analytics approaches, characterize them with our proposed design space, and categorize them based on analytical tasks and applications. From our review of relevant literature, we have also identified several remaining research challenges and future research opportunities.
7. Guo S, Jin Z, Chen Q, Gotz D, Zha H, Cao N. Interpretable Anomaly Detection in Event Sequences via Sequence Matching and Visual Comparison. IEEE Transactions on Visualization and Computer Graphics 2022; 28:4531-4545. PMID: 34191728. DOI: 10.1109/tvcg.2021.3093585.
Abstract
Anomaly detection is a common analytical task that aims to identify rare cases that differ from the typical cases that make up the majority of a dataset. When analyzing event sequence data, the task of anomaly detection can be complex because the sequential and temporal nature of such data results in diverse definitions and flexible forms of anomalies. This, in turn, increases the difficulty in interpreting detected anomalies. In this article, we propose a visual analytic approach for detecting anomalous sequences in an event sequence dataset via an unsupervised anomaly detection algorithm based on Variational AutoEncoders. We further compare the anomalous sequences with their reconstructions and with the normal sequences through a sequence matching algorithm to identify event anomalies. A visual analytics system is developed to support interactive exploration and interpretations of anomalies through novel visualization designs that facilitate the comparison between anomalous sequences and normal sequences. Finally, we quantitatively evaluate the performance of our anomaly detection algorithm, demonstrate the effectiveness of our system through case studies, and report feedback collected from study participants.
8. Arleo A, Didimo W, Liotta G, Miksch S, Montecchiani F. Influence Maximization With Visual Analytics. IEEE Transactions on Visualization and Computer Graphics 2022; 28:3428-3440. PMID: 35830402. DOI: 10.1109/tvcg.2022.3190623.
Abstract
In social networks, individuals' decisions are strongly influenced by recommendations from their friends, acquaintances, and favorite renowned personalities. The popularity of online social networking platforms makes them the prime venues to advertise products and promote opinions. The Influence Maximization (IM) problem entails selecting a seed set of users that maximizes the influence spread, i.e., the expected number of users positively influenced by a stochastic diffusion process triggered by the seeds. Engineering and analyzing IM algorithms remains a difficult and demanding task due to the NP-hardness of the problem and the stochastic nature of the diffusion processes. Although several heuristics have been introduced, they often fail to provide enough information on how the network topology affects the diffusion process, precious insights that could help researchers improve their seed set selection. In this paper, we present VAIM, a visual analytics system that supports users in analyzing, evaluating, and comparing information diffusion processes determined by different IM algorithms. Furthermore, VAIM provides useful insights that the analyst can use to modify the seed set of an IM algorithm, so as to improve its influence spread. We assess our system by: (i) a qualitative evaluation based on a guided experiment with two domain experts on two different data sets; (ii) a quantitative estimation of the value of the proposed visualization through the ICE-T methodology by Wall et al. (IEEE TVCG, 2018). The twofold assessment indicates that VAIM effectively supports our target users in the visual analysis of the performance of IM algorithms.
9. Sharma S, Saraswat M, Dubey AK. Fake news detection on Twitter. International Journal of Web Information Systems 2022. DOI: 10.1108/ijwis-02-2022-0044.
Abstract
Purpose
Owing to the increased accessibility of the internet and related technologies, more and more individuals across the globe now turn to social media for their daily dose of news rather than traditional news outlets. With the global nature of social media and hardly any checks in place on posting content, an exponential increase in the spread of fake news is easy. Businesses propagate fake news to improve their economic standing and to influence consumers and demand, and individuals spread fake news for personal gains such as popularity and life goals. The content of fake news is diverse in terms of topics, styles, and media platforms, and fake news attempts to distort truth with diverse linguistic styles while simultaneously mocking true news. All these factors together make fake news detection an arduous task. This work attempts to check the spread of disinformation on Twitter.
Design/methodology/approach
This study carries out fake news detection using user characteristics and tweet textual content as features. For categorizing user characteristics, this study uses the XGBoost algorithm. To classify the tweet text, this study uses various natural language processing techniques to pre-process the tweets and then applies a hybrid convolutional neural network-recurrent neural network (CNN-RNN) model and the state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) model.
Findings
This study uses a combination of machine learning and deep learning approaches for fake news detection, namely, XGBoost, hybrid CNN-RNN and BERT. The models have also been evaluated and compared with various baseline models to show that this approach effectively tackles this problem.
Originality/value
This study proposes a novel framework that exploits news content and social contexts to learn useful representations for predicting fake news. This model is based on a transformer architecture, which facilitates representation learning from fake news data and helps detect fake news easily. This study also carries out an investigative study on the relative importance of content and social context features for the task of detecting false news and whether absence of one of these categories of features hampers the effectiveness of the resultant system. This investigation can go a long way in aiding further research on the subject and for fake news detection in the presence of extremely noisy or unusable data.
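The user-characteristics branch of the methodology described above (XGBoost over account features) can be sketched as follows; the features and labels are synthetic placeholders, and the text branch (hybrid CNN-RNN and BERT) is omitted, so this is a hedged outline rather than the authors' implementation.
```python
# Sketch of the user-feature branch only: XGBoost over hypothetical account
# features (follower count, account age, posting rate). Synthetic data.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((200, 3))          # [followers_norm, account_age_norm, tweet_rate]
y = rng.integers(0, 2, 200)       # 1 = fake-news spreader, 0 = genuine (synthetic)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = XGBClassifier(n_estimators=100, max_depth=4)
clf.fit(X_tr, y_tr)

print("held-out accuracy:", clf.score(X_te, y_te))
```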
10. Artificial Intelligence-Based Medical Data Mining. J Pers Med 2022; 12:1359. PMID: 36143144. PMCID: PMC9501106. DOI: 10.3390/jpm12091359.
Abstract
Understanding published unstructured textual data using traditional text mining approaches and tools is becoming a challenging issue due to the rapid increase in electronic open-source publications. The application of data mining techniques in the medical sciences is an emerging trend; however, traditional text-mining approaches are insufficient to cope with the current upsurge in the volume of published data. Therefore, artificial intelligence-based text mining tools are being developed and used to process large volumes of data and to explore the hidden features and correlations in the data. This review provides a clear-cut and insightful understanding of how artificial intelligence-based data-mining technology is being used to analyze medical data. We also describe a standard process of data mining based on CRISP-DM (Cross-Industry Standard Process for Data Mining) and the most common tools/libraries available for each step of medical data mining.
11. Tan L, Wang G, Jia F, Lian X. Research status of deep learning methods for rumor detection. Multimedia Tools and Applications 2022; 82:2941-2982. PMID: 35469150. PMCID: PMC9022167. DOI: 10.1007/s11042-022-12800-8.
Abstract
To manage rumors in social media and reduce their harm to society, many studies have used deep learning methods to detect rumors in open networks. To comprehensively sort out the research status of rumor detection from multiple perspectives, this paper analyzes the most prominent work from three perspectives: Feature Selection, Model Structure, and Research Methods. From the perspective of feature selection, we divide methods according to the content features, social features, and propagation structure features of the rumors. Based on model structure, this work then divides deep learning models for rumor detection into CNN, RNN, GNN, and Transformer architectures, which facilitates comparison. In addition, this work summarizes, for the first time, 30 studies into seven rumor detection methods, including propagation trees, adversarial learning, cross-domain methods, multi-task learning, unsupervised and semi-supervised methods, knowledge-graph-based methods, and other methods, and compares the advantages of the different methods for detecting rumors. Finally, this review enumerates the available datasets and discusses potential issues and future work to help researchers advance the development of the field.
Affiliation(s)
- Li Tan: School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
- Ge Wang: School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
- Feiyang Jia: School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
- Xiaofeng Lian: School of Artificial Intelligence, Beijing Technology and Business University, Beijing, 100048, China
12. Assessment of Factors Impacting the Perception of Online Content Trustworthiness by Age, Education and Gender. Societies 2022. DOI: 10.3390/soc12020061.
Abstract
Online content trustworthiness has become a topic of significant interest due to the growth of so-called ‘fake news’ and other deceptive online content. Deceptive content has been responsible for an armed standoff, caused mistrust surrounding elections, and reduced trust in the media generally. Modern society, though, depends on the ability to share information in order to function. Citizens may be injured if they do not heed medical, weather, and other emergency warnings. Distrust of educational information impedes the transfer of knowledge about innovations and hinders societal growth. For society to function properly, trust in shared information is critical. This article seeks to understand the problem and possible solutions. It assesses the impact of the characteristics of online articles and their authors, publishers, and sponsors on perceived trustworthiness to ascertain how Americans make online article trust decisions. This analysis is conducted with a focus on how the impact of these factors on trustworthiness varies based on individuals' age, education, and gender.
13. Dimara E, Stasko J. A Critical Reflection on Visualization Research: Where Do Decision Making Tasks Hide? IEEE Transactions on Visualization and Computer Graphics 2022; 28:1128-1138. PMID: 34587049. DOI: 10.1109/tvcg.2021.3114813.
Abstract
It has been widely suggested that a key goal of visualization systems is to assist decision making, but is this true? We conduct a critical investigation on whether the activity of decision making is indeed central to the visualization domain. By approaching decision making as a user task, we explore the degree to which decision tasks are evident in visualization research and user studies. Our analysis suggests that decision tasks are not commonly found in current visualization task taxonomies and that the visualization field has yet to leverage guidance from decision theory domains on how to study such tasks. We further found that the majority of visualizations addressing decision making were not evaluated based on their ability to assist decision tasks. Finally, to help expand the impact of visual analytics in organizational as well as casual decision making activities, we initiate a research agenda on how decision making assistance could be elevated throughout visualization research.
14. Al-Rawi A. News loopholing: Telegram news as portable alternative media. Journal of Computational Social Science 2021; 5:949-968. PMID: 34981037. PMCID: PMC8715841. DOI: 10.1007/s42001-021-00155-3.
Abstract
This paper deals with foreign state-run media outlets that disseminate Persian language news targeted to the Iranian public. More specifically, it focuses on the mobile news app Telegram by undertaking a content analysis of a sample of the top 400 most viewed stories across four channels, i.e., BBC Persian, Voice of America's Persian language service VOA Farsi, Radio Farda, and Iran International television channel. It also offers a topic modelling of all news stories posted. Results show that most of the news coverage centered on politics, particularly with an emphasis on internal Iranian issues, while a few other channels repeatedly urged their followers to submit not only their email addresses and other private information, but also photographs and/or videos of anti-government protests. Conceptually, I consider these channels as portable alternative media, as opposed to state-run news media, since the Iranian public seeks them out as sources of political information that assist them in better understanding world news and, most importantly, news about their own country. The Telegram instant messaging app is related to the meso dimension of alternative media, meaning that it is characterized by the unique production and dissemination means it utilizes. This paper concludes by highlighting the implications of foreign state-run news outlets using news loopholing to disseminate information, while simultaneously collecting private information about their users and/or potentially risking their safety.
Affiliation(s)
- Ahmed Al-Rawi: School of Communication, Simon Fraser University, Room # K8645, 8888 University Dr., Burnaby, BC V5A 1S6, Canada
15. de Souza MC, Nogueira BM, Rossi RG, Marcacini RM, dos Santos BN, Rezende SO. A network-based positive and unlabeled learning approach for fake news detection. Mach Learn 2021; 111:3549-3592. PMID: 34815619. PMCID: PMC8601374. DOI: 10.1007/s10994-021-06111-6.
Abstract
Fake news can rapidly spread through internet users and can deceive a large audience. Because of these characteristics, it can have a direct impact on political and economic events. Machine learning approaches have been used to assist fake news identification. However, since the spectrum of real news is broad, hard to characterize, and expensive to label due to its high update frequency, One-Class Learning (OCL) and Positive and Unlabeled Learning (PUL) emerge as interesting approaches for content-based fake news detection, using a smaller set of labeled data than traditional machine learning techniques. In particular, network-based approaches are adequate for fake news detection since they allow incorporating information from different aspects of a publication into the problem modeling. In this paper, we propose a network-based approach based on Positive and Unlabeled Learning by Label Propagation (PU-LP), a one-class and transductive semi-supervised learning algorithm that performs classification by first identifying potential interest and non-interest documents in the unlabeled data and then propagating labels to classify the remaining unlabeled documents. We assessed the performance of our proposal considering homogeneous (only documents) and heterogeneous (documents and terms) networks. Our comparative analysis considered four OCL algorithms extensively employed in one-class text classification (k-Means, k-Nearest Neighbors Density-based, One-Class Support Vector Machine, and Dense Autoencoder) and another traditional PUL algorithm (Rocchio Support Vector Machine). The algorithms were evaluated on three news collections, considering balanced and extremely unbalanced scenarios. We used Bag-of-Words and Doc2Vec models to transform the news into structured data. Results indicated that PU-LP approaches are more stable and achieve better results than other PUL and OCL approaches in most scenarios, performing similarly to semi-supervised binary algorithms. Also, the inclusion of terms in the news network yields better results, especially when the news is distributed in the feature space according to veracity and subject. News representation using Doc2Vec achieved better results than the Bag-of-Words model for both the vector-space-model-based and document-similarity-network-based algorithms.
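The positive and unlabeled idea summarized above can be approximated with off-the-shelf components, as in the sketch below: known fake news items serve as positives, the unlabeled documents least similar to them are taken as reliable negatives, and labels are then propagated over a text-similarity graph. This uses scikit-learn's LabelSpreading on a synthetic corpus and is not the authors' PU-LP algorithm.
```python
# Hedged sketch of a PU-style workflow (not the authors' PU-LP code):
# 1) mark documents far from all known positives as reliable negatives,
# 2) propagate labels over a text-similarity graph with LabelSpreading.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.semi_supervised import LabelSpreading

docs = [
    "miracle cure suppressed by doctors",             # known fake (positive)
    "celebrity endorses secret investment scheme",    # known fake (positive)
    "city council approves new transit budget",       # unlabeled
    "miracle investment scheme cures debt overnight", # unlabeled
    "study finds moderate exercise improves sleep",   # unlabeled
]
y = np.array([1, 1, -1, -1, -1])   # 1 = fake, -1 = unlabeled

X = TfidfVectorizer().fit_transform(docs).toarray()

# Step 1: reliable negatives = unlabeled docs least similar to any positive.
sim_to_pos = cosine_similarity(X, X[y == 1]).max(axis=1)
unlabeled = np.where(y == -1)[0]
reliable_neg = unlabeled[np.argsort(sim_to_pos[unlabeled])[:2]]
y[reliable_neg] = 0

# Step 2: propagate the positive/negative seeds to the remaining documents.
model = LabelSpreading(kernel="knn", n_neighbors=2).fit(X, y)
print(model.transduction_)   # inferred label for every document
```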
16. Lv C, Ren K, Zhang H, Fu J, Lin Y. PEVis: visual analytics of potential anomaly pattern evolution for temporal multivariate data. J Vis (Tokyo) 2021. DOI: 10.1007/s12650-021-00807-6.
17. Zhou C, Li K, Lu Y. Linguistic characteristics and the dissemination of misinformation in social media: The moderating effect of information richness. Inf Process Manag 2021. DOI: 10.1016/j.ipm.2021.102679.
18. HyIDSVis: hybrid intrusion detection visualization analysis based on rare category and association rules. J Vis (Tokyo) 2021. DOI: 10.1007/s12650-021-00789-5.
19. Chen C, Yuan J, Lu Y, Liu Y, Su H, Yuan S, Liu S. OoDAnalyzer: Interactive Analysis of Out-of-Distribution Samples. IEEE Transactions on Visualization and Computer Graphics 2021; 27:3335-3349. PMID: 32070976. DOI: 10.1109/tvcg.2020.2973258.
Abstract
One major cause of performance degradation in predictive models is that the test samples are not well covered by the training data. Such poorly represented samples are called out-of-distribution (OoD) samples. In this article, we propose OoDAnalyzer, a visual analysis approach for interactively identifying OoD samples and explaining them in context. Our approach integrates an ensemble OoD detection method and a grid-based visualization. The detection method is improved from deep ensembles by combining more features with algorithms in the same family. To better analyze and understand the OoD samples in context, we have developed a novel kNN-based grid layout algorithm motivated by Hall's theorem. The algorithm approximates the optimal layout and has O(kN^2) time complexity, faster than the grid layout algorithm with the overall best performance but O(N^3) time complexity. Quantitative evaluation and case studies were performed on several datasets to demonstrate the effectiveness and usefulness of OoDAnalyzer.
20. Zhang T, Chen Z, Zhao Z, Luo X, Zheng W, Chen W. FaultTracer: interactive visual exploration of fault propagation patterns in power grid simulation data. J Vis (Tokyo) 2021. DOI: 10.1007/s12650-020-00741-z.
21. Deep learning for misinformation detection on online social networks: a survey and new perspectives. Social Network Analysis and Mining 2020; 10:82. PMID: 33014173. PMCID: PMC7524036. DOI: 10.1007/s13278-020-00696-x.
Abstract
Recently, the use of social networks such as Facebook, Twitter, and Sina Weibo has become an inseparable part of our daily lives. They are considered convenient platforms for users to share personal messages, pictures, and videos. However, while people enjoy social networks, many deceptive activities such as fake news or rumors can mislead users into believing misinformation. Besides, the spread of massive amounts of misinformation in social networks has become a global risk. Therefore, misinformation detection (MID) in social networks has gained a great deal of attention and is considered an emerging area of research interest. We find that several studies related to MID have addressed new research problems and techniques. While important, the automated detection of misinformation is difficult to accomplish, as it requires an advanced model to understand how related or unrelated the reported information is when compared to real information. The existing studies have mainly focused on three broad categories of misinformation: false information, fake news, and rumor detection. Related to these issues, we present a comprehensive survey of automated misinformation detection on (i) false information, (ii) rumors, (iii) spam, (iv) fake news, and (v) disinformation. We provide a state-of-the-art review of MID where deep learning (DL) is used to automatically process data and create patterns to make decisions, not only to extract global features but also to achieve better results. We further show that DL is an effective and scalable technique for state-of-the-art MID. Finally, we suggest several open issues that currently limit real-world implementation and point to future directions along this dimension.
22.
Abstract
Social networking sites such as Twitter have been a popular choice for people to express their opinions, report real-life events, and provide a perspective on what is happening around the world. In the outbreak of the COVID-19 pandemic, people have used Twitter to spontaneously share data visualizations from news outlets and government agencies and to post casual data visualizations that they individually crafted. We conducted a Twitter crawl of 5409 visualizations (from the period between 14 April 2020 and 9 May 2020) to capture what people are posting. Our study explores what people are posting, what they retweet the most, and the challenges that may arise when interpreting COVID-19 data visualization on Twitter. Our findings show that multiple factors, such as the source of the data, who created the chart (individual vs. organization), the type of visualization, and the variables on the chart influence the retweet count of the original post. We identify and discuss five challenges that arise when interpreting these casual data visualizations, and discuss recommendations that should be considered by Twitter users while designing COVID-19 data visualizations to facilitate data interpretation and to avoid the spread of misconceptions and confusion.
23. Yan J, Shi L, Tao J, Yu X, Zhuang Z, Huang C, Yu R, Su P, Wang C, Chen Y. Visual Analysis of Collective Anomalies Using Faceted High-Order Correlation Graphs. IEEE Transactions on Visualization and Computer Graphics 2020; 26:2517-2534. PMID: 30582546. DOI: 10.1109/tvcg.2018.2889470.
Abstract
Successfully detecting, analyzing, and reasoning about collective anomalies is important for many real-life application domains (e.g., intrusion detection, fraud analysis, software security). The primary challenges to achieving this goal include the overwhelming number of low-risk events and their multimodal relationships, the diversity of collective anomalies across data and anomaly types, and the difficulty of incorporating the domain knowledge of experts. In this paper, we propose the novel concept of the faceted High-Order Correlation Graph (HOCG). Compared with previous low-order correlation graphs, HOCG achieves better user interactivity, computational scalability, and domain generality by synthesizing heterogeneous types of objects, their anomalies, and the multimodal relationships, all in a single graph. We design elaborate visual metaphors, interaction models, and a coordinated multiple-view interface to allow users to fully unleash the visual analytics power of the HOCG. We conduct case studies for three application domains and collect feedback from domain experts who apply our method to these scenarios. The results demonstrate the effectiveness of the HOCG in the overview of point anomalies, the detection of collective anomalies, and the reasoning process of root cause analyses.
24.
25. Li J, Chen S, Chen W, Andrienko G, Andrienko N. Semantics-Space-Time Cube: A Conceptual Framework for Systematic Analysis of Texts in Space and Time. IEEE Transactions on Visualization and Computer Graphics 2020; 26:1789-1806. PMID: 30475721. DOI: 10.1109/tvcg.2018.2882449.
Abstract
We propose an approach to analyzing data in which texts are associated with spatial and temporal references, with the aim of understanding how the text semantics vary over space and time. To represent the semantics, we apply probabilistic topic modeling. After extracting a set of topics and representing the texts by vectors of topic weights, we aggregate the data into a data cube with the dimensions corresponding to the set of topics, the set of spatial locations (e.g., regions), and the time divided into suitable intervals according to the scale of the planned analysis. Each cube cell corresponds to a combination (topic, location, time interval) and contains aggregate measures characterizing the subset of the texts concerning this topic and having spatial and temporal references within this location and interval. Based on this structure, we systematically describe the space of analysis tasks for exploring the interrelationships among the three heterogeneous information facets: semantics, space, and time. We introduce the operations of projecting and slicing the cube, which are used to decompose complex tasks into simpler subtasks. We then present a design of a visual analytics system intended to support these subtasks. To reduce the complexity of the user interface, we apply the principles of structural, visual, and operational uniformity while respecting the specific properties of each facet. The aggregated data are represented in three parallel views corresponding to the three facets and providing different complementary perspectives on the data. The views have a similar look-and-feel to the extent allowed by the facet specifics. Uniform interactive operations applicable to any view support establishing links between the facets. The uniformity principle is also applied in supporting the projecting and slicing operations on the data cube. We evaluate the feasibility and utility of the approach by applying it in two analysis scenarios using geolocated social media data for studying people's reactions to social and natural events of different spatial and temporal scales.
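A toy illustration of the cube construction described above is sketched below: per-document topic weights are aggregated into (topic, location, time-interval) cells with pandas, and one slicing operation is shown. The data are synthetic, and the paper's framework adds further projection operations and coordinated views.
```python
# Toy (topic, region, time-interval) cube built from per-document topic
# weights. Synthetic data; illustrative only, not the paper's system.
import pandas as pd

docs = pd.DataFrame({
    "region":    ["north", "north", "south", "south"],
    "timestamp": pd.to_datetime(["2020-04-01", "2020-04-20",
                                 "2020-04-03", "2020-05-02"]),
    "topic_0":   [0.8, 0.1, 0.5, 0.2],   # topic weights from a topic model
    "topic_1":   [0.2, 0.9, 0.5, 0.8],
})

# Reshape to long form: one row per (document, topic) with its weight.
long = docs.melt(id_vars=["region", "timestamp"],
                 var_name="topic", value_name="weight")

# Aggregate into monthly cells; each cell holds the mean weight and doc count.
cube = (long
        .groupby(["topic", "region", pd.Grouper(key="timestamp", freq="MS")])
        .agg(mean_weight=("weight", "mean"), n_docs=("weight", "size")))
print(cube)

# "Slicing" the cube: fix one dimension, e.g. look at topic_0 only.
print(cube.loc["topic_0"])
```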
26. Ma Y, Tung AKH, Wang W, Gao X, Pan Z, Chen W. ScatterNet: A Deep Subjective Similarity Model for Visual Analysis of Scatterplots. IEEE Transactions on Visualization and Computer Graphics 2020; 26:1562-1576. PMID: 30334762. DOI: 10.1109/tvcg.2018.2875702.
Abstract
Similarity measuring methods are widely adopted in a broad range of visualization applications. In this work, we address the challenge of representing human perception in the visual analysis of scatterplots by introducing ScatterNet, a novel deep-learning-based approach that captures perception-driven similarities of such plots. The approach exploits deep neural networks to extract semantic features of scatterplot images for similarity calculation. We create a large labeled dataset consisting of similar and dissimilar images of scatterplots to train the deep neural network. We conduct a set of evaluations, including performance experiments and a user study, to demonstrate the effectiveness and efficiency of our approach. The evaluations confirm that the learned features capture the human perception of scatterplot similarity effectively. We describe two scenarios to show how ScatterNet can be applied in visual analysis applications.
27.
28. Krak I, Barmak O, Manziuk E. Using visual analytics to develop human and machine-centric models: A review of approaches and proposed information technology. Comput Intell 2020. DOI: 10.1111/coin.12289.
Affiliation(s)
- Iurii Krak: Department of Theoretical Cybernetics, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
- Olexander Barmak: Department of Computer Science and Information Technologies, National University of Khmelnytskyi, Khmelnytskyi, Ukraine
- Eduard Manziuk: Department of Computer Science and Information Technologies, National University of Khmelnytskyi, Khmelnytskyi, Ukraine
29. Li Q, Wu Z, Yi L, Seann K, Qu H, Ma X. WeSeer: Visual Analysis for Better Information Cascade Prediction of WeChat Articles. IEEE Transactions on Visualization and Computer Graphics 2020; 26:1399-1412. PMID: 30176600. DOI: 10.1109/tvcg.2018.2867776.
Abstract
Social media, such as Facebook and WeChat, empowers millions of users to create, consume, and disseminate online information on an unprecedented scale. The abundant information on social media intensifies the competition of WeChat Public Official Articles (i.e., posts) for gaining user attention due to the zero-sum nature of attention. Therefore, only a small portion of information tends to become extremely popular while the rest remains unnoticed or quickly disappears. Such a typical "long-tail" phenomenon is very common in social media. Thus, recent years have witnessed a growing interest in predicting the future trend in the popularity of social media posts and understanding the factors that influence the popularity of the posts. Nevertheless, existing predictive models either rely on cumbersome feature engineering or sophisticated parameter tuning, which are difficult to understand and improve. In this paper, we study and enhance a point process-based model by incorporating visual reasoning to support communication between the users and the predictive model for a better prediction result. The proposed system supports users to uncover the working mechanism behind the model and improve the prediction accuracy accordingly based on the insights gained. We use realistic WeChat articles to demonstrate the effectiveness of the system and verify the improved model on a large scale of WeChat articles. We also elicit and summarize the feedback from WeChat domain experts.
30. Khayat M, Karimzadeh M, Zhao J, Ebert DS. VASSL: A Visual Analytics Toolkit for Social Spambot Labeling. IEEE Transactions on Visualization and Computer Graphics 2020; 26:874-883. PMID: 31425086. DOI: 10.1109/tvcg.2019.2934266.
Abstract
Social media platforms are filled with social spambots. Detecting these malicious accounts is essential, yet challenging, as they continually evolve to evade detection techniques. In this article, we present VASSL, a visual analytics system that assists in the process of detecting and labeling spambots. Our tool enhances the performance and scalability of manual labeling by providing multiple connected views and utilizing dimensionality reduction, sentiment analysis, and topic modeling, enabling insights for the identification of spambots. The system allows users to select and analyze groups of accounts in an interactive manner, which enables the detection of spambots that may not be identified when examined individually. We present a user study to objectively evaluate the performance of VASSL users, as well as to capture subjective opinions about the usefulness and ease of use of the tool.
31. Xu K, Wang Y, Yang L, Wang Y, Qiao B, Qin S, Xu Y, Zhang H, Qu H. CloudDet: Interactive Visual Analysis of Anomalous Performances in Cloud Computing Systems. IEEE Transactions on Visualization and Computer Graphics 2020; 26:1107-1117. PMID: 31442994. DOI: 10.1109/tvcg.2019.2934613.
Abstract
Detecting and analyzing potential anomalous performances in cloud computing systems is essential for avoiding losses to customers and ensuring the efficient operation of the systems. To this end, a variety of automated techniques have been developed to identify anomalies in cloud computing. These techniques are usually adopted to track the performance metrics of the system (e.g., CPU, memory, and disk I/O), represented by a multivariate time series. However, given the complex characteristics of cloud computing data, the effectiveness of these automated methods is affected. Thus, substantial human judgment on the automated analysis results is required for anomaly interpretation. In this paper, we present a unified visual analytics system named CloudDet to interactively detect, inspect, and diagnose anomalies in cloud computing systems. A novel unsupervised anomaly detection algorithm is developed to identify anomalies based on the specific temporal patterns of the given metrics data (e.g., the periodic pattern). Rich visualization and interaction designs are used to help understand the anomalies in the spatial and temporal context. We demonstrate the effectiveness of CloudDet through a quantitative evaluation, two case studies with real-world data, and interviews with domain experts.
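CloudDet's detection algorithm is not spelled out here, but the general idea of scoring deviations from a periodic baseline in a performance metric can be sketched as follows; the metric, period, and robust z-score are assumptions for illustration only.
```python
# Generic sketch of scoring deviations from a periodic baseline (illustrative
# of period-aware anomaly detection in general, not CloudDet's algorithm).
import numpy as np

rng = np.random.default_rng(1)
period = 24                                   # e.g. hourly CPU metric, daily cycle
t = np.arange(24 * 14)                        # two weeks of observations
cpu = 50 + 20 * np.sin(2 * np.pi * t / period) + rng.normal(0, 2, t.size)
cpu[100] += 35                                # inject one anomalous spike

# Baseline: median value for each phase of the cycle (robust to outliers).
phase = t % period
baseline = np.array([np.median(cpu[phase == p]) for p in range(period)])
residual = cpu - baseline[phase]

# Robust z-score of the residuals; large values flag candidate anomalies.
mad = np.median(np.abs(residual - np.median(residual)))
score = 0.6745 * np.abs(residual - np.median(residual)) / mad
print("top anomalous timestamps:", np.argsort(score)[-3:][::-1])
```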
32. Chen S, Li S, Chen S, Yuan X. R-Map: A Map Metaphor for Visualizing Information Reposting Process in Social Media. IEEE Transactions on Visualization and Computer Graphics 2020; 26:1204-1214. PMID: 31425084. DOI: 10.1109/tvcg.2019.2934263.
Abstract
We propose R-Map (Reposting Map), a visual analytical approach with a map metaphor to support interactive exploration and analysis of the information reposting process in social media. A single original social media post can cause large cascades of repostings (i.e., retweets) on online networks, involving thousands, even millions of people with different opinions. Such reposting behaviors form the reposting tree, in which a node represents a message and a link represents the reposting relation. In R-Map, the reposting tree structure can be spatialized with highlighted key players and tiled nodes. The important reposting behaviors, the following relations, and the semantic relations are represented as rivers, routes, and bridges, respectively, in a virtual geographical space. R-Map supports a scalable overview of a large number of information repostings with semantics. Additional interactions on the map are provided to support the investigation of temporal patterns and user behaviors in the information diffusion process. We evaluate the usability and effectiveness of our system with two use cases and a formal user study.
33. Liu S, Wang X, Collins C, Dou W, Ouyang F, El-Assady M, Jiang L, Keim DA. Bridging Text Visualization and Mining: A Task-Driven Survey. IEEE Transactions on Visualization and Computer Graphics 2019; 25:2482-2504. PMID: 29993887. DOI: 10.1109/tvcg.2018.2834341.
Abstract
Visual text analytics has recently emerged as one of the most prominent topics in both academic research and the commercial world. To provide an overview of the relevant techniques and analysis tasks, as well as the relationships between them, we comprehensively analyzed 263 visualization papers and 4,346 mining papers published between 1992 and 2017 in two fields: visualization and text mining. From the analysis, we derived around 300 concepts (visualization techniques, mining techniques, and analysis tasks) and built a taxonomy for each type of concept. The co-occurrence relationships between the concepts were also extracted. Our research can be used as a stepping-stone for other researchers to 1) understand a common set of concepts used in this research topic; 2) facilitate the exploration of the relationships between visualization techniques, mining techniques, and analysis tasks; 3) understand the current practice in developing visual text analytics tools; 4) seek potential research opportunities by narrowing the gulf between visualization and mining techniques based on the analysis tasks; and 5) analyze other interdisciplinary research areas in a similar way. We have also contributed a web-based visualization tool for analyzing and understanding research trends and opportunities in visual text analytics.
34.
Abstract
Information diffusion analysis is important in social media. In this work, we present a coherent ego-centric and event-centric model to investigate diffusion patterns and user behaviors. Applying the model, we propose Diffusion Map+ (D-Maps+), a novel visualization method to support exploration and analysis of user behaviors and diffusion patterns through a map metaphor. For ego-centric analysis, users who participated in reposting (i.e., resending a message initially posted by others) one central user’s posts (i.e., a series of original tweets) are collected. Event-centric analysis focuses on multiple central users discussing a specific event, with all the people participating and reposting messages about it. Social media users are mapped to a hexagonal grid based on their behavior similarities and in the chronological order of repostings. With the additional interactions and linkings, D-Map+ is capable of providing visual profiling of influential users, describing their social behaviors and analyzing the evolution of significant events in social media. A comprehensive visual analysis system is developed to support interactive exploration with D-Map+. We evaluate our work with real-world social media data and find interesting patterns among users and events. We also perform evaluations including user studies and expert feedback to certify the capabilities of our method.
Affiliation(s)
- Siming Chen: Key Laboratory of Machine Perception (Ministry of Education) and School of EECS, Peking University, Beijing, China
- Shuai Chen: Key Laboratory of Machine Perception (Ministry of Education) and School of EECS, Peking University, Beijing, China
- Zhenhuang Wang: Key Laboratory of Machine Perception (Ministry of Education) and School of EECS, Peking University, Beijing, China
- Jie Liang: Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia
- Yadong Wu: Southwest University of Science and Technology, China
- Xiaoru Yuan: Key Laboratory of Machine Perception (Ministry of Education) and School of EECS, Peking University, Beijing, China
35. Xu K, Xia M, Mu X, Wang Y, Cao N. EnsembleLens: Ensemble-based Visual Exploration of Anomaly Detection Algorithms with Multidimensional Data. IEEE Transactions on Visualization and Computer Graphics 2018; 25:109-119. PMID: 30130216. DOI: 10.1109/tvcg.2018.2864825.
Abstract
The results of anomaly detection are sensitive to the choice of detection algorithms as they are specialized for different properties of data, especially for multidimensional data. Thus, it is vital to select the algorithm appropriately. To systematically select the algorithms, ensemble analysis techniques have been developed to support the assembly and comparison of heterogeneous algorithms. However, challenges remain due to the absence of the ground truth, interpretation, or evaluation of these anomaly detectors. In this paper, we present a visual analytics system named EnsembleLens that evaluates anomaly detection algorithms based on the ensemble analysis process. The system visualizes the ensemble processes and results by a set of novel visual designs and multiple coordinated contextual views to meet the requirements of correlation analysis, assessment and reasoning of anomaly detection algorithms. We also introduce an interactive analysis workflow that dynamically produces contextualized and interpretable data summaries that allow further refinements of exploration results based on user feedback. We demonstrate the effectiveness of EnsembleLens through a quantitative evaluation, three case studies with real-world data and interviews with two domain experts.
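The underlying ensemble idea, running heterogeneous detectors on the same multidimensional data and combining their rank-normalized scores, can be sketched as below; the detectors, data, and combination rule are illustrative choices, and EnsembleLens itself adds visual correlation analysis and user feedback on top.
```python
# Sketch of combining heterogeneous anomaly detectors by averaging
# rank-normalized scores (illustrative; not EnsembleLens itself).
import numpy as np
from scipy.stats import rankdata
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 4)),      # inliers
               rng.normal(6, 1, (5, 4))])       # a few obvious outliers

# Negate so that larger values always mean "more anomalous".
scores = {
    "iforest": -IsolationForest(random_state=0).fit(X).score_samples(X),
    "lof":     -LocalOutlierFactor(n_neighbors=20).fit(X).negative_outlier_factor_,
    "ocsvm":   -OneClassSVM(nu=0.05).fit(X).score_samples(X),
}

# Rank-normalize each detector's scores to [0, 1], then average them.
ranked = np.vstack([rankdata(s) / len(s) for s in scores.values()])
ensemble_score = ranked.mean(axis=0)
print("most anomalous points:", np.argsort(ensemble_score)[-5:][::-1])
```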
36. Lin H, Gao S, Gotz D, Du F, He J, Cao N. RCLens: Interactive Rare Category Exploration and Identification. IEEE Transactions on Visualization and Computer Graphics 2018; 24:2223-2237. PMID: 28600250. DOI: 10.1109/tvcg.2017.2711030.
Abstract
Rare category identification is an important task in many application domains, ranging from network security, to financial fraud detection, to personalized medicine. These are all applications which require the discovery and characterization of sets of rare but structurally-similar data entities which are obscured within a larger but structurally different dataset. This paper introduces RCLens, a visual analytics system designed to support user-guided rare category exploration and identification. RCLens adopts a novel active learning-based algorithm to iteratively identify more accurate rare categories in response to user-provided feedback. The algorithm is tightly integrated with an interactive visualization-based interface which supports a novel and effective workflow for rare category identification. This paper (1) defines RCLens' underlying active-learning algorithm; (2) describes the visualization and interaction designs, including a discussion of how the designs support user-guided rare category identification; and (3) presents results from an evaluation demonstrating RCLens' ability to support the rare category identification process.
Collapse
|
37
|
Kim J, Bae J, Hastak M. Emergency information diffusion on online social media during storm Cindy in U.S. INTERNATIONAL JOURNAL OF INFORMATION MANAGEMENT 2018. [DOI: 10.1016/j.ijinfomgt.2018.02.003] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
38
|
Abstract
Rapid advancement of social media tremendously facilitates and accelerates information diffusion among users around the world. How and to what extent will information on social media achieve widespread diffusion across the world? How can we quantify the interaction between users from different geolocations in the diffusion process? How will the spatial patterns of information diffusion change over time? To address these questions, a dynamic social gravity model (SGM) is proposed to quantify the dynamic spatial interaction behavior among social media users in information diffusion. The dynamic SGM includes three factors that are theoretically significant to the spatial diffusion of information: geographic distance, cultural proximity, and linguistic similarity. The temporal dimension is also taken into account to help detect the recency effect, and ground-truth data are integrated into the model to help measure diffusion power. Furthermore, SocialWave, a visual analytics system, is developed to support both spatial and temporal investigative tasks. SocialWave provides a temporal visualization that allows users to quickly identify overall temporal diffusion patterns, which reflect the spatial characteristics of the diffusion network. When a meaningful temporal pattern is identified, SocialWave utilizes a new occlusion-free spatial visualization, which integrates a node-link diagram into a circular cartogram for further analysis. Moreover, we propose a set of rich user interactions that enable in-depth, multi-faceted analysis of diffusion on social media. The effectiveness and efficiency of the mathematical model and visualization system are evaluated with two social media datasets, namely, Ebola Epidemics and Ferguson Unrest.
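A gravity-style interaction score makes the three factors concrete: interaction grows with the audience sizes of two regions and their cultural and linguistic similarity, and decays with geographic distance. The function below is a hypothetical sketch with made-up exponents and inputs, not the dynamic SGM defined in the paper.

```python
# A minimal gravity-style interaction score between user groups in two regions;
# the functional form, weights, and example numbers are illustrative assumptions.
def gravity_interaction(users_a, users_b, distance_km,
                        cultural_proximity, linguistic_similarity,
                        beta=1.5, gamma=1.0, delta=1.0):
    """Interaction grows with audience sizes and similarity and decays with distance.

    cultural_proximity and linguistic_similarity are assumed to lie in (0, 1].
    """
    mass = users_a * users_b
    similarity = (cultural_proximity ** gamma) * (linguistic_similarity ** delta)
    return mass * similarity / (distance_km ** beta)

# Two hypothetical country pairs with identical audience sizes:
close_pair = gravity_interaction(1e6, 5e5, distance_km=800,
                                 cultural_proximity=0.9, linguistic_similarity=0.95)
far_pair = gravity_interaction(1e6, 5e5, distance_km=9000,
                               cultural_proximity=0.4, linguistic_similarity=0.2)
print(f"close pair: {close_pair:.3e}, far pair: {far_pair:.3e}")
```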
Collapse
Affiliation(s)
- Guodao Sun
- Zhejiang University of Technology, Hangzhou, China
| | - Tan Tang
- Zhejiang University, Hangzhou, China
Collapse
|
39
|
Cao N, Lin C, Zhu Q, Lin YR, Teng X, Wen X. Voila: Visual Anomaly Detection and Monitoring with Streaming Spatiotemporal Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:23-33. [PMID: 28866547 DOI: 10.1109/tvcg.2017.2744419] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
The increasing availability of spatiotemporal data continuously collected from various sources provides new opportunities for a timely understanding of the data in their spatial and temporal context. Finding abnormal patterns in such data poses significant challenges. Given that there is often no clear boundary between normal and abnormal patterns, existing solutions are limited in their capacity to identify anomalies in large, dynamic, and heterogeneous data, to interpret anomalies in their multifaceted, spatiotemporal context, and to allow users to provide feedback in the analysis loop. In this work, we introduce a unified visual interactive system and framework, Voila, for interactively detecting anomalies in spatiotemporal data collected from a streaming data source. The system is designed to meet two requirements in real-world applications, i.e., online monitoring and interactivity. We propose a novel tensor-based anomaly analysis algorithm with visualization and interaction design that dynamically produces contextualized, interpretable data summaries and allows for interactively ranking anomalous patterns based on user input. Using the "smart city" as an example scenario, we demonstrate the effectiveness of the proposed framework through quantitative evaluation and qualitative case studies.
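As a simplified stand-in for tensor-based anomaly analysis, the sketch below models "normal" spatiotemporal behavior with a low-rank reconstruction of a region-by-hour count matrix and scores cells by their residuals. The data, rank, and scoring rule are illustrative assumptions, not the paper's algorithm.

```python
# A minimal residual-based anomaly scoring sketch for spatiotemporal counts.
import numpy as np

rng = np.random.default_rng(2)
hours, regions = 24, 30
daily_profile = 50 + 40 * np.sin(np.linspace(0, 2 * np.pi, hours))   # shared daily rhythm
region_scale = rng.uniform(0.5, 2.0, regions)
counts = np.outer(region_scale, daily_profile) + rng.normal(0, 5, (regions, hours))
counts[7, 3] += 300                                                   # injected anomaly

rank = 2
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]     # model of "normal" behavior
residual = np.abs(counts - low_rank)                    # deviation from the model

region, hour = np.unravel_index(np.argmax(residual), residual.shape)
print(f"most anomalous cell: region {region}, hour {hour}, residual {residual[region, hour]:.1f}")
```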
Collapse
|
40
|
A Visual Analytics Approach for Detecting and Understanding Anomalous Resident Behaviors in Smart Healthcare. APPLIED SCIENCES-BASEL 2017. [DOI: 10.3390/app7030254] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
41
|
Wang X, Liu S, Liu J, Chen J, Zhu J, Guo B. TopicPanorama: A Full Picture of Relevant Topics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2016; 22:2508-2521. [PMID: 26761818 DOI: 10.1109/tvcg.2016.2515592] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
This paper presents a visual analytics approach to analyzing a full picture of relevant topics discussed in multiple sources, such as news, blogs, or micro-blogs. The full picture consists of a number of common topics covered by multiple sources, as well as distinctive topics from each source. Our approach models each textual corpus as a topic graph. These graphs are then matched using a consistent graph matching method. Next, we develop a level-of-detail (LOD) visualization that balances both readability and stability. Accordingly, the resulting visualization enhances the ability of users to understand and analyze the matched graph from multiple perspectives. By incorporating metric learning and feature selection into the graph matching algorithm, we allow users to interactively modify the graph matching result based on their information needs. We have applied our approach to various types of data, including news articles, tweets, and blog data. Quantitative evaluation and real-world case studies demonstrate the promise of our approach, especially in support of examining a topic-graph-based full picture at different levels of detail.
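A minimal way to see the matching idea is to align topics from two sources by the similarity of their word distributions with an optimal one-to-one assignment: matched pairs with high similarity play the role of common topics, and poorly matched topics remain distinctive. This node-level matching is a simplification of the consistent graph matching used in the paper, and the toy vocabulary and topic distributions are assumptions.

```python
# A minimal topic-matching sketch across two sources; all numbers are hypothetical.
import numpy as np
from scipy.optimize import linear_sum_assignment

vocab = ["election", "vote", "market", "stocks", "storm", "rain", "goal", "match"]
# Each row is a topic's distribution over the vocabulary.
news_topics = np.array([
    [0.45, 0.40, 0.05, 0.05, 0.02, 0.01, 0.01, 0.01],   # politics
    [0.02, 0.02, 0.50, 0.40, 0.02, 0.02, 0.01, 0.01],   # finance
    [0.01, 0.01, 0.02, 0.02, 0.50, 0.40, 0.02, 0.02],   # weather
])
tweet_topics = np.array([
    [0.03, 0.02, 0.45, 0.45, 0.02, 0.01, 0.01, 0.01],   # finance
    [0.40, 0.45, 0.05, 0.04, 0.02, 0.02, 0.01, 0.01],   # politics
    [0.01, 0.01, 0.02, 0.02, 0.02, 0.02, 0.45, 0.45],   # sports
])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = np.array([[cosine(n, t) for t in tweet_topics] for n in news_topics])
rows, cols = linear_sum_assignment(-sim)                 # maximize total similarity
for r, c in zip(rows, cols):
    kind = "common" if sim[r, c] > 0.8 else "distinctive"
    print(f"news '{vocab[int(np.argmax(news_topics[r]))]}' <-> "
          f"tweet '{vocab[int(np.argmax(tweet_topics[c]))]}'  sim={sim[r, c]:.2f}  ({kind})")
```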
Collapse
|
42
|
Liu S, Yin J, Wang X, Cui W, Cao K, Pei J. Online Visual Analytics of Text Streams. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2016; 22:2451-2466. [PMID: 26701787 DOI: 10.1109/tvcg.2015.2509990] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
We present an online visual analytics approach to helping users explore and understand hierarchical topic evolution in high-volume text streams. The key idea behind this approach is to identify representative topics in incoming documents and align them with the existing representative topics that they immediately follow (in time). To this end, we learn a set of streaming tree cuts from topic trees based on user-selected focus nodes. A dynamic Bayesian network model has been developed to derive the tree cuts in the incoming topic trees to balance the fitness of each tree cut and the smoothness between adjacent tree cuts. By connecting the corresponding topics at different times, we are able to provide an overview of the evolving hierarchical topics. A sedimentation-based visualization has been designed to enable the interactive analysis of streaming text data from global patterns to local details. We evaluated our method on real-world datasets and the results are generally favorable.
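The alignment intuition can be shown in its simplest form: topics arriving at each time step are greedily attached to the most similar existing topic thread, and topics below a similarity threshold start new threads. The paper's streaming tree cuts and dynamic Bayesian network are far more sophisticated; the greedy rule, threshold, and toy topic vectors here are assumptions.

```python
# A minimal greedy topic-alignment sketch for streaming text; all vectors are hypothetical.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / v.sum()

threads = []                                    # each thread: list of (time, topic vector)

def align(topics_at_t, t, threshold=0.7):
    """Attach each incoming topic to the most similar thread, or start a new one."""
    for topic in topics_at_t:
        best, best_sim = None, threshold
        for thread in threads:
            sim = cosine(thread[-1][1], topic)  # compare with the thread's latest topic
            if sim > best_sim:
                best, best_sim = thread, sim
        if best is None:
            threads.append([(t, topic)])        # emerging topic -> new thread
        else:
            best.append((t, topic))             # continuation of an existing thread

align([unit([8, 6, 1, 1, 0, 0]),                # topic A at time 0
       unit([0, 1, 1, 8, 6, 0])], t=0)          # topic B at time 0
align([unit([7, 7, 1, 1, 0, 0]),                # drifted topic A
       unit([0, 0, 1, 1, 2, 9])], t=1)          # emerging topic C
print(f"{len(threads)} threads with lengths {[len(th) for th in threads]}")
```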
Collapse
|
43
|
|
44
|
Wu F, Zhu M, Wang Q, Zhao X, Chen W, Maciejewski R. Spatial–temporal visualization of city-wide crowd movement. J Vis (Tokyo) 2016. [DOI: 10.1007/s12650-016-0368-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
|
45
|
Abstract
In recent years, there has been a growing trend to use publicly available social media sources within the field of journalism. Breaking news has tight reporting deadlines, measured in minutes, not days, but content must still be checked and rumors verified. As such, journalists are looking to automated content analysis to prefilter large volumes of social media content prior to manual verification. This article describes a real-time social media analytics framework for journalists. We extend our previously published geoparsing approach to improve its scalability and efficiency. We develop and evaluate a novel approach to geosemantic feature extraction, classifying evidence in terms of situatedness, timeliness, confirmation, and validity. Our approach works for new, unseen news topics. We report results from four experiments using five Twitter datasets crawled during different English-language news events. One of our datasets is the standard TREC 2012 microblog corpus. Our classification results are promising, with F1 scores varying by class from 0.64 to 0.92 for unseen event types. Lastly, we report results from two case studies conducted during real-world news stories, showcasing different ways our system can help journalists filter and cross-check content as they examine the trust and veracity of content and sources.
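One way to approximate such prefiltering is a per-class text classifier. The sketch below trains a tiny, hypothetical classifier for a single "situatedness" class (first-hand, on-the-ground language); the training examples, features, and model choice are illustrative assumptions rather than the geosemantic feature extraction described in the article.

```python
# A minimal per-class text classifier sketch for prefiltering posts; data are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_posts = [
    "I can see smoke from my window right now",
    "we just felt the building shake, people running outside",
    "standing on the bridge, water level is rising fast",
    "officials will hold a press conference tomorrow",
    "here is a history of similar storms in the region",
    "retweet if you think the coverage is biased",
]
situated = [1, 1, 1, 0, 0, 0]                 # 1 = first-hand, on-the-ground evidence

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_posts, situated)

incoming = ["my street is completely flooded, cars under water",
            "analysts debate the long term economic impact"]
for post, p in zip(incoming, clf.predict_proba(incoming)[:, 1]):
    print(f"situatedness={p:.2f}  {post}")
```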
Collapse
Affiliation(s)
| | - Vadims Krivcovs
- University of Southampton IT Innovation Centre, Southampton, UK
Collapse
|
46
|
Beigi G, Hu X, Maciejewski R, Liu H. An Overview of Sentiment Analysis in Social Media and Its Applications in Disaster Relief. SENTIMENT ANALYSIS AND ONTOLOGY ENGINEERING 2016. [DOI: 10.1007/978-3-319-30319-2_13] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
|
47
|
Cao N, Shi C, Lin S, Lu J, Lin YR, Lin CY. TargetVue: Visual Analysis of Anomalous User Behaviors in Online Communication Systems. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2016; 22:280-289. [PMID: 26529707 DOI: 10.1109/tvcg.2015.2467196] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Users with anomalous behaviors in online communication systems (e.g., email and social media platforms) are potential threats to society. Automated anomaly detection based on advanced machine learning techniques has been developed to combat this issue; challenges remain, though, due to the difficulty of obtaining proper ground truth for model training and evaluation. Therefore, substantial human judgment on the automated analysis results is often required to better adjust the performance of anomaly detection. Unfortunately, techniques that allow users to understand the analysis results more efficiently, to make confident judgments about anomalies, and to explore data in their context are still lacking. In this paper, we propose a novel visual analysis system, TargetVue, which detects anomalous users via an unsupervised learning model and visualizes the behaviors of suspicious users in behavior-rich context through novel visualization designs and multiple coordinated contextual views. In particular, TargetVue incorporates three new ego-centric glyphs to visually summarize a user's behaviors, effectively presenting the user's communication activities, features, and social interactions. An efficient layout method is proposed to place these glyphs on a triangle grid, which captures similarities among users and facilitates comparisons of behaviors of different users. We demonstrate the power of TargetVue through its application in a social bot detection challenge using Twitter data, a case study based on email records, and an interview with expert users. Our evaluation shows that TargetVue is beneficial to the detection of users with anomalous communication behaviors.
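To sketch the layout intuition of placing behaviorally similar users close together on a triangle grid, the code below orders users by hierarchical-clustering leaf order and fills grid rows of increasing length. The feature vectors and the fill rule are assumptions, not TargetVue's layout method.

```python
# A minimal similarity-preserving triangle-grid layout sketch; features are synthetic.
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

rng = np.random.default_rng(4)
features = np.vstack([rng.normal(0, 1, (8, 4)),      # "normal" users
                      rng.normal(4, 1, (3, 4))])     # suspiciously similar cluster

order = leaves_list(linkage(features, method="average"))  # similar users get adjacent ranks

positions = {}
row, col = 1, 0
for user in order:
    positions[int(user)] = (row, col)
    col += 1
    if col == row:            # row r of a triangle grid holds r cells
        row, col = row + 1, 0

for user, (r, c) in sorted(positions.items()):
    print(f"user {user:2d} -> row {r}, cell {c}")
```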
Collapse
|
48
|
Liu M, Liu S, Zhu X, Liao Q, Wei F, Pan S. An Uncertainty-Aware Approach for Exploratory Microblog Retrieval. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2016; 22:250-259. [PMID: 26529705 DOI: 10.1109/tvcg.2015.2467554] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Although there has been a great deal of interest in analyzing customer opinions and breaking news in microblogs, progress has been hampered by the lack of an effective mechanism to discover and retrieve data of interest from microblogs. To address this problem, we have developed an uncertainty-aware visual analytics approach to retrieve salient posts, users, and hashtags. We extend an existing ranking technique to compute a multifaceted retrieval result: the mutual reinforcement rank of a graph node, the uncertainty of each rank, and the propagation of uncertainty among different graph nodes. To illustrate the three facets, we have also designed a composite visualization with three visual components: a graph visualization, an uncertainty glyph, and a flow map. The graph visualization with glyphs, the flow map, and the uncertainty analysis together enable analysts to effectively find the most uncertain results and interactively refine them. We have applied our approach to several Twitter datasets. Qualitative evaluation and two real-world case studies demonstrate the promise of our approach for retrieving high-quality microblog data.
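Mutual reinforcement ranking can be sketched as a HITS-style iteration on a bipartite user-hashtag graph: salient users are those who use salient hashtags, and vice versa. The toy adjacency matrix and fixed iteration count below are assumptions, and the paper's uncertainty computation and propagation are omitted.

```python
# A minimal mutual-reinforcement (HITS-style) ranking sketch on a user-hashtag graph.
import numpy as np

# A[u, h] = how often user u used hashtag h (hypothetical counts).
A = np.array([[5, 1, 0, 0],
              [4, 2, 0, 0],
              [0, 1, 3, 0],
              [0, 0, 1, 6],
              [0, 0, 0, 5]], dtype=float)

users = np.ones(A.shape[0])
tags = np.ones(A.shape[1])
for _ in range(50):
    users_new = A @ tags                    # users reinforced by their hashtags
    tags_new = A.T @ users                  # hashtags reinforced by their users
    users = users_new / np.linalg.norm(users_new)
    tags = tags_new / np.linalg.norm(tags_new)

print("user salience:", np.round(users, 3))
print("hashtag salience:", np.round(tags, 3))
```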
Collapse
|
49
|
Chen S, Yuan X, Wang Z, Guo C, Liang J, Wang Z, Zhang XL, Zhang J. Interactive Visual Discovering of Movement Patterns from Sparsely Sampled Geo-tagged Social Media Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2016; 22:270-279. [PMID: 26340781 DOI: 10.1109/tvcg.2015.2467619] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Social media data with geotags can be used to track people's movements in their daily lives. By providing both rich text and movement information, visual analysis of social media data can be both interesting and challenging. In contrast to traditional movement data, the sparseness and irregularity of social media data increase the difficulty of extracting movement patterns. To facilitate the understanding of people's movements, we present an interactive visual analytics system that supports the exploration of sparsely sampled trajectory data from social media. We propose a heuristic model to reduce the uncertainty caused by the nature of social media data. In the proposed system, users can filter and select reliable data from each derived movement category, guided by the uncertainty model and interactive selection tools. By iteratively analyzing the filtered movements, users can explore the semantics of movements, including transportation methods, frequent visiting sequences, and keyword descriptions. We provide two case studies to demonstrate how our system helps users explore movement patterns.
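A simple heuristic for sparse geotagged check-ins estimates the speed between consecutive posts, guesses a transportation mode from speed thresholds, and flags long time gaps as uncertain. The thresholds and the uncertainty rule below are illustrative assumptions, not the paper's model.

```python
# A minimal speed-based movement heuristic for sparse check-ins; thresholds are assumptions.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# (timestamp in hours, latitude, longitude) for one user's geotagged posts.
checkins = [(0.0, 39.90, 116.40), (0.5, 39.93, 116.46), (9.0, 31.23, 121.47)]

for (t0, la0, lo0), (t1, la1, lo1) in zip(checkins, checkins[1:]):
    hours = t1 - t0
    km = haversine_km(la0, lo0, la1, lo1)
    speed = km / hours
    mode = "walk" if speed < 6 else "vehicle" if speed < 200 else "flight"
    uncertain = hours > 4                    # long gaps make the inferred mode unreliable
    print(f"{km:7.1f} km in {hours:4.1f} h -> {speed:6.1f} km/h, {mode:7s} uncertain={uncertain}")
```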
Collapse
|