1
Artificial Intelligence-Based Medical Data Mining. J Pers Med 2022; 12:jpm12091359. [PMID: 36143144 PMCID: PMC9501106 DOI: 10.3390/jpm12091359] [Received: 07/05/2022] [Revised: 08/02/2022] [Accepted: 08/17/2022]
Abstract
Understanding published unstructured textual data using traditional text mining approaches and tools is becoming a challenging issue due to the rapid increase in electronic open-source publications. The application of data mining techniques in the medical sciences is an emerging trend; however, traditional text-mining approaches are insufficient to cope with the current upsurge in the volume of published data. Therefore, artificial intelligence-based text mining tools are being developed and used to process large volumes of data and to explore the hidden features and correlations in the data. This review provides a clear-cut and insightful understanding of how artificial intelligence-based data-mining technology is being used to analyze medical data. We also describe a standard process of data mining based on CRISP-DM (Cross-Industry Standard Process for Data Mining) and the most common tools/libraries available for each step of medical data mining.

2
Ipenza JCC, Romero NML, Loreto M, Júnior NF, Comba JLD. QDS-COVID: A visual analytics system for interactive exploration of millions of COVID-19 healthcare records in Brazil. Appl Soft Comput 2022; 124:109093. [PMID: 35677032 PMCID: PMC9164519 DOI: 10.1016/j.asoc.2022.109093] [Received: 12/13/2021] [Revised: 05/23/2022] [Accepted: 05/26/2022]
Abstract
COVID-19 is responsible for the deaths of millions of people around the world. The scientific community has devoted its knowledge to finding ways to reduce the impact of the pandemic and to understand it. In this work, the focus is on analyzing electronic health records from one of the largest public healthcare systems in the world, the Brazilian public healthcare system, Sistema Único de Saúde (SUS). SUS collected more than 42 million flu records in one year of the pandemic and made these data publicly available. In this context, it is crucial to apply analysis techniques that can help optimize healthcare resources in SUS. We propose QDS-COVID, a visual analytics prototype for generating insights from SUS records. The prototype relies on a state-of-the-art datacube structure that supports slicing-and-dicing exploration through charts and choropleth maps for all states and municipalities in Brazil. A set of analysis questions drives the development of the prototype and the construction of case studies that demonstrate the potential of the approach. The results include comparisons with other studies and feedback from a medical expert.
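The slicing-and-dicing idea can be illustrated with a toy pre-aggregated cube. This is a minimal Python sketch and not the datacube structure used by the prototype. It assumes hypothetical records with only three categorical fields (state, municipality, month) and precomputes a count for every combination of fixed and rolled-up dimensions, so each slice or dice becomes a single dictionary lookup.

```python
from collections import Counter
from itertools import product

# Hypothetical flu records: (state, municipality, month). Illustrative only.
records = [
    ("SP", "Campinas", "2020-05"),
    ("SP", "Campinas", "2020-06"),
    ("SP", "Santos", "2020-05"),
    ("RJ", "Niteroi", "2020-05"),
]


def build_cube(rows):
    """Pre-aggregate record counts for every cell of a small datacube.

    None in a key position means "all values" (that dimension is rolled up).
    """
    cube = Counter()
    for row in rows:
        # Each record contributes to 2**len(row) cells: every dimension is
        # either kept at the record's value or rolled up to None.
        for keep in product((True, False), repeat=len(row)):
            key = tuple(v if k else None for v, k in zip(row, keep))
            cube[key] += 1
    return cube


def query(cube, state=None, municipality=None, month=None):
    """Slice/dice by fixing some dimensions; unfixed ones are aggregated."""
    return cube[(state, municipality, month)]


cube = build_cube(records)
print(query(cube, state="SP"))                   # 3
print(query(cube, state="SP", month="2020-05"))  # 2
print(query(cube))                               # 4
```

This naive version stores up to 2^d cells per record for d dimensions. Dedicated datacube structures aim to avoid that blow-up at the scale of millions of records.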

3
Kwon BC, Anand V, Severson KA, Ghosh S, Sun Z, Frohnert BI, Lundgren M, Ng K. DPVis: Visual Analytics With Hidden Markov Models for Disease Progression Pathways. IEEE Trans Vis Comput Graph 2021; 27:3685-3700. [PMID: 32275600 DOI: 10.1109/tvcg.2020.2985689]
Abstract
Clinical researchers use disease progression models to understand patient status and characterize progression patterns from longitudinal health records. One approach to disease progression modeling is to describe patient status using a small number of states that represent distinctive distributions over a set of observed measures. Hidden Markov models (HMMs) and their variants are a class of models that both discover these states and infer patients' health states. Despite the advantages of these algorithms for discovering interesting patterns, it remains challenging for medical experts to interpret model outputs, understand complex modeling parameters, and make clinical sense of the patterns. To tackle these problems, we conducted a design study with clinical scientists, statisticians, and visualization experts, with the goal of investigating disease progression pathways of chronic diseases, namely type 1 diabetes (T1D), Huntington's disease, Parkinson's disease, and chronic obstructive pulmonary disease (COPD). As a result, we introduce DPVis, which seamlessly integrates model parameters and outcomes of HMMs into interpretable and interactive visualizations. In this article, we demonstrate that DPVis is successful in evaluating disease progression models, visually summarizing disease states, interactively exploring disease progression patterns, and building, analyzing, and comparing clinically relevant patient subgroups.
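Inferring hidden health states from observations can be made concrete with a minimal Viterbi decoder for a discrete HMM, written in Python. The two states, the binary observations, and all probabilities below are hypothetical values chosen for illustration. DPVis works with HMM variants fitted to real longitudinal records, and this sketch is not its implementation.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for `obs` (log-space Viterbi)."""
    # best[s] = (log-prob of the best path ending in state s, that path)
    best = {
        s: (math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), [s])
        for s in states
    }
    for o in obs[1:]:
        new_best = {}
        for s in states:
            # Choose the predecessor that maximizes the path score into s.
            score, path = max(
                (best[prev][0] + math.log(trans_p[prev][s]), best[prev][1])
                for prev in states
            )
            new_best[s] = (score + math.log(emit_p[s][o]), path + [s])
        best = new_best
    return max(best.values())[1]


# Hypothetical two-state model with a binary observed measure.
STATES = ("stable", "progressing")
START = {"stable": 0.8, "progressing": 0.2}
TRANS = {
    "stable": {"stable": 0.9, "progressing": 0.1},
    "progressing": {"stable": 0.05, "progressing": 0.95},
}
EMIT = {
    "stable": {"normal": 0.85, "abnormal": 0.15},
    "progressing": {"normal": 0.2, "abnormal": 0.8},
}

path = viterbi(["normal", "normal", "abnormal", "abnormal"], STATES, START, TRANS, EMIT)
print(path)  # ['stable', 'stable', 'progressing', 'progressing']
```

Working in log space avoids numerical underflow when the sequence of visits is long.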

4
Kamalpour M, Rezaei Aghdam A, Watson J, Tariq A, Buys L, Eden R, Rehan S. Online health communities, contributions to caregivers and resilience of older adults. Health Soc Care Community 2021; 29:328-343. [PMID: 33278312 DOI: 10.1111/hsc.13247] [Received: 03/24/2020] [Revised: 10/06/2020] [Accepted: 11/11/2020]
Abstract
The aim of this paper is twofold: first, to investigate the potential benefits of online health communities (OHCs) for informal caregivers by conducting a systematic literature review; second, to identify the relationship between the potential benefits of OHCs and resilience factors of older adults. Performing a thematic analysis, we identified the potential benefits of OHCs for informal caregivers of older adults, including two salient themes: (a) caregivers sharing and receiving social support and (b) self and moral empowerment of caregivers. We then uncovered how these potential benefits can support the resilience of older adults. Our findings show that sharing and receiving of social support by informal caregivers, and their self and moral empowerment in OHCs, can support four resilience factors among older adults: self-care, independence, altruism, and external connections. This review enables a better understanding of OHCs and gerontology, and our outcomes also challenge the way healthcare and aged-care service providers view caregivers and older adults. Furthermore, the identified gap and opportunities provide avenues for further research in OHCs.
Affiliation(s)
- Jason Watson
- Queensland University of Technology, Brisbane, Australia
- Amina Tariq
- Queensland University of Technology, Brisbane, Australia
- Laurie Buys
- University of Queensland, Brisbane, Australia
- Rebekah Eden
- Queensland University of Technology, Brisbane, Australia
- Syed Rehan
- Queensland University of Technology, Brisbane, Australia

5
Liu S, Wang X, Collins C, Dou W, Ouyang F, El-Assady M, Jiang L, Keim DA. Bridging Text Visualization and Mining: A Task-Driven Survey. IEEE Trans Vis Comput Graph 2019; 25:2482-2504. [PMID: 29993887 DOI: 10.1109/tvcg.2018.2834341]
Abstract
Visual text analytics has recently emerged as one of the most prominent topics in both academic research and the commercial world. To provide an overview of the relevant techniques and analysis tasks, as well as the relationships between them, we comprehensively analyzed 263 visualization papers and 4,346 mining papers published between 1992 and 2017 in two fields: visualization and text mining. From the analysis, we derived around 300 concepts (visualization techniques, mining techniques, and analysis tasks) and built a taxonomy for each type of concept. The co-occurrence relationships between the concepts were also extracted. Our research can be used as a stepping stone for other researchers to 1) understand a common set of concepts used in this research topic; 2) facilitate the exploration of the relationships between visualization techniques, mining techniques, and analysis tasks; 3) understand the current practice in developing visual text analytics tools; 4) seek potential research opportunities by narrowing the gulf between visualization and mining techniques based on the analysis tasks; and 5) analyze other interdisciplinary research areas in a similar way. We have also contributed a web-based visualization tool for analyzing and understanding research trends and opportunities in visual text analytics.
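One simple way to extract co-occurrence relationships between concepts is to count, for every pair of concepts, the papers tagged with both. The Python sketch below assumes each paper has already been annotated with a set of concept labels. The labels here are hypothetical, and the procedure is an illustration of pairwise co-occurrence counting, not necessarily the one used in the survey.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(papers):
    """Count how many papers share each unordered pair of concepts."""
    counts = Counter()
    for concepts in papers:
        # Sort so that (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(set(concepts)), 2):
            counts[pair] += 1
    return counts


# Hypothetical concept annotations, one set per paper.
papers = [
    {"topic modeling", "word cloud", "summarization"},
    {"topic modeling", "word cloud"},
    {"sentiment analysis", "word cloud"},
]

counts = cooccurrence(papers)
print(counts[("topic modeling", "word cloud")])  # 2
```

The resulting pair counts can serve directly as edge weights in a concept co-occurrence graph.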

6
Xiong X, Fu M, Zhu M, Liang J. Visual potential expert prediction in question and answering communities. J Vis Lang Comput 2018. [DOI: 10.1016/j.jvlc.2018.03.001]

7
Kwon BC, Choi MJ, Kim JT, Choi E, Kim YB, Kwon S, Sun J, Choo J. RetainVis: Visual Analytics with Interpretable and Interactive Recurrent Neural Networks on Electronic Medical Records. IEEE Trans Vis Comput Graph 2018; 25:299-309. [PMID: 30136973 DOI: 10.1109/tvcg.2018.2865027]
Abstract
We have recently seen many successful applications of recurrent neural networks (RNNs) to electronic medical records (EMRs), which contain histories of patients' diagnoses, medications, and various other events, to predict the current and future states of patients. Despite the strong performance of RNNs, it is often challenging for users to understand why the model makes a particular prediction. This black-box nature can impede the wide adoption of RNNs in clinical practice. Furthermore, there are no established methods for interactively leveraging users' domain expertise and prior knowledge as inputs for steering the model. Therefore, our design study aims to provide a visual analytics solution that increases the interpretability and interactivity of RNNs through a joint effort of medical experts, artificial intelligence scientists, and visual analytics researchers. Following an iterative design process among the experts, we design, implement, and evaluate a visual analytics tool called RetainVis, which couples a newly improved, interpretable, and interactive RNN-based model called RetainEX with visualizations for users' exploration of EMR data in the context of prediction tasks. Our study shows the effective use of RetainVis for gaining insights into how individual medical codes contribute to risk predictions, using EMRs of patients with heart failure and cataract symptoms. Our study also demonstrates how we made substantial changes to the state-of-the-art RNN model called RETAIN in order to make use of temporal information and increase interactivity. This study provides a useful guideline for researchers who aim to design an interpretable and interactive visual analytics tool for RNNs.
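The idea of attributing a risk prediction to individual medical codes can be sketched with the contribution score of the original RETAIN model, as we understand it. For a code k present at visit i, the contribution to the logit is alpha_i * w_out . (beta_i * W_emb[:, k]), where the product with beta_i is element-wise. The Python sketch below assumes the visit-level weights alpha, the variable-level weights beta, the embedding matrix, and the output weights are already given as small hypothetical numbers rather than learned. It omits the output bias and the changes RetainEX makes to RETAIN.

```python
def code_contributions(visits, alpha, beta, w_emb, w_out):
    """RETAIN-style contribution of each (visit, code) pair to a scalar logit.

    visits: list of visits, each a list of code indices present in that visit
    alpha:  one scalar (visit-level) attention weight per visit
    beta:   one length-d (variable-level) attention vector per visit
    w_emb:  d x n_codes embedding matrix as a list of rows
    w_out:  length-d output weight vector
    """
    d = len(w_out)
    contributions = {}
    for i, codes in enumerate(visits):
        for k in codes:
            # alpha_i * w_out . (beta_i * W_emb[:, k]), with x_{i,k} = 1
            contributions[(i, k)] = alpha[i] * sum(
                w_out[j] * beta[i][j] * w_emb[j][k] for j in range(d)
            )
    return contributions


# Hypothetical values: two visits, two codes, embedding size d = 2.
visits = [[0], [0, 1]]
alpha = [0.25, 0.75]
beta = [[1.0, 0.5], [0.5, 1.0]]
w_emb = [[1.0, 0.0], [0.0, 2.0]]
w_out = [1.0, 1.0]

contrib = code_contributions(visits, alpha, beta, w_emb, w_out)
print(contrib)  # {(0, 0): 0.25, (1, 0): 0.375, (1, 1): 1.5}
```

The logit is linear in the attention-weighted context vector, so these contributions sum to the logit (2.125 here). This additivity is what lets them be read as per-code attributions.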

8
Yalcin MA, Elmqvist N, Bederson BB. Keshif: Rapid and Expressive Tabular Data Exploration for Novices. IEEE Trans Vis Comput Graph 2018; 24:2339-2352. [PMID: 28692978 DOI: 10.1109/tvcg.2017.2723393]
Abstract
General-purpose graphical interfaces for data exploration are typically based on manual visualization and interaction specifications. While manual specification can be very expressive, it demands considerable effort to make effective decisions, which reduces exploratory speed. Instead, principled automated designs can increase exploratory speed, decrease learning effort, help avoid ineffective decisions, and therefore better support data analytics novices. Towards these goals, we present Keshif, a new systematic design for tabular data exploration. To summarize a given dataset, Keshif aggregates records by value within attribute summaries and visualizes aggregate characteristics using a consistent design based on data types. To reveal data distribution details, Keshif features three complementary linked selections: highlighting, filtering, and comparison. Keshif further increases expressiveness through aggregate metrics, absolute/part-of scale modes, calculated attributes, and saved selections, all working in synchrony. Its automated design approach also simplifies authoring of dashboards composed of summaries and individual records from raw data using fluid interaction. We show examples from datasets in diverse domains. Our study with novices shows that after exploring raw data for 15 minutes, participants reached close to 30 data insights on average, comparable to other studies with skilled users using more complex tools.

9
Chen Y, Dong Y, Sun Y, Liang J. A Multi-comparable visual analytic approach for complex hierarchical data. J Vis Lang Comput 2018. [DOI: 10.1016/j.jvlc.2018.02.003]

10
Zhang J, Marmor R, Huh J. Towards Supporting Patient Decision-making In Online Diabetes Communities. AMIA Annu Symp Proc 2018; 2017:1893-1902. [PMID: 29854261 PMCID: PMC5977569]
Abstract
As of 2014, 29.1 million people in the US had diabetes. Patients with diabetes have evolving information needs around complex lifestyle and medical decisions. As their conditions progress, patients need to make decisions sporadically by understanding alternatives and comparing options. These moments in the decision-making process present a valuable opportunity to support their information needs. An increasing number of patients visit online diabetes communities to fulfill their information needs. To understand how patients attempt to fulfill information needs around decision-making in online communities, we reviewed 801 posts from an online diabetes community and included 79 posts for in-depth content analysis. The findings revealed motivations for posters' decision-making inquiries, including changes in disease state, increased self-awareness, and conflicting information received. Medication and food were among the most popular topics discussed in these decision-making inquiries. Additionally, we present insights for automatically identifying such decision-making inquiries to efficiently support information needs expressed in online health communities.
Affiliation(s)
- Jing Zhang
- University of California San Diego, San Diego, CA
- Jina Huh
- University of California San Diego, San Diego, CA

11
Abstract
User grouping in asynchronous online forums is a common phenomenon. People with similar backgrounds or shared interests like to get together in group discussions. As tens of thousands of archived conversational posts accumulate, forum administrators and analysts face challenges in effectively exploring user groups in large-volume threads and gaining meaningful insights into hierarchical discussions. Identifying and comparing groups in discussion threads is nontrivial, since the numbers of users and posts increase over time and noise may hamper the detection of user groups. Researchers in data mining have proposed a large body of algorithms for exploring user grouping. However, the mining results are not intuitive to understand, and it is difficult for users to explore the details. To address these issues, we present VisForum, a visual analytics system that allows people to interactively explore user groups in a forum. We work closely with two educators who have released courses on Massive Open Online Course (MOOC) platforms to compile a list of design goals to guide our design. Then, we design and implement a multi-coordinated interface as well as several novel glyphs, i.e., the group glyph, user glyph, and set glyph, with different granularities. Accordingly, we propose the Group Detecting & Sorting Algorithm to reduce noise in a collection of posts, and employ the concept of a “forum-index” for users to identify high-impact forum members. Two case studies using real-world datasets demonstrate the usefulness of the system and the effectiveness of the novel glyph designs. Furthermore, we conduct an in-lab user study to evaluate the usability of VisForum.
Affiliation(s)
- Siwei Fu
- Hong Kong University of Science and Technology, Hong Kong
- Yong Wang
- Hong Kong University of Science and Technology, Hong Kong
- Yi Yang
- Hong Kong University of Science and Technology, Hong Kong
- Qingqing Bi
- Nanyang Technological University, Nanyang Ave, Singapore
- Huamin Qu
- Hong Kong University of Science and Technology, Hong Kong

12
Park D, Kim S, Lee J, Choo J, Diakopoulos N, Elmqvist N. ConceptVector: Text Visual Analytics via Interactive Lexicon Building Using Word Embedding. IEEE Trans Vis Comput Graph 2018; 24:361-370. [PMID: 28880180 DOI: 10.1109/tvcg.2017.2744478]
Abstract
Central to many text analysis methods is the notion of a concept: a set of semantically related keywords characterizing a specific object, phenomenon, or theme. Advances in word embedding allow building a concept from a small set of seed terms. However, naive application of such techniques may result in false positive errors because of the polysemy of natural language. To mitigate this problem, we present a visual analytics system called ConceptVector that guides a user in building such concepts and then using them to analyze documents. Document-analysis case studies with real-world datasets demonstrate the fine-grained analysis provided by ConceptVector. To support the elaborate modeling of concepts, we introduce a bipolar concept model and support for specifying irrelevant words. We validate the interactive lexicon building interface by a user study and expert reviews. Quantitative evaluation shows that the bipolar lexicon generated with our methods is comparable to human-generated ones.
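Building a concept from seed terms with word embeddings can be sketched as ranking vocabulary words by cosine similarity to the centroid of the seed vectors. The Python sketch below is a simplified illustration, not ConceptVector's model. It assumes tiny hand-made 2-D "embeddings", scores a bipolar concept as similarity to the positive seeds minus similarity to the negative seeds, and drops user-marked irrelevant words from the ranking.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def expand_concept(emb, positive, negative=(), irrelevant=(), top_k=2):
    """Rank non-seed words by similarity to the positive seed centroid,
    minus similarity to the negative seed centroid when one is given."""
    pos_c = centroid([emb[w] for w in positive])
    neg_c = centroid([emb[w] for w in negative]) if negative else None
    excluded = set(positive) | set(negative) | set(irrelevant)
    scores = {}
    for word, vec in emb.items():
        if word in excluded:
            continue
        score = cosine(vec, pos_c)
        if neg_c is not None:
            score -= cosine(vec, neg_c)
        scores[word] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical 2-D "embeddings" chosen by hand for illustration.
emb = {
    "happy": [0.9, 0.1],
    "joyful": [0.85, 0.2],
    "glad": [0.8, 0.15],
    "sad": [0.1, 0.9],
    "gloomy": [0.15, 0.85],
    "table": [0.5, -0.8],
}

print(expand_concept(emb, ["happy"], ["sad"]))             # ['table', 'glad']
print(expand_concept(emb, ["happy"], ["sad"], ["table"]))  # ['glad', 'joyful']
```

In this toy setup, "table" ranks first only because it points away from "sad". A plain difference score can therefore surface unrelated words, and excluding user-marked irrelevant words corrects the ranking.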

13
VanDam C, Kanthawala S, Pratt W, Chai J, Huh J. Detecting clinically related content in online patient posts. J Biomed Inform 2017; 75:96-106. [PMID: 28986329 PMCID: PMC5685920 DOI: 10.1016/j.jbi.2017.09.015] [Received: 07/06/2017] [Revised: 09/14/2017] [Accepted: 09/30/2017]
Abstract
Patients with chronic health conditions use online health communities to seek support and information to help manage their condition. For clinically related topics, patients can benefit from getting opinions from clinical experts, and many are concerned about misinformation and biased information being spread online. However, the large volume of community posts makes it challenging for moderators and clinical experts, if there are any, to provide necessary information. Automatically identifying forum posts that need validated clinical resources can help online health communities efficiently manage content exchange. This automation can also help patients in need of clinical expertise get proper help. We present our results on testing text classification models that efficiently and accurately identify community posts containing clinical topics. We annotated 1817 posts, comprising 4966 sentences, from an existing online diabetes community. Our classifier performed best (F-measure: 0.83, Precision: 0.79, Recall: 0.86) when using the Naïve Bayes algorithm with unigrams, bigrams, trigrams, and MetaMap Semantic Types. Training took 5 s, and classification took a fraction of a second. We applied our classifier to another online diabetes community, and the results were: F-measure: 0.63, Precision: 0.57, Recall: 0.71. Our results show that our model is feasible to scale to other forums for identifying posts containing clinical topics, with common errors properly addressed.
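A bag-of-n-grams Naïve Bayes classifier of the kind described can be sketched in a few lines of Python. The code below is a minimal multinomial Naïve Bayes with add-one smoothing over unigrams, bigrams, and trigrams. The training posts and labels are hypothetical, and the MetaMap Semantic Type features used in the study are omitted. It also includes the F-measure, the harmonic mean of precision and recall.

```python
import math
from collections import Counter


def ngrams(text, n_max=3):
    """Lower-cased, whitespace-tokenized n-grams for n = 1..n_max."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {
            c: math.log(labels.count(c) / len(labels)) for c in self.classes
        }
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(ngrams(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)
            return self.log_prior[c] + sum(
                math.log((self.counts[c][f] + 1) / denom) for f in ngrams(text)
            )

        return max(self.classes, key=log_posterior)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical labeled posts (not from the study's annotated corpus).
texts = [
    "should i increase my insulin dose",
    "my blood sugar is high after metformin",
    "thanks everyone for the warm welcome",
    "happy holidays to all of you",
]
labels = ["clinical", "clinical", "other", "other"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("is my insulin dose too high"))  # clinical
print(model.predict("happy holidays everyone"))      # other
print(round(f_measure(0.57, 0.71), 2))               # 0.63
```

Plugging the cross-community precision and recall (0.57, 0.71) into `f_measure` reproduces the reported F-measure of 0.63.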
Affiliation(s)
- Wanda Pratt
- University of Washington, Seattle, United States
- Joyce Chai
- Michigan State University, United States
- Jina Huh
- University of California San Diego, United States

14
Nakikj D, Mamykina L. DisVis: Visualizing Discussion Threads in Online Health Communities. AMIA Annu Symp Proc 2017; 2016:944-953. [PMID: 28269891 PMCID: PMC5333262]
Abstract
An increasing number of individuals turn to online health communities (OHCs) for information, advice, and support about their health condition or disease. As a result of users' active participation, these forums store overwhelming volumes of information, which can make access to this information challenging and frustrating. To help overcome this problem, we designed DisVis, a discussion visualization tool. DisVis includes features for overviewing, browsing, and finding particular information in a discussion. In a between-subjects study, we tested the impact of DisVis on individuals' ability to provide an overview of a discussion, find topics of interest, and summarize opinions. The study showed that after using the tool, the accuracy of participants' answers increased by 68% (p-value = 0.023), while the time to answer showed a trend toward a 38% reduction that was not statistically significant (p-value = 0.082). Qualitative interviews showed general enthusiasm for tools that improve browsing and searching for information within discussion forums, suggested different usage scenarios, highlighted opportunities for improving the design of DisVis, and outlined new directions for visualizing user-generated content within OHCs.
Affiliation(s)
- Drashko Nakikj
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Lena Mamykina
- Department of Biomedical Informatics, Columbia University, New York, NY, USA

15
Fu S, Zhao J, Cui W, Qu H. Visual Analysis of MOOC Forums with iForum. IEEE Trans Vis Comput Graph 2017; 23:201-210. [PMID: 27514047 DOI: 10.1109/tvcg.2016.2598444]
Abstract
Discussion forums of Massive Open Online Courses (MOOCs) provide great opportunities for students to interact with instructional staff as well as other students. Exploration of MOOC forum data can offer valuable insights for these staff to enhance the course and prepare the next release. However, such exploration is challenging because of the large, complicated, and heterogeneous nature of the relevant datasets, which contain multiple dynamically interacting objects such as users, posts, and threads, each with multiple attributes. In this paper, we present a design study for developing an interactive visual analytics system, called iForum, that allows users to effectively discover and understand temporal patterns in MOOC forums. The design study was conducted iteratively over one year with three domain experts: a MOOC instructor and two official teaching assistants. iForum offers a set of novel visualization designs for presenting the three interleaving aspects of MOOC forums (i.e., posts, users, and threads) at three different scales. To demonstrate the effectiveness and usefulness of iForum, we describe a case study involving field experts, in which they use iForum to investigate real MOOC forum data for a course on Java programming.