1
He H, Xiong X, Zheng Y, Hou J, Jiang T, Quan W, Huang J, Xu J, Chen K, Qian J, Cai J, Lu Y, Lian M, Xie C, Luo J. Parkin characteristics and blood biomarkers of Parkinson's disease in WPBLC study. Front Aging Neurosci 2025; 17:1511272. [PMID: 40078640 PMCID: PMC11897490 DOI: 10.3389/fnagi.2025.1511272] [Received: 10/14/2024] [Accepted: 02/13/2025] [Indexed: 03/14/2025]
Abstract
Background The exact mechanisms of Parkinson's disease (PD) are unclear, but dysfunction of Parkin-mediated mitophagy is believed to play a key role. We investigated whether blood levels of Parkin and other biomarkers are linked to the risk of developing PD. Methods Baseline blood measurements of Parkin and other biomarkers, including homocysteine (Hcy), carcinoembryonic antigen (CEA), urea, total proteins, total cholesterol, creatine kinase, and albumin, were collected from 197 participants with clinically diagnosed PD and 107 age-matched healthy controls (HC) in the Wenzhou Parkinson's Biomarkers and Living Characteristics (WPBLC) study. We also conducted bioinformatics analysis using three datasets from the GEO database: GSE90514 (Cohort 1: PD = 4, HC = 4), GSE7621 (Cohort 2: PD = 16, HC = 9), and GSE205450 (Cohort 3: PD = 69, HC = 81). Results Using a bioinformatic approach, we identified dysregulated biological processes in PD patients with PRKN mutations. Compared with controls, PD patients showed significant abnormalities in blood levels of Parkin, Hcy, total proteins, urea, albumin, and CEA. A model incorporating Parkin, Hcy, total proteins, and urea effectively distinguished PD from HC, achieving a higher area under the ROC curve (AUC 0.841) than other biomarker combinations. Gene set enrichment analysis suggested that pathways such as PINK1-Parkin-mediated mitophagy, the urea cycle, cysteine degradation, and riboflavin metabolism may be associated with PRKN mutations. Additionally, the association between Parkin and PD was partially mediated by CEA and albumin, but not by Hcy, total proteins, or urea. Conclusion Our findings indicate that blood Parkin levels may serve as a minimally invasive biomarker for PD diagnosis. The model including Parkin, Hcy, total proteins, and urea distinguished PD from HC more accurately than other biomarker combinations.
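The abstract above summarizes discrimination as an AUC of 0.841. As a hedged illustration of what that number measures (not the authors' statistical pipeline), the sketch below computes a ROC AUC in its Mann-Whitney form: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as half. The scores are hypothetical model outputs, not study data, and the function name `roc_auc` is illustrative.

```python
from itertools import product
from typing import Sequence


def roc_auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """ROC AUC in its Mann-Whitney form: the fraction of (case, control)
    pairs in which the case gets the higher score; ties count as half."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control score")
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical outputs of a combined-biomarker classifier (not study data).
pd_scores = [0.9, 0.8, 0.7, 0.4]
hc_scores = [0.6, 0.3, 0.2]
print(roc_auc(pd_scores, hc_scores))  # 11 of 12 pairs ordered correctly: 0.9166...
```

Read this way, an AUC of 0.841 means that in roughly 84% of case-control pairs the model ranks the PD participant above the healthy control. The pairwise loop is O(n·m); for large cohorts a rank-based computation gives the same value more cheaply.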
Affiliation(s)
- Haijun He
- Department of Neurology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Department of Physiology, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
- Xi Xiong
- Department of Neurology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Yi Zheng
- Department of Neurology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Jialong Hou
- Department of Neurology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Tao Jiang
- Department of Neurology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Weiwei Quan
- Department of Neurology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Jiani Huang
- Department of Neurology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Jiaxue Xu
- Department of Neurology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Keke Chen
- Department of Neurology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Jingjing Qian
- Department of Neurology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Jinlai Cai
- Department of Neurology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Yao Lu
- Department of Neurology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Department of Neurology, Yuhuan City People’s Hospital, Taizhou, China
- Mengjia Lian
- Department of Neurology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Department of Neurology, The First People’s Hospital of Wenling, Taizhou, China
- Chenglong Xie
- Department of Neurology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Ji Luo
- Department of Neurology, The Affiliated Huzhou Hospital, Zhejiang University School of Medicine (Huzhou Central Hospital), Huzhou, China
2
Shi Y, Wang X, Chen S, Zhao Y, Wang Y, Sheng X, Qi X, Zhou L, Feng Y, Liu J, Wang C, Xing K. Identification of key genes affecting intramuscular fat deposition in pigs using machine learning models. Front Genet 2025; 15:1503148. [PMID: 39834552 PMCID: PMC11743517 DOI: 10.3389/fgene.2024.1503148] [Received: 09/28/2024] [Accepted: 12/09/2024] [Indexed: 01/22/2025]
Abstract
Intramuscular fat (IMF) is an important indicator for evaluating meat quality. Transcriptome sequencing (RNA-seq) is widely used to study IMF deposition. Machine learning (ML) is a data-driven approach that can effectively fit complex data and accurately classify samples and identify genes, and it plays an important role in omics research. This study therefore aimed to analyze RNA-seq data with ML methods to identify differentially expressed genes (DEGs) affecting IMF deposition in pigs. A total of 74 RNA-seq datasets from muscle tissue samples were used. Using the limma package, 155 DEGs were identified between the two groups. The support vector machine recursive feature elimination (SVM-RFE) and random forest (RF) models identified 100 and 11 significant genes, respectively, and six genes were shared by both models. KEGG pathway enrichment analysis showed that these intersecting genes were enriched in pathways associated with lipid deposition, including α-linolenic acid metabolism, linoleic acid metabolism, ether lipid metabolism, arachidonic acid metabolism, and glycerophospholipid metabolism. Based on these significant pathways, four key genes affecting IMF deposition were identified: PLA2G6, MPV17, NUDT2, and ND4L. These results are important for elucidating the molecular regulatory mechanism of IMF deposition and for effectively improving IMF content in pigs.
Affiliation(s)
- Yumei Shi
- College of Animal Science and Technology, China Agricultural University, Beijing, China
- College of Animal Science and Technology, Beijing University of Agriculture, Beijing, China
- Xini Wang
- College of Animal Science and Technology, China Agricultural University, Beijing, China
- Yanhui Zhao
- College of Animal Science and Technology, Beijing University of Agriculture, Beijing, China
- Yan Wang
- College of Animal Science and Technology, Beijing University of Agriculture, Beijing, China
- Xihui Sheng
- College of Animal Science and Technology, Beijing University of Agriculture, Beijing, China
- Xiaolong Qi
- College of Animal Science and Technology, Beijing University of Agriculture, Beijing, China
- Lei Zhou
- College of Animal Science and Technology, China Agricultural University, Beijing, China
- Yu Feng
- College of Animal Science and Technology, China Agricultural University, Beijing, China
- Jianfeng Liu
- College of Animal Science and Technology, China Agricultural University, Beijing, China
- Chuduan Wang
- College of Animal Science and Technology, China Agricultural University, Beijing, China
- Kai Xing
- College of Animal Science and Technology, China Agricultural University, Beijing, China
3
Deng D, Zhang C, Zheng H, Pu Y, Ji S, Wu Y. AdversaFlow: Visual Red Teaming for Large Language Models with Multi-Level Adversarial Flow. IEEE Trans Vis Comput Graph 2025; 31:492-502. [PMID: 39283796 DOI: 10.1109/tvcg.2024.3456150] [Indexed: 03/05/2025]
Abstract
Large Language Models (LLMs) are powerful but also raise significant security concerns, particularly regarding the harm they can cause, such as generating fake news that manipulates public opinion on social media and providing responses that facilitate unethical activities. Traditional red teaming approaches for identifying AI vulnerabilities rely on manual prompt construction and expertise. This paper introduces AdversaFlow, a novel visual analytics system designed to enhance LLM security against adversarial attacks through human-AI collaboration. AdversaFlow involves adversarial training between a target model and a red model, and features unique multi-level adversarial flow and fluctuation path visualizations. These features provide insights into adversarial dynamics and LLM robustness, enabling experts to identify and mitigate vulnerabilities effectively. We present quantitative evaluations and case studies that validate the system's utility and offer insights for future AI security solutions. Our method can enhance LLM security and support downstream scenarios such as social media regulation by enabling more effective detection, monitoring, and mitigation of harmful content and behaviors.
4
Zhang Z, Yang F, Cheng R, Ma Y. ParetoTracker: Understanding Population Dynamics in Multi-Objective Evolutionary Algorithms Through Visual Analytics. IEEE Trans Vis Comput Graph 2025; 31:820-830. [PMID: 39255166 DOI: 10.1109/tvcg.2024.3456142] [Indexed: 09/12/2024]
Abstract
Multi-objective evolutionary algorithms (MOEAs) have emerged as powerful tools for solving complex optimization problems characterized by multiple, often conflicting, objectives. While advances have been made in computational efficiency and in the diversity and convergence of solutions, a critical challenge persists: the internal evolutionary mechanisms remain opaque to human users. Drawing on the success of explainable AI in interpreting complex algorithms and models, we argue that the need to understand the underlying evolutionary operators and population dynamics within MOEAs aligns well with a visual analytics paradigm. This paper introduces ParetoTracker, a visual analytics framework designed to support the comprehension and inspection of population dynamics in the evolutionary processes of MOEAs. Informed by a preliminary literature review and expert interviews, the framework establishes a multi-level analysis scheme that supports user engagement and exploration, ranging from examining overall trends in performance metrics to fine-grained inspection of evolutionary operations. In contrast to conventional practices that require manually plotting the solutions of each generation, ParetoTracker enables the examination of temporal trends and dynamics across consecutive generations in an integrated visual interface. The effectiveness of the framework is demonstrated through case studies and expert interviews focused on widely adopted benchmark optimization problems.
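Inspecting an MOEA population generation by generation usually starts from Pareto dominance. A minimal sketch for objectives that are all minimized follows; it extracts the non-dominated members (the first Pareto front) of one hypothetical generation. It is a generic textbook computation, not ParetoTracker code, and the helper names are illustrative.

```python
from typing import Sequence

Point = tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """For minimization: a dominates b if it is no worse in every objective
    and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated(population: Sequence[Point]) -> list[Point]:
    """Members that no other member dominates (the first Pareto front).
    Simple O(n^2) pairwise comparison; a point never dominates itself."""
    return [p for p in population if not any(dominates(q, p) for q in population)]


# Hypothetical generation with two objectives to minimize.
generation = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
print(nondominated(generation))  # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

Running this per generation and plotting the results is the manual workflow the abstract says ParetoTracker replaces with an integrated interface.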
5
Mo Y, Bier R, Li X, Daniels M, Smith A, Yu L, Kan J. Agricultural practices influence soil microbiome assembly and interactions at different depths identified by machine learning. Commun Biol 2024; 7:1349. [PMID: 39424928 PMCID: PMC11489707 DOI: 10.1038/s42003-024-07059-8] [Received: 03/27/2024] [Accepted: 10/11/2024] [Indexed: 10/21/2024]
Abstract
Agricultural practices affect soil microbes, which are critical to soil health and sustainable agriculture. To understand prokaryotic and fungal community assembly under different agricultural practices, we use machine learning-based methods. We show that fertility source is the most pronounced factor in microbial assembly, especially for fungi, and that its effect decreases with soil depth. Fertility source also shapes the microbial co-occurrence patterns revealed by machine learning, leading to fungi-dominated modules that are sensitive to fertility source down to 30 cm depth. Tillage affects soil microbiomes at 0-20 cm depth, enhancing dispersal and stochastic processes but potentially jeopardizing microbial interactions. Cover crop effects are less pronounced and lack depth-dependent patterns. Machine learning reveals that the impact of agricultural practices on microbial communities is multifaceted and highlights the role of fertility source across soil depths. By overcoming the linearity limitations of traditional methods, machine learning offers enhanced insights into the mechanisms underlying microbial assembly and distribution in agricultural soils.
Affiliation(s)
- Yujie Mo
- Sino-French Engineer School, Beihang University, Beijing, China
- Raven Bier
- Stroud Water Research Center, Avondale, PA, USA
- Savannah River Ecology Laboratory, University of Georgia, Aiken, SC, USA
- Xiaolin Li
- Zibo Vocational Institute, Zibo, Shandong, China
- Lei Yu
- Sino-French Engineer School, Beihang University, Beijing, China
- Jinjun Kan
- Stroud Water Research Center, Avondale, PA, USA
6
Cálem J, Moreira C, Jorge J. Intelligent systems in healthcare: A systematic survey of explainable user interfaces. Comput Biol Med 2024; 180:108908. [PMID: 39067152 DOI: 10.1016/j.compbiomed.2024.108908] [Received: 05/03/2024] [Revised: 07/05/2024] [Accepted: 07/15/2024] [Indexed: 07/30/2024]
Abstract
With radiology shortages affecting over half of the global population, the potential of artificial intelligence to revolutionize medical diagnosis and treatment is ever more important. However, a lack of trust among medical professionals hinders the widespread adoption of AI models in the health sciences. Explainable AI (XAI) aims to increase trust in and understanding of black-box models by identifying biases and providing transparent explanations. This is the first survey to explore explainable user interfaces (XUI) from a medical domain perspective, analysing the visualization and interaction methods employed in current medical XAI systems. Following the PRISMA methodology, we analysed 42 explainable interfaces, emphasizing the critical role of effectively conveying information to users as part of the explanation process. We contribute a taxonomy of interface design properties and identify five distinct clusters of research papers. Future research directions include contestability in medical decision support, counterfactual explanations for images, and leveraging Large Language Models to enhance XAI interfaces in healthcare.
Affiliation(s)
- João Cálem
- Instituto Superior Técnico, Universidade de Lisboa, Portugal; INESC-ID, Portugal
- Catarina Moreira
- Data Science Institute, University of Technology Sydney, Australia; INESC-ID, Portugal
- Joaquim Jorge
- Instituto Superior Técnico, Universidade de Lisboa, Portugal; INESC-ID, Portugal
7
Angelini M, Blasilli G, Lenti S, Santucci G. A Visual Analytics Conceptual Framework for Explorable and Steerable Partial Dependence Analysis. IEEE Trans Vis Comput Graph 2024; 30:4497-4513. [PMID: 37027262 DOI: 10.1109/tvcg.2023.3263739] [Indexed: 06/19/2023]
Abstract
Machine learning techniques are a driving force for research in various fields, from credit card fraud detection to stock analysis. Recently, a growing interest in increasing human involvement has emerged, with the primary goal of improving the interpretability of machine learning models. Among different techniques, Partial Dependence Plots (PDPs) are one of the main model-agnostic approaches for interpreting how features influence the predictions of a machine learning model. However, their limitations (i.e., visual interpretation, aggregation of heterogeneous effects, inaccuracy, and computability) can complicate or misdirect the analysis. Moreover, when the effects of multiple features are analyzed together, the resulting combinatorial space can be challenging to explore both computationally and cognitively. This article proposes a conceptual framework that enables effective analysis workflows and mitigates these state-of-the-art limitations. The proposed framework allows users to explore and refine computed partial dependences, observe incrementally more accurate results, and steer the computation of new partial dependences on user-selected subspaces of the combinatorial and intractable space. With this approach, the user can save both computational and cognitive costs, in contrast with the standard monolithic approach that computes all possible combinations of features over their entire domains in batch. The framework is the result of a careful design process that involved expert knowledge during its validation, and it informed the development of a prototype, W4SP, that demonstrates its applicability by traversing its different paths. A case study shows the advantages of the proposed approach.
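For readers new to PDPs, the "standard monolithic approach" the abstract contrasts with can be sketched for one feature: set that feature to each grid value in every row, run the model, and average. The model and data below are hypothetical, and this brute-force loop is the baseline computation, not the paper's incremental, steerable framework.

```python
from typing import Callable, Sequence

Row = list[float]


def partial_dependence(
    model: Callable[[Row], float],
    data: Sequence[Row],
    feature: int,
    grid: Sequence[float],
) -> list[float]:
    """Brute-force 1D partial dependence: for each grid value v, overwrite
    column `feature` with v in every row and average the model outputs."""
    curve = []
    for v in grid:
        total = 0.0
        for row in data:
            modified = list(row)  # copy so the caller's data is untouched
            modified[feature] = v
            total += model(modified)
        curve.append(total / len(data))
    return curve


# Hypothetical model with an interaction between features 0 and 1.
def toy_model(x: Row) -> float:
    return 2.0 * x[0] + x[0] * x[1]


toy_data = [[0.0, 1.0], [1.0, 3.0], [2.0, -1.0]]
print(partial_dependence(toy_model, toy_data, feature=0, grid=[0.0, 1.0, 2.0]))
# [0.0, 3.0, 6.0]
```

The averaged curve has slope 3, while the per-row slopes (2 + x[1]) are 3, 5 and 1: a small instance of the "aggregation of heterogeneous effects" limitation listed in the abstract. The cost is one model call per row per grid point, which grows combinatorially once several features are varied jointly.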
8
Mesquita F, Bernardino J, Henriques J, Raposo JF, Ribeiro RT, Paredes S. Machine learning techniques to predict the risk of developing diabetic nephropathy: a literature review. J Diabetes Metab Disord 2024; 23:825-839. [PMID: 38932857 PMCID: PMC11196462 DOI: 10.1007/s40200-023-01357-4] [Received: 06/02/2023] [Accepted: 11/20/2023] [Indexed: 06/28/2024]
Abstract
Purpose Diabetes is a major public health challenge with widespread prevalence, often leading to complications such as diabetic nephropathy (DN), a chronic condition that progressively impairs kidney function. In this context, it is important to evaluate whether machine learning models can exploit the temporal information inherent in clinical data to predict the risk of developing DN faster and more accurately than current clinical models. Methods Three databases were searched for this literature review: Scopus, Web of Science, and PubMed. Only articles written in English and published between January 2015 and December 2022 were included. Results We included 11 studies, from which we discuss a number of algorithms capable of extracting knowledge from clinical data, incorporating dynamic aspects into patient assessment, and exploring their evolution over time. We also compare the different approaches in terms of performance, advantages, disadvantages, and interpretation, and discuss the value that the time factor can bring to more successful prediction of diabetic nephropathy. Conclusion Our analysis showed that some studies ignored the temporal factor, while others partially exploited it. Greater use of the temporal aspect inherent in Electronic Health Records (EHR) data, together with the integration of omics data, could lead to more reliable and powerful predictive models.
Affiliation(s)
- F. Mesquita
- Polytechnic Institute of Coimbra, Coimbra Institute of Engineering, Rua Pedro Nunes - Quinta da Nora, 3030-199 Coimbra, Portugal
- J. Bernardino
- Polytechnic Institute of Coimbra, Coimbra Institute of Engineering, Rua Pedro Nunes - Quinta da Nora, 3030-199 Coimbra, Portugal
- Center for Informatics and Systems of University of Coimbra, University of Coimbra, Pólo II, 3030-290 Coimbra, Portugal
- J. Henriques
- Center for Informatics and Systems of University of Coimbra, University of Coimbra, Pólo II, 3030-290 Coimbra, Portugal
- JF. Raposo
- Education and Research Center, APDP Diabetes Portugal, Rua Do Salitre 118-120, 1250-203 Lisbon, Portugal
- RT. Ribeiro
- Education and Research Center, APDP Diabetes Portugal, Rua Do Salitre 118-120, 1250-203 Lisbon, Portugal
- S. Paredes
- Polytechnic Institute of Coimbra, Coimbra Institute of Engineering, Rua Pedro Nunes - Quinta da Nora, 3030-199 Coimbra, Portugal
- Center for Informatics and Systems of University of Coimbra, University of Coimbra, Pólo II, 3030-290 Coimbra, Portugal
9
Yang T, Hu H, Li X, Meng Q, Lu H, Huang Q. An efficient Fusion-Purification Network for Cervical pap-smear image classification. Comput Methods Programs Biomed 2024; 251:108199. [PMID: 38728830 DOI: 10.1016/j.cmpb.2024.108199] [Received: 12/27/2023] [Revised: 02/28/2024] [Accepted: 04/21/2024] [Indexed: 05/12/2024]
Abstract
BACKGROUND AND OBJECTIVES In cervical cell diagnostics, autonomous screening technology constitutes the foundation of automated diagnostic systems. Numerous deep learning-based classification techniques have been successfully applied to the analysis of cervical cell images, with favorable outcomes. Nevertheless, efficient discrimination of cervical cells remains challenging because of large intra-class and small inter-class variations. The key to this problem is to capture localized informative differences in cervical cell images and to represent discriminative features efficiently. Existing methods neglect the importance of global morphological information, resulting in inadequate feature representation. METHODS To address this limitation, we propose a novel cervical cell classification model that focuses on purified fusion information. Specifically, we first integrate detailed texture information with morphological structure features, a step we call cervical pathology information fusion. Second, to enhance the discrimination of cervical cell features and address the data redundancy and bias that remain after fusion, we design a cervical purification bottleneck module. The resulting model balances the use of purified features with high-efficiency discrimination. Furthermore, we plan to release a more complex cervical cell dataset, the Cervical Cytopathology Image Dataset (CCID). RESULTS Extensive experiments on two real-world datasets show that our proposed model outperforms state-of-the-art cervical cell classification models. CONCLUSIONS The results show that our method can help pathologists evaluate cervical smears accurately.
Affiliation(s)
- Tianjin Yang
- College of Computer and Information, Hohai University, Nanjing, 211100, PR China
- Hexuan Hu
- College of Computer and Information, Hohai University, Nanjing, 211100, PR China
- Xing Li
- College of Information Science and Technology & College of Artificial Intelligence, Nanjing Forestry University, Nanjing 210037, PR China
- Qing Meng
- College of Computer and Information, Hohai University, Nanjing, 211100, PR China
- Hao Lu
- College of Computer and Information, Hohai University, Nanjing, 211100, PR China
- Qian Huang
- College of Computer and Information, Hohai University, Nanjing, 211100, PR China
10
Floricel C, Wentzel A, Mohamed A, Fuller CD, Canahuate G, Marai GE. Roses Have Thorns: Understanding the Downside of Oncological Care Delivery Through Visual Analytics and Sequential Rule Mining. IEEE Trans Vis Comput Graph 2024; 30:1227-1237. [PMID: 38015695 PMCID: PMC10842255 DOI: 10.1109/tvcg.2023.3326939] [Indexed: 11/30/2023]
Abstract
Personalized head and neck cancer therapeutics have greatly improved survival rates, but they often lead to understudied, long-lasting symptoms that affect quality of life. Sequential rule mining (SRM) is a promising unsupervised machine learning method for predicting longitudinal patterns in temporal data; however, it can output many repetitive patterns that are difficult to interpret without the assistance of visual analytics. We present a data-driven, human-machine visual analysis system, developed in collaboration with SRM model builders in cancer symptom research, which facilitates mechanistic knowledge discovery in large-scale, multivariate cohort symptom data. Our system supports multivariate predictive modeling of post-treatment symptoms based on during-treatment symptoms. It achieves this through a back end for SRM, clustering, and aggregation, and a custom front end that helps develop and tune the predictive models. The system also explains the resulting predictions in the context of therapeutic decisions typical of personalized care delivery. We evaluate the resulting models and system with an interdisciplinary group of modelers and head and neck oncology researchers. The results demonstrate that our system effectively supports clinical and symptom research.
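Sequential rules of the form X ⇒ Y are usually scored by support and confidence. The sketch below uses one common formulation, assumed here rather than taken from the paper: a rule holds in a patient's sequence if some split point puts every item of X in earlier time steps and every item of Y in later ones. Symptom codes and patient sequences are hypothetical, and this is not the authors' SRM back end.

```python
from typing import Sequence

Step = frozenset[str]  # symptoms recorded at one time step
PatientSeq = Sequence[Step]


def items_in(steps: Sequence[Step]) -> set[str]:
    """Union of all items observed across a run of time steps."""
    return set().union(*steps)


def rule_holds(seq: PatientSeq, antecedent: Step, consequent: Step) -> bool:
    """True if some split point k puts every antecedent item in steps [0, k)
    and every consequent item in steps [k, end)."""
    return any(
        antecedent <= items_in(seq[:k]) and consequent <= items_in(seq[k:])
        for k in range(1, len(seq))
    )


def support_confidence(
    db: Sequence[PatientSeq], antecedent: Step, consequent: Step
) -> tuple[float, float]:
    """support: share of sequences in which the rule holds;
    confidence: that count over the sequences containing the antecedent."""
    holds = sum(rule_holds(s, antecedent, consequent) for s in db)
    with_antecedent = sum(antecedent <= items_in(s) for s in db)
    support = holds / len(db)
    confidence = holds / with_antecedent if with_antecedent else 0.0
    return support, confidence


F = frozenset
patients = [
    [F({"fatigue"}), F({"dry_mouth"}), F({"pain"})],
    [F({"fatigue", "pain"}), F({"dry_mouth"})],
    [F({"dry_mouth"}), F({"fatigue"})],  # fatigue occurs, but only afterwards
    [F({"pain"})],
]
print(support_confidence(patients, F({"fatigue"}), F({"dry_mouth"})))
# support 0.5, confidence 2/3
```

Because every frequent pair of itemsets can yield such a rule, output sets grow quickly and overlap heavily, which is the interpretability problem the abstract proposes to address with visual analytics.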
11
Collaris D, van Wijk JJ. StrategyAtlas: Strategy Analysis for Machine Learning Interpretability. IEEE Trans Vis Comput Graph 2023; 29:2996-3008. [PMID: 35085084 DOI: 10.1109/tvcg.2022.3146806] [Indexed: 06/14/2023]
Abstract
Businesses in high-risk environments have been reluctant to adopt modern machine learning approaches because of their complex and uninterpretable nature. Most current solutions provide local, instance-level explanations, which are insufficient for understanding the model as a whole. In this work, we show that strategy clusters (i.e., groups of data instances that are treated distinctly by the model) can be used to understand the global behavior of a complex ML model. To support effective exploration and understanding of these clusters, we introduce StrategyAtlas, a system designed to analyze and explain model strategies. It also supports multiple ways of using these strategies to simplify and improve the reference model. In collaboration with a large insurance company, we present a use case in automatic insurance acceptance and show how the system enabled professional data scientists to understand a complex model and improve the production model based on these insights.
12
Li Y, Wang J, Fujiwara T, Ma KL. Visual Analytics of Neuron Vulnerability to Adversarial Attacks on Convolutional Neural Networks. ACM Trans Interact Intell Syst 2023. [DOI: 10.1145/3587470] [Indexed: 03/17/2023]
Abstract
Adversarial attacks on a convolutional neural network (CNN), which inject human-imperceptible perturbations into an input image, can fool a high-performance CNN into making incorrect predictions. The success of adversarial attacks raises serious concerns about the robustness of CNNs and hinders their use in safety-critical applications such as medical diagnosis and autonomous driving. Our work introduces a visual analytics approach to understanding adversarial attacks by answering two questions: (1) which neurons are more vulnerable to attacks, and (2) which image features do these vulnerable neurons capture during the prediction?
For the first question, we introduce multiple perturbation-based measures that break down the attacking magnitude into individual CNN neurons and rank the neurons by their vulnerability levels. For the second, we identify image features (e.g., cat ears) that highly stimulate a user-selected neuron, to augment and validate the neuron’s responsibility. Furthermore, we support interactive exploration of a large number of neurons, aided by hierarchical clustering based on the neurons’ roles in the prediction. To this end, a visual analytics system is designed to incorporate visual reasoning for interpreting adversarial attacks. We validate the effectiveness of our system through multiple case studies and feedback from domain experts.
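The abstract does not define its perturbation-based measures. As a loosely analogous, hypothetical example only, one simple way to attribute an attack's effect to individual neurons is to score each neuron by how far its activation moves between the clean and the adversarial version of the same image, then rank by that score. The activations below are invented and the function name is illustrative.

```python
from typing import Sequence


def rank_by_activation_shift(
    clean: Sequence[float], adversarial: Sequence[float]
) -> list[tuple[int, float]]:
    """Score each neuron by |adversarial - clean| activation and return
    (neuron index, score) pairs sorted from most to least shifted."""
    if len(clean) != len(adversarial):
        raise ValueError("activation vectors must have the same length")
    shifts = [abs(a - c) for c, a in zip(clean, adversarial)]
    order = sorted(range(len(shifts)), key=lambda i: shifts[i], reverse=True)
    return [(i, shifts[i]) for i in order]


# Hypothetical activations of four neurons in one layer for the same image,
# before and after an adversarial perturbation.
clean_acts = [0.2, 1.5, 0.0, 0.7]
adv_acts = [0.25, 0.3, 0.9, 0.7]
ranking = rank_by_activation_shift(clean_acts, adv_acts)
print([i for i, _ in ranking])  # [1, 2, 0, 3]
```

A raw absolute difference ignores each neuron's typical activation scale; normalizing by, for example, a neuron's activation range over a dataset is one obvious variant, again an assumption rather than the paper's method.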
Affiliation(s)
- Yiran Li
- University of California, Davis, USA
13
Explainable Ensemble Trees. Comput Stat 2023. [DOI: 10.1007/s00180-022-01312-6] [Indexed: 01/15/2023]
14
Yang CJ, Huang WK, Lin KP. Three-Dimensional Printing Quality Inspection Based on Transfer Learning with Convolutional Neural Networks. Sensors (Basel) 2023; 23:491. [PMID: 36617085 PMCID: PMC9824655 DOI: 10.3390/s23010491] [Received: 12/04/2022] [Revised: 12/26/2022] [Accepted: 12/30/2022] [Indexed: 06/17/2023]
Abstract
Fused deposition modeling (FDM) is a form of additive manufacturing where three-dimensional (3D) models are created by depositing melted thermoplastic polymer filaments in layers. Although FDM is a mature process, defects can occur during printing. Therefore, an image-based quality inspection method for 3D-printed objects of varying geometries was developed in this study. Transfer learning with pretrained models, which were used as feature extractors, was combined with ensemble learning, and the resulting model combinations were used to inspect the quality of FDM-printed objects. Model combinations with VGG16 and VGG19 had the highest accuracy in most situations. Furthermore, the classification accuracies of these model combinations were not significantly affected by differences in color. In summary, the combination of transfer learning with ensemble learning is an effective method for inspecting the quality of 3D-printed objects. It reduces time and material wastage and improves 3D printing quality.
Affiliation(s)
- Cheng-Jung Yang
- Program in Interdisciplinary Studies, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
- Wei-Kai Huang
- Department of Information Management, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
- Keng-Pei Lin
- Department of Information Management, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
15
An imbalanced binary classification method based on contrastive learning using multi-label confidence comparisons within sample-neighbors pair. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2022.10.069] [Indexed: 11/06/2022]
16
Yuan J, Liu M, Tian F, Liu S. Visual Analysis of Neural Architecture Spaces for Summarizing Design Principles. IEEE Trans Vis Comput Graph 2023; 29:288-298. [PMID: 36191103 DOI: 10.1109/tvcg.2022.3209404] [Indexed: 06/16/2023]
Abstract
Recent advances in artificial intelligence largely benefit from better neural network architectures. These architectures are a product of a costly process of trial-and-error. To ease this process, we develop ArchExplorer, a visual analysis method for understanding a neural architecture space and summarizing design principles. The key idea behind our method is to make the architecture space explainable by exploiting structural distances between architectures. We formulate the pairwise distance calculation as solving an all-pairs shortest path problem. To improve efficiency, we decompose this problem into a set of single-source shortest path problems. The time complexity is reduced from O(kn²N) to O(knN). Architectures are hierarchically clustered according to the distances between them. A circle-packing-based architecture visualization has been developed to convey both the global relationships between clusters and local neighborhoods of the architectures in each cluster. Two case studies and a post-analysis are presented to demonstrate the effectiveness of ArchExplorer in summarizing design principles and selecting better-performing architectures.
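The abstract describes answering an all-pairs shortest path query with one single-source shortest path run per node. The sketch below shows that decomposition on a generic weighted graph. It only illustrates the general technique: it assumes non-negative edge weights and uses Dijkstra's algorithm. It does not reproduce how ArchExplorer encodes architectures as a graph, and it does not reproduce the complexity bounds stated above.

```python
import heapq
import itertools
import math
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def single_source(graph: Graph, source: Hashable) -> Dict[Hashable, float]:
    """Dijkstra's algorithm: shortest distances from `source` (non-negative weights)."""
    dist = {source: 0.0}
    tie = itertools.count()  # tie-breaker so the heap never compares node objects
    heap = [(0.0, next(tie), source)]
    while heap:
        d, _, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, next(tie), v))
    return dist


def all_pairs(graph: Graph) -> Dict[Hashable, Dict[Hashable, float]]:
    """All-pairs distances as one single-source run per node; unreachable pairs are omitted."""
    nodes = set(graph)
    for neighbors in graph.values():
        nodes.update(v for v, _ in neighbors)
    return {s: single_source(graph, s) for s in nodes}


g = {"A": [("B", 1.0), ("C", 4.0)], "B": [("C", 1.0)], "C": []}
dist = all_pairs(g)
print(dist["A"]["C"])  # 2.0
```

Each single-source run is independent of the others, so the per-node searches can also be distributed across workers.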
17
Global reliable data generation for imbalanced binary classification with latent codes reconstruction and feature repulsion. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04330-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
18
Yuan J, Barr B, Overton K, Bertini E. Visual Exploration of Machine Learning Model Behavior with Hierarchical Surrogate Rule Sets. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; PP:1470-1488. [PMID: 36327192 DOI: 10.1109/tvcg.2022.3219232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
One of the potential solutions for model interpretation is to train a surrogate model: a more transparent model that approximates the behavior of the model to be explained. Typically, classification rules or decision trees are used due to their logic-based expressions. However, decision trees can grow too deep, and rule sets can become too large to approximate a complex model. Unlike paths on a decision tree that must share ancestor nodes (conditions), rules are more flexible. However, the unstructured visual representation of rules makes it hard to make inferences across rules. In this paper, we focus on tabular data and present novel algorithmic and interactive solutions to address these issues. First, we present Hierarchical Surrogate Rules (HSR), an algorithm that generates hierarchical rules based on user-defined parameters. We also contribute SuRE, a visual analytics (VA) system that integrates HSR and an interactive surrogate rule visualization, the Feature-Aligned Tree, which depicts rules as trees while aligning features for easier comparison. We evaluate the algorithm in terms of parameter sensitivity, time performance, and comparison with surrogate decision trees and find that it scales reasonably well and overcomes the shortcomings of surrogate decision trees. We evaluate the visualization and the system through a usability study and an observational study with domain experts. Our investigation shows that the participants can use feature-aligned trees to perform non-trivial tasks with very high accuracy. We also discuss many interesting findings, including a rule analysis task characterization, that can be used for visualization design and future research.
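Every surrogate approach has to measure fidelity, meaning how often the surrogate agrees with the model it explains. The sketch below assumes a flat, first-match rule list with a default label and uses a hypothetical black-box function. It does not implement the hierarchical rule generation of HSR. The feature names and thresholds are invented for illustration.

```python
import operator
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
Condition = Tuple[str, str, float]  # (feature, operator, threshold)
Rule = Tuple[List[Condition], str]  # (conjunction of conditions, predicted label)

OPS = {"<=": operator.le, ">": operator.gt}


def surrogate_predict(rules: Sequence[Rule], default: str, row: Row) -> str:
    """Return the label of the first rule whose conditions all hold, else the default."""
    for conditions, label in rules:
        if all(OPS[op](row[feature], threshold) for feature, op, threshold in conditions):
            return label
    return default


def fidelity(black_box: Callable[[Row], str], rules: Sequence[Rule], default: str, rows: Sequence[Row]) -> float:
    """Fraction of rows on which the surrogate reproduces the black-box prediction."""
    agree = sum(surrogate_predict(rules, default, r) == black_box(r) for r in rows)
    return agree / len(rows)


def black_box(r: Row) -> str:  # hypothetical model being explained
    return "high" if r["income"] > 50 and r["debt"] <= 20 else "low"


rows = [
    {"income": 60, "debt": 10},
    {"income": 60, "debt": 30},
    {"income": 40, "debt": 10},
    {"income": 30, "debt": 50},
]
coarse = [([("income", ">", 50)], "high")]
finer = [([("income", ">", 50), ("debt", "<=", 20)], "high")]
print(fidelity(black_box, coarse, "low", rows))  # 0.75
print(fidelity(black_box, finer, "low", rows))   # 1.0
```

The coarse rule set is smaller but less faithful. Balancing rule-set size against fidelity is the trade-off the abstract describes.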
19
XRRF: An eXplainable Reasonably Randomised Forest algorithm for classification and regression problems. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.09.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
20
Streeb D, Metz Y, Schlegel U, Schneider B, El-Assady M, Neth H, Chen M, Keim DA. Task-Based Visual Interactive Modeling: Decision Trees and Rule-Based Classifiers. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:3307-3323. [PMID: 33439846 DOI: 10.1109/tvcg.2020.3045560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Visual analytics enables the coupling of machine learning models and humans in a tightly integrated workflow, addressing various analysis tasks. Each task poses distinct demands to analysts and decision-makers. In this survey, we focus on one canonical technique for rule-based classification, namely decision tree classifiers. We provide an overview of available visualizations for decision trees with a focus on how visualizations differ with respect to 16 tasks. Further, we investigate the types of visual designs employed, and the quality measures presented. We find that (i) interactive visual analytics systems for classifier development offer a variety of visual designs, (ii) utilization tasks are sparsely covered, (iii) beyond classifier development, node-link diagrams are omnipresent, (iv) even systems designed for machine learning experts rarely feature visual representations of quality measures other than accuracy. In conclusion, we see a potential for integrating algorithmic techniques, mathematical quality measures, and tailored interactive visualizations to enable human experts to utilize their knowledge more effectively.
21
Sage AJ, Liu Y, Sato J. From Black Box to Shining Spotlight. AM STAT 2022. [DOI: 10.1080/00031305.2022.2107568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Affiliation(s)
- Andrew J. Sage
- Department of Mathematics, Statistics, and Computer Science, Lawrence University, Appleton, WI
- Yang Liu
- Department of Mathematics, Statistics, and Computer Science, Lawrence University, Appleton, WI
- Joe Sato
- Department of Mathematics, Statistics, and Computer Science, Lawrence University, Appleton, WI
22
Conclusive local interpretation rules for random forests. Data Min Knowl Discov 2022. [DOI: 10.1007/s10618-022-00839-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
23
Cheng X, Doosthosseini A, Kunkel J. Improve the Deep Learning Models in Forestry Based on Explanations and Expertise. FRONTIERS IN PLANT SCIENCE 2022; 13:902105. [PMID: 35677249 PMCID: PMC9169801 DOI: 10.3389/fpls.2022.902105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Accepted: 04/25/2022] [Indexed: 06/15/2023]
Abstract
In forestry studies, deep learning models have achieved excellent performance in many application scenarios (e.g., detecting forest damage). However, the unclear model decisions (i.e., black-box) undermine the credibility of the results and hinder their practicality. This study intends to obtain explanations of such models through the use of explainable artificial intelligence methods, and then use feature unlearning methods to improve their performance, which is the first such attempt in the field of forestry. Results of three experiments show that the model training can be guided by expertise to gain specific knowledge, which is reflected by explanations. For all three experiments based on synthetic and real leaf images, the improvement of models is quantified in the classification accuracy (up to 4.6%) and three indicators of explanation assessment (i.e., root-mean-square error, cosine similarity, and the proportion of important pixels). Besides, the introduced expertise in annotation matrix form was automatically created in all experiments. This study emphasizes that studies of deep learning in forestry should not only pursue model performance (e.g., higher classification accuracy) but also focus on the explanations and try to improve models according to the expertise.
Affiliation(s)
- Ximeng Cheng
- Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen, Göttingen, Germany
- Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany
- Ali Doosthosseini
- Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen, Göttingen, Germany
- Julian Kunkel
- Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen, Göttingen, Germany
24
Visual Analytics for Predicting Disease Outcomes Using Laboratory Test Results. INFORMATICS 2022. [DOI: 10.3390/informatics9010017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Laboratory tests play an essential role in the early and accurate diagnosis of diseases. In this paper, we propose SUNRISE, a visual analytics system that allows the user to interactively explore the relationships between laboratory test results and a disease outcome. SUNRISE integrates frequent itemset mining (i.e., Eclat algorithm) with extreme gradient boosting (XGBoost) to develop more specialized and accurate prediction models. It also includes interactive visualizations to allow the user to interact with the model and track the decision process. SUNRISE helps the user probe the prediction model by generating input examples and observing how the model responds. Furthermore, it improves the user’s confidence in the generated predictions and provides them the means to validate the model’s response by illustrating the underlying working mechanism of the prediction models through visualization representations. SUNRISE offers a balanced distribution of processing load through the seamless integration of analytical methods with interactive visual representations to support the user’s cognitive tasks. We demonstrate the usefulness of SUNRISE through a usage scenario of exploring the association between laboratory test results and acute kidney injury, using large provincial healthcare databases from Ontario, Canada.
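Eclat, the frequent itemset miner named in the abstract, uses a vertical data layout. Each item maps to the set of IDs of the transactions that contain it, and the support of an itemset is the size of the intersection of its members' ID sets. A minimal, self-contained sketch follows. The lab-result flags are hypothetical, and the sketch does not show how SUNRISE couples the mined itemsets with XGBoost.

```python
from typing import Dict, FrozenSet, Iterable, List, Sequence, Set, Tuple


def eclat(transactions: Sequence[Iterable[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset whose support count is at least `min_support`."""
    # Vertical layout: item -> IDs of the transactions that contain it
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], prefix_tids: Set[int], candidates: List[Tuple[str, Set[int]]]) -> None:
        for i, (item, tids) in enumerate(candidates):
            # Support of prefix + item = size of the tid-set intersection
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) >= min_support:
                itemset = prefix | {item}
                frequent[itemset] = len(new_tids)
                # Only extend with later items so each itemset is generated once
                extend(itemset, new_tids, candidates[i + 1:])

    extend(frozenset(), set(), sorted(tidsets.items()))
    return frequent


# Hypothetical abnormal-result flags, one set per patient visit
visits = [
    {"high_creatinine", "low_hemoglobin"},
    {"high_creatinine", "low_hemoglobin", "high_potassium"},
    {"high_creatinine"},
    {"low_hemoglobin", "high_potassium"},
]
patterns = eclat(visits, min_support=2)
for itemset, count in sorted(patterns.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
```

Because support is computed by set intersection, the transactions never need to be rescanned after the vertical layout has been built.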
25
Liu J, Wang J, Yu W, Wang Z, Zhong G, He F. Semi-supervised deep learning recognition method for the new classes of faults in wind turbine system. APPL INTELL 2022. [DOI: 10.1007/s10489-021-03024-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
26
Pu J, Shao H, Gao B, Zhu Z, Zhu Y, Rao Y, Xiang Y. matExplorer: Visual Exploration on Predicting Ionic Conductivity for Solid-state Electrolytes. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:65-75. [PMID: 34587048 DOI: 10.1109/tvcg.2021.3114812] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Lithium-ion batteries (LIBs) are widely used as important energy sources for mobile phones, electric vehicles, and drones. Liquid electrolytes carry potential safety risks, such as electrolyte leakage, flammable solvents, poor thermal stability, and many side reactions. Experts have therefore attempted to replace them with solid electrolytes that have a wider electrochemical window and higher stability. However, finding suitable alternative materials with traditional approaches is very difficult because the search is extremely costly. Machine learning (ML)-based methods have recently been introduced for material prediction. However, tools that let domain experts intuitively compare and analyze the performance of ML models are rare. We therefore propose an interactive visualization system that helps experts select suitable ML models and comprehensively understand and explore the prediction results. Our system uses a multifaceted visualization scheme designed to support analysis from various perspectives, such as feature distribution, data similarity, model performance, and result presentation. The experts conducted case studies with actual lab experiments, and the final results confirmed the effectiveness and helpfulness of our system.
27
28
Classification Algorithm Using Branches Importance. Neural Process Lett 2021. [DOI: 10.1007/s11063-021-10664-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
29
Comparative evaluation of contribution-value plots for machine learning understanding. J Vis (Tokyo) 2021. [DOI: 10.1007/s12650-021-00776-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
The field of explainable artificial intelligence aims to help experts understand complex machine learning models. One key approach is to show the impact of a feature on the model prediction. This helps experts to verify and validate the predictions the model provides. However, many challenges remain open. For example, due to the subjective nature of interpretability, a strict definition of concepts such as the contribution of a feature remains elusive. Different techniques have varying underlying assumptions, which can cause inconsistent and conflicting views. In this work, we introduce local and global contribution-value plots as a novel approach to visualize feature impact on predictions and the relationship with feature value. We discuss design decisions and show an exemplary visual analytics implementation that provides new insights into the model. We conducted a user study and found the visualizations aid model interpretation by increasing correctness and confidence and reducing the time taken to obtain an insight.
30
Abstract
The use of data analysis techniques in electronic health records (EHRs) offers great promise in improving predictive risk modeling. Although useful, these analysis techniques often suffer from a lack of interpretability and transparency, especially when the data are high-dimensional. Visual analytics, an emerging type of computational system, has the potential to address these issues by integrating data analysis techniques with interactive visualizations. This paper introduces a visual analytics system called VERONICA that uses the natural classification of features in EHRs to identify the group of features with the strongest predictive power. VERONICA incorporates a representative set of supervised machine learning techniques, namely classification and regression trees, C5.0, random forest, support vector machines, and naive Bayes, to support users in developing predictive models using EHRs. It then makes the analytics results accessible through an interactive visual interface. By integrating different sampling strategies, analytics algorithms, visualization techniques, and human-data interaction, VERONICA assists users in comparing prediction models in a systematic way. To demonstrate the usefulness and utility of our proposed system, we use the clinical dataset stored at ICES to identify the best representative feature groups for detecting patients who are at high risk of developing acute kidney injury.
31
MulUBA: multi-level visual analytics of user behaviors for improving online shopping advertising. J Vis (Tokyo) 2021. [DOI: 10.1007/s12650-021-00771-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
32
Sun G, Wu H, Zhu L, Xu C, Liang H, Xu B, Liang R. VSumVis: Interactive Visual Understanding and Diagnosis of Video Summarization Model. ACM T INTEL SYST TEC 2021. [DOI: 10.1145/3458928] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
Abstract
With the rapid development of the mobile Internet, the popularity of video capture devices has brought a surge in multimedia video resources. By combining machine learning methods with well-designed features, video summaries can be obtained automatically to ease video resource consumption and retrieval issues. However, there is always a gap between the summary produced by the model and those annotated by users. Helping users understand this difference, providing insights for improving the model, and enhancing trust in the model remain challenging. To address these challenges, we propose VSumVis, a visual analysis system developed under a user-centered design methodology, with multi-feature examination and multi-level exploration. It helps users explore and analyze video content as well as the intrinsic relationships in our video summarization model. The system contains multiple coordinated views, i.e., video view, projection view, detail view, and sequential frames view. A multi-level analysis process that integrates video events and frames is presented with cluster and node visualizations in our system. Temporal patterns concerning the difference between the manual annotation score and the saliency score produced by our model are further investigated and distinguished with the sequential frames view. Moreover, we propose a set of rich user interactions that enable an in-depth, multi-faceted analysis of the features in our video summarization model. We conduct case studies and interviews with domain experts to provide anecdotal evidence about the effectiveness of our approach. Quantitative feedback from a user study confirms the usefulness of our visual system for exploring the video summarization model.
Affiliation(s)
- Guodao Sun
- Zhejiang University of Technology, Hangzhou, China
- Hao Wu
- Zhejiang University of Technology, Hangzhou, China
- Lin Zhu
- Zhejiang University of Technology, Hangzhou, China
- Chaoqing Xu
- Zhejiang University of Technology, Hangzhou, China
- Haoran Liang
- Zhejiang University of Technology, Hangzhou, China
- Binwei Xu
- Zhejiang University of Technology, Hangzhou, China
33
Villa-Pérez ME, Álvarez-Carmona MÁ, Loyola-González O, Medina-Pérez MA, Velazco-Rossell JC, Choo KKR. Semi-supervised anomaly detection algorithms: A comparative summary and future research directions. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106878] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 11/29/2022]
34
Abuhmed T, El-Sappagh S, Alonso JM. Robust hybrid deep learning models for Alzheimer’s progression detection. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106688] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Indexed: 02/07/2023]
35
Chatzimparmpas A, Martins RM, Kucher K, Kerren A. StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1547-1557. [PMID: 33048687 DOI: 10.1109/tvcg.2020.3030352] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 06/11/2023]
Abstract
In machine learning (ML), ensemble methods such as bagging, boosting, and stacking are widely established approaches that regularly achieve top-notch predictive performance. Stacking (also called "stacked generalization") is an ensemble method that combines heterogeneous base models, arranged in at least one layer, and then employs another metamodel to summarize the predictions of those models. Although it may be a highly effective approach for increasing the predictive performance of ML, generating a stack of models from scratch can be a cumbersome trial-and-error process. This challenge stems from the enormous space of available solutions, with different sets of data instances and features that could be used for training, several algorithms to choose from, and instantiations of these algorithms using diverse parameters (i.e., models) that perform differently according to various metrics. In this work, we present a knowledge generation model, which supports ensemble learning with the use of visualization, and a visual analytics system for stacked generalization. Our system, StackGenVis, assists users in dynamically adapting performance metrics, managing data instances, selecting the most important features for a given data set, choosing a set of top-performant and diverse algorithms, and measuring the predictive performance. In consequence, our proposed tool helps users to decide between distinct models and to reduce the complexity of the resulting stack by removing overpromising and underperforming models. The applicability and effectiveness of StackGenVis are demonstrated with two use cases: a real-world healthcare data set and a collection of data related to sentiment/stance detection in texts. Finally, the tool has been evaluated through interviews with three ML experts.
36
Ma Y, Fan A, He J, Nelakurthi AR, Maciejewski R. A Visual Analytics Framework for Explaining and Diagnosing Transfer Learning Processes. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1385-1395. [PMID: 33035164 DOI: 10.1109/tvcg.2020.3028888] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Many statistical learning models hold an assumption that the training data and the future unlabeled data are drawn from the same distribution. However, this assumption is difficult to fulfill in real-world scenarios and creates barriers in reusing existing labels from similar application domains. Transfer learning is intended to relax this assumption by modeling relationships between domains, and is often applied in deep learning applications to reduce the demand for labeled data and training time. Despite recent advances in exploring deep learning models with visual analytics tools, little work has explored the issue of explaining and diagnosing the knowledge transfer process between deep learning models. In this paper, we present a visual analytics framework for the multi-level exploration of the transfer learning processes when training deep neural networks. Our framework establishes a multi-aspect design to explain how the learned knowledge from the existing model is transferred into the new learning task when training deep neural networks. Based on a comprehensive requirement and task analysis, we employ descriptive visualization with performance measures and detailed inspections of model behaviors from the statistical, instance, feature, and model structure levels. We demonstrate our framework through two case studies on image classification by fine-tuning AlexNets to illustrate how analysts can utilize our framework.
37
Xie X, Du F, Wu Y. A Visual Analytics Approach for Exploratory Causal Analysis: Exploration, Validation, and Applications. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1448-1458. [PMID: 33026999 DOI: 10.1109/tvcg.2020.3028957] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Using causal relations to guide decision making has become an essential analytical task across various domains, from marketing and medicine to education and social science. While powerful statistical models have been developed for inferring causal relations from data, domain practitioners still lack effective visual interfaces for interpreting causal relations and applying them in their decision-making process. Through interview studies with domain experts, we characterize their current decision-making workflows, challenges, and needs. Through an iterative design process, we developed a visualization tool that allows analysts to explore, validate, and apply causal relations in real-world decision-making scenarios. The tool provides an uncertainty-aware causal graph visualization for presenting a large set of causal relations inferred from high-dimensional data. On top of the causal graph, it supports a set of intuitive user controls for performing what-if analyses and making action plans. We report on two case studies in marketing and student advising to demonstrate that users can effectively explore causal relations and design action plans for reaching their goals.
38
Neto MP, Paulovich FV. Explainable Matrix - Visualization for Global and Local Interpretability of Random Forest Classification Ensembles. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1427-1437. [PMID: 33048689 DOI: 10.1109/tvcg.2020.3030354] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 05/23/2023]
Abstract
Over the past decades, classification models have proven to be essential machine learning tools given their potential and applicability in various domains. During these years, most researchers focused on improving quantitative metrics, even though such metrics convey little information about the models' decisions. This paradigm has recently shifted, and strategies beyond tables and numbers to assist in interpreting models' decisions are increasing in importance. As part of this trend, visualization techniques have been extensively used to support classification models' interpretability, with a significant focus on rule-based models. Despite the advances, the existing approaches present limitations in terms of visual scalability, and the visualization of large and complex models, such as those produced by the Random Forest (RF) technique, remains a challenge. In this paper, we propose Explainable Matrix (ExMatrix), a novel visualization method for RF interpretability that can handle models with massive quantities of rules. It employs a simple yet powerful matrix-like visual metaphor, where rows are rules, columns are features, and cells are rule predicates, enabling the analysis of entire models and the auditing of classification results. The applicability of ExMatrix is confirmed via different examples, showing how it can be used in practice to promote the interpretability of RF models.
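The following sketch makes the matrix metaphor concrete. Suppose each rule is a conjunction of threshold predicates, for instance one root-to-leaf path of a tree in the forest. Each cell can then hold the interval that the rule's predicates impose on one feature, or stay empty when the rule does not test that feature. This encoding is an assumption: the rules and feature names are hypothetical, and ExMatrix's actual visual encoding and row/column ordering are not reproduced.

```python
import math
from typing import List, Optional, Sequence, Tuple

Condition = Tuple[str, str, float]  # (feature, "<=" or ">", threshold)
Interval = Tuple[float, float]      # open lower bound, closed upper bound: low < x <= high


def rule_matrix(rules: Sequence[Sequence[Condition]], features: Sequence[str]) -> List[List[Optional[Interval]]]:
    """Rows are rules, columns are features, cells are the interval each rule requires (None = unused)."""
    matrix = []
    for conditions in rules:
        cells = {f: None for f in features}
        for feature, op, threshold in conditions:
            if feature not in cells:
                raise KeyError(f"unknown feature {feature!r}")
            low, high = cells[feature] or (-math.inf, math.inf)
            if op == "<=":
                high = min(high, threshold)  # tighten the upper bound
            elif op == ">":
                low = max(low, threshold)    # tighten the lower bound
            else:
                raise ValueError(f"unsupported operator {op!r}")
            cells[feature] = (low, high)
        matrix.append([cells[f] for f in features])
    return matrix


rules = [
    [("x1", ">", 0.5), ("x1", "<=", 2.0), ("x2", "<=", 3.0)],  # hypothetical path of tree 1
    [("x2", ">", 1.0)],                                        # hypothetical path of tree 2
]
for row in rule_matrix(rules, ["x1", "x2"]):
    print(row)
# [(0.5, 2.0), (-inf, 3.0)]
# [None, (1.0, inf)]
```

Repeated tests on the same feature along a path collapse into a single interval. This merging keeps one cell per rule and feature, however deep the tree is.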
39
El-Sappagh S, Alonso JM, Islam SMR, Sultan AM, Kwak KS. A multilayer multimodal detection and prediction model based on explainable artificial intelligence for Alzheimer's disease. Sci Rep 2021; 11:2660. [PMID: 33514817 PMCID: PMC7846613 DOI: 10.1038/s41598-021-82098-3] [Citation(s) in RCA: 97] [Impact Index Per Article: 24.3] [Received: 10/21/2019] [Accepted: 12/29/2020] [Indexed: 01/30/2023]
Abstract
Alzheimer's disease (AD) is the most common type of dementia. Its diagnosis and progression detection have been intensively studied. Nevertheless, research studies often have little effect on clinical practice mainly due to the following reasons: (1) Most studies depend mainly on a single modality, especially neuroimaging; (2) diagnosis and progression detection are usually studied separately as two independent problems; and (3) current studies concentrate mainly on optimizing the performance of complex machine learning models, while disregarding their explainability. As a result, physicians struggle to interpret these models, and feel it is hard to trust them. In this paper, we carefully develop an accurate and interpretable AD diagnosis and progression detection model. This model provides physicians with accurate decisions along with a set of explanations for every decision. Specifically, the model integrates 11 modalities of 1048 subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI) real-world dataset: 294 cognitively normal, 254 stable mild cognitive impairment (MCI), 232 progressive MCI, and 268 AD. It is actually a two-layer model with random forest (RF) as classifier algorithm. In the first layer, the model carries out a multi-class classification for the early diagnosis of AD patients. In the second layer, the model applies binary classification to detect possible MCI-to-AD progression within three years from a baseline diagnosis. The performance of the model is optimized with key markers selected from a large set of biological and clinical measures. Regarding explainability, we provide, for each layer, global and instance-based explanations of the RF classifier by using the SHapley Additive exPlanations (SHAP) feature attribution framework. In addition, we implement 22 explainers based on decision trees and fuzzy rule-based systems to provide complementary justifications for every RF decision in each layer. 
Furthermore, these explanations are represented in natural language form to help physicians understand the predictions. The designed model achieves a cross-validation accuracy of 93.95% and an F1-score of 93.94% in the first layer, while it achieves a cross-validation accuracy of 87.08% and an F1-Score of 87.09% in the second layer. The resulting system is not only accurate, but also trustworthy, accountable, and medically applicable, thanks to the provided explanations which are broadly consistent with each other and with the AD medical literature. The proposed system can help to enhance the clinical understanding of AD diagnosis and progression processes by providing detailed insights into the effect of different modalities on the disease risk.
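SHAP explanations are additive: the per-feature attributions sum to the gap between one prediction and the average prediction. This property is easiest to see for a linear model. For f(x) = b + Σ w_i x_i with independent features, the SHAP value of feature i is w_i (x_i − E[x_i]). The sketch below checks this numerically with hypothetical weights and background data. It is only an illustration: the study applies SHAP to a random forest, where the attributions cannot be written in this closed form.

```python
from typing import List, Sequence


def linear_shap(weights: Sequence[float], x: Sequence[float], background: Sequence[Sequence[float]]) -> List[float]:
    """SHAP values of a linear model, assuming independent features: w_i * (x_i - mean_i)."""
    n = len(background)
    means = [sum(row[i] for row in background) / n for i in range(len(weights))]
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]


def predict(bias: float, weights: Sequence[float], x: Sequence[float]) -> float:
    """Linear model output b + sum(w_i * x_i)."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


bias, weights = 0.5, [2.0, -1.0]        # hypothetical linear model
background = [[0.0, 0.0], [2.0, 4.0]]   # hypothetical reference data
x = [3.0, 1.0]

phi = linear_shap(weights, x, background)
expected = sum(predict(bias, weights, row) for row in background) / len(background)
print(phi)                                    # [4.0, 1.0]
print(predict(bias, weights, x) - expected)   # 5.0
```

The two attributions (4.0 + 1.0) add up to the difference between this prediction and the mean prediction over the background data. This is the property that lets per-instance explanations be read as a decomposition of the model output.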
Affiliation(s)
- Shaker El-Sappagh
- Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, 15782, Santiago de Compostela, Spain.
- Information Systems Department, Faculty of Computers and Artificial Intelligence, Benha University, Banha, 13518, Egypt.
- Jose M Alonso
- Centro Singular de Investigación en Tecnoloxías Intelixentes, Universidade de Santiago de Compostela, 15703, Santiago, Spain
- S M Riazul Islam
- Department of Computer Science and Engineering, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul, 05006, Korea
- Ahmad M Sultan
- Gastrointestinal Surgical Center, Faculty of Medicine, Mansoura University, Mansura, 35516, Egypt
- Kyung Sup Kwak
- Department of Information and Communication Engineering, Inha University, Incheon, 22212, South Korea.
40
Performance analysis of cost-sensitive learning methods with application to imbalanced medical data. INFORMATICS IN MEDICINE UNLOCKED 2021. [DOI: 10.1016/j.imu.2021.100690] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/22/2022]
41
Spinner T, Schlegel U, Schafer H, El-Assady M. explAIner: A Visual Analytics Framework for Interactive and Explainable Machine Learning. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1064-1074. [PMID: 31442998 DOI: 10.1109/tvcg.2019.2934629] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 06/10/2023]
Abstract
We propose a framework for interactive and explainable machine learning that enables users to (1) understand machine learning models; (2) diagnose model limitations using different explainable AI methods; and (3) refine and optimize the models. Our framework combines an iterative XAI pipeline with eight global monitoring and steering mechanisms, including quality monitoring, provenance tracking, model comparison, and trust building. To operationalize the framework, we present explAIner, a visual analytics system for interactive and explainable machine learning that instantiates all phases of the suggested pipeline within the commonly used TensorBoard environment. We performed a user study with nine participants across different expertise levels to examine their perception of our workflow and to collect suggestions for closing the gap between our system and framework. The evaluation confirms that our tightly integrated system leads to an informed machine learning process while disclosing opportunities for further extensions.
42
Ma Y, Xie T, Li J, Maciejewski R. Explaining Vulnerabilities to Adversarial Machine Learning through Visual Analytics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1075-1085. [PMID: 31478859 DOI: 10.1109/tvcg.2019.2934631] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/10/2023]
Abstract
Machine learning models are currently being deployed in a variety of real-world applications where model predictions are used to make decisions about healthcare, bank loans, and numerous other critical tasks. As the deployment of artificial intelligence technologies becomes ubiquitous, it is unsurprising that adversaries have begun developing methods to manipulate machine learning models to their advantage. While the visual analytics community has developed methods for opening the black box of machine learning models, little work has focused on helping the user understand their model vulnerabilities in the context of adversarial attacks. In this paper, we present a visual analytics framework for explaining and exploring model vulnerabilities to adversarial attacks. Our framework employs a multi-faceted visualization scheme designed to support the analysis of data poisoning attacks from the perspective of models, data instances, features, and local structures. We demonstrate our framework through two case studies on binary classifiers and illustrate model vulnerabilities with respect to varying attack strategies.
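The abstract concerns data poisoning attacks on binary classifiers. The sketch below is a toy illustration of one such strategy, label flipping, and is not the visual analytics framework described by Ma et al. In the example, an attacker inverts the labels of a few training points, which shifts a simple classifier's decision boundary and lowers accuracy on clean test data. The nearest-centroid model, the 1-D data, and the chosen indices are all hypothetical.

```python
def fit_centroids(xs, ys):
    """Hypothetical nearest-centroid binary classifier: store each class mean."""
    c0 = [x for x, y in zip(xs, ys) if y == 0]
    c1 = [x for x, y in zip(xs, ys) if y == 1]
    return sum(c0) / len(c0), sum(c1) / len(c1)


def predict(model, x):
    """Assign x to the class whose centroid is strictly closer (ties go to 0)."""
    c0, c1 = model
    return 1 if abs(x - c1) < abs(x - c0) else 0


def accuracy(model, xs, ys):
    return sum(predict(model, x) == y for x, y in zip(xs, ys)) / len(ys)


def flip_labels(ys, indices):
    """Label-flipping poisoning: invert the binary label at the chosen training indices."""
    poisoned = list(ys)
    for i in indices:
        poisoned[i] = 1 - poisoned[i]
    return poisoned


# Hypothetical 1-D training and test data
train_x = [0, 1, 2, 3, 7, 8, 9, 10]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_x = [1, 2, 4, 5.5, 8, 9]
test_y = [0, 0, 0, 1, 1, 1]

clean = fit_centroids(train_x, train_y)
# The attacker flips the labels of the training points at x = 9 and x = 10
poisoned = fit_centroids(train_x, flip_labels(train_y, [6, 7]))

print(accuracy(clean, test_x, test_y))     # 1.0
print(accuracy(poisoned, test_x, test_y))  # 0.8333333333333334
```

Flipping two labels pulls the class-0 centroid from 1.5 to about 4.17 and the class-1 centroid from 8.5 to 7.5. That moves the decision boundary from 5.0 to about 5.83, so the clean test point at 5.5 is now misclassified. Inspecting this kind of shift per instance and per feature is the sort of analysis the cited framework aims to support visually.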