1. Wei S, Gao Z, Yao H, Qi X, Wang M, Huang J. Counterfactual explanations of tree-based ensemble models for brain disease analysis with structure-function coupling. Sci Rep 2025; 15:8524. PMID: 40075142; PMCID: PMC11904222; DOI: 10.1038/s41598-025-92316-x.
Abstract
Convergent evidence has suggested that the disruption of either structural connectivity (SC) or functional connectivity (FC) in the brain can lead to various neuropsychiatric disorders. Since changes in SC-FC coupling may be more sensitive than a single modality to detect subtle brain connectivity abnormalities, a few learning-based methods have been proposed to explore the relationship between SC and FC. However, these existing methods still fail to explain the relationship between altered SC-FC coupling and brain disorders. Therefore, in this paper, we explore three types of tree-based ensemble models (i.e., Decision Tree, Random Forest, and Adaptive Boosting) toward counterfactual explanations for SC-FC coupling. Specifically, we first construct SC and FC matrices from preprocessed diffusion tensor imaging (DTI) and resting-state functional MRI (fMRI) data. Then, we quantify the SC-FC coupling strength of each region and convert it into feature vectors. Subsequently, we select SC-FC coupling features that reflect disease-related information and train three tree-based models to analyze the predictive role of these coupling features for diseases. Finally, we design a tree-ensemble counterfactual explanation model to generate a set of counterfactual examples for patients, thereby assisting the diagnosis of brain diseases by fine-tuning the patient's abnormal SC-FC coupling feature vector. Experimental results on two independent datasets (i.e., epilepsy and schizophrenia) validate the effectiveness of the proposed method. The identified discriminative brain regions and generated counterfactual examples provide new insights for brain disease analysis.
Affiliation(s)
- Shaolong Wei: School of Artificial Intelligence and Computer Science, Nantong University, Nantong, China; Affiliated Hospital 2 of Nantong University, Nantong, China
- Zhen Gao: Affiliated Hospital 2 of Nantong University, Nantong, China
- Hongcheng Yao: School of Information Science and Technology, Nantong University, Nantong, China
- Xiaoyu Qi: School of Artificial Intelligence and Computer Science, Nantong University, Nantong, China
- Mingliang Wang: School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, China
- Jiashuang Huang: School of Artificial Intelligence and Computer Science, Nantong University, Nantong, China
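As a rough illustration of the pipeline summarized in the abstract above (per-region SC-FC coupling features feeding a tree-based classifier, followed by counterfactual perturbation of a patient's feature vector), the sketch below uses synthetic connectivity matrices, a scikit-learn random forest, and a naive greedy search. It is not the authors' implementation; all names, sizes, and the search strategy are illustrative assumptions.

```python
# Minimal sketch, assuming synthetic SC/FC data: coupling features -> random
# forest -> greedy counterfactual search toward the "control" class.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_subjects, n_regions = 60, 90

def coupling_features(sc, fc):
    """Per-region SC-FC coupling: correlation between a region's SC and FC profiles."""
    return np.array([np.corrcoef(sc[r], fc[r])[0, 1] for r in range(sc.shape[0])])

# Synthetic SC/FC matrices per subject -> one coupling feature vector per subject.
X = np.stack([
    coupling_features(rng.random((n_regions, n_regions)),
                      rng.random((n_regions, n_regions)))
    for _ in range(n_subjects)
])
y = rng.integers(0, 2, n_subjects)  # toy labels: 0 = control, 1 = patient

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def greedy_counterfactual(x, model, target=0, step=0.05, max_iter=30):
    """Nudge one coupling feature per iteration toward the target class."""
    x = x.copy()
    for _ in range(max_iter):
        if model.predict(x[None])[0] == target:
            break
        # Try a small +/- step on every feature; keep the single best move.
        candidates = []
        for j in range(x.size):
            for delta in (-step, step):
                trial = x.copy()
                trial[j] += delta
                candidates.append((model.predict_proba(trial[None])[0, target], j, delta))
        _, j, delta = max(candidates)
        x[j] += delta
    return x

patient = X[y == 1][0]
cf = greedy_counterfactual(patient, clf)
print("regions adjusted in the counterfactual:", np.flatnonzero(~np.isclose(patient, cf)))
```

Dedicated tree-ensemble counterfactual methods of the kind the paper describes exploit the trees' split thresholds rather than brute-force probing; the greedy loop here only conveys the feature-tuning idea.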
2. Kahng M, Tenney I, Pushkarna M, Liu MX, Wexler J, Reif E, Kallarackal K, Chang M, Terry M, Dixon L. LLM Comparator: Interactive Analysis of Side-by-Side Evaluation of Large Language Models. IEEE Transactions on Visualization and Computer Graphics 2025; 31:503-513. PMID: 39255096; DOI: 10.1109/tvcg.2024.3456354.
Abstract
Evaluating large language models (LLMs) presents unique challenges. While automatic side-by-side evaluation, also known as LLM-as-a-judge, has become a promising solution, model developers and researchers face difficulties with scalability and interpretability when analyzing these evaluation outcomes. To address these challenges, we introduce LLM Comparator, a new visual analytics tool designed for side-by-side evaluations of LLMs. This tool provides analytical workflows that help users understand when and why one LLM outperforms or underperforms another, and how their responses differ. Through close collaboration with practitioners developing LLMs at Google, we have iteratively designed, developed, and refined the tool. Qualitative feedback from these users highlights that the tool facilitates in-depth analysis of individual examples while enabling users to visually overview and flexibly slice data. This empowers users to identify undesirable patterns, formulate hypotheses about model behavior, and gain insights for model improvement. LLM Comparator has been integrated into Google's LLM evaluation platforms and open-sourced.
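For readers unfamiliar with side-by-side (LLM-as-a-judge) evaluation, the toy sketch below shows the kind of per-slice aggregation of judge verdicts that a tool like LLM Comparator visualizes. It is not the tool's API; the column names and scoring scheme are assumptions made for illustration.

```python
# Assumed data layout: one row per prompt, with the judge's preferred model.
import pandas as pd

ratings = pd.DataFrame({
    "prompt_category": ["coding", "coding", "summarization", "summarization", "safety"],
    "judge_verdict":   ["A", "B", "A", "A", "tie"],   # which model the judge preferred
})

def win_rate(verdicts):
    """Score model A: win = 1, loss = 0, tie = 0.5."""
    return verdicts.map({"A": 1.0, "B": 0.0, "tie": 0.5}).mean()

summary = ratings.groupby("prompt_category")["judge_verdict"].apply(win_rate)
print(summary)  # per-slice win rate for model A vs. model B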
3. Wang AZ, Borland D, Gotz D. Beyond Correlation: Incorporating Counterfactual Guidance to Better Support Exploratory Visual Analysis. IEEE Transactions on Visualization and Computer Graphics 2025; 31:776-786. PMID: 39255136; DOI: 10.1109/tvcg.2024.3456369.
Abstract
Providing effective guidance for users has long been an important and challenging task for efficient exploratory visual analytics, especially when selecting variables for visualization in high-dimensional datasets. Correlation is the most widely applied metric for guidance in statistical and analytical tools; however, a reliance on correlation may lead users toward false positives when interpreting causal relations in the data. In this work, inspired by prior insights on the benefits of counterfactual visualization in supporting visual causal inference, we propose a novel, simple, and efficient counterfactual guidance method to enhance causal inference performance in guided exploratory analytics, based on insights and concerns gathered from expert interviews. Our technique aims to capitalize on the benefits of counterfactual approaches while reducing their complexity for users. We integrated counterfactual guidance into an exploratory visual analytics system and, using a synthetically generated ground-truth causal dataset, conducted a comparative user study to evaluate to what extent counterfactual guidance can help lead users to more precise visual causal inferences. The results suggest that counterfactual guidance improved visual causal inference performance and also led to different exploratory behaviors compared to correlation-based guidance. Based on these findings, we offer future directions and challenges for incorporating counterfactual guidance to better support exploratory visual analytics.
4. Guo G, Deng L, Tandon A, Endert A, Kwon BC. MiMICRI: Towards Domain-centered Counterfactual Explanations of Cardiovascular Image Classification Models. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24) 2024:1861-1874. PMID: 39877054; PMCID: PMC11774553; DOI: 10.1145/3630106.3659011.
Abstract
The recent prevalence of publicly accessible, large medical imaging datasets has led to a proliferation of artificial intelligence (AI) models for cardiovascular image classification and analysis. At the same time, the potentially significant impacts of these models have motivated the development of a range of explainable AI (XAI) methods that aim to explain model predictions given certain image inputs. However, many of these methods are not developed or evaluated with domain experts, and explanations are not contextualized in terms of medical expertise or domain knowledge. In this paper, we propose a novel framework and Python library, MiMICRI, that provides domain-centered counterfactual explanations of cardiovascular image classification models. MiMICRI helps users interactively select and replace segments of medical images that correspond to morphological structures. From the counterfactuals generated, users can then assess the influence of each segment on model predictions and validate the model against known medical facts. We evaluate this library with two medical experts. Our evaluation demonstrates that a domain-centered XAI approach can enhance the interpretability of model explanations and help experts reason about models in terms of relevant domain knowledge. However, concerns were also surfaced about the clinical plausibility of the counterfactuals generated. We conclude with a discussion on the generalizability and trustworthiness of the MiMICRI framework, as well as the implications of our findings on the development of domain-centered XAI methods for model interpretability in healthcare contexts.
Affiliation(s)
- Grace Guo: Georgia Institute of Technology, Atlanta, Georgia, USA
- Lifu Deng: Cleveland Clinic, Cleveland, Ohio, USA
- Alex Endert: Georgia Institute of Technology, Atlanta, Georgia, USA
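The segment-replacement idea can be sketched in a few lines: swap the pixels of one labeled morphological structure from a donor image into the query image and compare model outputs. This is a minimal stand-in, not the MiMICRI library; the mask labels and the toy classifier are assumptions made for illustration.

```python
# Sketch of segment-replacement counterfactuals on synthetic 2-D "images".
import numpy as np

def replace_segment(query_img, donor_img, seg_mask, label):
    """Copy pixels belonging to `label` in the segmentation mask from donor into query."""
    cf = query_img.copy()
    region = seg_mask == label
    cf[region] = donor_img[region]
    return cf

def toy_model(img):
    """Stand-in classifier: 'disease' if mean intensity exceeds a threshold."""
    return int(img.mean() > 0.5)

rng = np.random.default_rng(1)
query, donor = rng.random((64, 64)), rng.random((64, 64)) * 0.3
mask = np.zeros((64, 64), dtype=int)
mask[16:48, 16:48] = 1          # pretend label 1 marks one cardiac structure

counterfactual = replace_segment(query, donor, mask, label=1)
print("original prediction:      ", toy_model(query))
print("counterfactual prediction:", toy_model(counterfactual))
```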
5. Kolozali S, White SL, Norris S, Fasli M, van Heerden A. Explainable Early Prediction of Gestational Diabetes Biomarkers by Combining Medical Background and Wearable Devices: A Pilot Study With a Cohort Group in South Africa. IEEE J Biomed Health Inform 2024; 28:1860-1871. PMID: 38345955; DOI: 10.1109/jbhi.2024.3361505.
Abstract
This study aims to explore the potential of Internet of Things (IoT) devices and explainable Artificial Intelligence (AI) techniques in predicting biomarker values associated with gestational diabetes mellitus (GDM) when measured 13-16 weeks prior to diagnosis. We developed a system that forecasts biomarkers such as LDL, HDL, triglycerides, cholesterol, HbA1c, and results from the Oral Glucose Tolerance Test (OGTT), including fasting glucose and 1-hour and 2-hour post-load glucose values. These biomarker values are predicted based on sensory measurements collected around week 12 of pregnancy, including continuous glucose levels, short physical movement recordings, and medical background information. To the best of our knowledge, this is the first study to forecast GDM-associated biomarker values 13 to 16 weeks prior to the GDM screening test using continuous glucose monitoring devices, a wristband for activity detection, and medical background data. We applied machine learning models, specifically Decision Tree and Random Forest regressors, along with Coupled-Matrix Tensor Factorisation (CMTF) and Elastic Net techniques, examining all possible combinations of these methods across different data modalities. The results demonstrated good performance for most biomarkers. On average, the models achieved Mean Squared Error (MSE) between 0.29 and 0.42 and Mean Absolute Error (MAE) between 0.23 and 0.45 for biomarkers like HDL, LDL, cholesterol, and HbA1c. For the OGTT glucose values, the average MSE ranged from 0.95 to 2.44, and the average MAE ranged from 0.72 to 0.91. Additionally, the utilisation of CMTF with the Alternating Least Squares technique yielded slightly better results (0.16 MSE and 0.07 MAE on average) compared to the well-known Elastic Net feature selection technique. While our study was conducted with a limited cohort in South Africa, our findings offer promising indications regarding the potential for predicting biomarker values in pregnant women through the integration of wearable devices and medical background data in the analysis. Nevertheless, further validation on a larger, more diverse cohort is imperative to substantiate these encouraging results.
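A minimal sketch of the regression setup described above (synthetic data, not the study's pipeline): a random-forest regressor predicting a single biomarker from early-pregnancy features, scored with the same MSE/MAE metrics the paper reports. The feature and target values are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
X = rng.random((n, 10))                                  # e.g., CGM summaries, activity counts, history
y = X @ rng.random(10) + rng.normal(scale=0.1, size=n)   # toy "HbA1c" target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
print("MSE:", mean_squared_error(y_te, pred))
print("MAE:", mean_absolute_error(y_te, pred))
```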
6. Wentzel A, Floricel C, Canahuate G, Naser MA, Mohamed AS, Fuller CD, van Dijk L, Marai GE. DASS Good: Explainable Data Mining of Spatial Cohort Data. Computer Graphics Forum 2023; 42:283-295. PMID: 37854026; PMCID: PMC10583718; DOI: 10.1111/cgf.14830.
Abstract
Developing applicable clinical machine learning models is a difficult task when the data includes spatial information, for example, radiation dose distributions across adjacent organs at risk. We describe the co-design of a modeling system, DASS, to support the hybrid human-machine development and validation of predictive models for estimating long-term toxicities related to radiotherapy doses in head and neck cancer patients. Developed in collaboration with domain experts in oncology and data mining, DASS incorporates human-in-the-loop visual steering, spatial data, and explainable AI to augment domain knowledge with automatic data mining. We demonstrate DASS with the development of two practical clinical stratification models and report feedback from domain experts. Finally, we describe the design lessons learned from this collaborative experience.
Affiliation(s)
- A Wentzel: University of Illinois Chicago, Electronic Visualization Lab
- C Floricel: University of Illinois Chicago, Electronic Visualization Lab
- M A Naser: University of Texas MD Anderson Cancer Center
- A S Mohamed: University of Texas MD Anderson Cancer Center
- C D Fuller: University of Texas MD Anderson Cancer Center
- L van Dijk: University of Texas MD Anderson Cancer Center
- G E Marai: University of Illinois Chicago, Electronic Visualization Lab
7. Nguyen TM, Quinn TP, Nguyen T, Tran T. Explaining Black Box Drug Target Prediction Through Model Agnostic Counterfactual Samples. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1020-1029. PMID: 35820003; DOI: 10.1109/tcbb.2022.3190266.
Abstract
Many high-performance drug-target binding affinity (DTA) deep learning models have been proposed, but they are mostly black-box and thus lack human interpretability. Explainable AI (XAI) can make DTA models more trustworthy and allows biological knowledge to be distilled from the models. Counterfactual explanation is one popular approach to explaining the behaviour of a deep neural network, which works by systematically answering the question "How would the model output change if the inputs were changed in this way?". We propose a multi-agent reinforcement learning framework, Multi-Agent Counterfactual Drug-target binding Affinity (MACDA), to generate counterfactual explanations for the drug-protein complex. Our proposed framework provides human-interpretable counterfactual instances while optimizing both the input drug and target for counterfactual generation at the same time. We benchmark the proposed MACDA framework using the Davis and PDBBind datasets and find that our framework produces more parsimonious explanations with no loss in explanation validity, as measured by encoding similarity. We then present a case study involving ABL1 and Nilotinib to demonstrate how MACDA can explain the behaviour of a DTA model in terms of the underlying substructure interactions between inputs in its prediction, revealing mechanisms that align with prior domain knowledge.
8. Bertucci D, Hamid MM, Anand Y, Ruangrotsakun A, Tabatabai D, Perez M, Kahng M. DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps. IEEE Transactions on Visualization and Computer Graphics 2023; 29:320-330. PMID: 36166545; DOI: 10.1109/tvcg.2022.3209425.
Abstract
In this paper, we present DendroMap, a novel approach to interactively exploring large-scale image datasets for machine learning (ML). ML practitioners often explore image datasets by generating a grid of images or projecting high-dimensional representations of images into 2-D using dimensionality reduction techniques (e.g., t-SNE). However, neither approach effectively scales to large datasets because images are ineffectively organized and interactions are insufficiently supported. To address these challenges, we develop DendroMap by adapting Treemaps, a well-known visualization technique. DendroMap effectively organizes images by extracting hierarchical cluster structures from high-dimensional representations of images. It enables users to make sense of the overall distributions of datasets and interactively zoom into specific areas of interest at multiple levels of abstraction. Our case studies with widely used image datasets for deep learning demonstrate that users can discover insights about datasets and trained models by examining the diversity of images, identifying underperforming subgroups, and analyzing classification errors. We conducted a user study that evaluates the effectiveness of DendroMap in grouping and searching tasks by comparing it with a gridified version of t-SNE and found that participants preferred DendroMap. DendroMap is available at https://div-lab.github.io/dendromap/.
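The core mechanism, building a hierarchy over high-dimensional image representations and cutting it at several depths to obtain nested groups, can be sketched as follows. This is an illustrative approximation rather than DendroMap's code; the embeddings and zoom levels are made up.

```python
# Hierarchical clustering of stand-in image embeddings, cut at coarse-to-fine levels.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
embeddings = rng.random((500, 64))          # stand-in for CNN image embeddings

Z = linkage(embeddings, method="ward")      # agglomerative cluster tree
for k in (4, 16, 64):                       # coarse-to-fine "zoom" levels
    labels = fcluster(Z, t=k, criterion="maxclust")
    sizes = np.bincount(labels)[1:]
    print(f"{k:3d} clusters -> largest group holds {sizes.max()} images")
```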
9. Wang J, Ma J, Hu K, Zhou Z, Zhang H, Xie X, Wu Y. Tac-Trainer: A Visual Analytics System for IoT-based Racket Sports Training. IEEE Transactions on Visualization and Computer Graphics 2023; 29:951-961. PMID: 36179004; DOI: 10.1109/tvcg.2022.3209352.
Abstract
Conventional racket sports training relies heavily on coaches' knowledge and experience, leading to biases in the guidance. To solve this problem, smart wearable devices based on Internet of Things (IoT) technology have been extensively investigated to support data-driven training. Considerable research has introduced methods to extract valuable information from the sensor data collected by IoT devices. However, this information cannot provide actionable insights for coaches due to the large data volume and high data dimensionality. We propose an IoT + VA framework, Tac-Trainer, to integrate the sensor data, the extracted information, and coaches' knowledge to facilitate racket sports training. Tac-Trainer consists of four components: device configuration, data interpretation, training optimization, and result visualization. These components collect trainees' kinematic data through IoT devices, transform the data into attributes and indicators, generate training suggestions, and provide an interactive visualization interface for exploration, respectively. We further discuss new research opportunities and challenges inspired by our work from two perspectives, VA for IoT and IoT for VA.
10. Zhang H, Dong J, Lv C, Lin Y, Bai J. Visual analytics of potential dropout behavior patterns in online learning based on counterfactual explanation. J Vis (Tokyo) 2022. DOI: 10.1007/s12650-022-00899-8.
11. Yuan J, Chan GYY, Barr B, Overton K, Rees K, Nonato LG, Bertini E, Silva CT. SUBPLEX: A Visual Analytics Approach to Understand Local Model Explanations at the Subpopulation Level. IEEE Computer Graphics and Applications 2022; 42:24-36. PMID: 37015716; DOI: 10.1109/mcg.2022.3199727.
Abstract
Understanding the interpretation of machine learning (ML) models has been of paramount importance when making decisions with societal impacts, such as transport control, financial activities, and medical diagnosis. While local explanation techniques are popular methods to interpret ML models on a single instance, they do not scale to the understanding of a model's behavior on the whole dataset. In this article, we outline the challenges and needs of visually analyzing local explanations and propose SUBPLEX, a visual analytics approach to help users understand local explanations with subpopulation visual analysis. SUBPLEX provides steerable clustering and projection visualization techniques that allow users to derive interpretable subpopulations of local explanations with users' expertise. We evaluate our approach through two use cases and experts' feedback.
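The subpopulation idea can be approximated by clustering per-instance explanation vectors and summarizing each cluster's mean attribution, as in the sketch below. This is not the SUBPLEX implementation; the attribution vectors are synthetic stand-ins for SHAP- or LIME-style local explanations.

```python
# Cluster local explanation vectors into subpopulations and profile each one.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
local_explanations = rng.normal(size=(300, 8))   # stand-in attribution vectors

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(local_explanations)
for c in range(4):
    profile = local_explanations[km.labels_ == c].mean(axis=0)
    top = np.argsort(-np.abs(profile))[:3]
    print(f"subpopulation {c}: dominant features {top.tolist()}")
```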
12. SDA-Vis: A Visualization System for Student Dropout Analysis Based on Counterfactual Exploration. Applied Sciences (Basel) 2022. DOI: 10.3390/app12125785.
Abstract
High and persistent dropout rates represent one of the biggest challenges for improving the efficiency of the educational system, particularly in underdeveloped countries. A range of features influence college dropout, with some belonging to the educational field and others to non-educational fields. Understanding the interplay of these variables to identify a student as a potential dropout could help decision makers interpret the situation and decide what to do next to reduce student dropout rates through corrective actions. This paper presents SDA-Vis, a visualization system that supports counterfactual explanations for student dropout dynamics, considering various academic, social, and economic variables. In contrast to conventional systems, our approach provides information about feature-perturbed versions of a student using counterfactual explanations. SDA-Vis comprises a set of linked views that allow users to identify which variable alterations would change a predefined student's situation; this involves perturbing the variables of a dropout student to obtain synthetic non-dropout students. SDA-Vis has been developed under the guidance and supervision of domain experts, in line with specific analytical objectives. We demonstrate the usefulness of SDA-Vis through case studies run in collaboration with domain experts, using a real data set from a Latin American university. The analysis reveals the effectiveness of SDA-Vis in identifying students at risk of dropping out and in proposing corrective actions, even for particular cases that have not been flagged as at risk by the traditional tools that experts use.
13.
Abstract
Interpretable machine learning aims at unveiling the reasons behind predictions returned by uninterpretable classifiers. One of the most valuable types of explanation consists of counterfactuals. A counterfactual explanation reveals what should have been different in an instance to observe a diverse outcome. For instance, a bank customer asks for a loan that is rejected. The counterfactual explanation consists of what should have been different for the customer in order to have the loan accepted. Recently, there has been an explosion of proposals for counterfactual explainers. The aim of this work is to survey the most recent explainers returning counterfactual explanations. We categorize explainers based on the approach adopted to return the counterfactuals, and we label them according to characteristics of the method and properties of the counterfactuals returned. In addition, we visually compare the explanations, and we report quantitative benchmarking assessing minimality, actionability, stability, diversity, discriminative power, and running time. The results make evident that the current state of the art does not provide a counterfactual explainer able to guarantee all these properties simultaneously.
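Two of the properties benchmarked in this survey, validity (does the prediction flip) and minimality (how few features change), are simple to compute for a set of candidate counterfactuals. The sketch below does so for a toy logistic-regression model; the perturbation scheme is an assumption for illustration, not any surveyed method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y)

x = X[0]                                   # instance to explain
# Candidate counterfactuals: sparse random perturbations of x.
mask = rng.random((10, 5)) > 0.6
candidates = x + rng.normal(scale=0.8, size=(10, 5)) * mask

validity = clf.predict(candidates) != clf.predict(x[None])[0]   # label flips?
minimality = (~np.isclose(candidates, x)).sum(axis=1)           # features changed
proximity = np.linalg.norm(candidates - x, axis=1)              # L2 distance to x
for i in range(len(candidates)):
    print(f"candidate {i}: valid={bool(validity[i])}, "
          f"changed={minimality[i]}, distance={proximity[i]:.2f}")
```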
14. Dimara E, Stasko J. A Critical Reflection on Visualization Research: Where Do Decision Making Tasks Hide? IEEE Transactions on Visualization and Computer Graphics 2022; 28:1128-1138. PMID: 34587049; DOI: 10.1109/tvcg.2021.3114813.
Abstract
It has been widely suggested that a key goal of visualization systems is to assist decision making, but is this true? We conduct a critical investigation on whether the activity of decision making is indeed central to the visualization domain. By approaching decision making as a user task, we explore the degree to which decision tasks are evident in visualization research and user studies. Our analysis suggests that decision tasks are not commonly found in current visualization task taxonomies and that the visualization field has yet to leverage guidance from decision theory domains on how to study such tasks. We further found that the majority of visualizations addressing decision making were not evaluated based on their ability to assist decision tasks. Finally, to help expand the impact of visual analytics in organizational as well as casual decision making activities, we initiate a research agenda on how decision making assistance could be elevated throughout visualization research.
15. Cheng F, Liu D, Du F, Lin Y, Zytek A, Li H, Qu H, Veeramachaneni K. VBridge: Connecting the Dots Between Features and Data to Explain Healthcare Models. IEEE Transactions on Visualization and Computer Graphics 2022; 28:378-388. PMID: 34596543; DOI: 10.1109/tvcg.2021.3114836.
Abstract
Machine learning (ML) is increasingly applied to Electronic Health Records (EHRs) to solve clinical prediction tasks. Although many ML models perform promisingly, issues with model transparency and interpretability limit their adoption in clinical practice. Directly using existing explainable ML techniques in clinical settings can be challenging. Through literature surveys and collaborations with six clinicians with an average of 17 years of clinical experience, we identified three key challenges, including clinicians' unfamiliarity with ML features, lack of contextual information, and the need for cohort-level evidence. Following an iterative design process, we further designed and developed VBridge, a visual analytics tool that seamlessly incorporates ML explanations into clinicians' decision-making workflow. The system includes a novel hierarchical display of contribution-based feature explanations and enriched interactions that connect the dots between ML features, explanations, and data. We demonstrated the effectiveness of VBridge through two case studies and expert interviews with four clinicians, showing that visually associating model explanations with patients' situational records can help clinicians better interpret and use model predictions when making clinical decisions. We further derived a list of design implications for developing future explainable ML tools to support clinical decision-making.
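One piece of the design described above, rolling per-feature contribution scores up a feature hierarchy so clinicians can read group-level evidence before drilling into individual ML features, can be sketched as follows. The feature names, grouping convention, and contribution values are hypothetical; this is not VBridge's code.

```python
# Aggregate toy per-feature contribution scores into group-level totals.
contributions = {
    "lab.glucose": 0.21, "lab.creatinine": -0.05,
    "vitals.heart_rate": 0.10, "vitals.sbp": 0.04,
}
groups = {}
for name, value in contributions.items():
    group = name.split(".")[0]               # "lab" or "vitals" in this toy hierarchy
    groups[group] = groups.get(group, 0.0) + value

for group, total in sorted(groups.items(), key=lambda kv: -abs(kv[1])):
    print(f"{group:8s} total contribution {total:+.2f}")
```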