1. Learning From Alarms: A Robust Learning Approach for Accurate Photoplethysmography-Based Atrial Fibrillation Detection Using Eight Million Samples Labeled With Imprecise Arrhythmia Alarms. IEEE J Biomed Health Inform 2024; 28:2650-2661. [PMID: 38300786] [DOI: 10.1109/jbhi.2024.3360952]
Abstract
Atrial fibrillation (AF) is a common cardiac arrhythmia with serious health consequences if not detected and treated early. Detecting AF using wearable devices with photoplethysmography (PPG) sensors and deep neural networks has demonstrated some success using proprietary algorithms in commercial solutions. However, to improve continuous AF detection in ambulatory settings towards a population-wide screening use case, we face several challenges, one of which is the lack of large-scale labeled training data. To address this challenge, we propose to leverage AF alarms from bedside patient monitors to label concurrent PPG signals, resulting in the largest PPG-AF dataset so far (8.5 M 30-second records from 24,100 patients) and demonstrating a practical approach to build large labeled PPG datasets. Furthermore, we recognize that the AF labels thus obtained contain errors because of false AF alarms generated from imperfect built-in algorithms from bedside monitors. Dealing with label noise with unknown distribution characteristics in this case requires advanced algorithms. We, therefore, introduce and open-source a novel loss design, the cluster membership consistency (CMC) loss, to mitigate label errors. By comparing CMC with state-of-the-art methods selected from a noisy label competition, we demonstrate its superiority in handling label noise in PPG data, resilience to poor-quality signals, and computational efficiency.
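The abstract does not spell out the CMC loss itself; purely as an illustration of the idea, a loss that keeps supervision from noisy labels while penalizing predictions that disagree within a feature-space cluster, here is a minimal NumPy sketch. The function name, the hard cluster assignments, and the weight `lam` are assumptions for the example, not the paper's formulation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cmc_style_loss(logits, noisy_labels, clusters, lam=1.0):
    """Cross-entropy on (possibly wrong) labels plus a consistency
    penalty pulling each sample's prediction toward the mean
    prediction of its feature-space cluster."""
    p = softmax(logits)
    n = len(noisy_labels)
    ce = -np.log(p[np.arange(n), noisy_labels] + 1e-12).mean()
    penalty = 0.0
    for c in np.unique(clusters):
        idx = clusters == c
        centroid = p[idx].mean(axis=0)          # cluster's mean prediction
        penalty += ((p[idx] - centroid) ** 2).sum()
    return ce + lam * penalty / n
```

With `lam=0` this reduces to ordinary cross-entropy; the consistency term only bites when samples in the same cluster receive conflicting predictions.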
2. The Rashomon Importance Distribution: Getting RID of Unstable, Single Model-based Variable Importance. arXiv [preprint] 2024; arXiv:2309.13775v4. [PMID: 37808086] [PMCID: PMC10557790]
Abstract
Quantifying variable importance is essential for answering high-stakes questions in fields like genetics, public policy, and medicine. Current methods generally calculate variable importance for a given model trained on a given dataset. However, for a given dataset, there may be many models that explain the target outcome equally well; without accounting for all possible explanations, different researchers may arrive at many conflicting yet equally valid conclusions given the same data. Additionally, even when accounting for all possible explanations for a given dataset, these insights may not generalize because not all good explanations are stable across reasonable data perturbations. We propose a new variable importance framework that quantifies the importance of a variable across the set of all good models and is stable across the data distribution. Our framework is extremely flexible and can be integrated with most existing model classes and global variable importance metrics. We demonstrate through experiments that our framework recovers variable importance rankings for complex simulation setups where other methods fail. Further, we show that our framework accurately estimates the true importance of a variable for the underlying data distribution. We provide theoretical guarantees on the consistency and finite sample error rates for our estimator. Finally, we demonstrate its utility with a real-world case study exploring which genes are important for predicting HIV load in persons with HIV, highlighting an important gene that has not previously been studied in connection with HIV. Code is available at https://github.com/jdonnelly36/Rashomon_Importance_Distribution.
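A toy version of the two ingredients, averaging a variable-importance metric over a Rashomon set of near-optimal models and over bootstrap resamples for stability, can be sketched for linear regression. The rejection-sampling scheme, perturbation scale, and `eps` below are invented for the illustration and are not the paper's RID estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(w, X, y):
    return ((X @ w - y) ** 2).mean()

def importance_distribution(X, y, eps=0.1, n_boot=10, n_models=300):
    """Toy RID-style estimate: over bootstrap resamples, sample many
    linear models near the least-squares fit, keep the near-optimal
    ('Rashomon') ones, and record each variable's permutation
    importance for every kept model."""
    n, d = X.shape
    dist = [[] for _ in range(d)]
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        Xb, yb = X[idx], y[idx]
        w_star, *_ = np.linalg.lstsq(Xb, yb, rcond=None)
        best = mse(w_star, Xb, yb)
        for _ in range(n_models):
            w = w_star + rng.normal(0.0, 0.05, d)   # candidate model
            if mse(w, Xb, yb) <= (1 + eps) * best + 1e-12:
                for j in range(d):
                    Xp = Xb.copy()
                    Xp[:, j] = rng.permutation(Xp[:, j])
                    dist[j].append(mse(w, Xp, yb) - mse(w, Xb, yb))
    return dist
```

The result is a distribution of importances per variable rather than one number from one model, which is the framework's central point.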
3. AsymMirai: Interpretable Mammography-based Deep Learning Model for 1-5-year Breast Cancer Risk Prediction. Radiology 2024; 310:e232780. [PMID: 38501952] [DOI: 10.1148/radiol.232780]
Abstract
Background Mirai, a state-of-the-art deep learning-based algorithm for predicting short-term breast cancer risk, outperforms standard clinical risk models. However, Mirai is a black box, risking overreliance on the algorithm and incorrect diagnoses. Purpose To identify whether bilateral dissimilarity underpins Mirai's reasoning process; create a simplified, intelligible model, AsymMirai, using bilateral dissimilarity; and determine if AsymMirai may approximate Mirai's performance in 1-5-year breast cancer risk prediction. Materials and Methods This retrospective study involved mammograms obtained from patients in the EMory BrEast imaging Dataset, known as EMBED, from January 2013 to December 2020. To approximate 1-5-year breast cancer risk predictions from Mirai, another deep learning-based model, AsymMirai, was built with an interpretable module: local bilateral dissimilarity (localized differences between left and right breast tissue). Pearson correlation coefficients were computed between the risk scores of Mirai and those of AsymMirai. Subgroup analysis was performed in patients for whom AsymMirai's year-over-year reasoning was consistent. AsymMirai and Mirai risk scores were compared using the area under the receiver operating characteristic curve (AUC), and 95% CIs were calculated using the DeLong method. Results Screening mammograms (n = 210 067) from 81 824 patients (mean age, 59.4 years ± 11.4 [SD]) were included in the study. Deep learning-extracted bilateral dissimilarity produced similar risk scores to those of Mirai (1-year risk prediction, r = 0.6832; 4-5-year prediction, r = 0.6988) and achieved similar performance as Mirai. For AsymMirai, the 1-year breast cancer risk AUC was 0.79 (95% CI: 0.73, 0.85) (Mirai, 0.84; 95% CI: 0.79, 0.89; P = .002), and the 5-year risk AUC was 0.66 (95% CI: 0.63, 0.69) (Mirai, 0.71; 95% CI: 0.68, 0.74; P < .001). 
In a subgroup of 183 patients for whom AsymMirai repeatedly highlighted the same tissue over time, AsymMirai achieved a 3-year AUC of 0.92 (95% CI: 0.86, 0.97). Conclusion Localized bilateral dissimilarity, an imaging marker for breast cancer risk, approximated the predictive power of Mirai and was a key to Mirai's reasoning. © RSNA, 2024. Supplemental material is available for this article. See also the editorial by Freitas in this issue.
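The abstract leaves the precise form of the learned comparison to the paper; the raw quantity it is built on, a localized difference between left and right breast tissue, can be sketched with a hypothetical patchwise root-mean-square comparison of mirrored image halves:

```python
import numpy as np

def local_bilateral_dissimilarity(img, patch=4):
    """Sketch of the interpretable module's underlying idea: compare
    each patch of the left half against the mirrored patch of the
    right half and report the largest localized difference. The
    learned comparison in AsymMirai is not reproduced here."""
    h, w = img.shape
    left = img[:, : w // 2]
    right = np.fliplr(img[:, w - w // 2 :])     # mirror the right half
    scores = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, left.shape[1] - patch + 1, patch):
            d = left[r : r + patch, c : c + patch] - right[r : r + patch, c : c + patch]
            scores.append(np.sqrt((d ** 2).mean()))
    return max(scores)
```

A perfectly symmetric image scores zero; a localized asymmetry raises the maximum patch score.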
4. Exploring and Interacting with the Set of Good Sparse Generalized Additive Models. Advances in Neural Information Processing Systems 2023; 36:56673-56699. [PMID: 38623077] [PMCID: PMC11018320]
Abstract
In real applications, interaction between machine learning models and domain experts is critical; however, the classical machine learning paradigm that usually produces only a single model does not facilitate such interaction. Approximating and exploring the Rashomon set, i.e., the set of all near-optimal models, addresses this practical challenge by providing the user with a searchable space containing a diverse set of models from which domain experts can choose. We present algorithms to efficiently and accurately approximate the Rashomon set of sparse, generalized additive models with ellipsoids for fixed support sets and use these ellipsoids to approximate Rashomon sets for many different support sets. The approximated Rashomon set serves as a cornerstone to solve practical challenges such as (1) studying the variable importance for the model class; (2) finding models under user-specified constraints (monotonicity, direct editing); and (3) investigating sudden changes in the shape functions. Experiments demonstrate the fidelity of the approximated Rashomon set and its effectiveness in solving practical challenges.
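For a smooth loss, the fixed-support ellipsoid can be motivated by a second-order Taylor expansion around the optimum: the set {β : L(β) ≤ L(β*) + ε} is approximately {β : ½(β − β*)ᵀH(β − β*) ≤ ε}, with H the Hessian at β*. A minimal logistic-regression sketch (the gradient-descent fit and the choice of ε are illustrative, not the paper's algorithm):

```python
import numpy as np

def logistic_loss(w, X, y):                  # labels y in {-1, +1}
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def fit_logistic(X, y, steps=2000, lr=0.5):
    """Plain gradient descent to a near-optimal w*."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - (y + 1) / 2) / len(y)
    return w

def loss_hessian(w, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    s = p * (1.0 - p)
    return (X * s[:, None]).T @ X / len(y)

def in_rashomon_ellipsoid(w, w_star, H, eps):
    """Membership test for the ellipsoidal Rashomon-set approximation."""
    d = w - w_star
    return 0.5 * d @ H @ d <= eps
```

Points inside the ellipsoid are candidate near-optimal models the user can browse, edit, or constrain.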
5. A Path to Simpler Models Starts With Noise. Advances in Neural Information Processing Systems 2023; 36:3362-3401. [PMID: 38577617] [PMCID: PMC10993912]
Abstract
The Rashomon set is the set of models that perform approximately equally well on a given dataset, and the Rashomon ratio is the fraction of all models in a given hypothesis space that are in the Rashomon set. Rashomon ratios are often large for tabular datasets in criminal justice, healthcare, lending, education, and in other areas, which has practical implications about whether simpler models can attain the same level of accuracy as more complex models. An open question is why Rashomon ratios often tend to be large. In this work, we propose and study a mechanism of the data generation process, coupled with choices usually made by the analyst during the learning process, that determines the size of the Rashomon ratio. Specifically, we demonstrate that noisier datasets lead to larger Rashomon ratios through the way that practitioners train models. Additionally, we introduce a measure called pattern diversity, which captures the average difference in predictions between distinct classification patterns in the Rashomon set, and motivate why it tends to increase with label noise. Our results explain a key aspect of why simpler models often tend to perform as well as black box models on complex, noisier datasets.
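The mechanism is easy to reproduce in a toy experiment: with one-dimensional threshold classifiers ("stumps") as the hypothesis space, label noise flattens the accuracy landscape, so far more thresholds land within ε of the best. The noise rate, threshold grid, and ε below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def rashomon_ratio(X, y, thresholds, eps=0.05):
    """Fraction of 1-D threshold classifiers (either sign) whose
    accuracy is within eps of the best stump's accuracy."""
    accs = np.array([max((np.sign(X - t) == y).mean(),
                         (np.sign(t - X) == y).mean()) for t in thresholds])
    return (accs >= accs.max() - eps).mean()

# Clean labels: y = sign(x); noisy labels: 40% of them flipped.
X = rng.uniform(-1, 1, 2000)
y_clean = np.sign(X)
flip = rng.random(2000) < 0.4
y_noisy = np.where(flip, -y_clean, y_clean)
ts = np.linspace(-0.99, 0.99, 199)
ratio_clean = rashomon_ratio(X, y_clean, ts)
ratio_noisy = rashomon_ratio(X, y_noisy, ts)
```

On this synthetic problem the noisy ratio is several times the clean one, matching the direction of the effect the abstract describes.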
6. OKRidge: Scalable Optimal k-Sparse Ridge Regression. Advances in Neural Information Processing Systems 2023; 36:41076-41258. [PMID: 38505104] [PMCID: PMC10950455]
Abstract
We consider an important problem in scientific discovery, namely identifying sparse governing equations for nonlinear dynamical systems. This involves solving sparse ridge regression problems to provable optimality in order to determine which terms drive the underlying dynamics. We propose a fast algorithm, OKRidge, for sparse ridge regression, using a novel lower bound calculation involving, first, a saddle point formulation, and from there, either solving (i) a linear system or (ii) using an ADMM-based approach, where the proximal operators can be efficiently evaluated by solving another linear system and an isotonic regression problem. We also propose a method to warm-start our solver, which leverages a beam search. Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.
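For tiny problems the k-sparse ridge objective can be solved by brute-force support enumeration, which is useful as a correctness oracle; OKRidge's contribution is reaching the same certified optimum at scale via branch-and-bound with the saddle-point lower bounds described above. A sketch of the oracle (not the paper's algorithm):

```python
import numpy as np
from itertools import combinations

def ridge_fit(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def k_sparse_ridge_bruteforce(X, y, k, lam=1e-3):
    """Certifiably optimal k-sparse ridge by enumerating all supports
    of size k. Exponential in d: only viable as a tiny-scale oracle."""
    best, best_support, best_w = np.inf, None, None
    for S in combinations(range(X.shape[1]), k):
        XS = X[:, S]
        w = ridge_fit(XS, y, lam)
        obj = ((XS @ w - y) ** 2).sum() + lam * (w @ w)
        if obj < best:
            best, best_support, best_w = obj, S, w
    return best_support, best_w
```

In the sparse-dynamics setting, the recovered support identifies which candidate terms drive the system.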
7. Interpretable algorithmic forensics. Proc Natl Acad Sci U S A 2023; 120:e2301842120. [PMID: 37782786] [PMCID: PMC10576126] [DOI: 10.1073/pnas.2301842120]
Abstract
One of the most troubling trends in criminal investigations is the growing use of "black box" technology, in which law enforcement rely on artificial intelligence (AI) models or algorithms that are either too complex for people to understand or that simply conceal how they function. In criminal cases, black box systems have proliferated in forensic areas such as DNA mixture interpretation, facial recognition, and recidivism risk assessments. The champions and critics of AI alike argue, mistakenly, that we face a catch-22: while black box AI is not understandable by people, they assume that it produces more accurate forensic evidence. In this Article, we question this assertion, which has so powerfully affected judges, policymakers, and academics. We describe a mature body of computer science research showing how "glass box" AI, designed to be interpretable, can be more accurate than black box alternatives. Indeed, black box AI performs predictably worse in settings like the criminal system. Debunking the black box performance myth has implications for forensic evidence, constitutional criminal procedure rights, and legislative policy. Absent some compelling, or even credible, government interest in keeping AI as a black box, and given the constitutional rights and public safety interests at stake, we argue that a substantial burden rests on the government to justify black box AI in criminal cases. We conclude by calling for judicial rulings and legislation to safeguard a right to interpretable forensic AI.
8. How Many Patients Do You Need? Investigating Trial Designs for Anti-Seizure Treatment in Acute Brain Injury Patients. medRxiv [preprint] 2023; 2023.08.21.23294339. [PMID: 37662339] [PMCID: PMC10473786] [DOI: 10.1101/2023.08.21.23294339]
Abstract
Objectives Epileptiform activity (EA) worsens outcomes in patients with acute brain injuries (e.g., aneurysmal subarachnoid hemorrhage [aSAH]). Randomized controlled trials (RCTs) assessing anti-seizure interventions are needed. Because of scant drug efficacy data and ethical reservations about placebo use, such RCTs are lacking or hindered by design constraints. We used a pharmacological model-guided simulator to design RCTs evaluating EA treatment and to determine their feasibility. Methods In a single-center cohort of adults (age >18) with aSAH and EA, we employed a mechanistic pharmacokinetic-pharmacodynamic framework to model treatment response using observational data. We subsequently simulated RCTs for levetiracetam and propofol, each with three treatment arms mirroring clinical practice and an additional placebo arm. Using our framework, we simulated EA trajectories across treatment arms. We predicted discharge modified Rankin Scale as a function of baseline covariates, EA burden, and drug doses using a double machine learning model learned from observational data. Differences in outcomes across arms were used to estimate the required sample size. Results Sample sizes ranged from 500 for levetiracetam 7 mg/kg vs. placebo, to >4000 for levetiracetam 15 vs. 7 mg/kg, to achieve 80% power (5% type I error). For propofol 1 mg/kg/h vs. placebo, 1200 participants were needed. Simulations comparing propofol at varying doses did not reach 80% power even with samples >1200. Interpretation Our simulations, grounded in observed drug efficacy, show that the required sample sizes are infeasible, even for potentially unethical placebo-controlled trials. We highlight the strength of simulations with observational data to inform null hypotheses and assess the feasibility of future trials of EA treatment.
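Stripped of the pharmacokinetic-pharmacodynamic machinery, the core of such a trial simulator is a Monte-Carlo power calculation: simulate both arms under an assumed effect, count how often the test rejects, and grow n until power reaches 80%. The binary poor-outcome rates and the two-proportion z-test below are simplified stand-ins for the paper's mRS-based outcome model.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulated_power(p_ctrl, p_trt, n_per_arm, n_sims=500):
    """Monte-Carlo power of a pooled two-proportion z-test at
    two-sided alpha = 0.05, given assumed poor-outcome rates in the
    control and treatment arms."""
    z_crit = 1.959963984540054
    hits = 0
    for _ in range(n_sims):
        a = rng.binomial(n_per_arm, p_ctrl)      # poor outcomes, control
        b = rng.binomial(n_per_arm, p_trt)       # poor outcomes, treated
        p1, p2 = a / n_per_arm, b / n_per_arm
        p = (a + b) / (2 * n_per_arm)            # pooled rate
        se = np.sqrt(2 * p * (1 - p) / n_per_arm)
        if se > 0 and abs(p1 - p2) / se > z_crit:
            hits += 1
    return hits / n_sims
```

Sweeping `n_per_arm` upward until `simulated_power` crosses 0.8 yields the required sample size for the assumed effect.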
9. Applied machine learning as a driver for polymeric biomaterials design. Nat Commun 2023; 14:4838. [PMID: 37563117] [PMCID: PMC10415291] [DOI: 10.1038/s41467-023-40459-8]
Abstract
Polymers are ubiquitous in almost every aspect of modern society, and their use in medical products is similarly pervasive. Despite this, the diversity of commercial polymers used in medicine is stunningly low. Considerable time and resources have been expended over the years on the development of new polymeric biomaterials that address unmet needs left by the current generation of medical-grade polymers. Machine learning (ML) presents an unprecedented opportunity in this field to bypass the need for trial-and-error synthesis, thus reducing the time and resources invested in new discoveries critical for advancing medical treatments. Current efforts pioneering applied ML in polymer design have employed combinatorial and high-throughput experimental design to address data availability concerns. However, the lack of available and standardized characterization of parameters relevant to medicine, including degradation time and biocompatibility, represents a nearly insurmountable obstacle to ML-aided design of biomaterials. Herein, we identify a gap at the intersection of applied ML and biomedical polymer design, highlight current work at this junction, and provide an outlook on challenges and future directions.
10. Effects of epileptiform activity on discharge outcome in critically ill patients in the USA: a retrospective cross-sectional study. Lancet Digit Health 2023; 5:e495-e502. [PMID: 37295971] [PMCID: PMC10528143] [DOI: 10.1016/s2589-7500(23)00088-2]
Abstract
BACKGROUND Epileptiform activity is associated with worse patient outcomes, including increased risk of disability and death. However, the effect of epileptiform activity on neurological outcome is confounded by the feedback between treatment with antiseizure medications and epileptiform activity burden. We aimed to quantify the heterogeneous effects of epileptiform activity with an interpretability-centred approach. METHODS We did a retrospective, cross-sectional study of patients in the intensive care unit who were admitted to Massachusetts General Hospital (Boston, MA, USA). Participants were aged 18 years or older and had electrographic epileptiform activity identified by a clinical neurophysiologist or epileptologist. The outcome was the dichotomised modified Rankin Scale (mRS) at discharge and the exposure was epileptiform activity burden defined as mean or maximum proportion of time spent with epileptiform activity in 6 h windows in the first 24 h of electroencephalography. We estimated the change in discharge mRS if everyone in the dataset had experienced a specific epileptiform activity burden and were untreated. We combined pharmacological modelling with an interpretable matching method to account for confounding and epileptiform activity-antiseizure medication feedback. The quality of the matched groups was validated by the neurologists. FINDINGS Between Dec 1, 2011, and Oct 14, 2017, 1514 patients were admitted to Massachusetts General Hospital intensive care unit, 995 (66%) of whom were included in the analysis. Compared with patients with a maximum epileptiform activity of 0 to less than 25%, patients with a maximum epileptiform activity burden of 75% or more when untreated had a mean 22·27% (SD 0·92) increased chance of a poor outcome (severe disability or death). Moderate but long-lasting epileptiform activity (mean epileptiform activity burden 2% to <10%) increased the risk of a poor outcome by mean 13·52% (SD 1·93). 
The effect sizes were heterogeneous depending on preadmission profile: for example, patients with hypoxic-ischaemic encephalopathy or acquired brain injury were more adversely affected than patients without these conditions. INTERPRETATION Our results suggest that interventions should put a higher priority on patients with an average epileptiform activity burden of 10% or greater, and treatment should be more conservative when maximum epileptiform activity burden is low. Treatment should also be tailored to individual preadmission profiles, because the potential for epileptiform activity to cause harm depends on age, medical history, and reason for admission. FUNDING National Institutes of Health and National Science Foundation.
11. Tensile performance data of 3D printed photopolymer gyroid lattices. Data Brief 2023; 49:109396. [PMID: 37600123] [PMCID: PMC10439287] [DOI: 10.1016/j.dib.2023.109396]
Abstract
Additive manufacturing has provided the ability to manufacture complex structures using a wide variety of materials and geometries. Structures such as triply periodic minimal surface (TPMS) lattices have been incorporated into products across many fields due to their unique combinations of mechanical, geometric, and physical properties. Yet, the near limitless possibility of combining geometry and material into these lattices leaves much to be discovered. This article provides a dataset of experimentally gathered tensile stress-strain curves and measured porosity values for 389 unique gyroid lattice structures manufactured using vat photopolymerization 3D printing. The lattice samples were printed from one of twenty different photopolymer materials available from either Formlabs, LOCTITE AM, or ETEC that range from strong and brittle to elastic and ductile and were printed on commercially available 3D printers, specifically the Formlabs Form2, Prusa SL1, and ETEC Envision One cDLM Mechanical. The stress-strain curves were recorded with an MTS Criterion C43.504 mechanical testing apparatus and following ASTM standards, and the void fraction or "porosity" of each lattice was measured using a calibrated scale. This data serves as a valuable resource for use in the development of novel printing materials and lattice geometries and provides insight into the influence of photopolymer material properties on the printability, geometric accuracy, and mechanical performance of 3D printed lattice structures. The data described in this article was used to train a machine learning model capable of predicting mechanical properties of 3D printed gyroid lattices based on the base mechanical properties of the printing material and porosity of the lattice in the research article [1].
12. Optimal Sparse Regression Trees. Proceedings of the AAAI Conference on Artificial Intelligence 2023; 37:11270-11279. [PMID: 38650922] [PMCID: PMC11034802] [DOI: 10.1609/aaai.v37i9.26334]
Abstract
Regression trees are one of the oldest forms of AI models, and their predictions can be made without a calculator, which makes them broadly useful, particularly for high-stakes applications. Within the large literature on regression trees, there has been little effort towards full provable optimization, mainly due to the computational hardness of the problem. This work proposes a dynamic-programming-with-bounds approach to the construction of provably-optimal sparse regression trees. We leverage a novel lower bound based on the optimal solution of the k-means clustering problem on one-dimensional data. We are often able to find optimal sparse trees in seconds, even for challenging datasets that involve large numbers of samples and highly-correlated features.
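The bound rests on the fact that 1-D k-means is exactly solvable: the optimal k-cluster SSE of the labels reaching a subproblem lower-bounds the squared error of any subtree with k leaves there, so branches can be pruned. A small dynamic-programming sketch of exact 1-D k-means (an O(kn²) DP over the sorted values, not the paper's implementation):

```python
import numpy as np

def _sse(prefix, prefix2, i, j):
    """Sum of squared deviations of sorted values v[i..j] (inclusive),
    from prefix sums of v and v**2."""
    n = j - i + 1
    s = prefix[j + 1] - prefix[i]
    s2 = prefix2[j + 1] - prefix2[i]
    return s2 - s * s / n

def kmeans_1d(values, k):
    """Exact 1-D k-means objective by dynamic programming:
    D[c][j] = best SSE of splitting the first j sorted values
    into c contiguous clusters."""
    v = np.sort(np.asarray(values, float))
    n = len(v)
    prefix = np.concatenate([[0.0], np.cumsum(v)])
    prefix2 = np.concatenate([[0.0], np.cumsum(v * v)])
    D = np.full((k + 1, n + 1), np.inf)
    D[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(c, j + 1):      # last cluster = v[i-1 .. j-1]
                D[c][j] = min(D[c][j],
                              D[c - 1][i - 1] + _sse(prefix, prefix2, i - 1, j - 1))
    return D[k][n]
```

Because clusters of sorted 1-D points are always contiguous, the DP is exact, which is what makes the resulting tree-loss bound valid.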
13. PP 1.33 – 00167 Integrated single-cell multi-omic profiling of HIV latency reversal. J Virus Erad 2022. [DOI: 10.1016/j.jve.2022.100137]
14. Fast Optimization of Weighted Sparse Decision Trees for use in Optimal Treatment Regimes and Optimal Policy Design. CEUR Workshop Proceedings 2022; 3318:26. [PMID: 36970634] [PMCID: PMC10039433]
Abstract
Sparse decision trees are one of the most common forms of interpretable models. While recent advances have produced algorithms that fully optimize sparse decision trees for prediction, that work does not address policy design, because the algorithms cannot handle weighted data samples. Specifically, they rely on the discreteness of the loss function, which means that real-valued weights cannot be directly used. For example, none of the existing techniques produce policies that incorporate inverse propensity weighting on individual data points. We present three algorithms for efficient sparse weighted decision tree optimization. The first approach directly optimizes the weighted loss function; however, it tends to be computationally inefficient for large datasets. Our second approach, which scales more efficiently, transforms weights to integer values and uses data duplication to transform the weighted decision tree optimization problem into an unweighted (but larger) counterpart. Our third algorithm, which scales to much larger datasets, uses a randomized procedure that samples each data point with a probability proportional to its weight. We present theoretical bounds on the error of the two fast methods and show experimentally that these methods can be two orders of magnitude faster than the direct optimization of the weighted loss, without losing significant accuracy.
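The third algorithm's sampling step is simple enough to show directly. In this sketch the proportional sampler is concrete, while the unweighted tree solver that would consume the subsample is left abstract:

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_subsample(X, y, weights, m):
    """Draw m points with probability proportional to weight, so an
    *unweighted* sparse-tree solver run on the subsample approximates
    the weighted optimization problem (e.g., with inverse propensity
    weights)."""
    p = np.asarray(weights, float)
    p = p / p.sum()
    idx = rng.choice(len(y), size=m, p=p, replace=True)
    return X[idx], y[idx]
```

Heavily weighted points appear many times in the subsample, mimicking their influence on the weighted loss without requiring the solver to handle real-valued weights.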
15. The Importance of Being Ernest, Ekundayo, or Eswari: An Interpretable Machine Learning Approach to Name-Based Ethnicity Classification. Harvard Data Science Review 2022. [DOI: 10.1162/99608f92.db1aba8b]
16. Towards a comprehensive evaluation of dimension reduction methods for transcriptomic data visualization. Commun Biol 2022; 5:719. [PMID: 35853932] [PMCID: PMC9296444] [DOI: 10.1038/s42003-022-03628-x]
Abstract
Dimension reduction (DR) algorithms project data from high dimensions to lower dimensions to enable visualization of interesting high-dimensional structure. DR algorithms are widely used for analysis of single-cell transcriptomic data. Despite widespread use of DR algorithms such as t-SNE and UMAP, these algorithms have characteristics that lead to lack of trust: they do not preserve important aspects of high-dimensional structure and are sensitive to arbitrary user choices. Given the importance of gaining insights from DR, DR methods should be evaluated carefully before trusting their results. In this paper, we introduce and perform a systematic evaluation of popular DR methods, including t-SNE, art-SNE, UMAP, PaCMAP, TriMap and ForceAtlas2. Our evaluation considers five components: preservation of local structure, preservation of global structure, sensitivity to parameter choices, sensitivity to preprocessing choices, and computational efficiency. This evaluation can help us to choose DR tools that align with the scientific goals of the user.
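As one concrete instance of the "preservation of local structure" component, a common score is the average overlap between each point's k nearest neighbors in the original space and in the embedding. This generic metric is a stand-in for illustration and not necessarily the paper's exact criterion:

```python
import numpy as np

def knn_preservation(X_high, X_low, k=10):
    """Average fraction of each point's k nearest neighbors that are
    shared between the high-dimensional data and its embedding."""
    def knn(Z):
        D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
        np.fill_diagonal(D, np.inf)          # exclude self
        return np.argsort(D, axis=1)[:, :k]
    A, B = knn(X_high), knn(X_low)
    return np.mean([len(set(a) & set(b)) / k for a, b in zip(A, B)])
```

A score of 1.0 means the embedding keeps every local neighborhood intact; scores near k/(n−1) indicate neighborhoods no better preserved than chance.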
17. Fast Sparse Decision Tree Optimization via Reference Ensembles. Proceedings of the AAAI Conference on Artificial Intelligence 2022; 36:9604-9613. [PMID: 36051654] [PMCID: PMC9429834] [DOI: 10.1609/aaai.v36i9.21194]
Abstract
Sparse decision tree optimization has been one of the most fundamental problems in AI since its inception and is a challenge at the core of interpretable machine learning. Sparse decision tree optimization is computationally hard, and despite steady effort since the 1960s, breakthroughs have been made on the problem only within the past few years, primarily on the problem of finding optimal sparse decision trees. However, current state-of-the-art algorithms often require impractical amounts of computation time and memory to find optimal or near-optimal trees for some real-world datasets, particularly those having several continuous-valued features. Given that the search spaces of these decision tree optimization problems are massive, can we practically hope to find a sparse decision tree that competes in accuracy with a black box machine learning model? We address this problem via smart guessing strategies that can be applied to any optimal branch-and-bound-based decision tree algorithm. The guesses come from knowledge gleaned from black box models. We show that by using these guesses, we can reduce the run time by multiple orders of magnitude while providing bounds on how far the resulting trees can deviate from the black box's accuracy and expressive power. Our approach enables guesses about how to bin continuous features, the size of the tree, and lower bounds on the error for the optimal decision tree. Our experiments show that in many cases we can rapidly construct sparse decision trees that match the accuracy of black box models. To summarize: when you are having trouble optimizing, just guess.
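The binning guess can be illustrated end to end with a tiny home-made boosted-stump ensemble standing in for the black-box reference model: collect the thresholds its stumps actually split on, then restrict the optimal-tree search to those. Everything below (the AdaBoost-style loop, the data) is a simplified assumption for the sketch, not the paper's code:

```python
import numpy as np

def best_stump(X, y, w):
    """Exhaustively search one split (feature, threshold, sign)
    under sample weights w; labels y in {-1, +1}."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sgn in (1, -1):
                pred = np.where(X[:, j] <= t, -sgn, sgn)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, t, sgn)
    return best

def boosted_thresholds(X, y, rounds=10):
    """Tiny AdaBoost of stumps as the 'black box'; the thresholds its
    stumps split on become the guessed binarization points."""
    n = len(y)
    w = np.ones(n) / n
    thresholds = set()
    for _ in range(rounds):
        err, j, t, sgn = best_stump(X, y, w)
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(X[:, j] <= t, -sgn, sgn)
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        thresholds.add((j, float(t)))
    return sorted(thresholds)
```

Searching only over the guessed (feature, threshold) pairs shrinks the branch-and-bound space dramatically, which is the source of the reported speedups.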
18. Data solidarity for machine learning for embryo selection; a call for the creation of an open access repository of embryo data. Reprod Biomed Online 2022; 45:10-13. [DOI: 10.1016/j.rbmo.2022.03.015]
19. Fast Sparse Classification for Generalized Linear and Additive Models. Proceedings of Machine Learning Research 2022; 151:9304-9333. [PMID: 35601052] [PMCID: PMC9122737]
Abstract
We present fast classification techniques for sparse generalized linear and additive models. These techniques can handle thousands of features and thousands of observations in minutes, even in the presence of many highly correlated features. For fast sparse logistic regression, our computational speed-up over other best-subset search techniques owes to linear and quadratic surrogate cuts for the logistic loss that allow us to efficiently screen features for elimination, as well as use of a priority queue that favors a more uniform exploration of features. As an alternative to the logistic loss, we propose the exponential loss, which permits an analytical solution to the line search at each iteration. Our algorithms are generally 2 to 5 times faster than previous approaches. They produce interpretable models that have accuracy comparable to black box models on challenging datasets.
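The exponential loss's analytical line search can be shown for a single coordinate with a binary ±1 feature: minimizing Σᵢ uᵢ·exp(−δ·yᵢ·xᵢⱼ) over the step δ gives the closed form δ = ½·log(W₊/W₋), where W₊ and W₋ sum the current exponential weights of samples the feature agrees and disagrees with. A sketch (the ±1-feature restriction is our simplification for the illustration):

```python
import numpy as np

def exp_loss(beta, X, y):                    # labels y in {-1, +1}
    return np.exp(-y * (X @ beta)).sum()

def coordinate_step(beta, X, y, j):
    """Analytic line search for the exponential loss along coordinate
    j, assuming the feature takes values in {-1, +1}: the optimum is
    an AdaBoost-style half-log-odds of the exponential weights."""
    u = np.exp(-y * (X @ beta))              # current sample weights
    agree = y * X[:, j] > 0
    w_pos, w_neg = u[agree].sum(), u[~agree].sum()
    delta = 0.5 * np.log((w_pos + 1e-12) / (w_neg + 1e-12))
    beta = beta.copy()
    beta[j] += delta
    return beta
```

The closed form removes the inner iterative line search entirely, which is what makes the exponential loss an attractive alternative to the logistic loss here.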
20.
Abstract
We discovered a highly virulent variant of subtype-B HIV-1 in the Netherlands. One hundred nine individuals with this variant had a 0.54 to 0.74 log10 increase (i.e., a ~3.5-fold to 5.5-fold increase) in viral load compared with, and exhibited CD4 cell decline twice as fast as, 6604 individuals with other subtype-B strains. Without treatment, advanced HIV (CD4 cell counts below 350 cells per cubic millimeter, with long-term clinical consequences) is expected to be reached, on average, 9 months after diagnosis for individuals in their thirties with this variant. Age, sex, suspected mode of transmission, and place of birth for the aforementioned 109 individuals were typical for HIV-positive people in the Netherlands, which suggests that the increased virulence is attributable to the viral strain. Genetic sequence analysis suggests that this variant arose in the 1990s from de novo mutation, not recombination, with increased transmissibility and an unfamiliar molecular mechanism of virulence.
Collapse
|
21
|
Interpretable machine learning: Fundamental principles and 10 grand challenges. STATISTICS SURVEYS 2022. [DOI: 10.1214/21-ss133] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
22
|
Exploring the Whole Rashomon Set of Sparse Decision Trees. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 2022; 35:14071-14084. [PMID: 37786624 PMCID: PMC10544768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 10/04/2023]
Abstract
In any given machine learning problem, there might be many models that explain the data almost equally well. However, most learning algorithms return only one of these models, leaving practitioners with no practical way to explore alternative models that might have desirable properties beyond what could be expressed by a loss function. The Rashomon set is the set of all of these almost-optimal models. Rashomon sets can be large in size and complicated in structure, particularly for highly nonlinear function classes that allow complex interaction terms, such as decision trees. We provide the first technique for completely enumerating the Rashomon set for sparse decision trees; in fact, our work provides the first complete enumeration of any Rashomon set for a non-trivial problem with a highly nonlinear discrete function class. This allows the user an unprecedented level of control over model choice among all models that are approximately equally good. We represent the Rashomon set in a specialized data structure that supports efficient querying and sampling. We show three applications of the Rashomon set: 1) it can be used to study variable importance for the set of almost-optimal trees (as opposed to a single tree), 2) the Rashomon set for accuracy enables enumeration of the Rashomon sets for balanced accuracy and F1-score, and 3) the Rashomon set for a full dataset can be used to produce Rashomon sets constructed with only subsets of the dataset. Thus, we are able to examine Rashomon sets across problems with a new lens, enabling users to choose models rather than be at the mercy of an algorithm that produces only a single model.
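The notion of a set of almost-optimal models can be made concrete with a toy example. The sketch below enumerates the Rashomon set of one-dimensional threshold stumps, a far simpler class than the sparse decision trees the paper handles:

```python
import numpy as np

def stump_rashomon_set(x, y, eps=0.02):
    """Toy Rashomon set: all 1-D threshold classifiers whose 0-1 loss is
    within eps of the best stump.  Illustrates the concept only; the paper
    enumerates Rashomon sets of full sparse decision trees."""
    models = []
    for t in np.unique(x):
        for sign in (1, -1):
            pred = np.where(sign * (x - t) > 0, 1, 0)
            models.append((t, sign, np.mean(pred != y)))   # (threshold, orientation, loss)
    best = min(m[2] for m in models)
    return [m for m in models if m[2] <= best + eps]
```

Even in this tiny class, several distinct models typically achieve near-optimal loss, which is exactly why returning a single model can hide useful alternatives.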
Collapse
|
23
|
A supervised machine learning semantic segmentation approach for detecting artifacts in plethysmography signals from wearables. Physiol Meas 2021; 42. [PMID: 34794126 DOI: 10.1088/1361-6579/ac3b3d] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2021] [Accepted: 11/18/2021] [Indexed: 12/21/2022]
Abstract
Objective. Wearable devices equipped with plethysmography (PPG) sensors provide a low-cost, long-term solution for early diagnosis and continuous screening of heart conditions. However, PPG signals collected from such devices often suffer from corruption caused by artifacts. The objective of this study is to develop an effective supervised algorithm to locate the regions of artifacts within PPG signals. Approach. We treat artifact detection as a 1D segmentation problem and solve it via a novel combination of an active-contour-based loss and an adapted U-Net architecture. The proposed algorithm was trained on the PPG DaLiA training set and further evaluated on the PPG DaLiA testing set, the WESAD dataset and the TROIKA dataset. Main results. We evaluated using the DICE score, a well-established metric for segmentation accuracy in the field of computer vision. The proposed method outperforms baseline methods on all three datasets by a large margin (≈7 percentage points above the next best method). On the PPG DaLiA testing set, WESAD dataset and TROIKA dataset, the proposed method achieved DICE scores of 0.8734 ± 0.0018, 0.9114 ± 0.0033 and 0.8050 ± 0.0116 respectively; the next best method achieved only 0.8068 ± 0.0014, 0.8446 ± 0.0013 and 0.7247 ± 0.0050. Significance. The proposed method is able to pinpoint exact locations of artifacts with high precision; in the past, only a binary classification of whether a PPG signal had good or poor quality was available. This more nuanced information will be critical to further inform the design of algorithms to detect cardiac arrhythmia.
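The DICE score used for evaluation is straightforward to compute. A minimal sketch for binary 1-D artifact masks follows (the segmentation model itself is not reproduced here):

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """DICE coefficient for binary 1-D segmentation masks:
    2|A ∩ B| / (|A| + |B|).  Sketch of the reported evaluation metric,
    not the study's U-Net model."""
    pred, target = np.asarray(pred, bool), np.asarray(target, bool)
    inter = np.logical_and(pred, target).sum()
    # eps guards against division by zero when both masks are empty
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

A score of 1.0 means the predicted artifact regions coincide exactly with the labeled ones, so the per-dataset values above (e.g. 0.8734 on PPG DaLiA) measure how closely predicted artifact spans overlap the annotations.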
Collapse
|
24
|
Interpretable, not black-box, artificial intelligence should be used for embryo selection. Hum Reprod Open 2021; 2021:hoab040. [PMID: 34938903 PMCID: PMC8687137 DOI: 10.1093/hropen/hoab040] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2021] [Revised: 10/18/2021] [Indexed: 11/23/2022] Open
Abstract
Artificial intelligence (AI) techniques are starting to be used in IVF, in particular for selecting which embryos to transfer to the woman. AI has the potential to process complex data sets, to be better at identifying subtle but important patterns, and to be more objective than humans when evaluating embryos. However, a current review of the literature shows much work is still needed before AI can be ethically implemented for this purpose. No randomized controlled trials (RCTs) have been published, and the efficacy studies which exist demonstrate that algorithms can broadly differentiate well between 'good-' and 'poor-' quality embryos but not necessarily between embryos of similar quality, which is the actual clinical need. Almost universally, the AI models were opaque ('black-box') in that at least some part of the process was uninterpretable. This gives rise to a number of epistemic and ethical concerns, including problems with trust, the possibility of using algorithms that generalize poorly to different populations, adverse economic implications for IVF clinics, potential misrepresentation of patient values, broader societal implications, a responsibility gap in the case of poor selection choices and introduction of a more paternalistic decision-making process. Use of interpretable models, which are constrained so that a human can easily understand and explain them, could overcome these concerns. The contribution of AI to IVF is potentially significant, but we recommend that AI models used in this field should be interpretable, and rigorously evaluated with RCTs before implementation. We also recommend long-term follow-up of children born after AI for embryo selection, regulatory oversight for implementation, and public availability of data and code to enable research teams to independently reproduce and validate existing models.
Collapse
|
25
|
MA11.06 Multi-Omic Characterization of Lung Tumors Implicates AKT and MYC Signaling in Adenocarcinoma to Squamous Cell Transdifferentiation. J Thorac Oncol 2021. [DOI: 10.1016/j.jtho.2021.08.167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
26
|
MA16.03 CRISPR Screen Reveals XPO1 as a Therapeutic Target Strongly Sensitizing to First and Second Line Therapy in Small Cell Lung Cancer. J Thorac Oncol 2021. [DOI: 10.1016/j.jtho.2021.08.196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
27
|
OA07.01 Signatures of Plasticity and Immunosuppression in a Single-Cell Atlas of Human Small Cell Lung Cancer. J Thorac Oncol 2021. [DOI: 10.1016/j.jtho.2021.08.054] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
28
|
O-098 Embryo selection using Artificial Intelligence (AI): Epistemic and ethical considerations. Hum Reprod 2021. [DOI: 10.1093/humrep/deab125.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Study question
What are the epistemic and ethical considerations of clinically implementing Artificial Intelligence (AI) algorithms in embryo selection?
Summary answer
AI embryo selection algorithms used to date are “black-box” models with significant epistemic and ethical issues, and there are no trials assessing their clinical effectiveness.
What is known already
The innovation of time-lapse imaging offers the potential to generate vast quantities of data for embryo assessment. Computer Vision allows image data to be analysed using algorithms developed via machine learning which learn and adapt as they are exposed to more data. Most algorithms are developed using neural networks and are uninterpretable (or “black box”). Uninterpretable models are either too complicated to understand or proprietary, in which case comprehension is impossible for outsiders. In the IVF context, these outsiders include doctors, embryologists and patients, which raises ethical questions for its use in embryo selection.
Study design, size, duration
We performed a scoping review of articles evaluating AI for embryo selection in IVF. We considered the epistemic and ethical implications of current approaches.
Participants/materials, setting, methods
We searched Medline, Embase, ClinicalTrials.gov and the EU Clinical Trials Register for full text papers evaluating AI for embryo selection using the following key words: artificial intelligence* OR AI OR neural network* OR machine learning OR support vector machine OR automatic classification AND IVF OR in vitro fertilisation OR embryo*, as well as relevant MeSH and Emtree terms for Medline and Embase respectively.
Main results and the role of chance
We found no trials evaluating clinical effectiveness either published or registered. We found efficacy studies which looked at 2 types of outcomes – accuracy for predicting pregnancy or live birth and agreement with embryologist evaluation. Some algorithms were shown to broadly differentiate well between “good-” and “poor-” quality embryos but not between embryos of similar quality, which is the clinical need. Almost universally, the AI models were opaque (“black box”) in that at least some part of the process was uninterpretable.
“Black box” models are problematic for epistemic and ethical reasons.
Epistemic concerns include information asymmetries between algorithm developers and doctors, embryologists and patients; the risk of biased prediction caused by known and/or unknown confounders during the training process; difficulties in real-time error checking due to limited interpretability; the economics of buying into commercial proprietary models, brittle to variation in the treatment process; and an overall difficulty troubleshooting.
Ethical pitfalls include the risk of misrepresenting patient values; concern for the health and well-being of future children; the risk of disvaluing disability; possible societal implications; and a responsibility gap, in the event of adverse events.
Limitations, reasons for caution
Our search was limited to the two main medical research databases. Although we checked article references for more publications, we were less likely to identify studies that were not indexed in Medline or Embase, especially if they were not cited in studies identified in our search.
Wider implications of the findings
It is premature to implement AI for embryo selection outside of a clinical trial. AI for embryo selection is potentially useful, but must be done carefully and transparently, as the epistemic and ethical issues are significant. We advocate for the use of interpretable AI models to overcome these issues.
Trial registration number
not applicable
Collapse
|
29
|
Playing Codenames with Language Graphs and Word Embeddings. J ARTIF INTELL RES 2021. [DOI: 10.1613/jair.1.12665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022] Open
Abstract
Although board games and video games have been studied for decades in artificial intelligence research, challenging word games remain relatively unexplored. Word games are not as constrained as games like chess or poker. Instead, word game strategy is defined by the players' understanding of the way words relate to each other. The word game Codenames provides a unique opportunity to investigate common sense understanding of relationships between words, an important open challenge. We propose an algorithm that can generate Codenames clues from the language graph BabelNet or from any of several embedding methods – word2vec, GloVe, fastText or BERT. We introduce a new scoring function that measures the quality of clues, and we propose a weighting term called DETECT that incorporates dictionary-based word representations and document frequency to improve clue selection. We develop the BabelNet-Word Selection Framework (BabelNetWSF) to improve BabelNet clue quality and overcome the computational barriers that previously prevented leveraging language graphs for Codenames. Extensive experiments with human evaluators demonstrate that our proposed innovations yield state-of-the-art performance, with up to 102.8% improvement in precision@2 in some cases. Overall, this work advances the formal study of word games and approaches for common sense language understanding.
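The general idea of scoring a candidate clue against team and opponent words via embedding similarity can be sketched as follows. The function and its `margin` parameter are illustrative stand-ins, not the paper's scoring function or its DETECT weighting:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def score_clue(clue_vec, team_vecs, opponent_vecs, margin=0.1):
    """Generic embedding-based clue score: count how many team words the
    clue is closer to than any opponent word, by at least `margin`.
    Illustrative only; the paper's scoring is more involved."""
    team_sims = sorted((cosine(clue_vec, v) for v in team_vecs), reverse=True)
    worst_opp = max(cosine(clue_vec, v) for v in opponent_vecs)
    covered = [s for s in team_sims if s > worst_opp + margin]
    return len(covered), sum(covered)   # (#words covered, total similarity)
```

A spymaster-style algorithm would evaluate many candidate clue vectors this way and return the clue covering the most team words.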
Collapse
|
30
|
P47.06 Delta-Radiomics Features for Assessment of Individualized Therapeutic Response in Small Cell Lung Cancer – A Pilot Study. J Thorac Oncol 2021. [DOI: 10.1016/j.jtho.2021.01.862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
32
|
Prediction, Machine Learning, and Individual Lives: an Interview with Matthew Salganik. HARVARD DATA SCIENCE REVIEW 2020. [DOI: 10.1162/99608f92.eecdfa4e] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022] Open
|
33
|
Chemsex drugs on the rise: a longitudinal analysis of the Swiss HIV Cohort Study from 2007 to 2017. HIV Med 2020; 21:228-239. [DOI: 10.1111/hiv.12821] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/10/2019] [Indexed: 12/22/2022]
|
34
|
Changes in Renal Function After Switching From TDF to TAF in HIV-Infected Individuals: A Prospective Cohort Study. J Infect Dis 2020; 222:637-645. [DOI: 10.1093/infdis/jiaa125] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2019] [Accepted: 03/16/2020] [Indexed: 12/12/2022] Open
Abstract
Background. Replacing tenofovir disoproxil fumarate (TDF) with tenofovir alafenamide (TAF) improves renal tubular markers in HIV-infected individuals, but the impact on estimated glomerular filtration rate (eGFR) remains unclear. Methods. In all participants from the Swiss HIV Cohort Study who switched from TDF to a TAF-containing antiretroviral regimen or continued TDF, we estimated changes in eGFR and urine protein-to-creatinine ratio (UPCR) after 18 months using mixed-effect models. Results. Of 3520 participants (26.6% women, median age 50 years), 2404 (68.5%) switched to TAF. Overall, 1664 (47.3%) had an eGFR <90 mL/min and 1087 (30.9%) a UPCR ≥15 mg/mmol. In patients with baseline eGFR ≥90 mL/min, eGFR decreased with the use of both TDF and TAF (−1.7 mL/min). Switching to TAF was associated with increases in eGFR of 1.5 mL/min (95% confidence interval [CI], 0.5–2.5) if the baseline eGFR was 60–89 mL/min, and 4.1 mL/min (95% CI, 1.6–6.6) if <60 mL/min. In contrast, eGFR decreased by 5.8 mL/min (95% CI, 2.3–9.3) with continued use of TDF in individuals with baseline eGFR <60 mL/min. UPCR decreased after replacing TDF with TAF, independent of baseline eGFR. Conclusions. Switching from TDF to TAF improves eGFR and proteinuria in patients with renal dysfunction.
Collapse
|
36
|
P1.01-122 A Clinical Utility Study of Plasma DNA Next Generation Sequencing Guided Treatment of Uncommon Drivers in Advanced Non-Small-Cell Lung Cancers. J Thorac Oncol 2019. [DOI: 10.1016/j.jtho.2019.08.837] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
37
|
P1.12-05 Microenvironment Characterization of Small Cell Lung Cancer Xenografts Implanted in Hematopoietic Humanized Mice. J Thorac Oncol 2019. [DOI: 10.1016/j.jtho.2019.08.1118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
38
|
Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. NAT MACH INTELL 2019; 1:206-215. [PMID: 35603010 PMCID: PMC9122117 DOI: 10.1038/s42256-019-0048-x] [Citation(s) in RCA: 1225] [Impact Index Per Article: 245.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2018] [Accepted: 03/26/2019] [Indexed: 11/08/2022]
Abstract
Black box machine learning models are currently being used for high stakes decision-making throughout society, causing problems in healthcare, criminal justice, and other domains. People have hoped that creating methods for explaining these black box models will alleviate some of these problems, but trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practices and can potentially cause catastrophic harm to society. There is a way forward: design models that are inherently interpretable. This manuscript clarifies the chasm between explaining black boxes and using inherently interpretable models, outlines several key reasons why explainable black boxes should be avoided in high-stakes decisions, identifies challenges to interpretable machine learning, and provides several example applications where interpretable models could potentially replace black box models in criminal justice, healthcare, and computer vision.
Collapse
|
39
|
Interpretable Almost-Exact Matching for Causal Inference. PROCEEDINGS OF MACHINE LEARNING RESEARCH 2019; 89:2445-2453. [PMID: 31198908 PMCID: PMC6563929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Matching methods are heavily used in the social and health sciences due to their interpretability. We aim to create the highest possible quality of treatment-control matches for categorical data in the potential outcomes framework. The method proposed in this work aims to match units on a weighted Hamming distance, taking into account the relative importance of the covariates; the algorithm aims to match units on as many relevant variables as possible. To do this, the algorithm creates a hierarchy of covariate combinations on which to match (similar to downward closure), in the process solving an optimization problem for each unit in order to construct the optimal matches. The algorithm uses a single dynamic program to solve all of the units' optimization problems simultaneously. Notable advantages of our method over existing matching procedures are its high-quality interpretable matches, versatility in handling different data distributions that may have irrelevant variables, and ability to handle missing data by matching on as many available covariates as possible.
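The weighted Hamming distance at the core of the method can be sketched for a single treated unit. The paper itself solves all units jointly with a dynamic program, which this toy version does not attempt:

```python
import numpy as np

def best_matches(unit, controls, weights):
    """Match one treated unit to the control(s) with the smallest weighted
    Hamming distance over categorical covariates.  Each disagreement on
    covariate k costs weights[k].  Illustrative sketch of the distance
    being optimized, not the paper's joint algorithm."""
    unit, controls, weights = map(np.asarray, (unit, controls, weights))
    dists = ((controls != unit) * weights).sum(axis=1)   # weighted mismatch count
    return np.flatnonzero(dists == dists.min()), dists   # indices of best controls
```

Weighting the covariates lets the match tolerate disagreement on unimportant variables while insisting on agreement where it matters, which is what makes the resulting matches interpretable.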
Collapse
|
40
|
All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously. JOURNAL OF MACHINE LEARNING RESEARCH : JMLR 2019; 20:177. [PMID: 34335110 PMCID: PMC8323609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Variable importance (VI) tools describe how much covariates contribute to a prediction model's accuracy. However, important variables for one well-performing model (for example, a linear model f(x) = x^T β with a fixed coefficient vector β) may be unimportant for another model. In this paper, we propose model class reliance (MCR) as the range of VI values across all well-performing models in a prespecified class. Thus, MCR gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well. In the process of deriving MCR, we show several informative results for permutation-based VI estimates, based on the VI measures used in Random Forests. Specifically, we derive connections between permutation importance estimates for a single prediction model, U-statistics, conditional variable importance, conditional causal effects, and linear model coefficients. We then give probabilistic bounds for MCR, using a novel, generalizable technique. We apply MCR to a public data set of Broward County criminal records to study the reliance of recidivism prediction models on sex and race. In this application, MCR can be used to help inform VI for unknown, proprietary models.
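Permutation importance, and the idea of reporting its range across several well-performing models, can be sketched as follows. True MCR bounds the range over an entire model class; this illustration only scans a finite list of fitted models:

```python
import numpy as np

def perm_importance(model, X, y, j, rng):
    """Permutation importance of feature j: the increase in 0-1 loss after
    shuffling column j."""
    base = np.mean(model(X) != y)
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    return np.mean(model(Xp) != y) - base

def importance_range(models, X, y, j, seed=0):
    """MCR-flavored summary: the [min, max] permutation importance of
    feature j across a set of well-performing models.  (MCR proper bounds
    this range over a whole model class; this is only a finite scan.)"""
    rng = np.random.default_rng(seed)
    vals = [perm_importance(m, X, y, j, rng) for m in models]
    return min(vals), max(vals)
```

A wide range signals that the data admit both models that rely on the feature and models that do not, which is exactly the ambiguity a single-model VI score hides.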
Collapse
|
41
|
P3.12-06 SLFN11 Expression and Efficacy of PARP Inhibitor Therapy in Extensive Stage Small Cell Lung Cancer: ECOG-ACRIN 2511 Study. J Thorac Oncol 2018. [DOI: 10.1016/j.jtho.2018.08.1829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
|
42
|
MA19.09 Concurrent Mutations in STK11 and KEAP1 is Associated with Resistance to PD-(L)1 Blockade in Patients with NSCLC Despite High TMB. J Thorac Oncol 2018. [DOI: 10.1016/j.jtho.2018.08.480] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
43
|
MA22.01 PARP Inhibitor Radiosensitization of Small Cell Lung Cancer Differs by PARP Trapping Potency. J Thorac Oncol 2018. [DOI: 10.1016/j.jtho.2018.08.501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
44
|
P1.01-99 Detecting HER2 Alterations by Next Generation Sequencing (NGS) in Patients with Advanced NSCLC from the United States and China. J Thorac Oncol 2018. [DOI: 10.1016/j.jtho.2018.08.655] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
45
|
P2.13-44 Targeting NFE2L2 Mutations in Advanced Squamous Cell Lung Cancers with the TORC1/2 Inhibitor TAK-228. J Thorac Oncol 2018. [DOI: 10.1016/j.jtho.2018.08.1439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
|
46
|
MA15.02 Long-Term Safety and Clinical Activity Results from a Phase Ib Study of Erlotinib Plus Atezolizumab in Advanced NSCLC. J Thorac Oncol 2018. [DOI: 10.1016/j.jtho.2018.08.440] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
|
47
|
P1.01-75 Utility of cfDNA Testing for Acquired Resistance: The Memorial Sloan Kettering Experience with Plasma EGFR T790M Clinical Testing. J Thorac Oncol 2018. [DOI: 10.1016/j.jtho.2018.08.631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
48
|
P1.13-43 Molecular and Imaging Predictors of Response to Ado-Trastuzumab Emtansine in Patients with HER2 Mutant Lung Cancers: An Exploratory Phase 2 Trial. J Thorac Oncol 2018. [DOI: 10.1016/j.jtho.2018.08.900] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
49
|
High Cure Rates With Grazoprevir-Elbasvir With or Without Ribavirin Guided by Genotypic Resistance Testing Among Human Immunodeficiency Virus/Hepatitis C Virus–coinfected Men Who Have Sex With Men. Clin Infect Dis 2018; 68:569-576. [DOI: 10.1093/cid/ciy547] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2018] [Accepted: 06/29/2018] [Indexed: 12/22/2022] Open
|
50
|
Modeling recovery curves with application to prostatectomy. Biostatistics 2018; 20:549-564. [DOI: 10.1093/biostatistics/kxy002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2016] [Accepted: 11/22/2017] [Indexed: 11/14/2022] Open
Abstract
Summary
In many clinical settings, a patient outcome takes the form of a scalar time series with a recovery curve shape, which is characterized by a sharp drop due to a disruptive event (e.g., surgery) and a subsequent monotonic smooth rise towards an asymptotic level not exceeding the pre-event value. We propose a Bayesian model that predicts recovery curves based on information available before the disruptive event. A recovery curve of interest is the quantified sexual function of prostate cancer patients after prostatectomy surgery. We illustrate the utility of our model as a pre-treatment medical decision aid, producing personalized predictions that are both interpretable and accurate. We uncover covariate relationships that agree with and supplement those in the existing medical literature.
Collapse
|