1
Sun K, Lan T, Goh YM, Safiena S, Huang YH, Lytle B, He Y. An interpretable clustering approach to safety climate analysis: Examining driver group distinctions. Accident Analysis and Prevention 2024; 196:107420. [PMID: 38159513] [DOI: 10.1016/j.aap.2023.107420]
Abstract
The transportation industry, particularly the trucking sector, is prone to workplace accidents and fatalities. Accidents involving large trucks account for a considerable percentage of overall traffic fatalities. Recognizing the crucial role of safety climate in accident prevention, researchers have sought to understand its factors and measure its impact within organizations. While existing data-driven safety climate studies have made remarkable progress, clustering employees based on their safety climate perceptions is innovative and has not been extensively utilized in research. Identifying clusters of drivers based on their safety climate perceptions allows an organization to profile its workforce and devise more impactful interventions. The limited use of clustering approaches may be due to the difficulty of interpreting or explaining the factors that influence employees' cluster membership. Moreover, existing safety-related studies have not compared multiple clustering algorithms, resulting in potential bias. To address these problems, this study introduces an interpretable clustering approach for safety climate analysis. It compares five algorithms for clustering truck drivers based on their safety climate perceptions and proposes a novel method for quantitatively evaluating partial dependence plots (QPDP). To better interpret the clustering results, the study applies several interpretable machine learning measures (Shapley additive explanations, permutation feature importance, and QPDP) and explains the clusters in terms of the importance of different safety climate factors. The Python code used in this study is available at https://github.com/NUS-DBE/truck-driver-safety-climate. Drawing on data collected from more than 7,000 American truck drivers, this study makes a significant contribution to the scientific literature: it highlights the critical role of supervisory care promotion in distinguishing driver groups and showcases the advantages of machine learning techniques, such as cluster analysis, for enriching scientific knowledge in this field. Future studies could involve experimental methods to assess strategies for enhancing supervisory care promotion, as well as integrating deep learning clustering techniques with safety climate evaluation.
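As a concrete illustration of the pipeline this abstract describes, the sketch below clusters synthetic survey responses with two candidate algorithms and then interprets cluster membership by fitting a surrogate classifier and computing permutation feature importance. It is a minimal sketch assuming scikit-learn, not the authors' released code (linked above); the data, algorithm choices, and item names are placeholders.

```python
# Minimal sketch (not the authors' code): cluster survey responses, then
# explain cluster membership via a surrogate classifier.
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(500, 8)))  # fake item scores

# Compare candidate clustering algorithms by silhouette score.
candidates = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "agglomerative": AgglomerativeClustering(n_clusters=3),
}
labels = {name: algo.fit_predict(X) for name, algo in candidates.items()}
for name, lab in labels.items():
    print(name, round(silhouette_score(X, lab), 3))

# Interpretation step: a surrogate classifier predicts cluster membership,
# and permutation importance ranks the survey items driving the split.
surrogate = RandomForestClassifier(random_state=0).fit(X, labels["kmeans"])
result = permutation_importance(surrogate, X, labels["kmeans"],
                                n_repeats=10, random_state=0)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"item_{i}: {result.importances_mean[i]:.3f}")
```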
Affiliation(s)
- Kailai Sun
- National University of Singapore, Singapore
- Yimin He
- University of Nebraska Omaha, United States
2
Fehr J, Citro B, Malpani R, Lippert C, Madai VI. A trustworthy AI reality-check: the lack of transparency of artificial intelligence products in healthcare. Front Digit Health 2024; 6:1267290. [PMID: 38455991] [PMCID: PMC10919164] [DOI: 10.3389/fdgth.2024.1267290]
Abstract
Trustworthy medical AI requires transparency about the development and testing of underlying algorithms to identify biases and communicate potential risks of harm. Abundant guidance exists on how to achieve transparency for medical AI products, but it is unclear whether publicly available information adequately informs about their risks. To assess this, we retrieved public documentation on the 14 available CE-certified AI-based radiology products of the IIb risk category in the EU from vendor websites, scientific publications, and the European EUDAMED database. Using a self-designed survey, we reported on their development, validation, ethical considerations, and deployment caveats, according to trustworthy AI guidelines. We scored each question with 0, 0.5, or 1 to rate whether the required information was "unavailable," "partially available," or "fully available." The transparency of each product was calculated relative to all 55 questions. Transparency scores ranged from 6.4% to 60.9%, with a median of 29.1%. Major transparency gaps included missing documentation on training data, ethical considerations, and limitations for deployment. Ethical aspects like consent, safety monitoring, and GDPR compliance were rarely documented. Furthermore, deployment caveats for different demographics and medical settings were scarce. In conclusion, the public documentation of authorized medical AI products in Europe lacks sufficient transparency to inform about safety and risks. We call on lawmakers and regulators to establish legally mandated requirements for public and substantive transparency to fulfill the promise of trustworthy AI for health.
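The arithmetic behind the reported scores is straightforward; a small sketch follows, with invented ratings (the 55 survey questions and their answers are in the paper, not reproduced here):

```python
# Each of the 55 questions is scored 0 ("unavailable"), 0.5 ("partially
# available") or 1 ("fully available"); a product's transparency score is
# the sum relative to all 55 questions. Ratings below are invented.
ratings = [1, 0.5, 0] * 18 + [1]
assert len(ratings) == 55
transparency = 100 * sum(ratings) / len(ratings)
print(f"transparency: {transparency:.1f}%")  # 50.9% for these made-up ratings
```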
Affiliation(s)
- Jana Fehr
- Digital Health & Machine Learning, Hasso Plattner Institute, Potsdam, Germany
- Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- QUEST Center for Responsible Research, Berlin Institute of Health (BIH), Charité Universitätsmedizin Berlin, Berlin, Germany
- Brian Citro
- Independent Researcher, Chicago, IL, United States
- Christoph Lippert
- Digital Health & Machine Learning, Hasso Plattner Institute, Potsdam, Germany
- Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Vince I. Madai
- QUEST Center for Responsible Research, Berlin Institute of Health (BIH), Charité Universitätsmedizin Berlin, Berlin, Germany
- Faculty of Computing, Engineering and the Built Environment, School of Computing and Digital Technology, Birmingham City University, Birmingham, United Kingdom
3
Kasun M, Ryan K, Paik J, Lane-McKinley K, Dunn LB, Roberts LW, Kim JP. Academic machine learning researchers' ethical perspectives on algorithm development for health care: a qualitative study. J Am Med Inform Assoc 2024; 31:563-573. [PMID: 38069455] [PMCID: PMC10873830] [DOI: 10.1093/jamia/ocad238]
Abstract
OBJECTIVES We set out to describe academic machine learning (ML) researchers' ethical considerations regarding the development of ML tools intended for use in clinical care. MATERIALS AND METHODS We conducted in-depth, semistructured interviews with a sample of ML researchers in medicine (N = 10) as part of a larger study investigating stakeholders' ethical considerations in the translation of ML tools in medicine. We used a qualitative descriptive design, applying conventional qualitative content analysis in order to allow participant perspectives to emerge directly from the data. RESULTS Every participant viewed their algorithm development work as holding ethical significance. While participants shared positive attitudes toward continued ML innovation, they described concerns related to data sampling and labeling (eg, limitations to mitigating bias; ensuring the validity and integrity of data), and algorithm training and testing (eg, selecting quantitative targets; assessing reproducibility). Participants perceived a need to increase interdisciplinary training across stakeholders and to envision more coordinated and embedded approaches to addressing ethics issues. DISCUSSION AND CONCLUSION Participants described key areas where increased support for ethics may be needed; technical challenges affecting clinical acceptability; and standards related to scientific integrity, beneficence, and justice that may be higher in medicine compared to other industries engaged in ML innovation. Our results help shed light on the perspectives of ML researchers in medicine regarding the range of ethical issues they encounter or anticipate in their work, including areas where more attention may be needed to support the successful development and integration of medical ML tools.
Affiliation(s)
- Max Kasun
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, United States
- Katie Ryan
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, United States
- Jodi Paik
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, United States
- Kyle Lane-McKinley
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, United States
- Laura Bodin Dunn
- Department of Psychiatry, University of Arkansas for Medical Sciences, Little Rock, AR 72205, United States
- Laura Weiss Roberts
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, United States
- Jane Paik Kim
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, United States
4
Graham SS, Shifflet S, Amjad M, Claborn K. An interpretable machine learning framework for opioid overdose surveillance from emergency medical services records. PLoS One 2024; 19:e0292170. [PMID: 38289927] [PMCID: PMC10826931] [DOI: 10.1371/journal.pone.0292170]
Abstract
The goal of this study is to develop and validate a lightweight, interpretable machine learning (ML) classifier to identify opioid overdoses in emergency medical services (EMS) records. We conducted a comparative assessment of three feature engineering approaches designed for use with unstructured narrative data. Opioid overdose annotations were provided by two harm reduction paramedics and two supporting annotators trained to reliably match expert annotations. Candidate feature engineering techniques included term frequency-inverse document frequency (TF-IDF), a highly performant concept vectorization approach, and a custom approach based on the count of empirically identified keywords. Each feature set was trained using four model architectures: generalized linear model (GLM), Naïve Bayes, neural network, and Extreme Gradient Boost (XGBoost). Ensembles of trained models were also evaluated. The custom feature models were additionally assessed for variable importance to aid interpretation. Models trained using TF-IDF feature engineering ranged from AUROC = 0.59 (95% CI: 0.53-0.66) for Naïve Bayes to AUROC = 0.76 (95% CI: 0.71-0.81) for the neural network. Models trained using concept vectorization features ranged from AUROC = 0.83 (95% CI: 0.78-0.88) for Naïve Bayes to AUROC = 0.89 (95% CI: 0.85-0.94) for the ensemble. Models trained using custom features were the most performant, with benchmarks ranging from AUROC = 0.92 (95% CI: 0.88-0.95) for the GLM to 0.93 (95% CI: 0.90-0.96) for the ensemble. The custom features model achieved positive predictive values (PPV) ranging from 80% to 100%, a substantial improvement over previously published EMS encounter opioid overdose classifiers. The application of this approach to county EMS data can productively inform local and targeted harm reduction initiatives.
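A minimal sketch of one arm of this comparison follows, assuming scikit-learn: TF-IDF features feeding a logistic-regression GLM, evaluated by AUROC. The narratives and labels are invented placeholders, and the concept-vectorization, keyword, and ensemble arms are omitted.

```python
# Sketch of the TF-IDF + GLM arm (placeholder data, not the study corpus).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

narratives = ["pt unresponsive, pinpoint pupils, narcan administered",
              "fall from standing, hip pain, denies loss of consciousness"] * 50
labels = [1, 0] * 50  # 1 = annotated opioid overdose

X_tr, X_te, y_tr, y_te = train_test_split(
    narratives, labels, test_size=0.3, random_state=0, stratify=labels)

# TF-IDF unigram/bigram features into a logistic-regression GLM.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
print("AUROC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```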
Affiliation(s)
- S. Scott Graham
- Department of Rhetoric & Writing, Center for Health Communication, University of Texas at Austin, Austin, TX, United States of America
- Addiction Research Institute, University of Texas at Austin, Austin, TX, United States of America
- Savannah Shifflet
- Addiction Research Institute, University of Texas at Austin, Austin, TX, United States of America
- Maaz Amjad
- Addiction Research Institute, University of Texas at Austin, Austin, TX, United States of America
- Kasey Claborn
- Addiction Research Institute, University of Texas at Austin, Austin, TX, United States of America
- Steve Hicks School of Social Work, University of Texas at Austin, Austin, TX, United States of America
5
Green BL, Murphy A, Robinson E. Accelerating health disparities research with artificial intelligence. Front Digit Health 2024; 6:1330160. [PMID: 38322109] [PMCID: PMC10844447] [DOI: 10.3389/fdgth.2024.1330160]
Affiliation(s)
- B. Lee Green
- Department of Health Outcomes and Behavior, Moffitt Cancer Center, Tampa, FL, United States
- Anastasia Murphy
- Department of Health Outcomes and Behavior, Moffitt Cancer Center, Tampa, FL, United States
- Edmondo Robinson
- Center for Digital Health, Moffitt Cancer Center, Tampa, FL, United States
6
Cen HS, Dandamudi S, Lei X, Weight C, Desai M, Gill I, Duddalwar V. Diversity in Renal Mass Data Cohorts: Implications for Urology AI Researchers. Oncology 2023. [PMID: 38104555] [PMCID: PMC11178677] [DOI: 10.1159/000535841]
Abstract
Objective We examine the heterogeneity and distribution of the cohort populations in two publicly used radiological image cohorts, the Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma (TCIA TCGA KIRC) collection and the 2019 MICCAI Kidney Tumor Segmentation Challenge (KiTS19), and their deviations from real-world renal cancer population data in the National Cancer Database (NCDB) Participant User Data File (PUF) and tertiary center data. PUF data are used as an anchor for prevalence rate bias assessment. Gene expression, and therefore the biology of RCC, differs by self-reported race, especially between African American and Caucasian populations. AI algorithms learn from datasets, but if a dataset misrepresents the population, the algorithm may reinforce that bias. Ignoring these demographic features may lead to inaccurate downstream effects, thereby limiting the translation of these analyses to clinical practice. Awareness of model training biases is vital to patient care decisions when using models in clinical settings. Method Data evaluated included gender, demographics, and reported pathologic grading and cancer staging. American Urological Association risk levels were used. Poisson regression was used to estimate population-based and sample-specific prevalence rates and corresponding 95% confidence intervals. SAS 9.4 was used for data analysis. Result Compared with PUF, KiTS19 and TCGA KIRC oversampled Caucasian patients by 9.5% (95% CI, -3.7% to 22.7%) and 15.1% (95% CI, 1.5% to 28.8%) and undersampled African American patients by 6.7% (95% CI, -10.0% to -3.3%) and 5.5% (95% CI, -9.3% to -1.8%), respectively. The tertiary cohort also undersampled African American patients, by 6.6% (95% CI, -8.7% to -4.6%), and largely undersampled aggressive cancers, by 14.7% (95% CI, -20.9% to -8.4%). No statistically significant difference in the rate of aggressive cancers was found among PUF, TCGA, and KiTS19; however, heterogeneities in risk are notable. Conclusion Heterogeneities between cohorts need to be considered in future AI training and cross-validation for renal masses.
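The prevalence-rate estimation named in the Methods can be sketched as follows, assuming statsmodels rather than SAS: a Poisson model of group counts with cohort size as exposure yields prevalence rates, rate ratios, and 95% CIs. The counts below are invented, not the paper's data.

```python
# Sketch of Poisson prevalence-rate estimation (invented counts).
import numpy as np
import statsmodels.api as sm

counts = np.array([120, 40])     # patients from one demographic group
totals = np.array([1000, 600])   # cohort sizes, used as exposure
cohort = np.array([0, 1])        # 0 = reference cohort, 1 = comparison

X = sm.add_constant(cohort)
fit = sm.GLM(counts, X, family=sm.families.Poisson(), exposure=totals).fit()

print("reference prevalence:", np.exp(fit.params[0]))
print("prevalence rate ratio:", np.exp(fit.params[1]))
print("95% CI for the ratio:", np.exp(fit.conf_int()[1]))
```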
Affiliation(s)
- Harmony Selena Cen
- Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Xiaomeng Lei
- Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Chris Weight
- Urologic Oncology, Cleveland Clinic, Cleveland, OH, USA
- Mihir Desai
- Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Inderbir Gill
- Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Vinay Duddalwar
- Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
7
Chin MH, Afsar-Manesh N, Bierman AS, Chang C, Colón-Rodríguez CJ, Dullabh P, Duran DG, Fair M, Hernandez-Boussard T, Hightower M, Jain A, Jordan WB, Konya S, Moore RH, Moore TT, Rodriguez R, Shaheen G, Snyder LP, Srinivasan M, Umscheid CA, Ohno-Machado L. Guiding Principles to Address the Impact of Algorithm Bias on Racial and Ethnic Disparities in Health and Health Care. JAMA Netw Open 2023; 6:e2345050. [PMID: 38100101] [PMCID: PMC11181958] [DOI: 10.1001/jamanetworkopen.2023.45050]
Abstract
Importance Health care algorithms are used for diagnosis, treatment, prognosis, risk stratification, and allocation of resources. Bias in the development and use of algorithms can lead to worse outcomes for racial and ethnic minoritized groups and other historically marginalized populations such as individuals with lower income. Objective To provide a conceptual framework and guiding principles for mitigating and preventing bias in health care algorithms to promote health and health care equity. Evidence Review The Agency for Healthcare Research and Quality and the National Institute for Minority Health and Health Disparities convened a diverse panel of experts to review evidence, hear from stakeholders, and receive community feedback. Findings The panel developed a conceptual framework to apply guiding principles across an algorithm's life cycle, centering health and health care equity for patients and communities as the goal, within the wider context of structural racism and discrimination. Multiple stakeholders can mitigate and prevent bias at each phase of the algorithm life cycle, including problem formulation (phase 1); data selection, assessment, and management (phase 2); algorithm development, training, and validation (phase 3); deployment and integration of algorithms in intended settings (phase 4); and algorithm monitoring, maintenance, updating, or deimplementation (phase 5). Five principles should guide these efforts: (1) promote health and health care equity during all phases of the health care algorithm life cycle; (2) ensure health care algorithms and their use are transparent and explainable; (3) authentically engage patients and communities during all phases of the health care algorithm life cycle and earn trustworthiness; (4) explicitly identify health care algorithmic fairness issues and trade-offs; and (5) establish accountability for equity and fairness in outcomes from health care algorithms. Conclusions and Relevance Multiple stakeholders must partner to create systems, processes, regulations, incentives, standards, and policies to mitigate and prevent algorithmic bias. Reforms should implement guiding principles that support promotion of health and health care equity in all phases of the algorithm life cycle as well as transparency and explainability, authentic community engagement and ethical partnerships, explicit identification of fairness issues and trade-offs, and accountability for equity and fairness.
Affiliation(s)
- Christine Chang
- Agency for Healthcare Research and Quality, Rockville, Maryland
- Malika Fair
- Association of American Medical Colleges, Washington, DC
- Anjali Jain
- Agency for Healthcare Research and Quality, Rockville, Maryland
- Stephen Konya
- Office of the National Coordinator for Health Information Technology, Washington, DC
- Roslyn Holliday Moore
- US Department of Health and Human Services Office of Minority Health, Rockville, Maryland
8
McCradden MD, Joshi S, Anderson JA, London AJ. A normative framework for artificial intelligence as a sociotechnical system in healthcare. Patterns (N Y) 2023; 4:100864. [PMID: 38035190] [PMCID: PMC10682751] [DOI: 10.1016/j.patter.2023.100864]
Abstract
Artificial intelligence (AI) tools are of great interest to healthcare organizations for their potential to improve patient care, yet their translation into clinical settings remains inconsistent. One of the reasons for this gap is that good technical performance does not inevitably result in patient benefit. We advocate for a conceptual shift wherein AI tools are seen as components of an intervention ensemble. The intervention ensemble describes the constellation of practices that, together, bring about benefit to patients or health systems. Shifting from a narrow focus on the tool itself toward the intervention ensemble prioritizes a "sociotechnical" vision for translation of AI that values all components of use that support beneficial patient outcomes. The intervention ensemble approach can be used for regulation, institutional oversight, and for AI adopters to responsibly and ethically appraise, evaluate, and use AI tools.
Affiliation(s)
- Melissa D. McCradden
- Department of Bioethics, The Hospital for Sick Children, Toronto, ON, Canada
- Genetics & Genome Biology Research Program, Peter Gilgan Center for Research & Learning, Toronto, ON, Canada
- Division of Clinical & Public Health, Dalla Lana School of Public Health, Toronto, ON, Canada
- Shalmali Joshi
- Department of Biomedical Informatics, Department of Computer Science (Affiliate), Data Science Institute, Columbia University, New York, NY, USA
- James A. Anderson
- Department of Bioethics, The Hospital for Sick Children, Toronto, ON, Canada
- Institute for Health Policy, Management, and Evaluation, University of Toronto, Toronto, ON, Canada
- Alex John London
- Department of Philosophy and Center for Ethics and Policy, Carnegie Mellon University, Pittsburgh, PA, USA
9
Li LT, Haley LC, Boyd AK, Bernstam EV. Technical/Algorithm, Stakeholder, and Society (TASS) barriers to the application of artificial intelligence in medicine: A systematic review. J Biomed Inform 2023; 147:104531. [PMID: 37884177] [DOI: 10.1016/j.jbi.2023.104531]
Abstract
INTRODUCTION The use of artificial intelligence (AI), particularly machine learning and predictive analytics, has shown great promise in health care. Despite its strong potential, there has been limited use in health care settings. In this systematic review, we aim to determine the main barriers to successful implementation of AI in healthcare and discuss potential ways to overcome these challenges. METHODS We conducted a literature search in PubMed (1/1/2001-1/1/2023). The search was restricted to publications in the English language and human study subjects. We excluded articles that did not discuss AI, machine learning, or predictive analytics and the barriers to the use of these techniques in health care. Using grounded theory methodology, we abstracted concepts to identify major barriers to AI use in medicine. RESULTS We identified a total of 2,382 articles. After reviewing the 306 included papers, we developed 19 major themes, which we categorized into three levels: the Technical/Algorithm, Stakeholder, and Society levels (TASS). These themes included: Lack of Explainability, Need for Validation Protocols, Need for Standards for Interoperability, Need for Reporting Guidelines, Need for Standardization of Performance Metrics, Lack of Plan for Updating Algorithm, Job Loss, Skills Loss, Workflow Challenges, Loss of Patient Autonomy and Consent, Disturbing the Patient-Clinician Relationship, Lack of Trust in AI, Logistical Challenges, Lack of Strategic Plan, Lack of Cost-effectiveness Analysis and Proof of Efficacy, Privacy, Liability, Bias and Social Justice, and Education. CONCLUSION We identified 19 major barriers to the use of AI in healthcare and categorized them into three levels: the Technical/Algorithm, Stakeholder, and Society levels (TASS). Future studies should expand on barriers in pediatric care and focus on developing clearly defined protocols to overcome these barriers.
Affiliation(s)
- Linda T Li
- Department of Surgery, Division of Pediatric Surgery, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, United States; McWilliams School of Biomedical Informatics at UT Health Houston, 7000 Fannin St, Suite 600, Houston, TX 77030, United States.
- Lauren C Haley
- McGovern Medical School at the University of Texas Health Science Center at Houston, 6431 Fannin St, Houston, TX 77030, United States.
- Alexandra K Boyd
- McGovern Medical School at the University of Texas Health Science Center at Houston, 6431 Fannin St, Houston, TX 77030, United States.
- Elmer V Bernstam
- McWilliams School of Biomedical Informatics at UT Health Houston, 7000 Fannin St, Suite 600, Houston, TX 77030, United States; McGovern Medical School at the University of Texas Health Science Center at Houston, 6431 Fannin St, Houston, TX 77030, United States.
10
Wang Y, Song Y, Ma Z, Han X. Multidisciplinary considerations of fairness in medical AI: A scoping review. Int J Med Inform 2023; 178:105175. [PMID: 37595374] [DOI: 10.1016/j.ijmedinf.2023.105175]
Abstract
INTRODUCTION Artificial intelligence (AI) technology has developed significantly in recent years. The fairness of medical AI is of great concern because of its direct relation to human life and health. This review analyzes the existing research literature on fairness in medical AI from the perspectives of computer science, medical science, and social science (including law and ethics). Its objective is to examine similarities and differences in the understanding of fairness, explore influencing factors, and investigate potential measures for implementing fairness in medical AI across English and Chinese literature. METHODS This study employed a scoping review methodology and searched the following databases: Web of Science, MEDLINE, PubMed, Ovid, CNKI, WANFANG Data, and others, for fairness issues in medical AI through February 2023. The search used keywords such as "artificial intelligence," "machine learning," "medical," "algorithm," "fairness," "decision-making," and "bias." The collected data were charted, synthesized, and subjected to descriptive and thematic analysis. RESULTS After reviewing 468 English papers and 356 Chinese papers, 53 and 42, respectively, were included in the final analysis. Our results show that the three disciplines differ significantly on the core issues. Beyond algorithmic bias and human bias, data is the foundation that affects fairness in medical AI. Legal, ethical, and technological measures all promote the implementation of fairness in medical AI. CONCLUSIONS Our review indicates a consensus across disciplines on the importance of data fairness as the foundation for achieving fairness in medical AI. However, there are substantial discrepancies in core aspects such as the concept of fairness, its influencing factors, and implementation measures. Consequently, future research should facilitate interdisciplinary discussion to bridge the cognitive gaps between fields and enhance the practical implementation of fairness in medical AI.
Affiliation(s)
- Yue Wang
- School of Law, Xi'an Jiaotong University, No.28, Xianning West Road, Xi'an, Shaanxi, 710049, PR China.
- Yaxin Song
- School of Law, Xi'an Jiaotong University, No.28, Xianning West Road, Xi'an, Shaanxi, 710049, PR China.
- Zhuo Ma
- School of Law, Xi'an Jiaotong University, No.28, Xianning West Road, Xi'an, Shaanxi, 710049, PR China.
- Xiaoxue Han
- Xi'an Jiaotong University Library, No.28, Xianning West Road, Xi'an, Shaanxi, 710049, PR China.
11
Teeple S, Chivers C, Linn KA, Halpern SD, Eneanya N, Draugelis M, Courtright K. Evaluating equity in performance of an electronic health record-based 6-month mortality risk model to trigger palliative care consultation: a retrospective model validation analysis. BMJ Qual Saf 2023; 32:503-516. [PMID: 37001995] [PMCID: PMC10898860] [DOI: 10.1136/bmjqs-2022-015173]
Abstract
OBJECTIVE Evaluate predictive performance of an electronic health record (EHR)-based, inpatient 6-month mortality risk model developed to trigger palliative care consultation among patient groups stratified by age, race, ethnicity, insurance and socioeconomic status (SES), which may vary due to social forces (eg, racism) that shape health, healthcare and health data. DESIGN Retrospective evaluation of prediction model. SETTING Three urban hospitals within a single health system. PARTICIPANTS All patients ≥18 years admitted between 1 January and 31 December 2017, excluding observation, obstetric, rehabilitation and hospice (n=58 464 encounters, 41 327 patients). MAIN OUTCOME MEASURES General performance metrics (c-statistic, integrated calibration index (ICI), Brier Score) and additional measures relevant to health equity (accuracy, false positive rate (FPR), false negative rate (FNR)). RESULTS For black versus non-Hispanic white patients, the model's accuracy was higher (0.051, 95% CI 0.044 to 0.059), FPR lower (-0.060, 95% CI -0.067 to -0.052) and FNR higher (0.049, 95% CI 0.023 to 0.078). A similar pattern was observed among patients who were Hispanic, younger, with Medicaid/missing insurance, or living in low SES zip codes. No consistent differences emerged in c-statistic, ICI or Brier Score. Younger age had the second-largest effect size in the mortality prediction model, and there were large standardised group differences in age (eg, 0.32 for non-Hispanic white versus black patients), suggesting age may contribute to systematic differences in the predicted probabilities between groups. CONCLUSIONS An EHR-based mortality risk model was less likely to identify some marginalised patients as potentially benefiting from palliative care, with younger age pinpointed as a possible mechanism. Evaluating predictive performance is a critical preliminary step in addressing algorithmic inequities in healthcare, which must also include evaluating clinical impact, and governance and regulatory structures for oversight, monitoring and accountability.
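The group-stratified error analysis described here can be reproduced in outline with a few lines of scikit-learn; the sketch below computes accuracy, FPR, and FNR within each group on invented data (the paper additionally reports c-statistics, calibration, Brier scores, and confidence intervals):

```python
# Sketch of a per-group error audit (invented labels/predictions/groups).
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 1000)      # 6-month mortality outcome
y_pred = rng.integers(0, 2, 1000)      # binarized model trigger
group = rng.choice(["A", "B"], 1000)   # e.g. race or insurance stratum

for g in np.unique(group):
    m = group == g
    tn, fp, fn, tp = confusion_matrix(y_true[m], y_pred[m]).ravel()
    acc = (tp + tn) / m.sum()
    fpr = fp / (fp + tn)  # flagged for consultation but survived
    fnr = fn / (fn + tp)  # potential palliative-care need missed
    print(f"group {g}: accuracy={acc:.3f}, FPR={fpr:.3f}, FNR={fnr:.3f}")
```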
Affiliation(s)
- Stephanie Teeple
- Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Palliative and Advanced Illness Research (PAIR) Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Kristin A Linn
- Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Scott D Halpern
- Palliative and Advanced Illness Research (PAIR) Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Nwamaka Eneanya
- Palliative and Advanced Illness Research (PAIR) Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Katherine Courtright
- Palliative and Advanced Illness Research (PAIR) Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
12
Walsh G, Stogiannos N, van de Venter R, Rainey C, Tam W, McFadden S, McNulty JP, Mekis N, Lewis S, O'Regan T, Kumar A, Huisman M, Bisdas S, Kotter E, Pinto dos Santos D, Sá dos Reis C, van Ooijen P, Brady AP, Malamateniou C. Responsible AI practice and AI education are central to AI implementation: a rapid review for all medical imaging professionals in Europe. BJR Open 2023; 5:20230033. [PMID: 37953871] [PMCID: PMC10636340] [DOI: 10.1259/bjro.20230033]
Abstract
Artificial intelligence (AI) has transitioned from the lab to the bedside, and it is increasingly being used in healthcare. Radiology and Radiography are on the frontline of AI implementation because of the use of big data for medical imaging and diagnosis across different patient groups. Safe and effective AI implementation requires that responsible and ethical practices are upheld by all key stakeholders, that there is harmonious collaboration between different professional groups, and that customised educational provisions are made for all involved. This paper outlines key principles of ethical and responsible AI, highlights recent educational initiatives for clinical practitioners and discusses the synergies between all medical imaging professionals as they prepare for the digital future in Europe. Responsible and ethical AI is vital to enhance a culture of safety and trust for healthcare professionals and patients alike. Educational and training provisions for medical imaging professionals on AI are central to the understanding of basic AI principles and applications, and many offerings currently exist in Europe. Education can facilitate the transparency of AI tools, but more formalised, university-led training is needed to ensure academic scrutiny, appropriate pedagogy, multidisciplinarity and customisation to learners' unique needs. As radiographers and radiologists work together and with other professionals to understand and harness the benefits of AI in medical imaging, it becomes clear that they are faced with the same challenges and that they have the same needs. The digital future belongs to multidisciplinary teams that work seamlessly together, learn together, manage risk collectively and collaborate for the benefit of the patients they serve.
Affiliation(s)
- Gemma Walsh
- Division of Midwifery & Radiography, City University of London, London, United Kingdom
- Clare Rainey
- School of Health Sciences, Ulster University, Derry~Londonderry, Northern Ireland
- Winnie Tam
- Division of Midwifery & Radiography, City University of London, London, United Kingdom
- Sonyia McFadden
- School of Health Sciences, Ulster University, Coleraine, United Kingdom
- Nejc Mekis
- Medical Imaging and Radiotherapy Department, University of Ljubljana, Faculty of Health Sciences, Ljubljana, Slovenia
- Sarah Lewis
- Discipline of Medical Imaging Science, Sydney School of Health Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, Australia
- Tracy O'Regan
- The Society and College of Radiographers, London, United Kingdom
- Amrita Kumar
- Frimley Health NHS Foundation Trust, Frimley, United Kingdom
- Merel Huisman
- Department of Radiology, University Medical Center Utrecht, Utrecht, Netherlands
- Cláudia Sá dos Reis
- School of Health Sciences (HESAV), University of Applied Sciences and Arts Western Switzerland (HES-SO), Lausanne, Switzerland
13
Vorisek CN, Stellmach C, Mayer PJ, Klopfenstein SAI, Bures DM, Diehl A, Henningsen M, Ritter K, Thun S. Artificial Intelligence Bias in Health Care: Web-Based Survey. J Med Internet Res 2023; 25:e41089. [PMID: 37347528] [PMCID: PMC10337406] [DOI: 10.2196/41089]
Abstract
BACKGROUND Resources are increasingly spent on artificial intelligence (AI) solutions for medical applications aiming to improve diagnosis, treatment, and prevention of diseases. While the need for transparency and reduction of bias in data and algorithm development has been addressed in past studies, little is known about the knowledge and perception of bias among AI developers. OBJECTIVE This study's objective was to survey AI specialists in health care to investigate developers' perceptions of bias in AI algorithms for health care applications and their awareness and use of preventative measures. METHODS A web-based survey was provided in both German and English, comprising a maximum of 41 questions using branching logic within the REDCap web application. Only the results of participants with experience in the field of medical AI applications and complete questionnaires were included for analysis. Demographic data, technical expertise, and perceptions of fairness, as well as knowledge of biases in AI, were analyzed, and variations by gender, age, and work environment were assessed. RESULTS A total of 151 AI specialists completed the web-based survey. The median age was 30 (IQR 26-39) years, and 67% (101/151) of respondents were male. About one-third rated their AI development projects as fair (47/151, 31%) or moderately fair (51/151, 34%), 12% (18/151) reported their AI to be barely fair, and 1% (2/151) not fair at all. One participant identifying as diverse rated AI developments as barely fair, and the 2 participants of undefined gender rated AI developments as barely fair and moderately fair, respectively. Reasons for biases selected by respondents were lack of fair data (90/132, 68%), guidelines or recommendations (65/132, 49%), or knowledge (60/132, 45%). Half of the respondents worked with image data (83/151, 55%) from 1 center only (76/151, 50%), and 35% (53/151) worked with national data exclusively. CONCLUSIONS This study shows that developers' overall perception of fairness in their AI projects is moderate. Gender minorities did not once rate their AI development as fair or very fair. Therefore, further studies need to focus on minorities and women and their perceptions of AI. The results highlight the need to strengthen knowledge about bias in AI and provide guidelines on preventing biases in AI health care applications.
Affiliation(s)
- Carina Nina Vorisek
- Core Facility Digital Medicine and Interoperability, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Caroline Stellmach
- Core Facility Digital Medicine and Interoperability, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Paula Josephine Mayer
- Core Facility Digital Medicine and Interoperability, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Sophie Anne Ines Klopfenstein
- Core Facility Digital Medicine and Interoperability, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Institute for Medical Informatics, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Anke Diehl
- Stabsstelle Digitale Transformation, Universitätsmedizin Essen, Essen, Germany
- Maike Henningsen
- Faculty of Health, University of Witten/Herdecke, Witten, Germany
- Kerstin Ritter
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Sylvia Thun
- Core Facility Digital Medicine and Interoperability, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
14
de Hond AAH, Kant IMJ, Fornasa M, Cinà G, Elbers PWG, Thoral PJ, Sesmu Arbous M, Steyerberg EW. Predicting Readmission or Death After Discharge From the ICU: External Validation and Retraining of a Machine Learning Model. Crit Care Med 2023; 51:291-300. [PMID: 36524820] [PMCID: PMC9848213] [DOI: 10.1097/ccm.0000000000005758]
Abstract
OBJECTIVES Many machine learning (ML) models have been developed for application in the ICU, but few models have been subjected to external validation. The performance of these models in new settings therefore remains unknown. The objective of this study was to assess the performance of an existing decision support tool based on a ML model predicting readmission or death within 7 days after ICU discharge before, during, and after retraining and recalibration. DESIGN A gradient boosted ML model was developed and validated on electronic health record data from 2004 to 2021. We performed an independent validation of this model on electronic health record data from 2011 to 2019 from a different tertiary care center. SETTING Two ICUs in tertiary care centers in The Netherlands. PATIENTS Adult patients who were admitted to the ICU and stayed for longer than 12 hours. INTERVENTIONS None. MEASUREMENTS AND MAIN RESULTS We assessed discrimination by area under the receiver operating characteristic curve (AUC) and calibration (slope and intercept). We retrained and recalibrated the original model and assessed performance via a temporal validation design. The final retrained model was cross-validated on all data from the new site. Readmission or death within 7 days after ICU discharge occurred in 577 of 10,052 ICU admissions (5.7%) at the new site. External validation revealed moderate discrimination with an AUC of 0.72 (95% CI 0.67-0.76). Retrained models showed improved discrimination with AUC 0.79 (95% CI 0.75-0.82) for the final validation model. Calibration was poor initially and good after recalibration via isotonic regression. CONCLUSIONS In this era of expanding availability of ML models, external validation and retraining are key steps to consider before applying ML models to new settings. Clinicians and decision-makers should take this into account when considering applying new ML models to their local settings.
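A compact sketch of the validation-and-recalibration steps named above follows, assuming scikit-learn and simulated predictions: AUC for discrimination, a logistic fit on the logit of the predicted risks for the calibration intercept and slope (one standard approach), and isotonic regression for recalibration. Nothing here reproduces the study's model or data.

```python
# Sketch: external validation metrics, then isotonic recalibration
# (simulated, deliberately miscalibrated predictions).
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
p_model = rng.uniform(0.01, 0.60, 2000)   # risks from the "old" model
y = rng.binomial(1, 0.6 * p_model)        # true risk is lower: miscalibration

print("AUC:", round(roc_auc_score(y, p_model), 3))

# Calibration intercept/slope: regress the outcome on logit(predicted risk).
logit = np.log(p_model / (1 - p_model)).reshape(-1, 1)
cal = LogisticRegression(C=1e6).fit(logit, y)   # near-unpenalized fit
print("intercept:", round(cal.intercept_[0], 2),
      "slope:", round(cal.coef_[0, 0], 2))

# Recalibrate with isotonic regression, as named in the abstract.
iso = IsotonicRegression(out_of_bounds="clip").fit(p_model, y)
p_recal = iso.predict(p_model)
```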
Affiliation(s)
- Anne A H de Hond
- Department of Information Technology and Digital Innovation, Leiden University Medical Centre, Leiden, The Netherlands
- Department of Biomedical Informatics, Stanford Medicine, Stanford, CA
- Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, The Netherlands
- Ilse M J Kant
- Department of Information Technology and Digital Innovation, Leiden University Medical Centre, Leiden, The Netherlands
- Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, The Netherlands
- Giovanni Cinà
- Pacmed, Stadhouderskade 55, Amsterdam, The Netherlands
- Institute of Logic, Language and Computation, University of Amsterdam, Amsterdam, The Netherlands
- Paul W G Elbers
- Department of Intensive Care Medicine, Laboratory for Critical Care Computational Intelligence, Amsterdam UMC, Amsterdam, The Netherlands
- Patrick J Thoral
- Department of Intensive Care Medicine, Laboratory for Critical Care Computational Intelligence, Amsterdam UMC, Amsterdam, The Netherlands
- M Sesmu Arbous
- Department of Intensive Care Medicine, Leiden University Medical Centre, Leiden, The Netherlands
- Ewout W Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, The Netherlands
15
Rubeis G, Fang ML, Sixsmith A. Equity in AgeTech for Ageing Well in Technology-Driven Places: The Role of Social Determinants in Designing AI-based Assistive Technologies. Science and Engineering Ethics 2022; 28:49. [PMID: 36301408] [PMCID: PMC9613787] [DOI: 10.1007/s11948-022-00397-y]
Abstract
AgeTech involves the use of emerging technologies to support the health, well-being and independent living of older adults. In this paper we focus on how AgeTech based on artificial intelligence (AI) may better support older adults to remain in their own living environment for longer, provide social connectedness, support wellbeing and mental health, and enable social participation. In order to assess and better understand the positive as well as negative outcomes of AI-based AgeTech, a critical analysis of ethical design, digital equity, and policy pathways is required. A crucial question is how AI-based AgeTech may drive practical, equitable, and inclusive multilevel solutions to support healthy, active ageing. In our paper, we aim to show that a focus on equity is key if AI-based AgeTech is to realize its full potential. We propose that equity should not just be an extra benefit or minimum requirement but the explicit aim of designing AI-based health tech. This means that social determinants affecting the use of or access to these technologies have to be addressed. We explore how complexity management, as a crucial element of AI-based AgeTech, may create or exacerbate social inequities by marginalising or ignoring social determinants. We identify bias, standardization, and access as the main ethical issues in this context and subsequently make recommendations as to how inequities that stem from AI-based AgeTech can be addressed.
Affiliation(s)
- Giovanni Rubeis
- Department General Health Studies, Division Biomedical and Public Health Ethics, Karl Landsteiner University of Health Sciences, Dr.-Karl-Dorrek-Straße 30, 3500 Krems, Austria
- Mei Lan Fang
- Department General Health Studies, Division Biomedical and Public Health Ethics, Karl Landsteiner University of Health Sciences, Dr.-Karl-Dorrek-Straße 30, 3500 Krems, Austria
- School of Health Sciences, University of Dundee, Nethergate, Dundee DD1 4HN, Scotland, UK
- Andrew Sixsmith
- Department General Health Studies, Division Biomedical and Public Health Ethics, Karl Landsteiner University of Health Sciences, Dr.-Karl-Dorrek-Straße 30, 3500 Krems, Austria
- Department of Gerontology, Simon Fraser University, Harbour Centre, Room 2800, Vancouver, BC V6B 5K3, Canada
16
Chavez-Yenter D, Goodman MS, Chen Y, Chu X, Bradshaw RL, Lorenz Chambers R, Chan PA, Daly BM, Flynn M, Gammon A, Hess R, Kessler C, Kohlmann WK, Mann DM, Monahan R, Peel S, Kawamoto K, Del Fiol G, Sigireddi M, Buys SS, Ginsburg O, Kaphingst KA. Association of Disparities in Family History and Family Cancer History in the Electronic Health Record With Sex, Race, Hispanic or Latino Ethnicity, and Language Preference in 2 Large US Health Care Systems. JAMA Netw Open 2022; 5:e2234574. [PMID: 36194411] [PMCID: PMC9533178] [DOI: 10.1001/jamanetworkopen.2022.34574]
Abstract
IMPORTANCE Clinical decision support (CDS) algorithms are increasingly being implemented in health care systems to identify patients for specialty care. However, systematic differences in missingness of electronic health record (EHR) data may lead to disparities in identification by CDS algorithms. OBJECTIVE To examine the availability and comprehensiveness of cancer family history information (FHI) in patients' EHRs by sex, race, Hispanic or Latino ethnicity, and language preference in 2 large health care systems in 2021. DESIGN, SETTING, AND PARTICIPANTS This retrospective EHR quality improvement study used EHR data from 2 health care systems: University of Utah Health (UHealth) and NYU Langone Health (NYULH). Participants included patients aged 25 to 60 years who had a primary care appointment in the previous 3 years. Data were collected or abstracted from the EHR from December 10, 2020, to October 31, 2021, and analyzed from June 15 to October 31, 2021. EXPOSURES Prior collection of cancer FHI in primary care settings. MAIN OUTCOMES AND MEASURES Availability was defined as having any FHI and any cancer FHI in the EHR and was examined at the patient level. Comprehensiveness was defined as whether a cancer family history observation in the EHR specified the type of cancer diagnosed in a family member, the relationship of the family member to the patient, and the age at onset for the family member and was examined at the observation level. RESULTS Among 144 484 patients in the UHealth system, 53.6% were women; 74.4% were non-Hispanic or non-Latino and 67.6% were White; and 83.0% had an English language preference. Among 377 621 patients in the NYULH system, 55.3% were women; 63.2% were non-Hispanic or non-Latino and 55.3% were White; and 89.9% had an English language preference. Patients from historically medically underserved groups-specifically, Black vs White patients (UHealth: 17.3% [95% CI, 16.1%-18.6%] vs 42.8% [95% CI, 42.5%-43.1%]; NYULH: 24.4% [95% CI, 24.0%-24.8%] vs 33.8% [95% CI, 33.6%-34.0%]), Hispanic or Latino vs non-Hispanic or non-Latino patients (UHealth: 27.2% [95% CI, 26.5%-27.8%] vs 40.2% [95% CI, 39.9%-40.5%]; NYULH: 24.4% [95% CI, 24.1%-24.7%] vs 31.6% [95% CI, 31.4%-31.8%]), Spanish-speaking vs English-speaking patients (UHealth: 18.4% [95% CI, 17.2%-19.1%] vs 40.0% [95% CI, 39.7%-40.3%]; NYULH: 15.1% [95% CI, 14.6%-15.6%] vs 31.1% [95% CI, 30.9%-31.2%]), and men vs women (UHealth: 30.8% [95% CI, 30.4%-31.2%] vs 43.0% [95% CI, 42.6%-43.3%]; NYULH: 23.1% [95% CI, 22.9%-23.3%] vs 34.9% [95% CI, 34.7%-35.1%])-had significantly lower availability and comprehensiveness of cancer FHI (P < .001). CONCLUSIONS AND RELEVANCE These findings suggest that systematic differences in the availability and comprehensiveness of FHI in the EHR may introduce informative presence bias as inputs to CDS algorithms. The observed differences may also exacerbate disparities for medically underserved groups. System-, clinician-, and patient-level efforts are needed to improve the collection of FHI.
Affiliation(s)
- Daniel Chavez-Yenter
- Huntsman Cancer Institute, University of Utah, Salt Lake City
- Department of Communication, University of Utah, Salt Lake City
- Melody S. Goodman
- School of Global Public Health, New York University, New York, New York
- Yuyu Chen
- School of Global Public Health, New York University, New York, New York
- Xiangying Chu
- School of Global Public Health, New York University, New York, New York
- Richard L. Bradshaw
- Department of Biomedical Informatics, University of Utah, Salt Lake City
- School of Medicine, University of Utah Health, Salt Lake City, Utah
- Brianne M. Daly
- Huntsman Cancer Institute, University of Utah, Salt Lake City
- Michael Flynn
- School of Medicine, University of Utah Health, Salt Lake City, Utah
- Amanda Gammon
- Huntsman Cancer Institute, University of Utah, Salt Lake City
- Rachel Hess
- Department of Population Health Sciences, University of Utah, Salt Lake City
- Department of Internal Medicine, University of Utah, Salt Lake City
- Cecelia Kessler
- Huntsman Cancer Institute, University of Utah, Salt Lake City
- Devin M. Mann
- Department of Population Health, New York University Grossman School of Medicine, New York University, New York, New York
- Rachel Monahan
- Perlmutter Cancer Center, NYU Langone Health, New York, New York
- Department of Population Health, New York University Grossman School of Medicine, New York University, New York, New York
- Sara Peel
- Huntsman Cancer Institute, University of Utah, Salt Lake City
- Kensaku Kawamoto
- Department of Biomedical Informatics, University of Utah, Salt Lake City
- Guilherme Del Fiol
- Department of Biomedical Informatics, University of Utah, Salt Lake City
- Saundra S. Buys
- Huntsman Cancer Institute, University of Utah, Salt Lake City
- Department of Internal Medicine, University of Utah, Salt Lake City
- Ophira Ginsburg
- Center for Global Health, National Cancer Institute, Rockville, Maryland
- Kimberly A. Kaphingst
- Huntsman Cancer Institute, University of Utah, Salt Lake City
- Department of Communication, University of Utah, Salt Lake City
17
Albert K, Delano M. Sex trouble: Sex/gender slippage, sex confusion, and sex obsession in machine learning using electronic health records. Patterns (N Y) 2022; 3:100534. [PMID: 36033589] [PMCID: PMC9403398] [DOI: 10.1016/j.patter.2022.100534]
Abstract
False assumptions that sex and gender are binary, static, and concordant are deeply embedded in the medical system. As machine learning researchers use medical data to build tools to solve novel problems, understanding how existing systems represent sex/gender incorrectly is necessary to avoid perpetuating harm. In this perspective, we identify and discuss three factors to consider when working with sex/gender in research: "sex/gender slippage," the frequent substitution of sex and sex-related terms for gender and vice versa; "sex confusion," the fact that any given sex variable holds many different potential meanings; and "sex obsession," the idea that the relevant variable for most inquiries related to sex/gender is sex assigned at birth. We then explore how these phenomena show up in medical machine learning research using electronic health records, with a specific focus on HIV risk prediction. Finally, we offer recommendations about how machine learning researchers can engage more carefully with questions of sex/gender.
Affiliation(s)
- Kendra Albert
- Cyberlaw Clinic, Harvard Law School, Cambridge, MA 02138, USA
- Maggie Delano
- Engineering Department, Swarthmore College, Swarthmore, PA 19146, USA
18
Čartolovni A, Tomičić A, Lazić Mosler E. Ethical, legal, and social considerations of AI-based medical decision-support tools: A scoping review. Int J Med Inform 2022; 161:104738. [PMID: 35299098] [DOI: 10.1016/j.ijmedinf.2022.104738]
Abstract
INTRODUCTION Recent developments in the field of Artificial Intelligence (AI) applied to healthcare promise to solve many of the existing global issues in advancing human health and managing global health challenges. This comprehensive review aims to surface not only the underlying ethical and legal implications but also the social implications (ELSI) that have been overlooked in recent reviews, while deserving equal attention in the development stage and certainly ahead of implementation in healthcare. It is intended to guide various stakeholders (e.g., designers, engineers, clinicians) in addressing the ELSI of AI at the design stage using the Ethics by Design (EbD) approach. METHODS The authors followed a systematised scoping methodology and searched the following databases: PubMed, Web of Science, Ovid, Scopus, IEEE Xplore, EBSCO Search (Academic Search Premier, CINAHL, PSYCINFO, APA PsycArticles, ERIC) for the ELSI of AI in healthcare through January 2021. Data were charted and synthesised, and the authors conducted a descriptive and thematic analysis of the collected data. RESULTS After reviewing 1108 papers, 94 were included in the final analysis. Our results show a growing interest in the academic community in the ELSI of AI. The main issues of concern identified in our analysis fall into four main clusters of impact: AI algorithms, physicians, patients, and healthcare in general. The most prevalent issues are patient safety, algorithmic transparency, lack of proper regulation, liability and accountability, impact on the patient-physician relationship, and governance of AI-empowered healthcare. CONCLUSIONS The results of our review confirm the potential of AI to significantly improve patient care, but the drawbacks to its implementation relate to complex ELSI that have yet to be addressed. Most ELSI refer to the impact on, and extension of, the reciprocal and fiduciary patient-physician relationship. With the integration of AI-based decision-making tools, a bilateral patient-physician relationship may shift into a trilateral one.
Collapse
Affiliation(s)
- Anto Čartolovni
- Digital Healthcare Ethics Laboratory (Digit-HeaL), Catholic University of Croatia, Ilica 242, 10 000 Zagreb, Croatia; School of Medicine, Catholic University of Croatia, Ilica 242, 10 000 Zagreb, Croatia.
| | - Ana Tomičić
- Digital Healthcare Ethics Laboratory (Digit-HeaL), Catholic University of Croatia, Ilica 242, 10 000 Zagreb, Croatia.
| | - Elvira Lazić Mosler
- School of Medicine, Catholic University of Croatia, Ilica 242, 10 000 Zagreb, Croatia; General Hospital Dr. Ivo Pedišić, Sisak, Croatia.
| |
Collapse
|
19
|
SHIFTing artificial intelligence to be responsible in healthcare: A systematic review. Soc Sci Med 2022; 296:114782. [DOI: 10.1016/j.socscimed.2022.114782] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2021] [Revised: 02/02/2022] [Accepted: 02/03/2022] [Indexed: 12/12/2022]
|
20
|
Leo CG, Tumolo MR, Sabina S, Colella R, Recchia V, Ponzini G, Fotiadis DI, Bodini A, Mincarone P. Health Technology Assessment for In Silico Medicine: Social, Ethical and Legal Aspects. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:ijerph19031510. [PMID: 35162529 PMCID: PMC8835251 DOI: 10.3390/ijerph19031510] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 12/23/2021] [Revised: 01/25/2022] [Accepted: 01/26/2022] [Indexed: 12/28/2022]
Abstract
The application of in silico medicine is constantly growing in the prevention, diagnosis, and treatment of diseases. These technologies support medical decisions and self-management, and can reduce, refine, and partially replace real-world studies of medical technologies. In silico medicine may challenge some key principles: transparency and fairness of data usage; data privacy and protection across platforms and systems; data availability and quality; data integration and interoperability; intellectual property; data sharing; and equal accessibility for persons and populations. Several social, ethical, and legal issues may consequently arise from its adoption. In this work, we provide an overview of these issues along with practical suggestions for their assessment from a health technology assessment perspective. We performed a narrative review with a search of MEDLINE/PubMed, ISI Web of Knowledge, Scopus, and Google Scholar. The following key aspects emerge as general reflections with an impact at the operational level: cultural resistance, the level of expertise of users, the degree of patient involvement, infrastructural requirements, risks to health, respect for patients’ rights, potential discrimination in access to and use of the technology, and the intellectual property of innovations. Our analysis shows that several challenges still need to be debated for in silico medicine to realise its full potential in healthcare processes.
Collapse
Affiliation(s)
- Carlo Giacomo Leo
- Institute of Clinical Physiology, National Research Council, 73100 Lecce, Italy; (C.G.L.); (M.R.T.); (V.R.)
| | - Maria Rosaria Tumolo
- Institute of Clinical Physiology, National Research Council, 73100 Lecce, Italy; (C.G.L.); (M.R.T.); (V.R.)
- Department of Biological and Environmental Sciences and Technology, University of Salento, 73100 Lecce, Italy;
| | - Saverio Sabina
- Institute of Clinical Physiology, National Research Council, 73100 Lecce, Italy; (C.G.L.); (M.R.T.); (V.R.)
- Correspondence:
| | - Riccardo Colella
- Department of Biological and Environmental Sciences and Technology, University of Salento, 73100 Lecce, Italy;
| | - Virginia Recchia
- Institute of Clinical Physiology, National Research Council, 73100 Lecce, Italy; (C.G.L.); (M.R.T.); (V.R.)
| | - Giuseppe Ponzini
- Institute for Research on Population and Social Policies, National Research Council, 72100 Brindisi, Italy; (G.P.); (P.M.)
| | - Dimitrios Ioannis Fotiadis
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, 45110 Ioannina, Greece;
- Department of Biomedical Research, Institute of Molecular Biology and Biotechnology—Foundation for Research and Technology Hellas (IMBB-FORTH), 45115 Ioannina, Greece
| | - Antonella Bodini
- Institute for Applied Mathematics and Information Technologies “E. Magenes”, National Research Council, 20133 Milan, Italy;
| | - Pierpaolo Mincarone
- Institute for Research on Population and Social Policies, National Research Council, 72100 Brindisi, Italy; (G.P.); (P.M.)
| |
Collapse
|
21
|
de Hond AAH, Leeuwenberg AM, Hooft L, Kant IMJ, Nijman SWJ, van Os HJA, Aardoom JJ, Debray TPA, Schuit E, van Smeden M, Reitsma JB, Steyerberg EW, Chavannes NH, Moons KGM. Guidelines and quality criteria for artificial intelligence-based prediction models in healthcare: a scoping review. NPJ Digit Med 2022; 5:2. [PMID: 35013569 PMCID: PMC8748878 DOI: 10.1038/s41746-021-00549-7] [Citation(s) in RCA: 105] [Impact Index Per Article: 52.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2021] [Accepted: 12/13/2021] [Indexed: 12/23/2022] Open
Abstract
While the opportunities of machine learning (ML) and artificial intelligence (AI) in healthcare are promising, complex data-driven prediction models require careful quality and applicability assessment before they are applied and disseminated in daily practice. This scoping review aimed to identify actionable guidance for those closely involved in AI-based prediction model (AIPM) development, evaluation, and implementation, including software engineers, data scientists, and healthcare professionals, and to identify potential gaps in this guidance. We performed a scoping review of the relevant literature providing guidance or quality criteria regarding the development, evaluation, and implementation of AIPMs using a comprehensive multi-stage screening strategy. PubMed, Web of Science, and the ACM Digital Library were searched, and AI experts were consulted. Topics were extracted from the identified literature and summarized across the six phases at the core of this review: (1) data preparation, (2) AIPM development, (3) AIPM validation, (4) software development, (5) AIPM impact assessment, and (6) AIPM implementation into daily healthcare practice. From 2683 unique hits, 72 relevant guidance documents were identified. Substantial guidance was found for data preparation, AIPM development, and AIPM validation (phases 1-3), while the later phases (software development, impact assessment, and implementation) have clearly received less attention in the scientific literature. The six phases of the AIPM development, evaluation, and implementation cycle provide a framework for the responsible introduction of AI-based prediction models in healthcare. Additional domain- and technology-specific research may be necessary, and more practical experience with implementing AIPMs is needed to support further guidance.
Collapse
Affiliation(s)
- Anne A H de Hond
- Department of Information Technology and Digital Innovation, Leiden University Medical Center, Leiden, The Netherlands.
- Clinical AI Implementation and Research Lab, Leiden University Medical Center, Leiden, The Netherlands.
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands.
| | - Artuur M Leeuwenberg
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands.
| | - Lotty Hooft
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Ilse M J Kant
- Department of Information Technology and Digital Innovation, Leiden University Medical Center, Leiden, The Netherlands
- Clinical AI Implementation and Research Lab, Leiden University Medical Center, Leiden, The Netherlands
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | - Steven W J Nijman
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Hendrikus J A van Os
- Clinical AI Implementation and Research Lab, Leiden University Medical Center, Leiden, The Netherlands
- National eHealth Living Lab, Leiden, The Netherlands
| | - Jiska J Aardoom
- National eHealth Living Lab, Leiden, The Netherlands
- Department of Public Health and Primary Care, Leiden University Medical Center, Leiden, The Netherlands
| | - Thomas P A Debray
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Ewoud Schuit
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Maarten van Smeden
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Johannes B Reitsma
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Ewout W Steyerberg
- Clinical AI Implementation and Research Lab, Leiden University Medical Center, Leiden, The Netherlands
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | - Niels H Chavannes
- National eHealth Living Lab, Leiden, The Netherlands
- Department of Public Health and Primary Care, Leiden University Medical Center, Leiden, The Netherlands
| | - Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| |
Collapse
|
22
|
Chauhan C, Gullapalli RR. Ethics of AI in Pathology: Current Paradigms and Emerging Issues. THE AMERICAN JOURNAL OF PATHOLOGY 2021; 191:1673-1683. [PMID: 34252382 PMCID: PMC8485059 DOI: 10.1016/j.ajpath.2021.06.011] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/18/2021] [Revised: 06/18/2021] [Accepted: 06/24/2021] [Indexed: 02/06/2023]
Abstract
Deep learning has rapidly advanced artificial intelligence (AI) and algorithmic decision-making (ADM) paradigms, affecting many traditional fields of medicine, including pathology, a heavily data-centric medical specialty. The structured nature of pathology data repositories makes them highly attractive to AI researchers seeking to train deep learning models to improve health care delivery. Additionally, there are enormous financial incentives driving the adoption of AI and ADM, given the promise of increased efficiency in the health care delivery process. If implemented incorrectly or used unethically, AI may exacerbate existing health care inequities. There is an urgent need to harness the vast power of AI in an ethically and morally justifiable manner. This review explores the key issues involving AI ethics in pathology. Issues related to the ethical design of pathology AI studies and the potential risks associated with implementing AI and ADM within the pathology workflow are discussed. Three key foundational principles of ethical AI (transparency, accountability, and governance) are described in the context of pathology. The future practice of pathology must be guided by these principles. Pathologists should be aware of the potential of AI to deliver superlative health care as well as the ethical pitfalls associated with it. Finally, pathologists must have a seat at the table to drive the future implementation of ethical AI in the practice of pathology.
Collapse
Affiliation(s)
- Chhavi Chauhan
- American Society for Investigative Pathology, Rockville, Maryland
| | - Rama R Gullapalli
- Department of Pathology, University of New Mexico, Albuquerque, New Mexico; Department of Chemical and Biological Engineering, University of New Mexico, Albuquerque, New Mexico.
| |
Collapse
|
23
|
Goirand M, Austin E, Clay-Williams R. Implementing Ethics in Healthcare AI-Based Applications: A Scoping Review. SCIENCE AND ENGINEERING ETHICS 2021; 27:61. [PMID: 34480239 DOI: 10.1007/s11948-021-00336-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/21/2021] [Accepted: 08/04/2021] [Indexed: 06/13/2023]
Abstract
A number of Artificial Intelligence (AI) ethics frameworks have been published in the last six years in response to growing concerns posed by the adoption of AI in different sectors, including healthcare. While there is a strong culture of medical ethics in healthcare, AI-based Healthcare Applications (AIHA) are challenging existing ethical and regulatory frameworks. This scoping review explores how ethics frameworks have been implemented in AIHA, how these implementations have been evaluated, and whether they have been successful. AI-specific ethics frameworks in healthcare appear to have seen limited adoption, and they are mostly used in conjunction with other ethics frameworks. The operationalisation of ethics frameworks is a complex endeavour, with challenges at multiple levels: ethical principles, design, technology, organisation, and regulation. Strategies identified in this review include proactive, contextual, technological, checklist-based, organisational, and evidence-based approaches. While interdisciplinary approaches show promise, how an ethics framework is implemented in an AIHA is rarely reported, and greater transparency is needed for trustworthy AI.
Collapse
Affiliation(s)
- Magali Goirand
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia.
| | - Elizabeth Austin
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
| | - Robyn Clay-Williams
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
| |
Collapse
|
24
|
Martinez-Martin N, Greely HT, Cho MK. Ethical Development of Digital Phenotyping Tools for Mental Health Applications: Delphi Study. JMIR Mhealth Uhealth 2021; 9:e27343. [PMID: 34319252 PMCID: PMC8367187 DOI: 10.2196/27343] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2021] [Revised: 05/06/2021] [Accepted: 05/21/2021] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND Digital phenotyping (also known as personal sensing, intelligent sensing, or body computing) involves the collection of biometric and personal data in situ from digital devices, such as smartphones, wearables, or social media, to measure behavior or other health indicators. The collected data are analyzed to generate moment-by-moment quantification of a person's mental state and potentially predict future mental states. Digital phenotyping projects incorporate data from multiple sources, such as electronic health records, biometric scans, or genetic testing. As digital phenotyping tools can be used to study and predict behavior, they are of increasing interest for a range of consumer, government, and health care applications. In clinical care, digital phenotyping is expected to improve mental health diagnoses and treatment. At the same time, mental health applications of digital phenotyping present significant areas of ethical concern, particularly in terms of privacy and data protection, consent, bias, and accountability. OBJECTIVE This study aims to develop consensus statements regarding key areas of ethical guidance for mental health applications of digital phenotyping in the United States. METHODS We used a modified Delphi technique to identify the emerging ethical challenges posed by digital phenotyping for mental health applications and to formulate guidance for addressing these challenges. Experts in digital phenotyping, data science, mental health, law, and ethics participated as panelists in the study. The panel arrived at consensus recommendations through an iterative process involving interviews and surveys. The panelists focused primarily on clinical applications for digital phenotyping for mental health but also included recommendations regarding transparency and data protection to address potential areas of misuse of digital phenotyping data outside of the health care domain. RESULTS The findings of this study showed strong agreement related to these ethical issues in the development of mental health applications of digital phenotyping: privacy, transparency, consent, accountability, and fairness. Consensus regarding the recommendation statements was strongest when the guidance was stated broadly enough to accommodate a range of potential applications. The privacy and data protection issues that the Delphi participants found particularly critical to address related to the perceived inadequacies of current regulations and frameworks for protecting sensitive personal information and the potential for sale and analysis of personal data outside of health systems. CONCLUSIONS The Delphi study found agreement on a number of ethical issues to prioritize in the development of digital phenotyping for mental health applications. The Delphi consensus statements identified general recommendations and principles regarding the ethical application of digital phenotyping to mental health. As digital phenotyping for mental health is implemented in clinical care, there remains a need for empirical research and consultation with relevant stakeholders to further understand and address relevant ethical issues.
Collapse
Affiliation(s)
- Nicole Martinez-Martin
- Center for Biomedical Ethics, School of Medicine, Stanford University, Stanford, CA, United States
| | | | - Mildred K Cho
- Center for Biomedical Ethics, School of Medicine, Stanford University, Stanford, CA, United States
| |
Collapse
|
25
|
Antes AL, Burrous S, Sisk BA, Schuelke MJ, Keune JD, DuBois JM. Exploring perceptions of healthcare technologies enabled by artificial intelligence: an online, scenario-based survey. BMC Med Inform Decis Mak 2021; 21:221. [PMID: 34284756 PMCID: PMC8293482 DOI: 10.1186/s12911-021-01586-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2020] [Accepted: 07/02/2021] [Indexed: 01/14/2023] Open
Abstract
Background Healthcare is expected to increasingly integrate technologies enabled by artificial intelligence (AI) into patient care. Understanding perceptions of these tools is essential to successful development and adoption. This exploratory study gauged participants’ level of openness, concern, and perceived benefit associated with AI-driven healthcare technologies. We also explored socio-demographic, health-related, and psychosocial correlates of these perceptions. Methods We developed a measure depicting six AI-driven technologies that either diagnose, predict, or suggest treatment. We administered the measure via an online survey to adults (N = 936) in the United States using MTurk, a crowdsourcing platform. Participants indicated their level of openness to using the AI technology in the healthcare scenario. Items reflecting potential concerns and benefits associated with each technology accompanied the scenarios. Participants rated the extent to which the statements of concerns and benefits influenced their perception of favorability toward the technology. Participants completed measures of socio-demographics, health variables, and psychosocial variables such as trust in the healthcare system and trust in technology. Exploratory and confirmatory factor analyses of the concern and benefit items identified two factors representing overall level of concern and perceived benefit. Descriptive analyses examined levels of openness, concern, and perceived benefit. Correlational analyses explored associations of socio-demographic, health, and psychosocial variables with openness, concern, and benefit scores, while multivariable regression models examined these relationships concurrently. Results Participants were moderately open to AI-driven healthcare technologies (M = 3.1/5.0 ± 0.9), but openness varied by type of application, and the statements of concerns and benefits swayed views. Trust in the healthcare system and trust in technology were the strongest, most consistent correlates of openness, concern, and perceived benefit. Most other socio-demographic, health-related, and psychosocial variables were less strongly associated, or not associated at all, but multivariable models indicated that some personality characteristics (e.g., conscientiousness and agreeableness) and socio-demographics (e.g., full-time employment, age, sex, and race) were modestly related to perceptions. Conclusions Participants’ openness appears tenuous, suggesting that early promotion strategies and experiences with novel AI technologies may strongly influence views, especially if implementation of AI technologies increases or undermines trust. The exploratory nature of these findings warrants additional research. Supplementary Information The online version contains supplementary material available at 10.1186/s12911-021-01586-8.
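As a rough illustration of the analysis style this abstract describes (not the authors' code), the Python sketch below extracts two latent factors from synthetic concern/benefit item ratings and then regresses an openness score on a trust measure plus the factor scores. All data, loadings, and names are illustrative assumptions.

```python
# Hedged sketch of the described analysis: exploratory factor extraction on
# concern/benefit items, then regression of openness on trust + factor scores.
# Synthetic data; all variable names and effect sizes are assumptions.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 936
concern = rng.normal(0, 1, n)                  # latent concern
benefit = rng.normal(0, 1, n)                  # latent perceived benefit
# Six observed items: three load on concern, three on benefit.
items = np.column_stack(
    [concern + rng.normal(0, 0.5, n) for _ in range(3)] +
    [benefit + rng.normal(0, 0.5, n) for _ in range(3)])

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(items)
scores = fa.transform(items)                   # per-respondent factor scores

trust_tech = rng.normal(0, 1, n)               # stand-in trust measure
openness = 0.4 * trust_tech - 0.3 * scores[:, 0] + rng.normal(0, 1, n)
reg = LinearRegression().fit(np.column_stack([trust_tech, scores]), openness)
print("coefficients (trust, factor 1, factor 2):", reg.coef_.round(2))
```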
Collapse
Affiliation(s)
- Alison L Antes
- Bioethics Research Center, Washington University School of Medicine in St. Louis, St. Louis, MO, USA.
| | - Sara Burrous
- Bioethics Research Center, Washington University School of Medicine in St. Louis, St. Louis, MO, USA
| | - Bryan A Sisk
- Department of Pediatrics, Division of Hematology and Oncology, Washington University School of Medicine in St. Louis, St. Louis, MO, USA
| | - Matthew J Schuelke
- Division of Biostatistics, Washington University School of Medicine in St. Louis, St. Louis, MO, USA
| | - Jason D Keune
- Departments of Surgery and Health Care Ethics, Bander Center for Medical Business Ethics, Saint Louis University, St. Louis, MO, USA
| | - James M DuBois
- Bioethics Research Center, Washington University School of Medicine in St. Louis, St. Louis, MO, USA
| |
Collapse
|
26
|
Abstract
Machine learning models are built using training data, which is collected from human experience and is prone to bias. Humans demonstrate cognitive bias in their thinking and behavior, which is ultimately reflected in the collected data. From Amazon’s hiring system, built using ten years of human hiring experience, to judicial systems trained on human judging practices, such systems all include some element of bias. The best machine learning models are said to mimic humans’ cognitive ability, and thus such models are also inclined toward bias. Detecting and evaluating bias is therefore an important step toward more explainable models. In this work, we aim to explain bias in learning models in relation to humans’ cognitive bias and propose a wrapper technique to detect and evaluate bias in machine learning models using an openly accessible dataset from the UCI Machine Learning Repository. In this dataset, the potentially biased attributes (PBAs) are gender and race. The study introduces the concept of alternation functions to swap the values of PBAs and evaluates the impact on predictions using KL divergence. Results show females and Asians to be associated with low wages, raising open research questions for the research community to consider.
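The wrapper technique lends itself to a compact illustration. Below is a minimal Python sketch of our reading of it: train a model, apply an alternation function that swaps the values of a PBA, and measure how far the prediction distribution moves using KL divergence. The synthetic data, variable names, and binning choices are illustrative assumptions, not the authors' code (their study uses a UCI wage dataset with gender and race as PBAs).

```python
# Hedged sketch: detect bias by swapping a potentially biased attribute (PBA)
# with an "alternation function" and comparing prediction distributions via
# KL divergence. Synthetic data; all names and parameters are assumptions.
import numpy as np
from scipy.stats import entropy            # entropy(p, q) computes KL(p || q)
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
gender = rng.integers(0, 2, n)             # PBA: 0 / 1
skill = rng.normal(0, 1, n)
# Inject bias: the outcome depends on the PBA, not only on skill.
y = (skill + 0.8 * gender + rng.normal(0, 1, n) > 0.5).astype(int)
X = np.column_stack([gender, skill])

model = LogisticRegression().fit(X, y)

def alternate(X, pba_col=0):
    """Alternation function: flip the binary PBA, leave other features alone."""
    X_alt = X.copy()
    X_alt[:, pba_col] = 1 - X_alt[:, pba_col]
    return X_alt

p_orig = model.predict_proba(X)[:, 1]
p_alt = model.predict_proba(alternate(X))[:, 1]

# Histogram both score distributions on a shared grid, then compute KL.
bins = np.linspace(0, 1, 21)
h_orig, _ = np.histogram(p_orig, bins=bins, density=True)
h_alt, _ = np.histogram(p_alt, bins=bins, density=True)
eps = 1e-9                                 # avoid log(0) in the KL term
kl = entropy(h_orig + eps, h_alt + eps)
print(f"KL divergence after swapping the PBA: {kl:.4f}")
# A near-zero divergence suggests predictions are insensitive to the PBA;
# a large value flags bias tied to that attribute.
```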
Collapse
|
27
|
Impact of Different Approaches to Preparing Notes for Analysis With Natural Language Processing on the Performance of Prediction Models in Intensive Care. Crit Care Explor 2021; 3:e0450. [PMID: 34136824 PMCID: PMC8202578 DOI: 10.1097/cce.0000000000000450] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open
Abstract
OBJECTIVES: To evaluate whether different approaches to note text preparation (known as preprocessing) can impact machine learning model performance in the case of ICU mortality prediction. DESIGN: Clinical note text was used to build machine learning models for adults admitted to the ICU. The preprocessing strategies studied were none (raw text), cleaning text, stemming, term frequency-inverse document frequency (TF-IDF) vectorization, and creation of n-grams. Model performance was assessed by the area under the receiver operating characteristic curve (AUROC). Models were trained and internally validated on University of California San Francisco data using 10-fold cross validation, then externally validated on Beth Israel Deaconess Medical Center data. SETTING: ICUs at University of California San Francisco and Beth Israel Deaconess Medical Center. SUBJECTS: Ten thousand patients in the University of California San Francisco training and internal testing dataset and 27,058 patients in the external validation dataset from Beth Israel Deaconess Medical Center. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: Mortality rates at Beth Israel Deaconess Medical Center and University of California San Francisco were 10.9% and 7.4%, respectively. Data are presented as AUROC (95% CI) for models validated at University of California San Francisco and AUROC for models validated at Beth Israel Deaconess Medical Center. Models built and trained on University of California San Francisco data for the prediction of in-hospital mortality improved from the raw note text model (AUROC, 0.84; CI, 0.80–0.89) to the TF-IDF model (AUROC, 0.89; CI, 0.85–0.94). When the models developed at University of California San Francisco were applied to Beth Israel Deaconess Medical Center data, there was a similar increase in performance from raw note text (AUROC, 0.72) to the TF-IDF model (AUROC, 0.83). CONCLUSIONS: Differences in preprocessing strategies for note text impacted model discrimination. A preprocessing pathway combining cleaning, stemming, and TF-IDF vectorization yielded the greatest improvement in model performance. Further study is needed, with particular emphasis on how to manage author implicit bias present in note text, before natural language processing algorithms are implemented in the clinical setting.
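For readers who want to reproduce the shape of this comparison, here is a hedged Python sketch (not the study's actual pipeline) of the best-performing strategy: cleaned and stemmed text fed into TF-IDF n-grams and scored by 10-fold cross-validated AUROC. The toy notes, cleaning rules, and logistic model are illustrative assumptions.

```python
# Hedged sketch of a cleaning + stemming + TF-IDF preprocessing pathway,
# evaluated by 10-fold cross-validated AUROC. Toy data; not the study's code.
import re
import numpy as np
from nltk.stem import PorterStemmer  # pip install nltk
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()

def clean_and_stem(note: str) -> str:
    """Cleaning + stemming: lowercase, strip non-letters, stem each token."""
    tokens = re.sub(r"[^a-z\s]", " ", note.lower()).split()
    return " ".join(stemmer.stem(t) for t in tokens)

# Toy stand-ins for ICU notes and in-hospital mortality labels.
rng = np.random.default_rng(0)
risky = ["sepsis", "intubated", "vasopressors", "unresponsive"]
benign = ["stable", "ambulating", "tolerating diet", "improving"]
notes, died = [], []
for _ in range(400):
    label = int(rng.integers(0, 2))
    pool = risky if label else benign
    notes.append("Patient is " + " ".join(rng.choice(pool, 5)) + ".")
    died.append(label)

pipeline = make_pipeline(
    TfidfVectorizer(preprocessor=clean_and_stem,
                    ngram_range=(1, 2),      # unigrams + bigrams
                    max_features=50_000),
    LogisticRegression(max_iter=1000),
)
# 10-fold cross-validated AUROC, mirroring the internal validation design.
auroc = cross_val_score(pipeline, notes, died, cv=10, scoring="roc_auc")
print(f"mean AUROC: {np.mean(auroc):.2f}")
```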
Collapse
|
28
|
Petersen C, Smith J, Freimuth RR, Goodman KW, Jackson GP, Kannry J, Liu H, Madhavan S, Sittig DF, Wright A. Recommendations for the safe, effective use of adaptive CDS in the US healthcare system: an AMIA position paper. J Am Med Inform Assoc 2021; 28:677-684. [PMID: 33447854 DOI: 10.1093/jamia/ocaa319] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2020] [Accepted: 12/01/2020] [Indexed: 02/07/2023] Open
Abstract
The development and implementation of clinical decision support (CDS) that trains itself and adapts its algorithms based on new data (here referred to as Adaptive CDS) present unique challenges and considerations. Although Adaptive CDS represents an expected progression from earlier work, the activities needed to appropriately manage and support the establishment and evolution of Adaptive CDS require new, coordinated initiatives and oversight that do not currently exist. In this AMIA position paper, the authors describe current and emerging challenges to the safe use of Adaptive CDS and lay out recommendations for the effective management and monitoring of Adaptive CDS.
Collapse
Affiliation(s)
- Carolyn Petersen
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA
| | - Jeffery Smith
- The Office of the National Coordinator for Health Information Technology, Washington, DC, USA
| | - Robert R Freimuth
- Division of Digital Health Sciences, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota, USA
| | - Kenneth W Goodman
- Institute for Bioethics and Health Policy, University of Miami Miller School of Medicine, Miami, Florida, USA
| | - Gretchen Purcell Jackson
- IBM Watson Health, Cambridge, Massachusetts, USA; Department of Pediatric Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Joseph Kannry
- Mount Sinai Health System, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Hongfang Liu
- Division of Digital Health Sciences, Mayo Clinic, Rochester, Minnesota, USA
| | - Subha Madhavan
- Department of Oncology, Georgetown Lombardi Comprehensive Cancer Center, Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
| | - Dean F Sittig
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, UT-Memorial Hermann Center for Healthcare Quality & Safety, Houston, Texas, USA
| | - Adam Wright
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| |
Collapse
|
29
|
Jackson BR, Ye Y, Crawford JM, Becich MJ, Roy S, Botkin JR, de Baca ME, Pantanowitz L. The Ethics of Artificial Intelligence in Pathology and Laboratory Medicine: Principles and Practice. Acad Pathol 2021; 8:2374289521990784. [PMID: 33644301 PMCID: PMC7894680 DOI: 10.1177/2374289521990784] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2020] [Revised: 11/24/2020] [Accepted: 12/28/2020] [Indexed: 12/24/2022] Open
Abstract
Growing numbers of artificial intelligence applications are being developed and applied to pathology and laboratory medicine. These technologies introduce risks and benefits that must be assessed and managed through the lens of ethics. This article describes how long-standing principles of medical and scientific ethics can be applied to artificial intelligence using examples from pathology and laboratory medicine.
Collapse
Affiliation(s)
- Brian R. Jackson
- Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT, USA
- ARUP Laboratories, Salt Lake City, UT, USA
| | - Ye Ye
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
| | - James M. Crawford
- Department of Pathology and Laboratory Medicine, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA
| | - Michael J. Becich
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
| | - Somak Roy
- Division of Pathology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
| | - Jeffrey R. Botkin
- Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT, USA
| | | | | |
Collapse
|
30
|
Smith J. Setting the agenda: an informatics-led policy framework for adaptive CDS. J Am Med Inform Assoc 2020; 27:1831-1833. [PMID: 33301025 PMCID: PMC7727380 DOI: 10.1093/jamia/ocaa239] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2020] [Indexed: 03/31/2024] Open
Affiliation(s)
- Jeffery Smith
- American Medical Informatics Association, Bethesda, Maryland, USA
| |
Collapse
|
31
|
Pfohl SR, Foryciarz A, Shah NH. An empirical characterization of fair machine learning for clinical risk prediction. J Biomed Inform 2020; 113:103621. [PMID: 33220494 DOI: 10.1016/j.jbi.2020.103621] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2020] [Revised: 10/06/2020] [Accepted: 11/05/2020] [Indexed: 11/19/2022]
Abstract
The use of machine learning to guide clinical decision making has the potential to worsen existing health disparities. Several recent works frame the problem as one of algorithmic fairness, a framework that has attracted considerable attention and criticism. However, the appropriateness of this framework is unclear due to both ethical and technical considerations, the latter of which include trade-offs between measures of fairness and model performance that are not well understood for predictive models of clinical outcomes. To inform the ongoing debate, we conduct an empirical study to characterize the impact of penalizing group fairness violations on an array of measures of model performance and group fairness. We repeat the analysis across multiple observational healthcare databases, clinical outcomes, and sensitive attributes. We find that procedures penalizing differences between the distributions of predictions across groups induce nearly universal degradation of multiple performance metrics within groups. On examining the secondary impact of these procedures, we observe heterogeneity in their effect on measures of fairness in calibration and ranking across experimental conditions. Beyond the reported trade-offs, we emphasize that analyses of algorithmic fairness in healthcare lack the contextual grounding and causal awareness necessary to reason about the mechanisms that lead to health disparities, as well as about the potential of algorithmic fairness methods to counteract those mechanisms. In light of these limitations, we encourage researchers building predictive models for clinical use to step outside the algorithmic fairness frame and engage critically with the broader sociotechnical context surrounding the use of machine learning in healthcare.
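To make the penalization idea concrete, here is a hedged numpy sketch of one simple instance of the general approach: logistic regression whose loss adds a penalty on the squared gap between groups' mean predicted risks. The data, the specific penalty form, and the hyperparameters are illustrative assumptions; the study evaluates a broader array of distribution-matching penalties and metrics.

```python
# Hedged sketch: logistic regression with an added group-fairness penalty,
# lam * (mean risk in group 0 - mean risk in group 1)^2, fit by gradient
# descent. Synthetic data; not the paper's exact procedure.
import numpy as np

rng = np.random.default_rng(1)
n, lam, lr = 2000, 5.0, 0.5
g = rng.integers(0, 2, n)                      # sensitive attribute
x = rng.normal(0, 1, (n, 3)) + 0.5 * g[:, None]
w_true = np.array([1.0, -1.0, 0.5])
y = (x @ w_true + rng.normal(0, 1, n) > 0).astype(float)

sigmoid = lambda z: 1 / (1 + np.exp(-z))
w = np.zeros(3)
for _ in range(500):
    p = sigmoid(x @ w)
    grad_bce = x.T @ (p - y) / n               # standard logistic gradient
    # Gradient of lam * gap^2 w.r.t. w, where gap is the mean-risk difference.
    s = p * (1 - p)                            # dp/dz for the sigmoid
    m0 = (x[g == 0] * s[g == 0, None]).mean(axis=0)
    m1 = (x[g == 1] * s[g == 1, None]).mean(axis=0)
    gap = p[g == 0].mean() - p[g == 1].mean()
    grad_fair = 2 * lam * gap * (m0 - m1)
    w -= lr * (grad_bce + grad_fair)

p = sigmoid(x @ w)
print(f"mean risk gap after penalty: {p[g == 0].mean() - p[g == 1].mean():+.3f}")
# As the abstract reports, shrinking this gap typically degrades within-group
# performance (calibration, discrimination), a trade-off worth measuring.
```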
Collapse
Affiliation(s)
- Stephen R Pfohl
- Stanford Center for Biomedical Informatics Research, Stanford University, 1265 Welch Road, Stanford, CA 94305, United States of America.
| | - Agata Foryciarz
- Stanford Center for Biomedical Informatics Research, Stanford University, 1265 Welch Road, Stanford, CA 94305, United States of America; Computer Science Department, Stanford University, 353 Jane Stanford Way, Stanford, CA 94305, United States of America.
| | - Nigam H Shah
- Stanford Center for Biomedical Informatics Research, Stanford University, 1265 Welch Road, Stanford, CA 94305, United States of America.
| |
Collapse
|