1
Wahid KA, Cardenas CE, Marquez B, Netherton TJ, Kann BH, Court LE, He R, Naser MA, Moreno AC, Fuller CD, Fuentes D. Evolving Horizons in Radiation Therapy Auto-Contouring: Distilling Insights, Embracing Data-Centric Frameworks, and Moving Beyond Geometric Quantification. Adv Radiat Oncol 2024; 9:101521. [PMID: 38799110 PMCID: PMC11111585 DOI: 10.1016/j.adro.2024.101521] [Received: 10/18/2023] [Accepted: 02/26/2024] [Indexed: 05/29/2024]
Affiliation(s)
- Kareem A. Wahid
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Carlos E. Cardenas
- Department of Radiation Oncology, University of Alabama at Birmingham, Birmingham, Alabama
- Barbara Marquez
- The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, Texas
- Department of Radiation Physics, University of Texas MD Anderson Cancer Center, Houston, Texas
- Tucker J. Netherton
- Department of Radiation Physics, University of Texas MD Anderson Cancer Center, Houston, Texas
- Benjamin H. Kann
- Department of Radiation Oncology, Brigham and Women's Hospital, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts
- Laurence E. Court
- Department of Radiation Physics, University of Texas MD Anderson Cancer Center, Houston, Texas
- Renjie He
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Mohamed A. Naser
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Amy C. Moreno
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Clifton D. Fuller
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- David Fuentes
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, Texas
2
Zhang L, Richter LR, Wang Y, Ostropolets A, Elhadad N, Blei DM, Hripcsak G. Causal fairness assessment of treatment allocation with electronic health records. J Biomed Inform 2024; 155:104656. [PMID: 38782170 PMCID: PMC11180553 DOI: 10.1016/j.jbi.2024.104656] [Received: 10/02/2023] [Revised: 12/31/2023] [Accepted: 05/14/2024] [Indexed: 05/25/2024]
Abstract
OBJECTIVE Healthcare continues to grapple with the persistent issue of treatment disparities, sparking concerns regarding the equitable allocation of treatments in clinical practice. While various fairness metrics have emerged to assess fairness in decision-making processes, a growing focus has been on causality-based fairness concepts due to their capacity to mitigate confounding effects and reason about bias. However, the application of causal fairness notions in evaluating the fairness of clinical decision-making with electronic health record (EHR) data remains an understudied domain. This study aims to address the methodological gap in assessing causal fairness of treatment allocation with EHR data. In addition, we investigate the impact of social determinants of health on the assessment of causal fairness of treatment allocation.
METHODS We propose a causal fairness algorithm to assess fairness in clinical decision-making. Our algorithm accounts for the heterogeneity of patient populations and identifies potential unfairness in treatment allocation by conditioning on patients who have the same likelihood of benefiting from the treatment. We apply this framework to a patient cohort with coronary artery disease derived from an EHR database to evaluate the fairness of treatment decisions.
RESULTS Our analysis reveals notable disparities in coronary artery bypass grafting (CABG) allocation among different patient groups. Women were found to be 4.4%-7.7% less likely to receive CABG than men in two out of four treatment response strata. Similarly, Black or African American patients were 5.4%-8.7% less likely to receive CABG than others in three out of four response strata. These results were similar when social determinants of health (insurance and area deprivation index) were dropped from the algorithm. These findings highlight the presence of disparities in treatment allocation among similar patients, suggesting potential unfairness in the clinical decision-making process.
CONCLUSION This study introduces a novel approach for assessing the fairness of treatment allocation in healthcare. By incorporating responses to treatment into the fairness framework, our method explores the potential of quantifying fairness from a causal perspective using EHR data. Our research advances the methodological development of fairness assessment in healthcare and highlights the importance of causality in determining treatment fairness.
Affiliation(s)
- Linying Zhang
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Lauren R Richter
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Yixin Wang
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Anna Ostropolets
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Noémie Elhadad
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA; Department of Computer Science, Columbia University, New York, NY, USA
- David M Blei
- Department of Statistics, Columbia University, New York, NY, USA; Department of Computer Science, Columbia University, New York, NY, USA
- George Hripcsak
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA.
3
Jain SS, Elias P, Poterucha T, Randazzo M, Lopez Jimenez F, Khera R, Perez M, Ouyang D, Pirruccello J, Salerno M, Einstein AJ, Avram R, Tison GH, Nadkarni G, Natarajan V, Pierson E, Beecy A, Kumaraiah D, Haggerty C, Avari Silva JN, Maddox TM. Artificial Intelligence in Cardiovascular Care-Part 2: Applications: JACC Review Topic of the Week. J Am Coll Cardiol 2024; 83:2487-2496. [PMID: 38593945 DOI: 10.1016/j.jacc.2024.03.401] [Received: 02/26/2024] [Accepted: 03/14/2024] [Indexed: 04/11/2024]
Abstract
Recent artificial intelligence (AI) advancements in cardiovascular care offer potential enhancements in effective diagnosis, treatment, and outcomes. More than 600 U.S. Food and Drug Administration-approved clinical AI algorithms now exist, with 10% focusing on cardiovascular applications, highlighting the growing opportunities for AI to augment care. This review discusses the latest advancements in the field of AI, with a particular focus on the utilization of multimodal inputs and the field of generative AI. Further discussions in this review involve an approach to understanding the larger context in which AI-augmented care may exist, and include a discussion of the need for rigorous evaluation, appropriate infrastructure for deployment, ethics and equity assessments, regulatory oversight, and viable business cases for deployment. Embracing this rapidly evolving technology while setting an appropriately high evaluation benchmark with careful and patient-centered implementation will be crucial for cardiology to leverage AI to enhance patient care and the provider experience.
Affiliation(s)
- Sneha S Jain
- Division of Cardiology, Stanford University School of Medicine, Palo Alto, California, USA
- Pierre Elias
- Seymour, Paul and Gloria Milstein Division of Cardiology, Columbia University Irving Medical Center, New York, New York, USA; Department of Biomedical Informatics Columbia University Irving Medical Center, New York, New York, USA
- Timothy Poterucha
- Seymour, Paul and Gloria Milstein Division of Cardiology, Columbia University Irving Medical Center, New York, New York, USA
- Michael Randazzo
- Division of Cardiology, University of Chicago Medical Center, Chicago, Illinois, USA
- Rohan Khera
- Division of Cardiology, Yale School of Medicine, New Haven, Connecticut, USA
- Marco Perez
- Division of Cardiology, Stanford University School of Medicine, Palo Alto, California, USA
- David Ouyang
- Division of Cardiology, Cedars-Sinai Medical Center, Los Angeles, California, USA
- James Pirruccello
- Division of Cardiology, University of California San Francisco, San Francisco, California, USA
- Michael Salerno
- Division of Cardiology, Stanford University School of Medicine, Palo Alto, California, USA
- Andrew J Einstein
- Seymour, Paul and Gloria Milstein Division of Cardiology, Columbia University Irving Medical Center, New York, New York, USA
- Robert Avram
- Division of Cardiology, Montreal Heart Institute, Montreal, Quebec, Canada
- Geoffrey H Tison
- Division of Cardiology, University of California San Francisco, San Francisco, California, USA
- Girish Nadkarni
- Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Emma Pierson
- Department of Computer Science, Cornell Tech, New York, New York, USA
- Ashley Beecy
- NewYork-Presbyterian Health System, New York, New York, USA; Division of Cardiology, Weill Cornell Medical College, New York, New York, USA
- Deepa Kumaraiah
- Seymour, Paul and Gloria Milstein Division of Cardiology, Columbia University Irving Medical Center, New York, New York, USA; NewYork-Presbyterian Health System, New York, New York, USA
- Chris Haggerty
- Department of Biomedical Informatics Columbia University Irving Medical Center, New York, New York, USA; NewYork-Presbyterian Health System, New York, New York, USA
- Jennifer N Avari Silva
- Division of Cardiology, Washington University School of Medicine, St Louis, Missouri, USA
- Thomas M Maddox
- Division of Cardiology, Washington University School of Medicine, St Louis, Missouri, USA.
4
Yang H, Zhu D, He S, Xu Z, Liu Z, Zhang W, Cai J. Enhancing psychiatric rehabilitation outcomes through a multimodal multitask learning model based on BERT and TabNet: An approach for personalized treatment and improved decision-making. Psychiatry Res 2024; 336:115896. [PMID: 38626625 DOI: 10.1016/j.psychres.2024.115896] [Received: 06/26/2023] [Revised: 04/03/2024] [Accepted: 04/05/2024] [Indexed: 04/18/2024]
Abstract
Evaluating the rehabilitation status of individuals with serious mental illnesses (SMI) necessitates a comprehensive analysis of multimodal data, including unstructured text records and structured diagnostic data. However, progress in the effective assessment of rehabilitation status remains limited. Our study develops a deep learning model integrating Bidirectional Encoder Representations from Transformers (BERT) and TabNet through a late fusion strategy to enhance rehabilitation prediction, including referral risk, dangerous behaviors, self-awareness, and medication adherence, in patients with SMI. BERT processes unstructured textual data, such as doctor's notes, whereas TabNet manages structured diagnostic information. The model's interpretability function serves to assist healthcare professionals in understanding the model's predictive decisions, improving patient care. Our model exhibited excellent predictive performance for all four tasks, with an accuracy exceeding 0.78 and an area under the curve of 0.70. In addition, a series of tests proved the model's robustness, fairness, and interpretability. This study combines multimodal and multitask learning strategies into a model and applies it to rehabilitation assessment tasks, offering a promising new tool that can be seamlessly integrated with the clinical workflow to support the provision of optimized patient care.
Affiliation(s)
- Hongyi Yang
- School of Design, Shanghai Jiao Tong University, Shanghai, China
- Dian Zhu
- School of Design, Shanghai Jiao Tong University, Shanghai, China
- Siyuan He
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zhiqi Xu
- School of Design, Shanghai Jiao Tong University, Shanghai, China
- Zhao Liu
- School of Design, Shanghai Jiao Tong University, Shanghai, China.
- Weibo Zhang
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai, China; Mental Health Branch, China Hospital Development Institute, Shanghai Jiao Tong University, Shanghai, China.
- Jun Cai
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Mental Health Branch, China Hospital Development Institute, Shanghai Jiao Tong University, Shanghai, China.
5
McMahon GT. The Risks and Challenges of Artificial Intelligence in Endocrinology. J Clin Endocrinol Metab 2024; 109:e1468-e1471. [PMID: 38471009 DOI: 10.1210/clinem/dgae017] [Received: 09/05/2023] [Indexed: 03/14/2024]
Abstract
Artificial intelligence (AI) holds the promise of addressing many of the numerous challenges healthcare faces, which include a growing burden of illness, an increase in chronic health conditions and disabilities due to aging and epidemiological changes, higher demand for health services, overworked and burned-out clinicians, greater societal expectations, and rising health expenditures. While technological advancements in processing power, memory, storage, and the abundance of data have empowered computers to handle increasingly complex tasks with remarkable success, AI introduces a variety of meaningful risks and challenges. Among these are issues related to accuracy and reliability, bias and equity, errors and accountability, transparency, misuse, and privacy of data. As AI systems continue to rapidly integrate into healthcare settings, it is crucial to recognize the inherent risks they bring. These risks demand careful consideration to ensure the responsible and safe deployment of AI in healthcare.
Affiliation(s)
- Graham T McMahon
- Accreditation Council for Continuing Medical Education, Chicago, IL 60611, USA
- Department of Medical Education and Division of Endocrinology, Metabolism and Molecular Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
6
Teotia K, Jia Y, Link Woite N, Celi LA, Matos J, Struja T. Variation in monitoring: Glucose measurement in the ICU as a case study to preempt spurious correlations. J Biomed Inform 2024; 153:104643. [PMID: 38621640 PMCID: PMC11103268 DOI: 10.1016/j.jbi.2024.104643] [Received: 09/30/2023] [Revised: 03/29/2024] [Accepted: 04/12/2024] [Indexed: 04/17/2024]
Abstract
OBJECTIVE Health inequities can be influenced by demographic factors such as race and ethnicity, proficiency in English, and biological sex. Disparities may manifest as differential likelihood of testing, which correlates directly with the likelihood of an intervention to address an abnormal finding. Our retrospective observational study evaluated the presence of variation in glucose measurements in the Intensive Care Unit (ICU).
METHODS Using the MIMIC-IV database (2008-2019) from a single-center academic referral hospital in Boston (USA), we identified adult patients meeting sepsis-3 criteria. Exclusion criteria were diabetic ketoacidosis, ICU length of stay under 1 day, and unknown race or ethnicity. We performed a logistic regression analysis to assess differential likelihoods of glucose measurements on day 1. A negative binomial regression was fitted to assess the frequency of subsequent glucose readings. Analyses were adjusted for relevant clinical confounders and performed across three disparity proxy axes: race and ethnicity, sex, and English proficiency.
RESULTS We studied 24,927 patients, of whom 19.5% represented racial and ethnic minority groups, 42.4% were female, and 9.8% had limited English proficiency. No significant differences were found for glucose measurement on day 1 in the ICU. This pattern was consistent irrespective of the axis of analysis, i.e., race and ethnicity, sex, or English proficiency. Conversely, subsequent measurement frequency revealed potential disparities. Specifically, males (incidence rate ratio (IRR) 1.06, 95% confidence interval (CI) 1.01 - 1.21), patients who identify themselves as Hispanic (IRR 1.11, 95% CI 1.01 - 1.21) or Black (IRR 1.06, 95% CI 1.01 - 1.12), and English-proficient patients (IRR 1.08, 95% CI 1.01 - 1.15) had higher chances of subsequent glucose readings.
CONCLUSION We found disparities in ICU glucose measurements among patients with sepsis, although the magnitude was small. Variation in disease monitoring is a source of data bias that may lead to spurious correlations when modeling health data.
Affiliation(s)
- Khushboo Teotia
- Laboratory for Computational Physiology, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Yueran Jia
- Laboratory for Computational Physiology, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Naira Link Woite
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Leo Anthony Celi
- Laboratory for Computational Physiology, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA.
- João Matos
- Laboratory for Computational Physiology, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA; Faculty of Engineering, University of Porto (FEUP), Porto, Portugal; Institute for Systems and Computer Engineering, Technology and Science (INESCTEC), Porto, Portugal.
- Tristan Struja
- Laboratory for Computational Physiology, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA; Medical University Clinic, Kantonsspital Aarau, Aarau, Switzerland.
7
Fernández-Alvarez J, Molinari G, Kilcullen R, Delgadillo J, Drill R, Errázuriz P, Falkenstrom F, Firth N, O'Shea A, Paz C, Youn SJ, Castonguay LG. The Importance of Conducting Practice-oriented Research with Underserved Populations. Administration and Policy in Mental Health and Mental Health Services Research 2024; 51:358-375. [PMID: 38157130 DOI: 10.1007/s10488-023-01337-z] [Accepted: 12/15/2023] [Indexed: 01/03/2024]
Abstract
There has been a growing emphasis on dissemination of empirically supported treatments. Dissemination, however, should not be restricted to treatment. It can and, in the spirit of the scientific-practitioner model, should also involve research. Because it focuses on the investigation of clinical routine as it takes place in local settings and because it can involve the collaboration of several stakeholders, practice-oriented research (POR) can be viewed as an optimal research method to be disseminated. POR has the potential of addressing particularly relevant gaps of knowledge and action when implemented in regions of the world that have limited resources for or experiences with empirical research, and/or in clinical settings that are serving clinical populations who are not typically receiving optimal mental care services - specifically, individuals in rural and inner cities that have limited economic and social resources. The establishment and maintenance of POR in such regions and/or settings, however, come with specific obstacles and challenges. Integrating the experiences acquired from research conducted in various continents (Africa, Europe, Latin America, and North America), the goal of this paper is to describe some of these challenges, strategies that have been implemented to address them, as well as new possible directions to facilitate the creation and growth of POR. It also describes how these challenges and ways to deal with them can provide helpful lessons for already existing POR infrastructures.
Affiliation(s)
- Guadalupe Molinari
- International University of Valencia, Valencia, Spain
- Aiglé Valencia, Valencia, Spain
- Ryan Kilcullen
- Department of Psychology, The Pennsylvania State University, Pennsylvania, USA
- Jaime Delgadillo
- Clinical and Applied Psychology Unit, Department of Psychology, University of Sheffield, Sheffield, UK
- Rebecca Drill
- Department of Psychiatry, Cambridge Health Alliance, Cambridge, USA
- Paula Errázuriz
- Pontificia Universidad Católica de Chile, Santiago de Chile, Chile
- Millennium Institute for Research on Depression and Personality, Chile, PsiConecta, Chile
- Nick Firth
- School of Health and Related Research, University of Sheffield, Sheffield, UK
- Amber O'Shea
- Department of Educational Psychology, Counseling, and Special Education, The Pennsylvania State University, Pennsylvania, USA
- Clara Paz
- Universidad de Las Américas, Ciudad de México, Ecuador
- Soo Jeong Youn
- Reliant Medical Group, OptumCare, Harvard Medical School, Worcester, MA, USA
- Louis G Castonguay
- Department of Psychology, The Pennsylvania State University, Pennsylvania, USA
8
Barea Mendoza JA, Valiente Fernandez M, Pardo Fernandez A, Gómez Álvarez J. Current perspectives on the use of artificial intelligence in critical patient safety. Med Intensiva 2024:S2173-5727(24)00080-8. [PMID: 38677902 DOI: 10.1016/j.medine.2024.04.002] [Received: 12/19/2023] [Accepted: 03/11/2024] [Indexed: 04/29/2024]
Abstract
Intensive Care Units (ICUs) have undergone enhancements in patient safety, and artificial intelligence (AI) emerges as a disruptive technology offering novel opportunities. While the published evidence is limited and presents methodological issues, certain areas show promise, such as decision support systems, detection of adverse events, and prescription error identification. The application of AI in safety may pursue predictive or diagnostic objectives. Implementing AI-based systems necessitates procedures to ensure secure assistance, addressing challenges including trust in such systems, biases, data quality, scalability, and ethical and confidentiality considerations. The development and application of AI demand thorough testing, encompassing retrospective data assessments, real-time validation with prospective cohorts, and efficacy demonstration in clinical trials. Algorithmic transparency and explainability are essential, with active involvement of clinical professionals being crucial in the implementation process.
Affiliation(s)
- Jesús Abelardo Barea Mendoza
- UCI de Trauma y Emergencias. Servicio de Medicina Intensiva. Hospital Universitario 12 de Octubre. Instituto de Investigación Hospital 12 de Octubre, Spain.
- Marcos Valiente Fernandez
- UCI de Trauma y Emergencias. Servicio de Medicina Intensiva. Hospital Universitario 12 de Octubre. Instituto de Investigación Hospital 12 de Octubre, Spain
- Josep Gómez Álvarez
- Hospital Universitari de Tarragona Joan XXIII. Universitat Rovira i Virgili. Institut d'Investigació Sanitària Pere i Virgili, Tarragona, Spain
9
Wang HE, Weiner JP, Saria S, Kharrazi H. Evaluating Algorithmic Bias in 30-Day Hospital Readmission Models: Retrospective Analysis. J Med Internet Res 2024; 26:e47125. [PMID: 38422347 PMCID: PMC11066744 DOI: 10.2196/47125] [Received: 03/11/2023] [Revised: 12/28/2023] [Accepted: 02/27/2024] [Indexed: 03/02/2024]
Abstract
BACKGROUND The adoption of predictive algorithms in health care comes with the potential for algorithmic bias, which could exacerbate existing disparities. Fairness metrics have been proposed to measure algorithmic bias, but their application to real-world tasks is limited.
OBJECTIVE This study aims to evaluate the algorithmic bias associated with the application of common 30-day hospital readmission models and assess the usefulness and interpretability of selected fairness metrics.
METHODS We used 10.6 million adult inpatient discharges from Maryland and Florida from 2016 to 2019 in this retrospective study. Models predicting 30-day hospital readmissions were evaluated: LACE Index, modified HOSPITAL score, and modified Centers for Medicare & Medicaid Services (CMS) readmission measure, which were applied as-is (using existing coefficients) and retrained (recalibrated with 50% of the data). Predictive performances and bias measures were evaluated for all, between Black and White populations, and between low- and other-income groups. Bias measures included the parity of false negative rate (FNR), false positive rate (FPR), 0-1 loss, and generalized entropy index. Racial bias represented by FNR and FPR differences was stratified to explore shifts in algorithmic bias in different populations.
RESULTS The retrained CMS model demonstrated the best predictive performance (area under the curve: 0.74 in Maryland and 0.68-0.70 in Florida), and the modified HOSPITAL score demonstrated the best calibration (Brier score: 0.16-0.19 in Maryland and 0.19-0.21 in Florida). Calibration was better in White (compared to Black) populations and other-income (compared to low-income) groups, and the area under the curve was higher or similar in the Black (compared to White) populations. The retrained CMS and modified HOSPITAL score had the lowest racial and income bias in Maryland. In Florida, both of these models overall had the lowest income bias and the modified HOSPITAL score showed the lowest racial bias. In both states, the White and higher-income populations showed a higher FNR, while the Black and low-income populations resulted in a higher FPR and a higher 0-1 loss. When stratified by hospital and population composition, these models demonstrated heterogeneous algorithmic bias in different contexts and populations.
CONCLUSIONS Caution must be taken when interpreting fairness measures' face value. A higher FNR or FPR could potentially reflect missed opportunities or wasted resources, but these measures could also reflect health care use patterns and gaps in care. Simply relying on the statistical notions of bias could obscure or underplay the causes of health disparity. The imperfect health data, analytic frameworks, and the underlying health systems must be carefully considered. Fairness measures can serve as a useful routine assessment to detect disparate model performances but are insufficient to inform mechanisms or policy changes. However, such an assessment is an important first step toward data-driven improvement to address existing health disparities.
Affiliation(s)
- H Echo Wang
- Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, United States
- Jonathan P Weiner
- Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, United States
- Johns Hopkins Center for Population Health Information Technology, Baltimore, MD, United States
- Suchi Saria
- Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, United States
- Hadi Kharrazi
- Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, United States
- Johns Hopkins Center for Population Health Information Technology, Baltimore, MD, United States
10
Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, Ghassemi M, Liu X, Reitsma JB, van Smeden M, Boulesteix AL, Camaradou JC, Celi LA, Denaxas S, Denniston AK, Glocker B, Golub RM, Harvey H, Heinze G, Hoffman MM, Kengne AP, Lam E, Lee N, Loder EW, Maier-Hein L, Mateen BA, McCradden MD, Oakden-Rayner L, Ordish J, Parnell R, Rose S, Singh K, Wynants L, Logullo P. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024; 385:e078378. [PMID: 38626948 PMCID: PMC11019967 DOI: 10.1136/bmj-2023-078378] [Accepted: 01/17/2024] [Indexed: 04/19/2024]
Affiliation(s)
- Gary S Collins
- Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Karel G M Moons
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Paula Dhiman
- Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
- Andrew L Beam
- Department of Epidemiology, Harvard T H Chan School of Public Health, Boston, MA, USA
- Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Department of Biomedical Data Science, Leiden University Medical Centre, Leiden, Netherlands
- Marzyeh Ghassemi
- Department of Electrical Engineering and Computer Science, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Xiaoxuan Liu
- Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Johannes B Reitsma
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Maarten van Smeden
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, Faculty of Medicine, Ludwig-Maximilians-University of Munich and Munich Centre of Machine Learning, Germany
- Jennifer Catherine Camaradou
- Patient representative, Health Data Research UK patient and public involvement and engagement group
- Patient representative, University of East Anglia, Faculty of Health Sciences, Norwich Research Park, Norwich, UK
- Leo Anthony Celi
- Beth Israel Deaconess Medical Center, Boston, MA, USA
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Biostatistics, Harvard T H Chan School of Public Health, Boston, MA, USA
- Spiros Denaxas
- Institute of Health Informatics, University College London, London, UK
- British Heart Foundation Data Science Centre, London, UK
- Alastair K Denniston
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
- Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Ben Glocker
- Department of Computing, Imperial College London, London, UK
- Robert M Golub
- Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Georg Heinze
- Section for Clinical Biometrics, Centre for Medical Data Science, Medical University of Vienna, Vienna, Austria
- Michael M Hoffman
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Emily Lam
- Patient representative, Health Data Research UK patient and public involvement and engagement group
- Naomi Lee
- National Institute for Health and Care Excellence, London, UK
- Elizabeth W Loder
- The BMJ, London, UK
- Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Lena Maier-Hein
- Department of Intelligent Medical Systems, German Cancer Research Centre, Heidelberg, Germany
| | - Bilal A Mateen
- Institute of Health Informatics, University College London, London, UK
- Wellcome Trust, London, UK
- Alan Turing Institute, London, UK
| | - Melissa D McCradden
- Department of Bioethics, Hospital for Sick Children Toronto, ON, Canada
- Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, Canada
| | - Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia
| | - Johan Ordish
- Medicines and Healthcare products Regulatory Agency, London, UK
| | - Richard Parnell
- Patient representative, Health Data Research UK patient and public involvement and engagement group
| | - Sherri Rose
- Department of Health Policy and Center for Health Policy, Stanford University, Stanford, CA, USA
| | - Karandeep Singh
- Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
| | - Laure Wynants
- Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
| | - Patricia Logullo
- Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
|
11
|
Perets O, Stagno E, Yehuda EB, McNichol M, Anthony Celi L, Rappoport N, Dorotic M. Inherent Bias in Electronic Health Records: A Scoping Review of Sources of Bias. medRxiv 2024:2024.04.09.24305594. [PMID: 38680842 PMCID: PMC11046491 DOI: 10.1101/2024.04.09.24305594] [Indexed: 05/01/2024]
Abstract
Objectives Biases inherent in electronic health records (EHRs), and therefore in medical artificial intelligence (AI) models, may significantly exacerbate health inequities and challenge the adoption of ethical and responsible AI in healthcare. Biases arise from multiple sources, some of which are less well documented in the literature. Biases are encoded in how the data has been collected and labeled, by implicit and unconscious biases of clinicians, or by the tools used for data processing. These biases and their encoding in healthcare records undermine the reliability of such data and bias clinical judgments and medical outcomes. Moreover, when healthcare records are used to build data-driven solutions, the biases are further exacerbated, resulting in systems that perpetuate biases and induce healthcare disparities. This literature scoping review aims to categorize the main sources of biases inherent in EHRs. Methods We queried PubMed and Web of Science on January 19th, 2023, for peer-reviewed sources in English, published between 2016 and 2023, using the PRISMA approach to stepwise scoping of the literature. To select the papers that empirically analyze bias in EHR, from the initial yield of 430 papers, 27 duplicates were removed, and 403 studies were screened for eligibility. 196 articles were removed after the title and abstract screening, and 96 articles were excluded after the full-text review, resulting in a final selection of 116 articles. Results Systematic categorizations of diverse sources of bias are scarce in the literature, while the effects of separate studies are often convoluted and methodologically contestable. Our categorization of published empirical evidence identified six main sources of bias: (a) bias arising from past clinical trials; (b) data-related biases arising from missing or incomplete information or poor labeling of data; human-related bias induced by (c) implicit clinician bias and (d) referral and admission bias; (e) diagnosis or risk disparities bias; and finally, (f) biases in machinery and algorithms. Conclusions Machine learning and data-driven solutions can potentially transform healthcare delivery, but not without limitations. The core inputs in the systems (data and human factors) currently contain several sources of bias that are poorly documented and analyzed for remedies. The current evidence heavily focuses on data-related biases, while other sources are less often analyzed or anecdotal. However, these different sources of biases add to one another exponentially. Therefore, to understand the issues holistically we need to explore these diverse sources of bias. While racial biases in EHR have often been documented, other sources of biases have been less frequently investigated and documented (e.g. gender-related biases, sexual orientation discrimination, socially induced biases, and implicit, often unconscious, human-related cognitive biases). Moreover, some existing studies lack causal evidence, illustrating the different prevalences of disease across groups, which does not per se prove causality. Our review shows that data, human and machine biases are prevalent in healthcare, and that they significantly impact healthcare outcomes and judgments and exacerbate disparities and differential treatment. Understanding how diverse biases affect AI systems and recommendations is critical. We suggest that researchers and medical personnel should develop safeguards and adopt data-driven solutions with a "bias-in-mind" approach. More empirical evidence is needed to tease out the effects of different sources of bias on health outcomes.
|
12
|
Nichol AA, Halley M, Federico C, Cho MK, Sankar PL. Moral Engagement and Disengagement in Health Care AI Development. AJOB Empir Bioeth 2024:1-10. [PMID: 38588388 DOI: 10.1080/23294515.2024.2336906] [Indexed: 04/10/2024]
Abstract
BACKGROUND Machine learning (ML) is utilized increasingly in health care, and can pose harms to patients, clinicians, health systems, and the public. In response, regulators have proposed an approach that would shift more responsibility to ML developers for mitigating potential harms. To be effective, this approach requires ML developers to recognize, accept, and act on responsibility for mitigating harms. However, little is known regarding the perspectives of developers themselves regarding their obligations to mitigate harms. METHODS We conducted 40 semi-structured interviews with developers of ML predictive analytics applications for health care in the United States. RESULTS Participants varied widely in their perspectives on personal responsibility and included examples of both moral engagement and disengagement, albeit in a variety of forms. While most (70%) of participants made a statement indicative of moral engagement, most of these statements reflected an awareness of moral issues, while only a subset of these included additional elements of engagement such as recognizing responsibility, alignment with personal values, addressing conflicts of interests, and opportunities for action. Further, we identified eight distinct categories of moral disengagement reflecting efforts to minimize potential harms or deflect personal responsibility for preventing or mitigating harms. CONCLUSIONS These findings suggest possible facilitators and barriers to the development of ethical ML that could act by encouraging moral engagement or discouraging moral disengagement. Regulatory approaches that depend on the ability of ML developers to recognize, accept, and act on responsibility for mitigating harms might have limited success without education and guidance for ML developers about the extent of their responsibilities and how to implement them.
Affiliation(s)
- Ariadne A Nichol
- Center for Biomedical Ethics, Stanford University School of Medicine, Stanford, California, USA
| | - Meghan Halley
- Center for Biomedical Ethics, Stanford University School of Medicine, Stanford, California, USA
| | - Carole Federico
- Center for Biomedical Ethics, Stanford University School of Medicine, Stanford, California, USA
| | - Mildred K Cho
- Center for Biomedical Ethics, Stanford University School of Medicine, Stanford, California, USA
| | - Pamela L Sankar
- Department of Medical Ethics & Health Policy, University of Pennsylvania, Philadelphia, Pennsylvania, USA
|
13
|
Didier AJ, Nigro A, Noori Z, Omballi MA, Pappada SM, Hamouda DM. Application of machine learning for lung cancer survival prognostication-A systematic review and meta-analysis. Front Artif Intell 2024; 7:1365777. [PMID: 38646415 PMCID: PMC11026647 DOI: 10.3389/frai.2024.1365777] [Received: 01/04/2024] [Accepted: 03/18/2024] [Indexed: 04/23/2024] Open
Abstract
Introduction Machine learning (ML) techniques have gained increasing attention in the field of healthcare, including predicting outcomes in patients with lung cancer. ML has the potential to enhance prognostication in lung cancer patients and improve clinical decision-making. In this systematic review and meta-analysis, we aimed to evaluate the performance of ML models compared to logistic regression (LR) models in predicting overall survival in patients with lung cancer. Methods We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) statement. A comprehensive search was conducted in Medline, Embase, and Cochrane databases using a predefined search query. Two independent reviewers screened abstracts and conflicts were resolved by a third reviewer. Inclusion and exclusion criteria were applied to select eligible studies. Risk of bias assessment was performed using predefined criteria. Data extraction was conducted using the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies (CHARMS) checklist. Meta-analytic analysis was performed to compare the discriminative ability of ML and LR models. Results The literature search resulted in 3,635 studies, and 12 studies with a total of 211,068 patients were included in the analysis. Six studies reported confidence intervals and were included in the meta-analysis. The performance of ML models varied across studies, with C-statistics ranging from 0.60 to 0.85. The pooled analysis showed that ML models had higher discriminative ability compared to LR models, with a weighted average C-statistic of 0.78 for ML models compared to 0.70 for LR models. Conclusion Machine learning models show promise in predicting overall survival in patients with lung cancer, with superior discriminative ability compared to logistic regression models. 
However, further validation and standardization of ML models are needed before their widespread implementation in clinical practice. Future research should focus on addressing the limitations of the current literature, such as potential bias and heterogeneity among studies, to improve the accuracy and generalizability of ML models for predicting outcomes in patients with lung cancer. Further research and development of ML models in this field may lead to improved patient outcomes and personalized treatment strategies.
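The pooled result above is reported as a weighted average C-statistic. A minimal sketch of such a weighted mean follows, assuming the weights are per-study sample sizes; the abstract does not state the weighting scheme, and the function name and numbers below are hypothetical illustrations, not data from the review.

```python
def weighted_mean_c_statistic(c_statistics, weights):
    """Return the weighted average of per-study C-statistics.

    Assumption: `weights` are study sample sizes. The review's actual
    weighting scheme is not given in the abstract.
    """
    if not c_statistics or len(c_statistics) != len(weights):
        raise ValueError("need one weight per C-statistic")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(c * w for c, w in zip(c_statistics, weights))
    return weighted_sum / total_weight


# Hypothetical example values (not taken from the included studies).
example_c = [0.60, 0.80]
example_n = [100, 300]
pooled = weighted_mean_c_statistic(example_c, example_n)
```

A formal meta-analysis would more commonly weight by inverse variance, which is why only studies reporting confidence intervals can be pooled that way.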
Affiliation(s)
- Alexander J. Didier
- Department of Medicine, University of Toledo College of Medicine and Life Sciences, Toledo, OH, United States
| | - Anthony Nigro
- Department of Medicine, University of Toledo College of Medicine and Life Sciences, Toledo, OH, United States
| | - Zaid Noori
- Department of Medicine, University of Toledo College of Medicine and Life Sciences, Toledo, OH, United States
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, University of Toledo College of Medicine and Life Sciences, Toledo, OH, United States
| | - Mohamed A. Omballi
- Department of Medicine, University of Toledo College of Medicine and Life Sciences, Toledo, OH, United States
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, University of Toledo College of Medicine and Life Sciences, Toledo, OH, United States
| | - Scott M. Pappada
- Department of Medicine, University of Toledo College of Medicine and Life Sciences, Toledo, OH, United States
- Department of Anesthesiology, The University of Toledo College of Medicine and Life Sciences, Toledo, OH, United States
| | - Danae M. Hamouda
- Department of Medicine, University of Toledo College of Medicine and Life Sciences, Toledo, OH, United States
- Division of Hematology and Oncology, Department of Medicine, The University of Toledo College of Medicine and Life Sciences, Toledo, OH, United States
|
14
|
Mehandru N, Miao BY, Almaraz ER, Sushil M, Butte AJ, Alaa A. Evaluating large language models as agents in the clinic. NPJ Digit Med 2024; 7:84. [PMID: 38570554 PMCID: PMC10991271 DOI: 10.1038/s41746-024-01083-y] [Received: 08/25/2023] [Accepted: 03/22/2024] [Indexed: 04/05/2024] Open
Affiliation(s)
- Nikita Mehandru
- University of California, Berkeley, 2195 Hearst Ave, Warren Hall Suite, 120C, Berkeley, CA, USA
| | - Brenda Y Miao
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA
| | - Eduardo Rodriguez Almaraz
- Neurosurgery Department Division of Neuro-Oncology, University of California San Francisco, 400 Parnassus Avenue, 8th floor, RM A808, San Francisco, CA, USA
- Department of Epidemiology and Biostatistics, University of California San Francisco, 400 Parnassus Avenue, 8th floor, RM A808, San Francisco, CA, USA
| | - Madhumita Sushil
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA
| | - Atul J Butte
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA
- Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
| | - Ahmed Alaa
- University of California, Berkeley, 2195 Hearst Ave, Warren Hall Suite, 120C, Berkeley, CA, USA.
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA.
|
15
|
Balagopalan A, Baldini I, Celi LA, Gichoya J, McCoy LG, Naumann T, Shalit U, van der Schaar M, Wagstaff KL. Machine learning for healthcare that matters: Reorienting from technical novelty to equitable impact. PLOS Digit Health 2024; 3:e0000474. [PMID: 38620047 PMCID: PMC11018283 DOI: 10.1371/journal.pdig.0000474] [Received: 09/06/2023] [Accepted: 02/18/2024] [Indexed: 04/17/2024]
Abstract
Despite significant technical advances in machine learning (ML) over the past several years, the tangible impact of this technology in healthcare has been limited. This is due not only to the particular complexities of healthcare, but also due to structural issues in the machine learning for healthcare (MLHC) community which broadly reward technical novelty over tangible, equitable impact. We structure our work as a healthcare-focused echo of the 2012 paper "Machine Learning that Matters", which highlighted such structural issues in the ML community at large, and offered a series of clearly defined "Impact Challenges" to which the field should orient itself. Drawing on the expertise of a diverse and international group of authors, we engage in a narrative review and examine issues in the research background environment, training processes, evaluation metrics, and deployment protocols which act to limit the real-world applicability of MLHC. Broadly, we seek to distinguish between machine learning ON healthcare data and machine learning FOR healthcare: the former sees healthcare as merely a source of interesting technical challenges, and the latter regards ML as a tool in service of meeting tangible clinical needs. We offer specific recommendations for a series of stakeholders in the field, from ML researchers and clinicians, to the institutions in which they work, and the governments which regulate their data access.
Affiliation(s)
- Aparna Balagopalan
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology; Cambridge, Massachusetts, United States of America
| | - Ioana Baldini
- IBM Research; Yorktown Heights, New York, United States of America
| | - Leo Anthony Celi
- Laboratory for Computational Physiology, Massachusetts Institute of Technology; Cambridge, Massachusetts, United States of America
- Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center; Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard T.H. Chan School of Public Health; Boston, Massachusetts, United States of America
| | - Judy Gichoya
- Department of Radiology and Imaging Sciences, School of Medicine, Emory University; Atlanta, Georgia, United States of America
| | - Liam G. McCoy
- Division of Neurology, Department of Medicine, University of Alberta; Edmonton, Alberta, Canada
| | - Tristan Naumann
- Microsoft Research; Redmond, Washington, United States of America
| | - Uri Shalit
- The Faculty of Data and Decision Sciences, Technion; Haifa, Israel
| | - Mihaela van der Schaar
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge; Cambridge, United Kingdom
- The Alan Turing Institute; London, United Kingdom
|
16
|
Wang R, Kuo PC, Chen LC, Seastedt KP, Gichoya JW, Celi LA. Drop the shortcuts: image augmentation improves fairness and decreases AI detection of race and other demographics from medical images. EBioMedicine 2024; 102:105047. [PMID: 38471396 PMCID: PMC10945176 DOI: 10.1016/j.ebiom.2024.105047] [Received: 12/04/2023] [Revised: 02/15/2024] [Accepted: 02/21/2024] [Indexed: 03/14/2024] Open
Abstract
BACKGROUND It has been shown that AI models can learn race on medical images, leading to algorithmic bias. Our aim in this study was to enhance the fairness of medical image models by eliminating bias related to race, age, and sex. We hypothesise models may be learning demographics via shortcut learning and combat this using image augmentation. METHODS This study included 44,953 patients who identified as Asian, Black, or White (mean age, 60.68 years ±18.21; 23,499 women) for a total of 194,359 chest X-rays (CXRs) from the MIMIC-CXR database. CheXpert images from 45,095 patients (mean age 63.10 years ±18.14; 20,437 women), a total of 134,300 CXRs, were used for external validation. We also collected 1195 3D brain magnetic resonance imaging (MRI) scans from the ADNI database, from 273 participants (mean age 76.97 years ±14.22; 142 females). DL models were trained on either non-augmented or augmented images and assessed using disparity metrics. The features learned by the models were analysed using task transfer experiments and model visualisation techniques. FINDINGS In the detection of radiological findings, training a model using augmented CXR images was shown to reduce disparities in error rate among racial groups (-5.45%), age groups (-13.94%), and sex (-22.22%). For Alzheimer's disease (AD) detection, the model trained with augmented MRI images showed 53.11% and 31.01% reductions in disparities in error rate among age and sex groups, respectively. Image augmentation led to a reduction in the model's ability to identify demographic attributes and resulted in the model trained for clinical purposes incorporating fewer demographic features. INTERPRETATION The model trained using the augmented images was less likely to be influenced by demographic information in detecting image labels. 
These results demonstrate that the proposed augmentation scheme could enhance the fairness of interpretations by DL models when dealing with data from patients with different demographic backgrounds. FUNDING National Science and Technology Council (Taiwan), National Institutes of Health.
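The findings above are expressed as disparities in error rate between demographic groups. A minimal sketch of one such measure follows, assuming disparity is taken as the gap between the highest and lowest per-group error rate; the abstract does not define its disparity metrics, and the function names and toy labels below are hypothetical.

```python
from collections import defaultdict


def error_rate_by_group(y_true, y_pred, groups):
    """Return {group: fraction of misclassified samples in that group}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [errors, total]
    for truth, pred, group in zip(y_true, y_pred, groups):
        counts[group][0] += int(truth != pred)
        counts[group][1] += 1
    return {g: errors / total for g, (errors, total) in counts.items()}


def error_rate_gap(rates):
    """Assumed disparity measure: max minus min per-group error rate."""
    return max(rates.values()) - min(rates.values())


# Toy labels for illustration only (not study data).
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b"]
rates = error_rate_by_group(y_true, y_pred, groups)
gap = error_rate_gap(rates)
```

Comparing `gap` for a model trained on non-augmented images with `gap` for one trained on augmented images gives the kind of relative reduction the abstract reports.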
Affiliation(s)
- Ryan Wang
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
| | - Po-Chih Kuo
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.
| | - Li-Ching Chen
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
| | - Kenneth Patrick Seastedt
- Department of Surgery, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA; Department of Thoracic Surgery, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
| | | | - Leo Anthony Celi
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA; Division of Pulmonary Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
|
17
|
Zhan K, Buhler KA, Chen IY, Fritzler MJ, Choi MY. Systemic lupus in the era of machine learning medicine. Lupus Sci Med 2024; 11:e001140. [PMID: 38443092 PMCID: PMC11146397 DOI: 10.1136/lupus-2023-001140] [Received: 12/29/2023] [Accepted: 01/26/2024] [Indexed: 03/07/2024]
Abstract
Artificial intelligence and machine learning applications are emerging as transformative technologies in medicine. With greater access to a diverse range of big datasets, researchers are turning to these powerful techniques for data analysis. Machine learning can reveal patterns and interactions between variables in large and complex datasets more accurately and efficiently than traditional statistical methods. Machine learning approaches open new possibilities for studying SLE, a multifactorial, highly heterogeneous and complex disease. Here, we discuss how machine learning methods are rapidly being integrated into the field of SLE research. Recent reports have focused on building prediction models and/or identifying novel biomarkers using both supervised and unsupervised techniques for understanding disease pathogenesis, early diagnosis and prognosis of disease. In this review, we will provide an overview of machine learning techniques to discuss current gaps, challenges and opportunities for SLE studies. External validation of most prediction models is still needed before clinical adoption. Utilisation of deep learning models, access to alternative sources of health data and increased awareness of the ethics, governance and regulations surrounding the use of artificial intelligence in medicine will help propel this exciting field forward.
Collapse
Affiliation(s)
- Kevin Zhan
- University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
| | - Katherine A Buhler
- University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
| | - Irene Y Chen
- Computational Precision Health, University of California Berkeley and University of California San Francisco, Berkeley, California, USA
- Electrical Engineering and Computer Science, University of California Berkeley, Berkeley, California, USA
| | - Marvin J Fritzler
- University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
| | - May Y Choi
- University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- McCaig Institute for Bone and Joint Health, Calgary, Alberta, Canada
|
18
|
Khan L, Shahreen M, Qazi A, Jamil Ahmed Shah S, Hussain S, Chang HT. Migraine headache (MH) classification using machine learning methods with data augmentation. Sci Rep 2024; 14:5180. [PMID: 38431729 PMCID: PMC10908834 DOI: 10.1038/s41598-024-55874-0] [Received: 05/26/2023] [Accepted: 02/28/2024] [Indexed: 03/05/2024] Open
Abstract
Migraine headache, a prevalent and intricate neurovascular disease, presents significant challenges in its clinical identification. Existing techniques that use subjective pain intensity measures are insufficiently accurate to make a reliable diagnosis. Even though headaches are a common condition with poor diagnostic specificity, they have a significant negative influence on the brain, body, and general human function. In this era of deeply intertwined health and technology, machine learning (ML) has emerged as a crucial force in transforming every aspect of healthcare. Utilizing advanced facilities, ML has shown groundbreaking achievements in developing classifiers and automatic predictors. Deep learning models, in particular, have proven effective in solving complex problems spanning computer vision and data analytics. Consequently, the integration of ML in healthcare has become vital, especially in developing countries where limited medical resources and lack of awareness prevail and the need to forecast and categorize migraines using artificial intelligence (AI) is even more urgent. This study focuses on leveraging state-of-the-art ML algorithms, including support vector machine (SVM), K-nearest neighbors (KNN), random forest (RF), decision tree (DST), and deep neural networks (DNN), to predict and classify various types of migraines, training these models on a publicly available dataset with and without data augmentation. The proposed models with data augmentation were trained to classify seven types of migraine. The results show that DNN, SVM, KNN, DST, and RF achieved accuracies of 99.66%, 94.60%, 97.10%, 88.20%, and 98.50%, respectively, with data augmentation, highlighting the transformative potential of AI in enhancing migraine diagnosis.
Affiliation(s)
- Lal Khan
- Department of Computer Science, Ibadat International University Islamabad Pakpattan Campus, Pakpattan, Pakistan
| | - Moudasra Shahreen
- Department of Computer Science, Mir Chakar Khan Rind University, Sibi, Pakistan
| | - Atika Qazi
- Centre for Lifelong Learning, Universiti Brunei Darussalam, Bandar Seri Begawan, Brunei Darussalam
| | | | - Sabir Hussain
- Department of Agriculture, Mir Chakar Khan Rind University, Sibi, Pakistan
| | - Hsien-Tsung Chang
- Bachelor Program in Artificial Intelligence, Chang Gung University, Taoyuan, Taiwan.
- Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan.
- Department of Physical Medicine and Rehabilitation, Chang Gung Memorial Hospital, Taoyuan, Taiwan.
|
19
|
Chan SCC, Neves AL, Majeed A, Faisal A. Bridging the equity gap towards inclusive artificial intelligence in healthcare diagnostics. BMJ 2024; 384:q490. [PMID: 38423556 DOI: 10.1136/bmj.q490] [Indexed: 03/02/2024]
|
20
|
Lock C, Tan NSM, Long IJ, Keong NC. Neuroimaging data repositories and AI-driven healthcare-Global aspirations vs. ethical considerations in machine learning models of neurological disease. Front Artif Intell 2024; 6:1286266. [PMID: 38440234 PMCID: PMC10910099 DOI: 10.3389/frai.2023.1286266] [Received: 09/03/2023] [Accepted: 12/27/2023] [Indexed: 03/06/2024] Open
Abstract
Neuroimaging data repositories are data-rich resources comprising brain imaging with clinical and biomarker data. The potential for such repositories to transform healthcare is tremendous, especially in their capacity to support machine learning (ML) and artificial intelligence (AI) tools. Current discussions about the generalizability of such tools in healthcare provoke concerns of risk of bias: ML models underperform in women and ethnic and racial minorities. The use of ML may exacerbate existing healthcare disparities or cause post-deployment harms. Do neuroimaging data repositories and their capacity to support ML/AI-driven clinical discoveries have both the potential to accelerate innovative medicine and harden the gaps of social inequities in neuroscience-related healthcare? In this paper, we examined the ethical concerns of ML-driven modeling of global community neuroscience needs arising from the use of data amassed within neuroimaging data repositories. We explored this in two parts. Firstly, in a theoretical experiment, we argued for a South East Asian-based repository to redress global imbalances. Within this context, we then considered the ethical framework toward the inclusion vs. exclusion of the migrant worker population, a group subject to healthcare inequities. Secondly, we created a model simulating the impact of global variations in the presentation of anosmia risks in COVID-19 toward altering brain structural findings; we then performed a mini AI ethics experiment. In this experiment, we interrogated an actual pilot dataset (n = 17; 8 non-anosmic (47%) vs. 9 anosmic (53%)) using an ML clustering model. To create the COVID-19 simulation model, we bootstrapped to resample and amplify the dataset. This resulted in three hypothetical datasets: (i) matched (n = 68; 47% anosmic), (ii) predominant non-anosmic (n = 66; 73% disproportionate), and (iii) predominant anosmic (n = 66; 76% disproportionate). 
We found that the differing proportions of the same cohorts represented in each hypothetical dataset altered not only the relative importance of key features distinguishing between them but even the presence or absence of such features. The main objective of our mini experiment was to understand if ML/AI methodologies could be utilized toward modelling disproportionate datasets, in a manner we term "AI ethics." Further work is required to expand the approach proposed here into a reproducible strategy.
Affiliation(s)
- Christine Lock
- Department of Neurosurgery, National Neuroscience Institute, Singapore, Singapore
| | - Nicole Si Min Tan
- Department of Neurosurgery, National Neuroscience Institute, Singapore, Singapore
| | - Ian James Long
- Department of Neurosurgery, National Neuroscience Institute, Singapore, Singapore
| | - Nicole C. Keong
- Department of Neurosurgery, National Neuroscience Institute, Singapore, Singapore
- Duke-NUS Medical School, Singapore, Singapore
| |
|
21
|
Li A, Mullin S, Elkin PL. Improving Prediction of Survival for Extremely Premature Infants Born at 23 to 29 Weeks Gestational Age in the Neonatal Intensive Care Unit: Development and Evaluation of Machine Learning Models. JMIR Med Inform 2024; 12:e42271. [PMID: 38354033 PMCID: PMC10902770 DOI: 10.2196/42271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 02/02/2023] [Accepted: 12/28/2023] [Indexed: 03/02/2024] Open
Abstract
BACKGROUND Infants born at extremely preterm gestational ages are typically admitted to the neonatal intensive care unit (NICU) after initial resuscitation. The subsequent hospital course can be highly variable, and despite counseling aided by available risk calculators, there are significant challenges with shared decision-making regarding life support and transition to end-of-life care. Improving predictive models can help providers and families navigate these unique challenges. OBJECTIVE Machine learning methods have previously demonstrated added predictive value for determining intensive care unit outcomes, and their use allows consideration of a greater number of factors that potentially influence newborn outcomes, such as maternal characteristics. Machine learning-based models were analyzed for their ability to predict the survival of extremely preterm neonates at initial admission. METHODS Maternal and newborn information was extracted from the health records of infants born between 23 and 29 weeks of gestation in the Medical Information Mart for Intensive Care III (MIMIC-III) critical care database. Applicable machine learning models predicting survival during the initial NICU admission were developed and compared. The same type of model was also examined using only features that would be available prepartum for the purpose of survival prediction prior to an anticipated preterm birth. Features most correlated with the predicted outcome were determined when possible for each model. RESULTS Of included patients, 37 of 459 (8.1%) expired. The resulting random forest model showed higher predictive performance than the frequently used Score for Neonatal Acute Physiology With Perinatal Extension II (SNAPPE-II) NICU model when considering extremely preterm infants of very low birth weight. Several other machine learning models were found to have good performance but did not show a statistically significant difference from previously available models in this study. 
Feature importance varied by model, and those of greater importance included gestational age; birth weight; initial oxygenation level; elements of the APGAR (appearance, pulse, grimace, activity, and respiration) score; and amount of blood pressure support. Important prepartum features also included maternal age, steroid administration, and the presence of pregnancy complications. CONCLUSIONS Machine learning methods have the potential to provide robust prediction of survival in the context of extremely preterm births and allow for consideration of additional factors such as maternal clinical and socioeconomic information. Evaluation of larger, more diverse data sets may provide additional clarity on comparative performance.
Affiliation(s)
- Angie Li
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, United States
| | - Sarah Mullin
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, United States
| | - Peter L Elkin
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, United States
| |
|
22
|
Giddings R, Joseph A, Callender T, Janes SM, van der Schaar M, Sheringham J, Navani N. Factors influencing clinician and patient interaction with machine learning-based risk prediction models: a systematic review. Lancet Digit Health 2024; 6:e131-e144. [PMID: 38278615 DOI: 10.1016/s2589-7500(23)00241-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 10/20/2023] [Accepted: 11/14/2023] [Indexed: 01/28/2024]
Abstract
Machine learning (ML)-based risk prediction models hold the potential to support the health-care setting in several ways; however, use of such models is scarce. We aimed to review health-care professional (HCP) and patient perceptions of ML risk prediction models in published literature, to inform future risk prediction model development. Following database and citation searches, we identified 41 articles suitable for inclusion. Article quality varied, with qualitative studies performing strongest. Overall, perceptions of ML risk prediction models were positive. HCPs and patients considered that models have the potential to add benefit in the health-care setting. However, reservations remain; for example, concerns regarding data quality for model development and fears of unintended consequences following ML model use. We identified that public views regarding these models might be more negative than those of HCPs and that concerns (eg, extra demands on workload) were not always borne out in practice. Conclusions are tempered by the low number of patient and public studies, the absence of participant ethnic diversity, and variation in article quality. We identified gaps in knowledge (particularly views from under-represented groups) and optimum methods for model explanation and alerts, which require future research.
Affiliation(s)
- Rebecca Giddings
- Lungs for Living Research Centre, UCL Respiratory, University College London, London, UK
| | - Anabel Joseph
- Lungs for Living Research Centre, UCL Respiratory, University College London, London, UK
| | - Thomas Callender
- Lungs for Living Research Centre, UCL Respiratory, University College London, London, UK
| | - Sam M Janes
- Lungs for Living Research Centre, UCL Respiratory, University College London, London, UK
| | - Mihaela van der Schaar
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK; The Alan Turing Institute, London, UK
| | - Jessica Sheringham
- Department of Applied Health Research, University College London, London, UK
| | - Neal Navani
- Lungs for Living Research Centre, UCL Respiratory, University College London, London, UK
| |
|
23
|
Berman D, Hunter C, Hossain A, Yao J, Workman E, Guan S, Strickhart L, Beanlands R, Slater D, deKemp RA. Machine and deep learning models for accurate detection of ischemia and scar with myocardial blood flow positron emission tomography imaging. J Nucl Cardiol 2024; 32:101797. [PMID: 38185409 DOI: 10.1016/j.nuclcard.2024.101797] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/09/2024]
Abstract
BACKGROUND Quantification of myocardial blood flow (MBF) is used for the noninvasive diagnosis of patients with coronary artery disease (CAD). This study compared traditional statistics, machine learning, and deep learning techniques in their ability to diagnose disease using only the rest and stress MBF values. METHODS This study included 3245 rest and stress rubidium-82 positron emission tomography (PET) studies and matching diagnostic labels from perfusion reports. Standard logistic regression, lasso logistic regression, support vector machine, random forest, multilayer perceptron, and dense U-Net were compared for per-patient detection and per-vessel localization of scars and ischemia. RESULTS The receiver operating characteristic area under the curve (AUC) of machine learning models was significantly higher than that of traditional statistics models for per-patient detection of disease (0.92-0.95 vs. 0.87) but not for per-vessel localization of ischemia or scar. Random forest showed the highest AUC (0.95) among the models compared. On the final hold-out set for generalizability, random forest showed an AUC of 0.92 for detection and 0.89 for localization of perfusion abnormalities. CONCLUSIONS For per-vessel localization, simple models trained on segmental data performed similarly to a convolutional neural network trained on polar-map data, highlighting the need to justify the use of complex predictive algorithms through comparison with simpler methods.
Affiliation(s)
- Daniel Berman
- The MITRE Corporation, 7515 Colshire Drive, McLean, VA 22102, USA
| | - Chad Hunter
- University of Ottawa Heart Institute, 40 Ruskin Street, Ottawa, K1Y 4W7, Canada
| | - Alomgir Hossain
- University of Ottawa Heart Institute, 40 Ruskin Street, Ottawa, K1Y 4W7, Canada; The Hospital for Sick Children, 555 University Avenue, Toronto, M5G 1X8, Canada
| | - Jason Yao
- University of Ottawa Heart Institute, 40 Ruskin Street, Ottawa, K1Y 4W7, Canada
| | - Emily Workman
- The MITRE Corporation, 7515 Colshire Drive, McLean, VA 22102, USA
| | - Steven Guan
- The MITRE Corporation, 7515 Colshire Drive, McLean, VA 22102, USA
| | - Laura Strickhart
- The MITRE Corporation, 7515 Colshire Drive, McLean, VA 22102, USA
| | - Rob Beanlands
- University of Ottawa Heart Institute, 40 Ruskin Street, Ottawa, K1Y 4W7, Canada
| | - David Slater
- The MITRE Corporation, 7515 Colshire Drive, McLean, VA 22102, USA
| | - Robert A deKemp
- University of Ottawa Heart Institute, 40 Ruskin Street, Ottawa, K1Y 4W7, Canada
| |
|
24
|
K M, Syed K. Arrhythmia classification for non-experts using infinite impulse response (IIR)-filter-based machine learning and deep learning models of the electrocardiogram. PeerJ Comput Sci 2024; 10:e1774. [PMID: 38435599 PMCID: PMC10909216 DOI: 10.7717/peerj-cs.1774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 12/04/2023] [Indexed: 03/05/2024]
Abstract
Arrhythmias are a leading cause of cardiovascular morbidity and mortality. Portable electrocardiogram (ECG) monitors have been used for decades to monitor patients with arrhythmias. These monitors provide real-time data on cardiac activity to identify irregular heartbeats. However, rhythm monitoring and wave detection, especially in the 12-lead ECG, make it difficult to interpret the ECG analysis by correlating it with the condition of the patient. Moreover, even experienced practitioners find ECG analysis challenging. All of this is due to the noise in ECG readings and the frequencies at which the noise occurs. The primary objective of this research is to remove noise and extract features from ECG signals using the proposed infinite impulse response (IIR) filter, improving ECG quality so that the signal can be better understood by non-experts. For this purpose, this study used ECG signal data from the Massachusetts Institute of Technology Beth Israel Hospital (MIT-BIH) database. This allows the acquired data to be easily evaluated using machine learning (ML) and deep learning (DL) models and classified as rhythms. To achieve accurate results, we applied hyperparameter (HP)-tuning for ML classifiers and fine-tuning (FT) for DL models. This study also examined the categorization of arrhythmias using different filters and the resulting changes in accuracy. When all models were evaluated, DenseNet-121 without FT achieved 99% accuracy, while the fine-tuned model showed better results with 99.97% accuracy.
Affiliation(s)
- Mallikarjunamallu K
- School of Computer Science and Engineering, VIT-AP University, Amaravati, Andhra Pradesh, India
| | - Khasim Syed
- School of Computer Science and Engineering, VIT-AP University, Amaravati, Andhra Pradesh, India
| |
|
25
|
Pierson E. Accuracy and Equity in Clinical Risk Prediction. N Engl J Med 2024; 390:100-102. [PMID: 38198167 DOI: 10.1056/nejmp2311050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2024]
Affiliation(s)
- Emma Pierson
- From the Department of Computer Science, Cornell Tech, and the Department of Population Health Sciences, Weill Cornell Medical College - both in New York
| |
|
26
|
Rahman A, Debnath T, Kundu D, Khan MSI, Aishi AA, Sazzad S, Sayduzzaman M, Band SS. Machine learning and deep learning-based approach in smart healthcare: Recent advances, applications, challenges and opportunities. AIMS Public Health 2024; 11:58-109. [PMID: 38617415 PMCID: PMC11007421 DOI: 10.3934/publichealth.2024004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Accepted: 12/18/2023] [Indexed: 04/16/2024] Open
Abstract
In recent years, machine learning (ML) and deep learning (DL) have been the leading approaches to solving various challenges, such as disease prediction, drug discovery, and medical image analysis, in intelligent healthcare applications. Further, given the current progress in the fields of ML and DL, both show promising potential to provide support in the realm of healthcare. This study offered an exhaustive survey on ML and DL for the healthcare system, concentrating on vital state-of-the-art features, integration benefits, applications, prospects and future guidelines. To conduct the research, we searched the most prominent journal and conference databases using distinct keywords to identify relevant scholarly work. First, we concisely summarized the most recent, cutting-edge progress in ML-DL-based analysis in smart healthcare. Next, we integrated the advancement of various services for ML and DL, including ML-healthcare, DL-healthcare, and ML-DL-healthcare. We then presented ML- and DL-based applications in the healthcare industry. Finally, we highlighted open research challenges and recommendations for further studies based on our observations.
Affiliation(s)
- Anichur Rahman
- Department of CSE, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka-1350
- Department of CSE, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
| | - Tanoy Debnath
- Department of CSE, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Department of CSE, Green University of Bangladesh, 220/D, Begum Rokeya Sarani, Dhaka-1207, Bangladesh
| | - Dipanjali Kundu
- Department of CSE, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka-1350
| | - Md. Saikat Islam Khan
- Department of CSE, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
| | - Airin Afroj Aishi
- Department of Computing and Information System, Daffodil International University, Savar, Dhaka, Bangladesh
| | - Sadia Sazzad
- Department of CSE, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka-1350
| | - Mohammad Sayduzzaman
- Department of CSE, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka-1350
| | - Shahab S. Band
- Department of Information Management, International Graduate School of Artificial Intelligence, National Yunlin University of Science and Technology, Taiwan
| |
|
27
|
Schulte PJ, Goldberg JD, Oster RA, Ambrosius WT, Bonner LB, Cabral H, Carter RE, Chen Y, Desai M, Li D, Lindsell CJ, Pomann GM, Slade E, Tosteson TD, Yu F, Spratt H. Peer review of clinical and translational research manuscripts: Perspectives from statistical collaborators. J Clin Transl Sci 2024; 8:e20. [PMID: 38384899 PMCID: PMC10879991 DOI: 10.1017/cts.2023.707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 11/29/2023] [Accepted: 12/19/2023] [Indexed: 02/23/2024] Open
Abstract
Research articles in the clinical and translational science literature commonly use quantitative data to inform evaluation of interventions, learn about the etiology of disease, or develop methods for diagnostic testing or risk prediction of future events. The peer review process must evaluate the methodology used therein, including use of quantitative statistical methods. In this manuscript, we provide guidance for peer reviewers tasked with assessing quantitative methodology, intended to complement guidelines and recommendations that exist for manuscript authors. We describe components of clinical and translational science research manuscripts that require assessment including study design and hypothesis evaluation, sampling and data acquisition, interventions (for studies that include an intervention), measurement of data, statistical analysis methods, presentation of the study results, and interpretation of the study results. For each component, we describe what reviewers should look for and assess; how reviewers should provide helpful comments for fixable errors or omissions; and how reviewers should communicate uncorrectable and irreparable errors. We then discuss the critical concepts of transparency and acceptance/revision guidelines when communicating with responsible journal editors.
Affiliation(s)
- Phillip J. Schulte
- Division of Clinical Trials and Biostatistics, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
| | - Judith D. Goldberg
- Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, New York, NY, USA
| | - Robert A. Oster
- Department of Medicine, Division of Preventive Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Walter T. Ambrosius
- Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Lauren Balmert Bonner
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Howard Cabral
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Rickey E. Carter
- Department of Quantitative Health Sciences, Mayo Clinic, Jacksonville, FL, USA
| | - Ye Chen
- Biostatistics, Epidemiology and Research Design (BERD), Tufts Clinical and Translational Science Institute (CTSI), Boston, MA, USA
| | - Manisha Desai
- Quantitative Sciences Unit, Departments of Medicine, Biomedical Data Science, and Epidemiology and Population Health, Stanford University, Stanford, CA, USA
| | - Dongmei Li
- Department of Clinical and Translational Research, Obstetrics and Gynecology and Public Health Sciences, University of Rochester Medical Center, Rochester, NY, USA
| | | | - Gina-Maria Pomann
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
| | - Emily Slade
- Department of Biostatistics, University of Kentucky, Lexington, KY, USA
| | - Tor D. Tosteson
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
| | - Fang Yu
- Department of Biostatistics, University of Nebraska Medical Center, Omaha, NE, USA
| | - Heidi Spratt
- Department of Biostatistics and Data Science, School of Public and Population Health, University of Texas Medical Branch, Galveston, TX, USA
| |
|
28
|
Beam K, Sharma P, Levy P, Beam AL. Artificial intelligence in the neonatal intensive care unit: the time is now. J Perinatol 2024; 44:131-135. [PMID: 37443271 DOI: 10.1038/s41372-023-01719-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/17/2023] [Revised: 06/24/2023] [Accepted: 07/03/2023] [Indexed: 07/15/2023]
Abstract
Artificial intelligence (AI) has the potential to revolutionize neonatal intensive care unit (NICU) care by leveraging the large-scale, high-dimensional data that are generated by NICU patients. There is an emerging recognition that the confluence of technological progress, commercialization pathways, and rich data sets provides a unique opportunity for AI to make a lasting impact on the NICU. In this perspective article, we discuss four broad categories of AI applications in the NICU: imaging interpretation, prediction modeling of electronic health record data, integration of real-time monitoring data, and documentation and billing. By enhancing decision-making, streamlining processes, and improving patient outcomes, AI holds the potential to transform the quality of care for vulnerable newborns, making the excitement surrounding AI advancements well-founded and the potential for significant positive change stronger than ever before.
Affiliation(s)
- Kristyn Beam
- Department of Neonatology, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Puneet Sharma
- Division of Newborn Medicine, Department of Pediatrics, Boston Children's Hospital, Boston, MA, USA
| | - Phil Levy
- Division of Newborn Medicine, Department of Pediatrics, Boston Children's Hospital, Boston, MA, USA
| | - Andrew L Beam
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| |
|
29
|
Nieser KJ, Cochran AL. Quantifying and reducing inequity in average treatment effect estimation. BMC Med Res Methodol 2023; 23:297. [PMID: 38102563 PMCID: PMC10722685 DOI: 10.1186/s12874-023-02104-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Accepted: 11/16/2023] [Indexed: 12/17/2023] Open
Abstract
BACKGROUND Across studies of average treatment effects, some population subgroups consistently have lower representation than others, which can lead to discrepancies in how well results generalize. METHODS We develop a framework for quantifying inequity due to systemic disparities in sample representation and a method for mitigation during data analysis. Assuming subgroup treatment effects are exchangeable, an unbiased sample average treatment effect estimator will have lower mean-squared error, on average across studies, for subgroups with less representation when treatment effects vary. We present a method for estimating average treatment effects in representation-adjusted samples which enables subgroups to optimally leverage information from the full sample rather than only their own subgroup's data. Two approaches for specifying representation adjustment are offered: one minimizes average mean-squared error for each subgroup separately, and the other balances minimization of mean-squared error and equal representation. We conduct simulation studies to compare the performance of the proposed estimators to several subgroup-specific estimators. RESULTS We find that the proposed estimators generally provide lower mean-squared error, particularly for smaller subgroups, relative to the other estimators. As a case study, we apply this method to a subgroup analysis from a published study. CONCLUSIONS We recommend the use of the proposed estimators to mitigate the impact of disparities in representation, though structural change is ultimately needed.
Affiliation(s)
- Kenneth J Nieser
- Department of Population Health Sciences, University of Wisconsin-Madison, Madison, USA
| | - Amy L Cochran
- Department of Population Health Sciences, University of Wisconsin-Madison, Madison, USA
- Department of Mathematics, University of Wisconsin-Madison, Madison, USA
| |
|
30
|
Herington J, McCradden MD, Creel K, Boellaard R, Jones EC, Jha AK, Rahmim A, Scott PJH, Sunderland JJ, Wahl RL, Zuehlsdorff S, Saboury B. Ethical Considerations for Artificial Intelligence in Medical Imaging: Data Collection, Development, and Evaluation. J Nucl Med 2023; 64:1848-1854. [PMID: 37827839 PMCID: PMC10690124 DOI: 10.2967/jnumed.123.266080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Revised: 09/12/2023] [Indexed: 10/14/2023] Open
Abstract
The development of artificial intelligence (AI) within nuclear imaging involves several ethically fraught components at different stages of the machine learning pipeline, including during data collection, model training and validation, and clinical use. Drawing on the traditional principles of medical and research ethics, and highlighting the need to ensure health justice, the AI task force of the Society of Nuclear Medicine and Molecular Imaging has identified 4 major ethical risks: privacy of data subjects, data quality and model efficacy, fairness toward marginalized populations, and transparency of clinical performance. We provide preliminary recommendations to developers of AI-driven medical devices for mitigating the impact of these risks on patients and populations.
Affiliation(s)
- Jonathan Herington
- Department of Health Humanities and Bioethics and Department of Philosophy, University of Rochester, Rochester, New York
| | - Melissa D McCradden
- Department of Bioethics, Hospital for Sick Children, Toronto, and Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| | - Kathleen Creel
- Department of Philosophy and Religion and Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts
| | - Ronald Boellaard
- Department of Radiology and Nuclear Medicine, Cancer Centre Amsterdam, Amsterdam University Medical Centres, Amsterdam, The Netherlands
| | - Elizabeth C Jones
- Department of Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, Bethesda, Maryland
| | - Abhinav K Jha
- Department of Biomedical Engineering and Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, Missouri
| | - Arman Rahmim
- Departments of Radiology and Physics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Peter J H Scott
- Department of Radiology, University of Michigan Medical School, Ann Arbor, Michigan
| | - John J Sunderland
- Departments of Radiology and Physics, University of Iowa, Iowa City, Iowa
| | - Richard L Wahl
- Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, Missouri
| | | | - Babak Saboury
- Department of Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, Bethesda, Maryland
| |
|
31
|
Allareddy V, Oubaidin M, Rampa S, Venugopalan SR, Elnagar MH, Yadav S, Lee MK. Call for algorithmic fairness to mitigate amplification of racial biases in artificial intelligence models used in orthodontics and craniofacial health. Orthod Craniofac Res 2023; 26 Suppl 1:124-130. [PMID: 37846615 DOI: 10.1111/ocr.12721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/09/2023] [Indexed: 10/18/2023]
Abstract
Machine Learning (ML), a subfield of Artificial Intelligence (AI), is being increasingly used in Orthodontics and craniofacial health for predicting clinical outcomes. Current ML/AI models are prone to accentuate racial disparities. The objective of this narrative review is to provide an overview of how AI/ML models perpetuate racial biases and how we can mitigate this situation. A narrative review of articles published in the medical literature on racial biases and the use of AI/ML models was undertaken. Current AI/ML models are built on homogeneous clinical datasets that have a gross underrepresentation of historically disadvantaged demographic groups, especially the ethno-racial minorities. The consequence of such AI/ML models is that they perform poorly when deployed on ethno-racial minorities, thus further amplifying racial biases. Healthcare providers, policymakers, AI developers and all stakeholders should pay close attention to various steps in the pipeline of building AI/ML models and every effort must be made to establish algorithmic fairness to redress inequities.
Affiliation(s)
- Veerasathpurush Allareddy
- Department of Orthodontics, University of Illinois Chicago College of Dentistry, Chicago, Illinois, USA
| | - Maysaa Oubaidin
- Department of Orthodontics, University of Illinois Chicago College of Dentistry, Chicago, Illinois, USA
| | - Sankeerth Rampa
- Health Care Administration Program, School of Business, Rhode Island College, Providence, Rhode Island, USA
| | | | - Mohammed H Elnagar
- Department of Orthodontics, University of Illinois Chicago College of Dentistry, Chicago, Illinois, USA
| | - Sumit Yadav
- Department of Orthodontics, University of Nebraska Medical Center, Lincoln, Nebraska, USA
| | - Min Kyeong Lee
- Department of Orthodontics, University of Illinois Chicago College of Dentistry, Chicago, Illinois, USA
| |
|
32
|
Mei Z, Zheng D, Ge M. Informative Artifacts in AI-Assisted Care. N Engl J Med 2023; 389:10.1056/NEJMc2311525#sa2. [PMID: 38048205 DOI: 10.1056/nejmc2311525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/06/2023]
Affiliation(s)
- Zubing Mei
- Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai, China
| | - De Zheng
- Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai, China
| | - Maojun Ge
- Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai, China
| |
|
33
|
Trentham-Dietz A, Corley DA, Del Vecchio NJ, Greenlee RT, Haas JS, Hubbard RA, Hughes AE, Kim JJ, Kobrin S, Li CI, Meza R, Neslund-Dudas CM, Tiro JA. Data gaps and opportunities for modeling cancer health equity. J Natl Cancer Inst Monogr 2023; 2023:246-254. [PMID: 37947335 PMCID: PMC11009506 DOI: 10.1093/jncimonographs/lgad025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Revised: 07/12/2023] [Accepted: 08/15/2023] [Indexed: 11/12/2023] Open
Abstract
Population models of cancer reflect the overall US population by drawing on numerous existing data resources for parameter inputs and calibration targets. Models require data inputs that are appropriately representative, collected in a harmonized manner, have minimal missing or inaccurate values, and reflect adequate sample sizes. Data resource priorities for population modeling to support cancer health equity include increasing the availability of data that 1) arise from uninsured and underinsured individuals and those traditionally not included in health-care delivery studies, 2) reflect relevant exposures for groups historically and intentionally excluded across the full cancer control continuum, 3) disaggregate categories (race, ethnicity, socioeconomic status, gender, sexual orientation, etc.) and their intersections that conceal important variation in health outcomes, 4) identify specific populations of interest in clinical databases whose health outcomes have been understudied, 5) enhance health records through expanded data elements and linkage with other data types (eg, patient surveys, provider and/or facility level information, neighborhood data), 6) decrease missing and misclassified data from historically underrecognized populations, and 7) capture potential measures or effects of systemic racism and corresponding intervenable targets for change.
Affiliation(s)
- Amy Trentham-Dietz
- Department of Population Health Sciences and Carbone Cancer Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, USA
| | - Douglas A Corley
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
| | - Natalie J Del Vecchio
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | | | - Jennifer S Haas
- Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Rebecca A Hubbard
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Amy E Hughes
- Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Jane J Kim
- Department of Health Policy and Management, Center for Health Decision Science, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Sarah Kobrin
- Healthcare Delivery Research Program, Division of Cancer Control & Population Sciences, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Christopher I Li
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Rafael Meza
- Department of Integrative Oncology, British Columbia (BC) Cancer Research Institute, Vancouver, BC, Canada
| | | | - Jasmin A Tiro
- Department of Public Health Sciences, University of Chicago Biological Sciences Division, and University of Chicago Medicine Comprehensive Cancer Center, Chicago, IL, USA
| |
|
34
|
Arora A, Alderman JE, Palmer J, Ganapathi S, Laws E, McCradden MD, Oakden-Rayner L, Pfohl SR, Ghassemi M, McKay F, Treanor D, Rostamzadeh N, Mateen B, Gath J, Adebajo AO, Kuku S, Matin R, Heller K, Sapey E, Sebire NJ, Cole-Lewis H, Calvert M, Denniston A, Liu X. The value of standards for health datasets in artificial intelligence-based applications. Nat Med 2023; 29:2929-2938. [PMID: 37884627 PMCID: PMC10667100 DOI: 10.1038/s41591-023-02608-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/14/2023] [Accepted: 09/22/2023] [Indexed: 10/28/2023]
Abstract
Artificial intelligence as a medical device is increasingly being applied to healthcare for diagnosis, risk stratification and resource allocation. However, a growing body of evidence has highlighted the risk of algorithmic bias, which may perpetuate existing health inequity. This problem arises in part because of systemic inequalities in dataset curation, unequal opportunity to participate in research and inequalities of access. This study aims to explore existing standards, frameworks and best practices for ensuring adequate data diversity in health datasets. Exploring the body of existing literature and expert views is an important step towards the development of consensus-based guidelines. The study comprises two parts: a systematic review of existing standards, frameworks and best practices for healthcare datasets; and a survey and thematic analysis of stakeholder views of bias, health equity and best practices for artificial intelligence as a medical device. We found that the need for dataset diversity was well described in literature, and experts generally favored the development of a robust set of guidelines, but there were mixed views about how these could be implemented practically. The outputs of this study will be used to inform the development of standards for transparency of data diversity in health datasets (the STANDING Together initiative).
Affiliation(s)
- Anmol Arora
- School of Clinical Medicine, University of Cambridge, Cambridge, UK
| | - Joseph E Alderman
- Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- National Institute for Health and Care Research Birmingham Biomedical Research Centre, University of Birmingham, Birmingham, UK
| | - Joanne Palmer
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- National Institute for Health and Care Research Birmingham Biomedical Research Centre, University of Birmingham, Birmingham, UK
| | | | - Elinor Laws
- Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- National Institute for Health and Care Research Birmingham Biomedical Research Centre, University of Birmingham, Birmingham, UK
| | - Melissa D McCradden
- Department of Bioethics, The Hospital for Sick Children, Toronto, Ontario, Canada
- Genetics and Genome Biology, Peter Gilgan Centre for Research and Learning, Toronto, Ontario, Canada
- Dalla Lana School of Public Health, Toronto, Ontario, Canada
| | - Lauren Oakden-Rayner
- The Australian Institute for Machine Learning, University of Adelaide, Adelaide, South Australia, Australia
| | | | - Marzyeh Ghassemi
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Vector Institute, Toronto, Ontario, Canada
| | - Francis McKay
- The Ethox Centre and the Wellcome Centre for Ethics and Humanities, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Darren Treanor
- Leeds Teaching Hospitals NHS Trust, Leeds, UK
- University of Leeds, Leeds, UK
- Department of Clinical Pathology and Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden
- Center for Medical Image Science and Visualization, Linköping University, Linköping, Sweden
| | | | - Bilal Mateen
- Institute for Health Informatics, University College London, London, UK
- Wellcome Trust, London, UK
| | - Jacqui Gath
- Patient and Public Involvement and Engagement (PPIE) Group, STANDING Together, Birmingham, UK
| | - Adewole O Adebajo
- Patient and Public Involvement and Engagement (PPIE) Group, STANDING Together, Birmingham, UK
| | | | - Rubeta Matin
- Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | | | - Elizabeth Sapey
- Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- National Institute for Health and Care Research Birmingham Biomedical Research Centre, University of Birmingham, Birmingham, UK
- PIONEER, HDR UK Hub in Acute Care, Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK
| | - Neil J Sebire
- National Institute for Health and Care Research, Great Ormond Street Hospital Biomedical Research Centre, London, UK
- Great Ormond Street Institute of Child Health, University Hospital London, London, UK
| | | | - Melanie Calvert
- National Institute for Health and Care Research Birmingham Biomedical Research Centre, University of Birmingham, Birmingham, UK
- Birmingham Health Partners Centre for Regulatory Science and Innovation, University of Birmingham, Birmingham, UK
- Centre for Patient Reported Outcomes Research, Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- National Institute for Health and Care Research Applied Research Collaboration West Midlands, University of Birmingham, Birmingham, UK
- National Institute for Health and Care Research Birmingham-Oxford Blood and Transplant Research Unit in Precision Transplant and Cellular Therapeutics, University of Birmingham, Birmingham, UK
- DEMAND Hub, University of Birmingham, Birmingham, UK
- UK SPINE, University of Birmingham, Birmingham, UK
| | - Alastair Denniston
- Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- National Institute for Health and Care Research Birmingham Biomedical Research Centre, University of Birmingham, Birmingham, UK
- Birmingham Health Partners Centre for Regulatory Science and Innovation, University of Birmingham, Birmingham, UK
- National Institute for Health and Care Research Biomedical Research Centre, Moorfields Eye Hospital/University College London, London, UK
| | - Xiaoxuan Liu
- Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK.
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK.
- National Institute for Health and Care Research Birmingham Biomedical Research Centre, University of Birmingham, Birmingham, UK.
| |
|
35
|
Ghassemi M. Presentation matters for AI-generated clinical advice. Nat Hum Behav 2023; 7:1833-1835. [PMID: 37985904 DOI: 10.1038/s41562-023-01721-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Affiliation(s)
- Marzyeh Ghassemi
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Vector Institute, Toronto, Ontario, Canada.
| |
|
36
|
Hubbard RA, Pujol TA, Alhajjar E, Edoh K, Martin ML. Sources of Disparities in Surveillance Mammography Performance and Risk-Guided Recommendations for Supplemental Breast Imaging: A Simulation Study. Cancer Epidemiol Biomarkers Prev 2023; 32:1531-1541. [PMID: 37351916 PMCID: PMC10750297 DOI: 10.1158/1055-9965.epi-23-0330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2023] [Revised: 05/22/2023] [Accepted: 06/21/2023] [Indexed: 06/24/2023] Open
Abstract
BACKGROUND: Surveillance mammography is recommended for all women with a history of breast cancer. Risk-guided surveillance incorporating advanced imaging modalities based on individual risk of a second cancer could improve cancer detection. However, personalized surveillance may also amplify disparities. METHODS: In simulated populations using inputs from the Breast Cancer Surveillance Consortium (BCSC), we investigated race- and ethnicity-based disparities. Disparities were decomposed into those due to primary breast cancer and treatment characteristics, social determinants of health (SDOH) and differential error in second cancer ascertainment by modeling populations with or without variation across race and ethnicity in the distribution of these characteristics. We estimated effects of disparities on mammography performance and supplemental imaging recommendations stratified by race and ethnicity. RESULTS: In simulated cohorts based on 65,446 BCSC surveillance mammograms, when only cancer characteristics varied by race and ethnicity, mammograms for Black women had lower sensitivity compared with the overall population (64.1% vs. 71.1%). Differences between Black women and the overall population were larger when both cancer characteristics and SDOH varied by race and ethnicity (53.8% vs. 71.1%). Basing supplemental imaging recommendations on high predicted second cancer risk resulted in less frequent recommendations for Hispanic (6.7%) and Asian/Pacific Islander women (6.4%) compared with the overall population (10.0%). CONCLUSIONS: Variation in cancer characteristics and SDOH led to disparities in surveillance mammography performance and recommendations for supplemental imaging. IMPACT: Risk-guided surveillance imaging may exacerbate disparities. Decision-makers should consider implications for equity in cancer outcomes resulting from implementing risk-guided screening programs. See related In the Spotlight, p. 1479.
Affiliation(s)
- Rebecca A. Hubbard
- Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
| | | | - Elie Alhajjar
- Department of Mathematical Sciences, United States Military Academy, West Point, NY
| | - Kossi Edoh
- Department of Mathematics, North Carolina Agricultural & Technical State University, Greensboro, NC
| | - Melissa L. Martin
- Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
| |
|
37
|
Ricci Lara MA, Rodríguez Kowalczuk MV, Lisa Eliceche M, Ferraresso MG, Luna DR, Benitez SE, Mazzuoccolo LD. A dataset of skin lesion images collected in Argentina for the evaluation of AI tools in this population. Sci Data 2023; 10:712. [PMID: 37853053 PMCID: PMC10584927 DOI: 10.1038/s41597-023-02630-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 10/11/2023] [Indexed: 10/20/2023] Open
Abstract
In recent years, numerous dermatological image databases have been published to enable the development and validation of artificial intelligence-based technologies that support healthcare professionals in the diagnosis of skin diseases. However, because these datasets are confined to certain countries and the images lack accompanying demographic information, it remains unclear in which populations these models could be used. Consequently, this hinders the translation of the models to the clinical setting. This has led the scientific community to encourage the detailed and transparent reporting of the databases used for artificial intelligence developments, as well as to promote the formation of genuinely international databases that can be representative of the world population. Through this work, we seek to provide details of the processing stages of the first public database of dermoscopy and clinical images created in a hospital in Argentina. The dataset comprises 1,616 images corresponding to 1,246 unique lesions collected from 623 patients.
Affiliation(s)
- María Agustina Ricci Lara
- Departamento de Informática en Salud, Hospital Italiano de Buenos Aires, Tte. Gral. Juan Domingo Perón 4190, 1199, Ciudad Autónoma de, Buenos Aires, Argentina.
- Universidad Tecnológica Nacional, Av. Medrano 951, 1179, Ciudad Autónoma de, Buenos Aires, Argentina.
| | - María Victoria Rodríguez Kowalczuk
- Servicio de Dermatología, Hospital Italiano de Buenos Aires, Tte. Gral. Juan Domingo Perón 4190, 1199, Ciudad Autónoma de, Buenos Aires, Argentina
| | - Maite Lisa Eliceche
- Servicio de Dermatología, Hospital Italiano de Buenos Aires, Tte. Gral. Juan Domingo Perón 4190, 1199, Ciudad Autónoma de, Buenos Aires, Argentina
| | - María Guillermina Ferraresso
- Servicio de Dermatología, Hospital Italiano de Buenos Aires, Tte. Gral. Juan Domingo Perón 4190, 1199, Ciudad Autónoma de, Buenos Aires, Argentina
| | - Daniel Roberto Luna
- Departamento de Informática en Salud, Hospital Italiano de Buenos Aires, Tte. Gral. Juan Domingo Perón 4190, 1199, Ciudad Autónoma de, Buenos Aires, Argentina
- Instituto de Medicina Traslacional e Ingeniería Biomédica (IMTIB), UE de triple dependencia CONICET - Instituto Universitario del Hospital Italiano (IUHI) - Hospital Italiano (HIBA), Tte. Gral. Juan Domingo Perón 4190, 1199, Ciudad Autónoma de, Buenos Aires, Argentina
| | - Sonia Elizabeth Benitez
- Departamento de Informática en Salud, Hospital Italiano de Buenos Aires, Tte. Gral. Juan Domingo Perón 4190, 1199, Ciudad Autónoma de, Buenos Aires, Argentina
- Instituto Universitario del Hospital Italiano, Potosí 4265, 1199, Ciudad Autónoma de, Buenos Aires, Argentina
| | - Luis Daniel Mazzuoccolo
- Servicio de Dermatología, Hospital Italiano de Buenos Aires, Tte. Gral. Juan Domingo Perón 4190, 1199, Ciudad Autónoma de, Buenos Aires, Argentina
| |
|
38
|
Wahid KA, Cardenas CE, Marquez B, Netherton TJ, Kann BH, Court LE, He R, Naser MA, Moreno AC, Fuller CD, Fuentes D. Evolving Horizons in Radiotherapy Auto-Contouring: Distilling Insights, Embracing Data-Centric Frameworks, and Moving Beyond Geometric Quantification. ARXIV 2023:arXiv:2310.10867v1. [PMID: 37904737 PMCID: PMC10614971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2023]
Affiliation(s)
- Kareem A. Wahid
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Carlos E. Cardenas
- Department of Radiation Oncology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Barbara Marquez
- UT MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX, USA
- Department of Radiation Physics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Tucker J. Netherton
- Department of Radiation Physics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Benjamin H. Kann
- Department of Radiation Oncology, Brigham and Women’s Hospital, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
| | - Laurence E. Court
- Department of Radiation Physics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Renjie He
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Mohamed A. Naser
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Amy C. Moreno
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Clifton D. Fuller
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - David Fuentes
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| |
|
39
|
Teotia K, Jia Y, Woite NL, Celi LA, Matos J, Struja T. Variation in monitoring: Glucose measurement in the ICU as a case study to preempt spurious correlations. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.10.12.23296568. [PMID: 37873163 PMCID: PMC10593024 DOI: 10.1101/2023.10.12.23296568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Objective: Health inequities can be influenced by demographic factors such as race and ethnicity, proficiency in English, and biological sex. Disparities may manifest as differential likelihood of testing which correlates directly with the likelihood of an intervention to address an abnormal finding. Our retrospective observational study evaluated the presence of variation in glucose measurements in the Intensive Care Unit (ICU). Methods: Using the MIMIC-IV database (2008-2019), a single-center, academic referral hospital in Boston (USA), we identified adult patients meeting sepsis-3 criteria. Exclusion criteria were diabetic ketoacidosis, ICU length of stay under 1 day, and unknown race or ethnicity. We performed a logistic regression analysis to assess differential likelihoods of glucose measurements on day 1. A negative binomial regression was fitted to assess the frequency of subsequent glucose readings. Analyses were adjusted for relevant clinical confounders, and performed across three disparity proxy axes: race and ethnicity, sex, and English proficiency. Results: We studied 24,927 patients, of which 19.5% represented racial and ethnic minority groups, 42.4% were female, and 9.8% had limited English proficiency. No significant differences were found for glucose measurement on day 1 in the ICU. This pattern was consistent irrespective of the axis of analysis, i.e. race and ethnicity, sex, or English proficiency. Conversely, subsequent measurement frequency revealed potential disparities. Specifically, males (incidence rate ratio (IRR) 1.06, 95% confidence interval (CI) 1.01 - 1.21), patients who identify themselves as Hispanic (IRR 1.11, 95% CI 1.01 - 1.21), or Black (IRR 1.06, 95% CI 1.01 - 1.12), and patients being English proficient (IRR 1.08, 95% CI 1.01 - 1.15) had higher chances of subsequent glucose readings. Conclusion: We found disparities in ICU glucose measurements among patients with sepsis, albeit the magnitude was small. Variation in disease monitoring is a source of data bias that may lead to spurious correlations when modeling health data.
|
40
|
Charpignon ML, Byers J, Cabral S, Celi LA, Fernandes C, Gallifant J, Lough ME, Mlombwa D, Moukheiber L, Ong BA, Panitchote A, William W, Wong AKI, Nazer L. Critical Bias in Critical Care Devices. Crit Care Clin 2023; 39:795-813. [PMID: 37704341 DOI: 10.1016/j.ccc.2023.02.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
Abstract
Critical care data contain information about the most physiologically fragile patients in the hospital, who require a significant level of monitoring. However, medical devices used for patient monitoring suffer from measurement biases that have been largely underreported. This article explores sources of bias in commonly used clinical devices, including pulse oximeters, thermometers, and sphygmomanometers. Further, it provides a framework for mitigating these biases and key principles to achieve more equitable health care delivery.
Affiliation(s)
- Marie-Laure Charpignon
- Institute for Data, Systems, and Society (IDSS), E18-407A, 50 Ames Street, Cambridge, MA 02142, USA.
| | - Joseph Byers
- Respiratory Therapy, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA 02215, USA
| | - Stephanie Cabral
- Department of Medicine, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA 02215, USA
| | - Leo Anthony Celi
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA; Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Chrystinne Fernandes
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
| | - Jack Gallifant
- Imperial College London NHS Trust, St Thomas' Hospital, Westminster Bridge Road, London SE1 7EH, UK
| | - Mary E Lough
- Stanford Health Care, Stanford University, 300 Pasteur Drive, Stanford, CA 94305, USA
| | - Donald Mlombwa
- Zomba Central Hospital, 8th Avenue, Zomba, Malawi; Kamuzu College of Health Sciences, Blantyre, Malawi; St. Luke's College of Health Sciences, Chilema-Zomba, Malawi
| | - Lama Moukheiber
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, E25-330, Cambridge, MA 02139, USA
| | - Bradley Ashley Ong
- College of Medicine, University of the Philippines Manila, Calderon hall, UP College of Medicine, 547 Pedro Gil Street, Ermita Manila, Philippines
| | - Anupol Panitchote
- Faculty of Medicine, Khon Kaen University, 123 Mittraparp Highway, Muang District, Khon Kaen 40002, Thailand
| | - Wasswa William
- Mbarara University of Science and Technology, P.O. Box 1410, Mbarara, Uganda
| | - An-Kwok Ian Wong
- Duke University Medical Center, 2424 Erwin Road, Suite 1102, Hock Plaza Box 2721, Durham, NC 27710, USA
| | - Lama Nazer
- King Hussein Cancer Center, Queen Rania Street 202, Amman, Jordan
| |
|
41
|
Langlotz CP. The Future of AI and Informatics in Radiology: 10 Predictions. Radiology 2023; 309:e231114. [PMID: 37874234 PMCID: PMC10623186 DOI: 10.1148/radiol.231114] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/29/2023] [Revised: 05/16/2023] [Accepted: 05/22/2023] [Indexed: 10/25/2023]
Affiliation(s)
- Curtis P. Langlotz
- From the Departments of Radiology, Medicine, and Biomedical Data Science, Stanford University School of Medicine, 300 Pasteur Dr, Stanford, CA 94305
| |
|
42
|
Herington J, McCradden MD, Creel K, Boellaard R, Jones EC, Jha AK, Rahmim A, Scott PJH, Sunderland JJ, Wahl RL, Zuehlsdorff S, Saboury B. Ethical Considerations for Artificial Intelligence in Medical Imaging: Deployment and Governance. J Nucl Med 2023; 64:1509-1515. [PMID: 37620051 DOI: 10.2967/jnumed.123.266110] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/28/2022] [Revised: 07/11/2023] [Indexed: 08/26/2023] Open
Abstract
The deployment of artificial intelligence (AI) has the potential to make nuclear medicine and medical imaging faster, cheaper, and both more effective and more accessible. This is possible, however, only if clinicians and patients feel that these AI medical devices (AIMDs) are trustworthy. Highlighting the need to ensure health justice by fairly distributing benefits and burdens while respecting individual patients' rights, the AI Task Force of the Society of Nuclear Medicine and Molecular Imaging has identified 4 major ethical risks that arise during the deployment of AIMD: autonomy of patients and clinicians, transparency of clinical performance and limitations, fairness toward marginalized populations, and accountability of physicians and developers. We provide preliminary recommendations for governing these ethical risks to realize the promise of AIMD for patients and populations.
Affiliation(s)
- Jonathan Herington
- Department of Health Humanities and Bioethics and Department of Philosophy, University of Rochester, Rochester, New York
| | - Melissa D McCradden
- Department of Bioethics, Hospital for Sick Children, and Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| | - Kathleen Creel
- Department of Philosophy and Religion and Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts
| | - Ronald Boellaard
- Department of Radiology and Nuclear Medicine, Cancer Centre Amsterdam, Amsterdam University Medical Centres, Amsterdam, The Netherlands
| | - Elizabeth C Jones
- Department of Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, Bethesda, Maryland
| | - Abhinav K Jha
- Department of Biomedical Engineering and Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, Missouri
| | - Arman Rahmim
- Departments of Radiology and Physics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Peter J H Scott
- Department of Radiology, University of Michigan Medical School, Ann Arbor, Michigan
| | - John J Sunderland
- Departments of Radiology and Physics, University of Iowa, Iowa City, Iowa
| | - Richard L Wahl
- Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, Missouri
| | | | - Babak Saboury
- Department of Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, Bethesda, Maryland
| |
|
43
|
Singh N, Lawrence K, Richardson S, Mann DM. Centering health equity in large language model deployment. PLOS DIGITAL HEALTH 2023; 2:e0000367. [PMID: 37874780 PMCID: PMC10597518 DOI: 10.1371/journal.pdig.0000367] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 10/26/2023]
Affiliation(s)
- Nina Singh
- Department of Population Health, New York University Grossman School of Medicine, New York, New York, United States of America
| | - Katharine Lawrence
- Department of Population Health, New York University Grossman School of Medicine, New York, New York, United States of America
| | - Safiya Richardson
- Department of Population Health, New York University Grossman School of Medicine, New York, New York, United States of America
| | - Devin M. Mann
- Department of Population Health, New York University Grossman School of Medicine, New York, New York, United States of America
- Medical Center Information Technology, New York University Langone Health, New York, New York, United States of America
| |
|
44
|
Affiliation(s)
- David J Hunter
- From the Nuffield Department of Population Health (D.J.H.) and the Department of Statistics and Nuffield Department of Medicine (C.H.), University of Oxford, Oxford, and the Alan Turing Institute, London (C.H.) - both in the United Kingdom
| | - Christopher Holmes
- From the Nuffield Department of Population Health (D.J.H.) and the Department of Statistics and Nuffield Department of Medicine (C.H.), University of Oxford, Oxford, and the Alan Turing Institute, London (C.H.) - both in the United Kingdom
| |
|
45
|
McElfresh DC, Chen L, Oliva E, Joyce V, Rose S, Tamang S. A call for better validation of opioid overdose risk algorithms. J Am Med Inform Assoc 2023; 30:1741-1746. [PMID: 37428897 PMCID: PMC10531142 DOI: 10.1093/jamia/ocad110] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/23/2023] [Revised: 05/11/2023] [Accepted: 07/01/2023] [Indexed: 07/12/2023] Open
Abstract
Clinical decision support (CDS) systems powered by predictive models have the potential to improve the accuracy and efficiency of clinical decision-making. However, without sufficient validation, these systems have the potential to mislead clinicians and harm patients. This is especially true for CDS systems used by opioid prescribers and dispensers, where a flawed prediction can directly harm patients. To prevent these harms, regulators and researchers have proposed guidance for validating predictive models and CDS systems. However, this guidance is not universally followed and is not required by law. We call on CDS developers, deployers, and users to hold these systems to higher standards of clinical and technical validation. We provide a case study on two CDS systems deployed on a national scale in the United States for predicting a patient's risk of adverse opioid-related events: the Stratification Tool for Opioid Risk Mitigation (STORM), used by the Veterans Health Administration, and NarxCare, a commercial system.
Affiliation(s)
- Duncan C McElfresh
- Department of Health Policy, Stanford University, Stanford, California, USA
- Program Evaluation Resource Center, Office of Mental Health and Suicide Prevention, US Department of Veterans Affairs, Menlo Park, California, USA
| | - Lucia Chen
- Department of Health Policy, Stanford University, Stanford, California, USA
| | - Elizabeth Oliva
- Program Evaluation Resource Center, Office of Mental Health and Suicide Prevention, US Department of Veterans Affairs, Menlo Park, California, USA
| | - Vilija Joyce
- Program Evaluation Resource Center, Office of Mental Health and Suicide Prevention, US Department of Veterans Affairs, Menlo Park, California, USA
- Health Economics Resource Center, US Department of Veterans Affairs, Menlo Park, California, USA
| | - Sherri Rose
- Department of Health Policy, Stanford University, Stanford, California, USA
| | - Suzanne Tamang
- Program Evaluation Resource Center, Office of Mental Health and Suicide Prevention, US Department of Veterans Affairs, Menlo Park, California, USA
- Department of Medicine, Stanford University, Stanford, California, USA
| |
|
46
|
Muralidharan V, Burgart A, Daneshjou R, Rose S. Recommendations for the use of pediatric data in artificial intelligence and machine learning ACCEPT-AI. NPJ Digit Med 2023; 6:166. [PMID: 37673925 PMCID: PMC10482936 DOI: 10.1038/s41746-023-00898-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/26/2023] [Accepted: 08/03/2023] [Indexed: 09/08/2023] Open
Abstract
ACCEPT-AI is a framework of recommendations for the safe inclusion of pediatric data in artificial intelligence and machine learning (AI/ML) research. It has been built on fundamental ethical principles of pediatric and AI research and incorporates age, consent, assent, communication, equity, protection of data, and technological considerations. ACCEPT-AI has been designed to guide researchers, clinicians, regulators, and policymakers and can be utilized as an independent tool, or adjunctively to existing AI/ML guidelines.
Affiliation(s)
- V Muralidharan
- Department of Dermatology, Stanford University, Stanford, USA
| | - A Burgart
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, USA
| | - R Daneshjou
- Department of Dermatology, Stanford University, Stanford, USA
- Department of Biomedical Data Science, Stanford University, Stanford, USA
| | - S Rose
- Department of Health Policy, Stanford University, Stanford, USA
| |
|
47
|
Abstract
Automatic polysomnography analysis can be leveraged to shorten scoring times, reduce associated costs, and ultimately improve the overall diagnosis of sleep disorders. Multiple and diverse strategies have been attempted for implementation of this technology at scale in the routine workflow of sleep centers. The field, however, is complex and presents unsolved challenges in a number of areas. Recent developments in computer science and artificial intelligence are nevertheless closing the gap. Technological advances are also opening new pathways for expanding our current understanding of the domain and its analysis.
Affiliation(s)
- Diego Alvarez-Estevez
- Center for Information and Communications Technology Research (CITIC), Universidade da Coruña, 15071 A Coruña, Spain
| |
|
48
|
Gao Y, Sharma T, Cui Y. Addressing the Challenge of Biomedical Data Inequality: An Artificial Intelligence Perspective. Annu Rev Biomed Data Sci 2023; 6:153-171. [PMID: 37104653 PMCID: PMC10529864 DOI: 10.1146/annurev-biodatasci-020722-020704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
Artificial intelligence (AI) and other data-driven technologies hold great promise to transform healthcare and confer the predictive power essential to precision medicine. However, the existing biomedical data, which are a vital resource and foundation for developing medical AI models, do not reflect the diversity of the human population. The low representation in biomedical data has become a significant health risk for non-European populations, and the growing application of AI opens a new pathway for this health risk to manifest and amplify. Here we review the current status of biomedical data inequality and present a conceptual framework for understanding its impacts on machine learning. We also discuss the recent advances in algorithmic interventions for mitigating health disparities arising from biomedical data inequality. Finally, we briefly discuss the newly identified disparity in data quality among ethnic groups and its potential impacts on machine learning.
Affiliation(s)
- Yan Gao
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
| | - Teena Sharma
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
| | - Yan Cui
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
| |
|
49
|
Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, Scales N, Tanwani A, Cole-Lewis H, Pfohl S, Payne P, Seneviratne M, Gamble P, Kelly C, Babiker A, Schärli N, Chowdhery A, Mansfield P, Demner-Fushman D, Agüera Y Arcas B, Webster D, Corrado GS, Matias Y, Chou K, Gottweis J, Tomasev N, Liu Y, Rajkomar A, Barral J, Semturs C, Karthikesalingam A, Natarajan V. Large language models encode clinical knowledge. Nature 2023; 620:172-180. [PMID: 37438534 PMCID: PMC10396962 DOI: 10.1038/s41586-023-06291-2] [Citation(s) in RCA: 244] [Impact Index Per Article: 244.0] [Received: 01/25/2023] [Accepted: 06/05/2023] [Indexed: 07/14/2023]
Abstract
Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes including factuality, comprehension, reasoning, possible harm and bias. In addition, we evaluate Pathways Language Model (PaLM, a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA and Measuring Massive Multitask Language Understanding (MMLU) clinical topics), including 67.6% accuracy on MedQA (US Medical Licensing Exam-style questions), surpassing the prior state of the art by more than 17%. However, human evaluation reveals key gaps. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, knowledge recall and reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.
Affiliation(s)
- Tao Tu
- Google Research, Mountain View, CA, USA
| | - Jason Wei
- Google Research, Mountain View, CA, USA
| | - Yun Liu
- Google Research, Mountain View, CA, USA
| |
|
50
|
Polevikov S. Advancing AI in healthcare: A comprehensive review of best practices. Clin Chim Acta 2023; 548:117519. [PMID: 37595864 DOI: 10.1016/j.cca.2023.117519] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/02/2023] [Revised: 08/14/2023] [Accepted: 08/15/2023] [Indexed: 08/20/2023]
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) are powerful tools shaping the healthcare sector. This review considers twelve key aspects of AI in clinical practice: 1) Ethical AI; 2) Explainable AI; 3) Health Equity and Bias in AI; 4) Sponsorship Bias; 5) Data Privacy; 6) Genomics and Privacy; 7) Insufficient Sample Size and Self-Serving Bias; 8) Bridging the Gap Between Training Datasets and Real-World Scenarios; 9) Open Source and Collaborative Development; 10) Dataset Bias and Synthetic Data; 11) Measurement Bias; 12) Reproducibility in AI Research. These categories represent both the challenges and opportunities of AI implementation in healthcare. While AI holds significant potential for improving patient care, it also presents risks and challenges, such as ensuring privacy, combating bias, and maintaining transparency and ethics. The review underscores the necessity of developing comprehensive best practices for healthcare organizations and fostering a diverse dialogue involving data scientists, clinicians, patient advocates, ethicists, economists, and policymakers. We are at the precipice of significant transformation in healthcare powered by AI. By continuing to reassess and refine our approach, we can ensure that AI is implemented responsibly and ethically, maximizing its benefit to patient care and public health.
|