1. Butler E, Spirtos M, O' Keeffe LM, Clarke M. Predicting 5-year-olds mental health at birth: development and internal validation of a multivariable model using the prospective ELFE birth cohort. Eur Child Adolesc Psychiatry 2025. PMID: 40369286. DOI: 10.1007/s00787-025-02730-9.
Abstract
We developed and internally validated a multivariable model, intended for use in the perinatal period, to predict 5-year-olds' mental health, using the ELFE prospective French multicentre birth cohort (n=9768). Twenty-six candidate predictors were used, spanning pre-pregnancy maternal health, pregnancy-specific experiences, birth factors and sociodemographic risk (maternal age, education, relationship, migrancy and family income). The outcome was the Strengths and Difficulties Questionnaire total score at 5 years, dichotomised at the recommended cut-off (16). Least Absolute Shrinkage and Selection Operator (LASSO) regression followed by bootstrapping was used. High and low risk were classified using a risk-threshold score of ≥8%. Stability of the model at the population and individual level, and model performance across groups of interest (sex, sociodemographic risk and neonatal intensive care admissions), were also examined. Ten variables (total number of pregnancy-specific experiences, sociodemographic risk, maternal pre-existing hypertension and psychological difficulties, gravidity, maternal mental health problems in a previous pregnancy, smoking and alcohol use in the current pregnancy, how labour started and infant sex) predicted mental health, with a C-statistic of 0.67 (95% CI 0.64-0.69). The positive and negative predictive values were 12% and 95.4% respectively, with 78.8% of children correctly classified. Model performance was similar across groups of interest but increased for children (born ≥33 weeks' gestation) with neonatal admissions (AUC 0.78; 95% CI 0.69-0.87). This model is most useful for identifying low-risk children. Applying it in a tiered preventative intervention framework could be beneficial, with those predicted to be high-risk receiving further screening to determine the level of intervention required. External validation and implementation research are required before considering its use in practice.
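As an illustrative aside, the development-and-validation recipe summarised above (LASSO-penalised selection, bootstrap-based internal validation, classification at an 8% predicted-risk threshold) can be sketched roughly as below; all data, variable counts and settings are hypothetical, and this is not the authors' actual pipeline.

```python
# Illustrative sketch (not the authors' code): LASSO-penalised logistic regression
# for a binary outcome, bootstrap optimism estimation, and an 8% risk threshold.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, p = 2000, 26                      # hypothetical sample with 26 candidate predictors
X = rng.normal(size=(n, p))
y = rng.binomial(1, 0.13, size=n)    # ~13% prevalence above the cut-off (assumed)

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso.fit(X, y)
apparent_auc = roc_auc_score(y, lasso.predict_proba(X)[:, 1])

# Bootstrap optimism correction: refit in each resample, evaluate on the original data.
optimism = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    m = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X[idx], y[idx])
    boot_auc = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
    test_auc = roc_auc_score(y, m.predict_proba(X)[:, 1])
    optimism.append(boot_auc - test_auc)
corrected_auc = apparent_auc - np.mean(optimism)

# Classify children as high risk when predicted risk >= 8%.
high_risk = lasso.predict_proba(X)[:, 1] >= 0.08
print(f"optimism-corrected AUC: {corrected_auc:.2f}, flagged high risk: {high_risk.mean():.1%}")
```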
Affiliation(s)
- Emma Butler: Department of Population Health, Royal College of Surgeons Ireland, Dublin, Ireland
- Michelle Spirtos: Department of Occupational Therapy, Trinity College Dublin, Dublin, Ireland
- Linda M O' Keeffe: School of Public Health, University College Cork, Cork, Ireland; MRC Integrative Epidemiology Unit and Population Health Sciences, University of Bristol, Bristol, UK
- Mary Clarke: Department of Psychology, School of Population Health & Department of Psychiatry, Royal College of Surgeons Ireland, Dublin, Ireland
2. Tadepalli K, Das A, Meena T, Roy S. Bridging gaps in artificial intelligence adoption for maternal-fetal and obstetric care: Unveiling transformative capabilities and challenges. Comput Methods Programs Biomed 2025; 263:108682. PMID: 40023965. DOI: 10.1016/j.cmpb.2025.108682.
Abstract
PURPOSE This review aims to comprehensively explore the application of Artificial Intelligence (AI) to an area that has not traditionally been explored in depth: the continuum of maternal-fetal health. The intent was to examine this physiologically continuous spectrum of mother and child health, to highlight potential pitfalls, and to suggest solutions. METHOD A systematic search identified studies employing AI techniques for prediction, diagnosis, and decision support, using modalities such as imaging, electrophysiological signals and electronic health records, in the domain of obstetrics and fetal health. In the selected articles, AI applications in fetal morphology, gestational age assessment, congenital defect detection, fetal monitoring, placental analysis, and maternal physiological monitoring were then critically examined from both the domain and the artificial intelligence perspective. RESULT AI-driven solutions demonstrate promising capabilities in medical diagnostics and risk prediction, offering automation, improved accuracy, and the potential for personalized medicine. However, challenges regarding data availability, algorithmic transparency, and ethical considerations must be overcome to ensure responsible and effective clinical implementation. These challenges must be urgently addressed so that a domain as critical to public health as obstetrics and fetal health can fully benefit from the strides made in artificial intelligence. CONCLUSION Open access to relevant datasets is crucial for equitable progress in this critical public health domain. Integrating responsible and explainable AI, while addressing ethical considerations, is essential to maximize the public health benefits of AI-driven solutions in maternal-fetal care.
Affiliation(s)
- Kalyan Tadepalli: Sir HN Reliance Foundation Hospital, Girgaon, Mumbai, 400004, India; Artificial Intelligence & Data Science, Jio Institute, Navi Mumbai, 410206, India
- Abhijit Das: Artificial Intelligence & Data Science, Jio Institute, Navi Mumbai, 410206, India
- Tanushree Meena: Artificial Intelligence & Data Science, Jio Institute, Navi Mumbai, 410206, India
- Sudipta Roy: Artificial Intelligence & Data Science, Jio Institute, Navi Mumbai, 410206, India
3. Marko JGO, Neagu CD, Anand PB. Examining inclusivity: the use of AI and diverse populations in health and social care: a systematic review. BMC Med Inform Decis Mak 2025; 25:57. PMID: 39910518. PMCID: PMC11796235. DOI: 10.1186/s12911-025-02884-1.
Abstract
BACKGROUND Artificial intelligence (AI)-based systems are being rapidly integrated into the fields of health and social care. Although such systems can substantially improve the provision of care, diverse and marginalized populations are often incorrectly or insufficiently represented within these systems. This review aims to assess the influence of AI on health and social care among these populations, particularly with regard to issues related to inclusivity and regulatory concerns. METHODS We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Six leading databases were searched, and 129 articles were selected for this review in line with predefined eligibility criteria. RESULTS This research revealed disparities in AI outcomes, accessibility, and representation among diverse groups due to biased data sources and a lack of representation in training datasets, which can potentially exacerbate inequalities in care delivery for marginalized communities. CONCLUSION AI development practices, legal frameworks, and policies must be reformulated to ensure that AI is applied in an equitable manner. A holistic approach must be used to address disparities, enforce effective regulations, safeguard privacy, promote inclusion and equity, and emphasize rigorous validation.
Affiliation(s)
- John Gabriel O Marko: Faculty of Engineering and Digital Technology, University of Bradford, Bradford, UK
- Ciprian Daniel Neagu: Faculty of Engineering and Digital Technology, University of Bradford, Bradford, UK
- P B Anand: Faculty of Management, Law and Social Sciences, University of Bradford, Bradford, UK
4. Galanter N, Carone M, Kessler RC, Luedtke A. Can the potential benefit of individualizing treatment be assessed using trial summary statistics alone? Am J Epidemiol 2024; 193:1161-1167. PMID: 38679458. PMCID: PMC11299035. DOI: 10.1093/aje/kwae040.
Abstract
Individualizing treatment assignment can improve outcomes for diseases with patient-to-patient variability in comparative treatment effects. When a clinical trial demonstrates that some patients improve on treatment while others do not, it is tempting to assume that treatment effect heterogeneity exists. However, if outcome variability is mainly driven by factors other than variability in the treatment effect, investigating the extent to which covariate data can predict differential treatment response is a potential waste of resources. Motivated by recent meta-analyses assessing the potential of individualizing treatment for major depressive disorder using only summary statistics, we provide a method that uses summary statistics widely available in published clinical trial results to bound the benefit of optimally assigning treatment to each patient. We also offer alternate bounds for settings in which trial results are stratified by another covariate. Our upper bounds can be especially informative when they are small, as there is then little benefit to collecting additional covariate data. We demonstrate our approach using summary statistics from a depression treatment trial. Our methods are implemented in the rct2otrbounds R package.
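As a rough illustration of the general idea of bounding the benefit of individualization using arm-level summary statistics alone, the sketch below computes one crude but mathematically valid bound. It is not the sharper bound derived in the paper, nor the rct2otrbounds API, and the trial numbers are invented.

```python
# Illustrative sketch: a crude upper bound, from arm-level summary statistics only,
# on the gain of an oracle individualized rule over the better uniform treatment.
# It uses E[max(Y1, Y0)] = (mu1 + mu0)/2 + E[|Y1 - Y0|]/2 together with
# E[|Y1 - Y0|] <= sqrt((mu1 - mu0)^2 + (sd1 + sd0)^2). This is NOT the exact bound
# from Galanter et al. or the rct2otrbounds package, only the same flavour of idea.
import math

def crude_individualization_bound(mu1: float, sd1: float, mu0: float, sd0: float) -> float:
    """Upper bound on E[max(Y1, Y0)] - max(mu1, mu0), assuming larger outcomes are better."""
    e_max_upper = (mu1 + mu0) / 2 + 0.5 * math.sqrt((mu1 - mu0) ** 2 + (sd1 + sd0) ** 2)
    return e_max_upper - max(mu1, mu0)

# Hypothetical depression-trial summaries (e.g., symptom-improvement scores per arm).
print(crude_individualization_bound(mu1=8.0, sd1=6.0, mu0=6.5, sd0=6.2))
```

When this bound is small, there is little to gain from collecting covariate data to tailor treatment, which is the practical use of such bounds highlighted in the abstract.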
Affiliation(s)
- Nina Galanter: Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA 98195, United States
- Marco Carone: Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA 98195, United States
- Ronald C Kessler: Department of Health Care Policy, Harvard Medical School, Boston, MA 02115, United States
- Alex Luedtke: Department of Statistics, University of Washington, Seattle, WA 98195, United States
5. Yang Y, Lin M, Zhao H, Peng Y, Huang F, Lu Z. A survey of recent methods for addressing AI fairness and bias in biomedicine. J Biomed Inform 2024; 154:104646. PMID: 38677633. PMCID: PMC11129918. DOI: 10.1016/j.jbi.2024.104646.
Abstract
OBJECTIVES Artificial intelligence (AI) systems have the potential to revolutionize clinical practices, including improving diagnostic accuracy and surgical decision-making, while also reducing costs and manpower. However, it is important to recognize that these systems may perpetuate social inequities or demonstrate biases, such as those based on race or gender. Such biases can occur before, during, or after the development of AI models, making it critical to understand and address them to enable the accurate and reliable application of AI models in clinical settings. To mitigate bias concerns during model development, we surveyed recent publications on debiasing methods in the fields of biomedical natural language processing (NLP) and computer vision (CV), and discuss methods, such as data perturbation and adversarial learning, that have been applied in the biomedical domain to address bias. METHODS We searched PubMed, the ACM Digital Library, and IEEE Xplore for relevant articles published between January 2018 and December 2023 using multiple combinations of keywords. We then automatically filtered the resulting 10,041 articles with loose constraints and manually inspected the abstracts of the remaining 890 articles to identify the 55 articles included in this review; additional articles cited in their references are also included. We discuss each method and compare its strengths and weaknesses. Finally, we review other potential methods from the general domain that could be applied to biomedicine to address bias and improve fairness. RESULTS The bias of AI in biomedicine can originate from multiple sources, such as insufficient data, sampling bias, and the use of health-irrelevant features or race-adjusted algorithms. Existing debiasing methods that focus on algorithms can be categorized as distributional or algorithmic. Distributional methods include data augmentation, data perturbation, data reweighting, and federated learning. Algorithmic approaches include unsupervised representation learning, adversarial learning, disentangled representation learning, loss-based methods, and causality-based methods.
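As a toy illustration of one of the distributional methods listed above (data reweighting), the sketch below reweights training examples by the inverse frequency of their (group, label) cell so that an under-represented subgroup is not drowned out during fitting; the column names and data are hypothetical.

```python
# Toy sketch of inverse-frequency reweighting, one distributional debiasing approach
# from the family surveyed above. Group, feature, and label columns are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": rng.choice(["A", "B"], size=5000, p=[0.85, 0.15]),  # imbalanced subgroups
    "x1": rng.normal(size=5000),
    "x2": rng.normal(size=5000),
    "label": rng.binomial(1, 0.3, size=5000),
})

# Weight each example by the inverse frequency of its (group, label) cell.
cell_counts = df.groupby(["group", "label"])["x1"].transform("size")
weights = len(df) / (df.groupby(["group", "label"]).ngroups * cell_counts)

model = LogisticRegression()
model.fit(df[["x1", "x2"]], df["label"], sample_weight=weights)
```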
Affiliation(s)
- Yifan Yang: National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA; Department of Computer Science, University of Maryland, College Park, USA
- Mingquan Lin: Department of Population Health Sciences, Weill Cornell Medicine, NY, USA
- Han Zhao: Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Yifan Peng: Department of Population Health Sciences, Weill Cornell Medicine, NY, USA
- Furong Huang: Department of Computer Science, University of Maryland, College Park, USA
- Zhiyong Lu: National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA
6. Schaekermann M, Spitz T, Pyles M, Cole-Lewis H, Wulczyn E, Pfohl SR, Martin D, Jaroensri R, Keeling G, Liu Y, Farquhar S, Xue Q, Lester J, Hughes C, Strachan P, Tan F, Bui P, Mermel CH, Peng LH, Matias Y, Corrado GS, Webster DR, Virmani S, Semturs C, Liu Y, Horn I, Cameron Chen PH. Health equity assessment of machine learning performance (HEAL): a framework and dermatology AI model case study. EClinicalMedicine 2024; 70:102479. PMID: 38685924. PMCID: PMC11056401. DOI: 10.1016/j.eclinm.2024.102479.
Abstract
Background Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study. Methods Here, we propose a methodology to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes, which is complementary to existing fairness metrics. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework, designed to quantitatively assess the performance equity of health AI technologies via a four-step interdisciplinary process to understand and quantify domain-specific criteria, and the resulting HEAL metric. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. A set of 5420 teledermatology cases (store-and-forward cases from patients of 20 years or older, submitted from primary care providers in the USA and skin cancer clinics in Australia), enriched for diversity in age, sex and race/ethnicity, was used to retrospectively evaluate the AI model's HEAL metric, defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes as compared to others. The likelihood that AI performance was anticorrelated with pre-existing health outcomes was estimated using bootstrap methods as the probability that the negated Spearman's rank correlation coefficient (i.e., "R") was greater than zero. Positive values of R suggest that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as p(R > 0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes as compared to others (presented as a percentage below). Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case. Findings Across all dermatologic conditions, the HEAL metric was 80.5% for prioritising AI performance of racial/ethnic subpopulations based on YLLs, and 92.1% and 0.0% respectively for prioritising AI performance of sex and age subpopulations based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared to a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritising AI performance of age subpopulations based on DALYs. Interpretation Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance for race/ethnicity, sex (all conditions) and age (cancer conditions) subpopulations with respect to pre-existing health disparities. More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions and to better understand how AI models can contribute towards improving equity in health outcomes. Funding Google LLC.
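The abstract's description of the HEAL metric suggests a simple computational skeleton, sketched below with invented subgroup numbers: bootstrap subgroup performance, negate the Spearman correlation with the subgroup health-outcome burden, and report the proportion of bootstrap replicates in which it is positive. This is only a reading of the abstract, not the authors' released code, and all values are hypothetical.

```python
# Sketch of the HEAL-metric idea as described in the abstract: estimate, by bootstrap,
# the probability that the negated Spearman correlation R between subgroup health
# burden and subgroup AI performance is positive, i.e., HEAL = p(R > 0).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
burden = np.array([310.0, 245.0, 180.0, 120.0])      # e.g., YLLs per subgroup (hypothetical)
n_cases = np.array([800, 900, 1000, 700])            # cases per subgroup (hypothetical)
accuracy = np.array([0.74, 0.71, 0.69, 0.65])        # top-3 agreement per subgroup (hypothetical)
correct = [rng.binomial(1, a, size=n) for a, n in zip(accuracy, n_cases)]

n_boot, positives = 2000, 0
for _ in range(n_boot):
    # Resample case-level correctness within each subgroup, recompute performance.
    perf = [np.mean(rng.choice(c, size=len(c), replace=True)) for c in correct]
    r, _ = spearmanr(burden, perf)
    positives += (-r > 0)

print(f"HEAL metric, p(R > 0): {positives / n_boot:.1%}")
```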
Affiliation(s)
- Malcolm Pyles: Advanced Clinical, Deerfield, IL, USA; Department of Dermatology, Cleveland Clinic, Cleveland, OH, USA
- Yuan Liu: Google Health, Mountain View, CA, USA
- Jenna Lester: Advanced Clinical, Deerfield, IL, USA; Department of Dermatology, University of California, San Francisco, CA, USA
- Peggy Bui: Google Health, Mountain View, CA, USA
- Yun Liu: Google Health, Mountain View, CA, USA
- Ivor Horn: Google Health, Mountain View, CA, USA
7. Riley RD, Collins GS. Stability of clinical prediction models developed using statistical or machine learning methods. Biom J 2023; 65:e2200302. PMID: 37466257. PMCID: PMC10952221. DOI: 10.1002/bimj.202200302.
Abstract
Clinical prediction models estimate an individual's risk of a particular health outcome. A developed model is a consequence of the development dataset and model-building strategy, including the sample size, number of predictors, and analysis method (e.g., regression or machine learning). We raise the concern that many models are developed using small datasets that lead to instability in the model and its predictions (estimated risks). We define four levels of model stability in estimated risks, moving from the overall mean to the individual level. Through simulation and case studies of statistical and machine learning approaches, we show that instability in a model's estimated risks is often considerable, and ultimately manifests as miscalibration of predictions in new data. We therefore recommend that researchers always examine instability at the model development stage, and we propose instability plots and measures for doing so. This entails repeating the model-building steps (those used to develop the original prediction model) in each of multiple (e.g., 1000) bootstrap samples to produce multiple bootstrap models, and deriving (i) a prediction instability plot of bootstrap model versus original model predictions; (ii) the mean absolute prediction error (mean absolute difference between individuals' original and bootstrap model predictions); and (iii) calibration, classification, and decision curve instability plots of bootstrap models applied in the original sample. A case study illustrates how these instability assessments help indicate whether model predictions are likely to be reliable, while informing a model's critical appraisal (risk of bias rating), fairness, and further validation requirements.
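The instability assessment described here lends itself to a short sketch: refit the model-building strategy in bootstrap samples, apply each bootstrap model to the original individuals, and summarise the spread via a prediction instability plot and the mean absolute prediction error. The snippet below is a minimal sketch on simulated data, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of a prediction instability plot and the
# mean absolute prediction error (MAPE) across bootstrap models.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 300, 8                                    # deliberately small: instability shows up
X = rng.normal(size=(n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])), size=n)

original = LogisticRegression().fit(X, y)
orig_risk = original.predict_proba(X)[:, 1]

boot_risks = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    m = LogisticRegression().fit(X[idx], y[idx])   # repeat the model-building steps
    boot_risks.append(m.predict_proba(X)[:, 1])    # apply to the original individuals
boot_risks = np.array(boot_risks)

mape = np.mean(np.abs(boot_risks - orig_risk))     # mean absolute prediction error
plt.scatter(np.tile(orig_risk, 200), boot_risks.ravel(), s=2, alpha=0.1)
plt.xlabel("Original model risk"); plt.ylabel("Bootstrap model risk")
plt.title(f"Prediction instability plot (MAPE = {mape:.3f})")
plt.show()
```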
Affiliation(s)
- Richard D. Riley: Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Gary S. Collins: Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
8. Schaap G, Bosse T, Hendriks Vettehen P. The ABC of algorithmic aversion: not agent, but benefits and control determine the acceptance of automated decision-making. AI & Society 2023. DOI: 10.1007/s00146-023-01649-6.
Abstract
While algorithmic decision-making (ADM) is projected to increase exponentially in the coming decades, the academic debate on whether people are ready to accept, trust, and use ADM as opposed to human decision-making is ongoing. The current research aims at reconciling conflicting findings on 'algorithmic aversion' in the literature. It does so by investigating algorithmic aversion while controlling for two important characteristics that are often associated with ADM: increased benefits (monetary and accuracy) and decreased user control. Across three high-powered (total N = 1192), preregistered 2 (agent: algorithm/human) × 2 (benefits: high/low) × 2 (control: user control/no control) between-subjects experiments, and two domains (finance and dating), the results were quite consistent: there is little evidence for a default aversion against algorithms and in favor of human decision makers. Instead, users accept or reject decisions and decisional agents based on their predicted benefits and the ability to exercise control over the decision.
9. Grote T, Keeling G. Enabling Fairness in Healthcare Through Machine Learning. Ethics Inf Technol 2022; 24:39. PMID: 36060496. PMCID: PMC9428374. DOI: 10.1007/s10676-022-09658-7.
Abstract
The use of machine learning systems for decision-support in healthcare may exacerbate health inequalities. However, recent work suggests that algorithms trained on sufficiently diverse datasets could in principle combat health inequalities. One concern about these algorithms is that their performance for patients in traditionally disadvantaged groups exceeds their performance for patients in traditionally advantaged groups. This renders the algorithmic decisions unfair relative to the standard fairness metrics in machine learning. In this paper, we defend the permissible use of affirmative algorithms; that is, algorithms trained on diverse datasets that perform better for traditionally disadvantaged groups. Whilst such algorithmic decisions may be unfair, the fairness of algorithmic decisions is not the appropriate locus of moral evaluation. What matters is the fairness of final decisions, such as diagnoses, resulting from collaboration between clinicians and algorithms. We argue that affirmative algorithms can permissibly be deployed provided the resultant final decisions are fair.
Affiliation(s)
- Thomas Grote: Ethics and Philosophy Lab, Cluster of Excellence "Machine Learning: New Perspectives for Science", University of Tübingen, Maria von Linden Str. 6, D-72076 Tübingen, Germany
- Geoff Keeling: Institute for Human-Centered AI and McCoy Family Center for Ethics in Society, Stanford University, 450 Serra Mall, Stanford, CA 94305, USA