1. Kim C, Gadgil SU, DeGrave AJ, Omiye JA, Cai ZR, Daneshjou R, Lee SI. Transparent medical image AI via an image-text foundation model grounded in medical literature. Nat Med 2024; 30:1154-1165. PMID: 38627560. DOI: 10.1038/s41591-024-02887-x.
Abstract
Building trustworthy and transparent image-based medical artificial intelligence (AI) systems requires the ability to interrogate data and models at all stages of the development pipeline, from training models to post-deployment monitoring. Ideally, the data and associated AI systems could be described using terms already familiar to physicians, but this requires medical datasets densely annotated with semantically meaningful concepts. In the present study, we present a foundation model approach, named MONET (medical concept retriever), which learns how to connect medical images with text and densely scores images on concept presence to enable important tasks in medical AI development and deployment such as data auditing, model auditing and model interpretation. Dermatology provides a demanding use case for the versatility of MONET, due to the heterogeneity in diseases, skin tones and imaging modalities. We trained MONET based on 105,550 dermatological images paired with natural language descriptions from a large collection of medical literature. MONET can accurately annotate concepts across dermatology images as verified by board-certified dermatologists, competitively with supervised models built on previously concept-annotated dermatology datasets of clinical images. We demonstrate how MONET enables AI transparency across the entire AI system development pipeline, from building inherently interpretable models to dataset and model auditing, including a case study dissecting the results of an AI clinical trial.
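MONET scores concept presence by comparing images and concept text in a shared embedding space, in the style of CLIP-like image-text models. A minimal sketch of that scoring step, assuming the embeddings have already been produced by trained encoders (the vectors and concept names below are hypothetical, purely for illustration):

```python
# Score named concepts against one image by cosine similarity between the
# image embedding and each concept's text embedding. Illustrative only:
# the real model computes these embeddings with trained image/text encoders.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def concept_scores(image_emb, concept_embs):
    """Map each concept name to its similarity with the image embedding."""
    return {name: cosine(image_emb, emb) for name, emb in concept_embs.items()}
```

Densely annotating a dataset then amounts to running this scoring over every image and every concept prompt, which is what enables the auditing and interpretation tasks the abstract describes.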
Affiliation(s)
- Chanwoo Kim: Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
- Soham U Gadgil: Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
- Alex J DeGrave: Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA; Medical Scientist Training Program, University of Washington, Seattle, WA, USA
- Jesutofunmi A Omiye: Department of Dermatology, Stanford School of Medicine, Stanford, CA, USA; Department of Biomedical Data Science, Stanford School of Medicine, Stanford, CA, USA
- Zhuo Ran Cai: Program for Clinical Research and Technology, Stanford University, Stanford, CA, USA
- Roxana Daneshjou: Department of Dermatology, Stanford School of Medicine, Stanford, CA, USA; Department of Biomedical Data Science, Stanford School of Medicine, Stanford, CA, USA
- Su-In Lee: Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
2. Bosschieter TM, Xu Z, Lan H, Lengerich BJ, Nori H, Painter I, Souter V, Caruana R. Interpretable Predictive Models to Understand Risk Factors for Maternal and Fetal Outcomes. J Healthc Inform Res 2024; 8:65-87. PMID: 38273984. PMCID: PMC10805688. DOI: 10.1007/s41666-023-00151-4.
Abstract
Although most pregnancies result in a good outcome, complications are not uncommon and can be associated with serious implications for mothers and babies. Predictive modeling has the potential to improve outcomes through a better understanding of risk factors, heightened surveillance for high-risk patients, and more timely and appropriate interventions, thereby helping obstetricians deliver better care. We identify and study the most important risk factors for four types of pregnancy complications: (i) severe maternal morbidity, (ii) shoulder dystocia, (iii) preterm preeclampsia, and (iv) antepartum stillbirth. We use an Explainable Boosting Machine (EBM), a high-accuracy glass-box learning method, for the prediction and identification of important risk factors. We undertake external validation and perform an extensive robustness analysis of the EBM models. EBMs match the accuracy of other black-box ML methods, such as deep neural networks and random forests, and outperform logistic regression, while being more interpretable. EBMs prove to be robust. The interpretability of the EBM models reveals surprising insights into the features contributing to risk (e.g., maternal height is the second most important feature for shoulder dystocia) and may have potential for clinical application in the prediction and prevention of serious complications in pregnancy.
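The EBM referenced above is additive: each feature gets its own "shape function", learned a little at a time by cyclic, feature-at-a-time boosting, so per-feature risk contributions can be read off directly. A toy regression sketch of that training loop (equal-width binning, squared loss; the production EBM in the open-source `interpret` package adds pairwise interactions, bagging, and classification):

```python
# Toy sketch of cyclic feature-wise boosting behind EBMs: learn one additive
# shape function per feature by repeatedly nudging each feature's binned
# values toward the current residuals. Illustrative only.

def fit_ebm_sketch(X, y, n_bins=8, rounds=200, lr=0.1):
    n, d = len(X), len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]

    def bin_of(j, v):
        # Equal-width binning of feature j's value v.
        if hi[j] == lo[j]:
            return 0
        return min(int((v - lo[j]) / (hi[j] - lo[j]) * n_bins), n_bins - 1)

    shape = [[0.0] * n_bins for _ in range(d)]  # per-feature shape functions
    intercept = sum(y) / n
    pred = [intercept] * n
    for _ in range(rounds):
        for j in range(d):  # cycle through features, one small step each
            resid_sum, count = [0.0] * n_bins, [0] * n_bins
            for i in range(n):
                b = bin_of(j, X[i][j])
                resid_sum[b] += y[i] - pred[i]
                count[b] += 1
            for b in range(n_bins):
                if count[b]:  # step toward the mean residual in each bin
                    shape[j][b] += lr * resid_sum[b] / count[b]
            for i in range(n):  # refresh predictions from the additive model
                pred[i] = intercept + sum(
                    shape[k][bin_of(k, X[i][k])] for k in range(d))
    return intercept, shape, bin_of
```

Plotting `shape[j]` against the bin edges is exactly the kind of inspection that surfaced findings like the maternal-height effect on shoulder dystocia.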
Affiliation(s)
- Zifei Xu: Stanford University, Stanford, CA, USA
- Hui Lan: Stanford University, Stanford, CA, USA
- Ian Painter: Foundation for Healthcare Quality, Seattle, WA, USA
3. Kore A, Abbasi Bavil E, Subasri V, Abdalla M, Fine B, Dolatabadi E, Abdalla M. Empirical data drift detection experiments on real-world medical imaging data. Nat Commun 2024; 15:1887. PMID: 38424096. PMCID: PMC10904813. DOI: 10.1038/s41467-024-46142-w.
Abstract
While it is common to monitor deployed clinical artificial intelligence (AI) models for performance degradation, it is less common for the input data to be monitored for data drift, i.e., systemic changes to input distributions. However, when real-time evaluation may not be practical (e.g., labeling costs) or when gold labels are automatically generated, we argue that tracking data drift becomes a vital addition for AI deployments. In this work, we perform empirical experiments on real-world medical imaging data to evaluate three data drift detection methods' ability to detect data drift caused (a) naturally (emergence of COVID-19 in X-rays) and (b) synthetically. We find that monitoring performance alone is not a good proxy for detecting data drift and that drift detection heavily depends on sample size and patient features. Our work discusses the need and utility of data drift detection in various scenarios and highlights gaps in knowledge for the practical application of existing methods.
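A common ingredient of drift detectors like those evaluated here is a two-sample test comparing a reference window (training-era inputs) against a recent window. A minimal sketch using the two-sample Kolmogorov-Smirnov statistic on a single scalar feature (illustrative only; the paper's experiments operate on imaging data and compare several detection methods, and the fixed threshold below is an assumption, not a calibrated significance test):

```python
# Flag drift when the max gap between the reference and current empirical
# CDFs (the two-sample Kolmogorov-Smirnov statistic) exceeds a threshold.

def ks_statistic(ref, cur):
    """Maximum distance between the two empirical CDFs."""
    values = sorted(set(ref) | set(cur))

    def ecdf(sample, v):
        # Fraction of the sample at or below v.
        return sum(1 for x in sample if x <= v) / len(sample)

    return max(abs(ecdf(ref, v) - ecdf(cur, v)) for v in values)

def drifted(ref, cur, threshold=0.3):
    """Crude drift flag; a real deployment would calibrate the threshold."""
    return ks_statistic(ref, cur) > threshold
```

Sample size matters here just as the abstract warns: with small windows the statistic is noisy, so the same threshold can miss real drift or flag spurious drift.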
Affiliation(s)
- Ali Kore: Vector Institute, Toronto, Canada
- Vallijah Subasri: Peter Munk Cardiac Center, University Health Network, Toronto, ON, Canada
- Moustafa Abdalla: Department of Surgery, Harvard Medical School, Massachusetts General Hospital, Boston, USA
- Benjamin Fine: Institute for Better Health, Trillium Health Partners, Mississauga, Canada; Department of Medical Imaging, University of Toronto, Toronto, Canada
- Elham Dolatabadi: Vector Institute, Toronto, Canada; School of Health Policy and Management, Faculty of Health, York University, Toronto, Canada
- Mohamed Abdalla: Institute for Better Health, Trillium Health Partners, Mississauga, Canada
4. Goldstein BA, Xu C, Wilson J, Henao R, Ephraim PL, Weiner DE, Shafi T, Scialla JJ. Designing an Implementable Clinical Prediction Model for Near-Term Mortality and Long-Term Survival in Patients on Maintenance Hemodialysis. Am J Kidney Dis 2024: S0272-6386(24)00594-8. PMID: 38493378. DOI: 10.1053/j.ajkd.2023.12.013.
Abstract
RATIONALE & OBJECTIVE The life expectancy of patients treated with maintenance hemodialysis (MHD) is heterogeneous. Knowledge of life expectancy may focus care decisions on near-term versus long-term goals. Current tools are limited and focus on near-term mortality. Here, we develop models predicting near-term mortality and long-term survival on MHD and assess their potential utility. STUDY DESIGN Predictive modeling study. SETTING & PARTICIPANTS 42,351 patients contributing 997,381 patient-months over 11 years, abstracted from the electronic health record (EHR) system of midsize, nonprofit dialysis providers. NEW PREDICTORS & ESTABLISHED PREDICTORS Demographics, laboratory results, vital signs, and service utilization data available within the dialysis EHR. OUTCOME For each patient-month, we ascertained death within the next 6 months (ie, near-term mortality) and survival over more than 5 years during receipt of MHD or after kidney transplantation (ie, long-term survival). ANALYTICAL APPROACH We used least absolute shrinkage and selection operator (LASSO) logistic regression and gradient-boosting machines to predict each outcome. We compared these with time-to-event models spanning both time horizons. We explored the performance of decision rules at different cut points. RESULTS All models achieved an area under the receiver operating characteristic curve of ≥0.80 and optimal calibration metrics in the test set. The long-term survival models had significantly better performance than the near-term mortality models. The time-to-event models performed similarly to the binary models. Applying cut points spanning from the 1st to the 90th percentile of the predictions, a positive predictive value (PPV) of 54% could be achieved for near-term mortality, but with a poor sensitivity of 6%. A PPV of 71% could be achieved for long-term survival, with a sensitivity of 67%.
LIMITATIONS The retrospective models would need to be prospectively validated before they could be appropriately used as clinical decision aids. CONCLUSIONS A model built with readily available clinical variables to support easy implementation can predict clinically important life expectancy thresholds and shows promise as a clinical decision support tool for patients on MHD. Predicting long-term survival has better decision rule performance than predicting near-term mortality. PLAIN-LANGUAGE SUMMARY Clinical prediction models (CPMs) are not widely used for patients undergoing maintenance hemodialysis (MHD). Although a variety of CPMs have been reported in the literature, many were not designed to be easily implementable. We consider the performance of an implementable CPM for both near-term mortality and long-term survival for patients undergoing MHD. Both the near-term and long-term models have similar predictive performance, but the long-term models have greater clinical utility. We further consider how differential performance across time horizons may be used to inform clinical decision making. Although predictive modeling is not regularly used for patients on MHD, such tools may help promote individualized care planning and foster shared decision making.
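The decision-rule evaluation above amounts to sweeping cut points over the predicted probabilities and reading off positive predictive value (PPV) and sensitivity at each one. A minimal sketch of that sweep (illustrative; the paper applies it to LASSO logistic and gradient-boosted models on dialysis EHR data):

```python
# Evaluate the rule "predict positive if probability >= cut" at one or
# more cut points, reporting PPV and sensitivity for each.

def rule_metrics(probs, labels, cut):
    """PPV and sensitivity of thresholding predicted probabilities at cut."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cut and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cut and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < cut and y == 1)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sens = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sens

def sweep(probs, labels, cuts):
    """Map each cut point to its (PPV, sensitivity) pair."""
    return {c: rule_metrics(probs, labels, c) for c in cuts}
```

The abstract's headline numbers are points on exactly this kind of curve: one cut point gives 54% PPV at 6% sensitivity for near-term mortality, another gives 71% PPV at 67% sensitivity for long-term survival.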
Affiliation(s)
- Benjamin A Goldstein: Department of Biostatistics and Bioinformatics, School of Medicine, Duke University, Durham, North Carolina
- Chun Xu: Department of Biostatistics and Bioinformatics, School of Medicine, Duke University, Durham, North Carolina
- Jonathan Wilson: Department of Biostatistics and Bioinformatics, School of Medicine, Duke University, Durham, North Carolina
- Ricardo Henao: Department of Biostatistics and Bioinformatics, School of Medicine, Duke University, Durham, North Carolina
- Patti L Ephraim: Institute of Health System Science, Feinstein Institute for Medical Research, Northwell Health, New York, New York
- Daniel E Weiner: Department of Medicine, School of Medicine, Tufts University, Boston, Massachusetts
- Tariq Shafi: Division of Nephrology, Department of Medicine, Houston Methodist Hospital, Houston, Texas
- Julia J Scialla: Departments of Medicine and Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, Virginia
5. Economou-Zavlanos NJ, Bessias S, Cary MP, Bedoya AD, Goldstein BA, Jelovsek JE, O'Brien CL, Walden N, Elmore M, Parrish AB, Elengold S, Lytle KS, Balu S, Lipkin ME, Shariff AI, Gao M, Leverenz D, Henao R, Ming DY, Gallagher DM, Pencina MJ, Poon EG. Translating ethical and quality principles for the effective, safe and fair development, deployment and use of artificial intelligence technologies in healthcare. J Am Med Inform Assoc 2024; 31:705-713. PMID: 38031481. PMCID: PMC10873841. DOI: 10.1093/jamia/ocad221.
Abstract
OBJECTIVE The complexity and rapid pace of development of algorithmic technologies pose challenges for their regulation and oversight in healthcare settings. We sought to improve our institution's approach to evaluation and governance of algorithmic technologies used in clinical care and operations by creating an Implementation Guide that standardizes evaluation criteria so that local oversight is performed in an objective fashion. MATERIALS AND METHODS Building on a framework that applies key ethical and quality principles (clinical value and safety, fairness and equity, usability and adoption, transparency and accountability, and regulatory compliance), we created concrete guidelines for evaluating algorithmic technologies at our institution. RESULTS An Implementation Guide articulates evaluation criteria used during review of algorithmic technologies and details what evidence supports the implementation of ethical and quality principles for trustworthy health AI. Application of the processes described in the Implementation Guide can lead to algorithms that are safer as well as more effective, fair, and equitable upon implementation, as illustrated through 4 examples of technologies at different phases of the algorithmic lifecycle that underwent evaluation at our academic medical center. DISCUSSION By providing clear descriptions/definitions of evaluation criteria and embedding them within standardized processes, we streamlined oversight processes and educated communities using and developing algorithmic technologies within our institution. CONCLUSIONS We developed a scalable, adaptable framework for translating principles into evaluation criteria and specific requirements that support trustworthy implementation of algorithmic technologies in patient care and healthcare operations.
Affiliation(s)
- Sophia Bessias: Duke AI Health, Duke University School of Medicine, Durham, NC 27705, United States
- Michael P Cary: Duke AI Health, Duke University School of Medicine, Durham, NC 27705, United States; Duke University School of Nursing, Durham, NC 27710, United States
- Armando D Bedoya: Duke Health Technology Solutions, Duke University Health System, Durham, NC 27705, United States; Department of Medicine, Duke University School of Medicine, Durham, NC 27710, United States
- Benjamin A Goldstein: Duke AI Health, Duke University School of Medicine, Durham, NC 27705, United States; Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27705, United States
- John E Jelovsek: Department of Obstetrics and Gynecology, Duke University School of Medicine, Durham, NC 27710, United States
- Cara L O'Brien: Duke Health Technology Solutions, Duke University Health System, Durham, NC 27705, United States; Department of Medicine, Duke University School of Medicine, Durham, NC 27710, United States
- Nancy Walden: Duke AI Health, Duke University School of Medicine, Durham, NC 27705, United States
- Matthew Elmore: Duke AI Health, Duke University School of Medicine, Durham, NC 27705, United States
- Amanda B Parrish: Office of Regulatory Affairs and Quality, Duke University School of Medicine, Durham, NC 27705, United States
- Scott Elengold: Office of Counsel, Duke University, Durham, NC 27701, United States
- Kay S Lytle: Duke University School of Nursing, Durham, NC 27710, United States; Duke Health Technology Solutions, Duke University Health System, Durham, NC 27705, United States
- Suresh Balu: Duke Institute for Health Innovation, Duke University, Durham, NC 27701, United States
- Michael E Lipkin: Department of Urology, Duke University School of Medicine, Durham, NC 27710, United States
- Afreen Idris Shariff: Department of Medicine, Duke University School of Medicine, Durham, NC 27710, United States; Duke Endocrine-Oncology Program, Duke University Health System, Durham, NC 27710, United States
- Michael Gao: Duke Institute for Health Innovation, Duke University, Durham, NC 27701, United States
- David Leverenz: Department of Medicine, Duke University School of Medicine, Durham, NC 27710, United States
- Ricardo Henao: Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27705, United States; Department of Bioengineering, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
- David Y Ming: Department of Medicine, Duke University School of Medicine, Durham, NC 27710, United States; Duke Department of Pediatrics, Duke University Health System, Durham, NC 27705, United States; Department of Population Health Sciences, Duke University School of Medicine, Durham, NC 27701, United States
- David M Gallagher: Department of Medicine, Duke University School of Medicine, Durham, NC 27710, United States
- Michael J Pencina: Duke AI Health, Duke University School of Medicine, Durham, NC 27705, United States; Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27705, United States
- Eric G Poon: Duke Health Technology Solutions, Duke University Health System, Durham, NC 27705, United States; Department of Medicine, Duke University School of Medicine, Durham, NC 27710, United States; Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27705, United States
6. Mello MM, Shah NH, Char DS. President Biden's Executive Order on Artificial Intelligence: Implications for Health Care Organizations. JAMA 2024; 331:17-18. PMID: 38032634. DOI: 10.1001/jama.2023.25051.
Abstract
This Viewpoint discusses a recent executive order by US President Joe Biden about the development and implementation of AI, including the role of government vs the private sector and how the order may affect health care.
Affiliation(s)
- Michelle M Mello: Department of Health Policy, Stanford University School of Medicine, Stanford, California; Stanford Law School and The Freeman Spogli Institute for International Studies, Stanford, California
- Nigam H Shah: Departments of Medicine and Biomedical Data Science, and the Clinical Excellence Research Center, Stanford University School of Medicine, Stanford, California
- Danton S Char: Department of Anesthesiology, Stanford University School of Medicine, Stanford, California; Stanford Center for Biomedical Ethics, Stanford, California
7. Hekman DJ, Barton HJ, Maru AP, Wills G, Cochran AL, Fritsch C, Wiegmann DA, Liao F, Patterson BW. Dashboarding to Monitor Machine-Learning-Based Clinical Decision Support Interventions. Appl Clin Inform 2024; 15:164-169. PMID: 38029792. PMCID: PMC10901643. DOI: 10.1055/a-2219-5175.
Abstract
BACKGROUND Existing monitoring of machine-learning-based clinical decision support (ML-CDS) is focused predominantly on the ML outputs and their accuracy. Improving patient care requires not only accurate algorithms but also systems of care that enable the output of these algorithms to drive specific actions by care teams, necessitating expanded monitoring. OBJECTIVES In this case report, we describe the creation of a dashboard that bridges the monitoring gap between model outputs and patient outcomes, allowing the intervention development team and operational stakeholders to govern the intervention and identify potential issues that may require corrective action. METHODS We used an iterative development process to build a dashboard to monitor the performance of our intervention in the broader context of the care system. RESULTS Our investigation of best practices elsewhere, iterative design, and expert consultation led us to anchor our dashboard on alluvial charts and control charts. Both the development process and the dashboard itself illuminated areas to improve the broader intervention. CONCLUSION We propose that monitoring ML-CDS algorithms with regular dashboards, offering both a context-level view of the system and a drilled-down view of specific components, is a critical part of implementing these algorithms and ensuring that they function appropriately within the broader care system.
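One of the two chart types the dashboard anchors on is the control chart. A minimal sketch of Shewhart-style control limits (mean ± 3 standard deviations of a baseline window) with a check for points that fall outside them; illustrative only, since real dashboards choose a chart type per metric and often add run rules:

```python
# Compute 3-sigma control limits from a baseline window of a monitored
# metric, then flag stream points that fall outside them.

def control_limits(baseline):
    """Lower/upper Shewhart limits: mean +/- 3 sample standard deviations."""
    n = len(baseline)
    mean = sum(baseline) / n
    var = sum((x - mean) ** 2 for x in baseline) / (n - 1)
    sd = var ** 0.5
    return mean - 3 * sd, mean + 3 * sd

def out_of_control(baseline, stream):
    """Indices of stream points outside the baseline's control limits."""
    lo, hi = control_limits(baseline)
    return [i for i, x in enumerate(stream) if not (lo <= x <= hi)]
```

Applied to metrics all along the pathway (alert volume, acknowledgment rate, downstream orders), out-of-limit points are exactly the "potential issues requiring corrective action" the dashboard is meant to surface.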
Affiliation(s)
- Daniel J. Hekman: Berbee-Walsh Department of Emergency Medicine, University of Wisconsin-Madison, School of Medicine and Public Health, Madison, Wisconsin, United States
- Hanna J. Barton: Berbee-Walsh Department of Emergency Medicine, University of Wisconsin-Madison, School of Medicine and Public Health, Madison, Wisconsin, United States
- Apoorva P. Maru: Berbee-Walsh Department of Emergency Medicine, University of Wisconsin-Madison, School of Medicine and Public Health, Madison, Wisconsin, United States
- Graham Wills: Department of Applied Data Science, UWHealth Hospitals and Clinics, Madison, Wisconsin, United States
- Amy L. Cochran: Department of Population Health, University of Wisconsin-Madison, School of Medicine and Public Health, Madison, Wisconsin, United States
- Corey Fritsch: Department of Applied Data Science, UWHealth Hospitals and Clinics, Madison, Wisconsin, United States
- Douglas A. Wiegmann: Department of Industrial and Systems Engineering, University of Wisconsin-Madison, Madison, Wisconsin, United States
- Frank Liao: Department of Applied Data Science, UWHealth Hospitals and Clinics, Madison, Wisconsin, United States
- Brian W. Patterson: Berbee-Walsh Department of Emergency Medicine, University of Wisconsin-Madison, School of Medicine and Public Health, Madison, Wisconsin, United States; Department of Population Health, University of Wisconsin-Madison, School of Medicine and Public Health, Madison, Wisconsin, United States; Department of Industrial and Systems Engineering, University of Wisconsin-Madison, Madison, Wisconsin, United States
8. Nong P, Adler-Milstein J, Platt J. How patients distinguish between clinical and administrative predictive models in health care. Am J Manag Care 2024; 30:31-37. PMID: 38271580. PMCID: PMC10962331. DOI: 10.37765/ajmc.2024.89484.
Abstract
OBJECTIVES To understand patient perceptions of specific applications of predictive models in health care. STUDY DESIGN Original, cross-sectional national survey. METHODS We conducted a national online survey of US adults with the National Opinion Research Center from November to December 2021. Measures of internal consistency were used to identify how patients differentiate between clinical and administrative predictive models. Multivariable logistic regressions were used to identify relationships between comfort with various types of predictive models and patient demographics, perceptions of privacy protections, and experiences in the health care system. RESULTS A total of 1541 respondents completed the survey. After excluding observations with missing data for the variables of interest, the final analytic sample was 1488. We found that patients differentiate between clinical and administrative predictive models. Comfort with prediction of bill payment and missed appointments was especially low (21.6% and 36.6%, respectively). Comfort was higher with clinical predictive models, such as predicting stroke in an emergency (55.8%). Experiences of discrimination were significant negative predictors of comfort with administrative predictive models. Health system transparency around privacy policies was a significant positive predictor of comfort with both clinical and administrative predictive models. CONCLUSIONS Patients are more comfortable with clinical applications of predictive models than administrative ones. Privacy protections and transparency about how health care systems protect patient data may facilitate patient comfort with these technologies. However, larger inequities and negative experiences in health care remain important for how patients perceive administrative applications of prediction.
Affiliation(s)
- Paige Nong: Division of Health Policy and Management, University of Minnesota School of Public Health, 516 Delaware St SE, Minneapolis, MN 55455
9. Chin MH, Afsar-Manesh N, Bierman AS, Chang C, Colón-Rodríguez CJ, Dullabh P, Duran DG, Fair M, Hernandez-Boussard T, Hightower M, Jain A, Jordan WB, Konya S, Moore RH, Moore TT, Rodriguez R, Shaheen G, Snyder LP, Srinivasan M, Umscheid CA, Ohno-Machado L. Guiding Principles to Address the Impact of Algorithm Bias on Racial and Ethnic Disparities in Health and Health Care. JAMA Netw Open 2023; 6:e2345050. PMID: 38100101. DOI: 10.1001/jamanetworkopen.2023.45050.
Abstract
Importance Health care algorithms are used for diagnosis, treatment, prognosis, risk stratification, and allocation of resources. Bias in the development and use of algorithms can lead to worse outcomes for racial and ethnic minoritized groups and other historically marginalized populations such as individuals with lower income. Objective To provide a conceptual framework and guiding principles for mitigating and preventing bias in health care algorithms to promote health and health care equity. Evidence Review The Agency for Healthcare Research and Quality and the National Institute for Minority Health and Health Disparities convened a diverse panel of experts to review evidence, hear from stakeholders, and receive community feedback. Findings The panel developed a conceptual framework to apply guiding principles across an algorithm's life cycle, centering health and health care equity for patients and communities as the goal, within the wider context of structural racism and discrimination. Multiple stakeholders can mitigate and prevent bias at each phase of the algorithm life cycle, including problem formulation (phase 1); data selection, assessment, and management (phase 2); algorithm development, training, and validation (phase 3); deployment and integration of algorithms in intended settings (phase 4); and algorithm monitoring, maintenance, updating, or deimplementation (phase 5). Five principles should guide these efforts: (1) promote health and health care equity during all phases of the health care algorithm life cycle; (2) ensure health care algorithms and their use are transparent and explainable; (3) authentically engage patients and communities during all phases of the health care algorithm life cycle and earn trustworthiness; (4) explicitly identify health care algorithmic fairness issues and trade-offs; and (5) establish accountability for equity and fairness in outcomes from health care algorithms. 
Conclusions and Relevance Multiple stakeholders must partner to create systems, processes, regulations, incentives, standards, and policies to mitigate and prevent algorithmic bias. Reforms should implement guiding principles that support promotion of health and health care equity in all phases of the algorithm life cycle as well as transparency and explainability, authentic community engagement and ethical partnerships, explicit identification of fairness issues and trade-offs, and accountability for equity and fairness.
Affiliation(s)
- Christine Chang: Agency for Healthcare Research and Quality, Rockville, Maryland
- Malika Fair: Association of American Medical Colleges, Washington, DC
- Anjali Jain: Agency for Healthcare Research and Quality, Rockville, Maryland
- Stephen Konya: Office of the National Coordinator for Health Information Technology, Washington, DC
- Roslyn Holliday Moore: US Department of Health and Human Services Office of Minority Health, Rockville, Maryland
10. Zaribafzadeh H, Webster WL, Vail CJ, Daigle T, Kirk AD, Allen PJ, Henao R, Buckland DM. Development, Deployment, and Implementation of a Machine Learning Surgical Case Length Prediction Model and Prospective Evaluation. Ann Surg 2023; 278:890-895. PMID: 37264901. PMCID: PMC10631498. DOI: 10.1097/sla.0000000000005936.
Abstract
OBJECTIVE To implement a machine learning model using only the restricted data available at case creation time to predict surgical case length for multiple services at different locations. BACKGROUND The operating room is one of the most expensive resources in a health system, estimated to cost $22 to $133 per minute and generate about 40% of hospital revenue. Accurate prediction of surgical case length is necessary for efficient scheduling and cost-effective utilization of the operating room and other resources. METHODS We introduced a similarity cascade to capture the complexity of cases and surgeon influence on the case length and incorporated that into a gradient-boosting machine learning model. The model loss function was customized to improve the balance between over- and under-prediction of the case length. A production pipeline was created to seamlessly deploy and implement the model across our institution. RESULTS The prospective analysis showed that the model output was gradually adopted by the schedulers and outperformed the scheduler-predicted case length from August to December 2022. In 33,815 surgical cases across outpatient and inpatient platforms, the operational implementation predicted 11.2% fewer underpredicted cases and 5.9% more cases within 20% of the actual case length compared with the schedulers and only overpredicted 5.3% more. The model assisted schedulers to predict 3.4% more cases within 20% of the actual case length and 4.3% fewer underpredicted cases. CONCLUSIONS We created a unique framework that is being leveraged every day to predict surgical case length more accurately at case posting time and could be potentially utilized to deploy future machine learning models.
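The abstract mentions customizing the loss function to balance over- and under-prediction of case length. The paper's exact loss is not given here, but a standard way to express that idea is an asymmetric penalty that weights the two error directions differently; a minimal sketch under that assumption (the weights below are hypothetical):

```python
# Asymmetric linear loss: under-predicting case length (the case runs
# longer than scheduled) is penalized more heavily than over-predicting.
# Illustrative stand-in for the custom loss described in the abstract.

def asymmetric_loss(actual, predicted, under_weight=2.0, over_weight=1.0):
    """Weighted absolute error, heavier when actual exceeds predicted."""
    err = actual - predicted
    return under_weight * err if err > 0 else -over_weight * err

def total_loss(actuals, preds, **kw):
    """Sum the asymmetric loss over a batch of cases."""
    return sum(asymmetric_loss(a, p, **kw) for a, p in zip(actuals, preds))
```

Plugging a loss like this into a gradient-boosting framework (most support custom objectives) biases the model away from under-prediction, matching the operational preference the results describe: fewer underpredicted cases at the cost of slightly more overprediction.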
Affiliation(s)
- Hamed Zaribafzadeh
- Department of Biostatistics and Bioinformatics, and Department of Surgery, Duke University, Durham, NC
- Thomas Daigle
- Duke Health Technology Solutions, Duke University Health System, Durham, NC
- Ricardo Henao
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC
- Daniel M. Buckland
- Department of Surgery, Duke University, Durham, NC
- Department of Emergency Medicine and Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC
|
11
|
Nwosu OI, Crowson MG, Rameau A. Artificial Intelligence Governance and Otolaryngology-Head and Neck Surgery. Laryngoscope 2023; 133:2868-2870. [PMID: 37658749 PMCID: PMC10592089 DOI: 10.1002/lary.31013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2023] [Accepted: 08/18/2023] [Indexed: 09/05/2023]
Abstract
This rapid communication highlights components of artificial intelligence governance in healthcare and suggests adopting key governance approaches in otolaryngology-head and neck surgery.
Affiliation(s)
- Obinna I. Nwosu
- Department of Otolaryngology-Head & Neck Surgery, Massachusetts Eye & Ear, Boston, Massachusetts, USA
- Department of Otolaryngology-Head & Neck Surgery, Harvard Medical School, Boston, Massachusetts, USA
- Matthew G. Crowson
- Department of Otolaryngology-Head & Neck Surgery, Massachusetts Eye & Ear, Boston, Massachusetts, USA
- Department of Otolaryngology-Head & Neck Surgery, Harvard Medical School, Boston, Massachusetts, USA
- Deloitte Consulting, Boston, Massachusetts, USA
- Anaïs Rameau
- Department of Otolaryngology–Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, New York, New York, USA
|
12
|
Youssef A, Pencina M, Thakur A, Zhu T, Clifton D, Shah NH. External validation of AI models in health should be replaced with recurring local validation. Nat Med 2023; 29:2686-2687. [PMID: 37853136 DOI: 10.1038/s41591-023-02540-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2023]
Affiliation(s)
- Alexey Youssef
- Stanford Bioengineering Department, Stanford University, Stanford, CA, USA
- Department of Engineering Science, University of Oxford, Oxford, UK
- Anshul Thakur
- Department of Engineering Science, University of Oxford, Oxford, UK
- Tingting Zhu
- Department of Engineering Science, University of Oxford, Oxford, UK
- David Clifton
- Department of Engineering Science, University of Oxford, Oxford, UK
- Oxford-Suzhou Centre for Advanced Research, Suzhou, China
- Nigam H Shah
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
- Technology and Digital Solutions, Stanford Medicine, Stanford, CA, USA
- Clinical Excellence Research Center, Stanford Medicine, Stanford, CA, USA
|
13
|
Liu M, Ning Y, Teixayavong S, Mertens M, Xu J, Ting DSW, Cheng LTE, Ong JCL, Teo ZL, Tan TF, RaviChandran N, Wang F, Celi LA, Ong MEH, Liu N. A translational perspective towards clinical AI fairness. NPJ Digit Med 2023; 6:172. [PMID: 37709945 PMCID: PMC10502051 DOI: 10.1038/s41746-023-00918-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023] Open
Abstract
Artificial intelligence (AI) has demonstrated the ability to extract insights from data, but the fairness of such data-driven insights remains a concern in high-stakes fields. Despite extensive developments, issues of AI fairness in clinical contexts have not been adequately addressed. A fair model is normally expected to perform equally across subgroups defined by sensitive variables (e.g., age, gender/sex, race/ethnicity, socio-economic status, etc.). Various fairness measurements have been developed to detect differences between subgroups as evidence of bias, and bias mitigation methods are designed to reduce the differences detected. This perspective of fairness, however, is misaligned with some key considerations in clinical contexts. The set of sensitive variables used in healthcare applications must be carefully examined for relevance and justified by clear clinical motivations. In addition, clinical AI fairness should closely investigate the ethical implications of fairness measurements (e.g., potential conflicts between group- and individual-level fairness) to select suitable and objective metrics. Generally defining AI fairness as "equality" is not necessarily reasonable in clinical settings, as differences may have clinical justifications and do not indicate biases. Instead, "equity" would be an appropriate objective of clinical AI fairness. Moreover, clinical feedback is essential to developing fair and well-performing AI models, and efforts should be made to actively involve clinicians in the process. The adaptation of AI fairness towards healthcare is not self-evident due to misalignments between technical developments and clinical considerations. Multidisciplinary collaboration between AI researchers, clinicians, and ethicists is necessary to bridge the gap and translate AI fairness into real-life benefits.
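The "equality"-style measurements the perspective critiques can be made concrete with a small sketch: a gap in true-positive rate across subgroups (equal opportunity). The helper names and data below are invented for illustration, not from the paper.

```python
# Illustrative sketch of a group-fairness check: the difference in true-positive
# rate (sensitivity) across subgroups defined by a sensitive variable.

def true_positive_rate(y_true, y_pred):
    """Share of actual positives the model flagged; None if the group has none."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(flagged) / len(flagged) if flagged else None

def tpr_gap(y_true, y_pred, groups):
    """Max-minus-min TPR across subgroup labels, plus the per-group rates."""
    rates = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = true_positive_rate([y_true[i] for i in idx],
                                      [y_pred[i] for i in idx])
    measured = [r for r in rates.values() if r is not None]
    return max(measured) - min(measured), rates

# Two subgroups with different sensitivity: the "difference" such metrics detect.
gap, rates = tpr_gap([1, 1, 1, 1], [1, 0, 1, 1], ["a", "a", "b", "b"])
assert gap == 0.5 and rates["b"] == 1.0
```

The paper's point is that a gap like this may or may not indicate bias in a clinical setting, so a metric of this kind is a starting point for clinically informed review rather than a verdict.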
Affiliation(s)
- Mingxuan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Yilin Ning
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Mayli Mertens
- Centre for Ethics, Department of Philosophy, University of Antwerp, Antwerp, Belgium
- Antwerp Center on Responsible AI, University of Antwerp, Antwerp, Belgium
- Jie Xu
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
- Daniel Shu Wei Ting
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- SingHealth AI Office, Singapore Health Services, Singapore, Singapore
- Lionel Tim-Ee Cheng
- Department of Diagnostic Radiology, Singapore General Hospital, Singapore, Singapore
- Zhen Ling Teo
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Ting Fang Tan
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Leo Anthony Celi
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Marcus Eng Hock Ong
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Department of Emergency Medicine, Singapore General Hospital, Singapore, Singapore
- Nan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- SingHealth AI Office, Singapore Health Services, Singapore, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Institute of Data Science, National University of Singapore, Singapore, Singapore
|
14
|
Corbin CK, Maclay R, Acharya A, Mony S, Punnathanam S, Thapa R, Kotecha N, Shah NH, Chen JH. DEPLOYR: a technical framework for deploying custom real-time machine learning models into the electronic medical record. J Am Med Inform Assoc 2023; 30:1532-1542. [PMID: 37369008 PMCID: PMC10436147 DOI: 10.1093/jamia/ocad114] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2023] [Revised: 05/16/2023] [Accepted: 06/13/2023] [Indexed: 06/29/2023] Open
Abstract
OBJECTIVE Healthcare institutions are establishing frameworks to govern and promote the implementation of accurate, actionable, and reliable machine learning models that integrate with clinical workflow. Such governance frameworks require an accompanying technical framework to deploy models in a resource-efficient, safe, and high-quality manner. Here we present DEPLOYR, a technical framework for enabling real-time deployment and monitoring of researcher-created models into a widely used electronic medical record system. MATERIALS AND METHODS We discuss core functionality and design decisions, including mechanisms to trigger inference based on actions within electronic medical record software, modules that collect real-time data to make inferences, mechanisms that close the loop by displaying inferences back to end-users within their workflow, monitoring modules that track performance of deployed models over time, silent deployment capabilities, and mechanisms to prospectively evaluate a deployed model's impact. RESULTS We demonstrate the use of DEPLOYR by silently deploying and prospectively evaluating 12 machine learning models trained using electronic medical record data that predict laboratory diagnostic results, triggered by clinician button-clicks in Stanford Health Care's electronic medical record. DISCUSSION Our study highlights the need and feasibility for such silent deployment, because prospectively measured performance varies from retrospective estimates. When possible, we recommend using prospectively estimated performance measures during silent trials to make final go decisions for model deployment. CONCLUSION Machine learning applications in healthcare are extensively researched, but successful translations to the bedside are rare. By describing DEPLOYR, we aim to inform machine learning deployment best practices and help bridge the model implementation gap.
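DEPLOYR's actual code is not reproduced here, but the silent-deployment pattern the abstract describes, run inference on a trigger, log it for prospective evaluation, surface nothing to clinicians until a go decision, can be sketched in a few lines. All class and field names below are hypothetical.

```python
# Hypothetical sketch of trigger-based inference with a silent-deployment flag:
# the model scores every triggering event, but its output is only logged (not
# displayed) until display_enabled is switched on after a prospective review.

import datetime

class SilentDeployment:
    def __init__(self, model, display_enabled=False):
        self.model = model
        self.display_enabled = display_enabled
        self.log = []  # inference log used to measure prospective performance

    def on_trigger(self, patient_features):
        """Called when the triggering action (e.g., a button click) fires in the EMR."""
        score = self.model(patient_features)
        self.log.append({
            "ts": datetime.datetime.now(datetime.timezone.utc),
            "features": patient_features,
            "score": score,
        })
        return score if self.display_enabled else None  # silent: log, don't show

# Toy model: flag elevated white-cell counts. Logged, but nothing is displayed.
dep = SilentDeployment(model=lambda f: 0.9 if f["wbc"] > 11 else 0.1)
shown = dep.on_trigger({"wbc": 14})
assert shown is None and len(dep.log) == 1
```

Flipping `display_enabled` to `True` is the "go decision" step; until then the log accumulates the prospective measurements the authors recommend comparing against retrospective estimates.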
Affiliation(s)
- Conor K Corbin
- Department of Biomedical Data Science, Stanford, California, USA
- Rob Maclay
- Stanford Children’s Health, Palo Alto, California, USA
- Rahul Thapa
- Stanford Health Care, Palo Alto, California, USA
- Nigam H Shah
- Center for Biomedical Informatics Research, Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
- Jonathan H Chen
- Center for Biomedical Informatics Research, Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
|
15
|
van der Vegt AH, Scott IA, Dermawan K, Schnetler RJ, Kalke VR, Lane PJ. Implementation frameworks for end-to-end clinical AI: derivation of the SALIENT framework. J Am Med Inform Assoc 2023; 30:1503-1515. [PMID: 37208863 PMCID: PMC10436156 DOI: 10.1093/jamia/ocad088] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2022] [Revised: 04/17/2023] [Accepted: 05/09/2023] [Indexed: 05/21/2023] Open
Abstract
OBJECTIVE To derive a comprehensive implementation framework for clinical AI models within hospitals informed by existing AI frameworks and integrated with reporting standards for clinical AI research. MATERIALS AND METHODS (1) Derive a provisional implementation framework based on the taxonomy of Stead et al and integrated with current reporting standards for AI research: TRIPOD, DECIDE-AI, CONSORT-AI. (2) Undertake a scoping review of published clinical AI implementation frameworks and identify key themes and stages. (3) Perform a gap analysis and refine the framework by incorporating missing items. RESULTS The provisional AI implementation framework, called SALIENT, was mapped to 5 stages common to both the taxonomy and the reporting standards. A scoping review retrieved 20 studies and 247 themes, stages, and subelements were identified. A gap analysis identified 5 new cross-stage themes and 16 new tasks. The final framework comprised 5 stages, 7 elements, and 4 components, including the AI system, data pipeline, human-computer interface, and clinical workflow. DISCUSSION This pragmatic framework resolves gaps in existing stage- and theme-based clinical AI implementation guidance by comprehensively addressing the what (components), when (stages), and how (tasks) of AI implementation, as well as the who (organization) and why (policy domains). By integrating research reporting standards into SALIENT, the framework is grounded in rigorous evaluation methodologies. The framework requires validation as being applicable to real-world studies of deployed AI models. CONCLUSIONS A novel end-to-end framework has been developed for implementing AI within hospital clinical practice that builds on previous AI implementation frameworks and research reporting standards.
Affiliation(s)
- Anton H van der Vegt
- Centre for Health Services Research, The University of Queensland, Brisbane, Australia
- Ian A Scott
- Department of Internal Medicine and Clinical Epidemiology, Princess Alexandra Hospital, Brisbane, Australia
- Krishna Dermawan
- Centre for Information Resilience, The University of Queensland, St Lucia, Australia
- Rudolf J Schnetler
- School of Information Technology and Electrical Engineering, The University of Queensland, St Lucia, Australia
- Vikrant R Kalke
- Patient Safety and Quality, Clinical Excellence Queensland, Queensland Health, Brisbane, Australia
- Paul J Lane
- Safety Quality & Innovation, The Prince Charles Hospital, Queensland Health, Brisbane, Australia
|
16
|
Hekman DJ, Cochran AL, Maru AP, Barton HJ, Shah MN, Wiegmann D, Smith MA, Liao F, Patterson BW. Effectiveness of an Emergency Department-Based Machine Learning Clinical Decision Support Tool to Prevent Outpatient Falls Among Older Adults: Protocol for a Quasi-Experimental Study. JMIR Res Protoc 2023; 12:e48128. [PMID: 37535416 PMCID: PMC10436111 DOI: 10.2196/48128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2023] [Revised: 05/04/2023] [Accepted: 05/23/2023] [Indexed: 08/04/2023] Open
Abstract
BACKGROUND Emergency department (ED) providers are important collaborators in preventing falls for older adults because they are often the first health care providers to see a patient after a fall and because at-home falls are often preceded by previous ED visits. Previous work has shown that ED referrals to falls interventions can reduce the risk of an at-home fall by 38%. Screening patients at risk for a fall can be time-consuming and difficult to implement in the ED setting. Machine learning (ML) and clinical decision support (CDS) offer the potential of automating the screening process. However, it remains unclear whether automation of screening and referrals can reduce the risk of future falls among older patients. OBJECTIVE The goal of this paper is to describe a research protocol for evaluating the effectiveness of an automated screening and referral intervention. These findings will inform ongoing discussions about the use of ML and artificial intelligence to augment medical decision-making. METHODS To assess the effectiveness of our program for patients receiving the falls risk intervention, our primary analysis will be to obtain referral completion rates at 3 different EDs. We will use a quasi-experimental design known as a sharp regression discontinuity with regard to intent-to-treat, since the intervention is administered to patients whose risk score falls above a threshold. A conditional logistic regression model will be built to describe 6-month fall risk at each site as a function of the intervention, patient demographics, and risk score. The odds ratio of a return visit for a fall and the 95% CI will be estimated by comparing those identified as high risk by the ML-based CDS (ML-CDS) and those who were not but had a similar risk profile. RESULTS The ML-CDS tool under study has been implemented at 2 of the 3 EDs in our study. As of April 2023, a total of 1326 patient encounters have been flagged for providers, and 339 unique patients have been referred to the mobility and falls clinic. To date, 15% (45/339) of patients have scheduled an appointment with the clinic. CONCLUSIONS This study seeks to quantify the impact of an ML-CDS intervention on patient behavior and outcomes. Our end-to-end data set allows for a more meaningful analysis of patient outcomes than other studies focused on interim outcomes, and our multisite implementation plan will demonstrate applicability to a broad population and the possibility to adapt the intervention to other EDs and achieve similar results. Our statistical methodology, regression discontinuity design, allows for causal inference from observational data and a staggered implementation strategy allows for the identification of secular trends that could affect causal associations and allow mitigation as necessary. TRIAL REGISTRATION ClinicalTrials.gov NCT05810064; https://www.clinicaltrials.gov/study/NCT05810064. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/48128.
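The sharp regression-discontinuity logic in the protocol can be sketched simply: treatment is a deterministic function of the risk score crossing a threshold, so outcomes just above versus just below the cutoff approximate a causal contrast. The threshold, bandwidth, and records below are invented for illustration; the study's actual analysis uses conditional logistic regression, not this naive difference in means.

```python
# Hypothetical sketch of a sharp regression discontinuity. The cutoff and
# bandwidth are made-up values, not the trial's parameters.

THRESHOLD = 0.5

def assign_intervention(risk_score, threshold=THRESHOLD):
    """Sharp RD: treatment is a deterministic function of the running variable."""
    return risk_score >= threshold

def local_contrast(records, threshold=THRESHOLD, bandwidth=0.1):
    """Naive difference in mean outcome within a bandwidth of the cutoff."""
    above = [r["fell"] for r in records
             if threshold <= r["score"] < threshold + bandwidth]
    below = [r["fell"] for r in records
             if threshold - bandwidth <= r["score"] < threshold]
    mean = lambda xs: sum(xs) / len(xs)
    return mean(above) - mean(below)
```

Because assignment is fully determined by the score, patients narrowly above and narrowly below the cutoff are comparable, which is what lets the design support causal inference from observational data.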
Affiliation(s)
- Daniel J Hekman
- BerbeeWalsh Department of Emergency Medicine, University of Wisconsin-Madison, Madison, WI, United States
- Amy L Cochran
- Department of Population Health, University of Wisconsin-Madison, Madison, WI, United States
- Apoorva P Maru
- BerbeeWalsh Department of Emergency Medicine, University of Wisconsin-Madison, Madison, WI, United States
- Hanna J Barton
- Department of Industrial and Systems Engineering, University of Wisconsin-Madison, Madison, WI, United States
- Manish N Shah
- BerbeeWalsh Department of Emergency Medicine, University of Wisconsin-Madison, Madison, WI, United States
- Douglas Wiegmann
- Department of Industrial and Systems Engineering, University of Wisconsin-Madison, Madison, WI, United States
- Maureen A Smith
- Health Innovation Program, University of Wisconsin-Madison, Madison, WI, United States
- Frank Liao
- Department of Applied Data Science, UWHealth Hospitals and Clinics, University of Wisconsin-Madison, Madison, WI, United States
- Brian W Patterson
- BerbeeWalsh Department of Emergency Medicine, University of Wisconsin-Madison, Madison, WI, United States
|
17
|
Kuziemsky CE. The Role of Human and Organizational Factors in the Pursuit of One Digital Health. Yearb Med Inform 2023; 32:201-209. [PMID: 37414032 PMCID: PMC10751147 DOI: 10.1055/s-0043-1768724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/08/2023] Open
Abstract
OBJECTIVE This paper surveys a subset of the 2022 human and organizational factor (HOF) literature to provide guidance on building a One Digital Health ecosystem. METHODS We searched a subset of journals in PubMed/Medline for studies with "human factors" or "organization" in the title or abstract. Papers published in 2022 were eligible for inclusion in the survey. Selected papers were categorized into structural and behavioural aspects to understand digital health enabled interactions across micro, meso, and macro systems. RESULTS Our survey of the 2022 HOF literature showed that while we continue to make meaningful progress at digital health enabled interactions across systems levels, there are still challenges that must be overcome. For example, we must continue to grow the breadth of HOF research beyond individual users and systems to assist with the scale up of digital health systems across and beyond organizations. We summarize the findings by providing five HOF considerations to help build a One Digital Health ecosystem. CONCLUSION One Digital Health challenges us to improve coordination, communication, and collaboration between the health, environmental and veterinary sectors. Doing so requires us to develop both the structural and behavioural capacity of digital health systems at the organizational level and beyond so that we can develop more robust and integrated systems across health, environmental and veterinary sectors. The HOF community has much to offer and must play a leading role in designing a One Digital Health ecosystem.
|
18
|
Brereton TA, Malik MM, Lifson M, Greenwood JD, Peterson KJ, Overgaard SM. The Role of Artificial Intelligence Model Documentation in Translational Science: Scoping Review. Interact J Med Res 2023; 12:e45903. [PMID: 37450330 PMCID: PMC10382950 DOI: 10.2196/45903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2023] [Revised: 05/10/2023] [Accepted: 05/11/2023] [Indexed: 07/18/2023] Open
Abstract
BACKGROUND Despite the touted potential of artificial intelligence (AI) and machine learning (ML) to revolutionize health care, clinical decision support tools, herein referred to as medical modeling software (MMS), have yet to realize the anticipated benefits. One proposed obstacle is the acknowledged gaps in AI translation. These gaps stem partly from the fragmentation of processes and resources to support MMS transparent documentation. Consequently, the absence of transparent reporting hinders the provision of evidence to support the implementation of MMS in clinical practice, thereby serving as a substantial barrier to the successful translation of software from research settings to clinical practice. OBJECTIVE This study aimed to scope the current landscape of AI- and ML-based MMS documentation practices and elucidate the function of documentation in facilitating the translation of ethical and explainable MMS into clinical workflows. METHODS A scoping review was conducted in accordance with PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. PubMed was searched using Medical Subject Headings key concepts of AI, ML, ethical considerations, and explainability to identify publications detailing AI- and ML-based MMS documentation, in addition to snowball sampling of selected reference lists. To include the possibility of implicit documentation practices not explicitly labeled as such, we did not use documentation as a key concept but as an inclusion criterion. A 2-stage screening process (title and abstract screening and full-text review) was conducted by 1 author. A data extraction template was used to record publication-related information; barriers to developing ethical and explainable MMS; available standards, regulations, frameworks, or governance strategies related to documentation; and recommendations for documentation for papers that met the inclusion criteria. RESULTS Of the 115 papers retrieved, 21 (18.3%) papers met the requirements for inclusion. Ethics and explainability were investigated in the context of AI- and ML-based MMS documentation and translation. Data detailing the current state and challenges and recommendations for future studies were synthesized. Notable themes defining the current state and challenges that required thorough review included bias, accountability, governance, and explainability. Recommendations identified in the literature to address present barriers call for a proactive evaluation of MMS, multidisciplinary collaboration, adherence to investigation and validation protocols, transparency and traceability requirements, and guiding standards and frameworks that enhance documentation efforts and support the translation of AI- and ML-based MMS. CONCLUSIONS Resolving barriers to translation is critical for MMS to deliver on expectations, including those barriers identified in this scoping review related to bias, accountability, governance, and explainability. Our findings suggest that transparent strategic documentation, aligning translational science and regulatory science, will support the translation of MMS by coordinating communication and reporting and reducing translational barriers, thereby furthering the adoption of MMS.
Affiliation(s)
- Tracey A Brereton
- Center for Digital Health, Mayo Clinic, Rochester, MN, United States
- Momin M Malik
- Center for Digital Health, Mayo Clinic, Rochester, MN, United States
- Mark Lifson
- Center for Digital Health, Mayo Clinic, Rochester, MN, United States
- Jason D Greenwood
- Department of Family Medicine, Mayo Clinic, Rochester, MN, United States
- Kevin J Peterson
- Center for Digital Health, Mayo Clinic, Rochester, MN, United States
|
19
|
APLUS: A Python library for usefulness simulations of machine learning models in healthcare. J Biomed Inform 2023; 139:104319. [PMID: 36791900 DOI: 10.1016/j.jbi.2023.104319] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2022] [Revised: 02/09/2023] [Accepted: 02/10/2023] [Indexed: 02/16/2023]
Abstract
Despite the creation of thousands of machine learning (ML) models, the promise of improving patient care with ML remains largely unrealized. Adoption into clinical practice is lagging, in large part due to disconnects between how ML practitioners evaluate models and what is required for their successful integration into care delivery. Models are just one component of care delivery workflows whose constraints determine clinicians' abilities to act on models' outputs. However, methods to evaluate the usefulness of models in the context of their corresponding workflows are currently limited. To bridge this gap we developed APLUS, a reusable framework for quantitatively assessing via simulation the utility gained from integrating a model into a clinical workflow. We describe the APLUS simulation engine and workflow specification language, and apply it to evaluate a novel ML-based screening pathway for detecting peripheral artery disease at Stanford Health Care.
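APLUS's actual API is not reproduced here; the following is a toy version of the idea it implements, that a model's realized utility depends on the workflow consuming its output. The capacity cap, benefit, and cost values are invented for illustration.

```python
# Toy workflow-utility simulation: clinicians can only work a fixed number of
# model alerts per day, so alerts beyond that capacity generate no value.

def simulate_utility(daily_alerts, capacity_per_day, benefit_tp=1.0, cost_fp=0.2):
    """Net utility when only `capacity_per_day` alerts can be worked each day.

    daily_alerts: list of days; each day is a list of booleans
                  (True = flagged patient truly has the condition).
    """
    utility = 0.0
    for day in daily_alerts:
        worked = day[:capacity_per_day]  # alerts beyond capacity are dropped
        utility += sum(benefit_tp if tp else -cost_fp for tp in worked)
    return utility

# The same model yields different realized utility under different capacities,
# which is why evaluating the model in isolation can be misleading.
alerts = [[True, False, True], [True, True, False]]
assert simulate_utility(alerts, capacity_per_day=3) > simulate_utility(alerts, capacity_per_day=1)
```

Even this toy version makes the framework's point: a classifier with fixed accuracy can look useful or useless depending on workflow constraints, so utility must be simulated in context.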
|
20
|
Kawamoto K, Finkelstein J, Del Fiol G. Implementing Machine Learning in the Electronic Health Record: Checklist of Essential Considerations. Mayo Clin Proc 2023; 98:366-369. [PMID: 36868743 DOI: 10.1016/j.mayocp.2023.01.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/11/2023] [Accepted: 01/19/2023] [Indexed: 03/05/2023]
Affiliation(s)
- Kensaku Kawamoto
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT
- Joseph Finkelstein
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY
- Guilherme Del Fiol
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT
|
21
|
Goldstein BA, Mazurowski MA, Li C. The Need for Targeted Labeling of Machine Learning-Based Software as a Medical Device. JAMA Netw Open 2022; 5:e2242351. [PMID: 36409502 DOI: 10.1001/jamanetworkopen.2022.42351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open
Affiliation(s)
- Benjamin A Goldstein
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, North Carolina
- Duke AI Health, Duke University School of Medicine, Durham, North Carolina
- Maciej A Mazurowski
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, North Carolina
- Duke AI Health, Duke University School of Medicine, Durham, North Carolina
- Department of Radiology, Duke University School of Medicine, Durham, North Carolina
- Cheng Li
- Independent Regulatory Consultant, Durham, North Carolina
|