1
Weber S, Wyszynski M, Godefroid M, Plattfaut R, Niehaves B. How do medical professionals make sense (or not) of AI? A social-media-based computational grounded theory study and an online survey. Comput Struct Biotechnol J 2024;24:146-159. [PMID: 38434249] [PMCID: PMC10904922] [DOI: 10.1016/j.csbj.2024.02.009]
Abstract
To investigate the opinions and attitudes of medical professionals towards adopting AI-enabled healthcare technologies in their daily work, we used a mixed-methods approach. Study 1 employed a qualitative computational grounded theory approach, analyzing 181 Reddit threads from the subreddit r/medicine. Using an unsupervised machine learning clustering method, we identified three key themes: (1) consequences of AI, (2) the physician-AI relationship, and (3) a proposed way forward. Reddit posts related to the first two themes indicated that medical professionals' fear of being replaced by AI and skepticism toward AI played a major role in their arguments. Moreover, the results suggest that this fear is driven by little or moderate knowledge about AI. Posts related to the third theme focused on factual discussions about how AI and medicine have to be designed to become broadly adopted in health care. Study 2 quantitatively examined the relationship between the fear of AI, knowledge about AI, and medical professionals' intention to use AI-enabled technologies in more detail. Results based on a sample of 223 medical professionals who participated in the online survey revealed that the intention to use AI technologies increases with knowledge about AI and that this effect is moderated by the fear of being replaced by AI.
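As a purely hypothetical illustration of the kind of unsupervised clustering step this abstract describes (the abstract does not name the algorithm; the thread snippets, bag-of-words features, and minimal k-means below are our own stand-ins, not the authors' pipeline), the idea can be sketched as:

```python
# Minimal sketch: cluster short "thread" texts into themes using
# bag-of-words vectors and a tiny k-means. All texts are invented;
# the real study clustered 181 Reddit threads with its own pipeline.
import random

def bag_of_words(texts):
    """Turn texts into dense word-count vectors over a shared vocabulary."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = []
    for t in texts:
        v = [0.0] * len(vocab)
        for w in t.lower().split():
            v[index[w]] += 1.0
        vecs.append(v)
    return vecs

def kmeans(vecs, k=3, iters=20, seed=1):
    """Lloyd's algorithm with random initial centers; returns cluster labels."""
    rng = random.Random(seed)
    centers = [v[:] for v in rng.sample(vecs, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vecs:
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centers]
            clusters[dists.index(min(dists))].append(v)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old center if a cluster goes empty
                centers[i] = [sum(col) / len(cl) for col in zip(*cl)]
    return [min(range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centers[i])))
            for v in vecs]

threads = [
    "ai will replace radiologists",        # invented examples of a
    "ai will replace radiologists soon",   # "fear of replacement" theme
    "regulation and design way forward",
    "regulation and design way forward now",
]
labels = kmeans(bag_of_words(threads), k=2)
```

With k = 3 on real thread texts, the resulting cluster memberships would then be read qualitatively to name themes such as the three reported above.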
Affiliation(s)
- Sebastian Weber
- University of Bremen, Digital Public, Bibliothekstr. 1, 28359 Bremen, Germany
- Marc Wyszynski
- University of Bremen, Digital Public, Bibliothekstr. 1, 28359 Bremen, Germany
- Marie Godefroid
- University of Siegen, Information Systems, Kohlbettstr. 15, 57072 Siegen, Germany
- Ralf Plattfaut
- University of Duisburg-Essen, Information Systems and Transformation Management, Universitätsstr. 9, 45141 Essen, Germany
- Bjoern Niehaves
- University of Bremen, Digital Public, Bibliothekstr. 1, 28359 Bremen, Germany
2
Pinsky MR, Bedoya A, Bihorac A, Celi L, Churpek M, Economou-Zavlanos NJ, Elbers P, Saria S, Liu V, Lyons PG, Shickel B, Toral P, Tscholl D, Clermont G. Use of artificial intelligence in critical care: opportunities and obstacles. Crit Care 2024;28:113. [PMID: 38589940] [PMCID: PMC11000355] [DOI: 10.1186/s13054-024-04860-z]
Abstract
BACKGROUND Nowhere else in the healthcare system are the challenges of creating useful models with direct, time-critical clinical applications more relevant, and the obstacles to achieving those goals more massive, than in the intensive care unit. Machine learning-based artificial intelligence (AI) techniques to define states and predict future events are commonplace activities of modern life. However, their penetration into acute care medicine has been slow, stuttering, and uneven. Major obstacles to the widespread, effective application of AI approaches to the real-time care of the critically ill patient exist and need to be addressed. MAIN BODY Clinical decision support systems (CDSSs) in acute and critical care environments should support clinicians, not replace them at the bedside. As discussed in this review, the reasons are many: the immaturity of AI-based systems, which lack situational awareness; the fundamental bias in many large databases, which do not reflect the target population of patients being treated, making fairness an important issue to address; and technical barriers to timely access to valid data and its display in a fashion useful for clinical workflow. The inherent "black-box" nature of many predictive algorithms and CDSSs makes trustworthiness and acceptance by the medical community difficult. Logistically, collating and curating, in real time, the multidimensional data streams from various sources needed to inform the algorithms, and ultimately displaying relevant clinical decision support in a format that adapts to individual patient responses and signatures, represent the efferent limb of these systems and are often ignored during initial validation efforts. Similarly, legal and commercial barriers to access to many existing clinical databases limit studies that address the fairness and generalizability of predictive models and management tools. CONCLUSIONS AI-based CDSSs are evolving and are here to stay. It is our obligation to be good shepherds of their use and further development.
Affiliation(s)
- Michael R Pinsky
- Department of Critical Care Medicine, School of Medicine, University of Pittsburgh, 638 Scaife Hall, 3550 Terrace Street, Pittsburgh, PA, 15261, USA
- Armando Bedoya
- Algorithm-Based Clinical Decision Support (ABCDS) Oversight, Office of Vice Dean of Data Science, School of Medicine, Duke University, Durham, NC, 27705, USA
- Division of Pulmonary Critical Care Medicine, Duke University School of Medicine, Durham, NC, 27713, USA
- Azra Bihorac
- Department of Medicine, University of Florida College of Medicine Gainesville, Malachowsky Hall, 1889 Museum Road, Suite 2410, Gainesville, FL, 32611, USA
- Leo Celi
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Matthew Churpek
- Department of Medicine, University of Wisconsin, 600 Highland Ave, Madison, WI, 53792, USA
- Nicoleta J Economou-Zavlanos
- Algorithm-Based Clinical Decision Support (ABCDS) Oversight, Office of Vice Dean of Data Science, School of Medicine, Duke University, Durham, NC, 27705, USA
- Paul Elbers
- Department of Intensive Care, Amsterdam UMC, Amsterdam, The Netherlands
- Amsterdam UMC, ZH.7D.167, De Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands
- Suchi Saria
- Department of Computer Science, Whiting School of Engineering, Johns Hopkins Medical Institutions, Johns Hopkins University, 333 Malone Hall, 300 Wolfe Street, Baltimore, MD, USA
- Department of Medicine, Johns Hopkins School of Medicine, AI and Health Lab, Johns Hopkins University, Baltimore, MD, USA
- Bayesian Health, New York, NY, 10282, USA
- Vincent Liu
- Department of Medicine, Oregon Health & Science University, 3181 S.W. Sam Jackson Park Road, Mail Code UHN67, Portland, OR, 97239-3098, USA
- 2000 Broadway, Oakland, CA, 94612, USA
- Patrick G Lyons
- Department of Medicine, Oregon Health & Science University, 3181 S.W. Sam Jackson Park Road, Mail Code UHN67, Portland, OR, 97239-3098, USA
- Benjamin Shickel
- Department of Medicine, University of Florida College of Medicine Gainesville, Malachowsky Hall, 1889 Museum Road, Suite 2410, Gainesville, FL, 32611, USA
- Amsterdam UMC, ZH.7D.167, De Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands
- Patrick Toral
- Department of Intensive Care, Amsterdam UMC, Amsterdam, The Netherlands
- Amsterdam UMC, ZH.7D.165, De Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands
- David Tscholl
- Institute of Anesthesiology, University Hospital Zurich, University of Zurich, Frauenklinikstrasse 10, 8091, Zurich, Switzerland
- Gilles Clermont
- Department of Critical Care Medicine, School of Medicine, University of Pittsburgh, 638 Scaife Hall, 3550 Terrace Street, Pittsburgh, PA, 15261, USA
- VA Pittsburgh Health System, 131A Building 30, 4100 Allequippa St, Pittsburgh, PA, 15240, USA
3
Bertelsen PS, Bossen C, Knudsen C, Pedersen AM. Data work and practices in healthcare: A scoping review. Int J Med Inform 2024;184:105348. [PMID: 38309238] [DOI: 10.1016/j.ijmedinf.2024.105348]
Abstract
CONTEXT In healthcare, digitization has been widespread and profound, entailing a deluge of data. This has spurred ambitions for healthcare to become data-driven to improve efficiency and quality, and within medicine itself to improve the diagnosis and treatment of disease. The generation and processing of data requires human intervention and work, though this is often not acknowledged. PURPOSE The paper investigates who conducts data work, where, by which means, and for which purposes; understanding this is crucial for healthcare managers and policy makers if ambitions to become data-driven are to succeed. To guide further research, it also provides an overview of existing research on data work and practices. METHODS We conducted a scoping review based on a search in Scopus and Web of Science for papers including the terms healthcare or health care combined with at least one of the following terms: data work, data worker*, data practice*, data practitioner*. 74 papers on data work or practices in healthcare were included. ANALYSIS The 74 papers were coded and analyzed with regard to the following themes: the kinds of data workers and practitioners, organizational settings, involved technologies, purposes, data work tasks, theories and concepts, and definitions of data work and practice. RESULTS Data work is pervasive in healthcare and conducted by various professions and people in various contexts. The field researching data work and practices is emerging, with publications spread across multiple venues, and there is a need for more precise definitions of data work. Further, data work and practices are useful concepts that have enabled the exploration of those efforts and tasks in detail. CONCLUSION Research on data work and practices in healthcare is emerging and promising. We call for more research to consolidate the field and to better understand and support the work needed for healthcare to become data-driven.
Affiliation(s)
- Claus Bossen
- Department of Digital Design and Information Studies, Aarhus University, Denmark
- Casper Knudsen
- Department of Sustainability and Planning, Aalborg University, Denmark
- Asbjørn M Pedersen
- Department of Digital Design and Information Studies, Aarhus University, Denmark
4
Göndöcs D, Dörfler V. AI in medical diagnosis: AI prediction & human judgment. Artif Intell Med 2024;149:102769. [PMID: 38462271] [DOI: 10.1016/j.artmed.2024.102769]
Abstract
AI has long been regarded as a panacea for decision-making and many other aspects of knowledge work, as something that will help humans overcome their shortcomings. We believe that AI can be a useful asset to support decision-makers, but not that it should replace them. Decision-making uses algorithmic analysis, but it is not solely algorithmic analysis; it also involves other factors, many of which are very human, such as creativity, intuition, emotions, feelings, and value judgments. We conducted semi-structured, open-ended research interviews with 17 dermatologists to understand what they expect an AI application to deliver for medical diagnosis. We found four aggregate dimensions along which the thinking of dermatologists can be described: the ways in which our participants chose to interact with AI, responsibility, 'explainability', and the new way of thinking (mindset) needed for working with AI. We believe that our findings will help physicians who might consider using AI in their diagnoses understand how to use AI beneficially. They will also be useful to AI vendors in improving their understanding of how medics want to use AI in diagnosis. Further research will be needed to examine whether our findings have relevance in the wider medical field and beyond.
Affiliation(s)
- Viktor Dörfler
- University of Strathclyde Business School, United Kingdom
5
Balsano C, Burra P, Duvoux C, Alisi A, Piscaglia F, Gerussi A. Artificial Intelligence and liver: Opportunities and barriers. Dig Liver Dis 2023;55:1455-1461. [PMID: 37718227] [DOI: 10.1016/j.dld.2023.08.048]
Abstract
Artificial Intelligence (AI) has recently been shown to be an excellent tool for the study of the liver; however, many obstacles still have to be overcome for the digitalization of real-world hepatology. The authors present an overview of the current state of the art on the use of innovative technologies in different areas (big data, translational hepatology, imaging, and the transplant setting). In clinical practice, physicians must integrate a vast array of data modalities (medical history, clinical data, laboratory tests, imaging, and pathology slides) to reach a diagnostic or therapeutic decision. Unfortunately, machine learning and deep learning are still far from truly supporting clinicians in real life. In fact, the accuracy of any technological support has no value in medicine without the endorsement of clinicians. To make better use of new technologies, it is essential to improve clinicians' knowledge about them. To this end, the authors propose that collaborative networks for multidisciplinary approaches will enable the rapid implementation of AI systems for developing disease-customized, AI-powered clinical decision support tools. The authors also discuss the ethical, educational, and legal challenges that must be overcome to build robust bridges and deploy potentially effective AI in real-world clinical settings.
Affiliation(s)
- Clara Balsano
- Department of Life, Health and Environmental Sciences-MESVA, School of Emergency-Urgency Medicine, University of L'Aquila, Piazzale Salvatore Tommasi 1, Coppito, L'Aquila 67100, Italy
- Patrizia Burra
- Multivisceral Transplant Unit, Gastroenterology, Department of Surgery, Oncology and Gastroenterology, Padua University Hospital, Padua, Italy
- Christophe Duvoux
- Department of Hepatology, Medical Liver Transplant Unit, Hospital Henri Mondor AP-HP, University of Paris-Est Créteil (UPEC), France
- Anna Alisi
- Research Unit of Molecular Genetics of Complex Phenotypes, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- Fabio Piscaglia
- Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy
- Alessio Gerussi
- Division of Gastroenterology, Center for Autoimmune Liver Diseases, Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy; European Reference Network on Hepatological Diseases (ERN RARE-LIVER), Fondazione IRCCS San Gerardo dei Tintori, Monza, Italy
6
Henriksen A, Blond L. Executive-centered AI? Designing predictive systems for the public sector. Soc Stud Sci 2023;53:738-760. [PMID: 37154115] [DOI: 10.1177/03063127231163756]
Abstract
Recent policies and research articles call for turning AI into a form of IA ('intelligence augmentation') by envisioning systems that center on and enhance humans. Based on a field study at an AI company, this article studies how AI is performed as developers enact two predictive systems along with stakeholders in public sector accounting and public sector healthcare. Inspired by STS theories about values in design, we analyze our empirical data, focusing especially on which objectives, structured performances, and divisions of labor are built into the two systems, and at whose expense. Our findings reveal that the development of the two AI systems is informed by politically motivated managerial interests in cost-efficiency. This results in AI systems that are (1) designed as managerial tools meant to enable efficiency improvements and cost reductions, and (2) enforced on professionals on the 'shop floor' in a top-down manner. Based on our findings, and drawing on literature on the original visions of human-centered systems design from the 1960s, we argue that turning AI into IA seems dubious, and ask what human-centered AI really means and whether it remains an ideal not easily realized in practice. More work should be done to rethink human-machine relationships in the age of big data and AI, thereby making the call for ethical and responsible AI more genuine and trustworthy.
Affiliation(s)
- Lasse Blond
- Danish Technological Institute, Aarhus, Denmark
7
Winter PD, Chico TJA. Using the Non-Adoption, Abandonment, Scale-Up, Spread, and Sustainability (NASSS) Framework to Identify Barriers and Facilitators for the Implementation of Digital Twins in Cardiovascular Medicine. Sensors (Basel) 2023;23:6333. [PMID: 37514627] [PMCID: PMC10385429] [DOI: 10.3390/s23146333]
Abstract
A digital twin is a computer-based "virtual" representation of a complex system, updated using data from the "real" twin. Digital twins are established in product manufacturing, aviation, and infrastructure, and are attracting significant attention in medicine, where they hold great promise for improving the prevention of cardiovascular disease and enabling personalised health care through a range of Internet of Things (IoT) devices that collect patient data in real time. However, the promise of such new technology is often met with technical, scientific, social, and ethical challenges; if these challenges are not overcome, the technology is less likely to be adopted by stakeholders. The purpose of this work is to identify the facilitators of and barriers to the implementation of digital twins in cardiovascular medicine. Using the Non-adoption, Abandonment, Scale-up, Spread, and Sustainability (NASSS) framework, we conducted a document analysis of policy reports, industry websites, online magazines, and academic publications on digital twins in cardiovascular medicine, identifying potential facilitators and barriers to adoption. Our results show key facilitating factors for implementation: preventing cardiovascular disease, in silico simulation and experimentation, and personalised care. Key barriers to implementation included establishing real-time data exchange, the perceived specialist skills required, the high demand for patient data, and ethical risks related to privacy and surveillance. Furthermore, the lack of empirical research on the attributes of digital twins by different research groups, on the characteristics and behaviour of adopters, and on the nature and extent of the social, regulatory, economic, and political contexts in the planning and development of these technologies is a major hindrance to future implementation.
Affiliation(s)
- Peter D Winter
- School of Sociology, Politics, and International Studies (SPAIS), University of Bristol, Bristol BS8 1TU, UK
- Timothy J A Chico
- Department of Infection, Immunity and Cardiovascular Disease (IICD), University of Sheffield, Sheffield S10 2RX, UK
8
Dreizin D, Zhang L, Sarkar N, Bodanapally UK, Li G, Hu J, Chen H, Khedr M, Khetan U, Campbell P, Unberath M. Accelerating voxelwise annotation of cross-sectional imaging through AI collaborative labeling with quality assurance and bias mitigation. Front Radiol 2023;3:1202412. [PMID: 37485306] [PMCID: PMC10362988] [DOI: 10.3389/fradi.2023.1202412]
Abstract
BACKGROUND Precision-medicine quantitative tools for cross-sectional imaging require painstaking labeling of targets that vary considerably in volume, prohibiting the scaling of data annotation efforts and supervised training to the large datasets needed for robust and generalizable clinical performance. A straightforward time-saving strategy involves manual editing of AI-generated labels, which we call AI-collaborative labeling (AICL). Factors affecting the efficacy and utility of such an approach are unknown: the reduction in time effort is not well documented, and edited AI labels may be prone to automation bias. PURPOSE In this pilot, using a cohort of CTs with intracavitary hemorrhage, we evaluate both the time savings and AICL label quality, and propose criteria that must be met for using AICL annotations as a high-throughput, high-quality ground truth. METHODS 57 CT scans of patients with traumatic intracavitary hemorrhage were included. No participant recruited for this study had previously interpreted the scans. nnU-Net models trained on small existing datasets for each feature (hemothorax/hemoperitoneum/pelvic hematoma; n = 77-253) were used in inference. Two common scenarios served as baseline comparisons: de novo expert manual labeling, and expert edits of trained-staff labels. Parameters included time effort and image quality, graded by a blinded independent expert using a 9-point scale. The observer also attempted to discriminate AICL and expert labels in a random subset (n = 18). Data were compared with ANOVA and post hoc paired signed-rank tests with Bonferroni correction. RESULTS AICL reduced time effort 2.8-fold compared to staff label editing, and 8.7-fold compared to expert labeling (corrected p < 0.0006). Mean Likert grades for AICL (8.4, SD 0.6) were significantly higher than for expert labels (7.8, SD 0.9) and edited staff labels (7.7, SD 0.8) (corrected p < 0.0006). The independent observer failed to correctly discriminate AI and human labels. CONCLUSION For our use case and annotators, AICL facilitates rapid, large-scale curation of high-quality ground truth. The proposed quality control regime can be employed by other investigators before embarking on AICL for segmentation tasks in large datasets.
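The comparison machinery named in the methods (paired signed-rank tests with a Bonferroni correction) can be sketched as follows. This is an illustrative re-implementation using the large-sample normal approximation and no tie correction, not the study's analysis code; the timing numbers in the example are invented.

```python
import math

def wilcoxon_signed_rank_p(x, y):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation,
    zero differences dropped, no tie correction; adequate for n >= ~20)."""
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    ranked = sorted((abs(v), v > 0) for v in d)
    ranks = [0.0] * n
    i = 0
    while i < n:                        # average ranks over tied |d| values
        j = i
        while j < n and ranked[j][0] == ranked[i][0]:
            j += 1
        for k in range(i, j):
            ranks[k] = (i + j + 1) / 2  # mean of the 1-based ranks i+1 .. j
        i = j
    w_plus = sum(r for r, (_, pos) in zip(ranks, ranked) if pos)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = abs(w_plus - mu) / sigma
    return min(1.0, 1 - math.erf(z / math.sqrt(2)))  # = 2 * (1 - Phi(z))

def bonferroni(pvals):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

# Invented per-case labeling times (minutes): expert de novo vs AICL editing
expert = [30.0 + i for i in range(20)]
aicl = [t / 3 for t in expert]
p_raw = wilcoxon_signed_rank_p(expert, aicl)
p_corrected = bonferroni([p_raw, 0.02, 0.04])  # e.g., three pairwise tests
```

The Bonferroni step is what turns the raw pairwise p-values into the "corrected p" thresholds the abstract reports.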
Affiliation(s)
- David Dreizin
- Department of Diagnostic Radiology and Nuclear Medicine, School of Medicine, University of Maryland, Baltimore, MD, United States
- Lei Zhang
- Department of Diagnostic Radiology and Nuclear Medicine, School of Medicine, University of Maryland, Baltimore, MD, United States
- Nathan Sarkar
- Department of Diagnostic Radiology and Nuclear Medicine, School of Medicine, University of Maryland, Baltimore, MD, United States
- Uttam K. Bodanapally
- Department of Diagnostic Radiology and Nuclear Medicine, School of Medicine, University of Maryland, Baltimore, MD, United States
- Guang Li
- Department of Diagnostic Radiology and Nuclear Medicine, School of Medicine, University of Maryland, Baltimore, MD, United States
- Jiazhen Hu
- Johns Hopkins University, Baltimore, MD, United States
- Haomin Chen
- Johns Hopkins University, Baltimore, MD, United States
- Mustafa Khedr
- Department of Diagnostic Radiology and Nuclear Medicine, School of Medicine, University of Maryland, Baltimore, MD, United States
- Udit Khetan
- Department of Diagnostic Radiology and Nuclear Medicine, School of Medicine, University of Maryland, Baltimore, MD, United States
- Peter Campbell
- Department of Diagnostic Radiology and Nuclear Medicine, School of Medicine, University of Maryland, Baltimore, MD, United States
9
Wehkamp K, Krawczak M, Schreiber S. The Quality and Utility of Artificial Intelligence in Patient Care. Dtsch Arztebl Int 2023;120:463-469. [PMID: 37218054] [PMCID: PMC10487679] [DOI: 10.3238/arztebl.m2023.0124]
Abstract
BACKGROUND Artificial intelligence (AI) is increasingly being used in patient care. In the future, physicians will need to understand not only the basic functioning of AI applications, but also their quality, utility, and risks. METHODS This article is based on a selective review of the literature on the principles, quality, limitations, and benefits of AI applications in patient care, along with examples of individual applications. RESULTS The number of AI applications in patient care is rising, with more than 500 approvals in the United States to date. Their quality and utility are based on a number of interdependent factors, including the real-life setting, the type and amount of data collected, the choice of variables used by the application, the algorithms used, and the goal and implementation of each application. Bias (which may be hidden) and errors can arise at all these levels. Any evaluation of the quality and utility of an AI application must, therefore, be conducted according to the scientific principles of evidence-based medicine, a requirement that is often hampered by a lack of transparency. CONCLUSION AI has the potential to improve patient care while meeting the challenge of dealing with an ever-increasing surfeit of information and data in medicine with limited human resources. The limitations and risks of AI applications require critical and responsible consideration. This can best be achieved through a combination of scientific.
Affiliation(s)
- Kai Wehkamp
- Department of Internal Medicine I, University Medical Center Schleswig-Holstein, Campus Lübeck, Kiel, Germany
- Department for Medical Management, MSH Medical School Hamburg, Hamburg, Germany
- Michael Krawczak
- Institute of Medical Informatics and Statistics, Christian-Albrechts-University of Kiel, University Medical Center Schleswig-Holstein Campus Kiel, Germany
- Stefan Schreiber
- Department of Internal Medicine I, University Medical Center Schleswig-Holstein, Campus Lübeck, Kiel, Germany
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, University Medical Center Schleswig-Holstein Campus Kiel, Germany
10
Cummings BC, Blackmer JM, Motyka JR, Farzaneh N, Cao L, Bisco EL, Glassbrook JD, Roebuck MD, Gillies CE, Admon AJ, Medlin RP, Singh K, Sjoding MW, Ward KR, Ansari S. External Validation and Comparison of a General Ward Deterioration Index Between Diversely Different Health Systems. Crit Care Med 2023;51:775-786. [PMID: 36927631] [PMCID: PMC10187626] [DOI: 10.1097/ccm.0000000000005837]
Abstract
OBJECTIVES Implementing a predictive analytic model in a new clinical environment is fraught with challenges. Dataset shifts such as differences in clinical practice, new data acquisition devices, or changes in the electronic health record (EHR) implementation mean that the input data seen by a model can differ significantly from the data it was trained on. Validating models at multiple institutions is therefore critical. Here, using retrospective data, we demonstrate how Predicting Intensive Care Transfers and other UnfoReseen Events (PICTURE), a deterioration index developed at a single academic medical center, generalizes to a second institution with a significantly different patient population. DESIGN PICTURE is a deterioration index designed for the general ward, which uses structured EHR data such as laboratory values and vital signs. SETTING The general wards of two large hospitals, one an academic medical center and the other a community hospital. SUBJECTS The model has previously been trained and validated on a cohort of 165,018 general ward encounters from a large academic medical center. Here, we apply this model to 11,083 encounters from a separate community hospital. INTERVENTIONS None. MEASUREMENTS AND MAIN RESULTS The hospitals were found to have significant differences in missingness rates (> 5% difference in 9/52 features), deterioration rate (4.5% vs 2.5%), and racial makeup (20% non-White vs 49% non-White). Despite these differences, PICTURE's performance was consistent: at the first hospital, the area under the receiver operating characteristic curve (AUROC) was 0.870 (95% CI, 0.861-0.878) and the area under the precision-recall curve (AUPRC) was 0.298 (95% CI, 0.275-0.320); at the second, AUROC was 0.875 (0.851-0.902) and AUPRC was 0.339 (0.281-0.398). AUPRC was standardized to a 2.5% event rate. PICTURE also outperformed both the Epic Deterioration Index and the National Early Warning Score at both institutions. CONCLUSIONS Important differences were observed between the two institutions, including in data availability and demographic makeup. PICTURE was able to identify general ward patients at risk of deterioration at both hospitals with consistent performance (AUROC and AUPRC) and compared favorably to existing metrics.
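The two headline metrics, and one way to standardize AUPRC to a fixed event rate, can be sketched in plain Python. The prevalence adjustment below (precision recomputed as r·TPR / (r·TPR + (1 − r)·FPR)) is a standard identity, but it is our illustrative reconstruction; the abstract does not specify how the PICTURE team standardized AUPRC.

```python
def auroc(labels, scores):
    """Mann-Whitney form of AUROC: P(score_pos > score_neg), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auprc(labels, scores):
    """Average precision: mean of precision at each newly recalled positive."""
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    ap = 0.0
    for _, y in ranked:
        if y == 1:
            tp += 1
            ap += tp / (tp + fp)
        else:
            fp += 1
    return ap / sum(labels)

def standardized_auprc(labels, scores, event_rate=0.025):
    """AUPRC with precision recomputed at a fixed prevalence r, using
    precision = r*TPR / (r*TPR + (1-r)*FPR) at each threshold."""
    ranked = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    r = event_rate
    tp = fp = 0
    ap = 0.0
    for _, y in ranked:
        if y == 1:
            tp += 1
            tpr, fpr = tp / n_pos, fp / n_neg
            ap += (r * tpr) / (r * tpr + (1 - r) * fpr)
        else:
            fp += 1
    return ap / n_pos
```

Setting event_rate to a dataset's own prevalence reproduces the ordinary AUPRC, a convenient sanity check; fixing it at 0.025 makes the two hospitals' AUPRC values comparable despite their 4.5% vs 2.5% deterioration rates.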
Affiliation(s)
- Brandon C Cummings
- The Max Harry Weil Institute of Critical Care Research & Innovation, University of Michigan, Ann Arbor, MI
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI
- Joseph M Blackmer
- The Max Harry Weil Institute of Critical Care Research & Innovation, University of Michigan, Ann Arbor, MI
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI
- Jonathan R Motyka
- The Max Harry Weil Institute of Critical Care Research & Innovation, University of Michigan, Ann Arbor, MI
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI
- Negar Farzaneh
- The Max Harry Weil Institute of Critical Care Research & Innovation, University of Michigan, Ann Arbor, MI
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI
- Loc Cao
- The Max Harry Weil Institute of Critical Care Research & Innovation, University of Michigan, Ann Arbor, MI
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI
- Erin L Bisco
- The Max Harry Weil Institute of Critical Care Research & Innovation, University of Michigan, Ann Arbor, MI
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI
- Michael D Roebuck
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI
- Department of Emergency Medicine, Hurley Medical Center, Flint, MI
- Christopher E Gillies
- The Max Harry Weil Institute of Critical Care Research & Innovation, University of Michigan, Ann Arbor, MI
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI
- Andrew J Admon
- The Max Harry Weil Institute of Critical Care Research & Innovation, University of Michigan, Ann Arbor, MI
- Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI
- Medicine Service, LTC Charles S. Kettles VA Medical Center, Ann Arbor, MI
- Richard P Medlin
- The Max Harry Weil Institute of Critical Care Research & Innovation, University of Michigan, Ann Arbor, MI
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI
- Karandeep Singh
- Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI
- Precision Health, University of Michigan, Ann Arbor, MI
- Michael W Sjoding
- The Max Harry Weil Institute of Critical Care Research & Innovation, University of Michigan, Ann Arbor, MI
- Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI
- Precision Health, University of Michigan, Ann Arbor, MI
- Kevin R Ward
- The Max Harry Weil Institute of Critical Care Research & Innovation, University of Michigan, Ann Arbor, MI
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI
- Sardar Ansari
- The Max Harry Weil Institute of Critical Care Research & Innovation, University of Michigan, Ann Arbor, MI
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI
Collapse
|
11
|
Winter PD, Carusi A. (De)troubling transparency: artificial intelligence (AI) for clinical applications. Med Humanit 2023; 49:17-26. PMID: 35545432; PMCID: PMC9985768; DOI: 10.1136/medhum-2021-012318.
Abstract
Artificial intelligence (AI) and machine learning (ML) techniques occupy a prominent role in medical research in terms of the innovation and development of new technologies. However, while many perceive AI as a technology of promise and hope, one that allows earlier and more accurate diagnosis, the acceptance of AI and ML technologies in hospitals remains low. A major reason for this is the lack of transparency associated with these technologies, in particular epistemic transparency, which results in AI disturbing or troubling established knowledge practices in clinical contexts. In this article, we describe the development process of one AI application for a clinical setting. We show how epistemic transparency is negotiated and co-produced in close collaboration between AI developers, clinicians and biomedical scientists, forming the context in which AI is accepted as an epistemic operator. Drawing on qualitative research with collaborative researchers developing an AI technology for the early diagnosis of a rare respiratory disease (pulmonary hypertension, PH), this paper examines how including clinicians and clinical scientists in the collaborative practices of AI developers de-troubles transparency. Our research shows how de-troubling transparency occurs in three dimensions of AI development relating to PH: querying of data sets, building software, and training the model. The close collaboration results in an AI application that is at once social and technological: it integrates and inscribes into the technology the knowledge processes of the different participants in its development. We suggest that it is a misnomer to call these applications 'artificial' intelligence, and that they would be better developed and implemented if they were reframed as forms of sociotechnical intelligence.
Affiliation(s)
- Peter David Winter
- School of Sociology, Politics and International Studies, University of Bristol, Bristol, UK
- Annamaria Carusi
- Interchange Research, London, UK
- Department of Science and Technology Studies, University College London, London, UK

12
Artificial intelligence for strengthening healthcare systems in low- and middle-income countries: a systematic scoping review. NPJ Digit Med 2022; 5:162. PMID: 36307479; PMCID: PMC9614192; DOI: 10.1038/s41746-022-00700-y.
Abstract
In low- and middle-income countries (LMICs), a growing number of publications have promoted AI as a potential means of strengthening healthcare systems. We aimed to evaluate the scope and nature of AI technologies in the specific context of LMICs. In this systematic scoping review, we used a broad variety of AI and healthcare search terms. Our literature search included records published between 1 January 2009 and 30 September 2021 from the Scopus, EMBASE, MEDLINE, Global Health and APA PsycInfo databases, and grey literature from a Google Scholar search. We included studies that reported a quantitative and/or qualitative evaluation of a real-world application of AI in an LMIC health context. A total of 10 references evaluating the application of AI in an LMIC were included. Applications varied widely, including clinical decision support systems, treatment planning and triage assistants, and health chatbots. Only half of the papers reported which algorithms and datasets were used to train the AI. A number of challenges in using AI tools were reported, including issues with reliability, mixed impacts on workflows, poor user-friendliness and a lack of fit with local contexts. Many barriers exist that prevent the successful development and adoption of well-performing, context-specific AI tools, such as limited data availability, trust and evidence of cost-effectiveness in LMICs. Additional evaluations of the use of AI in healthcare in LMICs are needed to establish their effectiveness and reliability in real-world settings and to build understanding of best practices for future implementations.
13
White RD, Demirer M, Gupta V, Sebro RA, Kusumoto FM, Erdal BS. Pre-deployment assessment of an AI model to assist radiologists in chest X-ray detection and identification of lead-less implanted electronic devices for pre-MRI safety screening: realized implementation needs and proposed operational solutions. J Med Imaging (Bellingham) 2022; 9:054504. PMID: 36310648; PMCID: PMC9603740; DOI: 10.1117/1.jmi.9.5.054504.
Abstract
Purpose Chest X-ray (CXR) use in pre-MRI safety screening, such as for lead-less implanted electronic device (LLIED) recognition, is common. To assist CXR interpretation, we "pre-deployed" an artificial intelligence (AI) model to assess (1) accuracies in LLIED-type (and consequently safety-level) identification, (2) safety implications of LLIED nondetections or misidentifications, (3) infrastructural or workflow requirements, and (4) demands related to model adaptation to real-world conditions. Approach A two-tier cascading methodology for LLIED detection/localization and identification on a frontal CXR was applied to evaluate the performance of the original nine-class AI model. With the unexpected early appearance of LLIED types during simulated real-world trialing, retraining of a newer 12-class version preceded retrialing. A zero footprint (ZF) graphical user interface (GUI)/viewer with DICOM-based output was developed for inference-result display and adjudication, supporting end-user engagement and model continuous learning and/or modernization. Results During model testing or trialing using both the nine-class and 12-class models, robust detection/localization was consistently 100%, with mAP 0.99 from fivefold cross-validation. Safety-level categorization was high during both testing ( AUC ≥ 0.98 and ≥ 0.99 , respectively) and trialing (accuracy 98% and 97%, respectively). LLIED-type identifications by the two models during testing (1) were 98.9% and 99.5% overall correct and (2) consistently showed AUC ≥ 0.92 (1.00 for 8/9 and 9/12 LLIED-types, respectively). Pre-deployment trialing of both models demonstrated overall type-identification accuracies of 94.5% and 95%, respectively. Of the small number of misidentifications, none involved MRI-stringently conditional or MRI-unsafe types of LLIEDs. Optimized ZF GUI/viewer operations led to greater user-friendliness for radiologist engagement. 
Conclusions Our LLIED-related AI methodology supports (1) 100% detection sensitivity, (2) high identification (including MRI-safety) accuracy, and (3) future model deployment with facilitated inference-result display and adjudication for ongoing model adaptation to future real-world experiences.
Affiliation(s)
- Richard D. White
- Mayo Clinic, Department of Radiology, Center for Augmented Intelligence in Imaging, Jacksonville, Florida, United States
- Mutlu Demirer
- Mayo Clinic, Department of Radiology, Center for Augmented Intelligence in Imaging, Jacksonville, Florida, United States
- Vikash Gupta
- Mayo Clinic, Department of Radiology, Center for Augmented Intelligence in Imaging, Jacksonville, Florida, United States
- Ronnie A. Sebro
- Mayo Clinic, Department of Radiology, Center for Augmented Intelligence in Imaging, Jacksonville, Florida, United States
- Frederick M. Kusumoto
- Mayo Clinic, Department of Cardiovascular Medicine, Jacksonville, Florida, United States
- Barbaros Selnur Erdal
- Mayo Clinic, Department of Radiology, Center for Augmented Intelligence in Imaging, Jacksonville, Florida, United States

14
King Z, Farrington J, Utley M, Kung E, Elkhodair S, Harris S, Sekula R, Gillham J, Li K, Crowe S. Machine learning for real-time aggregated prediction of hospital admission for emergency patients. NPJ Digit Med 2022; 5:104. PMID: 35882903; PMCID: PMC9321296; DOI: 10.1038/s41746-022-00649-y.
Abstract
Machine learning for hospital operations is under-studied. We present a prediction pipeline that uses live electronic health-records for patients in a UK teaching hospital’s emergency department (ED) to generate short-term, probabilistic forecasts of emergency admissions. A set of XGBoost classifiers applied to 109,465 ED visits yielded AUROCs from 0.82 to 0.90 depending on elapsed visit-time at the point of prediction. Patient-level probabilities of admission were aggregated to forecast the number of admissions among current ED patients and, incorporating patients yet to arrive, total emergency admissions within specified time-windows. The pipeline gave a mean absolute error (MAE) of 4.0 admissions (mean percentage error of 17%) versus 6.5 (32%) for a benchmark metric. Models developed with 104,504 later visits during the Covid-19 pandemic gave AUROCs of 0.68–0.90 and MAE of 4.2 (30%) versus a 4.9 (33%) benchmark. We discuss how we surmounted challenges of designing and implementing models for real-time use, including temporal framing, data preparation, and changing operational conditions.
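The aggregation step described above, turning patient-level admission probabilities into a forecast of the total number of admissions, can be sketched as follows. This is an illustrative reconstruction under the assumption of independent patients (a Poisson-binomial distribution computed by convolution), not the authors' pipeline; the probabilities are invented.

```python
# Hedged sketch (not the study's code): combine per-patient admission
# probabilities into a full distribution over the number of admissions,
# treating patients as independent Bernoulli outcomes.

def admission_count_distribution(probs):
    """Return dist where dist[k] = P(exactly k patients are admitted)."""
    dist = [1.0]  # with no patients, P(0 admissions) = 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)   # this patient is not admitted
            new[k + 1] += q * p       # this patient is admitted
        dist = new
    return dist

probs = [0.9, 0.5, 0.1]              # hypothetical classifier outputs
dist = admission_count_distribution(probs)
expected = sum(k * p for k, p in enumerate(dist))  # mean forecast
```

The mean of this distribution is simply the sum of the probabilities, but keeping the whole distribution also yields prediction intervals for bed planners.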
Affiliation(s)
- Zella King
- Clinical Operational Research Unit, University College London, 4 Taviton Street, London, WC1H 0BT, UK
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
- Joseph Farrington
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
- Martin Utley
- Clinical Operational Research Unit, University College London, 4 Taviton Street, London, WC1H 0BT, UK
- Enoch Kung
- Clinical Operational Research Unit, University College London, 4 Taviton Street, London, WC1H 0BT, UK
- Samer Elkhodair
- University College London Hospitals NHS Foundation Trust, 250 Euston Road, London, NW1 2PG, UK
- Steve Harris
- University College London Hospitals NHS Foundation Trust, 250 Euston Road, London, NW1 2PG, UK
- Richard Sekula
- University College London Hospitals NHS Foundation Trust, 250 Euston Road, London, NW1 2PG, UK
- Jonathan Gillham
- University College London Hospitals NHS Foundation Trust, 250 Euston Road, London, NW1 2PG, UK
- Kezhi Li
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
- Sonya Crowe
- Clinical Operational Research Unit, University College London, 4 Taviton Street, London, WC1H 0BT, UK

15
Chamberlin JH, Aquino G, Schoepf UJ, Nance S, Godoy F, Carson L, Giovagnoli VM, Gill CE, McGill LJ, O'Doherty J, Emrich T, Burt JR, Baruah D, Varga-Szemes A, Kabakus IM. An Interpretable Chest CT Deep Learning Algorithm for Quantification of COVID-19 Lung Disease and Prediction of Inpatient Morbidity and Mortality. Acad Radiol 2022; 29:1178-1188. PMID: 35610114; PMCID: PMC8977389; DOI: 10.1016/j.acra.2022.03.023.
Abstract
Rationale and Objectives The burden of coronavirus disease 2019 (COVID-19) airspace opacities is time consuming and challenging to quantify on computed tomography. The purpose of this study was to evaluate the ability of a deep convolutional neural network (dCNN) to predict inpatient outcomes associated with COVID-19 pneumonia. Materials and Methods A previously trained dCNN was tested on an external validation cohort of 241 patients who presented to the emergency department and received a chest computed tomography scan, 93 with COVID-19 and 168 without. Airspace opacity scoring systems were defined by the extent of airspace opacity in each lobe, totaled across the entire lungs. Expert and dCNN scores were concurrently evaluated for interobserver agreement, while both dCNN identified airspace opacity scoring and raw opacity values were used in the prediction of COVID-19 diagnosis and inpatient outcomes. Results Interobserver agreement for airspace opacity scoring was 0.892 (95% CI 0.834-0.930). Probability of each outcome behaved as a logistic function of the opacity scoring (25% intensive care unit admission at score of 13/25, 25% intubation at 17/25, and 25% mortality at 20/25). Length of hospitalization, intensive care unit stay, and intubation were associated with larger airspace opacity score (p = 0.032, 0.039, 0.036, respectively). Conclusion The tested dCNN was highly predictive of inpatient outcomes, performs at a near expert level, and provides added value for clinicians in terms of prognostication and disease severity.
16
Bai E, Song SL, Fraser HSF, Ranney ML. A Graphical Toolkit for Longitudinal Dataset Maintenance and Predictive Model Training in Health Care. Appl Clin Inform 2022; 13:56-66. PMID: 35172371; PMCID: PMC8850007; DOI: 10.1055/s-0041-1740923.
Abstract
BACKGROUND Predictive analytic models, including machine learning (ML) models, are increasingly integrated into electronic health record (EHR)-based decision support tools for clinicians. These models have the potential to improve care, but are challenging to internally validate, implement, and maintain over the long term. Principles of ML operations (MLOps) may inform development of infrastructure to support the entire ML lifecycle, from feature selection to long-term model deployment and retraining. OBJECTIVES This study aimed to present the conceptual prototypes for a novel predictive model management system and to evaluate the acceptability of the system among three groups of end users. METHODS Based on principles of user-centered software design, human-computer interaction, and ethical design, we created graphical prototypes of a web-based MLOps interface to support the construction, deployment, and maintenance of models using EHR data. To assess the acceptability of the interface, we conducted semistructured user interviews with three groups of users (health informaticians, clinical and data stakeholders, chief information officers) and evaluated preliminary usability using the System Usability Scale (SUS). We subsequently revised prototypes based on user input and developed user case studies. RESULTS Our prototypes include design frameworks for feature selection, model training, deployment, long-term maintenance, visualization over time, and cross-functional collaboration. Users were able to complete 71% of prompted tasks without assistance. The average SUS score of the initial prototype was 75.8 out of 100, translating to a percentile range of 70 to 79, a letter grade of B, and an adjective rating of "good." We reviewed persona-based case studies that illustrate functionalities of this novel prototype. 
CONCLUSION The initial graphical prototypes of this MLOps system are preliminarily usable and demonstrate an unmet need within the clinical informatics landscape.
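The SUS figure quoted above follows the standard System Usability Scale scoring rule: ten 1-5 Likert items, odd items scored (response - 1), even items scored (5 - response), with the sum scaled by 2.5 to give a 0-100 score. A minimal sketch with invented responses (the standard scale, not the authors' code):

```python
# Standard SUS scoring; the example responses are hypothetical.

def sus_score(responses):
    """responses: list of ten Likert answers (1-5), item 1 first."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

score = sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2])  # lands in the "good" band
```

Scores around 68 are conventionally treated as average usability, which is why the study's mean of 75.8 maps to the "good" adjective rating.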
Affiliation(s)
- Eric Bai
- Warren Alpert Medical School, Brown University, Providence, Rhode Island, United States
- Sophia L. Song
- Warren Alpert Medical School, Brown University, Providence, Rhode Island, United States
- Hamish S. F. Fraser
- Brown University Center for Biomedical Informatics, Providence, Rhode Island, United States
- Megan L. Ranney
- Brown-Lifespan Center for Digital Health, Providence, Rhode Island, United States

17
Matthiesen S, Diederichsen SZ, Hansen MKH, Villumsen C, Lassen MCH, Jacobsen PK, Risum N, Winkel BG, Philbert BT, Svendsen JH, Andersen TO. Clinician Preimplementation Perspectives of a Decision-Support Tool for the Prediction of Cardiac Arrhythmia Based on Machine Learning: Near-Live Feasibility and Qualitative Study. JMIR Hum Factors 2021; 8:e26964. PMID: 34842528; PMCID: PMC8665383; DOI: 10.2196/26964.
Abstract
Background Artificial intelligence (AI), such as machine learning (ML), shows great promise for improving clinical decision-making in cardiac diseases by outperforming statistical-based models. However, few AI-based tools have been implemented in cardiology clinics because of the sociotechnical challenges during transitioning from algorithm development to real-world implementation. Objective This study explored how an ML-based tool for predicting ventricular tachycardia and ventricular fibrillation (VT/VF) could support clinical decision-making in the remote monitoring of patients with an implantable cardioverter defibrillator (ICD). Methods Seven experienced electrophysiologists participated in a near-live feasibility and qualitative study, which included walkthroughs of 5 blinded retrospective patient cases, use of the prediction tool, and questionnaires and interview questions. All sessions were video recorded, and sessions evaluating the prediction tool were transcribed verbatim. Data were analyzed through an inductive qualitative approach based on grounded theory. Results The prediction tool was found to have potential for supporting decision-making in ICD remote monitoring by providing reassurance, increasing confidence, acting as a second opinion, reducing information search time, and enabling delegation of decisions to nurses and technicians. However, the prediction tool did not lead to changes in clinical action and was found less useful in cases where the quality of data was poor or when VT/VF predictions were found to be irrelevant for evaluating the patient. Conclusions When transitioning from AI development to testing its feasibility for clinical implementation, we need to consider the following: expectations must be aligned with the intended use of AI; trust in the prediction tool is likely to emerge from real-world use; and AI accuracy is relational and dependent on available information and local workflows. 
Addressing the sociotechnical gap between the development and implementation of clinical decision-support tools based on ML in cardiac care is essential for succeeding with adoption. It is suggested to include clinical end-users, clinical contexts, and workflows throughout the overall iterative approach to design, development, and implementation.
Affiliation(s)
- Stina Matthiesen
- Department of Computer Science, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Vital Beats, Copenhagen, Denmark
- Søren Zöga Diederichsen
- Vital Beats, Copenhagen, Denmark
- Department of Cardiology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Peter Karl Jacobsen
- Department of Cardiology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Niels Risum
- Department of Cardiology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Bo Gregers Winkel
- Department of Cardiology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Berit T Philbert
- Department of Cardiology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Jesper Hastrup Svendsen
- Department of Cardiology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Tariq Osman Andersen
- Department of Computer Science, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Vital Beats, Copenhagen, Denmark

18
Abonamah AA, Tariq MU, Shilbayeh S. On the Commoditization of Artificial Intelligence. Front Psychol 2021; 12:696346. PMID: 34659012; PMCID: PMC8514611; DOI: 10.3389/fpsyg.2021.696346.
Abstract
As artificial intelligence's potential and pervasiveness continue to increase, its strategic importance, effects, and management must be closely examined. Societies, governments, and business organizations need to view artificial intelligence (AI) technologies and their usage from an entirely different perspective, one that transcends AI's technical capabilities and perceived value to include the areas of AI's impact and influence, because AI is poised to have a tremendous effect on every aspect of our lives. Nicholas G. Carr's seminal paper "IT Doesn't Matter" (Carr, 2003) explained how IT's potential and ubiquity have increased while its strategic importance has declined with time. AI is poised to meet the same fate as IT; in fact, the commoditization of AI has already begun. This paper presents arguments to demonstrate that AI is moving rapidly in this direction. It also proposes an AI-based organizational framework for gaining value-added elements that lower the impact of AI commoditization.
Affiliation(s)
- Samar Shilbayeh
- Abu Dhabi School of Management, Abu Dhabi, United Arab Emirates

19
Sharma M, Luthra S, Joshi S, Kumar A. Implementing challenges of artificial intelligence: Evidence from public manufacturing sector of an emerging economy. Government Information Quarterly 2021. DOI: 10.1016/j.giq.2021.101624.
20
Chen J, Xiang Y, Li L, Xu A, Hu W, Lin Z, Xu F, Lin D, Chen W, Lin H. Application of Surgical Decision Model for Patients With Childhood Cataract: A Study Based on Real World Data. Front Bioeng Biotechnol 2021; 9:657866. PMID: 34513804; PMCID: PMC8427305; DOI: 10.3389/fbioe.2021.657866.
Abstract
Reliable validated methods are necessary to verify the performance of diagnosis- and therapy-assisting models in clinical practice. However, some validation results carry research bias and may not reflect real-world performance. In addition, clinical trials carry execution risks when a model's effectiveness is uncertain, and completing validation trials for rare diseases is challenging. Real-world data (RWD) may solve this problem. In our study, we collected RWD from 251 patients with a rare disease, childhood cataract (CC), and conducted a retrospective study to validate a CC surgical decision model. The consistency of the real and recommended surgical types was 94.16%. In the cataract extraction (CE) group, the model recommended the same surgical type for 84.48% of eyes, but advised cataract extraction with primary intraocular lens implantation (CE + IOL) for 15.52% of eyes, differing from the real-world choices. In the CE + IOL group, the model recommended the same surgical type for 100% of eyes. The real-recommended match rates were 94.22% in the eyes of bilateral patients and 90.38% in the eyes of unilateral patients. Our study is the first to apply RWD in a retrospective study evaluating a clinical model, and the results indicate the availability and feasibility of applying RWD in model validation and offer guidance for intelligent model evaluation for rare diseases.
Affiliation(s)
- Jingjing Chen
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yifan Xiang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Longhui Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Andi Xu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Weiling Hu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Zhuoling Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Fabao Xu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Duoru Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Weirong Chen
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Haotian Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Center of Precision Medicine, Sun Yat-sen University, Guangzhou, China

21
Shashikumar SP, Wardi G, Malhotra A, Nemati S. Artificial intelligence sepsis prediction algorithm learns to say "I don't know". NPJ Digit Med 2021; 4:134. PMID: 34504260; PMCID: PMC8429719; DOI: 10.1038/s41746-021-00504-6.
Abstract
Sepsis is a leading cause of morbidity and mortality worldwide. Early identification of sepsis is important as it allows timely administration of potentially life-saving resuscitation and antimicrobial therapy. We present COMPOSER (COnformal Multidimensional Prediction Of SEpsis Risk), a deep learning model for the early prediction of sepsis, specifically designed to reduce false alarms by detecting unfamiliar patients/situations arising from erroneous data, missingness, distributional shift and data drifts. COMPOSER flags these unfamiliar cases as indeterminate rather than making spurious predictions. Six patient cohorts (515,720 patients) curated from two healthcare systems in the United States across intensive care units (ICU) and emergency departments (ED) were used to train and externally and temporally validate this model. In a sequential prediction setting, COMPOSER achieved a consistently high area under the curve (AUC) (ICU: 0.925-0.953; ED: 0.938-0.945). Out of over 6 million prediction windows, roughly 20% and 8% were identified as indeterminate amongst non-septic and septic patients, respectively. COMPOSER provided early warning within a clinically actionable timeframe (ICU: 12.2 [3.2-22.8] and ED: 2.1 [0.8-4.5] hours prior to first antibiotics order) across all six cohorts, thus allowing for identification and prioritization of patients at high risk for sepsis.
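The abstention idea described above (flagging unfamiliar inputs as indeterminate instead of predicting) can be illustrated with a generic split-conformal check. This sketch is not COMPOSER's actual implementation; the nonconformity scores, calibration values, and threshold are all invented for illustration.

```python
# Hedged sketch of conformal-style abstention: if a new input's
# nonconformity score is extreme relative to a calibration set, the model
# returns "indeterminate" rather than a risk estimate.

def conformal_pvalue(cal_scores, score):
    """Smoothed fraction of calibration scores at least as extreme as `score`."""
    ge = sum(1 for s in cal_scores if s >= score)
    return (ge + 1) / (len(cal_scores) + 1)

def predict_or_abstain(cal_scores, score, risk, alpha=0.1):
    """Return the model's risk estimate, or abstain for unfamiliar inputs."""
    if conformal_pvalue(cal_scores, score) < alpha:
        return "indeterminate"
    return risk

cal = [0.10, 0.20, 0.15, 0.30, 0.25, 0.18, 0.22, 0.12, 0.28, 0.20]
familiar = predict_or_abstain(cal, score=0.20, risk=0.8)   # in-distribution
unfamiliar = predict_or_abstain(cal, score=5.00, risk=0.8)  # far outside calibration
```

With only ten calibration points the smallest attainable p-value is 1/11, so the abstention threshold alpha must exceed that; real deployments calibrate on far larger sets.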
Affiliation(s)
- Gabriel Wardi
- Department of Emergency Medicine, University of California San Diego, San Diego, USA
- Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, San Diego, USA
- Atul Malhotra
- Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, San Diego, USA
- Shamim Nemati
- Division of Biomedical Informatics, University of California San Diego, San Diego, USA

22
Salem H, Soria D, Lund JN, Awwad A. A systematic review of the applications of Expert Systems (ES) and machine learning (ML) in clinical urology. BMC Med Inform Decis Mak 2021; 21:223. PMID: 34294092; PMCID: PMC8299670; DOI: 10.1186/s12911-021-01585-9.
Abstract
BACKGROUND Testing hypotheses about factor-outcome effects is a common task, but standard statistical regression tools are rendered ineffective by data contaminated with too many noisy variables. Expert systems (ES) offer an alternative methodology for analysing such data to identify the variables most strongly correlated with the outcome, and their machine learning (ML) capabilities can save significant research time and cost. This study aims to systematically review the applications of ES in urological research and the methodological models they use for multivariate analysis, identifying their domains, development, and validity. METHODS The PRISMA methodology was applied to formulate an effective method for data gathering and analysis. The search covered seven information sources: Web of Science, Embase, BIOSIS Citation Index, Scopus, PubMed, Google Scholar, and MEDLINE. Articles were eligible if they applied a known ML model to a clear urological research question involving multivariate analysis; only articles with pertinent research methods in ES models were included. The analysed data covered each system's model, applications, input/output variables, target user, validation, and outcomes. Both the ML models and the variable analysis were comparatively reported for each system. RESULTS The search identified n = 1087 articles across all databases, of which n = 712 were examined against the inclusion criteria. A total of 168 systems were finally included and systematically analysed, demonstrating a recent increase in the uptake of ES in academic urology, in particular artificial neural networks (31 systems). Most systems were applied in urological oncology (prostate cancer: 15; bladder cancer: 13), where diagnostic, prognostic, and survival-predictor markers were investigated. Owing to the heterogeneity of the models and their statistical tests, a meta-analysis was not feasible.
CONCLUSION ES offer effective ML potential, and their applications in research have demonstrated a valid model for multivariate analysis. The complexity of their development can hamper uptake in urological clinics, while the limitations of existing statistical tools in this domain leave a gap for further research. Integrating computer scientists into academic units has promoted the use of ES in clinical urological research.
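The screening problem the review describes, finding the few variables most correlated with an outcome among many noisy ones, can be sketched in a few lines. This is an illustrative toy, not a method from the review; the data and variable layout are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical data: two informative variables hidden among eight noisy ones.
informative = rng.normal(size=(n, 2))
noise = rng.normal(size=(n, 8))
X = np.hstack([informative, noise])
outcome = (informative[:, 0] + informative[:, 1]
           + rng.normal(scale=0.5, size=n) > 0).astype(float)

# Score each candidate variable by |Pearson r| with the outcome, then rank.
scores = [abs(np.corrcoef(X[:, j], outcome)[0, 1]) for j in range(X.shape[1])]
ranking = np.argsort(scores)[::-1]
print(ranking[:2])  # the two informative variables should rank highest
```

Correlation ranking is only the simplest screening device; the ES reviewed here use richer models (e.g. neural networks) for the same purpose of separating signal variables from noise.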
Affiliation(s)
- Hesham Salem
- Urological Department, NIHR Nottingham Biomedical Research Centre, School of Medicine, University of Nottingham, Nottingham, NG7 2UH, UK
- University Hospitals of Derby and Burton NHS Foundation Trust, Royal Derby Hospital, University of Nottingham, Derby, DE22 3DT, UK
- Daniele Soria
- School of Computer Science and Engineering, University of Westminster, London, W1W 6UW, UK
- Jonathan N Lund
- University Hospitals of Derby and Burton NHS Foundation Trust, Royal Derby Hospital, University of Nottingham, Derby, DE22 3DT, UK
- Amir Awwad
- NIHR Nottingham Biomedical Research Centre, Sir Peter Mansfield Imaging Centre, School of Medicine, University of Nottingham, Nottingham, NG7 2UH, UK
- Department of Medical Imaging, London Health Sciences Centre, University Hospital, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
23
Huang D, Bai H, Wang L, Hou Y, Li L, Xia Y, Yan Z, Chen W, Chang L, Li W. The Application and Development of Deep Learning in Radiotherapy: A Systematic Review. Technol Cancer Res Treat 2021; 20:15330338211016386. [PMID: 34142614] [PMCID: PMC8216350] [DOI: 10.1177/15330338211016386]
Abstract
The widespread use of computers and the resulting explosion of data have greatly promoted the development of artificial intelligence (AI). The rise of deep learning (DL) algorithms, such as convolutional neural networks (CNNs), has provided radiation oncologists with many promising tools that can simplify the complex radiotherapy process, improve the accuracy and objectivity of diagnosis, and reduce the workload, enabling clinicians to spend more time on advanced decision-making tasks. As DL moves closer to clinical practice, radiation oncologists will need to be familiar with its principles to properly evaluate and use this powerful tool. In this paper, we explain the development and basic concepts of AI and discuss its application in radiation oncology based on the different task categories of DL algorithms. This work clarifies the potential for further development of DL in radiation oncology.
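The core building block of the CNNs this review refers to is the discrete convolution of an input with a learned kernel. A minimal stdlib-only sketch of that operation (illustrative only, not taken from the paper):

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (cross-correlation), the core CNN operation."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# An edge-detecting kernel responds where the signal jumps from 0 to 1.
signal = [0, 0, 0, 1, 1, 1]
kernel = [-1, 1]
print(conv1d(signal, kernel))  # → [0, 0, 1, 0, 0]
```

In a real CNN the kernel values are learned from data and many such filters are stacked with nonlinearities, but the sliding dot product above is the operation being repeated.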
Affiliation(s)
- Danju Huang, Han Bai, Li Wang, Yu Hou, Lan Li, Yaoxiong Xia, Zhirui Yan, Wenrui Chen, Li Chang, Wenhui Li
- Department of Radiation Oncology, The Third Affiliated Hospital of Kunming Medical University, Yunnan Cancer Hospital, Kunming, Yunnan, China (all authors)
24
Interpretable heartbeat classification using local model-agnostic explanations on ECGs. Comput Biol Med 2021; 133:104393. [PMID: 33915362] [DOI: 10.1016/j.compbiomed.2021.104393]
Abstract
The treatment and prevention of cardiovascular diseases often rely on electrocardiogram (ECG) interpretation. ECG interpretation is subjective, varies between physicians, and is prone to error. Machine learning models are often developed to support doctors; however, their lack of interpretability remains one of the main barriers to their widespread adoption. This paper presents an Explainable Artificial Intelligence (XAI) solution that makes heartbeat classification more explainable using several state-of-the-art model-agnostic methods. We introduce a high-level conceptual framework for explainable time series and propose an original method that adds temporal dependency between time samples using the time series' derivative. The results were validated on the MIT-BIH arrhythmia dataset: we performed a performance analysis to evaluate whether the explanations fit the model's behaviour, and employed the 1-D Jaccard index to compare the subsequences extracted by an interpretable model and by the XAI methods. Our results show that using the raw signal together with its derivative captures temporal dependency between samples and improves the explanation of classifications. A small but informative user study concludes the paper, evaluating the potential of the visual explanations produced by our method for adoption in real-world clinical settings, either as diagnostic aids or as training resources.
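The 1-D Jaccard index used here compares which time samples two explanations mark as relevant. A minimal sketch, under the assumption that each extracted subsequence is represented by its set of time indices (the paper's exact formulation may differ):

```python
def jaccard_1d(a, b):
    """Jaccard index between two subsequences given as iterables of time indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty explanations agree trivially
    return len(a & b) / len(a | b)

# Overlap between the samples an interpretable model marks as relevant
# and those highlighted by a model-agnostic XAI method (hypothetical indices).
model_relevant = range(10, 20)
xai_relevant = range(15, 25)
print(jaccard_1d(model_relevant, xai_relevant))  # ≈ 0.33 (5 shared of 15 total)
```

A score near 1 means the XAI method highlights essentially the same window as the interpretable reference; a score near 0 means the explanations point at disjoint parts of the signal.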
25
Knop M, Weber S, Mueller M, Niehaves B. Human Factors and Technological Characteristics Influencing the Interaction with AI-enabled Clinical Decision Support Systems: A Literature Review (Preprint). JMIR Hum Factors 2021; 9:e28639. [PMID: 35323118] [PMCID: PMC8990344] [DOI: 10.2196/28639]
Abstract
Background The digitization and automation of diagnostics and treatment promise to improve the quality of health care and patient outcomes, even as shortages of medical personnel, professionals' workloads, and case complexity increase. Clinical decision support systems (CDSSs) have been proven to help medical professionals in their everyday work through their ability to process vast amounts of patient information. However, widespread adoption is partially hindered by specific technological and personal characteristics. With the rise of artificial intelligence (AI), CDSSs have become adaptive technologies with human-like capabilities, able to learn and change their characteristics over time. However, research has not yet reflected on the characteristics and factors essential for effective collaboration between human actors and AI-enabled CDSSs. Objective Our study aims to summarize the factors influencing effective collaboration between medical professionals and AI-enabled CDSSs. These factors are essential for medical professionals, management, and technology designers reflecting on the adoption, implementation, and development of an AI-enabled CDSS. Methods We conducted a literature review across 3 meta-databases, screening over 1000 articles and retaining 101 for full-text assessment. Of these 101 articles, 7 (6.9%) met our inclusion criteria and were analyzed for our synthesis. Results In line with our research objective, we identified the technological characteristics and human factors that appear essential to the collaboration of medical professionals and AI-enabled CDSSs: training data quality, performance, explainability, adaptability, medical expertise, technological expertise, personality, cognitive biases, and trust.
Comparing our results with research on non-AI CDSSs, some characteristics and factors retain their importance, whereas others gain or lose relevance owing to the uniqueness of human-AI interactions. However, only a few (1/7, 14%) studies mentioned the theoretical foundations and patient outcomes related to AI-enabled CDSSs. Conclusions Our study provides a comprehensive overview of the characteristics and factors that influence the interaction and collaboration between medical professionals and AI-enabled CDSSs. The currently limited theoretical foundations hinder the creation of adequate concepts and models to explain and predict the interrelations between these characteristics and factors. For an appropriate evaluation of human-AI collaboration, patient outcomes and the role of patients in the decision-making process should be considered.
Affiliation(s)
- Michael Knop, Sebastian Weber, Marius Mueller, Bjoern Niehaves
- Department of Information Systems, University of Siegen, Siegen, Germany (all authors)
26
Maassen O, Fritsch S, Palm J, Deffge S, Kunze J, Marx G, Riedel M, Schuppert A, Bickenbach J. Future Medical Artificial Intelligence Application Requirements and Expectations of Physicians in German University Hospitals: Web-Based Survey. J Med Internet Res 2021; 23:e26646. [PMID: 33666563] [PMCID: PMC7980122] [DOI: 10.2196/26646]
Abstract
BACKGROUND The increasing development of artificial intelligence (AI) systems in medicine, driven by researchers and entrepreneurs, is accompanied by enormous expectations for the advancement of medical care. AI might change the clinical practice of physicians in almost all medical disciplines and most areas of health care. Yet while expectations are high, practical implementations of AI in clinical practice are still scarce in Germany, and physicians' requirements and expectations of AI in medicine, as well as their opinions on the use of anonymized patient data for clinical and biomedical research, have not been investigated widely in German university hospitals. OBJECTIVE This study aimed to evaluate physicians' requirements and expectations of AI in medicine and their opinion on the secondary use of patient data for (bio)medical research (eg, for the development of machine learning algorithms) in university hospitals in Germany. METHODS A web-based survey was conducted addressing physicians of all medical disciplines in 8 German university hospitals. Answers were collected on Likert scales, together with general demographic information. Physicians were invited to participate locally via email in the respective hospitals. RESULTS The online survey was completed by 303 physicians (female: 121/303, 39.9%; male: 173/303, 57.1%; no response: 9/303, 3.0%) from a wide range of medical disciplines and levels of work experience. Most respondents had a positive (130/303, 42.9%) or very positive (82/303, 27.1%) attitude towards AI in medicine, and there was a significant association between the personal rating of AI in medicine and self-reported technical affinity (H4=48.3, P<.001). A vast majority expected the future of medicine to be a mix of human and artificial intelligence (273/303, 90.1%) but also requested scientific evaluation before the routine implementation of AI-based systems (276/303, 91.1%).
Physicians were most optimistic that AI applications would identify drug interactions (280/303, 92.4%) and thereby substantially improve patient care, but were reserved about AI-supported diagnosis of psychiatric diseases (62/303, 20.5%). Of the respondents, 82.5% (250/303) agreed that there should be open access to anonymized patient databases for medical and biomedical research. CONCLUSIONS Physicians in inpatient care in German university hospitals show a generally positive attitude towards most AI applications in medicine. Along with this optimism come expectations and hopes that AI will assist physicians in clinical decision making. Especially in fields where huge amounts of data are processed (eg, imaging procedures in radiology and pathology) or data are collected continuously (eg, cardiology and intensive care medicine), physicians' expectations that AI will substantially improve future patient care are high. The greatest potential was seen in the identification of drug interactions, presumably owing to the rising complexity of drug administration in multimorbid patients with polypharmacy. However, for the practical use of AI in health care, regulatory and organizational challenges still have to be mastered.
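The reported association between AI attitude and technical affinity (H4=48.3) is the kind of result a Kruskal-Wallis test across five affinity groups yields (five groups, hence four degrees of freedom). A hedged sketch with invented Likert data, not the study's data, assuming SciPy is available:

```python
from scipy.stats import kruskal

# Hypothetical 5-point AI ratings grouped by self-reported technical
# affinity level (5 groups -> 4 degrees of freedom, matching "H4").
low = [2, 3, 2, 3, 2, 1, 3, 2]
mid_low = [3, 3, 2, 4, 3, 3, 2, 3]
mid = [3, 4, 3, 4, 3, 4, 4, 3]
mid_high = [4, 4, 3, 5, 4, 4, 5, 4]
high = [5, 4, 5, 5, 4, 5, 5, 4]

# Rank-based test: are the rating distributions the same across groups?
h, p = kruskal(low, mid_low, mid, mid_high, high)
print(f"H = {h:.1f}, p = {p:.2g}")
```

Kruskal-Wallis is the usual choice for ordinal Likert responses because it compares ranks rather than assuming normally distributed interval data.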
Affiliation(s)
- Oliver Maassen
- Department of Intensive Care Medicine, University Hospital RWTH Aachen, Aachen, Germany
- SMITH Consortium of the German Medical Informatics Initiative, Leipzig, Germany
- Sebastian Fritsch
- Department of Intensive Care Medicine, University Hospital RWTH Aachen, Aachen, Germany
- SMITH Consortium of the German Medical Informatics Initiative, Leipzig, Germany
- Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany
- Julia Palm
- SMITH Consortium of the German Medical Informatics Initiative, Leipzig, Germany
- Institute of Medical Statistics, Computer and Data Sciences, Jena University Hospital, Jena, Germany
- Saskia Deffge
- Department of Intensive Care Medicine, University Hospital RWTH Aachen, Aachen, Germany
- SMITH Consortium of the German Medical Informatics Initiative, Leipzig, Germany
- Julian Kunze
- Department of Intensive Care Medicine, University Hospital RWTH Aachen, Aachen, Germany
- SMITH Consortium of the German Medical Informatics Initiative, Leipzig, Germany
- Gernot Marx
- Department of Intensive Care Medicine, University Hospital RWTH Aachen, Aachen, Germany
- SMITH Consortium of the German Medical Informatics Initiative, Leipzig, Germany
- Morris Riedel
- SMITH Consortium of the German Medical Informatics Initiative, Leipzig, Germany
- Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany
- School of Natural Sciences and Engineering, University of Iceland, Reykjavik, Iceland
- Andreas Schuppert
- SMITH Consortium of the German Medical Informatics Initiative, Leipzig, Germany
- Institute for Computational Biomedicine II, University Hospital RWTH Aachen, Aachen, Germany
- Johannes Bickenbach
- Department of Intensive Care Medicine, University Hospital RWTH Aachen, Aachen, Germany
- SMITH Consortium of the German Medical Informatics Initiative, Leipzig, Germany
27
Cabitza F, Campagner A, Sconfienza LM. As if sand were stone. New concepts and metrics to probe the ground on which to build trustable AI. BMC Med Inform Decis Mak 2020; 20:219. [PMID: 32917183] [PMCID: PMC7488864] [DOI: 10.1186/s12911-020-01224-9]
Abstract
BACKGROUND We focus on the importance of assessing the quality of the labeling used as input to predictive models in order to understand the reliability of their output in support of human decision-making, especially in critical domains such as medicine. METHODS Accordingly, we propose a framework distinguishing the reference labeling (or Gold Standard) from the set of annotations from which it is usually derived (the Diamond Standard). We define a set of quality dimensions and related metrics: representativeness (are the available data representative of the reference population?), measured by the degree of correspondence, Ψ; reliability (do the raters agree with each other in their ratings?), measured by the degree of weighted concordance, ϱ; and accuracy (are the raters' annotations a true representation?), measured by the degree of fineness, Φ. We apply and evaluate these metrics in a diagnostic user study involving 13 radiologists. RESULTS We evaluate Ψ against hypothesis-testing techniques, highlighting that our metric can better evaluate distribution similarity in high-dimensional spaces, and discuss how Ψ could be used to assess the reliability of new predictions or for train-test selection. We report the value of ϱ for our case study and compare it with traditional reliability metrics, highlighting both their theoretical properties and the reasons they differ. We then report the degree of fineness as an estimate of the accuracy of the collected annotations and discuss its relationship to the degree of weighted concordance, which we find to be moderately but significantly correlated. Finally, we discuss the implications of the proposed dimensions and metrics for Explainable Artificial Intelligence (XAI). CONCLUSION We propose dimensions and related metrics to assess the quality of the datasets used to build predictive models and Medical Artificial Intelligence (MAI).
We argue that the proposed metrics are feasible to apply in real-world settings for the continuous development of trustable and interpretable MAI systems.
Affiliation(s)
- Federico Cabitza
- Dipartimento di Informatica, Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Viale Sarca 336, 20125 Milan, Italy
- Andrea Campagner
- IRCCS Istituto Ortopedico Galeazzi, Via Riccardo Galeazzi 4, 20161 Milan, Italy
- Luca Maria Sconfienza
- IRCCS Istituto Ortopedico Galeazzi, Via Riccardo Galeazzi 4, 20161 Milan, Italy
- Department of Biomedical Sciences for Health, Università degli Studi di Milano, Via Mangiagalli 31, 20133 Milan, Italy
28
Egli A. Digitalization, clinical microbiology and infectious diseases. Clin Microbiol Infect 2020; 26:1289-1290. [PMID: 32622954] [PMCID: PMC7330545] [DOI: 10.1016/j.cmi.2020.06.031]
Affiliation(s)
- A Egli
- Clinical Bacteriology and Mycology, University Hospital Basel, Basel, Switzerland; Applied Microbiology Research, Department of Biomedicine, University of Basel, Basel, Switzerland.
29
The Elephant in the Machine: Proposing a New Metric of Data Reliability and its Application to a Medical Case to Assess Classification Reliability. Applied Sciences (Basel) 2020. [DOI: 10.3390/app10114014]
Abstract
In this paper, we present and discuss a novel reliability metric that quantifies the extent to which a ground truth generated in multi-rater settings is a reliable basis for the training and validation of machine learning predictive models. The metric takes three dimensions into account: agreement (how much a group of raters mutually agree on a single case), confidence (how certain a rater is of each rating expressed), and competence (how accurate a rater is). It therefore produces a reliability score weighted for the raters' confidence and competence, but only the former needs to be collected explicitly, as the latter can be estimated from the ratings themselves if no further information is available. We found our proposal to be both more conservative and more robust to known paradoxes than existing agreement measures, by virtue of a more articulated notion of agreement due to chance based on an empirical estimation of the reliability of the individual raters involved. We discuss the metric in the context of a realistic annotation task in which 13 expert radiologists labeled the MRNet dataset. We also provide a nomogram for assessing the actual accuracy of a classification model given the reliability of its ground truth, and make the point that theoretical estimates of model performance are consistently overestimated if ground-truth reliability is not properly taken into account.
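For contrast, the chance-corrected agreement measures this metric improves on include classics such as Cohen's kappa. A minimal stdlib sketch for two raters with hypothetical binary ratings (illustrative only, not the paper's metric):

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa: observed agreement corrected for agreement due to chance."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: probability both raters pick the same label independently,
    # estimated from each rater's own label frequencies.
    chance = sum(c1[k] * c2[k] for k in c1) / n ** 2
    return (observed - chance) / (1 - chance)

# Two hypothetical raters labeling 10 cases as 0/1.
rater1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater2 = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
print(round(cohens_kappa(rater1, rater2), 3))  # → 0.583
```

Kappa's chance term assumes raters guess independently from fixed label frequencies, which is exactly the modeling choice the paper revisits with its empirically estimated, confidence- and competence-weighted notion of chance agreement.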