1
Huang G, Li Y, Jameel S, Long Y, Papanastasiou G. From explainable to interpretable deep learning for natural language processing in healthcare: How far from reality? Comput Struct Biotechnol J 2024; 24:362-373. [PMID: 38800693] [PMCID: PMC11126530] [DOI: 10.1016/j.csbj.2024.05.004] [Received: 11/10/2023] [Revised: 05/03/2024] [Accepted: 05/03/2024] Open Access
Abstract
Deep learning (DL) has substantially enhanced natural language processing (NLP) in healthcare research. However, the increasing complexity of DL-based NLP necessitates transparent model interpretability, or at least explainability, for reliable decision-making. This work presents a thorough scoping review of explainable and interpretable DL in healthcare NLP. The term "eXplainable and Interpretable Artificial Intelligence" (XIAI) is introduced to distinguish XAI from IAI. Different models are further categorized based on their functionality (model-, input-, output-based) and scope (local, global). Our analysis shows that attention mechanisms are the most prevalent emerging IAI technique. The use of IAI is growing, distinguishing it from XAI. The major challenges identified are that most XIAI does not explore "global" modelling processes, and that best practices, systematic evaluation, and benchmarks are lacking. One important opportunity is to use attention mechanisms to enhance multi-modal XIAI for personalized medicine. Additionally, combining DL with causal logic holds promise. Our discussion encourages the integration of XIAI in Large Language Models (LLMs) and domain-specific smaller models. In conclusion, XIAI adoption in healthcare requires dedicated in-house expertise. Collaboration with domain experts, end-users, and policymakers can lead to ready-to-use XIAI methods across NLP and medical tasks. While challenges exist, XIAI techniques offer a valuable foundation for interpretable NLP algorithms in healthcare.
Affiliation(s)
- Guangming Huang
- School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, United Kingdom
- Yingya Li
- Harvard Medical School and Boston Children's Hospital, Boston, 02115, United States
- Shoaib Jameel
- Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, United Kingdom
- Yunfei Long
- School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, United Kingdom
2
Gin BC, Ten Cate O, O'Sullivan PS, Boscardin C. Assessing supervisor versus trainee viewpoints of entrustment through cognitive and affective lenses: an artificial intelligence investigation of bias in feedback. Adv Health Sci Educ Theory Pract 2024. [PMID: 38388855] [DOI: 10.1007/s10459-024-10311-9] [Received: 08/01/2023] [Accepted: 01/21/2024]
Abstract
The entrustment framework redirects assessment from considering only trainees' competence to decision-making about their readiness to perform clinical tasks independently. Since trainees and supervisors both contribute to entrustment decisions, we examined the cognitive and affective factors that underlie their negotiation of trust, and whether trainee demographic characteristics may bias them. Using a document analysis approach, we adapted large language models (LLMs) to examine feedback dialogs (N = 24,187, each with an associated entrustment rating) between medical student trainees and their clinical supervisors. We compared how trainees and supervisors differentially documented feedback dialogs about similar tasks by identifying qualitative themes and quantitatively assessing their correlation with entrustment ratings. Supervisors' themes predominantly reflected skills related to patient presentations, while trainees' themes were broader, including clinical performance and personal qualities. To examine affect, we trained an LLM to measure feedback sentiment. On average, trainees used more negative language (5.3% lower probability of positive sentiment, p < 0.05) compared to supervisors, while documenting higher entrustment ratings (+ 0.08 on a 1-4 scale, p < 0.05). We also found biases tied to demographic characteristics: trainees' documentation reflected more positive sentiment in the case of male trainees (+ 1.3%, p < 0.05) and of trainees underrepresented in medicine (UIM) (+ 1.3%, p < 0.05). Entrustment ratings did not appear to reflect these biases, whether documented by trainee or supervisor. As such, bias appeared to influence the emotive language trainees used to document entrustment more than the degree of entrustment they experienced. Mitigating these biases is nonetheless important because they may affect trainees' assimilation into their roles and formation of trusting relationships.
Affiliation(s)
- Brian C Gin
- Department of Pediatrics, University of California San Francisco, 550 16th St Floor 4, UCSF Box 0110, San Francisco, CA, 94158, USA.
- Olle Ten Cate
- Utrecht Center for Research and Development of Health Professions Education, University Medical Center, Utrecht, the Netherlands
- Department of Medicine, University of California San Francisco, San Francisco, USA
- Patricia S O'Sullivan
- Department of Medicine, University of California San Francisco, San Francisco, USA
- Department of Surgery, University of California San Francisco, San Francisco, USA
- Christy Boscardin
- Department of Medicine, University of California San Francisco, San Francisco, USA
- Department of Anesthesia, University of California San Francisco, San Francisco, USA
3
Tolsgaard MG, Pusic MV, Sebok-Syer SS, Gin B, Svendsen MB, Syer MD, Brydges R, Cuddy MM, Boscardin CK. The fundamentals of Artificial Intelligence in medical education research: AMEE Guide No. 156. Med Teach 2023; 45:565-573. [PMID: 36862064] [DOI: 10.1080/0142159X.2023.2180340]
Abstract
The use of Artificial Intelligence (AI) in medical education has the potential to facilitate complicated tasks and improve efficiency. For example, AI could help automate assessment of written responses or provide reliable feedback on medical image interpretations. While applications of AI in learning, instruction, and assessment are growing, further exploration is still required. Few conceptual or methodological guides exist for medical educators wishing to evaluate or engage in AI research. In this guide, we aim to: 1) describe practical considerations involved in reading and conducting studies in medical education using AI, 2) define basic terminology, and 3) identify which medical education problems and data are ideally suited for using AI.
Collapse
Affiliation(s)
- Martin G Tolsgaard
- Copenhagen Academy for Medical Education and Simulation (CAMES), Copenhagen, Denmark
- Department of Obstetrics, Copenhagen University Hospital Rigshospitalet, Copenhagen, Denmark
- Martin V Pusic
- Department of Pediatrics, Harvard University, Boston, MA, USA
- Brian Gin
- Department of Pediatrics, University of California San Francisco, San Francisco, USA
- Morten Bo Svendsen
- Copenhagen Academy for Medical Education and Simulation (CAMES), Copenhagen, Denmark
- Mark D Syer
- School of Computing, Queen's University, Kingston, Canada
- Ryan Brydges
- Allan Waters Family Simulation Centre, St. Michael's Hospital, Unity Health Toronto & Department of Medicine, University of Toronto, Toronto, Canada
- Christy K Boscardin
- Department of Medicine and Anesthesia, University of California San Francisco, San Francisco, CA, USA
4
Masters K. Ethical use of Artificial Intelligence in Health Professions Education: AMEE Guide No. 158. Med Teach 2023; 45:574-584. [PMID: 36912253] [DOI: 10.1080/0142159X.2023.2186203]
Abstract
Health Professions Education (HPE) has benefitted from the advances in Artificial Intelligence (AI) and is set to benefit more in the future. Just as any technological advance opens discussions about ethics, so the implications of AI for HPE ethics need to be identified, anticipated, and accommodated so that HPE can utilise AI without compromising crucial ethical principles. Rather than focussing on AI technology, this Guide focuses on the ethical issues likely to face HPE teachers and administrators as they encounter and use AI systems in their teaching environment. While many of the ethical principles may be familiar to readers in other contexts, they will be viewed in light of AI, and some unfamiliar issues will be introduced. They include data gathering, anonymity, privacy, consent, data ownership, security, bias, transparency, responsibility, autonomy, and beneficence. For each topic, the Guide explains the concept and its importance and gives some indication of how to cope with its complexities. Ideas are drawn from personal experience and the relevant literature. For most topics, further reading is suggested so that readers may explore the concepts at their leisure. The aim is for HPE teachers and decision-makers at all levels to be alert to these issues and to take proactive action to be prepared to deal with the ethical problems and opportunities that AI usage presents to HPE.
Affiliation(s)
- Ken Masters
- Medical Education and Informatics Department, College of Medicine and Health Sciences, Sultan Qaboos University, Muscat, Sultanate of Oman
5
Gin BC. Evolving natural language processing towards a subjectivist inductive paradigm. Med Educ 2023; 57:384-387. [PMID: 36739578] [DOI: 10.1111/medu.15024] [Received: 01/15/2023] [Accepted: 01/31/2023]
Affiliation(s)
- Brian C Gin
- Department of Pediatrics, University of California San Francisco, San Francisco, California, USA
6
Maimone C, Dolan BM, Green MM, Sanguino SM, Garcia PM, O’Brien CL. Utilizing Natural Language Processing of Narrative Feedback to Develop a Predictive Model of Pre-Clerkship Performance: Lessons Learned. Perspect Med Educ 2023; 12:141-148. [PMID: 37151853] [PMCID: PMC10162355] [DOI: 10.5334/pme.40] [Received: 10/20/2022] [Accepted: 04/19/2023]
Abstract
Background: Natural language processing is a promising technique that can be used to create efficiencies in the review of narrative feedback to learners. The Feinberg School of Medicine has implemented formal review of pre-clerkship narrative feedback since 2014 through its portfolio assessment system, but this process requires considerable time and effort. This article describes how natural language processing was used to build a predictive model of pre-clerkship student performance that can be utilized to assist competency committee reviews.
Approach: The authors took an iterative and inductive approach to the analysis, which allowed them to identify characteristics of narrative feedback that are both predictive of performance and useful to faculty reviewers. Words and phrases were manually grouped into topics that represented concepts illustrating student performance. Topics were reviewed by experienced reviewers, tested for consistency across time, and checked to ensure they did not demonstrate bias.
Outcomes: Sixteen topic groups of words and phrases were found to be predictive of performance. The best-fitting model used a combination of topic groups, word counts, and categorical ratings. The model had an AUC value of 0.92 on the training data and 0.88 on the test data.
Reflection: A thoughtful, careful approach to using natural language processing was essential. Given the idiosyncrasies of narrative feedback in medical education, standard natural language processing packages were not adequate for predicting student outcomes. Rather, employing qualitative techniques including repeated member checking and iterative revision resulted in a useful and salient predictive model.
Affiliation(s)
- Christina Maimone
- Associate director of research data services, Northwestern IT Research Computing Services, Northwestern University, Evanston, Illinois, USA
- Brigid M. Dolan
- Associate professor of medicine and medical education and director of assessment, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Marianne M. Green
- Raymond H. Curry, MD Professor of Medical Education, professor of medicine, and vice dean for education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Sandra M. Sanguino
- Associate professor of pediatrics and senior associate dean of medical education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Patricia M. Garcia
- Professor of obstetrics and gynecology and medical education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Celia Laird O’Brien
- Assistant professor of medical education and assistant dean of program evaluation and accreditation, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA