1
Kerr WT, McFarlane KN, Pucci GF, Carns DR, Israel A, Vighetti L, Pennell PB, Stern JM, Xia Z, Wang Y. Supervised machine learning compared to large language models for identifying functional seizures from medical records. Epilepsia 2025; 66:1155-1164. PMID: 39960122; PMCID: PMC11997926; DOI: 10.1111/epi.18272.
Abstract
OBJECTIVE The Functional Seizures Likelihood Score (FSLS) is a supervised machine learning-based diagnostic score that was developed to differentiate functional seizures (FS) from epileptic seizures (ES). In contrast to this targeted approach, large language models (LLMs) can identify patterns in data for which they were not specifically trained. To evaluate the relative benefits of each approach, we compared the diagnostic performance of the FSLS to that of two LLMs: ChatGPT and GPT-4. METHODS In total, 114 anonymized cases were constructed based on patients with documented FS, ES, mixed ES and FS, or physiologic seizure-like events (PSLEs). Text-based data were presented to the LLMs in three sequential prompts, showing the history of present illness (HPI), electroencephalography (EEG) results, and neuroimaging results. We compared the accuracy (number of correct predictions/number of cases) and area under the receiver-operating characteristic (ROC) curves (AUCs) of the LLMs to the FSLS using mixed-effects logistic regression. RESULTS The accuracy of the FSLS was 74% (95% confidence interval [CI] 65%-82%) and the AUC was 85% (95% CI 77%-92%). GPT-4 was superior to both the FSLS and ChatGPT (p < .001), with an accuracy of 85% (95% CI 77%-91%) and an AUC of 87% (95% CI 79%-95%). Cohen's kappa between the FSLS and GPT-4 was 40% (fair agreement). For 33% of patients, the LLMs provided different predictions on different days when given the same note, and the LLMs' self-rated certainty was moderately correlated with this observed variability (squared Spearman's rho: 30% [fair, ChatGPT] and 63% [substantial, GPT-4]). SIGNIFICANCE Both GPT-4 and the FSLS identified a substantial subset of patients with FS based on clinical history. The fair agreement in predictions highlights that the LLMs identified patients differently from the structured score. The inconsistency of the LLMs' predictions across days and their incomplete insight into their own consistency are concerning.
This comparison highlights both benefits and cautions about how machine learning and artificial intelligence could identify patients with FS in clinical practice.
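The study above reports three standard evaluation metrics: accuracy, area under the ROC curve (AUC), and Cohen's kappa. As a minimal illustrative sketch of how these quantities are computed (not the authors' code; all labels and scores below are invented toy data, not study data):

```python
# Toy illustration of the metrics reported in the abstract above:
# accuracy, AUC, and Cohen's kappa. All data here are invented examples.

def accuracy(y_true, y_pred):
    """Fraction of predictions matching the reference diagnosis."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """Probability that a random positive case scores above a random
    negative case (ties count half) -- the Mann-Whitney view of AUC."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cohens_kappa(a, b):
    """Chance-corrected agreement between two binary raters."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    p1 = (sum(a) / n) * (sum(b) / n)                # both say 1 by chance
    p0 = (1 - sum(a) / n) * (1 - sum(b) / n)        # both say 0 by chance
    pe = p1 + p0                                    # chance agreement
    return (po - pe) / (1 - pe)

# 1 = functional seizures (FS), 0 = epileptic seizures (ES)
y_true    = [1, 1, 1, 0, 0, 0, 1, 0]   # hypothetical reference diagnoses
fsls_pred = [1, 1, 0, 0, 0, 1, 1, 0]   # hypothetical score-based predictions
llm_pred  = [1, 1, 0, 0, 1, 0, 1, 0]   # hypothetical LLM predictions
fsls_prob = [0.9, 0.8, 0.4, 0.2, 0.3, 0.6, 0.7, 0.1]  # predicted FS probabilities

print(accuracy(y_true, fsls_pred))        # 0.75
print(auc(y_true, fsls_prob))             # 0.9375
print(cohens_kappa(fsls_pred, llm_pred))  # 0.5
```

In practice a library such as scikit-learn (`accuracy_score`, `roc_auc_score`, `cohen_kappa_score`) would typically be used; the hand-rolled versions here only make the definitions explicit.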
Affiliation(s)
- Wesley T. Kerr
- Department of Neurology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Department of Neurology, University of California, Los Angeles, Los Angeles, California, USA
- Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, California, USA
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Danielle R. Carns
- Department of Neurology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Alex Israel
- Department of Neurology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Lianne Vighetti
- Department of Social Work, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Page B. Pennell
- Department of Neurology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- John M. Stern
- Department of Neurology, University of California, Los Angeles, Los Angeles, California, USA
- Zongqi Xia
- Department of Neurology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Yanshan Wang
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
2
McLaren JR, Yuan D, Beniczky S, Westover MB, Nascimento FA. The future of EEG education in the era of artificial intelligence. Epilepsia 2025. PMID: 40035709; DOI: 10.1111/epi.18326.
Affiliation(s)
- John R McLaren
- Department of Neurology, Harvard Medical School, Boston Children's Hospital, Boston, Massachusetts, USA
- Doyle Yuan
- Department of Neurology, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Sándor Beniczky
- Department of Clinical Neurophysiology, Danish Epilepsy Center, Dianalund, Denmark
- Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- M Brandon Westover
- Department of Neurology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Fábio A Nascimento
- Department of Neurology, Washington University School of Medicine, St. Louis, Missouri, USA
3
Brondani M, Alves C, Ribeiro C, Braga MM, Garcia RCM, Ardenghi T, Pattanaporn K. Artificial intelligence, ChatGPT, and dental education: Implications for reflective assignments and qualitative research. J Dent Educ 2024; 88:1671-1680. PMID: 38973069; PMCID: PMC11638150; DOI: 10.1002/jdd.13663.
Abstract
INTRODUCTION Reflections enable students to gain additional value from a given experience. The use of the Chat Generative Pre-trained Transformer (ChatGPT, OpenAI Incorporated) has gained momentum, but its impact on dental education is understudied. OBJECTIVES To assess whether university instructors can differentiate reflections generated by ChatGPT from those generated by students, and whether the content of a thematic analysis generated by ChatGPT differs from that generated by qualitative researchers on the same reflections. METHODS Hardcopies of 20 reflections (10 generated by undergraduate dental students and 10 generated by ChatGPT) were distributed to three instructors who had at least 5 years of teaching experience. Instructors were asked to label each reflection as either 'ChatGPT' or 'student'. Ten of these reflections (five generated by undergraduate dental students and five generated by ChatGPT) were randomly selected and distributed to two qualitative researchers, who were asked to perform a brief thematic analysis with codes and themes. The same ten reflections were also thematically analyzed by ChatGPT. RESULTS The three instructors correctly determined whether the reflections were student or ChatGPT generated 85% of the time. Most disagreements (40%) involved reflections generated by ChatGPT that the instructors judged to be student generated. The thematic analyses did not differ substantially when comparing the codes and themes produced by the two researchers with those generated by ChatGPT. CONCLUSIONS Instructors could differentiate between reflections generated by ChatGPT and by students most of the time. The overall content of a thematic analysis generated by the artificial intelligence program ChatGPT did not differ from that generated by qualitative researchers. Overall, the promising applications of ChatGPT will likely generate a paradigm shift in (dental) health education, research, and practice.
Affiliation(s)
- Mario Brondani
- Faculty of Dentistry, Department of Oral Health Sciences, University of British Columbia, Vancouver, Canada
- Claudia Alves
- Faculty of Dentistry, Department of Dentistry II, Federal University of Maranhão, São Luís, Maranhão, Brazil
- Cecilia Ribeiro
- Faculty of Dentistry, Department of Dentistry II, Federal University of Maranhão, São Luís, Maranhão, Brazil
- Mariana M Braga
- Faculty of Dentistry, Department of Pediatric Dentistry, University of São Paulo, São Paulo, Brazil
- Renata C Mathes Garcia
- Faculty of Dentistry, Prosthodontic and Periodontic Department, University of Campinas, São Paulo, Brazil
- Thiago Ardenghi
- Faculty of Dentistry, Department of Pediatric Dentistry and Epidemiology, School of Dentistry, Federal University of Santa Maria, Santa Maria, Brazil
4
Speiser JL, Kerr WT, Ziegler A. Common Critiques and Recommendations for Studies in Neurology Using Machine Learning Methods. Neurology 2024; 103:e209861. PMID: 39236270; PMCID: PMC11379123; DOI: 10.1212/wnl.0000000000209861.
Abstract
Machine learning (ML) methods are becoming more prevalent in the neurology literature as alternatives to traditional statistical methods to address challenges in the analysis of modern data sets. Despite the increase in the popularity of ML methods in neurology studies, some authors do not fully address all items recommended in reporting guidelines. The authors of this Research Methods article are members of the Neurology® editorial board and have reviewed many studies using ML methods. Several critiques appear repeatedly in their review reports and could be avoided if guidance were available. In this article, we detail common critiques found in ML research studies and make recommendations for how to avoid them. The first critique involves misalignment of the study goals and the analysis conducted. The second critique focuses on whether ML terminology is used appropriately. Critiques 3-6 relate to the study design: justifying sample sizes and the suitability of the data set for the study goals, describing the ML analysis pipeline sufficiently, quantifying the amount of missing data and providing information about missing data handling, and including uncertainty estimates for key metrics. The seventh critique focuses on fairly describing both strengths and limitations of the ML study, including the analysis methodology and results. We provide examples in neurology for each critique and guidance on how to avoid it. Overall, we recommend that authors use ML-specific checklists developed by research consortia for designing and reporting studies using ML. We also recommend that authors involve both a statistician and an ML expert in work that uses ML. Although our list of critiques is not exhaustive, our recommendations should help improve the quality and rigor of ML studies.
ML has great potential to revolutionize neurology, but investigators need to conduct and report the results in a way that allows readers to fully evaluate the benefits and limitations of ML approaches.
Affiliation(s)
- Jaime L Speiser
- Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, NC
- Wesley T Kerr
- Department of Neurology and Biomedical Informatics, University of Pittsburgh, PA
- Andreas Ziegler
- Cardio-CARE, Medizincampus Davos, Switzerland
- Department of Cardiology and Population Health Innovation, University Medical Center Hamburg-Eppendorf, Germany
- Department of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Berea, Durban, South Africa
5
Malaguti MC, Gios L, Giometto B, Longo C, Riello M, Ottaviani D, Pellegrini M, Di Giacopo R, Donner D, Rozzanigo U, Chierici M, Moroni M, Jurman G, Bincoletto G, Pardini M, Bacchin R, Nobili F, Di Biasio F, Avanzino L, Marchese R, Mandich P, Garbarino S, Pagano M, Campi C, Piana M, Marenco M, Uccelli A, Osmani V. Artificial intelligence of imaging and clinical neurological data for predictive, preventive and personalized (P3) medicine for Parkinson Disease: The NeuroArtP3 protocol for a multi-center research study. PLoS One 2024; 19:e0300127. PMID: 38483951; PMCID: PMC10939244; DOI: 10.1371/journal.pone.0300127.
Abstract
BACKGROUND The burden of Parkinson Disease (PD) represents a key public health issue, and it is essential to develop innovative and cost-effective approaches to promote sustainable diagnostic and therapeutic interventions. In this perspective, the adoption of a P3 (predictive, preventive and personalized) medicine approach seems pivotal. NeuroArtP3 (NET-2018-12366666) is a four-year multi-site project co-funded by the Italian Ministry of Health, bringing together clinical and computational centers operating in the field of neurology, including PD. OBJECTIVE The core objectives of the project are: i) to harmonize the collection of data across the participating centers, ii) to structure standardized disease-specific datasets and iii) to advance knowledge of disease trajectories through machine learning analysis. METHODS The four-year study combines two consecutive research components: i) a multi-center retrospective observational phase and ii) a multi-center prospective observational phase. The retrospective phase aims to collect data on patients admitted to the participating clinical centers, whereas the prospective phase aims to collect the same variables in newly diagnosed patients who will be enrolled at the same centers. RESULTS The participating clinical centers are the Provincial Health Services (APSS) of Trento (Italy), the center responsible for the PD study, and the IRCCS San Martino Hospital of Genoa (Italy), the promoter center of the NeuroArtP3 project. The computational centers responsible for data analysis are the Bruno Kessler Foundation of Trento (Italy), with TrentinoSalute4.0 - Competence Center for Digital Health of the Province of Trento (Italy), and the LISCOMPlab, University of Genoa (Italy).
CONCLUSIONS The work behind this observational study protocol shows that it is possible and viable to systematize data collection procedures in order to feed research and to advance the implementation of a P3 approach into clinical practice through the use of AI models.
Affiliation(s)
- Lorenzo Gios
- TrentinoSalute4.0 - Competence Center for Digital Health of the Province of Trento, Trento, Italy
- Bruno Giometto
- Centro Interdipartimentale di Scienze Mediche (CISMed), Facoltà di Medicina e Chirurgia, Università di Trento, Trento, Italy
- Chiara Longo
- Azienda Provinciale per i Servizi Sanitari (APSS) di Trento, Trento, Italy
- Marianna Riello
- Azienda Provinciale per i Servizi Sanitari (APSS) di Trento, Trento, Italy
- Davide Donner
- Azienda Provinciale per i Servizi Sanitari (APSS) di Trento, Trento, Italy
- Department of Medical and Surgical Sciences, Alma Mater Studiorum Università di Bologna, Bologna, Italy
- Umberto Rozzanigo
- Azienda Provinciale per i Servizi Sanitari (APSS) di Trento, Trento, Italy
- Monica Moroni
- Fondazione Bruno Kessler Research Center, Trento, Italy
- Matteo Pardini
- IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Neuroscience, Rehabilitation, Maternal and Child Health, University of Genoa, Genoa, Italy
- Ruggero Bacchin
- Azienda Provinciale per i Servizi Sanitari (APSS) di Trento, Trento, Italy
- Flavio Nobili
- IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Laura Avanzino
- IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Experimental Medicine, Section of Human Physiology, University of Genoa, Genoa, Italy
- Paola Mandich
- IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- DINOGMI Department, University of Genoa, Genoa, Italy
- Mattia Pagano
- IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Cristina Campi
- IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Dipartimento di Matematica, Università di Genova, Genoa, Italy
- Michele Piana
- IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Dipartimento di Matematica, Università di Genova, Genoa, Italy
- Venet Osmani
- Fondazione Bruno Kessler Research Center, Trento, Italy
6
Kerr WT, McFarlane KN. Machine Learning and Artificial Intelligence Applications to Epilepsy: a Review for the Practicing Epileptologist. Curr Neurol Neurosci Rep 2023; 23:869-879. PMID: 38060133; DOI: 10.1007/s11910-023-01318-7.
Abstract
PURPOSE OF REVIEW Machine learning (ML) and artificial intelligence (AI) are data-driven techniques that translate raw data into applicable and interpretable insights to assist in clinical decision making. Some of these tools have shown extremely promising initial results, generating both great excitement and considerable hype. This non-technical article reviews recent developments in ML/AI in epilepsy to help the practicing epileptologist understand both the benefits and the limitations of integrating ML/AI tools into clinical practice. RECENT FINDINGS ML/AI tools have been developed to assist clinicians with almost every clinical decision, including (1) predicting future epilepsy in people at risk, (2) detecting and monitoring for seizures, (3) differentiating epilepsy from mimics, (4) using data to improve neuroanatomic localization and lateralization, and (5) tracking and predicting response to medical and surgical treatments. We also discuss practical, ethical, and equity considerations in the development and application of ML/AI tools, including chatbots based on large language models (e.g., ChatGPT). ML/AI tools will change how clinical medicine is practiced, but, with rare exceptions, the transferability to other centers, effectiveness, and safety of these approaches have not yet been established rigorously. In the future, ML/AI will not replace epileptologists, but epileptologists with ML/AI will replace epileptologists without ML/AI.
Affiliation(s)
- Wesley T Kerr
- Department of Neurology, University of Pittsburgh, 3471 Fifth Ave, Kaufmann 811.22, Pittsburgh, PA, 15213, USA.
- Department of Biomedical Informatics, University of Pittsburgh, 3471 Fifth Ave, Kaufmann 811.22, Pittsburgh, PA, 15213, USA.
- Department of Neurology, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA.
- Katherine N McFarlane
- Department of Neurology, University of Pittsburgh, 3471 Fifth Ave, Kaufmann 811.22, Pittsburgh, PA, 15213, USA
7
Wang Y, Li N, Chen L, Wu M, Meng S, Dai Z, Zhang Y, Clarke M. Guidelines, Consensus Statements, and Standards for the Use of Artificial Intelligence in Medicine: Systematic Review. J Med Internet Res 2023; 25:e46089. PMID: 37991819; PMCID: PMC10701655; DOI: 10.2196/46089.
Abstract
BACKGROUND The application of artificial intelligence (AI) in the delivery of health care is a promising area, and guidelines, consensus statements, and standards on AI regarding various topics have been developed. OBJECTIVE We performed this study to assess the quality of guidelines, consensus statements, and standards in the field of AI for medicine and to provide a foundation for recommendations about the future development of AI guidelines. METHODS We searched 7 electronic databases from database inception to April 6, 2022, and screened articles involving AI guidelines, consensus statements, and standards for eligibility. The AGREE II (Appraisal of Guidelines for Research & Evaluation II) and RIGHT (Reporting Items for Practice Guidelines in Healthcare) tools were used to assess the methodological and reporting quality of the included articles. RESULTS This systematic review included 19 guideline articles, 14 consensus statement articles, and 3 standard articles published between 2019 and 2022. Their content involved disease screening, diagnosis, and treatment; AI intervention trial reporting; AI imaging development and collaboration; AI data application; and AI ethics governance and applications. Our quality assessment revealed that the average overall AGREE II score was 4.0 (range 2.2-5.5; 7-point Likert scale) and the mean overall reporting rate of the RIGHT tool was 49.4% (range 25.7%-77.1%). CONCLUSIONS The results indicated important differences in the quality of different AI guidelines, consensus statements, and standards. We made recommendations for improving their methodological and reporting quality. TRIAL REGISTRATION PROSPERO International Prospective Register of Systematic Reviews (CRD42022321360); https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=321360.
Affiliation(s)
- Ying Wang
- Department of Medical Administration, West China Hospital, Sichuan University, Chengdu, China
- Nian Li
- Department of Medical Administration, West China Hospital, Sichuan University, Chengdu, China
- Lingmin Chen
- Department of Anesthesiology, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu, China
- Miaomiao Wu
- Department of General Practice, National Clinical Research Center for Geriatrics, International Medical Center, West China Hospital, Sichuan University, Chengdu, China
- Sha Meng
- Department of Operation Management, West China Hospital, Sichuan University, Chengdu, China
- Zelei Dai
- Department of Radiation Oncology, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Yonggang Zhang
- Department of Periodical Press, National Clinical Research Center for Geriatrics, Chinese Evidence-based Medicine Center, Nursing Key Laboratory of Sichuan Province, West China Hospital, Sichuan University, Chengdu, China
- Mike Clarke
- Northern Ireland Methodology Hub, Queen's University Belfast, Belfast, United Kingdom
8
Abstract
OBJECTIVES Through a scoping review, we examine the ways in which health equity has been promoted in clinical research informatics with patient implications, focusing on work published in 2021 (and some in 2022). METHOD A scoping review was conducted using methods described in the Joanna Briggs Institute Manual. The review process consisted of five stages: 1) development of the aim and research question, 2) literature search, 3) literature screening and selection, 4) data extraction, and 5) accumulation and reporting of results. RESULTS Of the 478 papers identified in 2021 on the topic of clinical research informatics with a focus on health equity as a patient implication, 8 papers met our inclusion criteria. All included papers focused on artificial intelligence (AI) technology. The papers addressed health equity in clinical research informatics either through the exposure of inequity in AI-based solutions or through the use of AI as a tool for promoting health equity in the delivery of healthcare services. While algorithmic bias poses a risk to health equity within AI-based solutions, AI has also uncovered inequity in traditional treatment and demonstrated effective complements and alternatives that promote health equity. CONCLUSIONS Clinical research informatics with implications for patients still faces challenges of an ethical nature and of clinical value. However, used prudently, for the right purpose in the right context, clinical research informatics could provide powerful tools for advancing health equity in patient care.
9
Farah L, Davaze-Schneider J, Martin T, Nguyen P, Borget I, Martelli N. Are current clinical studies on artificial intelligence-based medical devices comprehensive enough to support a full health technology assessment? A systematic review. Artif Intell Med 2023; 140:102547. PMID: 37210155; DOI: 10.1016/j.artmed.2023.102547.
Abstract
INTRODUCTION Artificial intelligence-based medical devices (AI-based MDs) are experiencing exponential growth in healthcare. This study aimed to investigate whether current studies assessing AI contain the information required for health technology assessment (HTA) by HTA bodies. METHODS We conducted a systematic literature review based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses methodology to extract articles published between 2016 and 2021 related to the assessment of AI-based MDs. Data extraction focused on study characteristics, technology, algorithms, comparators, and results. AI quality assessment and HTA scores were calculated to evaluate whether the items present in the included studies were concordant with HTA requirements. We performed a linear regression for the HTA and AI scores with the explanatory variables of impact factor, publication date, and medical specialty. We conducted a univariate analysis of the HTA score and a multivariate analysis of the AI score with an alpha risk of 5%. RESULTS Of 5578 retrieved records, 56 were included. The mean AI quality assessment score was 67%; 32% of articles had an AI quality score ≥70%, 50% had a score between 50% and 70%, and 18% had a score under 50%. The highest quality scores were observed for the study design (82%) and optimisation (69%) categories, whereas the scores were lowest in the clinical practice category (23%). The mean HTA score was 52% across all seven domains. All of the studies assessed clinical effectiveness, whereas only 9% evaluated safety and 20% evaluated economic issues. There was a statistically significant relationship between the impact factor and both the HTA and AI scores (both p = 0.046). DISCUSSION Clinical studies on AI-based MDs have limitations and often lack adapted, robust, and complete evidence. High-quality datasets are also required, because the output data can only be trusted if the inputs are reliable.
The existing assessment frameworks are not specifically designed to assess AI-based MDs. From the perspective of regulatory authorities, we suggest that these frameworks be adapted to assess the interpretability, explainability, cybersecurity, and safety of ongoing updates. From the perspective of HTA agencies, we highlight that transparency, professional and patient acceptance, ethical issues, and organizational changes are required for the implementation of these devices. Economic assessments of AI should rely on a robust methodology (business impact or health economic models) to provide decision-makers with more reliable evidence. CONCLUSION Currently, AI studies are insufficient to cover HTA prerequisites. HTA processes also need to be adapted because they do not consider the important specificities of AI-based MDs. Specific HTA workflows and accurate assessment tools should be designed to standardise evaluations, generate reliable evidence, and create confidence.
Affiliation(s)
- Line Farah
- Groupe de Recherche et d'accueil en Droit et Economie de la Santé (GRADES) Department, University Paris-Saclay, Orsay, France; Innovation Center for Medical Devices, Foch Hospital, 40 Rue Worth, 92150 Suresnes, France.
- Julie Davaze-Schneider
- Pharmacy Department, Georges Pompidou European Hospital, AP-HP, 20 Rue Leblanc, 75015 Paris, France
- Tess Martin
- Groupe de Recherche et d'accueil en Droit et Economie de la Santé (GRADES) Department, University Paris-Saclay, Orsay, France; Pharmacy Department, Georges Pompidou European Hospital, AP-HP, 20 Rue Leblanc, 75015 Paris, France
- Pierre Nguyen
- Pharmacy Department, Georges Pompidou European Hospital, AP-HP, 20 Rue Leblanc, 75015 Paris, France
- Isabelle Borget
- Groupe de Recherche et d'accueil en Droit et Economie de la Santé (GRADES) Department, University Paris-Saclay, Orsay, France; Department of Biostatistics and Epidemiology, Gustave Roussy, University Paris-Saclay, 94805 Villejuif, France; Oncostat U1018, Inserm, University Paris-Saclay, Équipe Labellisée Ligue Contre le Cancer, Villejuif, France
- Nicolas Martelli
- Groupe de Recherche et d'accueil en Droit et Economie de la Santé (GRADES) Department, University Paris-Saclay, Orsay, France; Pharmacy Department, Georges Pompidou European Hospital, AP-HP, 20 Rue Leblanc, 75015 Paris, France
10
Alfalahi H, Dias SB, Khandoker AH, Chaudhuri KR, Hadjileontiadis LJ. A scoping review of neurodegenerative manifestations in explainable digital phenotyping. NPJ Parkinsons Dis 2023; 9:49. PMID: 36997573; PMCID: PMC10063633; DOI: 10.1038/s41531-023-00494-0.
Abstract
Neurologists nowadays no longer view neurodegenerative diseases, like Parkinson's and Alzheimer's disease, as single entities, but rather as a spectrum of multifaceted symptoms with heterogeneous progression courses and treatment responses. The definition of the naturalistic behavioral repertoire of early neurodegenerative manifestations is still elusive, impeding early diagnosis and intervention. Central to this view is the role of artificial intelligence (AI) in reinforcing the depth of phenotypic information, thereby supporting the paradigm shift to precision medicine and personalized healthcare. This suggestion advocates the definition of disease subtypes in a new biomarker-supported nosology framework, yet without empirical consensus on standardization, reliability and interpretability. Although the well-defined neurodegenerative processes, linked to a triad of motor and non-motor preclinical symptoms, are detected by clinical intuition, we undertake an unbiased data-driven approach to identify different patterns of neuropathology distribution based on the naturalistic behavior data inherent to populations in-the-wild. We appraise the role of remote technologies in the definition of digital phenotyping specific to brain-, body- and social-level neurodegenerative subtle symptoms, emphasizing inter- and intra-patient variability powered by deep learning. As such, the present review endeavors to exploit digital technologies and AI to create disease-specific phenotypic explanations, facilitating the understanding of neurodegenerative diseases as "bio-psycho-social" conditions. Not only does this translational effort within explainable digital phenotyping foster the understanding of disease-induced traits, but it also enhances diagnostic and, eventually, treatment personalization.
Affiliation(s)
- Hessa Alfalahi
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates.
- Healthcare Engineering Innovation Center (HEIC), Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates.
- Sofia B Dias
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Healthcare Engineering Innovation Center (HEIC), Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- CIPER, Faculdade de Motricidade Humana, University of Lisbon, Lisbon, Portugal
- Ahsan H Khandoker
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Healthcare Engineering Innovation Center (HEIC), Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Kallol Ray Chaudhuri
- Parkinson Foundation, International Center of Excellence, King's College London, Denmark Hill, London, UK
- Department of Basic and Clinical Neurosciences, Institute of Psychiatry, Psychology and Neuroscience, King's College London, De Crespigny Park, London, UK
- Leontios J Hadjileontiadis
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Healthcare Engineering Innovation Center (HEIC), Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
|
11
|
Chiang S, Rao VR. Choosing the Best Antiseizure Medication-Can Artificial Intelligence Help? JAMA Neurol 2022; 79:970-972. [PMID: 36036914 PMCID: PMC11163946 DOI: 10.1001/jamaneurol.2022.2441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Affiliation(s)
- Sharon Chiang
- Weill Institute for Neurosciences, Department of Neurology, University of California, San Francisco
- Vikram R Rao
- Weill Institute for Neurosciences, Department of Neurology, University of California, San Francisco
|
12
|
Chiang S, Baud MO, Worrell GA, Rao VR. Editorial: Seizure Forecasting and Detection: Computational Models, Machine Learning, and Translation Into Devices. Front Neurol 2022; 13:874070. [PMID: 35370904 PMCID: PMC8966607 DOI: 10.3389/fneur.2022.874070] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/11/2022] [Accepted: 02/15/2022] [Indexed: 11/21/2022]
Affiliation(s)
- Sharon Chiang
- Department of Neurology and Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, United States
- Maxime O. Baud
- Department of Neurology, Sleep-Wake-Epilepsy Center and Center for Experimental Neurology, Inselspital Bern, University Hospital, University of Bern, Bern, Switzerland
- Wyss Center for Bio- and Neuro-Technology, Geneva, Switzerland
- Vikram R. Rao
- Department of Neurology and Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, United States
| |
Collapse
|
13
|
Daneshjou R, Barata C, Betz-Stablein B, Celebi ME, Codella N, Combalia M, Guitera P, Gutman D, Halpern A, Helba B, Kittler H, Kose K, Liopyris K, Malvehy J, Seog HS, Soyer HP, Tkaczyk ER, Tschandl P, Rotemberg V. Checklist for Evaluation of Image-Based Artificial Intelligence Reports in Dermatology: CLEAR Derm Consensus Guidelines From the International Skin Imaging Collaboration Artificial Intelligence Working Group. JAMA Dermatol 2022; 158:90-96. [PMID: 34851366 PMCID: PMC9845064 DOI: 10.1001/jamadermatol.2021.4915] [Citation(s) in RCA: 78] [Impact Index Per Article: 26.0] [Indexed: 01/19/2023]
Abstract
IMPORTANCE The use of artificial intelligence (AI) is accelerating in all aspects of medicine and has the potential to transform clinical care and dermatology workflows. However, to develop image-based algorithms for dermatology applications, comprehensive criteria establishing development and performance evaluation standards are required to ensure product fairness, reliability, and safety. OBJECTIVE To consolidate limited existing literature with expert opinion to guide developers and reviewers of dermatology AI. EVIDENCE REVIEW In this consensus statement, the 19 members of the International Skin Imaging Collaboration AI working group volunteered to provide a consensus statement. A systematic PubMed search was performed of English-language articles published between December 1, 2008, and August 24, 2021, for "artificial intelligence" and "reporting guidelines," as well as other pertinent studies identified by the expert panel. Factors that were viewed as critical to AI development and performance evaluation were included and underwent 2 rounds of electronic discussion to achieve consensus. FINDINGS A checklist of items was developed that outlines best practices of image-based AI development and assessment in dermatology. CONCLUSIONS AND RELEVANCE Clinically effective AI needs to be fair, reliable, and safe; this checklist of best practices will help both developers and reviewers achieve this goal.
Affiliation(s)
- Roxana Daneshjou
- Stanford Department of Dermatology, Stanford School of Medicine, Redwood City, CA, USA
- Stanford Department of Biomedical Data Science, Stanford School of Medicine, Stanford, CA, USA
- Catarina Barata
- Institute for Systems and Robotics, Instituto Superior Tecnico, Lisboa, Portugal
- Brigid Betz-Stablein
- The University of Queensland Diamantina Institute, The University of Queensland, Dermatology Research Centre, Brisbane, Australia
- M. Emre Celebi
- Department of Computer Science and Engineering, University of Central Arkansas, Conway, Arkansas, USA
- Marc Combalia
- Melanoma Unit, Dermatology Department, Hospital Clínic Barcelona, Universitat de Barcelona, IDIBAPS, Barcelona, Spain
- Pascale Guitera
- Melanoma Institute Australia, The University of Sydney, Camperdown, Australia
- Sydney Melanoma Diagnostic Centre, Royal Prince Alfred Hospital, Camperdown, Australia
- David Gutman
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA, USA
- Allan Halpern
- Dermatology Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Harald Kittler
- Department of Dermatology, Medical University of Vienna, Vienna, Austria
- Kivanc Kose
- Dermatology Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Josep Malvehy
- Melanoma Unit, Dermatology Department, Hospital Clínic Barcelona, Universitat de Barcelona, IDIBAPS, Barcelona, Spain
- Han Seung Seog
- Department of Dermatology, I Dermatology Clinic, Seoul, Korea
- IDerma, Inc., Seoul, Korea
- H. Peter Soyer
- The University of Queensland Diamantina Institute, The University of Queensland, Dermatology Research Centre, Brisbane, Australia
- Eric R Tkaczyk
- Dermatology Service and Research Service, Tennessee Valley Healthcare System, Department of Veterans Affairs, Nashville, TN, USA
- Vanderbilt Dermatology Translational Research Clinic, Department of Dermatology, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Philipp Tschandl
- Department of Dermatology, Medical University of Vienna, Vienna, Austria
- Veronica Rotemberg
- Dermatology Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
|
14
|
Cabedo-Peris J, Martí-Vilar M, Merino-Soto C, Ortiz-Morán M. Basic Empathy Scale: A Systematic Review and Reliability Generalization Meta-Analysis. Healthcare (Basel) 2021; 10:29. [PMID: 35052193 PMCID: PMC8775461 DOI: 10.3390/healthcare10010029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/17/2021] [Revised: 12/18/2021] [Accepted: 12/23/2021] [Indexed: 12/17/2022]
Abstract
The Basic Empathy Scale (BES) has been internationally used to measure empathy. A systematic review including 74 articles that implement the instrument since its development in 2006 was carried out. Moreover, an evidence validity analysis and a reliability generalization meta-analysis were performed to examine if the scale presented the appropriate values to justify its application. Results from the systematic review showed that the use of the BES is increasing, although the research areas in which it is being implemented are currently being broadened. The validity analyses indicated that both the type of factor analysis and reliability are reported in validation studies much more than the consequences of testing are. Regarding the meta-analysis results, the mean of Cronbach's α for cognitive empathy was 0.81 (95% CI: 0.77-0.85), with high levels of heterogeneity (I² = 98.81%). Regarding affective empathy, the mean of Cronbach's α was 0.81 (95% CI: 0.76-0.84), with high levels of heterogeneity. It was concluded that BES is appropriate to be used in general population groups, although not recommended for clinical diagnosis; and there is a moderate to high heterogeneity in the mean of Cronbach's α. The practical implications of the results in mean estimation and heterogeneity are discussed.
Affiliation(s)
- Javier Cabedo-Peris
- Department of Basic Psychology, Faculty of Psychology and Speech Therapy, Universitat de València, 46010 Valencia, Spain
- Manuel Martí-Vilar
- Department of Basic Psychology, Faculty of Psychology and Speech Therapy, Universitat de València, 46010 Valencia, Spain
- César Merino-Soto
- Research Institute of the School of Psychology, Universidad de San Martín de Porres, Lima 15102, Peru
- Mafalda Ortiz-Morán
- Department of Psychology, Faculty of Psychology, Universidad Nacional Federico Villarreal, Lima 15088, Peru
|