1
|
Colacci M, Huang YQ, Postill G, Zhelnov P, Fennelly O, Verma A, Straus S, Tricco AC. Sociodemographic bias in clinical machine learning models: a scoping review of algorithmic bias instances and mechanisms. J Clin Epidemiol 2025; 178:111606. [PMID: 39532254 DOI: 10.1016/j.jclinepi.2024.111606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Revised: 10/22/2024] [Accepted: 11/06/2024] [Indexed: 11/16/2024]
Abstract
BACKGROUND AND OBJECTIVES: Clinical machine learning (ML) technologies can sometimes be biased and their use could exacerbate health disparities. The extent to which bias is present, the groups who most frequently experience bias, and the mechanisms through which bias is introduced in clinical ML applications are not well described. The objective of this study was to examine instances of bias in clinical ML models. We identified the sociodemographic subgroups PROGRESS that experienced bias and the reported mechanisms of bias introduction. METHODS: We searched MEDLINE, EMBASE, PsycINFO, and Web of Science for all studies that evaluated bias on sociodemographic factors within ML algorithms created for the purpose of facilitating clinical care. The scoping review was conducted according to the Joanna Briggs Institute guide and reported using the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) extension for scoping reviews. RESULTS: We identified 6448 articles, of which 760 reported on a clinical ML model and 91 (12.0%) completed a bias evaluation and met all inclusion criteria. Most studies evaluated a single sociodemographic factor (n = 56, 61.5%). The most frequently evaluated sociodemographic factor was race (n = 59, 64.8%), followed by sex/gender (n = 41, 45.1%), and age (n = 24, 26.4%), with one study (1.1%) evaluating intersectional factors. Of all studies, 74.7% (n = 68) reported that bias was present, 18.7% (n = 17) reported bias was not present, and 6.6% (n = 6) did not state whether bias was present. When present, 87% of studies reported bias against groups with socioeconomic disadvantage. CONCLUSION: Most ML algorithms that were evaluated for bias demonstrated bias on sociodemographic factors. Furthermore, most bias evaluations concentrated on race, sex/gender, and age, while other sociodemographic factors and their intersection were infrequently assessed. Given potential health equity implications, bias assessments should be completed for all clinical ML models.
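To make the notion of a subgroup bias evaluation concrete, the following is a minimal sketch of one possible check: comparing sensitivity across sociodemographic groups. It is not the method of any study in the review; the function names, the toy labels, and the choice of sensitivity as the metric are all assumptions made for illustration.

```python
import numpy as np

def subgroup_sensitivity(y_true, y_pred, group):
    """Return {group label: sensitivity} for a binary classifier (1 = positive)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    result = {}
    for g in np.unique(group):
        positives = (group == g) & (y_true == 1)
        if positives.sum() == 0:
            # No positive cases in this group: sensitivity is undefined.
            result[str(g)] = float("nan")
        else:
            result[str(g)] = float((y_pred[positives] == 1).mean())
    return result

def sensitivity_gap(per_group):
    """Largest difference in sensitivity between any two groups."""
    values = [v for v in per_group.values() if not np.isnan(v)]
    return max(values) - min(values)

# Toy data (hypothetical): two groups with the same number of true positives.
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
group = ["A"] * 6 + ["B"] * 6

per_group = subgroup_sensitivity(y_true, y_pred, group)
gap = sensitivity_gap(per_group)
```

A large gap suggests the model misses true cases more often in one group than another. Which metric and which threshold count as bias is a design decision that the sketch does not settle.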
Affiliation(s)
- Michael Colacci
- St. Michael's Hospital, Unity Health Toronto, Toronto, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada.
| | - Yu Qing Huang
- St. Michael's Hospital, Unity Health Toronto, Toronto, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada
| | - Gemma Postill
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada; Temerty Faculty of Medicine, University of Toronto, Toronto, Canada
| | - Pavel Zhelnov
- St. Michael's Hospital, Unity Health Toronto, Toronto, Canada
| | - Orna Fennelly
- St. Michael's Hospital, Unity Health Toronto, Toronto, Canada
| | - Amol Verma
- St. Michael's Hospital, Unity Health Toronto, Toronto, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada; Temerty Faculty of Medicine, University of Toronto, Toronto, Canada
| | - Sharon Straus
- St. Michael's Hospital, Unity Health Toronto, Toronto, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada; Temerty Faculty of Medicine, University of Toronto, Toronto, Canada
| | - Andrea C Tricco
- St. Michael's Hospital, Unity Health Toronto, Toronto, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada
|
2
|
Mehmood R, Lazar M, Liang X, Corchado JM, See S. Editorial: Protecting privacy in neuroimaging analysis: balancing data sharing and privacy preservation. Front Neuroinform 2025; 18:1543121. [PMID: 39839854 PMCID: PMC11746894 DOI: 10.3389/fninf.2024.1543121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2024] [Accepted: 12/17/2024] [Indexed: 01/23/2025] Open
Affiliation(s)
- Rashid Mehmood
- Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah, Saudi Arabia
| | - Mariana Lazar
- Grossman School of Medicine, New York University, New York, NY, United States
| | - Xiaohui Liang
- Department of Computer Science, University of Massachusetts, Boston, MA, United States
| | - Juan M. Corchado
- BISITE Research Group, University of Salamanca, Salamanca, Spain
- Air Institute, IoT Digital Innovation Hub, Salamanca, Spain
- Department of Electronics, Information and Communication, Faculty of Engineering, Osaka Institute of Technology, Osaka, Japan
| | - Simon See
- NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, CA, United States
|
3
|
De Bonis MLN, Fasano G, Lombardi A, Ardito C, Ferrara A, Di Sciascio E, Di Noia T. Explainable brain age prediction: a comparative evaluation of morphometric and deep learning pipelines. Brain Inform 2024; 11:33. [PMID: 39692946 DOI: 10.1186/s40708-024-00244-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Accepted: 11/23/2024] [Indexed: 12/19/2024] Open
Abstract
Brain age, a biomarker reflecting brain health relative to chronological age, is increasingly used in neuroimaging to detect early signs of neurodegenerative diseases and support personalized treatment plans. Two primary approaches for brain age prediction have emerged: morphometric feature extraction from MRI scans and deep learning (DL) applied to raw MRI data. However, a systematic comparison of these methods regarding performance, interpretability, and clinical utility has been limited. In this study, we present a comparative evaluation of two pipelines: one using morphometric features from FreeSurfer and the other employing 3D convolutional neural networks (CNNs). Using a multisite neuroimaging dataset, we assessed both model performance and the interpretability of predictions through eXplainable Artificial Intelligence (XAI) methods, applying SHAP to the feature-based pipeline and Grad-CAM and DeepSHAP to the CNN-based pipeline. Our results show comparable performance between the two pipelines in Leave-One-Site-Out (LOSO) validation, achieving state-of-the-art performance on the independent test set (MAE = 3.21 with DNN and morphometric features and MAE = 3.08 with a DenseNet-121 architecture). SHAP provided the most consistent and interpretable results, while DeepSHAP exhibited greater variability. Further work is needed to assess the clinical utility of Grad-CAM. This study addresses a critical gap by systematically comparing the interpretability of multiple XAI methods across distinct brain age prediction pipelines. Our findings underscore the importance of integrating XAI into clinical practice, offering insights into how XAI outputs vary and their potential utility for clinicians.
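The Leave-One-Site-Out scheme and the mean absolute error (MAE) mentioned above can be sketched as follows. This is an illustrative sketch only: the helper names, the synthetic data, and the plain least-squares regressor are assumptions and stand in for the morphometric and CNN pipelines of the study.

```python
import numpy as np

def leave_one_site_out(sites):
    """Yield (held_out_site, train_idx, test_idx), holding out one site per fold."""
    sites = np.asarray(sites)
    for held_out in np.unique(sites):
        test_idx = np.flatnonzero(sites == held_out)
        train_idx = np.flatnonzero(sites != held_out)
        yield held_out, train_idx, test_idx

def mean_absolute_error(y_true, y_pred):
    """MAE = mean of |true - predicted|, in the units of the target (years here)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

# Synthetic stand-in data (hypothetical): 3 sites, 30 subjects each, 4 features.
rng = np.random.default_rng(0)
n = 90
sites = np.repeat(["s1", "s2", "s3"], 30)
X = rng.normal(size=(n, 4))
age = 50 + X @ np.array([5.0, -3.0, 2.0, 0.0]) + rng.normal(scale=1.0, size=n)

fold_mae = {}
for site, tr, te in leave_one_site_out(sites):
    # Fit an ordinary least-squares model on the training sites only.
    A_train = np.column_stack([np.ones(len(tr)), X[tr]])
    coef, *_ = np.linalg.lstsq(A_train, age[tr], rcond=None)
    A_test = np.column_stack([np.ones(len(te)), X[te]])
    fold_mae[str(site)] = mean_absolute_error(age[te], A_test @ coef)
```

Each fold reports the error on a site the model never saw during fitting, which is what makes LOSO a test of cross-site generalization rather than of within-site fit.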
Affiliation(s)
- Maria Luigia Natalia De Bonis
- Department of Electrical and Information Engineering, Polytechnic University of Bari, Via E. Orabona, 4, 70125, Bari, Italy
| | - Giuseppe Fasano
- Department of Electrical and Information Engineering, Polytechnic University of Bari, Via E. Orabona, 4, 70125, Bari, Italy
| | - Angela Lombardi
- Department of Electrical and Information Engineering, Polytechnic University of Bari, Via E. Orabona, 4, 70125, Bari, Italy.
| | - Carmelo Ardito
- Department of Electrical and Information Engineering, Polytechnic University of Bari, Via E. Orabona, 4, 70125, Bari, Italy
| | - Antonio Ferrara
- Department of Electrical and Information Engineering, Polytechnic University of Bari, Via E. Orabona, 4, 70125, Bari, Italy
| | - Eugenio Di Sciascio
- Department of Electrical and Information Engineering, Polytechnic University of Bari, Via E. Orabona, 4, 70125, Bari, Italy
| | - Tommaso Di Noia
- Department of Electrical and Information Engineering, Polytechnic University of Bari, Via E. Orabona, 4, 70125, Bari, Italy
|
4
|
Alsaigh R, Mehmood R, Katib I, Liang X, Alshanqiti A, Corchado JM, See S. Harmonizing AI governance regulations and neuroinformatics: perspectives on privacy and data sharing. Front Neuroinform 2024; 18:1472653. [PMID: 39741922 PMCID: PMC11685213 DOI: 10.3389/fninf.2024.1472653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2024] [Accepted: 12/04/2024] [Indexed: 01/03/2025] Open
Affiliation(s)
- Roba Alsaigh
- Department of Computer Science, Faculty of Computing and Information Technology (FCIT), King Abdulaziz University, Jeddah, Saudi Arabia
| | - Rashid Mehmood
- Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah, Saudi Arabia
| | - Iyad Katib
- Department of Computer Science, Faculty of Computing and Information Technology (FCIT), King Abdulaziz University, Jeddah, Saudi Arabia
| | - Xiaohui Liang
- Department of Computer Science, University of Massachusetts, Boston, MA, United States
| | - Abdullah Alshanqiti
- Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah, Saudi Arabia
| | - Juan M. Corchado
- BISITE Research Group, University of Salamanca, Salamanca, Spain
- Air Institute, IoT Digital Innovation Hub, Salamanca, Spain
- Department of Electronics, Information and Communication, Faculty of Engineering, Osaka Institute of Technology, Osaka, Japan
| | - Simon See
- NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, CA, United States
|
5
|
Souza R, Stanley EAM, Gulve V, Moore J, Kang C, Camicioli R, Monchi O, Ismail Z, Wilms M, Forkert ND. HarmonyTM: multi-center data harmonization applied to distributed learning for Parkinson's disease classification. J Med Imaging (Bellingham) 2024; 11:054502. [PMID: 39308760 PMCID: PMC11413651 DOI: 10.1117/1.jmi.11.5.054502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 08/29/2024] [Accepted: 09/03/2024] [Indexed: 09/25/2024] Open
Abstract
Purpose: Distributed learning is widely used to comply with data-sharing regulations and access diverse datasets for training machine learning (ML) models. The traveling model (TM) is a distributed learning approach that sequentially trains with data from one center at a time, which is especially advantageous when dealing with limited local datasets. However, a critical concern emerges when centers utilize different scanners for data acquisition, which could potentially lead models to exploit these differences as shortcuts. Although data harmonization can mitigate this issue, current methods typically rely on large or paired datasets, which can be impractical to obtain in distributed setups. Approach: We introduced HarmonyTM, a data harmonization method tailored for the TM. HarmonyTM effectively mitigates bias in the model's feature representation while retaining crucial disease-related information, all without requiring extensive datasets. Specifically, we employed adversarial training to "unlearn" bias from the features used in the model for classifying Parkinson's disease (PD). We evaluated HarmonyTM using multi-center three-dimensional (3D) neuroimaging datasets from 83 centers using 23 different scanners. Results: Our results show that HarmonyTM improved PD classification accuracy from 72% to 76% and reduced (unwanted) scanner classification accuracy from 53% to 30% in the TM setup. Conclusion: HarmonyTM is a method tailored for harmonizing 3D neuroimaging data within the TM approach, aiming to minimize shortcut learning in distributed setups. This prevents the disease classifier from leveraging scanner-specific details to classify patients with or without PD, a key aspect for deploying ML models for clinical applications.
Affiliation(s)
- Raissa Souza
- University of Calgary, Department of Radiology, Cumming School of Medicine, Calgary, Alberta, Canada
- University of Calgary, Hotchkiss Brain Institute, Calgary, Alberta, Canada
- University of Calgary, Biomedical Engineering Graduate Program, Calgary, Alberta, Canada
- University of Calgary, Alberta Children’s Hospital Research Institute, Calgary, Alberta, Canada
| | - Emma A. M. Stanley
- University of Calgary, Department of Radiology, Cumming School of Medicine, Calgary, Alberta, Canada
- University of Calgary, Hotchkiss Brain Institute, Calgary, Alberta, Canada
- University of Calgary, Biomedical Engineering Graduate Program, Calgary, Alberta, Canada
- University of Calgary, Alberta Children’s Hospital Research Institute, Calgary, Alberta, Canada
| | - Vedant Gulve
- Indian Institute of Technology, Department of Electronics and Electrical Communication Engineering, Kharagpur, West Bengal, India
| | - Jasmine Moore
- University of Calgary, Department of Radiology, Cumming School of Medicine, Calgary, Alberta, Canada
- University of Calgary, Hotchkiss Brain Institute, Calgary, Alberta, Canada
- University of Calgary, Biomedical Engineering Graduate Program, Calgary, Alberta, Canada
- University of Calgary, Alberta Children’s Hospital Research Institute, Calgary, Alberta, Canada
| | - Chris Kang
- University of Calgary, Department of Radiology, Cumming School of Medicine, Calgary, Alberta, Canada
- University of Calgary, Hotchkiss Brain Institute, Calgary, Alberta, Canada
| | - Richard Camicioli
- University of Alberta, Neuroscience and Mental Health Institute and Department of Medicine (Neurology), Edmonton, Alberta, Canada
| | - Oury Monchi
- University of Calgary, Department of Radiology, Cumming School of Medicine, Calgary, Alberta, Canada
- University of Calgary, Hotchkiss Brain Institute, Calgary, Alberta, Canada
- Université de Montréal, Department of Radiology, Radio-oncology and Nuclear Medicine, Montréal, Quebec, Canada
- Centre de Recherche, Institut Universitaire de Gériatrie de Montréal, Montréal, Quebec, Canada
- University of Calgary, Department of Clinical Neurosciences, Cumming School of Medicine, Calgary, Alberta, Canada
| | - Zahinoor Ismail
- University of Calgary, Hotchkiss Brain Institute, Calgary, Alberta, Canada
- University of Calgary, Department of Clinical Neurosciences, Cumming School of Medicine, Calgary, Alberta, Canada
- University of Calgary, Department of Psychiatry, Calgary, Alberta, Canada
- University of Exeter, Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, Exeter, United Kingdom
| | - Matthias Wilms
- University of Calgary, Hotchkiss Brain Institute, Calgary, Alberta, Canada
- University of Calgary, Alberta Children’s Hospital Research Institute, Calgary, Alberta, Canada
- University of Calgary, Department of Pediatrics, Calgary, Alberta, Canada
- University of Calgary, Department of Community Health Sciences, Calgary, Alberta, Canada
| | - Nils D. Forkert
- University of Calgary, Department of Radiology, Cumming School of Medicine, Calgary, Alberta, Canada
- University of Calgary, Hotchkiss Brain Institute, Calgary, Alberta, Canada
- University of Calgary, Alberta Children’s Hospital Research Institute, Calgary, Alberta, Canada
- University of Calgary, Department of Clinical Neurosciences, Cumming School of Medicine, Calgary, Alberta, Canada
|
6
|
Schielen SJC, Pilmeyer J, Aldenkamp AP, Zinger S. The diagnosis of ASD with MRI: a systematic review and meta-analysis. Transl Psychiatry 2024; 14:318. [PMID: 39095368 PMCID: PMC11297045 DOI: 10.1038/s41398-024-03024-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 06/25/2024] [Accepted: 07/15/2024] [Indexed: 08/04/2024] Open
Abstract
While diagnosing autism spectrum disorder (ASD) based on an objective test is desired, the current diagnostic practice involves observation-based criteria. This study is a systematic review and meta-analysis of studies that aim to diagnose ASD using magnetic resonance imaging (MRI). The main objective is to describe the state of the art of diagnosing ASD using MRI in terms of performance metrics and interpretation. Furthermore, subgroups, including different MRI modalities and statistical heterogeneity, are analyzed. Studies that dichotomously diagnose individuals with ASD and healthy controls by analyses progressing from magnetic resonance imaging obtained in a resting state were systematically selected by two independent reviewers. Studies were sought on Web of Science and PubMed, which were last accessed on February 24, 2023. The included studies were assessed on quality and risk of bias using the revised Quality Assessment of Diagnostic Accuracy Studies tool. A bivariate random-effects model was used for syntheses. One hundred and thirty-four studies were included comprising 159 eligible experiments. Despite the overlap in the studied samples, an estimated 4982 unique participants consisting of 2439 individuals with ASD and 2543 healthy controls were included. The pooled summary estimates of diagnostic performance are 76.0% sensitivity (95% CI 74.1-77.8), 75.7% specificity (95% CI 74.0-77.4), and an area under curve of 0.823, but uncertainty in the study assessments limits confidence. The main limitations are heterogeneity and uncertainty about the generalization of diagnostic performance. Therefore, comparisons between subgroups were considered inappropriate. Despite the current limitations, methods progressing from MRI approach the diagnostic performance needed for clinical practice. The state of the art has obstacles but shows potential for future clinical application.
Affiliation(s)
- Sjir J C Schielen
- Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, the Netherlands.
| | - Jesper Pilmeyer
- Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, the Netherlands
| | - Albert P Aldenkamp
- Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, the Netherlands
- Department of Behavioral Sciences, Epilepsy Center Kempenhaeghe, Heeze, the Netherlands
| | - Svitlana Zinger
- Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, the Netherlands
|
7
|
Kulyabin M, Zhdanov A, Maier A, Loh L, Estevez JJ, Constable PA. Generating Synthetic Light-Adapted Electroretinogram Waveforms Using Artificial Intelligence to Improve Classification of Retinal Conditions in Under-Represented Populations. J Ophthalmol 2024; 2024:1990419. [PMID: 39045382 PMCID: PMC11265936 DOI: 10.1155/2024/1990419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2024] [Revised: 05/27/2024] [Accepted: 06/25/2024] [Indexed: 07/25/2024] Open
Abstract
Visual electrophysiology is often used clinically to determine the functional changes associated with retinal or neurological conditions. The full-field flash electroretinogram (ERG) assesses the global contribution of the outer and inner retinal layers initiated by the rod and cone pathways, depending on the state of retinal adaptation. Within clinical centers, reference normative data are used to compare clinical cases that may be rare or underpowered within a specific demographic. To bolster either the reference dataset or the case dataset, the application of synthetic ERG waveforms may offer benefits to disease classification and case-control studies. In this study, as a proof of concept, artificial intelligence (AI) in the form of generative adversarial networks is used to generate synthetic signals that upscale the male participants within an ISCEV reference dataset containing 68 participants, with waveforms from the right and left eye. With the added synthetic male waveforms, random forest classifiers improved classification for sex within the group from a balanced accuracy of 0.72 to 0.83. This is the first study to demonstrate the generation of synthetic ERG waveforms to improve machine learning classification modelling with electroretinogram waveforms.
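Balanced accuracy, the metric quoted above, is the mean of the per-class recalls, which is why it suits datasets where one class is under-represented. A minimal sketch follows; the function name and toy labels are illustrative assumptions and are unrelated to the study's data.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; each class counts equally regardless of its size."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Toy imbalanced example (hypothetical): 8 "F" and 2 "M" samples.
y_true = np.array(["F"] * 8 + ["M"] * 2)
y_pred = np.array(["F"] * 8 + ["F", "M"])

plain_accuracy = float(np.mean(y_true == y_pred))  # 9 of 10 correct
bal_accuracy = balanced_accuracy(y_true, y_pred)   # (1.0 + 0.5) / 2
```

In this toy case plain accuracy looks high because the majority class dominates, while balanced accuracy exposes that half of the minority class is misclassified.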
Affiliation(s)
- Mikhail Kulyabin
- Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - Aleksei Zhdanov
- Engineering School of Information Technologies, Telecommunications and Control Systems, Ural Federal University, Yekaterinburg, Russia
| | - Andreas Maier
- Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - Lynne Loh
- Flinders University, College of Nursing and Health Sciences, Caring Futures Institute, Adelaide, South Australia, Australia
| | - Jose J. Estevez
- Flinders University, College of Nursing and Health Sciences, Caring Futures Institute, Adelaide, South Australia, Australia
| | - Paul A. Constable
- Flinders University, College of Nursing and Health Sciences, Caring Futures Institute, Adelaide, South Australia, Australia
|
8
|
Marzi C, Giannelli M, Barucci A, Tessa C, Mascalchi M, Diciotti S. Efficacy of MRI data harmonization in the age of machine learning: a multicenter study across 36 datasets. Sci Data 2024; 11:115. [PMID: 38263181 PMCID: PMC10805868 DOI: 10.1038/s41597-023-02421-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 12/06/2022] [Accepted: 07/27/2023] [Indexed: 01/25/2024] Open
Abstract
Pooling publicly available MRI data from multiple sites makes it possible to assemble extensive groups of subjects, increase statistical power, and promote data reuse with machine learning techniques. The harmonization of multicenter data is necessary to reduce the confounding effect associated with non-biological sources of variability in the data. However, when applied to the entire dataset before machine learning, harmonization leads to data leakage, because information outside the training set may affect model building and potentially lead to falsely overestimated performance. We propose 1) a measurement of the efficacy of data harmonization; and 2) a harmonizer transformer, i.e., an implementation of the ComBat harmonization allowing its encapsulation among the preprocessing steps of a machine learning pipeline, avoiding data leakage by design. We tested these tools using brain T1-weighted MRI data from 1740 healthy subjects acquired at 36 sites. After harmonization, the site effect was removed or reduced, and we showed the data leakage effect in predicting individual age from MRI data, highlighting that introducing the harmonizer transformer into a machine learning pipeline allows for avoiding data leakage by design.
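The idea of encapsulating harmonization as a fit/transform preprocessing step can be sketched as below. This is not the authors' harmonizer transformer and it is not ComBat: it is a deliberately simplified per-site location and scale adjustment without empirical Bayes shrinkage or covariate preservation. The class name and the convention that column 0 of `X` holds a numeric site code are assumptions made for illustration.

```python
import numpy as np

class SiteLocationScaleHarmonizer:
    """Hypothetical, simplified stand-in for a ComBat-style harmonizer.

    Assumption: column 0 of X is a numeric site code, remaining columns are features.
    """

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        sites, feats = X[:, 0], X[:, 1:]
        # Pooled statistics and per-site statistics come from the fitted data only.
        self.grand_mean_ = feats.mean(axis=0)
        self.grand_std_ = feats.std(axis=0) + 1e-12
        self.site_stats_ = {}
        for s in np.unique(sites):
            block = feats[sites == s]
            self.site_stats_[float(s)] = (block.mean(axis=0), block.std(axis=0) + 1e-12)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        sites, feats = X[:, 0], X[:, 1:]
        out = np.empty_like(feats)
        for s in np.unique(sites):
            if float(s) not in self.site_stats_:
                raise ValueError(f"site {s} was not seen during fit")
            mu, sd = self.site_stats_[float(s)]
            rows = sites == s
            # Standardize within site, then map onto the pooled training scale.
            out[rows] = (feats[rows] - mu) / sd * self.grand_std_ + self.grand_mean_
        return out

# Synthetic two-site data (hypothetical) with an additive site offset.
rng = np.random.default_rng(0)
site = np.tile([0.0, 1.0], 50)
features = rng.normal(size=(100, 3)) + site[:, None] * 4.0
X = np.column_stack([site, features])
X_train, X_test = X[:80], X[80:]

harmonizer = SiteLocationScaleHarmonizer().fit(X_train)  # parameters from train only
train_h = harmonizer.transform(X_train)
test_h = harmonizer.transform(X_test)
```

Because the parameters are estimated in `fit` and merely applied in `transform`, wrapping such a step inside cross-validation re-estimates it on each training fold, so held-out subjects never influence it. An object with this interface could in principle serve as a pipeline step in scikit-learn, although a production version would typically subclass that library's base estimator classes.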
Affiliation(s)
- Chiara Marzi
- Department of Statistics, Computer Science and Applications "Giuseppe Parenti", University of Florence, 50134, Florence, Italy
- "Nello Carrara" Institute of Applied Physics (IFAC), National Research Council (CNR), 50019, Sesto Fiorentino, Florence, Italy
| | - Marco Giannelli
- Unit of Medical Physics, Pisa University Hospital "Azienda Ospedaliero-Universitaria Pisana", 56126, Pisa, Italy
| | - Andrea Barucci
- "Nello Carrara" Institute of Applied Physics (IFAC), National Research Council (CNR), 50019, Sesto Fiorentino, Florence, Italy
| | - Carlo Tessa
- Radiology Unit Apuane e Lunigiana, Azienda USL Toscana Nord Ovest, 54100, Massa, Italy
| | - Mario Mascalchi
- Department of Experimental and Clinical Biomedical Sciences "Mario Serio", University of Florence, 50139, Florence, Italy
- Division of Epidemiology and Clinical Governance, Institute for Study, Prevention and netwoRk in Oncology (ISPRO), 50139, Florence, Italy
| | - Stefano Diciotti
- Department of Electrical, Electronic, and Information Engineering "Guglielmo Marconi" - DEI, University of Bologna, 47522, Cesena, Italy.
- Alma Mater Research Institute for Human-Centered Artificial Intelligence, University of Bologna, 40121, Bologna, Italy.
|
9
|
Saponaro S, Lizzi F, Serra G, Mainas F, Oliva P, Giuliano A, Calderoni S, Retico A. Deep learning based joint fusion approach to exploit anatomical and functional brain information in autism spectrum disorders. Brain Inform 2024; 11:2. [PMID: 38194126 PMCID: PMC10776521 DOI: 10.1186/s40708-023-00217-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 12/20/2023] [Indexed: 01/10/2024] Open
Abstract
BACKGROUND The integration of the information encoded in multiparametric MRI images can enhance the performance of machine-learning classifiers. In this study, we investigate whether the combination of structural and functional MRI might improve the performances of a deep learning (DL) model trained to discriminate subjects with Autism Spectrum Disorders (ASD) with respect to typically developing controls (TD). MATERIAL AND METHODS We analyzed both structural and functional MRI brain scans publicly available within the ABIDE I and II data collections. We considered 1383 male subjects with age between 5 and 40 years, including 680 subjects with ASD and 703 TD from 35 different acquisition sites. We extracted morphometric and functional brain features from MRI scans with the Freesurfer and the CPAC analysis packages, respectively. Then, due to the multisite nature of the dataset, we implemented a data harmonization protocol. The ASD vs. TD classification was carried out with a multiple-input DL model, consisting in a neural network which generates a fixed-length feature representation of the data of each modality (FR-NN), and a Dense Neural Network for classification (C-NN). Specifically, we implemented a joint fusion approach to multiple source data integration. The main advantage of the latter is that the loss is propagated back to the FR-NN during the training, thus creating informative feature representations for each data modality. Then, a C-NN, with a number of layers and neurons per layer to be optimized during the model training, performs the ASD-TD discrimination. The performance was evaluated by computing the Area under the Receiver Operating Characteristic curve within a nested 10-fold cross-validation. The brain features that drive the DL classification were identified by the SHAP explainability framework. RESULTS The AUC values of 0.66±0.05 and of 0.76±0.04 were obtained in the ASD vs. TD discrimination when only structural or functional features are considered, respectively. The joint fusion approach led to an AUC of 0.78±0.04. The set of structural and functional connectivity features identified as the most important for the two-class discrimination supports the idea that brain changes tend to occur in individuals with ASD in regions belonging to the Default Mode Network and to the Social Brain. CONCLUSIONS Our results demonstrate that the multimodal joint fusion approach outperforms the classification results obtained with data acquired by a single MRI modality as it efficiently exploits the complementarity of structural and functional brain information.
Affiliation(s)
- Sara Saponaro
- Medical Physics School, University of Pisa, Pisa, Italy.
- National Institute for Nuclear Physics (INFN), Pisa Division, Pisa, Italy.
| | - Francesca Lizzi
- National Institute for Nuclear Physics (INFN), Pisa Division, Pisa, Italy
| | - Giacomo Serra
- Department of Physics, University of Cagliari, Cagliari, Italy
- INFN, Cagliari Division, Cagliari, Italy
| | - Francesca Mainas
- INFN, Cagliari Division, Cagliari, Italy
- Department of Computer Science, University of Pisa, Pisa, Italy
| | - Piernicola Oliva
- INFN, Cagliari Division, Cagliari, Italy
- Department of Chemical, Physical, Mathematical and Natural Sciences, University of Sassari, Sassari, Italy
| | - Alessia Giuliano
- Unit of Medical Physics, Pisa University Hospital "Azienda Ospedaliero-Universitaria Pisana", Pisa, Italy
| | - Sara Calderoni
- Developmental Psychiatry Unit - IRCCS Stella Maris Foundation, Pisa, Italy
- Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy
| | - Alessandra Retico
- National Institute for Nuclear Physics (INFN), Pisa Division, Pisa, Italy
|
10
|
Serra G, Mainas F, Golosio B, Retico A, Oliva P. Effect of data harmonization of multicentric dataset in ASD/TD classification. Brain Inform 2023; 10:32. [PMID: 38006422 PMCID: PMC10676338 DOI: 10.1186/s40708-023-00210-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 10/16/2023] [Indexed: 11/27/2023] Open
Abstract
Machine Learning (ML) is nowadays an essential tool in the analysis of Magnetic Resonance Imaging (MRI) data, in particular in the identification of brain correlates in neurological and neurodevelopmental disorders. ML requires datasets of appropriate size for training, which in neuroimaging are typically obtained by collecting data from multiple acquisition centers. However, analyzing large multicentric datasets can introduce bias due to differences between acquisition centers. ComBat harmonization is commonly used to address batch effects, but it can lead to data leakage when the entire dataset is used to estimate model parameters. In this study, structural and functional MRI data from the Autism Brain Imaging Data Exchange (ABIDE) collection were used to classify subjects with Autism Spectrum Disorders (ASD) compared to Typical Developing controls (TD). We compared the classical approach (external harmonization), in which harmonization is performed before the train/test split, with a harmonization calculated only on the train set (internal harmonization), and with the dataset with no harmonization. The results showed that harmonization using the whole dataset achieved higher discrimination performance, while non-harmonized data and harmonization using only the train set showed similar results, for both structural and connectivity features. We also showed that the higher performance of the external harmonization is not due to the larger size of the sample used to estimate the model, and hence this improved performance with the entire dataset may be ascribed to data leakage. In order to prevent this leakage, it is recommended to define the harmonization model solely using the train set.
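The difference between external and internal harmonization can be demonstrated with a small check: if changing only the test subjects changes the harmonized training data, test information has leaked. The sketch below uses plain per-site mean-centering as a stand-in for ComBat, and every name and value in it is a hypothetical chosen for illustration.

```python
import numpy as np

def site_center(X_fit, sites_fit, X_apply, sites_apply):
    """Subtract per-site feature means estimated on (X_fit, sites_fit) from X_apply."""
    out = np.array(X_apply, dtype=float)
    for s in np.unique(sites_apply):
        mu = X_fit[sites_fit == s].mean(axis=0)
        out[sites_apply == s] -= mu
    return out

# Synthetic data (hypothetical): 40 subjects alternating between two sites.
rng = np.random.default_rng(1)
sites = np.tile([0, 1], 20)
X = rng.normal(size=(40, 3)) + sites[:, None] * 2.0
train = np.arange(40) < 30
test = ~train

# External: site means estimated on ALL subjects, before the split.
ext_train = site_center(X, sites, X[train], sites[train])
# Internal: site means estimated on the train set only.
int_train = site_center(X[train], sites[train], X[train], sites[train])

# Perturb only the test subjects and harmonize the same training rows again.
X_shift = X.copy()
X_shift[test] += 10.0
ext_train_shift = site_center(X_shift, sites, X[train], sites[train])
int_train_shift = site_center(X_shift[train], sites[train], X[train], sites[train])

external_leaks = not np.allclose(ext_train, ext_train_shift)
internal_leaks = not np.allclose(int_train, int_train_shift)
```

Under external harmonization the training features move when the test set moves, so the classifier is trained on data that already encodes test-set statistics. Under internal harmonization the training features are unaffected.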
Affiliation(s)
- Giacomo Serra
  - Department of Physics, University of Cagliari, Cagliari, Italy
  - National Institute for Nuclear Physics (INFN), Cagliari Division, Cagliari, Italy
- Francesca Mainas
  - Department of Physics, University of Cagliari, Cagliari, Italy
  - National Institute for Nuclear Physics (INFN), Cagliari Division, Cagliari, Italy
- Bruno Golosio
  - Department of Physics, University of Cagliari, Cagliari, Italy
  - National Institute for Nuclear Physics (INFN), Cagliari Division, Cagliari, Italy
- Alessandra Retico
  - National Institute for Nuclear Physics (INFN), Pisa Division, Pisa, Italy
- Piernicola Oliva
  - National Institute for Nuclear Physics (INFN), Cagliari Division, Cagliari, Italy
  - Department of Chemical, Physical, Mathematical and Natural Sciences, University of Sassari, Sassari, Italy
|
11
|
Ali MT, Gebreil A, ElNakieb Y, Elnakib A, Shalaby A, Mahmoud A, Sleman A, Giridharan GA, Barnes G, Elbaz AS. A personalized classification of behavioral severity of autism spectrum disorder using a comprehensive machine learning framework. Sci Rep 2023; 13:17048. [PMID: 37813914 PMCID: PMC10562430 DOI: 10.1038/s41598-023-43478-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/09/2023] [Accepted: 09/25/2023] [Indexed: 10/11/2023] Open
Abstract
Autism Spectrum Disorder (ASD) is a heterogeneous neurodevelopmental disorder, influenced by genetics and exhibiting diverse clinical presentations. In this study, we dissect ASD into its behavioral components, mirroring the diagnostic process used in clinical settings. Morphological features are extracted from magnetic resonance imaging (MRI) scans in the publicly available ABIDE II dataset, and the most discriminative features that differentiate ASD within various behavioral domains are identified. Then, each subject is categorized as having severe, moderate, or mild ASD, or typical neurodevelopment (TD), based on the behavioral domains of the Social Responsiveness Scale (SRS). Multiple artificial intelligence (AI) models are used for feature selection and for classifying each ASD severity and behavioral group. A multivariate feature selection algorithm, investigating four different classifiers with linear and non-linear hypotheses, is applied iteratively while shuffling the training-validation subjects to find the set of cortical regions with a statistically significant association with ASD. A set of six classifiers is optimized and trained on the selected features using 5-fold cross-validation for severity classification within each behavioral group. Our AI-based model achieved an average accuracy of 96%, computed as the mean accuracy across the top-performing AI models for feature selection and severity classification across the different behavioral groups. The proposed AI model has the ability to accurately differentiate between the functionalities of specific brain regions, such as the left and right caudal middle frontal regions. We propose an AI-based model that dissects ASD into behavioral components. For each behavioral component, the AI-based model is capable of identifying the brain regions associated with ASD and of using those regions for diagnosis. The proposed system can increase the speed and accuracy of the diagnostic process and result in improved outcomes for individuals with ASD, highlighting the potential of AI in this area.
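The 5-fold cross-validation with shuffled training-validation subjects mentioned in this abstract can be sketched as follows. This is a generic illustration of shuffled k-fold index generation, not the authors' pipeline. The function name `k_fold_indices`, the seed, and the subject count are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once so that fold membership does not follow acquisition order.
    random.Random(seed).shuffle(indices)
    # Strided slicing gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


# Hypothetical usage with 23 subjects: every subject is validated exactly once.
splits = list(k_fold_indices(23, k=5, seed=42))
```

Any feature selection would have to be refit inside each training fold for the validation accuracy to remain an unbiased estimate. The abstract does not state how the authors handled this.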
Affiliation(s)
- Mohamed T Ali
  - Bioengineering Department, University of Louisville, Louisville, KY, 40292, USA
  - UT Southwestern Medical Center, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Ahmad Gebreil
  - Bioengineering Department, University of Louisville, Louisville, KY, 40292, USA
- Yaser ElNakieb
  - Bioengineering Department, University of Louisville, Louisville, KY, 40292, USA
  - UT Southwestern Medical Center, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Ahmed Elnakib
  - Electrical and Computer Engineering, Penn State Erie-The Behrend College, Erie, PA, 16563, USA
- Ahmed Shalaby
  - Lyda Hill Department of Bioinformatics, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Ali Mahmoud
  - Bioengineering Department, University of Louisville, Louisville, KY, 40292, USA
- Ahmed Sleman
  - Bioengineering Department, University of Louisville, Louisville, KY, 40292, USA
- Gregory Barnes
  - Department of Neurology and Pediatric Research Institute, University of Louisville, Louisville, KY, 40202, USA
- Ayman S Elbaz
  - Bioengineering Department, University of Louisville, Louisville, KY, 40292, USA
|
12
|
Dhinagar NJ, Santhalingam V, Lawrence KE, Laltoo E, Thompson PM. Few-Shot Classification of Autism Spectrum Disorder using Site-Agnostic Meta-Learning and Brain MRI. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-6. [PMID: 38082874 DOI: 10.1109/embc40787.2023.10340852] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/18/2023]
Abstract
For machine learning applications in medical imaging, the availability of training data is often limited, which hampers the design of radiological classifiers for subtle conditions such as autism spectrum disorder (ASD). Transfer learning is one method to counter this problem of low training data regimes. Here we explore the use of meta-learning for very low data regimes in the context of having prior data from multiple sites, an approach we term site-agnostic meta-learning. Inspired by the effectiveness of meta-learning for optimizing a model across multiple tasks, here we propose a framework to adapt it to learn across multiple sites. We tested our meta-learning model for classifying ASD versus typically developing controls in 2,201 T1-weighted (T1-w) MRI scans collected from 38 imaging sites as part of the Autism Brain Imaging Data Exchange (ABIDE) [age: 5.2-64.0 years]. The method was trained to find a good initialization state for our model that can quickly adapt to data from new unseen sites by fine-tuning on the limited data that is available. The proposed method achieved an area under the receiver operating characteristic curve (ROC-AUC) of 0.857 on 370 scans from 7 unseen sites in ABIDE using a few-shot setting of 2-way 20-shot, i.e., 20 training samples per site. Our results outperformed a transfer learning baseline by generalizing across a wider range of sites, as well as other related prior work. We also tested our model in a zero-shot setting on an independent test site without any additional fine-tuning. Our experiments show the promise of the proposed site-agnostic meta-learning framework for challenging neuroimaging tasks involving multi-site heterogeneity with limited availability of training data. Clinical Relevance: We propose a learning framework that accommodates multi-site heterogeneity and limited data to assist in challenging neuroimaging tasks.
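The evaluation setup described in this abstract, holding out entire sites and fine-tuning on a small number of samples from each, can be sketched as a data-splitting routine. This illustrates only the split and not the meta-learning method itself. The record layout (dictionaries with `id` and `site` keys) and the function names `site_holdout` and `few_shot_split` are hypothetical.

```python
import random


def site_holdout(records, unseen_sites):
    """Keep held-out sites entirely out of the meta-training pool."""
    seen = [r for r in records if r["site"] not in unseen_sites]
    unseen = [r for r in records if r["site"] in unseen_sites]
    return seen, unseen


def few_shot_split(unseen, k, seed=0):
    """Per unseen site, draw k records for fine-tuning; keep the rest for evaluation."""
    rng = random.Random(seed)
    by_site = {}
    for r in unseen:
        by_site.setdefault(r["site"], []).append(r)
    support, query = [], []
    for site in sorted(by_site):
        pool = by_site[site][:]
        rng.shuffle(pool)
        support.extend(pool[:k])
        query.extend(pool[k:])
    return support, query


# Toy data: three sites with five scans each; site "C" is treated as unseen.
records = [{"id": f"{s}{i}", "site": s} for s in "ABC" for i in range(5)]
seen, unseen = site_holdout(records, unseen_sites={"C"})
support, query = few_shot_split(unseen, k=2, seed=1)
```

In the setting reported above, `k` would be 20 samples per unseen site. The toy value of 2 is used here only to keep the example small.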
|
13
|
Rana A, Dumka A, Singh R, Panda MK, Priyadarshi N. A Computerized Analysis with Machine Learning Techniques for the Diagnosis of Parkinson's Disease: Past Studies and Future Perspectives. Diagnostics (Basel) 2022; 12:2708. [PMID: 36359550 PMCID: PMC9689408 DOI: 10.3390/diagnostics12112708] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/12/2022] [Revised: 10/30/2022] [Accepted: 11/02/2022] [Indexed: 08/03/2023] Open
Abstract
According to the World Health Organization (WHO), Parkinson's disease (PD) is a neurodegenerative disease of the brain that causes motor symptoms including slower movement, rigidity, tremor, and imbalance, in addition to other problems like Alzheimer's disease (AD), psychiatric problems, insomnia, anxiety, and sensory abnormalities. Techniques including artificial intelligence (AI), machine learning (ML), and deep learning (DL) have been established for classifying PD versus normal controls (NC) with similar clinical presentations, in order to address these problems and improve the diagnostic procedure for PD. In this article, we survey research articles published up to September 2022 to present an in-depth analysis of the datasets, modalities, experimental setups, and architectures that have been applied in the diagnosis of the disease. This analysis includes a total of 217 research publications, with a list of the various datasets, methodologies, and features. These findings suggest that ML/DL methods and novel biomarkers show promise for application in medical decision-making, leading to a more methodical and thorough detection of PD. Finally, we highlight the challenges and provide recommendations on selecting approaches that might be used for subgrouping and connection analysis with structural magnetic resonance imaging (sMRI), DaTSCAN, and single-photon emission computerized tomography (SPECT) data in future Parkinson's research.
Affiliation(s)
- Arti Rana
  - Computer Science & Engineering, Veer Madho Singh Bhandari Uttarakhand Technical University, Dehradun 248007, Uttarakhand, India
- Ankur Dumka
  - Department of Computer Science and Engineering, Women Institute of Technology, Dehradun 248007, Uttarakhand, India
  - Department of Computer Science & Engineering, Graphic Era Deemed to be University, Dehradun 248001, Uttarakhand, India
- Rajesh Singh
  - Division of Research and Innovation, Uttaranchal Institute of Technology, Uttaranchal University, Dehradun 248007, Uttarakhand, India
  - Department of Project Management, Universidad Internacional Iberoamericana, Campeche 24560, Mexico
- Manoj Kumar Panda
  - Department of Electrical Engineering, G.B. Pant Institute of Engineering and Technology, Pauri 246194, Uttarakhand, India
- Neeraj Priyadarshi
  - Department of Electrical Engineering, JIS College of Engineering, Kolkata 741235, West Bengal, India
|