1
Alsaedi S, Ogasawara M, Alarawi M, Gao X, Gojobori T. AI-powered precision medicine: utilizing genetic risk factor optimization to revolutionize healthcare. NAR Genom Bioinform 2025; 7:lqaf038. [PMID: 40330081 PMCID: PMC12051108 DOI: 10.1093/nargab/lqaf038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2024] [Revised: 02/11/2025] [Accepted: 04/17/2025] [Indexed: 05/08/2025]
Abstract
The convergence of artificial intelligence (AI) and biomedical data is transforming precision medicine by enabling the use of genetic risk factors (GRFs) for customized healthcare services based on individual needs. Although GRFs play an essential role in disease susceptibility, progression, and therapeutic outcomes, a gap exists in exploring their contribution to AI-powered precision medicine. This paper addresses this need by investigating the significance and potential of utilizing GRFs with AI in the medical field. We examine their applications, particularly emphasizing their impact on disease prediction, treatment personalization, and overall healthcare improvement. This review explores the application of AI algorithms to optimize the use of GRFs, aiming to advance precision medicine in disease screening, patient stratification, drug discovery, and understanding disease mechanisms. Through a variety of case studies and examples, we demonstrate the potential of incorporating GRFs facilitated by AI into medical practice, resulting in more precise diagnoses, targeted therapies, and improved patient outcomes. This review underscores the potential of GRFs, empowered by AI, to enhance precision medicine by improving diagnostic accuracy, treatment precision, and individualized healthcare solutions.
Affiliation(s)
- Sakhaa Alsaedi
- Computer Science, Division of Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
- Center of Excellence on Smart Health, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
- Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
- College of Computer Science and Engineering (CCSE), Taibah University, 42353 Madinah, Kingdom of Saudi Arabia
- Michihiro Ogasawara
- Department of Internal Medicine and Rheumatology, Juntendo University, 113-8431 Tokyo, Japan
- Mohammed Alarawi
- Center of Excellence on Smart Health, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
- Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
- Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
- Xin Gao
- Computer Science, Division of Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
- Center of Excellence on Smart Health, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
- Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
- Takashi Gojobori
- Center of Excellence on Smart Health, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
- Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
- Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
- Marine Open Innovation Institute (MaOI), 113-8431 Shizuoka, Japan

2
Cersosimo A, Zito E, Pierucci N, Matteucci A, La Fazia VM. A Talk with ChatGPT: The Role of Artificial Intelligence in Shaping the Future of Cardiology and Electrophysiology. J Pers Med 2025; 15:205. [PMID: 40423076 DOI: 10.3390/jpm15050205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2025] [Revised: 05/08/2025] [Accepted: 05/12/2025] [Indexed: 05/28/2025]
Abstract
Background: Artificial intelligence (AI) is poised to significantly impact the future of cardiology and electrophysiology, offering new tools to interpret complex datasets, improve diagnosis, optimize clinical workflows, and personalize therapy. ChatGPT-4o, a leading AI-based language model, exemplifies the transformative potential of AI in clinical research, medical education, and patient care. Aim and Methods: In this paper, we present an exploratory dialogue with ChatGPT to assess the role of AI in shaping the future of cardiology, with a particular focus on arrhythmia management and cardiac electrophysiology. Topics discussed include AI applications in ECG interpretation, arrhythmia detection, procedural guidance during ablation, and risk stratification for sudden cardiac death. We also examine the risks associated with AI use, including overreliance, interpretability challenges, data bias, and generalizability. Conclusions: The integration of AI into cardiovascular care offers the potential to enhance diagnostic accuracy, tailor interventions, and support decision-making. However, the adoption of AI must be carefully balanced with clinical expertise and ethical considerations. By fostering collaboration between clinicians and AI developers, it is possible to guide the development of reliable, transparent, and effective tools that will shape the future of personalized cardiology and electrophysiology.
Affiliation(s)
- Angelica Cersosimo
- ASST Spedali Civili di Brescia, Division of Cardiology and Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, 25121 Brescia, Italy
- Elio Zito
- Texas Cardiac Arrhythmia Institute, St David's Medical Center, Austin, TX 78705, USA
- Nicola Pierucci
- Department of Cardiovascular, Respiratory, Nephrological, Anesthesiological and Geriatric Sciences, "Sapienza" University of Rome, 00185 Rome, Italy
- Andrea Matteucci
- Department of Experimental Medicine, Tor Vergata University, 00133 Rome, Italy
- Vincenzo Mirco La Fazia
- Texas Cardiac Arrhythmia Institute, St David's Medical Center, Austin, TX 78705, USA
- Department of Experimental Medicine, Tor Vergata University, 00133 Rome, Italy

3
Yeganegi M, Danaei M, Azizi S, Jayervand F, Bahrami R, Dastgheib SA, Rashnavadi H, Masoudi A, Shiri A, Aghili K, Noorishadkam M, Neamatzadeh H. Research advancements in the Use of artificial intelligence for prenatal diagnosis of neural tube defects. Front Pediatr 2025; 13:1514447. [PMID: 40313675 PMCID: PMC12043698 DOI: 10.3389/fped.2025.1514447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Accepted: 02/03/2025] [Indexed: 05/03/2025]
Abstract
Artificial intelligence (AI) is revolutionizing prenatal diagnostics by enhancing the accuracy and efficiency of procedures. This review explores AI and machine learning (ML) in the early detection, prediction, and assessment of neural tube defects (NTDs) through prenatal ultrasound imaging. Recent studies highlight the effectiveness of AI techniques, such as convolutional neural networks (CNNs) and support vector machines (SVMs), which achieve detection accuracy rates of up to 95% across various datasets, including fetal ultrasound images, genetic data, and maternal health records. SVM models have demonstrated 71.50% accuracy on training datasets and 68.57% on testing datasets for NTD classification, while advanced deep learning (DL) methods report a patient-level prediction accuracy of 94.5% and an area under the receiver operating characteristic curve (AUROC) of 99.3%. AI integration with genomic analysis has identified key biomarkers associated with NTDs, such as Growth Associated Protein 43 (GAP43) and Glial Fibrillary Acidic Protein (GFAP), with logistic regression models achieving 86.67% accuracy. Current AI-assisted ultrasound technologies have improved diagnostic accuracy, yielding sensitivity and specificity of 88.9% and 98.0%, respectively, compared with 81.5% sensitivity and 92.2% specificity for traditional methods. AI systems have also streamlined workflows, reducing median scan times from 19.7 min to 11.4 min and allowing sonographers to prioritize critical patient care. Advances in DL algorithms, including Oct-U-Net and PAICS, have achieved recall and precision of 0.93 and 0.96, respectively, in identifying fetal abnormalities. Moreover, AI's evolving role in genetic research supports personalized NTD prevention strategies and enhances public awareness through AI-generated health messages.
In conclusion, the integration of AI in prenatal diagnostics significantly improves the detection and assessment of NTDs, leading to greater accuracy and efficiency in ultrasound imaging. As AI continues to advance, it has the potential to further enhance personalized healthcare strategies and raise public awareness about NTDs, ultimately contributing to better maternal and fetal outcomes.
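The sensitivity, specificity, and scan-time figures above follow from standard definitions. The sketch below, assuming a simple record of screening outcome counts, computes these rates and the relative reduction in median scan time implied by the reported 19.7 min and 11.4 min. The `ConfusionCounts` name and the counts in the usage lines are hypothetical and are not data from the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical screening outcome counts for one diagnostic method."""
    tp: int  # affected pregnancies correctly flagged as NTD
    fn: int  # affected pregnancies missed
    tn: int  # unaffected pregnancies correctly cleared
    fp: int  # unaffected pregnancies wrongly flagged


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true NTD cases that are detected (also called recall)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of unaffected cases that are correctly ruled out."""
    return c.tn / (c.tn + c.fp)


def precision(c: ConfusionCounts) -> float:
    """Share of flagged cases that are true NTD cases."""
    return c.tp / (c.tp + c.fp)


def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    counts = ConfusionCounts(tp=45, fn=5, tn=190, fp=10)  # hypothetical counts
    print(f"sensitivity={sensitivity(counts):.3f}")
    print(f"specificity={specificity(counts):.3f}")
    print(f"precision={precision(counts):.3f}")
    # Median scan times reported in the abstract: 19.7 min -> 11.4 min
    print(f"scan-time reduction={relative_reduction(19.7, 11.4):.1%}")
```

With the reported medians, `relative_reduction(19.7, 11.4)` is about 0.421, so scan time falls by roughly 42%.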
Affiliation(s)
- Maryam Yeganegi
- Department of Obstetrics and Gynecology, School of Medicine, Iranshahr University of Medical Sciences, Iranshahr, Iran
- Mahsa Danaei
- Department of Obstetrics and Gynecology, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Sepideh Azizi
- Shahid Akbarabadi Clinical Research Development Unit, Iran University of Medical Sciences, Tehran, Iran
- Fatemeh Jayervand
- Department of Obstetrics and Gynecology, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Reza Bahrami
- Neonatal Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Seyed Alireza Dastgheib
- Department of Medical Genetics, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Heewa Rashnavadi
- School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Ali Masoudi
- School of Medicine, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Amirmasoud Shiri
- School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Kazem Aghili
- Department of Radiology, School of Medicine, Shahid Rahnamoun Hospital, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Mahood Noorishadkam
- Mother and Newborn Health Research Center, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Hossein Neamatzadeh
- Mother and Newborn Health Research Center, Shahid Sadoughi University of Medical Sciences, Yazd, Iran

4
Li S, Arora S, Attaoua R, Hamet P, Tremblay J, Bihlo A, Liu B, Rutter G. Leveraging hierarchical structures for genetic block interaction studies using the hierarchical transformer. medRxiv 2025:2024.11.18.24317486. [PMID: 39606365 PMCID: PMC11601704 DOI: 10.1101/2024.11.18.24317486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
First introduced by William Bateson in 1909, classic epistasis (genetic variant interaction) refers to the phenomenon in which a variant at one locus prevents a variant at a different locus from manifesting its effects. The potential effects of genetic variant interactions on complex diseases have been recognized for decades. Moreover, studies have demonstrated that leveraging the combined SNP effects within a genetic block can significantly increase statistical power and reduce background noise, ultimately enabling the discovery of novel epistasis that single-SNP statistical epistasis studies might overlook. However, it remains an open question how best to combine gene structure representation modeling and interaction learning in an end-to-end model for gene interaction searching. In the current study, we developed a neural genetic block interaction searching model that can effectively process large SNP chip inputs and output a heatmap of potential genetic block interactions. Our model augments a previously published hierarchical transformer architecture (Liu and Lapata, 2019) with the ability to model genetic blocks. Cross-block relationship mapping is achieved via a hierarchical attention mechanism that allows information about specific phenotypes to be shared, as opposed to simple unsupervised dimensionality reduction methods such as PCA. Results on both simulated data and UK Biobank studies show that our model brings substantial improvements over traditional exhaustive searching and neural network methods.
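The two-level idea described above (SNPs summarized within genetic blocks, then blocks attending to one another) can be illustrated with a deliberately simplified, untrained sketch. It is not the authors' model or the Liu and Lapata (2019) architecture: the random SNP embeddings, the single pooling query, and the scaled dot-product block attention are assumptions chosen only to show how a block-by-block attention matrix, readable as an interaction heatmap, arises from the hierarchy. In a trained model the embeddings and query would be learned against the phenotype.

```python
import math
import random
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def embed_snps(genotypes: List[int], dim: int, rng: random.Random) -> List[Vector]:
    """Hypothetical SNP embedding: a random vector per SNP scaled by (dosage + 1)."""
    return [[(g + 1) * rng.gauss(0.0, 1.0) for _ in range(dim)] for g in genotypes]


def pool_block(snp_vectors: List[Vector], query: Vector) -> Vector:
    """Level 1: attention-weighted average of the SNP vectors inside one block."""
    weights = softmax([dot(query, v) for v in snp_vectors])
    return [sum(w * v[k] for w, v in zip(weights, snp_vectors)) for k in range(len(query))]


def block_attention(block_vectors: List[Vector]) -> List[List[float]]:
    """Level 2: scaled dot-product attention between blocks; each row sums to 1."""
    scale = math.sqrt(len(block_vectors[0]))
    return [softmax([dot(bi, bj) / scale for bj in block_vectors]) for bi in block_vectors]


def block_heatmap(genotypes: List[int], blocks: List[List[int]],
                  dim: int = 8, seed: int = 0) -> List[List[float]]:
    """Return an n_blocks x n_blocks attention matrix for one individual."""
    rng = random.Random(seed)
    snp_vectors = embed_snps(genotypes, dim, rng)
    query = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # would be learned in practice
    block_vectors = [pool_block([snp_vectors[i] for i in block], query) for block in blocks]
    return block_attention(block_vectors)


if __name__ == "__main__":
    genotypes = [0, 1, 2, 1, 0, 2, 1, 1, 0]    # hypothetical dosages for 9 SNPs
    blocks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]  # hypothetical contiguous blocks
    for row in block_heatmap(genotypes, blocks):
        print(" ".join(f"{x:.2f}" for x in row))
```

Because each row is normalized separately, the matrix need not be symmetric; averaging it with its transpose is one simple way to obtain a symmetric heatmap.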
Affiliation(s)
- Shiying Li
- Centre de Recherche du CHUM, and Faculty of Medicine, University of Montreal, QC, Canada
- Shivam Arora
- Department of Mathematics and Statistics, Memorial University of Newfoundland, NL, Canada
- Redha Attaoua
- Centre de Recherche du CHUM, and Faculty of Medicine, University of Montreal, QC, Canada
- Pavel Hamet
- Centre de Recherche du CHUM, and Faculty of Medicine, University of Montreal, QC, Canada
- Johanne Tremblay
- Centre de Recherche du CHUM, and Faculty of Medicine, University of Montreal, QC, Canada
- Alexander Bihlo
- Department of Mathematics and Statistics, Memorial University of Newfoundland, NL, Canada
- Bang Liu
- Département d’informatique et de recherche opérationnelle, Université de Montréal, QC, Canada
- Guy Rutter
- Centre de Recherche du CHUM, and Faculty of Medicine, University of Montreal, QC, Canada
- Section of Cell Biology and Functional Genomics, Department of Metabolism, Diabetes and Reproduction, Imperial College London, Du Cane Road, London W12 0NN, United Kingdom
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore

5
Strudwick J, Gardiner LJ, Denning-James K, Haiminen N, Evans A, Kelly J, Madgwick M, Utro F, Seabolt E, Gibson C, Bedi B, Clayton D, Howell C, Parida L, Carrieri AP. AutoXAI4Omics: an automated explainable AI tool for omics and tabular data. Brief Bioinform 2024; 26:bbae593. [PMID: 39576223 PMCID: PMC11583442 DOI: 10.1093/bib/bbae593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 09/17/2024] [Accepted: 11/01/2024] [Indexed: 11/24/2024]
Abstract
Machine learning (ML) methods offer opportunities for gaining insights into the intricate workings of complex biological systems, and they are increasingly prominent in the analysis of omics data for tasks such as the identification of novel biomarkers and the predictive modeling of phenotypes. For scientists and domain experts, user-friendly ML pipelines can be highly valuable, enabling them to run sophisticated, robust, and interpretable models without in-depth expertise in coding or algorithmic optimization. By streamlining model development and training, researchers can devote their time and energy to the critical tasks of biological interpretation and validation, thereby maximizing the scientific impact of ML-driven insights. Here, we present a fully automated, open-source explainable AI tool, AutoXAI4Omics, that performs classification and regression tasks on omics and tabular numerical data. AutoXAI4Omics accelerates scientific discovery by automating processes and decisions otherwise made by AI experts, e.g. selection of the best feature set, hyperparameter tuning of different ML algorithms, and selection of the best ML model for a specific task and dataset. Prior to ML analysis, AutoXAI4Omics applies feature filtering options tailored to specific omic data types. Moreover, the explainability analysis provided by the tool highlights associations between omic feature values and the targets under investigation (e.g. predicted phenotypes), facilitating the identification of novel actionable insights. AutoXAI4Omics is available at: https://github.com/IBM/AutoXAI4Omics.
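The automation described above (feature filtering, hyperparameter tuning, and selection of the best model for a dataset) follows a common pattern. The pure-Python sketch below shows that pattern on a toy table; it is not AutoXAI4Omics or its API. The variance filter, the candidate models (k-NN with k in {1, 3, 5} and a nearest-centroid classifier), interleaved 3-fold cross-validation, and all function names are assumptions made for illustration, and the explainability step is not shown.

```python
import statistics
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Row = List[float]
Predictor = Callable[[List[Row], List[int], Row], int]


def variance_filter(X: List[Row], threshold: float) -> List[int]:
    """Indices of features whose population variance exceeds `threshold`."""
    return [j for j, col in enumerate(zip(*X)) if statistics.pvariance(col) > threshold]


def project(X: List[Row], keep: Sequence[int]) -> List[Row]:
    return [[row[j] for j in keep] for row in X]


def sq_dist(a: Row, b: Row) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_X: List[Row], train_y: List[int], x: Row, k: int) -> int:
    """Majority label among the k nearest training rows."""
    nearest = sorted(range(len(train_X)), key=lambda i: sq_dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def centroid_predict(train_X: List[Row], train_y: List[int], x: Row) -> int:
    """Label of the closest class mean."""
    centroids = {}
    for label in set(train_y):
        members = [row for row, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*members)]
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


def cv_accuracy(predict: Predictor, X: List[Row], y: List[int], folds: int = 3) -> float:
    """Accuracy under simple interleaved k-fold cross-validation."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        correct += sum(predict(tX, ty, X[i]) == y[i] for i in test)
    return correct / len(X)


def auto_select(X: List[Row], y: List[int], var_threshold: float = 0.01,
                folds: int = 3) -> Tuple[List[int], str, Dict[str, float]]:
    """Filter features, tune k for k-NN, compare with nearest centroid, return the best."""
    keep = variance_filter(X, var_threshold)
    Xk = project(X, keep)
    candidates: Dict[str, Predictor] = {
        f"knn_k{k}": (lambda tX, ty, x, k=k: knn_predict(tX, ty, x, k)) for k in (1, 3, 5)
    }
    candidates["nearest_centroid"] = centroid_predict
    scores = {name: cv_accuracy(fn, Xk, y, folds) for name, fn in candidates.items()}
    best = max(scores, key=scores.get)
    return keep, best, scores


if __name__ == "__main__":
    # Hypothetical table: column 1 is constant and should be filtered out
    X = [[0.1, 5.0, 1.0], [0.2, 5.0, 1.1], [0.15, 5.0, 0.9], [0.12, 5.0, 1.05],
         [0.18, 5.0, 0.95], [0.11, 5.0, 1.02], [0.9, 5.0, 3.0], [1.0, 5.0, 3.1],
         [0.95, 5.0, 2.9], [0.92, 5.0, 3.05], [0.98, 5.0, 2.95], [0.91, 5.0, 3.02]]
    y = [0] * 6 + [1] * 6
    print(auto_select(X, y))
```

On well-separated data every candidate may tie; `max` then keeps the first candidate listed, so a real pipeline would want an explicit tie-break rule, for example preferring the simplest model.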
Affiliation(s)
- James Strudwick
- IBM Research Europe, The Hartree Centre - Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom
- Laura-Jayne Gardiner
- IBM Research Europe, The Hartree Centre - Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom
- Niina Haiminen
- IBM T.J. Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY 10598, United States
- Ashley Evans
- IBM Research Europe, The Hartree Centre - Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom
- Jennifer Kelly
- IBM Research Europe, The Hartree Centre - Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom
- Matthew Madgwick
- IBM Research Europe, The Hartree Centre - Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom
- Filippo Utro
- IBM T.J. Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY 10598, United States
- Ed Seabolt
- IBM Research, Almaden, 650 Harry Rd, San Jose, CA 95120, United States
- Christopher Gibson
- IBM Research Europe, The Hartree Centre - Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom
- Bharat Bedi
- IBM Research Europe, The Hartree Centre - Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom
- Daniel Clayton
- STFC, The Hartree Centre, Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom
- Ciaron Howell
- STFC, The Hartree Centre, Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom
- Laxmi Parida
- IBM T.J. Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY 10598, United States
- Anna Paola Carrieri
- IBM Research Europe, The Hartree Centre - Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom

6
Sun L, Bian J, Xin Y, Jiang L, Zheng L. Epi-SSA: A novel epistasis detection method based on a multi-objective sparrow search algorithm. PLoS One 2024; 19:e0311223. [PMID: 39446852 PMCID: PMC11500897 DOI: 10.1371/journal.pone.0311223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Accepted: 09/16/2024] [Indexed: 10/26/2024]
Abstract
Genome-wide association studies typically consider epistatic interactions a crucial factor in exploring complex diseases. However, current methods focus primarily on detecting second-order epistatic interactions, and their accuracy is limited. In this work, we introduce a novel method called Epi-SSA, which is better suited to detecting high-order epistatic interactions. Epi-SSA draws inspiration from the sparrow search algorithm and optimizes the population against multiple objective functions in each iteration in order to identify epistatic interactions more precisely. To evaluate its performance, we conducted a comprehensive comparison between Epi-SSA and seven other methods using five simulation datasets: DME 100, DNME 100, DME 1000, DNME 1000, and DNME3 100. The DME 100 dataset encompasses eight second-order epistasis disease models with marginal effects, each comprising 100 simulated data instances with 100 SNPs per instance and 800 case and 800 control samples. The DNME 100 dataset encompasses eight second-order epistasis disease models without marginal effects and otherwise retains the properties of DME 100. Experiments on the DME 100 and DNME 100 datasets were designed to evaluate the algorithms' capacity to detect epistasis across varying disease models. The DME 1000 and DNME 1000 datasets increase the complexity to 1000 SNPs per simulated data instance while retaining the other properties of DME 100 and DNME 100, respectively. These experiments aimed to gauge the algorithms' adaptability in detecting epistasis as the number of SNPs in the data increases. The DNME3 100 dataset introduces a higher level of complexity with six third-order epistasis disease models and otherwise parallels the structure of DNME 100; it serves to test the algorithms' proficiency in identifying higher-order epistasis.
The highest average F-measures achieved by the seven other existing methods on the five datasets are 0.86, 0.86, 0.41, 0.56, and 0.79, respectively, while the average F-measures of Epi-SSA on the five datasets are 0.92, 0.97, 0.79, 0.86, and 0.97, respectively. The experimental results demonstrate that the Epi-SSA algorithm outperforms other methods in a variety of epistasis detection tasks. As the number of SNPs in the dataset increases and the order of epistasis rises, the advantages of the Epi-SSA algorithm become increasingly pronounced. In addition, we applied Epi-SSA to the analysis of the WTCCC dataset, uncovering numerous genes and gene pairs that might play a significant role in the pathogenesis of seven complex diseases. Notably, some of these genes have been reported as related to these diseases in the Comparative Toxicogenomics Database (CTD). Epi-SSA is a potent tool for detecting epistatic interactions and can help further our understanding of the pathogenesis of common and complex diseases. The source code of Epi-SSA can be obtained at https://osf.io/6sqwj/.
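Two ideas in this abstract can be made concrete: keeping candidate SNP combinations that are optimal under several objectives at once, and scoring detections with an F-measure. The sketch below assumes two hypothetical objective scores per candidate (lower is better). It does not reproduce the sparrow search update rules or Epi-SSA's actual objective functions, which the abstract does not specify, and the F-measure used in the paper's simulations may be defined differently.

```python
from typing import Dict, Set, Tuple

Combo = Tuple[int, ...]         # indices of the SNPs in a candidate interaction
Objectives = Tuple[float, ...]  # hypothetical scores to minimise


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scored: Dict[Combo, Objectives]) -> Set[Combo]:
    """Candidate SNP combinations that no other candidate dominates."""
    return {
        combo
        for combo, obj in scored.items()
        if not any(dominates(other, obj)
                   for other_combo, other in scored.items() if other_combo != combo)
    }


def f_measure(detected: Set[Combo], truth: Set[Combo]) -> float:
    """Harmonic mean of precision and recall over detected interactions."""
    tp = len(detected & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(detected)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    scored = {
        (3, 17): (0.20, 0.35),
        (5, 42): (0.25, 0.30),
        (8, 9): (0.40, 0.50),      # dominated by (3, 17) on both objectives
        (3, 5, 17): (0.22, 0.33),  # a third-order candidate
    }
    front = pareto_front(scored)
    print(sorted(front))
    print(f_measure(front, truth={(3, 17)}))
```

Here the front keeps three candidates, one of which is the true interaction, so precision is 1/3, recall is 1, and the F-measure is 0.5.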
Affiliation(s)
- Liyan Sun
- College of Computer Science and Technology, Changchun University, Changchun City, Jilin Province, China
- Jingwen Bian
- School of Cultural and Media Studies, Changchun University of Science and Technology, Changchun City, Jilin Province, China
- Yi Xin
- College of Computer Science and Technology, Changchun University, Changchun City, Jilin Province, China
- Linqing Jiang
- College of Computer Science and Technology, Changchun University, Changchun City, Jilin Province, China
- Linxuan Zheng
- College of Computer Science and Technology, Changchun University, Changchun City, Jilin Province, China

7
He Y, Mulqueeney JM, Watt EC, Salili-James A, Barber NS, Camaiti M, Hunt ESE, Kippax-Chui O, Knapp A, Lanzetti A, Rangel-de Lázaro G, McMinn JK, Minus J, Mohan AV, Roberts LE, Adhami D, Grisan E, Gu Q, Herridge V, Poon STS, West T, Goswami A. Opportunities and Challenges in Applying AI to Evolutionary Morphology. Integr Org Biol 2024; 6:obae036. [PMID: 40433986 PMCID: PMC12082097 DOI: 10.1093/iob/obae036] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/22/2024] [Revised: 08/07/2024] [Accepted: 09/20/2024] [Indexed: 05/29/2025]
Abstract
Artificial intelligence (AI) is poised to revolutionize many aspects of science, including the study of evolutionary morphology. While classical AI methods such as principal component analysis and cluster analysis have been commonplace in the study of evolutionary morphology for decades, recent years have seen increasing application of deep learning to ecology and evolutionary biology. As digitized specimen databases become increasingly prevalent and openly available, AI is offering vast new potential to circumvent long-standing barriers to rapid, big data analysis of phenotypes. Here, we review the current state of AI methods available for the study of evolutionary morphology, which are most developed in the area of data acquisition and processing. We introduce the main available AI techniques, categorizing them into 3 stages based on their order of appearance: (1) machine learning, (2) deep learning, and (3) the most recent advancements in large-scale models and multimodal learning. Next, we present case studies of existing approaches using AI for evolutionary morphology, including image capture and segmentation, feature recognition, morphometrics, and phylogenetics. We then discuss the prospects for near-term advances in specific areas of inquiry within this field, including the potential of new AI methods that have not yet been applied to the study of morphological evolution. In particular, we note key areas where AI remains underutilized and could be used to enhance studies of evolutionary morphology. This combination of current methods and potential developments has the capacity to transform the evolutionary analysis of the organismal phenotype into evolutionary phenomics, leading to an era of "big data" that aligns the study of phenotypes with genomics and other areas of bioinformatics.
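Principal component analysis, named above as one of the classical methods in the field, finds the direction of greatest variation among specimens. The dependency-free sketch below, assuming a small hypothetical table of linear measurements, computes the first principal component by power iteration on the sample covariance matrix. In practice a linear algebra library's eigendecomposition or SVD would be the usual choice; power iteration is used here only to keep the example self-contained.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def center(rows: Matrix) -> Matrix:
    """Subtract each column's mean."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix (n - 1 denominator) of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]


def first_component(cov: Matrix, iterations: int = 200) -> Tuple[List[float], float]:
    """Leading eigenvector and eigenvalue by power iteration.

    Assumes the all-ones start vector is not orthogonal to the leading eigenvector.
    """
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(c * x for c, x in zip(row, v)) for row in cov]
    eigenvalue = sum(a * b for a, b in zip(v, cov_v))  # Rayleigh quotient; v has unit length
    return v, eigenvalue


if __name__ == "__main__":
    # Hypothetical measurements (mm) for five specimens:
    # two lengths that scale together and one roughly constant width
    specimens = [
        [10.0, 20.1, 5.0],
        [12.0, 24.2, 5.1],
        [14.0, 27.9, 4.9],
        [16.0, 32.1, 5.0],
        [18.0, 35.8, 5.05],
    ]
    cov = covariance(center(specimens))
    pc1, variance = first_component(cov)
    total = sum(cov[i][i] for i in range(len(cov)))
    print("PC1 loadings:", [round(x, 3) for x in pc1])
    print("share of variance on PC1:", round(variance / total, 4))
```

For these made-up measurements, PC1 loads almost entirely on the two co-varying lengths and captures nearly all of the variance, which is the kind of size axis PCA commonly recovers in morphometric data.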
Affiliation(s)
- Y He
- Life Sciences, Natural History Museum, London, UK
- J M Mulqueeney
- Life Sciences, Natural History Museum, London, UK
- Department of Ocean & Earth Science, National Oceanography Centre Southampton, University of Southampton, Southampton, UK
- E C Watt
- Life Sciences, Natural History Museum, London, UK
- Division of Biosciences, University College London, London, UK
- A Salili-James
- AI and Innovation, Natural History Museum, London, UK
- Digital, Data and Informatics, Natural History Museum, London, UK
- N S Barber
- Life Sciences, Natural History Museum, London, UK
- Department of Anthropology, University College London, London, UK
- M Camaiti
- Life Sciences, Natural History Museum, London, UK
- E S E Hunt
- Life Sciences, Natural History Museum, London, UK
- Department of Life Sciences, Imperial College London, London, UK
- Grantham Institute, Imperial College London, London, UK
- O Kippax-Chui
- Life Sciences, Natural History Museum, London, UK
- Grantham Institute, Imperial College London, London, UK
- Department of Earth Science and Engineering, Imperial College London, London, UK
- A Knapp
- Life Sciences, Natural History Museum, London, UK
- Centre for Integrative Anatomy, University College London, London, UK
- A Lanzetti
- Life Sciences, Natural History Museum, London, UK
- School of Geography, Earth and Environmental Sciences, University of Birmingham, Birmingham, UK
- G Rangel-de Lázaro
- Life Sciences, Natural History Museum, London, UK
- School of Oriental and African Studies, London, UK
- J K McMinn
- Life Sciences, Natural History Museum, London, UK
- Department of Earth Sciences, University of Oxford, Oxford, UK
- J Minus
- Life Sciences, Natural History Museum, London, UK
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
- A V Mohan
- Life Sciences, Natural History Museum, London, UK
- Biodiversity Genomics Laboratory, Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
- L E Roberts
- Life Sciences, Natural History Museum, London, UK
- D Adhami
- Life Sciences, Natural History Museum, London, UK
- Department of Life Sciences, Imperial College London, London, UK
- Imaging and Analysis Centre, Natural History Museum, London, UK
- E Grisan
- School of Engineering, London South Bank University, London, UK
- Q Gu
- AI and Innovation, Natural History Museum, London, UK
- Digital, Data and Informatics, Natural History Museum, London, UK
- V Herridge
- Life Sciences, Natural History Museum, London, UK
- School of Biosciences, University of Sheffield, Sheffield, UK
- S T S Poon
- AI and Innovation, Natural History Museum, London, UK
- Digital, Data and Informatics, Natural History Museum, London, UK
- T West
- Centre for Integrative Anatomy, University College London, London, UK
- Imaging and Analysis Centre, Natural History Museum, London, UK
- A Goswami
- Life Sciences, Natural History Museum, London, UK

8
Martins D, Abbasi M, Egas C, Arrais JP. Enhancing schizophrenia phenotype prediction from genotype data through knowledge-driven deep neural network models. Genomics 2024; 116:110910. [PMID: 39111546 DOI: 10.1016/j.ygeno.2024.110910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 07/08/2024] [Accepted: 07/31/2024] [Indexed: 08/20/2024]
Abstract
This article explores deep learning model design, drawing inspiration from the omnigenic model and genetic heterogeneity concepts, to improve schizophrenia prediction using genotype data. It introduces an innovative three-step approach leveraging neural networks' capabilities to efficiently handle genetic interactions. A locally connected network initially routes input data from variants to their corresponding genes. The second step employs an Encoder-Decoder to capture relationships among identified genes. The final model integrates knowledge from the first two and incorporates a parallel component to consider the effects of additional genes. This expansion enhances prediction scores by considering a larger number of genes. Trained models achieved an average AUC of 0.83, surpassing other genotype-trained models and matching gene expression dataset-based approaches. Additionally, tests on held-out sets reported an average sensitivity of 0.72 and an accuracy of 0.76, aligning with schizophrenia heritability predictions. Moreover, the study addresses genetic heterogeneity challenges by considering diverse population subsets.
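The first of the three steps, routing variants to their genes with a locally connected network, amounts to a sparsely connected layer in which each gene unit only receives its own variants. A minimal sketch, assuming each variant is annotated to exactly one gene, tanh gene activations, and a logistic output head (all illustrative choices, with hypothetical variant and gene names); the encoder-decoder and parallel components described above are not shown, and this is not the authors' implementation.

```python
import math
from typing import Dict


def locally_connected(
    dosages: Dict[str, int],
    variant_to_gene: Dict[str, str],
    weights: Dict[str, float],
    biases: Dict[str, float],
) -> Dict[str, float]:
    """Gene-level activations where each gene unit only receives its own variants.

    Equivalent to a dense layer whose variant-to-gene weight matrix is masked by
    the annotation, so the parameter count grows with the number of annotated
    variants rather than with variants x genes.
    """
    totals = dict(biases)  # start every gene unit from its bias
    for variant, gene in variant_to_gene.items():
        totals[gene] += weights[variant] * dosages[variant]
    return {gene: math.tanh(z) for gene, z in totals.items()}


def predict_risk(gene_activations: Dict[str, float],
                 out_weights: Dict[str, float], out_bias: float) -> float:
    """Hypothetical output head: logistic combination of gene activations."""
    z = out_bias + sum(out_weights[g] * a for g, a in gene_activations.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    dosages = {"var1": 2, "var2": 0, "var3": 1}  # hypothetical minor-allele counts
    variant_to_gene = {"var1": "GENE_A", "var2": "GENE_A", "var3": "GENE_B"}
    weights = {"var1": 0.5, "var2": -1.0, "var3": 0.25}
    biases = {"GENE_A": 0.0, "GENE_B": 0.1}
    genes = locally_connected(dosages, variant_to_gene, weights, biases)
    print(genes)
    print(predict_risk(genes, {"GENE_A": 1.2, "GENE_B": -0.4}, out_bias=-0.5))
```

Real annotations often map one variant to several genes (or to none); handling that would mean letting `variant_to_gene` hold a list of genes per variant, each with its own weight.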
Affiliation(s)
- Daniel Martins
- Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal; Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- Maryam Abbasi
- Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal; Polytechnic Institute of Coimbra, Applied Research Institute, Coimbra, Portugal; Research Centre for Natural Resources Environment and Society (CERNAS), Polytechnic Institute of Coimbra, Coimbra, Portugal.
- Conceição Egas
- Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal; Biocant - Transfer Technology Association, Cantanhede, Portugal; CNC - CNC Center for Neuroscience and Cell Biology, Coimbra, Portugal
- Joel P Arrais
- Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal

9
van Hilten A, Katz S, Saccenti E, Niessen WJ, Roshchupkin GV. Designing interpretable deep learning applications for functional genomics: a quantitative analysis. Brief Bioinform 2024; 25:bbae449. [PMID: 39293804 PMCID: PMC11410376 DOI: 10.1093/bib/bbae449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 08/07/2024] [Accepted: 08/28/2024] [Indexed: 09/20/2024]
Abstract
Deep learning applications have had a profound impact on many scientific fields, including functional genomics. Deep learning models can learn complex interactions between and within omics data; however, interpreting and explaining these models can be challenging. Interpretability is essential not only to help progress our understanding of the biological mechanisms underlying traits and diseases but also for establishing trust in these models' efficacy for healthcare applications. Recognizing this importance, recent years have seen the development of numerous diverse interpretability strategies, making it increasingly difficult to navigate the field. In this review, we present a quantitative analysis of the challenges arising when designing interpretable deep learning solutions in functional genomics. We explore design choices related to the characteristics of genomics data, the neural network architectures applied, and strategies for interpretation. By quantifying the current state of the field with a predefined set of criteria, we find the most frequent solutions, highlight exceptional examples, and identify unexplored opportunities for developing interpretable deep learning models in genomics.
Collapse
Affiliation(s)
- Arno van Hilten
- Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Sonja Katz
- Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Edoardo Saccenti
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Wiro J Niessen
- Department of Imaging Physics, Delft University of Technology, 2628 CD Delft, The Netherlands
- Gennady V Roshchupkin
- Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Department of Epidemiology, Erasmus MC, 3015 GD Rotterdam, The Netherlands

10
Graça M, Nobre R, Sousa L, Ilic A. Distributed transformer for high order epistasis detection in large-scale datasets. Sci Rep 2024; 14:14579. [PMID: 38918413 PMCID: PMC11199512 DOI: 10.1038/s41598-024-65317-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2024] [Accepted: 06/19/2024] [Indexed: 06/27/2024]
Abstract
Understanding the genetic basis of complex diseases is one of the most important challenges in current precision medicine. To this end, Genome-Wide Association Studies aim to correlate Single Nucleotide Polymorphisms (SNPs) to the presence or absence of certain traits. However, these studies do not consider interactions between several SNPs, known as epistasis, which explain most genetic diseases. Analyzing SNP combinations to detect epistasis is a major computational task, due to the enormous search space. A possible solution is to employ deep learning strategies for genomic prediction, but the lack of explainability derived from the black-box nature of neural networks is a challenge yet to be addressed. Herein, a novel, flexible, portable, and scalable framework for network interpretation based on transformers is proposed to tackle any-order epistasis. The results on various epistasis scenarios show that the proposed framework outperforms state-of-the-art methods for explainability, while being scalable to large datasets and portable to various deep learning accelerators. The proposed framework is validated on three WTCCC datasets, identifying SNPs related to genes known in the literature that have direct relationships with the studied diseases.
Affiliation(s)
- Miguel Graça
- INESC-ID, Instituto Superior Técnico, 1000-029, Lisbon, Portugal
- Ricardo Nobre
- INESC-ID, Instituto Superior Técnico, 1000-029, Lisbon, Portugal
- Leonel Sousa
- INESC-ID, Instituto Superior Técnico, 1000-029, Lisbon, Portugal
- Aleksandar Ilic
- INESC-ID, Instituto Superior Técnico, 1000-029, Lisbon, Portugal

11
Chang-Brahim I, Koppensteiner LJ, Beltrame L, Bodner G, Saranti A, Salzinger J, Fanta-Jende P, Sulzbachner C, Bruckmüller F, Trognitz F, Samad-Zamini M, Zechner E, Holzinger A, Molin EM. Reviewing the essential roles of remote phenotyping, GWAS and explainable AI in practical marker-assisted selection for drought-tolerant winter wheat breeding. Front Plant Sci 2024; 15:1319938. [PMID: 38699541 PMCID: PMC11064034 DOI: 10.3389/fpls.2024.1319938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 03/13/2024] [Indexed: 05/05/2024]
Abstract
Marker-assisted selection (MAS) plays a crucial role in crop breeding, improving the speed and precision of conventional breeding programmes by quickly and reliably identifying and selecting plants with desired traits. However, the efficacy of MAS depends on several prerequisites, with precise phenotyping being a key aspect of any plant breeding programme. Recent advancements in high-throughput remote phenotyping, facilitated by unmanned aerial vehicles coupled with machine learning, offer a non-destructive and efficient alternative to traditional, time-consuming, and labour-intensive methods. Furthermore, MAS relies on knowledge of marker-trait associations, commonly obtained through genome-wide association studies (GWAS), to understand complex traits such as drought tolerance, including yield components and phenology. However, GWAS has limitations that artificial intelligence (AI) has been shown to partially overcome. Additionally, AI and its explainable variants, which ensure transparency and interpretability, are increasingly being used as recognised problem-solving tools throughout the breeding process. Given these rapid technological advancements, this review provides an overview of state-of-the-art methods and processes underlying each stage of MAS, from phenotyping, genotyping and association analyses to the integration of explainable AI along the entire workflow. In this context, we specifically address the challenges and importance of breeding winter wheat for greater drought tolerance with stable yields, as regional droughts during critical developmental stages pose a threat to winter wheat production. Finally, we explore the transition from scientific progress to practical implementation and discuss ways to bridge the gap between cutting-edge developments and breeders, expediting MAS-based winter wheat breeding for drought tolerance.
Affiliation(s)
- Ignacio Chang-Brahim
- Unit Bioresources, Center for Health & Bioresources, AIT Austrian Institute of Technology, Tulln, Austria
- Lorenzo Beltrame
- Unit Assistive and Autonomous Systems, Center for Vision, Automation & Control, AIT Austrian Institute of Technology, Vienna, Austria
- Gernot Bodner
- Department of Crop Sciences, Institute of Agronomy, University of Natural Resources and Life Sciences Vienna, Tulln, Austria
- Anna Saranti
- Human-Centered AI Lab, Department of Forest- and Soil Sciences, Institute of Forest Engineering, University of Natural Resources and Life Sciences Vienna, Vienna, Austria
- Jules Salzinger
- Unit Assistive and Autonomous Systems, Center for Vision, Automation & Control, AIT Austrian Institute of Technology, Vienna, Austria
- Phillipp Fanta-Jende
- Unit Assistive and Autonomous Systems, Center for Vision, Automation & Control, AIT Austrian Institute of Technology, Vienna, Austria
- Christoph Sulzbachner
- Unit Assistive and Autonomous Systems, Center for Vision, Automation & Control, AIT Austrian Institute of Technology, Vienna, Austria
- Felix Bruckmüller
- Unit Assistive and Autonomous Systems, Center for Vision, Automation & Control, AIT Austrian Institute of Technology, Vienna, Austria
- Friederike Trognitz
- Unit Bioresources, Center for Health & Bioresources, AIT Austrian Institute of Technology, Tulln, Austria
- Elisabeth Zechner
- Verein zur Förderung einer nachhaltigen und regionalen Pflanzenzüchtung, Zwettl, Austria
- Andreas Holzinger
- Human-Centered AI Lab, Department of Forest- and Soil Sciences, Institute of Forest Engineering, University of Natural Resources and Life Sciences Vienna, Vienna, Austria
- Eva M. Molin
- Unit Bioresources, Center for Health & Bioresources, AIT Austrian Institute of Technology, Tulln, Austria
- Human-Centered AI Lab, Department of Forest- and Soil Sciences, Institute of Forest Engineering, University of Natural Resources and Life Sciences Vienna, Vienna, Austria

12
Sigala RE, Lagou V, Shmeliov A, Atito S, Kouchaki S, Awais M, Prokopenko I, Mahdi A, Demirkan A. Machine Learning to Advance Human Genome-Wide Association Studies. Genes (Basel) 2023; 15:34. [PMID: 38254924 PMCID: PMC10815885 DOI: 10.3390/genes15010034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Revised: 12/19/2023] [Accepted: 12/22/2023] [Indexed: 01/24/2024]
Abstract
Machine learning, including deep learning, reinforcement learning, and generative artificial intelligence, is revolutionising every area of our lives where data are made available. With the help of these methods, we can decipher information from larger datasets while addressing the complex nature of biological systems more efficiently. Although machine learning methods were introduced to human genetic epidemiological research as early as 2004, they have never been used to their full capacity. In this review, we outline some of the main applications of machine learning to assigning human genetic loci to health outcomes. We summarise widely used methods and discuss their advantages and challenges. We also identify several tools, such as Combi, GenNet, and GMSTool, specifically designed to integrate these methods for hypothesis-free analysis of genetic variation data. We elaborate on the additional value and limitations of these tools from a geneticist's perspective. Finally, we discuss the fast-moving field of foundation models and large multi-modal omics biobank initiatives.
Affiliation(s)
- Rafaella E. Sigala
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Vasiliki Lagou
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Aleksey Shmeliov
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Sara Atito
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Samaneh Kouchaki
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Muhammad Awais
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Inga Prokopenko
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Adam Mahdi
- Oxford Internet Institute, University of Oxford, Oxford OX1 3JS, Oxfordshire, UK
- Ayse Demirkan
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK

13
Bettencourt C, Skene N, Bandres-Ciga S, Anderson E, Winchester LM, Foote IF, Schwartzentruber J, Botia JA, Nalls M, Singleton A, Schilder BM, Humphrey J, Marzi SJ, Toomey CE, Kleifat AA, Harshfield EL, Garfield V, Sandor C, Keat S, Tamburin S, Frigerio CS, Lourida I, Ranson JM, Llewellyn DJ. Artificial intelligence for dementia genetics and omics. Alzheimers Dement 2023; 19:5905-5921. [PMID: 37606627 PMCID: PMC10841325 DOI: 10.1002/alz.13427] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/05/2023] [Revised: 07/14/2023] [Accepted: 07/18/2023] [Indexed: 08/23/2023]
Abstract
Genetics and omics studies of Alzheimer's disease and other dementia subtypes enhance our understanding of underlying mechanisms and pathways that can be targeted. We identified key remaining challenges: First, can we enhance genetic studies to address missing heritability? Can we identify reproducible omics signatures that differentiate between dementia subtypes? Can high-dimensional omics data identify improved biomarkers? How can genetics inform our understanding of causal status of dementia risk factors? And which biological processes are altered by dementia-related genetic variation? Artificial intelligence (AI) and machine learning approaches give us powerful new tools in helping us to tackle these challenges, and we review possible solutions and examples of best practice. However, their limitations also need to be considered, as well as the need for coordinated multidisciplinary research and diverse deeply phenotyped cohorts. Ultimately AI approaches improve our ability to interrogate genetics and omics data for precision dementia medicine. HIGHLIGHTS: We have identified five key challenges in dementia genetics and omics studies. AI can enable detection of undiscovered patterns in dementia genetics and omics data. Enhanced and more diverse genetics and omics datasets are still needed. Multidisciplinary collaborative efforts using AI can boost dementia research.
Affiliation(s)
- Conceicao Bettencourt
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London, UK
- Queen Square Brain Bank for Neurological Disorders, UCL Queen Square Institute of Neurology, London, UK
- Nathan Skene
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Sara Bandres-Ciga
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Emma Anderson
- Department of Mental Health of Older People, Division of Psychiatry, University College London, London, UK
- Isabelle F Foote
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, USA
- Jeremy Schwartzentruber
- Open Targets, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
- Illumina Artificial Intelligence Laboratory, Illumina Inc, Foster City, California, USA
- Juan A Botia
- Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
- Mike Nalls
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Data Tecnica International LLC, Washington, DC, USA
- Andrew Singleton
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, USA
- Brian M Schilder
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Jack Humphrey
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
- Sarah J Marzi
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Christina E Toomey
- Queen Square Brain Bank for Neurological Disorders, UCL Queen Square Institute of Neurology, London, UK
- Department of Clinical and Movement Neuroscience, UCL Queen Square Institute of Neurology, London, UK
- The Francis Crick Institute, London, UK
- Ahmad Al Kleifat
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Eric L Harshfield
- Stroke Research Group, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
- Victoria Garfield
- MRC Unit for Lifelong Health and Ageing, Institute of Cardiovascular Science, University College London, London, UK
- Cynthia Sandor
- UK Dementia Research Institute, School of Medicine, Cardiff University, Cardiff, UK
- Samuel Keat
- UK Dementia Research Institute, School of Medicine, Cardiff University, Cardiff, UK
- Stefano Tamburin
- Department of Neurosciences, Biomedicine and Movement Sciences, Neurology Section, University of Verona, Verona, Italy
- Carlo Sala Frigerio
- UK Dementia Research Institute, Queen Square Institute of Neurology, University College London, London, UK
- David J Llewellyn
- University of Exeter Medical School, Exeter, UK
- The Alan Turing Institute, London, UK

14
Santorsola M, Lescai F. The promise of explainable deep learning for omics data analysis: Adding new discovery tools to AI. N Biotechnol 2023; 77:1-11. [PMID: 37329982 DOI: 10.1016/j.nbt.2023.06.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 02/27/2023] [Revised: 06/01/2023] [Accepted: 06/14/2023] [Indexed: 06/19/2023]
Abstract
Deep learning has already revolutionised the way a wide range of data is processed in many areas of daily life. The ability to learn abstractions and relationships from heterogeneous data has provided impressively accurate prediction and classification tools to handle increasingly big datasets. This has a significant impact on the growing wealth of omics datasets, offering an unprecedented opportunity to better understand the complexity of living organisms. While this revolution is transforming the way these data are analyzed, explainable deep learning is emerging as an additional tool with the potential to change the way biological data are interpreted. Explainability addresses critical issues such as transparency, which is especially important when computational tools are introduced in clinical environments. Moreover, it empowers artificial intelligence with the capability to provide new insights into the input data, thus adding an element of discovery to these already powerful resources. In this review, we provide an overview of the transformative effects explainable deep learning is having on multiple sectors, ranging from genome engineering and genomics to radiomics, drug design and clinical trials. We offer life scientists a perspective to help them better understand the potential of these tools, and a motivation to implement them in their research, by suggesting learning resources they can use to take their first steps in this field.
Affiliation(s)
- Francesco Lescai
- Department of Biology and Biotechnology, University of Pavia, Pavia, Italy

15
Alzoubi H, Alzubi R, Ramzan N. Deep Learning Framework for Complex Disease Risk Prediction Using Genomic Variations. Sensors (Basel) 2023; 23:s23094439. [PMID: 37177642 PMCID: PMC10181706 DOI: 10.3390/s23094439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 04/05/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023]
Abstract
Genome-wide association studies have proven their ability to improve human health outcomes by identifying genotypes associated with phenotypes. Various works have attempted to predict individuals' disease risk from genotype data. Such a prediction can be treated either as an analysis model that leads to a better understanding of the gene functions underlying human disease, or as a black box for use in decision support systems and early disease detection. Deep learning techniques have recently gained popularity. In this work, we propose a deep-learning framework for disease risk prediction. The proposed framework employs a multilayer perceptron (MLP) to predict individuals' disease status. The proposed framework was applied to the Wellcome Trust Case-Control Consortium (WTCCC), the UK National Blood Service (NBS) Control Group, and the 1958 British Birth Cohort (58C) datasets. In a performance comparison, the proposed approach outperformed the other methods in predicting disease risk, achieving an area under the curve (AUC) of up to 0.94.
Affiliation(s)
- Hadeel Alzoubi
- Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Raid Alzubi
- Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Naeem Ramzan
- School of Computing, Engineering and Physical Sciences, University of the West of Scotland, High Street, Paisley PA1 2BE, UK

16
Markelova M, Senina A, Khusnutdinova D, Siniagina M, Kupriyanova E, Shakirova G, Odintsova A, Abdulkhakov R, Kolesnikova I, Shagaleeva O, Lyamina S, Abdulkhakov S, Zakharzhevskaya N, Grigoryeva T. Association between Taxonomic Composition of Gut Microbiota and Host Single Nucleotide Polymorphisms in Crohn's Disease Patients from Russia. Int J Mol Sci 2023; 24:ijms24097998. [PMID: 37175705 PMCID: PMC10178390 DOI: 10.3390/ijms24097998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023]
Abstract
Crohn's disease (CD) is a chronic relapsing inflammatory bowel disease of unknown etiology. Genetic predisposition and dysbiotic gut microbiota are important factors in the pathogenesis of CD. In this study, we analyzed the taxonomic composition of the gut microbiota and the genotypes of 24 single nucleotide polymorphisms (SNPs) associated with the risk of CD. The studied cohorts included 96 CD patients and 24 healthy volunteers from Russia. Statistically significant differences between CD patients and controls were found in the allele frequencies of 8 SNPs and in the taxonomic composition of the gut microbiota. In addition, two types of gut microbiota communities were identified in CD patients. The first community type is driven mainly by the bacterial family Bacteroidaceae and unclassified members of the order Clostridiales, whereas the second type is characterized by an increased abundance of Streptococcaceae and Enterobacteriaceae. Differences in the allele frequencies of rs9858542 (BSN), rs3816769 (STAT3), and rs1793004 (NELL1) were also found between groups of CD patients with different types of microbiota communities. These findings confirm the complex multifactorial nature of CD.
Affiliation(s)
- Maria Markelova
- Institute of Fundamental Medicine and Biology, Kazan (Volga Region) Federal University, 420008 Kazan, Russia
- Anastasia Senina
- Institute of Fundamental Medicine and Biology, Kazan (Volga Region) Federal University, 420008 Kazan, Russia
- Dilyara Khusnutdinova
- Institute of Fundamental Medicine and Biology, Kazan (Volga Region) Federal University, 420008 Kazan, Russia
- Maria Siniagina
- Institute of Fundamental Medicine and Biology, Kazan (Volga Region) Federal University, 420008 Kazan, Russia
- Elena Kupriyanova
- Institute of Fundamental Medicine and Biology, Kazan (Volga Region) Federal University, 420008 Kazan, Russia
- Rustam Abdulkhakov
- Hospital Therapy Department, Kazan State Medical University, 420012 Kazan, Russia
- Irina Kolesnikova
- Lopukhin Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia
- Olga Shagaleeva
- Lopukhin Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia
- Svetlana Lyamina
- Molecular Pathology of Digestion Laboratory, A.I. Yevdokimov Moscow State University of Medicine and Dentistry, 127473 Moscow, Russia
- Sayar Abdulkhakov
- Institute of Fundamental Medicine and Biology, Kazan (Volga Region) Federal University, 420008 Kazan, Russia
- Natalia Zakharzhevskaya
- Lopukhin Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia
- Tatiana Grigoryeva
- Institute of Fundamental Medicine and Biology, Kazan (Volga Region) Federal University, 420008 Kazan, Russia

17
Abstract
Advancements in high-throughput sequencing have yielded vast amounts of genomic data, which are studied using genome-wide association study (GWAS)/phenome-wide association study (PheWAS) methods to identify associations between genotype and phenotype. The associated findings have contributed to pharmacogenomics and improved clinical decision support at the point of care in many healthcare systems. However, the accumulation of genomic data from sequencing and clinical data from electronic health records (EHRs) poses significant challenges for data scientists. Following the rise of artificial intelligence (AI) technology such as machine learning and deep learning, an increasing number of GWAS/PheWAS studies have successfully leveraged this technology to overcome the aforementioned challenges. In this review, we focus on the application of data science and AI technology in three areas: risk prediction and identification of causal single-nucleotide polymorphisms, EHR-based phenotyping, and CRISPR guide RNA design. Additionally, we highlight a few emerging AI technologies, such as transfer learning and multi-view learning, which have started, or will start, to benefit genomic studies.
Affiliation(s)
- Jing Lin
- NUHS Corporate Office, National University Health System, Singapore
- Kee Yuan Ngiam
- NUHS Corporate Office, National University Health System, Singapore
- Department of Surgery, National University of Singapore, Singapore
- Correspondence: A/Prof Kee Yuan Ngiam, Group Chief Technology Officer, NUHS Corporate Office, National University Health System, 1E Kent Ridge Road, 119228, Singapore

18
Qin X, Chiang CWK, Gaggiotti OE. KLFDAPC: a supervised machine learning approach for spatial genetic structure analysis. Brief Bioinform 2022; 23:bbac202. [PMID: 35649387 PMCID: PMC9294434 DOI: 10.1093/bib/bbac202] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/12/2022] [Revised: 04/05/2022] [Accepted: 04/29/2022] [Indexed: 12/30/2022]
Abstract
Geographic patterns of human genetic variation provide important insights into human evolution and disease. Commonly used tools to detect and describe them are principal component analysis (PCA) and the supervised linear discriminant analysis of principal components (DAPC). However, the genetic features produced by both approaches can fail to characterize population structure correctly in complex scenarios involving admixture. In this study, we introduce Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC), a supervised non-linear approach for inferring individual geographic genetic structure that could rectify the limitations of these approaches by preserving the multimodal space of samples. We tested the power of KLFDAPC to infer population structure and to predict individual geographic origin using neural networks. Simulation results showed that KLFDAPC has higher discriminatory power than PCA and DAPC. The application of our method to empirical European and East Asian genome-wide genetic datasets indicated that the first two reduced features of KLFDAPC correctly recapitulated the geography of individuals and significantly improved the accuracy of predicting individual geographic origin when compared to PCA and DAPC. Therefore, KLFDAPC can be useful for geographic ancestry inference, design of genome scans and correction for spatial stratification in GWAS that link genes to adaptation or disease susceptibility.
Affiliation(s)
- Xinghu Qin
- Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife, KY16 9TF, UK
- Charleston W K Chiang
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine & Department of Quantitative and Computational Biology, University of Southern California, USA
- Oscar E Gaggiotti
- Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife, KY16 9TF, UK

19
Zhou X, Chen L, Liu HX. Applications of Machine Learning Models to Predict and Prevent Obesity: A Mini-Review. Front Nutr 2022; 9:933130. [PMID: 35866076 PMCID: PMC9294383 DOI: 10.3389/fnut.2022.933130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 05/19/2022] [Indexed: 11/28/2022]
Abstract
Research on obesity and related diseases has received attention from government policymakers, and interventions targeting nutrient intake, dietary patterns, and physical activity are deployed globally. An urgent issue now is how we can improve the efficiency of obesity research and obesity interventions. Machine learning (ML) methods are currently widely applied in obesity-related studies to detect obesity biomarkers or to discover intervention strategies that optimize weight-loss results. In addition, open-source implementations of these algorithms are necessary to check the reproducibility of research results. Furthermore, appropriate application of these algorithms could greatly improve the efficiency of similar studies by other researchers. Here, we present a mini-review of several open-source ML algorithms, platforms, and related databases that are of particular interest to, or can be applied in, obesity research. We focus on ML algorithms applied to nutrition, environmental and social factors, genetics or genomics, and the microbiome.
Affiliation(s)
- Xiaobei Zhou
- Health Sciences Institute, China Medical University, Shenyang, China
- Liaoning Key Laboratory of Obesity and Glucose/Lipid Associated Metabolic Diseases, China Medical University, Shenyang, China
- Lei Chen
- Health Sciences Institute, China Medical University, Shenyang, China
- Liaoning Key Laboratory of Obesity and Glucose/Lipid Associated Metabolic Diseases, China Medical University, Shenyang, China
- Institute of Life Sciences, China Medical University, Shenyang, China
- Hui-Xin Liu
- Health Sciences Institute, China Medical University, Shenyang, China
- Liaoning Key Laboratory of Obesity and Glucose/Lipid Associated Metabolic Diseases, China Medical University, Shenyang, China
- Institute of Life Sciences, China Medical University, Shenyang, China

20
Hassam M, Shamsi JA, Khan A, Al-Harrasi A, Uddin R. Prediction of inhibitory activities of small molecules against Pantothenate synthetase from Mycobacterium tuberculosis using Machine Learning models. Comput Biol Med 2022; 145:105453. [DOI: 10.1016/j.compbiomed.2022.105453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2021] [Revised: 03/18/2022] [Accepted: 03/23/2022] [Indexed: 11/03/2022]
21
Shiels D, Prestwich BD, Koo O, Kanchiswamy CN, O'Halloran R, Badmi R. Hemp Genome Editing-Challenges and Opportunities. Front Genome Ed 2022; 4:823486. [PMID: 35187530 PMCID: PMC8847435 DOI: 10.3389/fgeed.2022.823486] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/27/2021] [Accepted: 01/05/2022] [Indexed: 11/13/2022] [Open access]
Abstract
Hemp (Cannabis sativa L.) is a multipurpose crop with many important uses, including medicine, fibre, food and biocomposites. This plant is currently gaining prominence and acceptance for its valuable applications. Hemp is grown as a cash crop for its novel cannabinoids, which are estimated to support a multibillion-dollar downstream market. Hemp cultivation can play a major role in carbon sequestration, with good CO2-to-biomass conversion in low-input systems, and can also improve soil health and promote phytoremediation. The recent advent of genome editing tools that produce non-transgenic genome-edited crops with no trace of foreign genetic material has the potential to overcome the regulatory hurdles faced by genetically modified crops. Artificial intelligence-mediated trait discovery platforms are revolutionizing the agricultural industry, producing desirable crops with unprecedented accuracy and speed. However, genome editing tools have not yet been deployed to improve the beneficial properties of hemp. The recent availability of high-quality Cannabis genome sequences from several strains (strains balanced for cannabidiol (CBD) and tetrahydrocannabinol (THC), as well as CBD-rich and THC-rich strains) has paved the way for improving the production of valuable bioactive molecules for the welfare of humankind and the environment. In this context, the article focuses on exploiting advanced genome editing tools to produce non-transgenic hemp with improved industrially desirable traits. The challenges, opportunities and interdisciplinary approaches that can be adopted from existing technologies in other plant species are highlighted.
Affiliation(s)
- Donal Shiels
- School of Biological Earth and Environmental Sciences, Environmental Research Institute, University College Cork, Cork, Ireland
- Barbara Doyle Prestwich
- School of Biological Earth and Environmental Sciences, Environmental Research Institute, University College Cork, Cork, Ireland
- Roisin O'Halloran
- School of Biological Earth and Environmental Sciences, Environmental Research Institute, University College Cork, Cork, Ireland
- Raghuram Badmi
- School of Biological Earth and Environmental Sciences, Environmental Research Institute, University College Cork, Cork, Ireland
- Plantedit Pvt Ltd, Cork, Ireland
22
Lim AJW, Lim LJ, Ooi BNS, Koh ET, Tan JWL, Chong SS, Khor CC, Tucker-Kellogg L, Leong KP, Lee CG. Functional coding haplotypes and machine-learning feature elimination identifies predictors of Methotrexate Response in Rheumatoid Arthritis patients. EBioMedicine 2022; 75:103800. [PMID: 35022146 PMCID: PMC8808170 DOI: 10.1016/j.ebiom.2021.103800] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/26/2021] [Revised: 12/19/2021] [Accepted: 12/20/2021] [Indexed: 02/07/2023] [Open access]
Abstract
BACKGROUND Major challenges in large-scale genetic association studies include not only the identification of causative single nucleotide polymorphisms (SNPs), but also accounting for SNP-SNP interactions. This study therefore proposes a novel feature engineering approach that integrates potentially functional coding haplotypes (pfcHap) with machine-learning (ML) feature selection to identify biologically meaningful, possibly causative genetic factors that take into account potential SNP-SNP interactions within the pfcHap, in order to best predict methotrexate (MTX) response in rheumatoid arthritis (RA) patients. METHODS Exome sequencing data from 349 RA patients were analysed and split into a training set and an unseen test set. Inferred pfcHaps were combined with 30 non-genetic features and subjected to ML recursive feature elimination with cross-validation on the training set. The predictive capacity and robustness of the selected features were assessed with six popular machine learning models through cross-validation on the training set and then evaluated on the unseen test set. FINDINGS Significantly, 100 features (95 pfcHaps, 5 non-genetic factors) were identified as having good predictive performance (AUC: 0.776-0.828; Sensitivity: 0.656-0.813; Specificity: 0.684-0.868) across all six ML models in the unseen test dataset for the prediction of MTX response in RA patients. INTERPRETATION The majority of the predictive pfcHap SNPs were predicted to be potentially functional, and some of the genes in which the pfcHaps reside were found to be associated with previously reported MTX/RA pathways. FUNDING Singapore Ministry of Health's National Medical Research Council (NMRC) [NMRC/CBRG/0095/2015; CG12Aug17; CGAug16M012; NMRC/CG/017/2013]; National Cancer Center Research Fund and block funding Duke-NUS Medical School; Singapore Ministry of Education Academic Research Fund Tier 2 grant MOE2019-T2-1-138.
Affiliation(s)
- Ashley J W Lim
- Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Lee Jin Lim
- Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Brandon N S Ooi
- Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Ee Tzun Koh
- Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, Singapore
- Justina Wei Lynn Tan
- Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, Singapore
- Samuel S Chong
- Dept of Pediatrics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Chiea Chuen Khor
- Division of Human Genetics, Genome Institute of Singapore, Singapore
- Lisa Tucker-Kellogg
- Centre for Computational Biology, and Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore
- Khai Pang Leong
- Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, Singapore; Clinical Research & Innovation Office, Tan Tock Seng Hospital, Singapore
- Caroline G Lee
- Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Div of Cellular & Molecular Research, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore, Singapore; Duke-NUS Medical School, Singapore; NUS Graduate School, National University of Singapore, Singapore.
23
Kondratyev NV, Alfimova MV, Golov AK, Golimbet VE. Bench Research Informed by GWAS Results. Cells 2021; 10:3184. [PMID: 34831407 PMCID: PMC8623533 DOI: 10.3390/cells10113184] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/23/2021] [Revised: 11/11/2021] [Accepted: 11/11/2021] [Indexed: 12/15/2022] [Open access]
Abstract
Scientifically interesting as well as practically important phenotypes often belong to the realm of complex traits. To the extent that these traits are hereditary, they are usually 'highly polygenic'. The study of such traits presents a challenge for researchers, as their complex genetic architecture makes it nearly impossible to utilise many of the usual methods of reverse genetics, which often focus on specific genes. In recent years, thousands of genome-wide association studies (GWAS) have been undertaken to explore the relationships between complex traits and a large number of genetic factors, most of which are characterised by tiny effects. In this review, we aim to familiarise 'wet biologists' with approaches for interpreting GWAS results, to clarify some issues that may seem counterintuitive, and to assess the possibility of using GWAS results in experiments on various complex traits.
Affiliation(s)
- Arkadiy K. Golov
- Mental Health Research Center, 115522 Moscow, Russia
- Institute of Gene Biology, Russian Academy of Sciences, 119334 Moscow, Russia
- Vera E. Golimbet
- Mental Health Research Center, 115522 Moscow, Russia