1
Adugna A, Amare GA, Jemal M. Machine Learning Approach and Bioinformatics Analysis Discovered Key Genomic Signatures for Hepatitis B Virus-Associated Hepatocyte Remodeling and Hepatocellular Carcinoma. Cancer Inform 2025; 24:11769351251333847. [PMID: 40291818 PMCID: PMC12033511 DOI: 10.1177/11769351251333847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Accepted: 03/24/2025] [Indexed: 04/30/2025] Open
Abstract
Hepatitis B virus (HBV) causes liver cancer, which is the third most common cause of cancer-related death worldwide. Chronic HBV-driven inflammation in host hepatocytes causes hepatocyte remodeling (hepatocyte transformation and immortalization) and hepatocellular carcinoma (HCC). Accurately recognizing cancer stages to optimize early screening and diagnosis is a primary concern in the outlook of HBV-induced hepatocyte remodeling and liver cancer. Genomic signatures play important roles in addressing this issue. Recently, machine learning (ML) models and bioinformatics analysis have become very important in discovering novel genomic signatures for the early diagnosis, treatment, and prognosis of HBV-induced hepatic cell remodeling and HCC. We discuss the recent literature in which ML approaches and bioinformatics analysis revealed novel genomic signatures for diagnosing and forecasting HBV-associated hepatocyte remodeling and HCC. Genomic signatures, including various microRNAs and their associated genes, long noncoding RNAs (lncRNAs), and small nucleolar RNAs (snoRNAs), have been discovered to be involved in the upregulation and downregulation of HBV-HCC. Moreover, these genetic biomarkers also affect different biological processes, such as proliferation, migration, circulation, assault, dissemination, antiapoptosis, mitogenesis, transformation, and angiogenesis in HBV-infected hepatocytes.
Affiliation(s)
- Adane Adugna
- Medical Laboratory Sciences, College of Health Sciences, Debre Markos University, Ethiopia
- Gashaw Azanaw Amare
- Medical Laboratory Sciences, College of Health Sciences, Debre Markos University, Ethiopia
- Mohammed Jemal
- Department of Biomedical Sciences, School of Medicine, Debre Markos University, Ethiopia
2
Mora‐Márquez F, Nuño JC, Soto Á, López de Heredia U. Missing genotype imputation in non-model species using self-organizing maps. Mol Ecol Resour 2025; 25:e13992. [PMID: 38970328 PMCID: PMC11887599 DOI: 10.1111/1755-0998.13992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 05/30/2024] [Accepted: 06/26/2024] [Indexed: 07/08/2024]
Abstract
Current methodologies of genome-wide single-nucleotide polymorphism (SNP) genotyping produce large amounts of missing data that may affect statistical inference and bias the outcome of experiments. Genotype imputation is routinely used in well-studied species to buffer the impact in downstream analysis, and several algorithms are available to fill in missing genotypes. The lack of reference haplotype panels precludes the use of these methods in genomic studies on non-model organisms. As an alternative, machine learning algorithms are employed to explore the genotype data and to estimate the missing genotypes. Here, we propose an imputation method based on self-organizing maps (SOM), a widely used type of neural network formed by spatially distributed neurons that cluster similar inputs into close neurons. The method explores genotype datasets to select SNP loci to build binary vectors from the genotypes, and initializes and trains neural networks for each query missing SNP genotype. The SOM-derived clustering is then used to impute the best genotype. To automate the imputation process, we have implemented gtImputation, an open-source application programmed in Python3 and with a user-friendly GUI to facilitate the whole process. The method's performance was validated by comparing its accuracy, precision and sensitivity on several benchmark genotype datasets with other available imputation algorithms. Our approach produced highly accurate and precise genotype imputations even for SNPs with alleles at low frequency and outperformed other algorithms, especially for datasets from mixed populations with unrelated individuals.
Affiliation(s)
- Fernando Mora‐Márquez
- GI en Especies Leñosas (WooSp), Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politécnica de Madrid, Ciudad Universitaria, Madrid, Spain
- Juan Carlos Nuño
- GI en Especies Leñosas (WooSp), Dpto. Matemática Aplicada, ETSI Montes, Forestal y del Medio Natural, Universidad Politécnica de Madrid, Ciudad Universitaria, Madrid, Spain
- Álvaro Soto
- GI en Especies Leñosas (WooSp), Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politécnica de Madrid, Ciudad Universitaria, Madrid, Spain
- Unai López de Heredia
- GI en Especies Leñosas (WooSp), Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politécnica de Madrid, Ciudad Universitaria, Madrid, Spain
3
Tyagi N, Vahab N, Tyagi S. Genome language modeling (GLM): a beginner's cheat sheet. Biol Methods Protoc 2025; 10:bpaf022. [PMID: 40370585 PMCID: PMC12077296 DOI: 10.1093/biomethods/bpaf022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2025] [Revised: 02/17/2025] [Accepted: 03/23/2025] [Indexed: 05/16/2025] Open
Abstract
Integrating genomics with diverse data modalities has the potential to revolutionize personalized medicine. However, this integration poses significant challenges due to the fundamental differences in data types and structures. The vast size of the genome necessitates transformation into a condensed representation containing key biomarkers and relevant features to ensure interoperability with other modalities. This commentary explores both conventional and state-of-the-art approaches to genome language modeling (GLM), with a focus on representing and extracting meaningful features from genomic sequences. We focus on the latest trends of applying language modeling techniques on genomics sequence data, treating it as a text modality. Effective feature extraction is essential in enabling machine learning models to effectively analyze large genomic datasets, particularly within multimodal frameworks. We first provide a step-by-step guide to various genomic sequence preprocessing and tokenization techniques. Then we explore feature extraction methods for the transformation of tokens using frequency, embedding, and neural network-based approaches. In the end, we discuss machine learning (ML) applications in genomics, focusing on classification, regression, language processing algorithms, and multimodal integration. Additionally, we explore the role of GLM in functional annotation, emphasizing how advanced ML models, such as Bidirectional encoder representations from transformers, enhance the interpretation of genomic data. To the best of our knowledge, we compile the first end-to-end analytic guide to convert complex genomic data into biologically interpretable information using GLM, thereby facilitating the development of novel data-driven hypotheses.
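The tokenization and frequency-based feature extraction steps mentioned in this abstract can be illustrated with a minimal sketch. It assumes overlapping k-mer tokenization with a configurable stride; the function names `kmer_tokens` and `kmer_frequencies` are hypothetical and are not taken from the cited paper.

```python
from collections import Counter


def kmer_tokens(seq: str, k: int, stride: int = 1) -> list[str]:
    """Split a nucleotide sequence into k-mers.

    stride=1 gives overlapping tokens; stride=k gives non-overlapping tokens.
    (Illustrative helper, not the cited authors' implementation.)
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def kmer_frequencies(tokens: list[str]) -> dict[str, float]:
    """Turn a token list into relative k-mer frequencies (a simple feature vector)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


tokens = kmer_tokens("ACGTAC", k=3)
features = kmer_frequencies(tokens)
```

Embedding and neural-network-based features would replace `kmer_frequencies` with a learned mapping, while the tokenization step stays the same.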
Affiliation(s)
- Navya Tyagi
- AI and Data Science, Indian Institute of Technology, Madras, Chennai 600036, Tamil Nadu, India
- Amity Institute of Integrative Health Sciences, Amity University, Gurugram 122412, Haryana, India
- Naima Vahab
- School of Computing Technologies, Royal Melbourne Institute of Technology (RMIT) University, 3001 Melbourne, Australia
- Sonika Tyagi
- School of Computing Technologies, Royal Melbourne Institute of Technology (RMIT) University, 3001 Melbourne, Australia
4
Fonzino A, Mazzacuva PL, Handen A, Silvestris DA, Arnold A, Pecori R, Pesole G, Picardi E. REDInet: a temporal convolutional network-based classifier for A-to-I RNA editing detection harnessing million known events. Brief Bioinform 2025; 26:bbaf107. [PMID: 40112338 PMCID: PMC11924403 DOI: 10.1093/bib/bbaf107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2024] [Revised: 02/19/2025] [Accepted: 02/24/2025] [Indexed: 03/22/2025] Open
Abstract
A-to-I ribonucleic acid (RNA) editing detection is still a challenging task. Current bioinformatics tools rely on empirical filters and whole genome sequencing or whole exome sequencing data to remove background noise, sequencing errors, and artifacts. Sometimes they make use of cumbersome and time-consuming computational procedures. Here, we present REDInet, a temporal convolutional network-based deep learning algorithm, to profile RNA editing in human RNA sequencing (RNAseq) data. It has been trained on REDIportal RNA editing sites, the largest collection of human A-to-I changes from >8000 RNAseq data of the genotype-tissue expression project. REDInet can classify editing events with high accuracy harnessing RNAseq nucleotide frequencies of 101-base windows without the need for coupled genomic data.
Affiliation(s)
- Adriano Fonzino
- Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Via Orabona 4, 70125, Bari, Italy
- Pietro Luca Mazzacuva
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, National Research Council, Via Amendola 122/O, 70126, Bari, Italy
- Department of Engineering, University Campus Bio-Medico of Rome, Via Álvaro del Portillo 21, 00128, Rome, Italy
- Adam Handen
- Biological Sciences Division, University of Chicago, 5841 S Maryland Avenue, 60637, Chicago, USA
- Domenico Alessandro Silvestris
- Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Via Orabona 4, 70125, Bari, Italy
- Annette Arnold
- Division of Immune Diversity, German Cancer Research Center, Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
- Riccardo Pecori
- Division of Immune Diversity, German Cancer Research Center, Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
- Helmholtz Institute for Translational Oncology (HI-TRON), Obere Zahlbacherstr., 55131, Mainz, Germany
- Graziano Pesole
- Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Via Orabona 4, 70125, Bari, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, National Research Council, Via Amendola 122/O, 70126, Bari, Italy
- Ernesto Picardi
- Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Via Orabona 4, 70125, Bari, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, National Research Council, Via Amendola 122/O, 70126, Bari, Italy
5
Ndiaye M, Prieto-Baños S, Fitzgerald LM, Yazdizadeh Kharrazi A, Oreshkov S, Dessimoz C, Sedlazeck FJ, Glover N, Majidian S. When less is more: sketching with minimizers in genomics. Genome Biol 2024; 25:270. [PMID: 39402664 PMCID: PMC11472564 DOI: 10.1186/s13059-024-03414-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 10/01/2024] [Indexed: 10/19/2024] Open
Abstract
The exponential increase in sequencing data calls for conceptual and computational advances to extract useful biological insights. One such advance, minimizers, allows for reducing the quantity of data handled while maintaining some of its key properties. We provide a basic introduction to minimizers, cover recent methodological developments, and review the diverse applications of minimizers to analyze genomic data, including de novo genome assembly, metagenomics, read alignment, read correction, and pangenomes. We also touch on alternative data sketching techniques including universal hitting sets, syncmers, or strobemers. Minimizers and their alternatives have rapidly become indispensable tools for handling vast amounts of data.
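As a rough illustration of the minimizer idea summarized in this abstract, the sketch below selects, for every window of `w` consecutive k-mers, the lexicographically smallest k-mer (leftmost on ties). Lexicographic ordering and the function name `minimizers` are assumptions made for the example; practical tools commonly order k-mers by a hash instead.

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) pairs chosen as window minimizers.

    Each window covers w consecutive k-mers; the smallest k-mer in
    lexicographic order is kept, and repeats from adjacent windows are
    reported once. (Illustrative sketch, not a production implementation.)
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() over indices returns the leftmost index among equal k-mers
        offset = min(range(w), key=lambda j: window[j])
        item = (start + offset, window[offset])
        if not picked or picked[-1] != item:
            picked.append(item)
    return picked


sketch = minimizers("GATTACA", k=3, w=3)
# sketch == [(1, 'ATT'), (4, 'ACA')]
```

Here five k-mers are reduced to two representatives, which is the data reduction the abstract refers to; adjacent windows tend to share a minimizer, so similar sequences tend to share sketch entries.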
Affiliation(s)
- Malick Ndiaye
- Department of Fundamental Microbiology, UNIL, Lausanne, Switzerland
- Silvia Prieto-Baños
- Department of Computational Biology, UNIL, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Sergey Oreshkov
- Department of Endocrinology, Diabetology, Metabolism, CHUV, Lausanne, Switzerland
- Christophe Dessimoz
- Department of Computational Biology, UNIL, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Natasha Glover
- Department of Computational Biology, UNIL, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Sina Majidian
- Department of Computational Biology, UNIL, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
6
Vittoria Togo M, Mastrolorito F, Orfino A, Graps EA, Tondo AR, Altomare CD, Ciriaco F, Trisciuzzi D, Nicolotti O, Amoroso N. Where developmental toxicity meets explainable artificial intelligence: state-of-the-art and perspectives. Expert Opin Drug Metab Toxicol 2024; 20:561-577. [PMID: 38141160 DOI: 10.1080/17425255.2023.2298827] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/05/2023] [Accepted: 12/20/2023] [Indexed: 12/24/2023]
Abstract
INTRODUCTION: The application of Artificial Intelligence (AI) to predictive toxicology is rapidly increasing, particularly aiming to develop non-testing methods that effectively address ethical concerns and reduce economic costs. In this context, Developmental Toxicity (Dev Tox) stands as a key human health endpoint, especially significant for safeguarding maternal and child well-being. AREAS COVERED: This review outlines the existing methods employed in Dev Tox predictions and underscores the benefits of utilizing New Approach Methodologies (NAMs), specifically focusing on eXplainable Artificial Intelligence (XAI), which proves highly efficient in constructing reliable and transparent models aligned with recommendations from international regulatory bodies. EXPERT OPINION: The limited availability of high-quality data and the absence of dependable Dev Tox methodologies render XAI an appealing avenue for systematically developing interpretable and transparent models, which hold immense potential for both scientific evaluations and regulatory decision-making.
Affiliation(s)
- Maria Vittoria Togo
- Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Fabrizio Mastrolorito
- Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Angelica Orfino
- Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Elisabetta Anna Graps
- ARESS Puglia - Agenzia Regionale strategica per la Salute ed il Sociale, Presidenza della Regione Puglia, Bari, Italy
- Anna Rita Tondo
- Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Cosimo Damiano Altomare
- Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Fulvio Ciriaco
- Department of Chemistry, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Daniela Trisciuzzi
- Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Orazio Nicolotti
- Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Nicola Amoroso
- Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
7
Magarelli M, Novielli P, De Filippis F, Magliulo R, Di Bitonto P, Diacono D, Bellotti R, Tangaro S. Explainable artificial intelligence and microbiome data for food geographical origin: the Mozzarella di Bufala Campana PDO Case of Study. Front Microbiol 2024; 15:1393243. [PMID: 38887708 PMCID: PMC11180736 DOI: 10.3389/fmicb.2024.1393243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 05/13/2024] [Indexed: 06/20/2024] Open
Abstract
Identifying the origin of a food product holds paramount importance in ensuring food safety, quality, and authenticity. Knowing where a food item comes from provides crucial information about its production methods, handling practices, and potential exposure to contaminants. Machine learning techniques play a pivotal role in this process by enabling the analysis of complex data sets to uncover patterns and associations that can reveal the geographical source of a food item. This study aims to investigate the potential use of explainable artificial intelligence for identifying food origin. The case study of Mozzarella di Bufala Campana PDO was considered by examining the composition of the microbiota in each sample. Three different supervised machine learning algorithms were compared, and the best classifier was Random Forest, with an Area Under the Curve (AUC) value of 0.93 and a top accuracy of 0.87. Machine learning models effectively classify origin, offering innovative ways to authenticate regional products and support local economies. Further research can explore microbiota analysis and extend applicability to diverse food products and contexts for enhanced accuracy and broader impact.
Affiliation(s)
- Michele Magarelli
- Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Pierfrancesco Novielli
- Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Francesca De Filippis
- Dipartimento di Agraria, Università degli Studi di Napoli Federico II, Naples, Italy
- Raffaele Magliulo
- Dipartimento di Agraria, Università degli Studi di Napoli Federico II, Naples, Italy
- Pierpaolo Di Bitonto
- Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Domenico Diacono
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Roberto Bellotti
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Dipartimento Interateneo di Fisica M. Merlin, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Sabina Tangaro
- Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
8
Wang Q, Chang Z, Liu X, Wang Y, Feng C, Ping Y, Feng X. Predictive Value of Machine Learning for Platinum Chemotherapy Responses in Ovarian Cancer: Systematic Review and Meta-Analysis. J Med Internet Res 2024; 26:e48527. [PMID: 38252469 PMCID: PMC10845031 DOI: 10.2196/48527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 11/23/2023] [Accepted: 11/24/2023] [Indexed: 01/23/2024] Open
Abstract
BACKGROUND: Machine learning is a potentially effective method for predicting the response to platinum-based treatment for ovarian cancer. However, the predictive performance of various machine learning methods and variables is still a matter of controversy and debate. OBJECTIVE: This study aims to systematically review relevant literature on the predictive value of machine learning for platinum-based chemotherapy responses in patients with ovarian cancer. METHODS: Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, we systematically searched the PubMed, Embase, Web of Science, and Cochrane databases for relevant studies on predictive models for platinum-based therapies for the treatment of ovarian cancer published before April 26, 2023. The Prediction Model Risk of Bias Assessment tool was used to evaluate the risk of bias in the included articles. Concordance index (C-index), sensitivity, and specificity were used to evaluate the performance of the prediction models to investigate the predictive value of machine learning for platinum chemotherapy responses in patients with ovarian cancer. RESULTS: A total of 1749 articles were examined, and 19 of them involving 39 models were eligible for this study. The most commonly used modeling methods were logistic regression (16/39, 41%), Extreme Gradient Boosting (4/39, 10%), and support vector machine (4/39, 10%). The training cohort reported C-index in 39 predictive models, with a pooled value of 0.806; the validation cohort reported C-index in 12 predictive models, with a pooled value of 0.831. Support vector machine performed well in both the training and validation cohorts, with a C-index of 0.942 and 0.879, respectively. The pooled sensitivity was 0.890, and the pooled specificity was 0.790 in the training cohort. CONCLUSIONS: Machine learning can effectively predict how patients with ovarian cancer respond to platinum-based chemotherapy and may provide a reference for the development or updating of subsequent scoring systems.
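For readers unfamiliar with the sensitivity and specificity figures reported above, the sketch below shows how the two metrics are computed for a single model from binary labels (1 = responder, 0 = non-responder). The function name and the toy labels are illustrative assumptions; this is not the pooling procedure used in the meta-analysis.

```python
def sensitivity_specificity(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Compute (sensitivity, specificity) from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, invented purely for illustration
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Sensitivity is the fraction of true responders the model identifies, and specificity is the fraction of non-responders it correctly rules out.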
Affiliation(s)
- Qingyi Wang
- Department of First Clinical Medical College, Heilongjiang University of Chinese Medicine, Harbin, China
- Zhuo Chang
- Basic Medical College, Heilongjiang University of Chinese Medicine, Harbin, China
- Xiaofang Liu
- Department of First Clinical Medical College, Heilongjiang University of Chinese Medicine, Harbin, China
- Yunrui Wang
- Department of First Clinical Medical College, Heilongjiang University of Chinese Medicine, Harbin, China
- Chuwen Feng
- Department of First Clinical Medical College, Heilongjiang University of Chinese Medicine, Harbin, China
- Yunlu Ping
- Department of First Clinical Medical College, Heilongjiang University of Chinese Medicine, Harbin, China
- Xiaoling Feng
- Department of Gynecology, First Affiliated Hospital of Heilongjiang University of Chinese Medicine, Harbin, China
9
Golob JL, Oskotsky TT, Tang AS, Roldan A, Chung V, Ha CWY, Wong RJ, Flynn KJ, Parraga-Leo A, Wibrand C, Minot SS, Oskotsky B, Andreoletti G, Kosti I, Bletz J, Nelson A, Gao J, Wei Z, Chen G, Tang ZZ, Novielli P, Romano D, Pantaleo E, Amoroso N, Monaco A, Vacca M, De Angelis M, Bellotti R, Tangaro S, Kuntzleman A, Bigcraft I, Techtmann S, Bae D, Kim E, Jeon J, Joe S, Theis KR, Ng S, Lee YS, Diaz-Gimeno P, Bennett PR, MacIntyre DA, Stolovitzky G, Lynch SV, Albrecht J, Gomez-Lopez N, Romero R, Stevenson DK, Aghaeepour N, Tarca AL, Costello JC, Sirota M. Microbiome preterm birth DREAM challenge: Crowdsourcing machine learning approaches to advance preterm birth research. Cell Rep Med 2024; 5:101350. [PMID: 38134931 PMCID: PMC10829755 DOI: 10.1016/j.xcrm.2023.101350] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/28/2023] [Revised: 09/15/2023] [Accepted: 12/01/2023] [Indexed: 12/24/2023]
Abstract
Every year, 11% of infants are born preterm with significant health consequences, with the vaginal microbiome a risk factor for preterm birth. We crowdsource models to predict (1) preterm birth (PTB; <37 weeks) or (2) early preterm birth (ePTB; <32 weeks) from 9 vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from public raw data via phylogenetic harmonization. The predictive models are validated on two independent unpublished datasets representing 331 samples from 148 pregnant individuals. The top-performing models (among 148 and 121 submissions from 318 teams) achieve area under the receiver operator characteristic (AUROC) curve scores of 0.69 and 0.87 predicting PTB and ePTB, respectively. Alpha diversity, VALENCIA community state types, and composition are important features in the top-performing models, most of which are tree-based methods. This work is a model for translation of microbiome data into clinically relevant predictive models and to better understand preterm birth.
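The AUROC scores quoted in this abstract can be read as the probability that a randomly chosen positive case (for example, a preterm birth) receives a higher model score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function name and the toy scores are illustrative assumptions and are unrelated to the challenge's evaluation code.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    Runs in O(n_pos * n_neg), which is fine for small illustrative inputs.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented purely for illustration
value = auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
# value == 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the 0.69 and 0.87 values reported for PTB and ePTB.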
Affiliation(s)
- Jonathan L Golob
- Division of Infectious Disease, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA; March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA.
- Tomiko T Oskotsky
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA.
- Alice S Tang
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Alennie Roldan
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Connie W Y Ha
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Ronald J Wong
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA; March of Dimes Prematurity Research Center at Stanford University, Stanford, CA, USA
- Antonio Parraga-Leo
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, Obstetrics and Gynaecology, Universidad de Valencia, Valencia, Spain; IVIRMA Global Research Alliance, IVI Foundation, Instituto de Investigación Sanitaria La Fe (IIS La Fe), Valencia, Spain
- Camilla Wibrand
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Samuel S Minot
- Data Core, Shared Resources, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Boris Oskotsky
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA
- Gaia Andreoletti
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Idit Kosti
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Jifan Gao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Zhoujingpeng Wei
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Zheng-Zheng Tang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Pierfrancesco Novielli
- Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy; Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Donato Romano
- Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy; Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Ester Pantaleo
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy; Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari Aldo Moro, Bari, Italy
- Nicola Amoroso
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy; Dipartimento di Farmacia - Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Alfonso Monaco
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy; Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari Aldo Moro, Bari, Italy
- Mirco Vacca
- Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Maria De Angelis
- Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Roberto Bellotti
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy; Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari Aldo Moro, Bari, Italy
- Sabina Tangaro
- Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy; Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Abigail Kuntzleman
- Department of Biological Sciences, Michigan Technological University, Houghton, MI, USA
- Isaac Bigcraft
- Department of Biological Sciences, Michigan Technological University, Houghton, MI, USA
- Stephen Techtmann
- Department of Biological Sciences, Michigan Technological University, Houghton, MI, USA
- Daehun Bae
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, Republic of Korea
- Eunyoung Kim
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, Republic of Korea
- Jongbum Jeon
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, Republic of Korea
- Soobok Joe
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, Republic of Korea
- Kevin R Theis
- Department of Biochemistry, Microbiology and Immunology, Wayne State University, Detroit, MI, USA
- Sherrianne Ng
- Imperial College Parturition Research Group, Division of the Institute of Reproductive and Developmental Biology, Imperial College London, London, UK; March of Dimes Prematurity Research Centre at Imperial College London, London, UK
| | - Yun S Lee
- Imperial College Parturition Research Group, Division of the Institute of Reproductive and Developmental Biology, Imperial College London, London, UK; March of Dimes Prematurity Research Centre at Imperial College London, London, UK
| | - Patricia Diaz-Gimeno
- IVIRMA Global Research Alliance, IVI Foundation, Instituto de Investigación Sanitaria La Fe (IIS La Fe), Valencia, Spain
| | - Phillip R Bennett
- Imperial College Parturition Research Group, Division of the Institute of Reproductive and Developmental Biology, Imperial College London, London, UK; March of Dimes Prematurity Research Centre at Imperial College London, London, UK
| | - David A MacIntyre
- Imperial College Parturition Research Group, Division of the Institute of Reproductive and Developmental Biology, Imperial College London, London, UK; March of Dimes Prematurity Research Centre at Imperial College London, London, UK
| | - Gustavo Stolovitzky
- Center for Computational Biology and Bioinformatics, Columbia University, New York, NY, USA; Thomas J. Watson Research Center, IBM, Yorktown Heights, NY, USA; Sema4, Stamford, CT, USA
| | - Susan V Lynch
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA; Division of Gastroenterology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
| | | | - Nardhy Gomez-Lopez
- Department of Biochemistry, Microbiology and Immunology, Wayne State University, Detroit, MI, USA; Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI, USA
| | - Roberto Romero
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI, USA; Department of Obstetrics and Gynecology, University of Michigan, Ann Arbor, MI, USA; Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA; Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA; Detroit Medical Center, Detroit, MI, USA; Department of Obstetrics and Gynecology, Florida International University, Miami, FL, USA
| | - David K Stevenson
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA; Center for Academic Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Nima Aghaeepour
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA; Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA; Department of Biomedical Data Sciences, Stanford University School of Medicine, Stanford, CA, USA
| | - Adi L Tarca
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI, USA; Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI, USA; Department of Computer Science, Wayne State University College of Engineering, Detroit, MI, USA
| | - James C Costello
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Marina Sirota
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA.
| |
|
10
|
Yousef M, Allmer J. Deep learning in bioinformatics. Turk J Biol 2023; 47:366-382. [PMID: 38681776 PMCID: PMC11045206 DOI: 10.55730/1300-0152.2671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 12/28/2023] [Accepted: 12/18/2023] [Indexed: 05/01/2024]
Abstract
Deep learning is a powerful machine learning technique that can learn from large amounts of data using multiple layers of artificial neural networks. This paper reviews some applications of deep learning in bioinformatics, a field that deals with analyzing and interpreting biological data. We first introduce the basic concepts of deep learning and then survey the recent advances and challenges of applying deep learning to various bioinformatics problems, such as genome sequencing, gene expression analysis, protein structure prediction, drug discovery, and disease diagnosis. We also discuss future directions and opportunities for deep learning in bioinformatics. We aim to provide an overview of deep learning so that bioinformaticians applying deep learning models can consider all critical technical and ethical aspects. Thus, our target audience is biomedical informatics researchers who use deep learning models for inference. This review will inspire more bioinformatics researchers to adopt deep-learning methods for their research questions while considering fairness, potential biases, explainability, and accountability.
Affiliation(s)
- Malik YOUSEF
- Department of Information Systems, Zefat Academic College, Zefat, Israel
| | - Jens ALLMER
- Medical Informatics and Bioinformatics, Institute for Measurement Engineering and Sensor Technology, Hochschule Ruhr West, University of Applied Sciences, Mülheim an der Ruhr, Germany
| |
|
11
|
Golob JL, Oskotsky TT, Tang AS, Roldan A, Chung V, Ha CWY, Wong RJ, Flynn KJ, Parraga-Leo A, Wibrand C, Minot SS, Andreoletti G, Kosti I, Bletz J, Nelson A, Gao J, Wei Z, Chen G, Tang ZZ, Novielli P, Romano D, Pantaleo E, Amoroso N, Monaco A, Vacca M, De Angelis M, Bellotti R, Tangaro S, Kuntzleman A, Bigcraft I, Techtmann S, Bae D, Kim E, Jeon J, Joe S, Theis KR, Ng S, Lee Li YS, Diaz-Gimeno P, Bennett PR, MacIntyre DA, Stolovitzky G, Lynch SV, Albrecht J, Gomez-Lopez N, Romero R, Stevenson DK, Aghaeepour N, Tarca AL, Costello JC, Sirota M. Microbiome Preterm Birth DREAM Challenge: Crowdsourcing Machine Learning Approaches to Advance Preterm Birth Research. medRxiv 2023:2023.03.07.23286920. [PMID: 36945505 PMCID: PMC10029035 DOI: 10.1101/2023.03.07.23286920] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 03/23/2023]
Abstract
Globally, every year about 11% of infants are born preterm, defined as birth prior to 37 weeks of gestation, with significant and lingering health consequences. Multiple studies have related the vaginal microbiome to preterm birth. We present a crowdsourcing approach to predict (a) preterm or (b) early preterm birth from 9 publicly available vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from raw sequences via an open-source tool, MaLiAmPi. We validated the crowdsourced models on novel datasets representing 331 samples from 148 pregnant individuals. From 318 DREAM challenge participants we received 148 and 121 submissions for our two separate prediction sub-challenges, with top-ranking submissions achieving bootstrapped AUROC scores of 0.69 and 0.87, respectively. Alpha diversity, VALENCIA community state types, and composition (via phylotype relative abundance) were important features in the top-performing models, most of which were tree-based methods. This work serves as the foundation for subsequent efforts to translate predictive tests into clinical practice, and to better understand and prevent preterm birth.
Affiliation(s)
- Jonathan L Golob
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
| | - Tomiko T Oskotsky
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| | - Alice S Tang
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| | - Alennie Roldan
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| | | | - Connie W Y Ha
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
| | - Ronald J Wong
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
| | | | - Antonio Parraga-Leo
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| | - Camilla Wibrand
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| | - Samuel S Minot
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
| | - Gaia Andreoletti
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| | - Idit Kosti
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| | | | | | - Jifan Gao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Zhoujingpeng Wei
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Zheng-Zheng Tang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Pierfrancesco Novielli
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Donato Romano
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Ester Pantaleo
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
| | - Nicola Amoroso
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
| | - Alfonso Monaco
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Mirco Vacca
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Maria De Angelis
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Roberto Bellotti
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Sabina Tangaro
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Abigail Kuntzleman
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
| | - Isaac Bigcraft
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Stephen Techtmann
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Daehun Bae
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| | - Eunyoung Kim
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | | | - Soobok Joe
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Kevin R Theis
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
| | - Sherrianne Ng
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
| | - Yun S Lee Li
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Patricia Diaz-Gimeno
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Phillip R Bennett
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - David A MacIntyre
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Gustavo Stolovitzky
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Susan V Lynch
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
| | | | - Nardhy Gomez-Lopez
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Roberto Romero
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - David K Stevenson
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
| | - Nima Aghaeepour
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
| | - Adi L Tarca
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - James C Costello
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Marina Sirota
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| |
|
12
|
Pantaleo E, Monaco A, Amoroso N, Lombardi A, Bellantuono L, Urso D, Lo Giudice C, Picardi E, Tafuri B, Nigro S, Pesole G, Tangaro S, Logroscino G, Bellotti R. A Machine Learning Approach to Parkinson’s Disease Blood Transcriptomics. Genes (Basel) 2022; 13:genes13050727. [PMID: 35627112 PMCID: PMC9141063 DOI: 10.3390/genes13050727] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/12/2022] [Revised: 04/16/2022] [Accepted: 04/18/2022] [Indexed: 12/23/2022]
Abstract
The increased incidence and the significant health burden associated with Parkinson’s disease (PD) have stimulated substantial research efforts towards the identification of effective treatments and diagnostic procedures. Despite technological advancements, a cure is still not available, and PD is often diagnosed long after onset, when irreversible damage has already occurred. Blood transcriptomics represents a potentially disruptive technology for the early diagnosis of PD. We used transcriptome data from the PPMI study, a large cohort study with early PD subjects and age-matched healthy controls (HC), to classify PD vs. HC in around 550 samples. Using a nested feature selection procedure based on Random Forests and XGBoost, we reached an AUC of 72% and found 493 candidate genes. We further discussed the importance of the selected genes through a functional analysis based on Gene Ontology (GO) terms and KEGG pathways.
Affiliation(s)
- Ester Pantaleo
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy
- Dipartimento di Scienze Mediche di Base, Neuroscienze e Organi di Senso, Università degli Studi di Bari Aldo Moro, Piazza G. Cesare 11, 70124 Bari, Italy
- Dipartimento Interateneo di Fisica M. Merlin, Università degli Studi di Bari Aldo Moro, Via G. Amendola 173, 70125 Bari, Italy
| | - Alfonso Monaco
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy
| | - Nicola Amoroso
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Via A. Orabona 4, 70125 Bari, Italy
| | - Angela Lombardi
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy
- Dipartimento Interateneo di Fisica M. Merlin, Università degli Studi di Bari Aldo Moro, Via G. Amendola 173, 70125 Bari, Italy
- Correspondence:
| | - Loredana Bellantuono
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy
- Dipartimento di Scienze Mediche di Base, Neuroscienze e Organi di Senso, Università degli Studi di Bari Aldo Moro, Piazza G. Cesare 11, 70124 Bari, Italy
| | - Daniele Urso
- Centro per le Malattie Neurodegenerative e l’Invecchiamento Cerebrale, Dipartimento di Ricerca Clinica in Neurologia, Università degli Studi di Bari Aldo Moro, Pia Fondazione Cardinale G. Panico, 73039 Tricase, Italy
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, De Crespigny Park, London SE5 8AF, UK
| | - Claudio Lo Giudice
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari Aldo Moro, Via A. Orabona 4, 70125 Bari, Italy
| | - Ernesto Picardi
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari Aldo Moro, Via A. Orabona 4, 70125 Bari, Italy
- Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
| | - Benedetta Tafuri
- Centro per le Malattie Neurodegenerative e l’Invecchiamento Cerebrale, Dipartimento di Ricerca Clinica in Neurologia, Università degli Studi di Bari Aldo Moro, Pia Fondazione Cardinale G. Panico, 73039 Tricase, Italy
| | - Salvatore Nigro
- Centro per le Malattie Neurodegenerative e l’Invecchiamento Cerebrale, Dipartimento di Ricerca Clinica in Neurologia, Università degli Studi di Bari Aldo Moro, Pia Fondazione Cardinale G. Panico, 73039 Tricase, Italy
- Istituto di Nanotecnologia (NANOTEC), Consiglio Nazionale delle Ricerche, Via Monteroni, 73100 Lecce, Italy
| | - Graziano Pesole
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari Aldo Moro, Via A. Orabona 4, 70125 Bari, Italy
- Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
| | - Sabina Tangaro
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy
- Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Via A. Orabona 4, 70125 Bari, Italy
| | - Giancarlo Logroscino
- Dipartimento di Scienze Mediche di Base, Neuroscienze e Organi di Senso, Università degli Studi di Bari Aldo Moro, Piazza G. Cesare 11, 70124 Bari, Italy
- Centro per le Malattie Neurodegenerative e l’Invecchiamento Cerebrale, Dipartimento di Ricerca Clinica in Neurologia, Università degli Studi di Bari Aldo Moro, Pia Fondazione Cardinale G. Panico, 73039 Tricase, Italy
| | - Roberto Bellotti
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy
- Dipartimento Interateneo di Fisica M. Merlin, Università degli Studi di Bari Aldo Moro, Via G. Amendola 173, 70125 Bari, Italy
| |
|