1. de Siqueira Santos S, Yang H, Galeano A, Paccanaro A. Host centric drug repurposing for viral diseases. PLoS Comput Biol 2025; 21:e1012876. [PMID: 40173200 PMCID: PMC12052139 DOI: 10.1371/journal.pcbi.1012876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2024] [Revised: 05/05/2025] [Accepted: 02/14/2025] [Indexed: 04/04/2025]
Abstract
Computational approaches for drug repurposing for viral diseases have mainly focused on a small number of antivirals that directly target pathogens (virus-centric therapies). In this work, we combine ideas from collaborative filtering and network medicine to make predictions for a much larger set of drugs that could be repurposed for host-centric therapies, which aim to interfere with host cell factors required by a pathogen. Our idea is to create matrices quantifying the perturbation that drugs and viruses induce on human protein interaction networks. We then decompose these matrices to learn embeddings of drugs, viruses, and proteins in a low-dimensional space. Predictions of host-centric antivirals are obtained by taking the dot product between the corresponding drug and virus representations. Our approach is general and can be applied systematically to any compound with known targets and any virus whose host proteins are known. We show that our predictions have high accuracy and that the embeddings contain meaningful biological information that may provide insights into the underlying biology of viral infections. Our approach can integrate different types of information, does not rely on known drug-virus associations, and can be applied to new viral diseases and drugs.
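The abstract does not name the decomposition used. As a minimal sketch, assuming a joint truncated SVD of the stacked drug and virus perturbation matrices (a stand-in technique chosen for illustration, not necessarily the authors' factorization), the code below shows how shared protein factors yield drug and virus embeddings whose dot products act as drug-virus scores. The function names and toy matrices are hypothetical.

```python
import numpy as np


def joint_embeddings(drug_pert, virus_pert, k):
    """Embed drugs, viruses and proteins in one shared k-dimensional space.

    drug_pert:  (n_drugs, n_proteins) perturbation scores (hypothetical input)
    virus_pert: (n_viruses, n_proteins) perturbation scores (hypothetical input)
    """
    # Stack both matrices so drugs and viruses share the same protein factors.
    stacked = np.vstack([drug_pert, virus_pert])
    # Truncated SVD: stacked ~= U_k diag(s_k) V_k^T.
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    root_s = np.sqrt(s[:k])
    row_emb = u[:, :k] * root_s        # drug rows first, then virus rows
    protein_emb = vt[:k].T * root_s    # one row per protein
    n_drugs = drug_pert.shape[0]
    return row_emb[:n_drugs], row_emb[n_drugs:], protein_emb


def drug_virus_scores(drug_emb, virus_emb):
    """Dot product between every drug embedding and every virus embedding."""
    return drug_emb @ virus_emb.T


rng = np.random.default_rng(0)
drugs = rng.random((5, 20))    # toy data: 5 drugs x 20 proteins
viruses = rng.random((3, 20))  # toy data: 3 viruses x 20 proteins
d_emb, v_emb, p_emb = joint_embeddings(drugs, viruses, k=4)
scores = drug_virus_scores(d_emb, v_emb)
print(scores.shape)  # (5, 3)
```

Splitting the singular values symmetrically between row and protein factors means the product of the two embedding sets reconstructs the stacked matrix when k equals its rank; smaller k gives the low-dimensional approximation.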
Affiliation(s)
- Haixuan Yang
- School of Mathematical & Statistical Sciences, University of Galway, Galway, Ireland
- Aldo Galeano
- Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, Brazil
- Alberto Paccanaro
- Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, Brazil
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham Hill, Egham, United Kingdom
2. Caniza H, Cáceres JJ, Torres M, Paccanaro A. LanDis: the disease landscape explorer. Eur J Hum Genet 2024; 32:461-465. [PMID: 38200084 PMCID: PMC10999415 DOI: 10.1038/s41431-023-01511-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/13/2023] [Revised: 11/01/2023] [Accepted: 11/23/2023] [Indexed: 01/12/2024]
Abstract
From a network medicine perspective, a disease is the consequence of perturbations on the interactome. These perturbations tend to appear in a specific neighbourhood on the interactome, the disease module, and modules related to phenotypically similar diseases tend to be located in close-by regions. We present LanDis, a freely available web-based interactive tool ( https://paccanarolab.org/landis ) that allows domain experts, medical doctors and the larger scientific community to graphically navigate the interactome distances between the modules of over 44 million pairs of heritable diseases. The map-like interface provides detailed comparisons between pairs of diseases together with supporting evidence. Every disease in LanDis is linked to relevant entries in OMIM and UniProt, providing a starting point for in-depth analysis and an opportunity for novel insight into the aetiology of diseases as well as differential diagnosis.
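The abstract does not define the interactome distance LanDis displays. One widely used measure in network medicine is the network separation of Menche et al. (2015), s_AB = d_AB - (d_AA + d_BB)/2, where d_AA is the mean distance from each gene of module A to its nearest other gene of A and d_AB is the mean nearest-neighbour distance across the two modules. The sketch below computes it with breadth-first search, assuming an unweighted, connected interactome stored as an adjacency dict and modules of at least two genes; whether LanDis uses exactly this measure is an assumption, and the toy graph is hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closest(graph, gene, module, exclude_self=False):
    """Distance from gene to the nearest member of module."""
    dist = bfs_distances(graph, gene)
    return min(dist[g] for g in module
               if g in dist and not (exclude_self and g == gene))


def within(graph, module):
    """d_AA: mean distance from each gene to its nearest other module gene."""
    return sum(closest(graph, g, module, exclude_self=True) for g in module) / len(module)


def between(graph, a, b):
    """d_AB: mean nearest-neighbour distance across the two modules."""
    total = sum(closest(graph, g, b) for g in a) + sum(closest(graph, g, a) for g in b)
    return total / (len(a) + len(b))


def separation(graph, a, b):
    """s_AB > 0 suggests topologically separated modules; s_AB < 0 suggests overlap."""
    return between(graph, a, b) - (within(graph, a) + within(graph, b)) / 2


# Toy interactome: a path 1-2-3-4-5-6 (hypothetical data).
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
print(separation(path, {1, 2}, {5, 6}))  # 2.5 (separated modules)
print(separation(path, {1, 2}, {2, 3}))  # -0.5 (overlapping modules)
```

Running one BFS per gene is simple but quadratic for large interactomes; precomputing all-pairs distances once would be the natural optimisation when comparing millions of disease pairs.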
Affiliation(s)
- Horacio Caniza
- Universidad Paraguayo Alemana de Ciencias Aplicadas, Facultad de Ciencias de la Ingeniería, San Lorenzo, Paraguay
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham, UK
- Juan J Cáceres
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham, UK
- Mateo Torres
- Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, Brazil
- Alberto Paccanaro
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham, UK
- Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, Brazil
3. Ghasemi M, Rahgozar M, Kavousi K. Complex Disease Genes Identification Using a Heterogeneous Network Embedding Approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:875-882. [PMID: 35594221 DOI: 10.1109/tcbb.2022.3175598] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/04/2023]
Abstract
Finding the causal relation between a gene and a disease with experimental approaches is time-consuming and expensive, whereas computational approaches offer a cost-efficient way to identify candidate genes. This article proposes a new heterogeneous biological network embedding approach, named NetEM, to identify disease-associated genes. To evaluate NetEM, we examine six complex diseases: peroxisomal disorders, sarcoma, Graves' disease, lysosomal storage diseases, blood coagulation disorders, and hypertrophic cardiomyopathy. Our experiments indicate that NetEM outperforms three well-known state-of-the-art algorithms, Cardigan, DIAMOnD and GeneWanderer, in identifying disease genes. As further case studies on real data, we apply NetEM to TCGA data on invasive lobular breast cancer and CPTAC data on human glioblastoma; this evaluation further supports the validity of the method. The source code and data for NetEM are available in the supplementary material of this article.
4. Chen Y, Hu Y, Hu X, Feng C, Chen M. CoGO: a contrastive learning framework to predict disease similarity based on gene network and ontology structure. Bioinformatics 2022; 38:4380-4386. [PMID: 35900147 DOI: 10.1093/bioinformatics/btac520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Revised: 06/16/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Quantifying the similarity of human diseases provides insights that can guide the discovery of micro-scale mechanisms from a macro-scale perspective. Previous work demonstrated that better performance can be gained by integrating multiview data sources or applying machine learning techniques. However, designing an efficient framework that uses deep learning models to extract and incorporate information from different biological data sources remains unexplored. RESULTS We present CoGO, a Contrastive learning framework to predict disease similarity based on Gene network and Ontology structure, which incorporates the gene interaction network and Gene Ontology (GO) domain knowledge using graph deep learning models. First, graph deep learning models are applied to encode the features of genes and GO terms from separate graph-structured data. Next, gene and GO features are projected into a common embedding space via a nonlinear projection. Then, a cross-view contrastive loss is applied to maximize the agreement of corresponding gene-GO associations, leading to meaningful gene representations. Finally, CoGO infers the similarity between diseases as the cosine similarity of disease representation vectors derived from the embeddings of related genes. In our experiments, CoGO outperforms the most competitive baseline method on both AUROC and AUPRC, notably improving AUPRC by 19.57% (0.7733). The predictions show significant agreement with other disease similarity studies, which supports their credibility. Furthermore, we conduct a detailed case study of the top similar disease pairs, whose similarity is supported by other studies. Empirical results show that CoGO performs strongly on the disease similarity problem. AVAILABILITY AND IMPLEMENTATION https://github.com/yhchen1123/CoGO.
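The final step of CoGO, cosine similarity between disease vectors built from gene embeddings, can be sketched as follows. The abstract does not say how gene embeddings are aggregated into a disease vector; averaging them is an assumption made here for illustration, and the embeddings and gene names are hypothetical (in CoGO they would come from the trained graph encoders).

```python
import numpy as np


def disease_vector(gene_emb, genes):
    """Disease representation: mean of its genes' embeddings (an assumption)."""
    return np.mean([gene_emb[g] for g in genes], axis=0)


def disease_similarity(gene_emb, genes_a, genes_b):
    """Cosine similarity between two disease representation vectors."""
    va = disease_vector(gene_emb, genes_a)
    vb = disease_vector(gene_emb, genes_b)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))


# Hypothetical 2-dimensional gene embeddings.
emb = {"G1": np.array([1.0, 0.0]), "G2": np.array([0.0, 1.0]), "G3": np.array([1.0, 1.0])}
print(disease_similarity(emb, ["G1", "G2"], ["G3"]))  # approximately 1.0 (same direction)
print(disease_similarity(emb, ["G1"], ["G2"]))        # 0.0 (orthogonal)
```

Cosine similarity ignores vector length, so a disease with many associated genes is not automatically judged more similar to others than one with few.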
Affiliation(s)
- Yuhao Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
- Yanshi Hu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
- Xiaotian Hu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
- Cong Feng
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
- Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China; Biomedical Big Data Center, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China; Institute of Hematology, Zhejiang University, Hangzhou, 310058, China
5. Network-Based Methods for Approaching Human Pathologies from a Phenotypic Point of View. Genes (Basel) 2022; 13:genes13061081. [PMID: 35741843 PMCID: PMC9222217 DOI: 10.3390/genes13061081] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/13/2022] [Revised: 06/10/2022] [Accepted: 06/14/2022] [Indexed: 01/27/2023]
Abstract
Network and systemic approaches to studying human pathologies are helping us to gain insight into the molecular mechanisms of, and potential therapeutic interventions for, human diseases, especially complex diseases in which large numbers of genes are involved. The complex human pathological landscape is traditionally partitioned into discrete "diseases"; however, that partition is sometimes problematic, as diseases are highly heterogeneous and can differ greatly from one patient to another. Moreover, for many pathological states, the set of symptoms (phenotypes) manifested by the patient is not enough to diagnose a particular disease. By contrast, phenotypes are, by definition, directly observable and can be closer to the molecular basis of the pathology. These clinical phenotypes are also important for personalised medicine, as they can help stratify patients and design personalised interventions. For these reasons, network and systemic approaches to pathologies are gradually incorporating phenotypic information. This review covers the current landscape of phenotype-centred network approaches to study different aspects of human diseases.
6. Mancuso CA, Bills PS, Krum D, Newsted J, Liu R, Krishnan A. GenePlexus: a web-server for gene discovery using network-based machine learning. Nucleic Acids Res 2022; 50:W358-W366. [PMID: 35580053 PMCID: PMC9252732 DOI: 10.1093/nar/gkac335] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/08/2022] [Revised: 04/13/2022] [Accepted: 04/30/2022] [Indexed: 11/28/2022]
Abstract
Biomedical researchers take advantage of high-throughput, high-coverage technologies to routinely generate sets of genes of interest across a wide range of biological conditions. Although these technologies have directly shed light on the molecular underpinnings of various biological processes and diseases, the list of genes from any individual experiment is often noisy and incomplete. Additionally, it can be challenging to interpret how the genes in these lists relate to each other and to other genes in the genome. In this work, we present GenePlexus (https://www.geneplexus.net/), a web-server that allows a researcher to use a powerful, network-based machine learning method to gain insights into their gene set of interest and to find additional functionally similar genes. Once a user uploads their own set of human genes and chooses among several human network representations, GenePlexus predicts how strongly each gene in the network is associated with the input set. The web-server also provides interpretability through network visualization and comparison to other machine learning models trained on thousands of known process/pathway and disease gene sets. GenePlexus is free and open to all users without the need for registration.
Affiliation(s)
- Christopher A Mancuso
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Patrick S Bills
- Data Management and Analytics, IT Services, Michigan State University, East Lansing, MI 48824, USA
- Douglas Krum
- Data Management and Analytics, IT Services, Michigan State University, East Lansing, MI 48824, USA
- Jacob Newsted
- Data Management and Analytics, IT Services, Michigan State University, East Lansing, MI 48824, USA
- Renming Liu
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Arjun Krishnan
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
7. Ji Y, Chen R, Wang Q, Wei Q, Tao R, Li B. A Bayesian framework to integrate multi-level genome-scale data for Autism risk gene prioritization. BMC Bioinformatics 2022; 23:146. [PMID: 35459094 PMCID: PMC9034518 DOI: 10.1186/s12859-022-04616-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2021] [Accepted: 02/15/2022] [Indexed: 12/03/2022]
Abstract
Background Autism spectrum disorder (ASD) is a group of complex neurodevelopmental disorders with a strong genetic basis. Large-scale sequencing studies have identified over one hundred ASD risk genes. Nevertheless, the vast majority of ASD risk genes remain to be discovered, as more than 1000 genes are estimated to be involved in ASD risk. Prioritization of risk genes is an effective strategy to increase the power of identifying novel risk genes in genetic studies of ASD. As ASD risk genes are likely to exhibit distinctive properties from multiple angles, we reason that integrating multiple levels of genomic data is a powerful approach to pinpoint genuine ASD risk genes. Results We present BNScore, a Bayesian model selection framework to probabilistically prioritize ASD risk genes by explicitly integrating evidence from sequencing-identified ASD genes, biological annotations, and a gene functional network. We demonstrate the validity of our approach and its improved performance over existing methods by examining the resulting top candidate ASD risk genes against sets of high-confidence benchmark genes and large-scale ASD genome-wide association studies. We assess the tissue-, cell type- and developmental stage-specific expression properties of the top prioritized genes, and find strong expression specificity in brain tissues, striatal medium spiny neurons, and fetal developmental stages. Conclusions In summary, we show that by integrating sequencing findings, functional annotation profiles, and a gene-gene functional network, our proposed BNScore performs competitively with current state-of-the-art methods in prioritizing ASD genes. Our method offers a general and flexible strategy for risk gene prioritization that could also be applied to other complex traits. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04616-y.
Affiliation(s)
- Ying Ji
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37212, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA
- Rui Chen
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37212, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA
- Quan Wang
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37212, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA
- Qiang Wei
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37212, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA
- Ran Tao
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA; Department of Biostatistics, Vanderbilt University, Nashville, TN, 37212, USA
- Bingshan Li
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37212, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA
8. Xiang J, Meng X, Zhao Y, Wu FX, Li M. HyMM: hybrid method for disease-gene prediction by integrating multiscale module structure. Brief Bioinform 2022; 23:6547263. [PMID: 35275996 DOI: 10.1093/bib/bbac072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 01/18/2022] [Accepted: 02/13/2022] [Indexed: 11/12/2022]
Abstract
MOTIVATION Identifying disease-related genes is an important issue in computational biology. Module structure is widespread in biomolecular networks, and complex diseases are usually thought to be caused by perturbations of local neighborhoods in these networks, which can provide useful insights for the study of disease-related genes. However, mining and effectively using module structure remains challenging in problems such as disease-gene prediction. RESULTS We propose a hybrid disease-gene prediction method integrating multiscale module structure (HyMM), which uses multiscale information, from local to global structure, to predict disease-related genes more effectively. HyMM extracts module partitions from local to global scales by multiscale modularity optimization with exponential sampling, and estimates the disease relatedness of genes in each partition by the abundance of disease-related genes within their modules. Then, a probabilistic model for integrating gene rankings is designed to combine the multiple predictions derived from multiscale module partitions and network propagation, and a parameter estimation strategy based on functional information is proposed to further enhance HyMM's predictive power. Through a series of experiments, we reveal the importance of module partitions at different scales, show that HyMM performs stably and well compared with eight other state-of-the-art methods, and show that parameter estimation further improves its performance. CONCLUSIONS The results confirm that HyMM is an effective framework for integrating multiscale module structure to enhance the prediction of disease-related genes, and it may provide useful insights into multiscale module structure and its application to problems such as disease-gene prediction.
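The module-abundance step described above can be sketched in a few lines: within one partition, each gene is scored by the fraction of known disease genes in its module, and scores from several scales are then combined. The sketch assumes partitions are given as gene-to-module dicts (the multiscale modularity optimization that produces them is not shown) and combines scales with a plain average, which is a simple stand-in and not HyMM's probabilistic integration model. All gene names and partitions are hypothetical.

```python
from collections import defaultdict


def module_abundance(partition, disease_genes):
    """Score each gene by the fraction of known disease genes in its module.

    partition: dict gene -> module id (one scale of a multiscale partition)
    """
    members = defaultdict(set)
    for gene, module in partition.items():
        members[module].add(gene)
    return {gene: len(members[m] & disease_genes) / len(members[m])
            for gene, m in partition.items()}


def combine_scales(partitions, disease_genes):
    """Plain average over scales (a stand-in, not HyMM's probabilistic integration)."""
    totals = defaultdict(float)
    for partition in partitions:
        for gene, score in module_abundance(partition, disease_genes).items():
            totals[gene] += score
    return {gene: total / len(partitions) for gene, total in totals.items()}


fine = {"A": 0, "B": 0, "C": 1, "D": 1}    # local scale (hypothetical)
coarse = {"A": 0, "B": 0, "C": 0, "D": 0}  # global scale (hypothetical)
print(combine_scales([fine, coarse], {"A"}))
# {'A': 0.375, 'B': 0.375, 'C': 0.125, 'D': 0.125}
```

The toy output shows why multiple scales help: at the fine scale, C and D get no credit at all, while the coarse scale still assigns them a small, non-zero relatedness to the disease.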
Affiliation(s)
- Ju Xiang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China; Department of Basic Medical Sciences & Academician Workstation, Changsha Medical University, Changsha, Hunan 410219, China
- Xiangmao Meng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yichao Zhao
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Fang-Xiang Wu
- Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
- Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
9. Santos SDS, Torres M, Galeano D, Sánchez MDM, Cernuzzi L, Paccanaro A. Machine learning and network medicine approaches for drug repositioning for COVID-19. Patterns (New York, N.Y.) 2022; 3:100396. [PMID: 34778851 PMCID: PMC8576113 DOI: 10.1016/j.patter.2021.100396] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/18/2021] [Revised: 06/21/2021] [Accepted: 11/01/2021] [Indexed: 12/13/2022]
Abstract
We present two machine learning approaches for drug repurposing. While we have developed them for COVID-19, they are disease-agnostic. The two methodologies are complementary, targeting SARS-CoV-2 and host factors, respectively. Our first approach consists of a matrix factorization algorithm to rank broad-spectrum antivirals. Our second approach, based on network medicine, uses graph kernels to rank drugs according to the perturbation they induce on a subnetwork of the human interactome that is crucial for SARS-CoV-2 infection/replication. Our experiments show that our top predicted broad-spectrum antivirals include drugs indicated for compassionate use in COVID-19 patients, and that the ranking obtained by our kernel-based approach aligns with experimental data. Finally, we present the COVID-19 repositioning explorer (CoREx), an interactive online tool to explore the interplay between drugs and SARS-CoV-2 host proteins in the context of biological networks, protein function, drug clinical use, and Connectivity Map. CoREx is freely available at: https://paccanarolab.org/corex/.
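The abstract says drugs are ranked with graph kernels but does not specify which kernel or scoring rule. A minimal sketch, assuming a heat-diffusion kernel K = expm(-beta * L) on the interactome Laplacian L (a standard graph kernel chosen for illustration) and a hypothetical score that sums kernel values between a drug's targets and the host-factor proteins, shows how such a ranking could be computed. The toy network, drug names and beta value are all hypothetical.

```python
import numpy as np


def diffusion_kernel(adjacency, beta):
    """Heat-diffusion kernel K = expm(-beta * L), with L the graph Laplacian."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # L is symmetric
    return eigvecs @ np.diag(np.exp(-beta * eigvals)) @ eigvecs.T


def rank_drugs(kernel, drug_targets, host_proteins):
    """Rank drugs by total kernel similarity between their targets and host proteins."""
    scores = {drug: float(kernel[np.ix_(targets, host_proteins)].sum())
              for drug, targets in drug_targets.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy interactome: a path 0-1-2-3-4; protein 0 is a host factor (hypothetical).
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
K = diffusion_kernel(adj, beta=0.5)
print(rank_drugs(K, {"far_drug": [4], "near_drug": [1]}, host_proteins=[0]))
# ['near_drug', 'far_drug']
```

Computing the matrix exponential through an eigendecomposition works because the Laplacian of an undirected graph is symmetric; larger beta lets the perturbation spread further from each drug target before it is scored.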
Affiliation(s)
- Suzana de Siqueira Santos
- Escola de Matemática Aplicada, Fundação Getulio Vargas, Rio de Janeiro 22250-900, Brazil
- COVID-19 International Research Team
- Mateo Torres
- Escola de Matemática Aplicada, Fundação Getulio Vargas, Rio de Janeiro 22250-900, Brazil
- COVID-19 International Research Team
- Diego Galeano
- Escola de Matemática Aplicada, Fundação Getulio Vargas, Rio de Janeiro 22250-900, Brazil
- Facultad de Ingenieria, Universidad Nacional de Asunción, Luque 110948, Paraguay
- COVID-19 International Research Team
- Luca Cernuzzi
- Universidad Católica “Nuestra Señora de la Asunción”, Asunción C.C. 1683, Paraguay
- Alberto Paccanaro
- Escola de Matemática Aplicada, Fundação Getulio Vargas, Rio de Janeiro 22250-900, Brazil
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham Hill, Egham TW20 0EX, UK
- COVID-19 International Research Team
10. Wang W, Han R, Zhang M, Wang Y, Wang T, Wang Y, Shang X, Peng J. A network-based method for brain disease gene prediction by integrating brain connectome and molecular network. Brief Bioinform 2021; 23:6415315. [PMID: 34727570 DOI: 10.1093/bib/bbab459] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/30/2021] [Revised: 09/18/2021] [Accepted: 10/07/2021] [Indexed: 12/27/2022]
Abstract
Brain disease gene identification is critical for revealing the biological mechanisms of brain diseases and developing drugs for them. To enhance the identification of brain disease genes, similarity-based computational methods, especially network-based methods, have been adopted to narrow down the search space. However, these network-based methods use only molecular networks, ignoring brain connectome data, which have been widely used in many brain-related studies. In our study, we propose a novel framework, named brainMI, for integrating brain connectome data and molecular-based gene association networks to predict brain disease genes. To represent molecular-based network data and brain connectome data consistently, brainMI first constructs a novel gene network, called the brain functional connectivity (BFC)-based gene network, from resting-state functional magnetic resonance imaging data and brain region-specific gene expression data. Then, a multiple network integration method is proposed to learn low-dimensional features of genes by integrating the BFC-based gene network and existing protein-protein interaction networks. Finally, these features are used to predict brain disease genes with a support vector machine-based model. We evaluate brainMI on four brain diseases: Alzheimer's disease, Parkinson's disease, major depressive disorder and autism. Using the BFC-based gene network alone, brainMI achieves scores of 0.761, 0.729, 0.728 and 0.744, respectively, and it improves the performance of the molecular network-based approach by 6.3% on average. In addition, the results show that brainMI achieves higher performance in predicting brain disease genes than three existing state-of-the-art methods.
Affiliation(s)
- Wei Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Ruijiang Han
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Menghan Zhang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Yuxian Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Tao Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Yongtian Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
11. Petti M, Farina L, Francone F, Lucidi S, Macali A, Palagi L, De Santis M. MOSES: A New Approach to Integrate Interactome Topology and Functional Features for Disease Gene Prediction. Genes (Basel) 2021; 12:1713. [PMID: 34828319 PMCID: PMC8624742 DOI: 10.3390/genes12111713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Revised: 10/16/2021] [Accepted: 10/25/2021] [Indexed: 11/17/2022]
Abstract
Disease gene prediction is to date one of the main computational challenges of precision medicine. It is still uncertain whether disease genes have unique functional properties that distinguish them from non-disease genes or, from a network perspective, whether they are located randomly in the interactome or show specific patterns in the network topology. In this study, we propose a new method for disease gene prediction based on biological knowledge bases (gene-disease associations, gene functional annotations, etc.) and interactome network topology. The proposed algorithm, called MOSES, is based on the definition of two somewhat opposing sets of genes, both disease-specific but from different perspectives: warm seeds (i.e., disease genes obtained from databases) and cold seeds (genes that are far from the disease genes on the interactome and not involved in their biological functions). Applied to a set of 40 diseases, MOSES suggested putative disease genes that are significantly enriched in their reference disease. Reassuringly, known and predicted disease genes together tend to form a connected network module on the human interactome, mitigating the scattered distribution of disease genes, which is probably due to both the paucity of known disease-gene associations and the incompleteness of the interactome.
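The cold-seed definition above (genes far from the warm seeds on the interactome and not involved in their biological functions) can be illustrated with a multi-source breadth-first search. This is a minimal sketch under stated assumptions: the distance threshold `min_distance` and the set `functionally_related` are hypothetical stand-ins for MOSES's actual criteria, which the abstract does not detail, and the toy graph is hypothetical.

```python
from collections import deque


def distance_to_seeds(graph, seeds):
    """Multi-source BFS: hop distance from every reachable gene to the nearest seed."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def select_cold_seeds(graph, warm_seeds, min_distance, functionally_related=frozenset()):
    """Genes at least min_distance hops from every warm seed and not functionally related.

    Unreachable genes count as infinitely far. min_distance and
    functionally_related are hypothetical stand-ins for MOSES's criteria.
    """
    dist = distance_to_seeds(graph, warm_seeds)
    return {g for g in graph
            if dist.get(g, float("inf")) >= min_distance and g not in functionally_related}


path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
print(sorted(select_cold_seeds(path, {1}, min_distance=3)))  # [4, 5, 6]
print(sorted(select_cold_seeds(path, {1}, min_distance=3, functionally_related={5})))  # [4, 6]
```

A single multi-source BFS gives each gene's distance to its nearest warm seed in time linear in the size of the network, which is cheaper than running one search per warm seed.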
Collapse
Affiliation(s)
- Manuela Petti
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, 00185 Rome, Italy; (L.F.); (F.F.); (S.L.); (A.M.); (L.P.); (M.D.S.)
12. Huang K, Xiao C, Glass LM, Critchlow CW, Gibson G, Sun J. Machine learning applications for therapeutic tasks with genomics data. Patterns (New York, N.Y.) 2021; 2:100328. [PMID: 34693370 PMCID: PMC8515011 DOI: 10.1016/j.patter.2021.100328] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 12/29/2022]
Abstract
Thanks to the increasing availability of genomics and other biomedical data, many machine learning algorithms have been proposed for a wide range of therapeutic discovery and development tasks. In this survey, we review the literature on machine learning applications for genomics through the lens of therapeutic development. We investigate the interplay among genomics, compounds, proteins, electronic health records, cellular images, and clinical texts. We identify 22 applications of machine learning in genomics that span the whole therapeutics pipeline, from discovering novel targets, personalizing medicine and developing gene-editing tools, all the way to facilitating clinical trials and post-market studies. We also pinpoint seven key challenges in this field with potential for expansion and impact. This survey examines recent research at the intersection of machine learning, genomics, and therapeutic development.
Affiliation(s)
- Kexin Huang
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Cao Xiao
- Amplitude, San Francisco, CA 94105, USA
- Lucas M. Glass
- Analytics Center of Excellence, IQVIA, Cambridge, MA 02139, USA
- Greg Gibson
- Center for Integrative Genomics, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Jimeng Sun
- Computer Science Department and Carle's Illinois College of Medicine, University of Illinois at Urbana-Champaign, Urbana, IL 61820, USA
13. Habib N, Rahman MM. Diagnosis of corona diseases from associated genes and X-ray images using machine learning algorithms and deep CNN. Informatics in Medicine Unlocked 2021; 24:100621. [PMID: 34075341 PMCID: PMC8159714 DOI: 10.1016/j.imu.2021.100621] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/26/2020] [Revised: 05/18/2021] [Accepted: 05/24/2021] [Indexed: 01/15/2023]
Abstract
The novel coronavirus is highly transmissible and is spreading rapidly, endangering millions of human lives and the global economy. To break the chain of transmission and curb its spread, early and effective diagnosis of infected patients is immensely important. Unfortunately, many countries lack sufficient testing equipment relative to the number of infected patients. A swift diagnosis that identifies COVID-19 from disease genes or from CT or X-ray images would therefore be desirable. COVID-19 causes flu-like symptoms, cough, pneumonia and lung infection, and massive alveolar damage and progressive respiratory failure can lead to death. This paper proposes two detection methods. The first is a gene-based screening method that detects coronavirus diseases (Middle East respiratory syndrome-related coronavirus, severe acute respiratory syndrome coronavirus 2, and human coronavirus HKU1) and differentiates them from pneumonia. This novel approach uses disease genes to build functional semantic similarity among genes. Several machine learning algorithms (eXtreme Gradient Boosting, Naïve Bayes, Regularized Random Forest, Random Forest Rule-Based Model, Random Ferns, C5.0 and Multi-Layer Perceptron) are trained and tested on the semantic similarities to classify coronavirus diseases and pneumonia. The best-performing models are then ensembled, yielding an accuracy of nearly 93%. The second is an automated COVID-19 diagnostic method that applies a deep CNN to chest X-ray images to classify normal versus COVID-19 and pneumonia versus COVID-19 images, achieving 99.87% and 99.48% test accuracy, respectively. This research may thus help provide better treatment against COVID-19.
Affiliation(s)
- Nahida Habib
- Department of Computer Science and Engineering (CSE), Mawlana Bhashani Science and Technology University (MBSTU), Santosh, Tangail, 1902, Bangladesh
- Department of Computer Science and Engineering (CSE), Ranada Prasad Shaha University (RPSU), Narayanganj, 1400, Bangladesh
- Mohammad Motiur Rahman
- Department of Computer Science and Engineering (CSE), Mawlana Bhashani Science and Technology University (MBSTU), Santosh, Tangail, 1902, Bangladesh

14
Xiang J, Zhang J, Zheng R, Li X, Li M. NIDM: network impulsive dynamics on multiplex biological network for disease-gene prediction. Brief Bioinform 2021; 22:6236070. [PMID: 33866352 DOI: 10.1093/bib/bbab080] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/12/2021] [Revised: 02/11/2021] [Accepted: 02/21/2021] [Indexed: 12/12/2022]
Abstract
Computational prediction of disease-related genes is important for studying diseases, because biological experiments are costly and time-consuming. Network propagation is a popular strategy for disease-gene prediction. However, existing methods focus on the stable solution of the dynamics while ignoring the useful information hidden in the dynamical process, and it remains a challenge to exploit multiple types of physical and functional relationships between proteins or genes to predict disease-related genes effectively. We therefore proposed a framework of network impulsive dynamics on multiplex biological networks (NIDM) to predict disease-related genes, together with four variants of NIDM models and four kinds of impulsive dynamical signatures (IDSs). NIDM identifies disease-related genes by mining the dynamical responses of nodes to impulsive signals exerted at specific nodes. Through a series of experimental evaluations on various types of biological networks, we confirmed the advantage of multiplex networks and the important role of functional associations in disease-gene prediction, demonstrated the superior performance of NIDM compared with four types of network-based algorithms, and gave recommendations on effective NIDM models and IDSs. To facilitate the prioritization and analysis of (candidate) genes associated with specific diseases, we developed a user-friendly web server that provides three filtering patterns for genes, network visualization, enrichment analysis and many external links (http://bioinformatics.csu.edu.cn/DGP/NID.jsp). NIDM is a protocol for disease-gene prediction that integrates different types of biological networks and may become a very useful computational tool for studying disease-related genes.
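The abstract contrasts the stationary solution of network propagation with the information carried by the transient dynamics. As a generic illustration only (this is not the NIDM model, its multiplex formulation, or its IDS signatures), the Python sketch below places a unit impulse on seed genes, runs a lazy random walk on a single toy network, and records two hypothetical per-node features of the transient response: first arrival time and peak signal. The function names, the laziness value of 0.5 and the toy path graph are assumptions made for the example.

```python
import numpy as np


def impulse_responses(adj, seeds, steps=10, laziness=0.5):
    """Spread a unit impulse from `seeds` with a lazy random walk.

    Returns a (steps + 1, n) array whose row t holds every node's signal
    after t steps (row 0 is the impulse itself). Assumes no isolated nodes,
    so the total signal stays equal to 1.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=0)
    # Column j spreads node j's signal evenly over its neighbours.
    walk = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
    # Lazy walk: keep part of the signal in place to damp oscillations.
    transition = laziness * np.eye(n) + (1.0 - laziness) * walk
    x = np.zeros(n)
    x[list(seeds)] = 1.0 / len(seeds)
    history = [x.copy()]
    for _ in range(steps):
        x = transition @ x
        history.append(x.copy())
    return np.array(history)


def transient_features(history, tol=1e-12):
    """Hypothetical per-node summaries of the transient (not stationary) response:
    first step at which the signal exceeds `tol` (-1 if never) and peak signal."""
    reached = history > tol
    first_arrival = np.where(reached.any(axis=0), reached.argmax(axis=0), -1)
    peak = history.max(axis=0)
    return first_arrival, peak


# Toy example: a 4-node path 0-1-2-3 with the impulse on node 0.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
hist = impulse_responses(path, seeds=[0], steps=6)
arrival, peak = transient_features(hist)
print(arrival.tolist())  # [0, 1, 2, 3]
```

Arrival time grows with hop distance from the seed, which is one kind of signal a stationary propagation score averages away.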
Affiliation(s)
- Ju Xiang
- School of Computer Science and Engineering, Central South University, Hunan, China
- Jiashuai Zhang
- School of Computer Science and Engineering, Central South University, Hunan, China
- Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, China
- Xingyi Li
- School of Computer Science and Engineering, Central South University, Changsha, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, China

15
Liu R, Mancuso CA, Yannakopoulos A, Johnson KA, Krishnan A. Supervised learning is an accurate method for network-based gene classification. Bioinformatics 2020; 36:3457-3465. [PMID: 32129827 PMCID: PMC7267831 DOI: 10.1093/bioinformatics/btaa150] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/26/2019] [Revised: 12/01/2019] [Accepted: 02/27/2020] [Indexed: 12/22/2022]
Abstract
Background: Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. Despite being a popular machine-learning technique across fields, supervised learning has been applied in only a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem. Results: In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene’s full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation’s appeal for naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows. Availability and implementation: The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available.
Contact: arjun@msu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
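To make the comparison in this abstract concrete, here is a minimal sketch under stated assumptions: "supervised learning on a gene's full network connectivity" is taken to mean using each gene's adjacency row as its feature vector for an L2-regularised logistic regression, and random walk with restart stands in for the label propagation baseline. The toy two-clique network, the hyperparameters and the function names are hypothetical and are not taken from the authors' released code.

```python
import numpy as np


def rwr_scores(adj, positives, restart=0.5, iters=200):
    """Label propagation baseline: random walk with restart from the positives."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=0)
    walk = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
    p0 = np.zeros(adj.shape[0])
    p0[list(positives)] = 1.0 / len(positives)
    p = p0.copy()
    for _ in range(iters):
        p = (1.0 - restart) * (walk @ p) + restart * p0
    return p


def fit_logistic(x, y, lr=0.5, l2=1e-3, epochs=500):
    """L2-regularised logistic regression trained by batch gradient descent."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        err = p - y
        w -= lr * (x.T @ err / len(y) + l2 * w)
        b -= lr * err.mean()
    return w, b


def predict_proba(x, w, b):
    return 1.0 / (1.0 + np.exp(-(np.asarray(x, dtype=float) @ w + b)))


# Toy network: two 4-gene cliques {0..3} and {4..7} joined by the edge 3-4.
adj = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                adj[i, j] = 1.0
adj[3, 4] = adj[4, 3] = 1.0

train = [0, 1, 2, 5, 6, 7]   # labelled genes
labels = [1, 1, 1, 0, 0, 0]  # 1 = annotated to the function of interest
held_out = [3, 4]

# Supervised learning: each gene's features are its full adjacency row.
w, b = fit_logistic(adj[train], labels)
sl = predict_proba(adj[held_out], w, b)

# Label propagation from the labelled positives only.
lp = rwr_scores(adj, positives=[0, 1, 2])

print(sl[0] > sl[1], lp[3] > lp[4])  # True True
```

Both methods rank the held-out gene 3 above gene 4 here; the design difference is that the supervised model learns a weight per neighbour and can also use labelled negatives, whereas propagation only diffuses positive labels along edges.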
Affiliation(s)
- Renming Liu
- Department of Computational Mathematics, Science and Engineering
- Kayla A Johnson
- Department of Computational Mathematics, Science and Engineering
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Arjun Krishnan
- Department of Computational Mathematics, Science and Engineering
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- To whom correspondence should be addressed.

16
Galeano D, Li S, Gerstein M, Paccanaro A. Predicting the frequencies of drug side effects. Nat Commun 2020; 11:4575. [PMID: 32917868 PMCID: PMC7486409 DOI: 10.1038/s41467-020-18305-y] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 08/07/2019] [Accepted: 08/07/2020] [Indexed: 12/25/2022]
Abstract
A central issue in drug risk-benefit assessment is identifying frequencies of side effects in humans. Currently, frequencies are experimentally determined in randomised controlled clinical trials. We present a machine learning framework for computationally predicting frequencies of drug side effects. Our matrix decomposition algorithm learns latent signatures of drugs and side effects that are both reproducible and biologically interpretable. We show the usefulness of our approach on 759 structurally and therapeutically diverse drugs and 994 side effects from all human physiological systems. Our approach can be applied to any drug for which a small number of side effect frequencies have been identified, in order to predict the frequencies of further, yet unidentified, side effects. We show that our model is informative of the biology underlying drug activity: individual components of the drug signatures are related to the distinct anatomical categories of the drugs and to the specific drug routes of administration.
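The abstract does not give the objective of the matrix decomposition. As a hedged illustration of the general idea of learning latent drug and side-effect signatures from a partially observed matrix, the sketch below fits a plain masked non-negative matrix factorisation with weighted multiplicative updates. This is a swapped-in, generic technique, not the authors' model, and the toy matrix, rank and iteration count are assumptions.

```python
import numpy as np


def masked_nmf(r, mask, rank=2, iters=500, seed=0, eps=1e-9):
    """Non-negative factorisation r ~ w @ h fitted only where mask == 1.

    Rows of `w` act as latent drug signatures and columns of `h` as latent
    side-effect signatures. Uses weighted multiplicative updates.
    """
    r = np.asarray(r, dtype=float)
    m = np.asarray(mask, dtype=float)
    rng = np.random.default_rng(seed)
    w = rng.random((r.shape[0], rank)) + 0.1
    h = rng.random((rank, r.shape[1])) + 0.1
    mr = m * r  # unobserved entries contribute nothing to the fit
    for _ in range(iters):
        w *= (mr @ h.T) / ((m * (w @ h)) @ h.T + eps)
        h *= (w.T @ mr) / (w.T @ (m * (w @ h)) + eps)
    return w, h


def masked_sq_error(r, mask, w, h):
    """Squared reconstruction error on the observed entries only."""
    m = np.asarray(mask, dtype=float)
    return float(np.sum(m * (np.asarray(r, dtype=float) - w @ h) ** 2))


# Toy drug x side-effect frequency matrix, rank 2 by construction.
true_w = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]])
true_h = np.array([[3.0, 1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 4.0, 1.0, 2.0]])
r = true_w @ true_h
mask = np.ones_like(r)
mask[0, 2] = mask[2, 0] = mask[3, 4] = 0  # hide three entries

w0, h0 = masked_nmf(r, mask, iters=0)     # random initialisation only
w, h = masked_nmf(r, mask, iters=500)
predicted = (w @ h)[mask == 0]            # scores for the hidden entries
print(masked_sq_error(r, mask, w, h) < masked_sq_error(r, mask, w0, h0))  # True
```

Non-negativity is what keeps each signature component additive, which is one reason such factors can be compared with drug categories or routes of administration.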
Affiliation(s)
- Diego Galeano
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham Hill, Egham, UK; School of Applied Mathematics, Fundação Getulio Vargas, Rio de Janeiro, Brazil
- Shantao Li
- Department of Computer Science and Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Department of Computer Science, and Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
- Alberto Paccanaro
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham Hill, Egham, UK; School of Applied Mathematics, Fundação Getulio Vargas, Rio de Janeiro, Brazil

17
Liu X, He T, Guo Z, Ren M, Luo Y. Predicting essential genes of 41 prokaryotes by a semi-supervised method. Anal Biochem 2020; 609:113919. [PMID: 32827465 DOI: 10.1016/j.ab.2020.113919] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/13/2020] [Revised: 07/25/2020] [Accepted: 08/13/2020] [Indexed: 10/23/2022]
Abstract
Essential genes are vitally important to the survival and reproduction of organisms. Many machine learning methods have been employed to predict essential genes, with satisfactory results. However, most of these methods are supervised and may not perform well when labeled data are insufficient. In this paper, we proposed a classifier based on learning with local and global consistency (LGC) and used it to predict the essential genes of 41 prokaryotes. LGC is a graph-based semi-supervised learning method that can build a prediction model from limited label and constraint information. The classifier was evaluated with intra-organism prediction and leave-one-species-out validation. In intra-organism prediction, the average AUC over the 41 organisms was 0.723 when the ratio of labeled samples was 0.5. These results indicate that the proposed method achieves acceptable prediction performance with limited labeled data and generalizes well across organisms.
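LGC refers to the graph-based semi-supervised method of Zhou et al., whose closed-form solution is F* = (I - αS)^-1 Y with S = D^-1/2 W D^-1/2, where W is the graph weight matrix, D its degree matrix and Y the one-hot label matrix (all-zero rows for unlabelled nodes). The Python sketch below implements that formula on a hypothetical six-gene graph with one labelled essential gene and one labelled non-essential gene. The graph, the labels and α = 0.99 are illustrative assumptions, not the features or settings used for the 41 prokaryotes.

```python
import numpy as np


def lgc(weights, labels, n_classes, alpha=0.99):
    """Learning with local and global consistency (closed-form solution).

    labels[i] is a class index for labelled nodes and -1 for unlabelled ones.
    Returns predicted classes and the score matrix F* = (I - alpha S)^-1 Y.
    """
    w = np.array(weights, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(w, 0.0)            # no self-loops
    d = w.sum(axis=1)
    d_inv_sqrt = np.divide(1.0, np.sqrt(d), out=np.zeros_like(d), where=d > 0)
    s = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]  # S = D^-1/2 W D^-1/2
    n = w.shape[0]
    y = np.zeros((n, n_classes))
    for i, c in enumerate(labels):
        if c >= 0:
            y[i, c] = 1.0
    f = np.linalg.solve(np.eye(n) - alpha * s, y)
    return f.argmax(axis=1), f


# Two gene triangles {0,1,2} and {3,4,5} joined by a weak edge 2-3.
g = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    g[i, j] = g[j, i] = 1.0
g[2, 3] = g[3, 2] = 0.1

labels = [1, -1, -1, -1, -1, 0]  # 1 = essential, 0 = non-essential, -1 = unknown
pred, f = lgc(g, labels, n_classes=2)
print(pred.tolist())  # [1, 1, 1, 0, 0, 0]
```

Solving the linear system directly is fine at this size; for large graphs the equivalent iteration F(t+1) = αSF(t) + (1 - α)Y avoids forming a dense inverse and yields the same ranking of classes.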
Affiliation(s)
- Xiao Liu
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China
- Ting He
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China
- Zhirui Guo
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China
- Meixiang Ren
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China
- Yachuan Luo
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China

18
Liu R, Mancuso CA, Yannakopoulos A, Johnson KA, Krishnan A. Supervised learning is an accurate method for network-based gene classification. Bioinformatics (Oxford, England) 2020; 36:3457-3465. [PMID: 32129827 DOI: 10.1101/721423] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/26/2019] [Revised: 12/01/2019] [Accepted: 02/27/2020] [Indexed: 05/26/2023]
Abstract
BACKGROUND Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. Despite being a popular machine-learning technique across fields, supervised learning has been applied in only a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem. RESULTS In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene's full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation's appeal for naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows. AVAILABILITY AND IMPLEMENTATION The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available. CONTACT arjun@msu.edu.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Renming Liu
- Department of Computational Mathematics, Science and Engineering
- Kayla A Johnson
- Department of Computational Mathematics, Science and Engineering
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Arjun Krishnan
- Department of Computational Mathematics, Science and Engineering
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA

19
Liu C, Ma Y, Zhao J, Nussinov R, Zhang YC, Cheng F, Zhang ZK. Computational network biology: Data, models, and applications. Physics Reports 2020; 846:1-66. [DOI: 10.1016/j.physrep.2019.12.004] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Indexed: 01/03/2025]

20
Network modeling of patients' biomolecular profiles for clinical phenotype/outcome prediction. Sci Rep 2020; 10:3612. [PMID: 32107391 PMCID: PMC7046773 DOI: 10.1038/s41598-020-60235-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/23/2019] [Accepted: 11/05/2019] [Indexed: 12/15/2022]
Abstract
Methods for phenotype and outcome prediction are largely based on inductive supervised models that use selected biomarkers to make predictions, without explicitly considering the functional relationships between individuals. We introduce a novel network-based approach named Patient-Net (P-Net) in which biomolecular profiles of patients are modeled in a graph-structured space that represents gene expression relationships between patients. A kernel-based semi-supervised transductive algorithm is then applied to explore the overall topology of the graph and to predict the phenotype or clinical outcome of patients. Experimental tests involving several publicly available datasets of patients afflicted with pancreatic, breast, colon and colorectal cancer show that our proposed method is competitive with state-of-the-art supervised and semi-supervised predictive systems. Importantly, P-Net also provides interpretable models that can be easily visualized to gain clues about the relationships between patients, and to formulate hypotheses about their stratification.
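As a rough sketch of the workflow described above, assuming a Pearson-correlation patient graph with a hypothetical 0.5 edge threshold, a two-step random walk kernel K = (aI - L)^p over the normalised Laplacian L, and an "average score" rule that ranks each patient by mean kernel similarity to the labelled positives, consider the following Python example. These choices are assumptions for illustration; the abstract does not specify them, and the toy expression matrix is invented.

```python
import numpy as np


def patient_graph(expr, threshold=0.5):
    """Patient similarity graph: Pearson correlation between patients'
    expression profiles (one patient per row), keeping only edges above
    `threshold` (a hypothetical filtering rule)."""
    corr = np.corrcoef(np.asarray(expr, dtype=float))
    np.fill_diagonal(corr, 0.0)
    return np.where(corr > threshold, corr, 0.0)


def random_walk_kernel(w, a=2.0, p=1):
    """p-step random walk kernel K = (a I - L)^p, where L = I - D^-1/2 W D^-1/2
    is the normalised graph Laplacian (a >= 2 keeps K positive semidefinite)."""
    d = w.sum(axis=1)
    d_inv_sqrt = np.divide(1.0, np.sqrt(d), out=np.zeros_like(d), where=d > 0)
    lap = np.eye(len(d)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    return np.linalg.matrix_power(a * np.eye(len(d)) - lap, p)


def average_score(kernel, positives):
    """Score each patient by mean kernel similarity to the labelled positives."""
    return kernel[:, list(positives)].mean(axis=1)


expr = np.array([
    [1, 2, 3, 4, 5],  # patients 0-2 share one expression pattern
    [1, 2, 3, 5, 5],
    [2, 2, 3, 4, 6],
    [5, 4, 3, 2, 1],  # patients 3-5 share the opposite pattern
    [5, 4, 3, 1, 1],
    [6, 4, 3, 2, 2],
], dtype=float)
k = random_walk_kernel(patient_graph(expr), p=2)
scores = average_score(k, positives=[0])  # patient 0 has the outcome of interest
print(scores[1] > scores[4], scores[2] > scores[5])  # True True
```

Because the prediction is transductive, all patients (labelled or not) shape the graph and kernel, and only the final scoring step uses the labels.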