1. Argov CM, Shneyour A, Jubran J, Sabag E, Mansbach A, Sepunaru Y, Filtzer E, Gruber G, Volozhinsky M, Yogev Y, Birk O, Chalifa-Caspi V, Rokach L, Yeger-Lotem E. Tissue-aware interpretation of genetic variants advances the etiology of rare diseases. Mol Syst Biol 2024; 20:1187-1206. [PMID: 39285047 PMCID: PMC11535248 DOI: 10.1038/s44320-024-00061-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 08/08/2024] [Accepted: 08/09/2024] [Indexed: 09/19/2024]
Abstract
Pathogenic variants underlying Mendelian diseases often disrupt the normal physiology of a few tissues and organs. However, variant effect prediction tools that aim to identify pathogenic variants are typically oblivious to tissue contexts. Here we report a machine-learning framework, denoted "Tissue Risk Assessment of Causality by Expression for variants" (TRACEvar, https://netbio.bgu.ac.il/TRACEvar/ ), that offers two advancements. First, TRACEvar predicts pathogenic variants that disrupt the normal physiology of specific tissues. This was achieved by creating 14 tissue-specific models that were trained on over 14,000 variants and combined 84 attributes of genetic variants with 495 attributes derived from tissue omics. TRACEvar outperformed 10 well-established and tissue-oblivious variant effect prediction tools. Second, the resulting models are interpretable, thereby illuminating variants' mode of action. Application of TRACEvar to variants of 52 rare-disease patients highlighted pathogenicity mechanisms and relevant disease processes. Lastly, the interpretation of all tissue models revealed that top-ranking determinants of pathogenicity included attributes of disease-affected tissues, particularly cellular process activities. Collectively, these results show that tissue contexts and interpretable machine-learning models can greatly enhance the etiology of rare diseases.
Affiliation(s)
- Chanan M Argov
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Ariel Shneyour
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Juman Jubran
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Eric Sabag
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Avigdor Mansbach
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Yair Sepunaru
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Emmi Filtzer
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Gil Gruber
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Miri Volozhinsky
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Yuval Yogev
  - Morris Kahn Laboratory of Human Genetics and the Genetics Institute at Soroka Medical Center, Faculty of Health Sciences, Ben Gurion University of the Negev, Beer Sheva, 84105, Israel
- Ohad Birk
  - Morris Kahn Laboratory of Human Genetics and the Genetics Institute at Soroka Medical Center, Faculty of Health Sciences, Ben Gurion University of the Negev, Beer Sheva, 84105, Israel
  - The National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Vered Chalifa-Caspi
  - Ilse Katz Institute for Nanoscale Science & Technology, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel
- Lior Rokach
  - Department of Software & Information Systems Engineering, Faculty of Engineering Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Esti Yeger-Lotem
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
  - The National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
2. Odrzywolski A, Tüysüz B, Debeer P, Souche E, Voet A, Dimitrov B, Krzesińska P, Vermeesch JR, Tylzanowski P. Gollop-Wolfgang Complex Is Associated with a Monoallelic Variation in WNT11. Genes (Basel) 2024; 15:129. [PMID: 38275609 PMCID: PMC10815061 DOI: 10.3390/genes15010129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Revised: 01/16/2024] [Accepted: 01/18/2024] [Indexed: 01/27/2024]
Abstract
Gollop-Wolfgang complex (GWC) is a rare congenital limb anomaly characterized by tibial aplasia with femur bifurcation, ipsilateral bifurcation of the thigh bone, and split hand and monodactyly of the feet, resulting in severe and complex limb deformities. The genetic basis of GWC, however, has remained elusive. We studied a three-generation family with four GWC-affected family members. An analysis of whole-genome sequencing results using a custom pipeline identified the WNT11 c.1015G>A missense variant associated with the phenotype. In silico modelling and an in vitro reporter assay further supported the link between the variant and GWC. This finding further contributes to mapping the genetic heterogeneity underlying split hand/foot malformations in general and in GWC specifically.
Affiliation(s)
- Adrian Odrzywolski
  - Laboratory for Cytogenetics and Genome Research, Department of Human Genetics, KU Leuven, B-3000 Leuven, Belgium
  - Department of Biochemistry and Molecular Biology, Medical University of Lublin, 20-093 Lublin, Poland
- Beyhan Tüysüz
  - Department of Pediatric Genetics, Cerrahpasa Faculty of Medicine, Istanbul University-Cerrahpasa, 34098 Istanbul, Turkey
- Philippe Debeer
  - Locomotor and Neurological Disorders, Department of Development and Regeneration, KU Leuven, B-3000 Leuven, Belgium
- Erika Souche
  - Laboratory for Cytogenetics and Genome Research, Department of Human Genetics, KU Leuven, B-3000 Leuven, Belgium
- Arnout Voet
  - Laboratory of Biomolecular Modelling and Design, Department of Chemistry, KU Leuven, 3001 Heverlee, Belgium
- Boyan Dimitrov
  - Clinical Sciences, Research Group Reproduction and Genetics, Centre for Medical Genetics, Universitair Ziekenhuis Brussel (UZ Brussel), Vrije Universiteit Brussel (VUB), 1090 Brussels, Belgium
- Paulina Krzesińska
  - Laboratory of Molecular Genetics, Medical University of Lublin, 20-093 Lublin, Poland
- Joris Robert Vermeesch
  - Laboratory for Cytogenetics and Genome Research, Department of Human Genetics, KU Leuven, B-3000 Leuven, Belgium
- Przemko Tylzanowski
  - Laboratory of Molecular Genetics, Medical University of Lublin, 20-093 Lublin, Poland
  - Skeletal Biology and Engineering Research Center, Department of Development and Regeneration, KU Leuven, B-3000 Leuven, Belgium
3. Alsentzer E, Finlayson SG, Li MM, Kobren SN, Kohane IS. Simulation of undiagnosed patients with novel genetic conditions. Nat Commun 2023; 14:6403. [PMID: 37828001 PMCID: PMC10570269 DOI: 10.1038/s41467-023-41980-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/02/2022] [Accepted: 09/26/2023] [Indexed: 10/14/2023]
Abstract
Rare Mendelian disorders pose a major diagnostic challenge and collectively affect 300-400 million patients worldwide. Many automated tools aim to uncover causal genes in patients with suspected genetic disorders, but evaluation of these tools is limited due to the lack of comprehensive benchmark datasets that include previously unpublished conditions. Here, we present a computational pipeline that simulates realistic clinical datasets to address this deficit. Our framework jointly simulates complex phenotypes and challenging candidate genes and produces patients with novel genetic conditions. We demonstrate the similarity of our simulated patients to real patients from the Undiagnosed Diseases Network and evaluate common gene prioritization methods on the simulated cohort. These prioritization methods recover known gene-disease associations but perform poorly on diagnosing patients with novel genetic disorders. Our publicly-available dataset and codebase can be utilized by medical genetics researchers to evaluate, compare, and improve tools that aid in the diagnostic process.
Grants
- U01 HG007690 NHGRI NIH HHS
- U54 NS108251 NINDS NIH HHS
- U01 HG010219 NHGRI NIH HHS
- U01 HG007672 NHGRI NIH HHS
- U01 HG010233 NHGRI NIH HHS
- U01 HG010230 NHGRI NIH HHS
- U01 HG007943 NHGRI NIH HHS
- U01 HG010217 NHGRI NIH HHS
- U01 HG007942 NHGRI NIH HHS
- U01 HG010215 NHGRI NIH HHS
- U01 HG007708 NHGRI NIH HHS
- T32 HG002295 NHGRI NIH HHS
- T32 GM007753 NIGMS NIH HHS
- U01 HG007674 NHGRI NIH HHS
- U01 TR001395 NCATS NIH HHS
- U01 HG007709 NHGRI NIH HHS
- U54 NS093793 NINDS NIH HHS
- U01 HG007530 NHGRI NIH HHS
- U01 TR002471 NCATS NIH HHS
- U01 HG007703 NHGRI NIH HHS
- UDN research reported in this manuscript was supported by the NIH Common Fund, through the Office of Strategic Coordination/Office of the NIH Director under Award Number(s) U01HG007709, U01HG010219, U01HG010230, U01HG010217, U01HG010233, U01HG010215, U01HG007672, U01HG007690, U01HG007708, U01HG007703, U01HG007674, U01HG007530, U01HG007942, U01HG007943, U01TR001395, U01TR002471, U54NS108251, and U54NS093793.
- E.A. is supported by a Microsoft Research PhD Fellowship.
- S.F. is supported by award Number T32GM007753 from the National Institute of General Medical Sciences.
- M.L. is supported by T32HG002295 from the National Human Genome Research Institute and a National Science Foundation Graduate Research Fellowship.
Affiliation(s)
- Emily Alsentzer
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
  - Program in Health Sciences and Technology, MIT, Cambridge, MA, 02139, USA
- Samuel G Finlayson
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
  - Program in Health Sciences and Technology, MIT, Cambridge, MA, 02139, USA
  - Department of Pediatrics, Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA, 98105, USA
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, 98105, USA
- Michelle M Li
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
  - Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, 02115, USA
- Shilpa N Kobren
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Isaac S Kohane
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
4. Simonovsky E, Sharon M, Ziv M, Mauer O, Hekselman I, Jubran J, Vinogradov E, Argov CM, Basha O, Kerber L, Yogev Y, Segrè AV, Im HK, Birk O, Rokach L, Yeger-Lotem E. Predicting molecular mechanisms of hereditary diseases by using their tissue-selective manifestation. Mol Syst Biol 2023; 19:e11407. [PMID: 37232043 PMCID: PMC10407743 DOI: 10.15252/msb.202211407] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/20/2022] [Revised: 04/30/2023] [Accepted: 05/10/2023] [Indexed: 05/27/2023]
Abstract
How do aberrations in widely expressed genes lead to tissue-selective hereditary diseases? Previous attempts to answer this question were limited to testing a few candidate mechanisms. To answer this question at a larger scale, we developed "Tissue Risk Assessment of Causality by Expression" (TRACE), a machine learning approach to predict genes that underlie tissue-selective diseases and selectivity-related features. TRACE utilized 4,744 biologically interpretable tissue-specific gene features that were inferred from heterogeneous omics datasets. Application of TRACE to 1,031 disease genes uncovered known and novel selectivity-related features, the most common of which was previously overlooked. Next, we created a catalog of tissue-associated risks for 18,927 protein-coding genes (https://netbio.bgu.ac.il/trace/). As proof-of-concept, we prioritized candidate disease genes identified in 48 rare-disease patients. TRACE ranked the verified disease gene among the patient's candidate genes significantly better than gene prioritization methods that rank by gene constraint or tissue expression. Thus, tissue selectivity combined with machine learning enhances genetic and clinical understanding of hereditary diseases.
Affiliation(s)
- Eyal Simonovsky
  - Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Moran Sharon
  - Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Maya Ziv
  - Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Omry Mauer
  - Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Idan Hekselman
  - Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Juman Jubran
  - Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Ekaterina Vinogradov
  - Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Chanan M Argov
  - Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Omer Basha
  - Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Lior Kerber
  - Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Yuval Yogev
  - Morris Kahn Laboratory of Human Genetics and the Genetics Institute at Soroka Medical Center, Faculty of Health Sciences, Ben Gurion University of the Negev, Beer Sheva, Israel
- Ayellet V Segrè
  - Ocular Genomics Institute, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA
  - The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Hae Kyung Im
  - Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
- Ohad Birk
  - Morris Kahn Laboratory of Human Genetics and the Genetics Institute at Soroka Medical Center, Faculty of Health Sciences, Ben Gurion University of the Negev, Beer Sheva, Israel
  - The National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Lior Rokach
  - Department of Software & Information Systems Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Esti Yeger-Lotem
  - Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Beer Sheva, Israel
  - The National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer Sheva, Israel
5. Rahaie Z, Rabiee HR, Alinejad-Rokny H. DeepGenePrior: A deep learning model for prioritizing genes affected by copy number variants. PLoS Comput Biol 2023; 19:e1011249. [PMID: 37486921 PMCID: PMC10399873 DOI: 10.1371/journal.pcbi.1011249] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/07/2022] [Revised: 08/03/2023] [Accepted: 06/06/2023] [Indexed: 07/26/2023]
Abstract
The genetic etiology of brain disorders is highly heterogeneous, characterized by abnormalities in the development of the central nervous system that lead to diminished physical or intellectual capabilities. The process of determining which gene drives disease, known as "gene prioritization," is not entirely understood. Genome-wide searches for gene-disease associations are still underdeveloped due to reliance on previous discoveries and evidence sources with false positive or negative relations. This paper introduces DeepGenePrior, a model based on deep neural networks that prioritizes candidate genes in genetic diseases. Using the well-studied Variational AutoEncoder (VAE), we developed a score to measure the impact of genes on target diseases. Unlike other methods that use prior data to select candidate genes, based on the "guilt by association" principle and auxiliary data sources like protein networks, our study exclusively employs copy number variants (CNVs) for gene prioritization. By analyzing CNVs from 74,811 individuals with autism, schizophrenia, and developmental delay, we identified genes that best distinguish cases from controls. Our findings indicate a 12% increase in fold enrichment in brain-expressed genes compared to previous studies and a 15% increase in genes associated with mouse nervous system phenotypes. Furthermore, we identified common deletions in ZDHHC8, DGCR5, and CATG00000022283 among the top genes related to all three disorders, suggesting a common etiology among these clinically distinct conditions. DeepGenePrior is publicly available online at http://git.dml.ir/z_rahaie/DGP to address obstacles in existing gene prioritization studies identifying candidate genes.
Affiliation(s)
- Zahra Rahaie
  - BCB Group, DML, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Hamid R. Rabiee
  - BCB Group, DML, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Hamid Alinejad-Rokny
  - UNSW Biomedical Machine Learning Lab (BML), the Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, Australia
6. Subramanian A, Zakeri P, Mousa M, Alnaqbi H, Alshamsi FY, Bettoni L, Damiani E, Alsafar H, Saeys Y, Carmeliet P. Angiogenesis goes computational - The future way forward to discover new angiogenic targets? Comput Struct Biotechnol J 2022; 20:5235-5255. [PMID: 36187917 PMCID: PMC9508490 DOI: 10.1016/j.csbj.2022.09.019] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/08/2022] [Revised: 09/09/2022] [Accepted: 09/09/2022] [Indexed: 11/26/2022]
Abstract
Multi-omics technologies are being increasingly utilized in angiogenesis research. Yet, computational methods have not been widely used for angiogenic target discovery and prioritization in this field, partly because (wet-lab) vascular biologists are insufficiently familiar with computational biology tools and the opportunities they may offer. With this review, written for vascular biologists who lack expertise in computational methods, we aspire to break boundaries between both fields and to illustrate the potential of these tools for future angiogenic target discovery. We provide a comprehensive survey of currently available computational approaches that may be useful in prioritizing candidate genes, predicting associated mechanisms, and identifying their specificity to endothelial cell subtypes. We specifically highlight tools that use flexible, machine learning frameworks for large-scale data integration and gene prioritization. For each purpose-oriented category of tools, we describe underlying conceptual principles, highlight interesting applications and discuss limitations. Finally, we will discuss challenges and recommend some guidelines which can help to optimize the process of accurate target discovery.
Affiliation(s)
- Abhishek Subramanian
  - Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
  - Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
- Pooya Zakeri
  - Laboratory of Angiogenesis & Vascular Heterogeneity, Department of Biomedicine, Aarhus University, Aarhus, Denmark
  - Centre for Brain and Disease Research, Flanders Institute for Biotechnology (VIB), Leuven, Belgium
  - Department of Neurosciences and Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Mira Mousa
  - Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Halima Alnaqbi
  - Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Fatima Yousif Alshamsi
  - Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
  - Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Leo Bettoni
  - Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
  - Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
- Ernesto Damiani
  - Robotics and Intelligent Systems Institute, Khalifa University, Abu Dhabi, United Arab Emirates
- Habiba Alsafar
  - Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
  - Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Yvan Saeys
  - Data Mining and Modelling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Peter Carmeliet
  - Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
  - Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
  - Laboratory of Angiogenesis & Vascular Heterogeneity, Department of Biomedicine, Aarhus University, Aarhus, Denmark
  - Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
7. Network-Based Approaches for Disease-Gene Association Prediction Using Protein-Protein Interaction Networks. Int J Mol Sci 2022; 23:ijms23137411. [PMID: 35806415 PMCID: PMC9266751 DOI: 10.3390/ijms23137411] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/09/2022] [Revised: 06/25/2022] [Accepted: 06/30/2022] [Indexed: 01/02/2023]
Abstract
Genome-wide association studies (GWAS) can be used to infer genome intervals that are involved in genetic diseases. However, investigating a large number of putative mutations for GWAS is resource- and time-intensive. Network-based computational approaches are being used for efficient disease-gene association prediction. Network-based methods are based on the underlying assumption that the genes causing the same diseases are located close to each other in a molecular network, such as a protein-protein interaction (PPI) network. In this survey, we provide an overview of network-based disease-gene association prediction methods based on three categories: graph-theoretic algorithms, machine learning algorithms, and an integration of these two. We experimented with six selected methods to compare their prediction performance using a heterogeneous network constructed by combining a genome-wide weighted PPI network, an ontology-based disease network, and disease-gene associations. The experiment was conducted in two different settings according to the presence and absence of known disease-associated genes. The results revealed that HerGePred, an integrative method, outperformed in the presence of known disease-associated genes, whereas PRINCE, which adopted a network propagation algorithm, was the most competitive in the absence of known disease-associated genes. Overall, the results demonstrated that the integrative methods performed better than the methods using graph-theory only, and the methods using a heterogeneous network performed better than those using a homogeneous PPI network only.
8. Su L, Liu G, Guo Y, Zhang X, Zhu X, Wang J. Integration of Protein-Protein Interaction Networks and Gene Expression Profiles Helps Detect Pancreatic Adenocarcinoma Candidate Genes. Front Genet 2022; 13:854661. [PMID: 35711911 PMCID: PMC9197464 DOI: 10.3389/fgene.2022.854661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Accepted: 05/09/2022] [Indexed: 11/13/2022]
Abstract
More and more cancer-associated genes (CAGs) are being identified with the development of biological mechanism research. Integrative analysis of protein-protein interaction (PPI) networks and co-expression patterns of these genes can help identify new disease-associated genes and clarify their importance in specific diseases. This study proposed a PPI network and co-expression integration analysis model (PRNet) to integrate PPI networks and gene co-expression patterns to identify potential risk causative genes for pancreatic adenocarcinoma (PAAD). We scored the importance of the candidate genes by constructing a high-confidence co-expression-based edge-weighted PPI network, extracting protein regulatory sub-networks by random walk algorithm, constructing disease-specific networks based on known CAGs, and scoring the genes of the sub-networks with the PageRank algorithm. The results showed that our screened top-ranked genes were more critical in tumours relative to the known CAGs list and significantly differentiated the overall survival of PAAD patients. These results suggest that the PRNet method of ranking cancer-associated genes can identify new disease-associated genes and is more informative than the original CAGs list, which can help investigators to screen potential biomarkers for validation and molecular mechanism exploration.
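The scoring step named in this abstract, ranking genes of an edge-weighted network with PageRank, can be illustrated with a minimal sketch. This is not the PRNet implementation: the function name `weighted_pagerank`, the gene labels `G1` to `G5`, the edge weights, and the damping factor of 0.85 are all assumptions chosen for illustration, and the sketch omits the random-walk sub-network extraction and the use of known CAGs described above.

```python
def weighted_pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on an undirected, edge-weighted graph.

    edges: iterable of (node_a, node_b, weight) with positive weights.
    Returns a dict mapping each node to its PageRank score (scores sum to 1).
    """
    # Build a symmetric adjacency map: node -> {neighbor: weight}.
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    nodes = sorted(adj)
    n = len(nodes)
    # Total edge weight per node, used to normalize outgoing transitions.
    strength = {u: sum(adj[u].values()) for u in nodes}

    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Teleportation term, spread uniformly over all nodes.
        new = {u: (1.0 - damping) / n for u in nodes}
        # Each node passes its score to neighbors in proportion to edge weight.
        for u in nodes:
            for v, w in adj[u].items():
                new[v] += damping * rank[u] * w / strength[u]
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy network; weights stand in for co-expression-based edge weights.
toy_edges = [
    ("G1", "G2", 0.9),
    ("G1", "G3", 0.8),
    ("G1", "G4", 0.7),
    ("G2", "G3", 0.6),
    ("G4", "G5", 0.2),
]
scores = weighted_pagerank(toy_edges)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy network the hub `G1` accumulates the most score and the weakly attached `G5` the least, which is the intuition behind using PageRank to rank candidate genes by their weighted connectivity.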
Affiliation(s)
- Lili Su
  - College of Electronics and Information Engineering, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Guang Liu
  - College of Electronics and Information Engineering, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Ying Guo
  - Department of Histology and Embryology, College of Basic Medical Sciences, Jilin University, Changchun, China
- Xuanping Zhang
  - College of Electronics and Information Engineering, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Xiaoyan Zhu
  - College of Electronics and Information Engineering, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Jiayin Wang
  - College of Electronics and Information Engineering, School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China
9. Frederiksen SD. Prioritizing Suggestive Candidate Genes in Migraine: An Opinion. Front Neurol 2022; 13:910366. [PMID: 35785356 PMCID: PMC9240222 DOI: 10.3389/fneur.2022.910366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Accepted: 05/19/2022] [Indexed: 11/13/2022]
10. Liu H, Hou L, Xu S, Li H, Chen X, Gao J, Wang Z, Han B, Liu X, Wan S. Discovering Cerebral Ischemic Stroke Associated Genes Based on Network Representation Learning. Front Genet 2021; 12:728333. [PMID: 34539754 PMCID: PMC8442767 DOI: 10.3389/fgene.2021.728333] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/21/2021] [Accepted: 07/26/2021] [Indexed: 11/13/2022]
Abstract
Cerebral ischemic stroke (IS) is a complex disease caused by multiple factors including vascular risk factors, genetic factors, and environment factors, which accentuates the difficulty in discovering corresponding disease-related genes. Identifying the genes associated with IS is critical for understanding the biological mechanism of IS, which would be significantly beneficial to the diagnosis and clinical treatment of cerebral IS. However, existing methods to predict IS-related genes are mainly based on the hypothesis of guilt-by-association (GBA). These methods cannot capture the global structure information of the whole protein-protein interaction (PPI) network. Inspired by the success of network representation learning (NRL) in the field of network analysis, we apply NRL to the discovery of disease-related genes and launch the framework to identify the disease-related genes of cerebral IS. The utilized framework contains three main parts: capturing the topological information of the PPI network with NRL, denoising the gene feature with the participation of a stacked autoencoder (SAE), and optimizing a support vector machine (SVM) classifier to identify IS-related genes. Superior to the existing methods on IS-related gene prediction, our framework presents more accurate results. The case study also shows that the proposed method can identify IS-related genes.
Affiliation(s)
- Haijie Liu
  - Department of Neurology, Xuanwu Hospital, Capital Medical University, Beijing, China
- Liping Hou
  - Department of Clinical Laboratory, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Shanhu Xu
  - Affiliated Zhejiang Hospital, Zhejiang University School of Medicine, Hangzhou, China
- He Li
  - Department of Automation, College of Information Science and Engineering, Tianjin Tianshi College, Tianjin, China
- Xiuju Chen
  - Department of Neurology, Tianjin Nankai Hospital, Tianjin, China
- Juan Gao
  - Department of Neurology, Baoding No. 1 Central Hospital, Baoding, China
- Ziwen Wang
  - Graduate School of Chengde Medical College, Chengde, China
- Bo Han
  - Department of Neurology, Xuanwu Hospital, Capital Medical University, Beijing, China
- Xiaoli Liu
  - Affiliated Zhejiang Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Shu Wan
  - Affiliated Zhejiang Hospital, Zhejiang University School of Medicine, Hangzhou, China
11. Shu J, Li Y, Wang S, Xi B, Ma J. Disease gene prediction with privileged information and heteroscedastic dropout. Bioinformatics 2021; 37:i410-i417. [PMID: 34252957 PMCID: PMC8275341 DOI: 10.1093/bioinformatics/btab310] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Accepted: 04/24/2021] [Indexed: 11/19/2022]
Abstract
Motivation: Recently, machine learning models have achieved tremendous success in prioritizing candidate genes for genetic diseases. These models are able to accurately quantify the similarity among disease and genes based on the intuition that similar genes are more likely to be associated with similar diseases. However, the genetic features these methods rely on are often hard to collect due to high experimental cost and various other technical limitations. Existing solutions of this problem significantly increase the risk of overfitting and decrease the generalizability of the models. Results: In this work, we propose a graph neural network (GNN) version of the Learning under Privileged Information paradigm to predict new disease gene associations. Unlike previous gene prioritization approaches, our model does not require the genetic features to be the same at training and test stages. If a genetic feature is hard to measure and therefore missing at the test stage, our model could still efficiently incorporate its information during the training process. To implement this, we develop a Heteroscedastic Gaussian Dropout algorithm, where the dropout probability of the GNN model is determined by another GNN model with a mirrored GNN architecture. To evaluate our method, we compared our method with four state-of-the-art methods on the Online Mendelian Inheritance in Man dataset to prioritize candidate disease genes. Extensive evaluations show that our model could improve the prediction accuracy when all the features are available compared to other methods. More importantly, our model could make very accurate predictions when >90% of the features are missing at the test stage. Availability and implementation: Our method is realized with Python 3.7 and PyTorch 1.5.0, and the method and data are freely available at: https://github.com/juanshu30/Disease-Gene-Prioritization-with-Privileged-Information-and-Heteroscedastic-Dropout.
Affiliation(s)
- Juan Shu
  - Department of Statistics, Purdue University, West Lafayette, IN 47906, USA
- Yu Li
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong 999077, China
- Sheng Wang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Bowei Xi
  - Department of Statistics, Purdue University, West Lafayette, IN 47906, USA
- Jianzhu Ma
  - Institute for Artificial Intelligence, Peking University, Beijing 100871, China
|
12
|
Computational drug repositioning for ischemic stroke: neuroprotective drug discovery. Future Med Chem 2021; 13:1271-1283. [PMID: 34137272 DOI: 10.4155/fmc-2021-0022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/18/2023] Open
Abstract
Background: A comprehensive approach to drug repositioning will be required to overcome translational hurdles and identify more neuroprotective drugs. Results & methods: Gene Set Enrichment Analysis was applied to identify related pathways and enriched genes. Candidate genes were optimized using ToppGene, ToppGenet and pBRIT. From the perspective of the local structures, gene-domain-substructure-drug relationships were constructed. Using the MCODE algorithm and K-means clustering, 31 functional subnetworks were obtained, and 252 drugs with proposed neuroprotective function were identified. Using computational analysis, 72 substructures with different scores were found to correspond to neuroprotective functions. The protective effects of benidipine and barnidipine were confirmed in vitro. Conclusion: The authors' research has great potential to discover more neuroprotective drugs and obtain more information regarding mechanisms of action and functional substructures.
|
13
|
Tang X, Xiao Q, Yu K. Breast Cancer Candidate Gene Detection Through Integration of Subcellular Localization Data With Protein–Protein Interaction Networks. IEEE Trans Nanobioscience 2020; 19:556-561. [DOI: 10.1109/tnb.2020.2990178] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022]
|
14
|
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. Inf Fusion 2019; 50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 262] [Impact Index Per Article: 43.7] [Indexed: 05/10/2023]
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Affiliation(s)
- Marinka Zitnik
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Francis Nguyen
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
- Bo Wang
  - Hikvision Research Institute, Santa Clara, CA, USA
- Jure Leskovec
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
- Anna Goldenberg
  - Genetics & Genome Biology, SickKids Research Institute, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Vector Institute, Toronto, ON, Canada
- Michael M. Hoffman
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Vector Institute, Toronto, ON, Canada
|
15
|
Zolotareva O, Kleine M. A Survey of Gene Prioritization Tools for Mendelian and Complex Human Diseases. J Integr Bioinform 2019; 16:/j/jib.ahead-of-print/jib-2018-0069/jib-2018-0069.xml. [PMID: 31494632 PMCID: PMC7074139 DOI: 10.1515/jib-2018-0069] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 09/06/2018] [Accepted: 07/12/2019] [Indexed: 12/16/2022] Open
Abstract
Modern high-throughput experiments provide us with numerous potential associations between genes and diseases. Experimental validation of all the discovered associations, let alone all the possible interactions between them, is time-consuming and expensive. To facilitate the discovery of causative genes, various approaches for prioritization of genes according to their relevance for a given disease have been developed. In this article, we explain the gene prioritization problem and provide an overview of computational tools for gene prioritization. From about a hundred published gene prioritization tools, we select and briefly describe the 14 most up-to-date and user-friendly ones. We also discuss the advantages and disadvantages of existing tools, the challenges of their validation, and directions for future research.
Affiliation(s)
- Olga Zolotareva
  - Bielefeld University, Faculty of Technology and Center for Biotechnology, International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes" and Genome Informatics, Universitätsstraße 25, Bielefeld, Germany
- Maren Kleine
  - Bielefeld University, Faculty of Technology, Bioinformatics/Medical Informatics Department, Universitätsstraße 25, Bielefeld, Germany
|
16
|
Tan A, Huang H, Zhang P, Li S. Network-based cancer precision medicine: A new emerging paradigm. Cancer Lett 2019; 458:39-45. [DOI: 10.1016/j.canlet.2019.05.015] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 02/01/2019] [Revised: 04/29/2019] [Accepted: 05/15/2019] [Indexed: 12/20/2022]
|
17
|
Xue H, Peng J, Shang X. Predicting disease-related phenotypes using an integrated phenotype similarity measurement based on HPO. BMC Syst Biol 2019; 13:34. [PMID: 30953559 PMCID: PMC6449884 DOI: 10.1186/s12918-019-0697-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
Background: Improving the efficiency of disease diagnosis based on phenotype ontology is a critical yet challenging research area. Recently, Human Phenotype Ontology (HPO)-based semantic similarity has been effectively and widely used to identify causative genes and diseases. However, current phenotype similarity measurements consider only the annotations and hierarchical structure of HPO, neglecting the textual definitions of phenotype terms. Results: In this paper, we propose a novel phenotype similarity measurement, termed DisPheno, which incorporates the definitions of phenotype terms in addition to HPO structure and annotations to measure the similarity between phenotype terms. DisPheno also integrates phenotype term associations into phenotype-set similarity measurement using gene and disease annotations of phenotype terms. Conclusions: Compared with five existing state-of-the-art methods, DisPheno shows great performance in HPO-based phenotype semantic similarity measurement and improves the efficiency of disease identification, especially on noisy patient datasets.
Affiliation(s)
- Hansheng Xue
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
|