1
Kafkas Ş, Abdelhakim M, Althagafi A, Toonsi S, Alghamdi M, Schofield PN, Hoehndorf R. The application of Large Language Models to the phenotype-based prioritization of causative genes in rare disease patients. Sci Rep 2025; 15:15093. [PMID: 40301638 PMCID: PMC12041562 DOI: 10.1038/s41598-025-99539-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Accepted: 04/21/2025] [Indexed: 05/01/2025]
Abstract
Computational methods for identifying gene-disease associations can use both genomic and phenotypic information to prioritize genes and variants that may be associated with genetic diseases. Phenotype-based methods commonly rely on comparing phenotypes observed in a patient with databases of genotype-to-phenotype associations using measures of semantic similarity. They are constrained by the quality and completeness of these resources as well as the quality and completeness of patient phenotype annotation. Genotype-to-phenotype associations used by these methods are largely derived from the literature and coded using phenotype ontologies. Large Language Models (LLMs) have been trained on large amounts of text and data and have shown their potential to answer complex questions across multiple domains. Here, we evaluate the effectiveness of LLMs in prioritizing disease-associated genes compared to existing bioinformatics methods. We show that LLMs can prioritize disease-associated genes as well as, or better than, dedicated bioinformatics methods relying on pre-defined phenotype similarity, when gene sets range from 5 to 100 candidates. We apply our approach to a cohort of undiagnosed patients with rare diseases and show that LLMs can be used to provide diagnostic support that helps in identifying plausible candidate genes. Our results show that LLMs may offer an alternative to traditional bioinformatics methods to prioritize disease-associated genes based on disease phenotypes. They may, therefore, potentially enhance diagnostic accuracy and simplify the process for rare genetic diseases.
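The abstract contrasts LLMs with methods that rank genes by the semantic similarity between patient phenotypes and gene annotations. The sketch below illustrates that baseline idea only; the abstract does not say which similarity measure the compared tools use, so Jaccard similarity over ontology ancestor sets is swapped in as a simple stand-in. The toy hierarchy, the term names, and the gene labels are all invented placeholders, not real ontology or gene identifiers.

```python
# Toy is-a hierarchy (child -> parents). All term names are invented
# placeholders, not identifiers from a real phenotype ontology.
PARENTS = {
    "seizure": ["neuro"],
    "ataxia": ["neuro"],
    "neuro": ["root"],
    "cardiomyopathy": ["cardio"],
    "cardio": ["root"],
    "root": [],
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def closure(terms):
    """Union of the ancestor sets of every term in the collection."""
    result = set()
    for term in terms:
        result |= ancestors(term)
    return result


def jaccard_similarity(terms_a, terms_b):
    """Jaccard overlap of the two ancestor closures (assumed measure)."""
    closure_a, closure_b = closure(terms_a), closure(terms_b)
    union = closure_a | closure_b
    return len(closure_a & closure_b) / len(union) if union else 0.0


def rank_genes(patient_terms, gene_annotations):
    """Order genes by decreasing similarity; ties broken alphabetically."""
    scored = [
        (jaccard_similarity(patient_terms, terms), gene)
        for gene, terms in gene_annotations.items()
    ]
    return [gene for _, gene in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


# Hypothetical gene-to-phenotype annotations.
GENE_ANNOTATIONS = {
    "GENE_A": ["seizure", "ataxia"],
    "GENE_B": ["cardiomyopathy"],
    "GENE_C": ["ataxia"],
}

ranking = rank_genes(["seizure"], GENE_ANNOTATIONS)
```

Because every annotation is expanded to its ancestors, a gene annotated with a sibling phenotype still receives partial credit, which is the property that makes ontology-based comparison tolerant of imprecise patient annotation.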
Affiliation(s)
- Şenay Kafkas
  - Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
  - SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
- Marwa Abdelhakim
  - Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
  - KAUST Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
- Azza Althagafi
  - Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
  - Computer Science Department, College of Computers and Information Technology, Taif University, 26571, Taif, Saudi Arabia
- Sumyyah Toonsi
  - Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
  - SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
  - KAUST Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
  - KAUST Center of Excellence for Generative AI, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
- Malak Alghamdi
  - Medical Genetic Division, Department of Pediatrics, College of Medicine, King Saud University, 11461, Riyadh, Saudi Arabia
- Paul N Schofield
  - Department of Physiology, Development & Neuroscience, University of Cambridge, Cambridge, CB2 3EG, UK
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
  - SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
  - KAUST Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
  - KAUST Center of Excellence for Generative AI, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
2
Jiang X, Cui X, Nie R, You H, Tang Z, Liu W. Network pharmacology-based analysis on the key mechanisms of Yiguanjian acting on chronic hepatitis. Heliyon 2024; 10:e29977. [PMID: 38756592 PMCID: PMC11096846 DOI: 10.1016/j.heliyon.2024.e29977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 04/17/2024] [Accepted: 04/18/2024] [Indexed: 05/18/2024]
Abstract
Chronic hepatitis (CH) encompasses a prevalent array of liver conditions that significantly contribute to global morbidity and mortality. Yiguanjian (YGJ) is a classical traditional Chinese medicine with a long history of medicinal use as a treatment for CH. Although it has been reported that YGJ can reduce liver inflammation, the intricate mechanism requires further elucidation. We used network pharmacology approaches in this work, such as gene ontology (GO) analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, and network-based analysis of protein-protein interactions (PPIs), to clarify the pharmacological constituents, potential therapeutic targets, and YGJ signaling pathways associated with CH. Employing the random walk restart (RWR) algorithm, we identified GNAS, GNB1, CYP2E1, SFTPC, F2, MAPK3, PLG, SRC, HDAC1, and STAT3 as pivotal targets within the PPI network of YGJ-CH. YGJ attenuated liver inflammation and inhibited GNAS/STAT3 signaling in vivo. In vitro, we further overexpressed the GNAS gene to verify the critical role of GNAS in YGJ treatment. Our findings highlight GNAS/STAT3 as a promising therapeutic target for CH, providing a basis and direction for future investigations.
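The random walk restart step named in the abstract can be sketched in a few lines. This is a generic version of the algorithm, not the authors' implementation: the restart probability of 0.3, the convergence tolerance, the handling of isolated nodes, and the four-node toy network with placeholder labels are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    adjacency: dict mapping each node to a list of its neighbours (undirected).
    seeds: non-empty set of nodes the walker jumps back to with
           probability `restart` (an assumed default of 0.3).
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart component: a fixed share of mass always returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Isolated node: send its walking mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1.0 - restart) * p[u] * p0[n]
                continue
            # Walking component: split the remaining mass evenly over neighbours.
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical path-shaped network A - B - C - D; labels are placeholders.
TOY_NETWORK = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(TOY_NETWORK, seeds={"A"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

Nodes are then prioritized by their steady-state score, so proteins that are close to the seed set in the network rank above distant ones.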
Affiliation(s)
- Xiaodan Jiang
  - School of Traditional Chinese Medicine, Capital Medical University, Beijing, China
- Xinyi Cui
  - School of Traditional Chinese Medicine, Capital Medical University, Beijing, China
- Ruifang Nie
  - School of Traditional Chinese Medicine, Capital Medical University, Beijing, China
- Hongjie You
  - School of Basic Medical Sciences, Capital Medical University, Beijing, China
- Zuoqing Tang
  - School of Basic Medical Sciences, Capital Medical University, Beijing, China
- Wenlan Liu
  - School of Traditional Chinese Medicine, Capital Medical University, Beijing, China
3
Althagafi A, Zhapa-Camacho F, Hoehndorf R. Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning. Bioinformatics 2024; 40:btae301. [PMID: 38696757 PMCID: PMC11132820 DOI: 10.1093/bioinformatics/btae301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 04/05/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024]
Abstract
MOTIVATION Whole-exome and genome sequencing have become common tools in diagnosing patients with rare diseases. Despite their success, these approaches leave many patients undiagnosed. A common argument is that more disease variants still await discovery, or the novelty of disease phenotypes results from a combination of variants in multiple disease-related genes. Interpreting the phenotypic consequences of genomic variants relies on information about gene functions, gene expression, physiology, and other genomic features. Phenotype-based methods to identify variants involved in genetic diseases combine molecular features with prior knowledge about the phenotypic consequences of altering gene functions. While phenotype-based methods have been successfully applied to prioritizing variants, such methods are based on known gene-disease or gene-phenotype associations as training data and are applicable only to genes that have associated phenotypes, thereby limiting their scope. In addition, phenotypes are not assigned uniformly by different clinicians, and phenotype-based methods need to account for this variability. RESULTS We developed an Embedding-based Phenotype Variant Predictor (EmbedPVP), a computational method to prioritize variants involved in genetic diseases by combining genomic information and clinical phenotypes. EmbedPVP leverages a large amount of background knowledge from human and model organisms about molecular mechanisms through which abnormal phenotypes may arise. Specifically, EmbedPVP incorporates phenotypes linked to genes, functions of gene products, and the anatomical site of gene expression, and systematically relates them to their phenotypic effects through neuro-symbolic, knowledge-enhanced machine learning. We demonstrate EmbedPVP's efficacy on a large set of synthetic genomes and genomes matched with clinical information.
AVAILABILITY AND IMPLEMENTATION EmbedPVP and all evaluation experiments are freely available at https://github.com/bio-ontology-research-group/EmbedPVP.
Affiliation(s)
- Azza Althagafi
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
  - Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
  - Computer Science Department, College of Computers and Information Technology, Taif University, Taif 26571, Saudi Arabia
- Fernando Zhapa-Camacho
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
  - Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Robert Hoehndorf
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
  - Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
  - SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
4
Liu C, Xiao K, Yu C, Lei Y, Lyu K, Tian T, Zhao D, Zhou F, Tang H, Zeng J. A probabilistic knowledge graph for target identification. PLoS Comput Biol 2024; 20:e1011945. [PMID: 38578805 PMCID: PMC11034645 DOI: 10.1371/journal.pcbi.1011945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Revised: 04/22/2024] [Accepted: 02/24/2024] [Indexed: 04/07/2024]
Abstract
Early identification of safe and efficacious disease targets is crucial to alleviating the tremendous cost of drug discovery projects. However, existing experimental methods for identifying new targets are generally labor-intensive and failure-prone. On the other hand, computational approaches, especially machine learning-based frameworks, have shown remarkable application potential in drug discovery. In this work, we propose Progeni, a novel machine learning-based framework for target identification. In addition to fully exploiting the known heterogeneous biological networks from various sources, Progeni integrates literature evidence about the relations between biological entities to construct a probabilistic knowledge graph. Graph neural networks are then employed in Progeni to learn the feature embeddings of biological entities to facilitate the identification of biologically relevant target candidates. A comprehensive evaluation of Progeni demonstrated its superior predictive power over the baseline methods on the target identification task. In addition, our extensive tests showed that Progeni exhibited high robustness to the negative effect of exposure bias, a common phenomenon in recommendation systems, and effectively identified new targets that can be strongly supported by the literature. Moreover, our wet lab experiments successfully validated the biological significance of the top target candidates predicted by Progeni for melanoma and colorectal cancer. All these results suggested that Progeni can identify biologically effective targets and thus provide a powerful and useful tool for advancing the drug discovery process.
Affiliation(s)
- Chang Liu
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Kaimin Xiao
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing, China
  - Joint Graduate Program of Peking-Tsinghua-NIBS, School of Life Sciences, Tsinghua University, Beijing, China
- Cuinan Yu
  - Machine Learning Department, Silexon AI Technology Co., Ltd., Nanjing, Jiangsu Province, China
- Yipin Lei
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Kangbo Lyu
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Tingzhong Tian
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Dan Zhao
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Fengfeng Zhou
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, Jilin Province, China
- Haidong Tang
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing, China
- Jianyang Zeng
  - School of Engineering, Westlake University, Hangzhou, China
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China
  - Research Center for Industries of the Future and School of Engineering, Westlake University, Hangzhou, Zhejiang Province, China
5
Canavati C, Sherill-Rofe D, Kamal L, Bloch I, Zahdeh F, Sharon E, Terespolsky B, Allan IA, Rabie G, Kawas M, Kassem H, Avraham KB, Renbaum P, Levy-Lahad E, Kanaan M, Tabach Y. Using multi-scale genomics to associate poorly annotated genes with rare diseases. Genome Med 2024; 16:4. [PMID: 38178268 PMCID: PMC10765705 DOI: 10.1186/s13073-023-01276-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Accepted: 12/15/2023] [Indexed: 01/06/2024]
Abstract
BACKGROUND Next-generation sequencing (NGS) has significantly transformed the landscape of identifying disease-causing genes associated with genetic disorders. However, a substantial portion of sequenced patients remains undiagnosed. This may be attributed not only to the challenges posed by harder-to-detect variants, such as non-coding and structural variations, but also to the existence of variants in genes not previously associated with the patient's clinical phenotype. This study introduces EvORanker, an algorithm that integrates unbiased data from 1,028 eukaryotic genomes to link mutated genes to clinical phenotypes. METHODS EvORanker utilizes clinical data, multi-scale phylogenetic profiling, and other omics data to prioritize disease-associated genes. It was evaluated on solved exomes and simulated genomes, compared with existing methods, and applied to 6260 knockout genes with mouse phenotypes lacking human associations. Additionally, EvORanker was made accessible as a user-friendly web tool. RESULTS In the analyzed exomic cohort, EvORanker accurately identified the "true" disease gene as the top candidate in 69% of cases and within the top 5 candidates in 95% of cases, consistent with results from the simulated dataset. Notably, EvORanker outperformed existing methods, particularly for poorly annotated genes. In the case of the 6260 knockout genes with mouse phenotypes, EvORanker linked 41% of these genes to observed human disease phenotypes. Furthermore, in two unsolved cases, EvORanker successfully identified DLGAP2 and LPCAT3 as disease candidates for previously uncharacterized genetic syndromes. CONCLUSIONS We highlight clade-based phylogenetic profiling as a powerful systematic approach for prioritizing potential disease genes. Our study showcases the efficacy of EvORanker in associating poorly annotated genes with disease phenotypes observed in patients. The EvORanker server is freely available at https://ccanavati.shinyapps.io/EvORanker/.
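The "top candidate" and "top 5" figures reported above are top-k hit rates: the fraction of cases in which the known disease gene appears among the first k ranked candidates. The sketch below shows how such a metric is commonly computed; the function name, the four toy cases, and the gene labels are hypothetical and do not come from the study's data.

```python
def top_k_hit_rate(ranked_lists, true_genes, k):
    """Fraction of cases whose true gene is among the first k candidates.

    ranked_lists: one ranked candidate list per case (best candidate first).
    true_genes: the known causal gene for each case, in the same order.
    """
    if not true_genes or len(ranked_lists) != len(true_genes):
        raise ValueError("need one non-empty ranking per case")
    hits = sum(
        1 for ranking, truth in zip(ranked_lists, true_genes) if truth in ranking[:k]
    )
    return hits / len(true_genes)


# Hypothetical rankings for four solved cases (labels are placeholders).
CANDIDATES = ["G1", "G2", "G3", "G4", "G5", "G6", "G7"]
RANKINGS = [CANDIDATES, CANDIDATES, CANDIDATES, CANDIDATES]
TRUE_GENES = ["G1", "G1", "G3", "G7"]  # true gene sits at ranks 1, 1, 3 and 7

top1 = top_k_hit_rate(RANKINGS, TRUE_GENES, k=1)
top5 = top_k_hit_rate(RANKINGS, TRUE_GENES, k=5)
```

In this toy set the true gene is ranked first in two of four cases and within the top five in three of four, giving hit rates of 0.5 and 0.75.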
Affiliation(s)
- Christina Canavati
  - Department of Developmental Biology and Cancer Research, Institute of Medical Research - Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel
  - Molecular Genetics Lab, Istishari Arab Hospital, Ramallah, Palestine
- Dana Sherill-Rofe
  - Department of Developmental Biology and Cancer Research, Institute of Medical Research - Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel
- Lara Kamal
  - Molecular Genetics Lab, Istishari Arab Hospital, Ramallah, Palestine
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
- Idit Bloch
  - Department of Developmental Biology and Cancer Research, Institute of Medical Research - Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel
- Fouad Zahdeh
  - Medical Genetics Institute, Shaare Zedek Medical Center, Jerusalem, 91031, Israel
- Elad Sharon
  - Department of Developmental Biology and Cancer Research, Institute of Medical Research - Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel
- Batel Terespolsky
  - Department of Developmental Biology and Cancer Research, Institute of Medical Research - Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel
  - Medical Genetics Institute, Shaare Zedek Medical Center, Jerusalem, 91031, Israel
- Islam Abu Allan
  - Molecular Genetics Lab, Istishari Arab Hospital, Ramallah, Palestine
- Grace Rabie
  - Hereditary Research Laboratory and Department of Life Sciences, Bethlehem University, Bethlehem, 72372, Palestine
- Mariana Kawas
  - Hereditary Research Laboratory and Department of Life Sciences, Bethlehem University, Bethlehem, 72372, Palestine
- Hanin Kassem
  - Molecular Genetics Lab, Istishari Arab Hospital, Ramallah, Palestine
- Karen B Avraham
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
- Paul Renbaum
  - Medical Genetics Institute, Shaare Zedek Medical Center, Jerusalem, 91031, Israel
- Ephrat Levy-Lahad
  - Medical Genetics Institute, Shaare Zedek Medical Center, Jerusalem, 91031, Israel
  - Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel
- Moien Kanaan
  - Molecular Genetics Lab, Istishari Arab Hospital, Ramallah, Palestine
  - Hereditary Research Laboratory and Department of Life Sciences, Bethlehem University, Bethlehem, 72372, Palestine
- Yuval Tabach
  - Department of Developmental Biology and Cancer Research, Institute of Medical Research - Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel
6
Martin-Hernandez R, Espeso-Gil S, Domingo C, Latorre P, Hervas S, Hernandez Mora JR, Kotelnikova E. Machine learning combining multi-omics data and network algorithms identifies adrenocortical carcinoma prognostic biomarkers. Front Mol Biosci 2023; 10:1258902. [PMID: 38028548 PMCID: PMC10658191 DOI: 10.3389/fmolb.2023.1258902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 10/06/2023] [Indexed: 12/01/2023]
Abstract
Background: Rare endocrine cancers such as Adrenocortical Carcinoma (ACC) present a serious diagnostic and prognostication challenge. Knowledge of ACC pathogenesis is incomplete, and patients have limited therapeutic options. Identification of molecular drivers and effective biomarkers is required for timely diagnosis of the disease and to stratify patients to offer the most beneficial treatments. In this study we demonstrate how machine learning methods integrating multi-omics data, in combination with systems biology tools, can contribute to the identification of new prognostic biomarkers for ACC. Methods: ACC gene expression and DNA methylation datasets were downloaded from the Xena Browser (GDC TCGA Adrenocortical Carcinoma cohort). A highly correlated multi-omics signature discriminating groups of samples was identified with the data integration analysis for biomarker discovery using latent components (DIABLO) method. Additional regulators of the identified signature were discovered using Clarivate CBDD (Computational Biology for Drug Discovery) network propagation and hidden nodes algorithms on a curated network of molecular interactions (MetaBase™). The discriminative power of the multi-omics signature and their regulators was delineated by training a random forest classifier using 55 samples, by employing a 10-fold cross validation with five iterations. The prognostic value of the identified biomarkers was further assessed on an external ACC dataset obtained from GEO (GSE49280) using the Kaplan-Meier estimator method. An optimal prognostic signature was finally derived using the stepwise Akaike Information Criterion (AIC) that allowed categorization of samples into high and low-risk groups. Results: A multi-omics signature including genes, microRNAs and methylation sites was generated. Systems biology tools identified additional genes regulating the features included in the multi-omics signature. RNA-seq, miRNA-seq and DNA methylation sets of features showed high power to discriminate patients in stages I-II from those in stages III-IV, outperforming previously identified prognostic biomarkers. Using an independent dataset, associations of the genes included in the signature with Overall Survival (OS) data demonstrated that patients with differential expression levels of 8 genes and 4 microRNAs showed a statistically significant decrease in OS. We also found an independent prognostic signature for ACC with potential use in clinical practice, combining 9-gene/microRNA features, that successfully predicted high-risk ACC patients. Conclusion: Machine learning and integrative analysis of multi-omics data, in combination with Clarivate CBDD systems biology tools, identified a set of biomarkers with high prognostic value for ACC disease. Multi-omics data is a promising resource for the identification of drivers and new prognostic biomarkers in rare diseases that could be used in clinical practice.
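The Kaplan-Meier estimator used for the survival assessment can be illustrated with a short, dependency-free sketch. It implements the textbook product-limit formula on five invented follow-up records; the function name and the numbers are placeholders and are unrelated to the GSE49280 cohort.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up duration for each patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every record that shares this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Patients censored at t still count as at risk at t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Invented follow-up data (months); not taken from any real cohort.
TIMES = [5, 8, 8, 12, 15]
EVENTS = [1, 1, 0, 1, 0]
curve = kaplan_meier(TIMES, EVENTS)
```

Comparing two such curves, one per expression group, is the usual way a candidate biomarker's association with overall survival is visualized; a significance test such as the log-rank test would be applied on top, which this sketch does not include.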
7
Guthrie J, Köstel Bal S, Lombardo SD, Müller F, Sin C, Hütter CV, Menche J, Boztug K. AutoCore: A network-based definition of the core module of human autoimmunity and autoinflammation. Sci Adv 2023; 9:eadg6375. [PMID: 37656781 PMCID: PMC10848965 DOI: 10.1126/sciadv.adg6375] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/17/2023] [Accepted: 08/01/2023] [Indexed: 09/03/2023]
Abstract
Although research on rare autoimmune and autoinflammatory diseases has enabled definition of nonredundant regulators of homeostasis in human immunity, because of the single gene-single disease nature of many of these diseases, contributing factors were mostly unveiled in sequential and noncoordinated individual studies. We used a network-based approach for integrating a set of 186 inborn errors of immunity with predominant autoimmunity/autoinflammation into a comprehensive map of human immune dysregulation, which we termed "AutoCore." The AutoCore is located centrally within the interactome of all protein-protein interactions, connecting and pinpointing multidisease markers for a range of common, polygenic autoimmune/autoinflammatory diseases. The AutoCore can be subdivided into 19 endotypes that correspond to molecularly and phenotypically cohesive disease subgroups, providing a molecular mechanism-based disease classification and rationale toward systematic targeting for therapeutic purposes. Our study provides a proof of concept for using network-based methods to systematically investigate the molecular relationships between individual rare diseases and address a range of conceptual, diagnostic, and therapeutic challenges.
Affiliation(s)
- Julia Guthrie
  - Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, Zimmermannplatz 10, A-1090 Vienna, Austria
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, AKH BT 25.3, A-1090 Vienna, Austria
  - Max Perutz Labs, Vienna BioCenter Campus, Dr.-Bohr-Gasse 9, 1030 Vienna, Austria
  - Department of Structural and Computational Biology, University of Vienna, Dr.-Bohr-Gasse 9, 1030 Vienna, Austria
- Sevgi Köstel Bal
  - Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, Zimmermannplatz 10, A-1090 Vienna, Austria
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, AKH BT 25.3, A-1090 Vienna, Austria
  - St. Anna Children’s Cancer Research Institute (CCRI), Zimmermannplatz 10, A-1090 Vienna, Austria
- Salvo Danilo Lombardo
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, AKH BT 25.3, A-1090 Vienna, Austria
  - Max Perutz Labs, Vienna BioCenter Campus, Dr.-Bohr-Gasse 9, 1030 Vienna, Austria
  - Department of Structural and Computational Biology, University of Vienna, Dr.-Bohr-Gasse 9, 1030 Vienna, Austria
- Felix Müller
  - Max Perutz Labs, Vienna BioCenter Campus, Dr.-Bohr-Gasse 9, 1030 Vienna, Austria
  - Department of Structural and Computational Biology, University of Vienna, Dr.-Bohr-Gasse 9, 1030 Vienna, Austria
- Celine Sin
  - Max Perutz Labs, Vienna BioCenter Campus, Dr.-Bohr-Gasse 9, 1030 Vienna, Austria
  - Department of Structural and Computational Biology, University of Vienna, Dr.-Bohr-Gasse 9, 1030 Vienna, Austria
- Christiane V. R. Hütter
  - Max Perutz Labs, Vienna BioCenter Campus, Dr.-Bohr-Gasse 9, 1030 Vienna, Austria
  - Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna BioCenter, A-1030 Vienna, Austria
- Jörg Menche
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, AKH BT 25.3, A-1090 Vienna, Austria
  - Max Perutz Labs, Vienna BioCenter Campus, Dr.-Bohr-Gasse 9, 1030 Vienna, Austria
  - Department of Structural and Computational Biology, University of Vienna, Dr.-Bohr-Gasse 9, 1030 Vienna, Austria
  - Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz 1, A-1090 Vienna, Austria
- Kaan Boztug
  - Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, Zimmermannplatz 10, A-1090 Vienna, Austria
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, AKH BT 25.3, A-1090 Vienna, Austria
  - St. Anna Children’s Cancer Research Institute (CCRI), Zimmermannplatz 10, A-1090 Vienna, Austria
  - St. Anna Children’s Hospital, Kinderspitalgasse 6, A-1090 Vienna, Austria
  - Medical University of Vienna, Department of Pediatrics and Adolescent Medicine, Währinger Gürtel 18-20, A-1090 Vienna, Austria
8
Woicik A, Zhang M, Xu H, Mostafavi S, Wang S. Gemini: memory-efficient integration of hundreds of gene networks with high-order pooling. Bioinformatics 2023; 39:i504-i512. [PMID: 37387142 DOI: 10.1093/bioinformatics/btad247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
MOTIVATION The exponential growth of genomic sequencing data has created ever-expanding repositories of gene networks. Unsupervised network integration methods are critical to learn informative representations for each gene, which are later used as features for downstream applications. However, these network integration methods must be scalable to account for the increasing number of networks and robust to an uneven distribution of network types within hundreds of gene networks. RESULTS To address these needs, we present Gemini, a novel network integration method that uses memory-efficient high-order pooling to represent and weight each network according to its uniqueness. Gemini then mitigates the uneven network distribution through mixing up existing networks to create many new networks. We find that Gemini leads to more than a 10% improvement in F1 score, 15% improvement in micro-AUPRC, and 63% improvement in macro-AUPRC for human protein function prediction by integrating hundreds of networks from BioGRID, and that Gemini's performance significantly improves when more networks are added to the input network collection, while Mashup and BIONIC embeddings' performance deteriorates. Gemini thereby enables memory-efficient and informative network integration for large gene networks and can be used to massively integrate and analyze networks in other domains. AVAILABILITY AND IMPLEMENTATION Gemini can be accessed at: https://github.com/MinxZ/Gemini.
Affiliation(s)
- Addie Woicik
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Mingxin Zhang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Hanwen Xu
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Sara Mostafavi
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Sheng Wang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
9
Wang W, Yuan H, Han J, Liu W. PCLassoLog: A protein complex-based, group Lasso-logistic model for cancer classification and risk protein complex discovery. Comput Struct Biotechnol J 2022; 21:365-377. [PMID: 36582441 PMCID: PMC9791601 DOI: 10.1016/j.csbj.2022.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2022] [Revised: 12/02/2022] [Accepted: 12/03/2022] [Indexed: 12/12/2022]
Abstract
Risk gene identification has attracted much attention in the past two decades. Since most genes need to be translated into proteins and cooperate with other proteins to form protein complexes to carry out cellular functions, which significantly extends the functional diversity of individual proteins, revealing the molecular mechanism of cancer from a comprehensive perspective needs to shift from identifying individual risk genes toward identifying risk protein complexes. Here, we embed protein complexes into the regularized learning framework and propose a protein complex-based, group Lasso-logistic model (PCLassoLog) to discover risk protein complexes. Experiments on deep proteomic data of two cancer types show that PCLassoLog yields superior predictive performance on independent datasets. More importantly, PCLassoLog identifies risk protein complexes that not only contain individual risk proteins but also incorporate close partners that synergize with them. Furthermore, selection probabilities are calculated and two other protein complex-based models are proposed to complement PCLassoLog in identifying reliable risk protein complexes. Based on PCLassoLog, a pan-cancer analysis is performed to identify risk protein complexes in 12 cancer types. Finally, PCLassoLog is used to discover risk protein complexes associated with gene mutation. We implement all protein complex-based models as an R package PCLassoReg, which may serve as an effective tool to discover risk protein complexes in various contexts.
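For orientation, the standard group Lasso-logistic objective that a model of this kind builds on can be written as follows. This is the textbook form, not a formula taken from the paper: the symbols, the $\sqrt{p_g}$ group weights, and the assumption that groups do not overlap are all illustrative, and the abstract does not state how proteins belonging to several complexes are handled.

```latex
% Assumed notation: y_i in {0,1} is the class label of sample i,
% x_i its protein abundance vector, beta_g the coefficients of the
% proteins in complex g, p_g the size of complex g, lambda >= 0.
\min_{\beta_0,\,\beta}\;
-\frac{1}{n}\sum_{i=1}^{n}\Bigl[\,y_i\,\eta_i-\log\bigl(1+e^{\eta_i}\bigr)\Bigr]
+\lambda\sum_{g=1}^{G}\sqrt{p_g}\,\lVert\beta_{g}\rVert_2,
\qquad \eta_i=\beta_0+x_i^{\top}\beta
```

Because the penalty acts on the Euclidean norm of each group rather than on individual coefficients, a whole complex is either retained or set to zero together, which is what allows selection at the level of protein complexes instead of single proteins.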
Affiliation(s)
- Wei Wang
- College of Science, Heilongjiang Institute of Technology, Harbin 150050, China
| | - Haiyan Yuan
- College of Science, Heilongjiang Institute of Technology, Harbin 150050, China
| | - Junwei Han
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China (corresponding author)
| | - Wei Liu
- College of Science, Heilongjiang Institute of Technology, Harbin 150050, China (corresponding author)
| |
|
10
|
Ravindran V, Wagoner J, Athanasiadis P, Den Hartigh AB, Sidorova JM, Ianevski A, Fink SL, Frigessi A, White J, Polyak SJ, Aittokallio T. Discovery of host-directed modulators of virus infection by probing the SARS-CoV-2-host protein-protein interaction network. Brief Bioinform 2022; 23:bbac456. [PMID: 36305426 PMCID: PMC9677461 DOI: 10.1093/bib/bbac456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2022] [Revised: 09/05/2022] [Accepted: 09/23/2022] [Indexed: 12/14/2022] Open
Abstract
The ongoing coronavirus disease 2019 (COVID-19) pandemic has highlighted the need to better understand virus-host interactions. We developed a network-based method that expands the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2)-host protein interaction network and identifies host targets that modulate viral infection. To disrupt the SARS-CoV-2 interactome, we systematically probed for potent compounds that selectively target the identified host proteins with high expression in cells relevant to COVID-19. We experimentally tested seven chemical inhibitors of the identified host proteins for modulation of SARS-CoV-2 infection in human cells that express ACE2 and TMPRSS2. Inhibition of the epigenetic regulators bromodomain-containing protein 4 (BRD4) and histone deacetylase 2 (HDAC2), along with ubiquitin-specific peptidase (USP10), enhanced SARS-CoV-2 infection. This proviral effect was observed upon treatment with the compounds JQ1, vorinostat, romidepsin and spautin-1, when measured by cytopathic effect and validated by viral RNA assays, suggesting that the host proteins HDAC2, BRD4 and USP10 have antiviral functions. We observed marked differences in antiviral effects across cell lines, which may have consequences for the identification of selective modulators of viral infection or potential antiviral therapeutics. While network-based approaches enable systematic identification of host targets and selective compounds that may modulate the SARS-CoV-2 interactome, further developments are warranted to increase their accuracy and cell-context specificity.
Affiliation(s)
- Vandana Ravindran
- Oslo Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, Norway
- Institute for Cancer Research, Department of Cancer Genetics, Oslo University Hospital, Oslo, Norway
| | - Jessica Wagoner
- Department of Laboratory Medicine & Pathology, University of Washington, Seattle, WA, USA
| | - Paschalis Athanasiadis
- Oslo Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, Norway
- Institute for Cancer Research, Department of Cancer Genetics, Oslo University Hospital, Oslo, Norway
| | - Andreas B Den Hartigh
- Department of Laboratory Medicine & Pathology, University of Washington, Seattle, WA, USA
| | - Julia M Sidorova
- Department of Laboratory Medicine & Pathology, University of Washington, Seattle, WA, USA
| | - Aleksandr Ianevski
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
| | - Susan L Fink
- Department of Laboratory Medicine & Pathology, University of Washington, Seattle, WA, USA
| | - Arnoldo Frigessi
- Oslo Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, Norway
| | - Judith White
- Department of Cell Biology and Department of Microbiology, University of Virginia, Charlottesville, VA, USA
| | - Stephen J Polyak
- Department of Laboratory Medicine & Pathology, University of Washington, Seattle, WA, USA
| | - Tero Aittokallio
- Oslo Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, Norway
- Institute for Cancer Research, Department of Cancer Genetics, Oslo University Hospital, Oslo, Norway
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
| |
|
11
|
Nguyen T, Yue Z, Slominski R, Welner R, Zhang J, Chen JY. WINNER: A network biology tool for biomolecular characterization and prioritization. Front Big Data 2022; 5:1016606. [PMID: 36407327 PMCID: PMC9672476 DOI: 10.3389/fdata.2022.1016606] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/11/2022] [Accepted: 10/14/2022] [Indexed: 12/09/2024] Open
Abstract
BACKGROUND AND CONTRIBUTION: In network biology, molecular functions can be characterized by network-based inference, or "guilt-by-associations." PageRank-like tools have been applied in the study of biomolecular interaction networks to further obtain the relative significance of all molecules in the network. However, there is a great deal of inherent noise in widely accessible data sets for gene-to-gene associations or protein-protein interactions. How to develop robust tests to expand, filter, and rank molecular entities in disease-specific networks remains an ad hoc data analysis process.
RESULTS: We describe a new biomolecular characterization and prioritization tool called Weighted In-Network Node Expansion and Ranking (WINNER). It takes the input of any molecular interaction network data and generates an optionally expanded network with all the nodes ranked according to their relevance to one another in the network. To help users assess the robustness of results, WINNER provides two different types of statistics. The first type is a node-expansion p-value, which helps evaluate the statistical significance of adding "non-seed" molecules to the original biomolecular interaction network consisting of "seed" molecules and molecular interactions. The second type is a node-ranking p-value, which helps evaluate the relative statistical significance of the contribution of each node to the overall network architecture. We validated the robustness of WINNER in ranking top molecules by spiking in noise in several network permutation experiments. We have found that node degree-preservation randomization of the gene network produced normally distributed ranking scores, which outperform those made with other gene network randomization techniques. Furthermore, we validated that a more significant proportion of the WINNER-ranked genes was associated with disease biology than existing methods such as PageRank. We demonstrated the performance of WINNER with a few case studies, including Alzheimer's disease, breast cancer, myocardial infarctions, and triple-negative breast cancer (TNBC). In all these case studies, the expanded and top-ranked genes identified by WINNER reveal disease biology more significantly than those identified by other gene prioritizing software tools, including Ingenuity Pathway Analysis (IPA) and DiAMOND.
CONCLUSION: WINNER ranking strongly correlates to other ranking methods when the network covers sufficient node and edge information, indicating a high network quality. WINNER users can use this new tool to robustly evaluate a list of candidate genes, proteins, or metabolites produced from high-throughput biology experiments, as long as there is available gene/protein/metabolic network information.
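The abstract mentions node degree-preserving randomization as the null model for ranking scores but does not say how it is generated. One common way to obtain such a null network is repeated double-edge swapping, sketched below. This is an assumption for illustration, not a description of WINNER's implementation; the function names and the toy edge list are hypothetical.

```python
import random

def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Rewire an undirected simple graph while keeping every node's degree."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done = 0
    attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        # Proposed rewiring: (a-b, c-d) becomes (a-d, c-b).
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: would give a self-loop or no change
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue  # would duplicate an existing edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i] = (a, d)
        edge_list[j] = (c, b)
        done += 1
    return edge_list

def degrees(edges):
    """Count how many edges touch each node."""
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return counts

# Toy interaction network with hypothetical node labels.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("E", "F")]
shuffled = degree_preserving_shuffle(edges, n_swaps=10, seed=1)
```

Ranking scores computed on many such shuffled networks could then serve as a reference distribution for a node-ranking p-value, since hubs stay hubs and only the wiring changes.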
Affiliation(s)
- Thanh Nguyen
- Informatics Institute in School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
- Department of Biomedical Engineering, The University of Alabama at Birmingham, Birmingham, AL, United States
| | - Zongliang Yue
- Informatics Institute in School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
| | - Radomir Slominski
- Informatics Institute in School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
| | - Robert Welner
- Comprehensive Arthritis, Musculoskeletal, Bone and Autoimmunity Center (CAMBAC), School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
| | - Jianyi Zhang
- Department of Biomedical Engineering, The University of Alabama at Birmingham, Birmingham, AL, United States
| | - Jake Y. Chen
- Informatics Institute in School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
| |
|
12
|
Chelu A, Williams SG, Keavney BD, Talavera D. Joint analysis of functionally related genes yields further candidates associated with Tetralogy of Fallot. J Hum Genet 2022; 67:613-615. [PMID: 35718831 PMCID: PMC7613636 DOI: 10.1038/s10038-022-01051-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2022] [Revised: 05/13/2022] [Accepted: 05/16/2022] [Indexed: 11/09/2022]
Abstract
Although several genes involved in the development of Tetralogy of Fallot have been identified, no genetic diagnosis is available for the majority of patients. Low statistical power may have prevented the identification of further causative genes in gene-by-gene survey analyses. Thus, bigger samples and/or novel analytic approaches may be necessary. We studied whether a joint analysis of groups of functionally related genes might be a useful alternative approach. Our reanalysis of whole-exome sequencing data identified 12 groups of genes that exceedingly contribute to the burden of Tetralogy of Fallot. Further analysis of those groups showed that genes with high-impact variants tend to interact with each other. Thus, our results strongly suggest that additional candidate genes may be found by studying the protein interaction network of known causative genes. Moreover, our results show that the joint analysis of functionally related genes can be a useful complementary approach to classical single-gene analyses.
Affiliation(s)
- Alexandru Chelu
- Division of Cardiovascular Sciences, School of Medical Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, UK
| | - Simon G Williams
- Division of Cardiovascular Sciences, School of Medical Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, UK
| | - Bernard D Keavney
- Division of Cardiovascular Sciences, School of Medical Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, UK
| | - David Talavera
- Division of Cardiovascular Sciences, School of Medical Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, UK.
| |
|
13
|
Dursun C, Kwitek AE, Bozdag S. PhenoGeneRanker: Gene and Phenotype Prioritization Using Multiplex Heterogeneous Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2950-2962. [PMID: 34283720 PMCID: PMC9704494 DOI: 10.1109/tcbb.2021.3098278] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/13/2023]
Abstract
Uncovering genotype-phenotype relationships is a fundamental challenge in genomics. Gene prioritization is an important step in this endeavor, reducing a list of thousands of genes coming from high-throughput studies to a short, manageable list. Network propagation methods are promising, state-of-the-art methods for gene prioritization based on the premise that functionally related genes tend to be close to each other in biological networks. Recently, we introduced PhenoGeneRanker, a network-propagation algorithm for multiplex heterogeneous networks. PhenoGeneRanker allows multi-layer gene and phenotype networks. It also calculates empirical p values of gene and phenotype ranks using random stratified sampling of seeds of genes and phenotypes based on their connectivity degree in the network. In this study, we introduce the PhenoGeneRanker Bioconductor package and its application to multi-omics rat genome datasets to rank hypertension disease-related genes and strains. We showed that PhenoGeneRanker ranked hypertension disease-related genes better using multiplex gene networks than using aggregated gene networks. We also showed that PhenoGeneRanker ranked hypertension disease-related strains better using a multiplex phenotype network than using single or aggregated phenotype networks. We performed a rigorous hyperparameter analysis and, finally, showed that Gene Ontology (GO) enrichment of statistically significant top-ranked genes resulted in hypertension disease-related GO terms.
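Network propagation of the kind described above is often implemented as a random walk with restart from a set of seed nodes. The sketch below shows that idea on a single-layer toy network only; it does not reproduce the multiplex heterogeneous setting or the empirical p-value step of PhenoGeneRanker. The function name, the restart probability of 0.7, and the four-gene path graph are assumptions chosen for illustration, and the graph is assumed to have no isolated nodes.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Propagate seed scores over an undirected network (illustrative sketch).

    adjacency: dict mapping each node to a list of its neighbours
    seeds:     set of nodes the walker restarts from
    restart:   probability of jumping back to a seed at each step (assumed value)
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(max_iter):
        updated = {n: restart * start[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                continue  # isolated nodes would leak probability; assumed absent
            share = (1.0 - restart) * scores[u] / len(neighbours)
            for v in neighbours:
                updated[v] += share
        change = sum(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if change < tol:
            break
    return scores

# Toy path network G1 - G2 - G3 - G4 with G1 as the only seed gene.
adjacency = {"G1": ["G2"], "G2": ["G1", "G3"], "G3": ["G2", "G4"], "G4": ["G3"]}
scores = random_walk_with_restart(adjacency, seeds={"G1"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

On this path graph the scores fall off with distance from the seed, which is the "functionally related genes tend to be close" premise expressed as a ranking.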
|
14
|
Newaz K, Milenkovic T. Inference of a Dynamic Aging-related Biological Subnetwork via Network Propagation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:974-988. [PMID: 32897864 DOI: 10.1109/tcbb.2020.3022767] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Gene expression (GE) data capture valuable condition-specific information ("condition" can mean a biological process, disease stage, age, patient, etc.). However, GE analyses ignore physical interactions between gene products, i.e., proteins. Because proteins function by interacting with each other, and because biological networks (BNs) capture these interactions, BN analyses are promising. However, current BN data fail to capture condition-specific information. Recently, GE and BN data have been integrated using network propagation (NP) to infer condition-specific BNs. However, existing NP-based studies result in a static condition-specific subnetwork, even though cellular processes are dynamic. A dynamic process of our interest is human aging. We use prominent existing NP methods in a new task of inferring a dynamic rather than static condition-specific (aging-related) subnetwork. Then, we study the evolution of network structure with age: we identify proteins whose network positions significantly change with age and predict them as new aging-related candidates. We validate the predictions via, e.g., functional enrichment analyses and literature search. Dynamic network inference via NP yields higher prediction quality than the only existing method for inferring a dynamic aging-related BN, which does not use NP. Our data and code are available at https://nd.edu/~cone/dynetinf.
|
15
|
Ranganathan Ganakammal S, Huang K, Walkiewicz M, Xirasagar S. Genomics technologies and bioinformatics in allergy and immunology. Allergic and Immunologic Diseases 2022:221-260. [DOI: 10.1016/b978-0-323-95061-9.00008-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/02/2025]
|
16
|
Althagafi A, Alsubaie L, Kathiresan N, Mineta K, Aloraini T, Al Mutairi F, Alfadhel M, Gojobori T, Alfares A, Hoehndorf R. DeepSVP: integration of genotype and phenotype for structural variant prioritization using deep learning. Bioinformatics 2021; 38:1677-1684. [PMID: 34951628 PMCID: PMC8896633 DOI: 10.1093/bioinformatics/btab859] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/29/2021] [Revised: 12/07/2021] [Accepted: 12/21/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Structural genomic variants account for much of human variability and are involved in several diseases. Structural variants are complex and may affect coding regions of multiple genes, or affect the functions of genomic regions in different ways from single nucleotide variants. Interpreting the phenotypic consequences of structural variants relies on information about gene functions, haploinsufficiency or triplosensitivity and other genomic features. Phenotype-based methods for identifying variants that are involved in genetic diseases combine molecular features with prior knowledge about the phenotypic consequences of altering gene functions. While phenotype-based methods have been applied successfully to single nucleotide variants as well as short insertions and deletions, the complexity of structural variants makes it more challenging to link them to phenotypes. Furthermore, structural variants can affect a large number of coding regions, and phenotype information may not be available for all of them.
RESULTS: We developed DeepSVP, a computational method to prioritize structural variants involved in genetic diseases by combining genomic and gene function information. We incorporate phenotypes linked to genes, functions of gene products, gene expression in individual cell types and anatomical sites of expression, and systematically relate them to their phenotypic consequences through ontologies and machine learning. DeepSVP significantly improves the success rate of finding causative variants in several benchmarks and can identify novel pathogenic structural variants in consanguineous families.
AVAILABILITY AND IMPLEMENTATION: https://github.com/bio-ontology-research-group/DeepSVP.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Azza Althagafi
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Computer Science Department, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
| | - Lamia Alsubaie
- Department of Pathology and Laboratory Medicine, King Abdulaziz Medical City (KAMC), Riyadh, Saudi Arabia
- Center for Genetics and Inherited Diseases, Taibah University, Almadinah Almunwarah, Saudi Arabia
| | | | - Katsuhiko Mineta
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| | - Taghrid Aloraini
- Department of Pathology and Laboratory Medicine, King Abdulaziz Medical City (KAMC), Riyadh, Saudi Arabia
- King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Centre, Ministry of National Guard-Health Affairs (MNG-HA), Riyadh, Saudi Arabia
| | - Fuad Al Mutairi
- Genetics & Precision Medicine Department, King Abdulaziz Medical City, Ministry of National Guard-Health Affairs (MNG-HA), Riyadh, Saudi Arabia
- King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Centre, Ministry of National Guard-Health Affairs (MNG-HA), Riyadh, Saudi Arabia
| | - Majid Alfadhel
- Genetics & Precision Medicine Department, King Abdulaziz Medical City, Ministry of National Guard-Health Affairs (MNG-HA), Riyadh, Saudi Arabia
- King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Centre, Ministry of National Guard-Health Affairs (MNG-HA), Riyadh, Saudi Arabia
| | - Takashi Gojobori
- KCBRC, Biological and Environmental Science and Engineering Division (BESE), KAUST, Thuwal, Saudi Arabia
| | - Ahmad Alfares
- Department of Pathology and Laboratory Medicine, King Abdulaziz Medical City (KAMC), Riyadh, Saudi Arabia
- King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Centre, Ministry of National Guard-Health Affairs (MNG-HA), Riyadh, Saudi Arabia
- Department of Pediatrics, College of Medicine, Qassim University, Qassim, Saudi Arabia
| | | |
|
17
|
Umlai UKI, Bangarusamy DK, Estivill X, Jithesh PV. Genome sequencing data analysis for rare disease gene discovery. Brief Bioinform 2021; 23:6366880. [PMID: 34498682 DOI: 10.1093/bib/bbab363] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/02/2021] [Revised: 07/24/2021] [Accepted: 08/17/2021] [Indexed: 12/14/2022] Open
Abstract
Rare diseases affect a small proportion of the general population, variously defined as fewer than 200 000 individuals (US) or fewer than 1 in 2000 individuals (Europe). Although individually rare, they collectively comprise approximately 7000 different disorders, the majority having a genetic origin, and affect roughly 300 million people globally. Most patients and their families undergo a long and frustrating diagnostic odyssey. However, advances in the field of genomics have started to facilitate the process of diagnosis, though it is hindered by the difficulty of genome data analysis and interpretation. A major impediment in diagnosis is understanding the diverse approaches, tools and datasets available for variant prioritization, the most important step in the analysis of millions of variants to select a few potential variants. Here we present a review of the latest methodological developments and the spectrum of tools available for rare disease genetic variant discovery and recommend appropriate data interpretation methods for variant prioritization. We have categorized the resources based on various steps of the variant interpretation workflow, starting from data processing, variant calling, annotation, filtration and finally prioritization, with a special emphasis on the last two steps. The methods discussed here pertain to elucidating the genetic basis of disease in individual patient cases via trio- or family-based analysis of the genome data. We advocate using a combination of tools and datasets and following multiple iterative approaches to elucidate the potential causative variant.
Affiliation(s)
- Umm-Kulthum Ismail Umlai
- Division of Genomics & Translational Biomedicine, College of Health & Life Sciences, Hamad Bin Khalifa University, B-147, Penrose House, PO Box 34110, Education City, Doha, Qatar
| | - Dhinoth Kumar Bangarusamy
- Division of Genomics & Translational Biomedicine, College of Health & Life Sciences, Hamad Bin Khalifa University, B-147, Penrose House, PO Box 34110, Education City, Doha, Qatar
| | - Xavier Estivill
- Quantitative Genomics Laboratories (qGenomics), Barcelona, Catalonia, Spain
| | - Puthen Veettil Jithesh
- Division of Genomics & Translational Biomedicine, College of Health & Life Sciences, Hamad Bin Khalifa University, B-147, Penrose House, PO Box 34110, Education City, Doha, Qatar
| |
|
18
|
Kulmanov M, Smaili FZ, Gao X, Hoehndorf R. Semantic similarity and machine learning with ontologies. Brief Bioinform 2021; 22:bbaa199. [PMID: 33049044 PMCID: PMC8293838 DOI: 10.1093/bib/bbaa199] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 05/07/2020] [Revised: 08/03/2020] [Accepted: 08/04/2020] [Indexed: 12/13/2022] Open
Abstract
Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge, and they are employed in almost every major biological database. Recently, ontologies have increasingly been used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview of the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.
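As one concrete instance of a semantic similarity measure, the sketch below computes Resnik similarity, which scores two terms by the information content of their most informative common ancestor. The abstract does not single out this measure; it is used here only as a widely known example. The toy ontology, the term names, and the gene annotations are all hypothetical.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "phenotype": [],
    "cardiac": ["phenotype"],
    "neuro": ["phenotype"],
    "septal_defect": ["cardiac"],
    "arrhythmia": ["cardiac"],
    "seizure": ["neuro"],
}

# Hypothetical gene-to-term annotations used to estimate term frequencies.
ANNOTATIONS = {
    "geneA": ["septal_defect"],
    "geneB": ["arrhythmia"],
    "geneC": ["seizure"],
    "geneD": ["septal_defect", "seizure"],
}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content():
    """IC(t) = log(total / n_t), where n_t counts genes annotated to t or a descendant."""
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for t in terms:
            covered |= ancestors(t)
        for t in covered:
            counts[t] += 1
    total = len(ANNOTATIONS)
    return {t: math.log(total / c) for t, c in counts.items() if c > 0}

def resnik(term_a, term_b, ic):
    """Information content of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic[t] for t in common)

ic = information_content()
within_branch = resnik("septal_defect", "arrhythmia", ic)
across_branches = resnik("septal_defect", "seizure", ic)
```

Two cardiac terms share the informative ancestor "cardiac", whereas a cardiac and a neurological term share only the root, so the first pair scores higher. Phenotype-based gene prioritization methods typically aggregate such term-level scores over a patient's phenotypes and a gene's annotations.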
Affiliation(s)
| | | | - Xin Gao
- Computational Bioscience Research Center and lead of the Structural and Functional Bioinformatics Group at King Abdullah University of Science and Technology
| | | |
|
19
|
Pirch S, Müller F, Iofinova E, Pazmandi J, Hütter CVR, Chiettini M, Sin C, Boztug K, Podkosova I, Kaufmann H, Menche J. The VRNetzer platform enables interactive network analysis in Virtual Reality. Nat Commun 2021; 12:2432. [PMID: 33893283 PMCID: PMC8065164 DOI: 10.1038/s41467-021-22570-w] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 09/29/2020] [Accepted: 03/09/2021] [Indexed: 12/17/2022] Open
Abstract
Networks provide a powerful representation of interacting components within complex systems, making them ideal for visually and analytically exploring big data. However, the size and complexity of many networks render static visualizations on typically-sized paper or screens impractical, resulting in proverbial ‘hairballs’. Here, we introduce a Virtual Reality (VR) platform that overcomes these limitations by facilitating the thorough visual, and interactive, exploration of large networks. Our platform allows maximal customization and extendibility, through the import of custom code for data analysis, integration of external databases, and design of arbitrary user interface elements, among other features. As a proof of concept, we show how our platform can be used to interactively explore genome-scale molecular networks to identify genes associated with rare diseases and understand how they might contribute to disease development. Our platform represents a general purpose, VR-based data exploration platform for large and diverse data types by providing an interface that facilitates the interaction between human intuition and state-of-the-art analysis methods. Data-rich networks can be difficult to interpret beyond a certain size. Here, the authors introduce a platform that uses virtual reality to allow the visual exploration of large networks, while interfacing with data repositories and other analytical methods to improve the interpretation of big data.
Affiliation(s)
- Sebastian Pirch
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Felix Müller
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Eugenia Iofinova
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
| | - Julia Pazmandi
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, Vienna, Austria
| | - Christiane V R Hütter
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Martin Chiettini
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Celine Sin
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Kaan Boztug
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, Vienna, Austria
- St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
- St. Anna Children's Hospital, Department of Pediatrics and Adolescent Medicine, Medical University of Vienna, Vienna, Austria
- Department of Pediatrics and Adolescent Medicine, Medical University of Vienna, Vienna, Austria
| | - Iana Podkosova
- Institute of Visual Computing and Human-Centered Technology, TU Wien, Vienna, Austria
| | - Hannes Kaufmann
- Institute of Visual Computing and Human-Centered Technology, TU Wien, Vienna, Austria
| | - Jörg Menche
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- Faculty of Mathematics, University of Vienna, Vienna, Austria
| |
|
20
|
Romdhane L, Bouhamed H, Ghedira K, Ben Hamda C, Louhichi A, Jmel H, Romdhane S, Charfeddine C, Mokni M, Abdelhak S, Rebai A. The morbid cutaneous anatomy of the human genome revealed by a bioinformatic approach. Genomics 2020; 112:4232-4241. [PMID: 32650097 DOI: 10.1016/j.ygeno.2020.07.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/13/2019] [Revised: 03/28/2020] [Accepted: 07/02/2020] [Indexed: 01/05/2023]
Abstract
Computational approaches have been developed to prioritize candidate genes in disease gene identification. They are based on different pieces of evidence associating each gene with the given disease. In this study, 648 genes underlying genodermatoses have been compared to 1808 genes involved in other genetic diseases using a bioinformatic approach. These genes were studied at the structural, evolutionary and functional levels. Results show that genes underlying genodermatoses present longer CDS and have more exons. Significant differences were observed in nucleotide motif and amino-acid compositions. Evolutionary conservation analysis revealed that genodermatosis genes have fewer paralogs, more orthologs in mouse and dog, and are less conserved. Functional analysis revealed that genodermatosis genes seem to be involved in the immune system and skin layers. The Bayesian network model returned a correct classification rate of around 80%. This computational approach could help investigators working in the field of dermatology by prioritizing positional candidate genes for mutation screening.
Affiliation(s)
- Lilia Romdhane
- Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia; Department of Biology, Faculty of Sciences of Bizerte, Jarzouna, Université Tunis Carthage, Tunis, Tunisia.
| | - Heni Bouhamed
- Molecular and Cellular Screening Process Laboratory, Centre of Biotechnology of Sfax, Sfax, Tunisia
| | - Kais Ghedira
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR16IPT09), Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
| | - Cherif Ben Hamda
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR16IPT09), Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
| | - Amel Louhichi
- Molecular and Cellular Screening Process Laboratory, Centre of Biotechnology of Sfax, Sfax, Tunisia
| | - Haifa Jmel
- Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
| | - Safa Romdhane
- Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
| | - Chérine Charfeddine
- Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia; High Institut of Biotechnology of Sidi Thabet, University of Manouba, BiotechPole of Sidi Thabet, Ariana, Tunisia
| | - Mourad Mokni
- Department of Dermatology, CHU La Rabta Tunis, Tunis, Tunisia; Public health and infection Research Laboratory, La Rabta Hospital, Tunis, Tunisia
| | - Sonia Abdelhak
- Biomedical Genomics and Oncogenetics Laboratory LR11IPT05, LR16IPT05, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
| | - Ahmed Rebai
- Molecular and Cellular Screening Process Laboratory, Centre of Biotechnology of Sfax, Sfax, Tunisia
| |
Collapse
|
21
|
Hristov BH, Chazelle B, Singh M. uKIN Combines New and Prior Information with Guided Network Propagation to Accurately Identify Disease Genes. Cell Syst 2020; 10:470-479.e3. [PMID: 32684276 PMCID: PMC7821437 DOI: 10.1016/j.cels.2020.05.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/01/2020] [Revised: 04/24/2020] [Accepted: 05/19/2020] [Indexed: 12/23/2022]
Abstract
Protein interaction networks provide a powerful framework for identifying genes causal for complex genetic diseases. Here, we introduce a general framework, uKIN, that uses prior knowledge of disease-associated genes to guide, within known protein-protein interaction networks, random walks that are initiated from newly identified candidate genes. In large-scale testing across 24 cancer types, we demonstrate that our network propagation approach for integrating both prior and new information not only better identifies cancer driver genes than using either source of information alone but also readily outperforms other state-of-the-art network-based approaches. We also apply our approach to genome-wide association data to identify genes functionally relevant for several complex diseases. Overall, our work suggests that guided network propagation approaches that utilize both prior and new data are a powerful means to identify disease genes. uKIN is freely available for download at: https://github.com/Singh-Lab/uKIN.
Affiliation(s)
- Borislav H Hristov
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Bernard Chazelle
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Mona Singh
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
22
Liu L, Shao Z, Lv J, Xu F, Ren S, Jin Q, Yang J, Ma W, Xie H, Zhang D, Chen X. Identification of Early Warning Signals at the Critical Transition Point of Colorectal Cancer Based on Dynamic Network Analysis. Front Bioeng Biotechnol 2020; 8:530. [PMID: 32548109 PMCID: PMC7272579 DOI: 10.3389/fbioe.2020.00530] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/13/2019] [Accepted: 05/04/2020] [Indexed: 12/22/2022] Open
Abstract
Colorectal cancer (CRC) is one of the leading causes of cancer-related death worldwide. Due to the lack of early diagnosis methods and warning signals of CRC and its strong heterogeneity, the determination of accurate treatments for CRC and the identification of specific early warning signals are still urgent problems for researchers. In this study, the expression profiles of cancer tissues and the expression profiles of tumor-adjacent tissues in 28 CRC patients were combined into a human protein–protein interaction (PPI) network to construct a specific network for each patient. A network propagation method was used to obtain a mutant giant cluster (GC) containing more than 90% of the mutation information of one patient. Next, mutation selection rules were applied to the GC to mine the mutation sequence of driver genes in each CRC patient. The mutation sequences from patients with the same type of CRC were integrated to obtain the mutation sequences of driver genes of different types of CRC, which provide a reference for the diagnosis of clinical CRC disease progression. Finally, dynamic network analysis was used to mine dynamic network biomarkers (DNBs) in CRC patients. These DNBs were verified by clinical staging data to identify the critical transition point between the pre-disease state and the disease state in tumor progression. Twelve known drug targets were found in the DNBs, and 6 of them have been used as targets for anticancer drugs for clinical treatment. This study provides important information for the prognosis, diagnosis and treatment of CRC, especially for pre-emptive treatments. It is of great significance for reducing the incidence and mortality of CRC.
Affiliation(s)
- Lei Liu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Zhuo Shao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jiaxuan Lv
  - School of Stomatology, Harbin Medical University, Harbin, China
- Fei Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Sibo Ren
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Qing Jin
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jingbo Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Weifang Ma
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hongbo Xie
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Denan Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xiujie Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
23
Hur B, Kang D, Lee S, Moon JH, Lee G, Kim S. Venn-diaNet: Venn diagram based network propagation analysis framework for comparing multiple biological experiments. BMC Bioinformatics 2019; 20:667. [PMID: 31881980 PMCID: PMC6941187 DOI: 10.1186/s12859-019-3302-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 11/19/2019] [Accepted: 12/02/2019] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND The main research topic in this paper is how to compare multiple biological experiments using transcriptome data, where each experiment is designed and measured to compare control and treated samples. Comparison of multiple biological experiments is usually performed in terms of the number of differentially expressed genes (DEGs) in an arbitrary combination of biological experiments. This process is usually facilitated with a Venn diagram, but several issues arise when a Venn diagram is used to compare and analyze multiple experiments in terms of DEGs. First, current Venn diagram tools do not provide systematic analysis to prioritize genes. Because current tools generally do not focus on prioritizing genes, genes located in the segments of the Venn diagram (especially intersections) are usually difficult to rank. Second, elucidating the phenotypic difference only with the lists of DEGs and expression values is challenging when the experimental designs combine treatments. Synergistic effects of combined treatments are very difficult to find without an informative system. RESULTS We introduce Venn-diaNet, a Venn diagram based analysis framework that uses network propagation on a protein-protein interaction network to prioritize genes from experiments that have multiple DEG lists. We suggest that the two issues can be effectively handled by ranking or prioritizing genes within segments of a Venn diagram. The user can easily compare multiple DEG lists with gene rankings, which are easy to understand and can also be coupled with additional analysis for their purposes. Our system provides a web-based interface to select seed genes in any area of a Venn diagram and then perform network propagation analysis to measure the influence of the selected seed genes in terms of a ranked list of DEGs.
CONCLUSIONS We suggest that our system can logically guide the selection of seed genes without additional prior knowledge, which frees users from the seed selection problem of network propagation. We showed that Venn-diaNet can reproduce the research findings reported in original papers whose studies compare two, three and eight experiments. Venn-diaNet is freely available at: http://biohealth.snu.ac.kr/software/venndianet.
Affiliation(s)
- Benjamin Hur
  - Interdisciplinary Program in Bioinformatics, Seoul National University, 1 Gwanak-ro, Seoul, Korea
- Dongwon Kang
  - Department of Computer Science and Engineering, 1 Gwanak-ro, Seoul, Korea
- Sangseon Lee
  - Department of Computer Science and Engineering, 1 Gwanak-ro, Seoul, Korea
- Ji Hwan Moon
  - Interdisciplinary Program in Bioinformatics, Seoul National University, 1 Gwanak-ro, Seoul, Korea
- Gung Lee
  - National Creative Research Initiatives Center for Adipose Tissue Remodeling, Institute of Molecular Biology and Genetics, Department of Biological Sciences, Seoul National University, 1 Gwanak-ro, Seoul, Korea
- Sun Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, 1 Gwanak-ro, Seoul, Korea; Department of Computer Science and Engineering, 1 Gwanak-ro, Seoul, Korea; Bioinformatics Institute, Seoul National University, 1 Gwanak-ro, Seoul, Korea
24
Windels SFL, Malod-Dognin N, Pržulj N. Graphlet Laplacians for topology-function and topology-disease relationships. Bioinformatics 2019; 35:5226-5234. [PMID: 31192358 DOI: 10.1093/bioinformatics/btz455] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/08/2018] [Revised: 05/08/2019] [Accepted: 06/10/2019] [Indexed: 01/01/2023] Open
Abstract
MOTIVATION Laplacian matrices capture the global structure of networks and are widely used to study biological networks. However, the local structure of the network around a node can also capture biological information. Local wiring patterns are typically quantified by counting how often a node touches different graphlets (small, connected, induced sub-graphs). Currently available graphlet-based methods do not consider whether nodes are in the same network neighbourhood. To combine graphlet-based topological information and membership of nodes to the same network neighbourhood, we generalize the Laplacian to the Graphlet Laplacian, by considering a pair of nodes to be 'adjacent' if they simultaneously touch a given graphlet. RESULTS We utilize Graphlet Laplacians to generalize spectral embedding, spectral clustering and network diffusion. Applying Graphlet Laplacian-based spectral embedding, we visually demonstrate that Graphlet Laplacians capture biological functions. This result is quantified by applying Graphlet Laplacian-based spectral clustering, which uncovers clusters enriched in biological functions dependent on the underlying graphlet. We explain the complementarity of biological functions captured by different Graphlet Laplacians by showing that they capture different local topologies. Finally, diffusing pan-cancer gene mutation scores based on different Graphlet Laplacians, we find complementary sets of cancer-related genes. Hence, we demonstrate that Graphlet Laplacians capture topology-function and topology-disease relationships in biological networks. AVAILABILITY AND IMPLEMENTATION http://www0.cs.ucl.ac.uk/staff/natasa/graphlet-laplacian/index.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sam F L Windels
  - Department of Computer Science, University College London, London, WC1E 6BT, United Kingdom
- Nataša Pržulj
  - Department of Computer Science, University College London, London, WC1E 6BT, United Kingdom; Barcelona Supercomputing Center, Barcelona, 08034, Spain; ICREA, Pg. Lluís Companys 23, Barcelona, 08010, Spain
25
Janowska-Sejda EI, Lysenko A, Urban M, Rawlings C, Tsoka S, Hammond-Kosack KE. PHI-Nets: A Network Resource for Ascomycete Fungal Pathogens to Annotate and Identify Putative Virulence Interacting Proteins and siRNA Targets. Front Microbiol 2019; 10:2721. [PMID: 31866958 PMCID: PMC6908471 DOI: 10.3389/fmicb.2019.02721] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/01/2019] [Accepted: 11/08/2019] [Indexed: 12/28/2022] Open
Abstract
Interactions between proteins underlie all aspects of complex biological mechanisms. Therefore, methodologies based on complex network analyses can facilitate identification of promising candidate genes involved in phenotypes of interest and put this information into appropriate contexts. To facilitate discovery and gain additional insights into globally important pathogenic fungi, we have reconstructed computationally inferred interactomes using an interolog and domain-based approach for 15 diverse Ascomycete fungal species, across nine orders, specifically Aspergillus fumigatus, Bipolaris sorokiniana, Blumeria graminis f. sp. hordei, Botrytis cinerea, Colletotrichum gloeosporioides, Colletotrichum graminicola, Fusarium graminearum, Fusarium oxysporum f. sp. lycopersici, Fusarium verticillioides, Leptosphaeria maculans, Magnaporthe oryzae, Saccharomyces cerevisiae, Sclerotinia sclerotiorum, Verticillium dahliae, and Zymoseptoria tritici. Network cartography analysis was associated with functional patterns of annotated genes linked to the disease-causing ability of each pathogen. In addition, for the best annotated organism, namely F. graminearum, the distribution of annotated genes with respect to network structure was profiled using a random walk with restart algorithm, which suggested possible co-location of virulence-related genes in the protein–protein interaction network. In a second ‘use case’ study involving two networks, namely B. cinerea and F. graminearum, previously identified small silencing plant RNAs were mapped to their targets. The F. graminearum phenotypic network analysis implicates eight B. cinerea targets and 35 F. graminearum predicted interacting proteins as prime candidate virulence genes for further testing. All 15 networks have been made accessible for download at www.phi-base.org providing a rich resource for major crop plant pathogens.
Affiliation(s)
- Elzbieta I Janowska-Sejda
  - Department of Biointeractions and Crop Protection, Rothamsted Research, Harpenden, United Kingdom; Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, United Kingdom; Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, London, United Kingdom
- Artem Lysenko
  - Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, United Kingdom
- Martin Urban
  - Department of Biointeractions and Crop Protection, Rothamsted Research, Harpenden, United Kingdom
- Chris Rawlings
  - Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, United Kingdom
- Sophia Tsoka
  - Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, London, United Kingdom
- Kim E Hammond-Kosack
  - Department of Biointeractions and Crop Protection, Rothamsted Research, Harpenden, United Kingdom
26
Zhang W, Zhang H, Yang H, Li M, Xie Z, Li W. Computational resources associating diseases with genotypes, phenotypes and exposures. Brief Bioinform 2019; 20:2098-2115. [PMID: 30102366 PMCID: PMC6954426 DOI: 10.1093/bib/bby071] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 05/31/2018] [Revised: 07/01/2018] [Indexed: 12/16/2022] Open
Abstract
The causes of a disease and its therapies are not only related to genotypes, but also associated with other factors, including phenotypes, environmental exposures, drugs and chemical molecules. Distinguishing disease-related factors from many neutral factors is critical as well as difficult. Over the past two decades, bioinformaticians have developed many computational resources to integrate the omics data and discover associations among these factors. However, researchers and clinicians are experiencing difficulties in choosing appropriate resources from hundreds of relevant databases and software tools. Here, in order to assist the researchers and clinicians, we systematically review the public computational resources of human diseases related to genotypes, phenotypes, environment factors, drugs and chemical exposures. We briefly describe the development history of these computational resources, followed by the details of the relevant databases and software tools. We finally conclude with a discussion of current challenges and future opportunities as well as prospects on this topic.
Affiliation(s)
- Wenliang Zhang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Haiyue Zhang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Huan Yang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Miaoxin Li
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Zhi Xie
  - State Key Lab of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 500040, China
- Weizhong Li
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
27
Hyung D, Mallon AM, Kyung DS, Cho SY, Seong JK. TarGo: network based target gene selection system for human disease related mouse models. Lab Anim Res 2019; 35:23. [PMID: 32257911 PMCID: PMC7081697 DOI: 10.1186/s42826-019-0023-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2019] [Accepted: 10/21/2019] [Indexed: 11/25/2022] Open
Abstract
Genetically engineered mouse models are used in high-throughput phenotyping screens to understand genotype-phenotype associations and their relevance to human diseases. However, not all mutant mouse lines with detectable phenotypes are associated with human diseases. Here, we propose the “Target gene selection system for Genetically engineered mouse models” (TarGo). Using a combination of human disease descriptions, network topology, and genotype-phenotype correlations, novel genes that are potentially related to human diseases are suggested. We constructed a gene interaction network using protein-protein interactions, molecular pathways, and co-expression data. Several repositories for human disease signatures were used to obtain information on human disease-related genes. We calculated disease- or phenotype-specific gene ranks using network topology and disease signatures. In conclusion, TarGo provides many novel features for gene function prediction.
Affiliation(s)
- Daejin Hyung
  - National Cancer Center, 323 Ilsan-ro, Goyang-si, Kyeonggi-do 10408 Republic of Korea
- Ann-Marie Mallon
  - MRC Harwell Institute, Mammalian Genetics Unit, Oxfordshire, OX11 0RD UK
- Dong Soo Kyung
  - Laboratory of Developmental Biology and Genomics, Research Institute for Veterinary Science, and BK21 Plus Program for Creative Veterinary Science, College of Veterinary Medicine, Seoul National University, Seoul, 08826 Republic of Korea; Korea Mouse Phenotyping Center (KMPC), Seoul National University, Seoul, 08826 Republic of Korea; Interdisciplinary Program for Bioinformatics, Program for Cancer Biology and BIO-MAX institute, Seoul National University, Seoul, 08826 Republic of Korea
- Soo Young Cho
  - National Cancer Center, 323 Ilsan-ro, Goyang-si, Kyeonggi-do 10408 Republic of Korea; Korea Mouse Phenotyping Center (KMPC), Seoul National University, Seoul, 08826 Republic of Korea
- Je Kyung Seong
  - Laboratory of Developmental Biology and Genomics, Research Institute for Veterinary Science, and BK21 Plus Program for Creative Veterinary Science, College of Veterinary Medicine, Seoul National University, Seoul, 08826 Republic of Korea; Korea Mouse Phenotyping Center (KMPC), Seoul National University, Seoul, 08826 Republic of Korea; Interdisciplinary Program for Bioinformatics, Program for Cancer Biology and BIO-MAX institute, Seoul National University, Seoul, 08826 Republic of Korea
28
Lin CH, Konecki DM, Liu M, Wilson SJ, Nassar H, Wilkins AD, Gleich DF, Lichtarge O. Multimodal network diffusion predicts future disease-gene-chemical associations. Bioinformatics 2019; 35:1536-1543. [PMID: 30304494 PMCID: PMC6499233 DOI: 10.1093/bioinformatics/bty858] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/23/2018] [Revised: 09/14/2018] [Accepted: 10/08/2018] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION Precision medicine is an emerging field with hopes to improve patient treatment and reduce morbidity and mortality. To these ends, computational approaches have predicted associations among genes, chemicals and diseases. Such efforts, however, were often limited to using just some available association types. This lowers prediction coverage and, since prior evidence shows that integrating heterogeneous data is likely beneficial, it may limit accuracy. Therefore, we systematically tested whether using more association types improves prediction. RESULTS We study multimodal networks linking diseases, genes and chemicals (drugs) by applying three diffusion algorithms and varying information content. Ten-fold cross-validation shows that these networks are internally consistent, both within and across association types. Also, diffusion methods recovered missing edges, even if all the edges from an entire mode of association were removed. This suggests that information is transferable between these association types. As a realistic validation, time-stamped experiments simulated the predictions of future associations based solely on information known prior to a given date. The results show that many future published results are predictable from current associations. Moreover, in most cases, using more association types increases prediction coverage without significantly decreasing sensitivity and specificity. In case studies, literature-supported validation shows that these predictions mimic human-formulated hypotheses. Overall, this study suggests that diffusion over a more comprehensive multimodal network will generate more useful hypotheses of associations among diseases, genes and chemicals, which may guide the development of precision therapies. AVAILABILITY AND IMPLEMENTATION Code and data are available at https://github.com/LichtargeLab/multimodal-network-diffusion. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chih-Hsu Lin
  - Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, USA
- Daniel M Konecki
  - Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, USA
- Meng Liu
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Stephen J Wilson
  - Department of Biochemistry and Molecular Biology, Houston, TX, USA
- Huda Nassar
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Angela D Wilkins
  - Departments of Molecular and Human Genetics, and Pharmacology, Houston, TX, USA
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA
- David F Gleich
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Olivier Lichtarge
  - Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, USA
  - Department of Biochemistry and Molecular Biology, Houston, TX, USA
  - Departments of Molecular and Human Genetics, and Pharmacology, Houston, TX, USA
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA
29
Abstract
Computational prediction of the clinical success or failure of a potential drug target for therapeutic use is a challenging problem. Novel network propagation algorithms that integrate heterogeneous biological networks are proving useful for drug target identification and prioritization. These approaches typically utilize a network describing relationships between targets, a method to disseminate the relevant information through the network, and a method to elucidate new associations between targets and diseases. Here, we utilize one such network propagation-based approach, DTINet, which starts with diffusion component analysis of networks of both potential drug targets and diseases. Then an inductive matrix completion algorithm is applied to identify novel disease targets based on their network topological similarities with known disease targets with successfully launched drugs. DTINet performed well as assessed with area under the precision-recall curve (AUPR = 0.88 ± 0.007) and area under the receiver operating characteristic curve (AUROC = 0.86 ± 0.008). These metrics improved when we combined data from multiple networks in the target space but decreased significantly when we used a more conservative method to define negative controls (AUPR = 0.56 ± 0.007, AUROC = 0.57 ± 0.007). We are optimistic that integration of more relevant and cleaner datasets and networks, careful calibration of model parameters, as well as algorithmic improvements will improve prediction accuracy. However, we also recognize that predicting drug targets that are likely to be successful is an extremely challenging problem due to its complex nature and the sparsity of known disease targets.
30
Integrating Multiple Interaction Networks for Gene Function Inference. Molecules 2018; 24:30. [PMID: 30577643 PMCID: PMC6337127 DOI: 10.3390/molecules24010030] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/21/2018] [Revised: 12/19/2018] [Accepted: 12/20/2018] [Indexed: 01/17/2023] Open
Abstract
In the past few decades, the number and variety of genomic and proteomic data available have increased dramatically. Molecular or functional interaction networks are usually constructed from high-throughput data, and the topological structure of these interaction networks provides a wealth of information for inferring the function of genes or proteins. Analyzing these association networks is a widely used way to mine functional information about genes or proteins. However, how to combine multiple heterogeneous networks to achieve more accurate predictions remains an urgent, unresolved challenge. In this paper, we present a method named ReprsentConcat to improve function inference by integrating multiple interaction networks. The low-dimensional representation of each node in each network is extracted, then these representations from multiple networks are concatenated and fed to gcForest, which augments feature vectors by cascading and automatically determines the number of cascade levels. We experimentally compare ReprsentConcat with a state-of-the-art method, showing that it achieves competitive results on the yeast and human datasets. Moreover, it is robust to hyperparameters, including the number of dimensions.
31
Arachchi H, Wojcik MH, Weisburd B, Jacobsen JOB, Valkanas E, Baxter S, Byrne AB, O'Donnell-Luria AH, Haendel M, Smedley D, MacArthur DG, Philippakis AA, Rehm HL. matchbox: An open-source tool for patient matching via the Matchmaker Exchange. Hum Mutat 2018; 39:1827-1834. [PMID: 30240502 DOI: 10.1002/humu.23655] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 04/22/2018] [Revised: 08/29/2018] [Accepted: 09/18/2018] [Indexed: 12/11/2022]
Abstract
Rare disease investigators constantly face challenges in identifying additional cases to build evidence for gene-disease causality. The Matchmaker Exchange (MME) addresses this limitation by providing a mechanism for matching patients across genomic centers via a federated network. The MME has revolutionized searching for additional cases by making it possible to query across institutional boundaries, so that what was once a laborious and manual process of contacting researchers is now automated and computable. However, while the MME network is beginning to scale, the growth of additional nodes is limited by the lack of easy-to-use solutions that can be implemented by any rare disease database owner, even one without significant software engineering resources. Here, we describe matchbox, which is an open-source, platform-independent, portable bridge between any given rare disease genomic center and the MME network, which has already led to novel gene discoveries. We also describe how matchbox greatly reduces the barrier to participation by overcoming challenges for new databases to join the MME.
Collapse
Affiliation(s)
- Harindra Arachchi
- Center for Mendelian Genomics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts.,The Broad Institute of MIT and Harvard, Cambridge, Massachusetts
| | - Monica H Wojcik
- Center for Mendelian Genomics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts.,Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts
| | - Benjamin Weisburd
- Center for Mendelian Genomics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts
| | - Julius O B Jacobsen
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ, UK
| | - Elise Valkanas
- Center for Mendelian Genomics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts
| | - Samantha Baxter
- Center for Mendelian Genomics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts
| | - Alicia B Byrne
- Center for Mendelian Genomics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts.,Department of Genetics and Molecular Pathology, Centre for Cancer Biology, SA Pathology, Adelaide, Australia.,School of Pharmacy and Medical Sciences, University of South Australia, Adelaide, Australia
| | - Anne H O'Donnell-Luria
- Center for Mendelian Genomics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts.,Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts
| | - Melissa Haendel
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, Oregon.,Linus Pauling Institute, Oregon State University, Corvallis, Oregon
| | - Damian Smedley
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ, UK
| | - Daniel G MacArthur
- Center for Mendelian Genomics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts.,The Broad Institute of MIT and Harvard, Cambridge, Massachusetts.,Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts
| | - Heidi L Rehm
- Center for Mendelian Genomics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts.,The Broad Institute of MIT and Harvard, Cambridge, Massachusetts.,Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts
| |
|
32
|
Valdeolivas A, Tichit L, Navarro C, Perrin S, Odelin G, Levy N, Cau P, Remy E, Baudot A. Random walk with restart on multiplex and heterogeneous biological networks. Bioinformatics 2018; 35:497-505. [DOI: 10.1093/bioinformatics/bty637] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Received: 09/20/2017] [Accepted: 07/16/2018] [Indexed: 01/04/2023] Open
Affiliation(s)
- Alberto Valdeolivas
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
- ProGeLife, Marseille
| | - Laurent Tichit
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
| | - Claire Navarro
- ProGeLife, Marseille
- Aix Marseille Univ, INSERM, MMG, Marseille, France
| | - Sophie Perrin
- ProGeLife, Marseille
- Aix Marseille Univ, INSERM, MMG, Marseille, France
| | - Gaëlle Odelin
- ProGeLife, Marseille
- Aix Marseille Univ, INSERM, MMG, Marseille, France
| | - Nicolas Levy
- Aix Marseille Univ, INSERM, MMG, Marseille, France
| | - Pierre Cau
- ProGeLife, Marseille
- Aix Marseille Univ, INSERM, MMG, Marseille, France
| | - Elisabeth Remy
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
| | - Anaïs Baudot
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
| |
|
33
|
Deep Phenotyping on Electronic Health Records Facilitates Genetic Diagnosis by Clinical Exomes. Am J Hum Genet 2018; 103:58-73. [PMID: 29961570 DOI: 10.1016/j.ajhg.2018.05.010] [Citation(s) in RCA: 87] [Impact Index Per Article: 12.4] [Received: 03/22/2018] [Accepted: 05/24/2018] [Indexed: 01/17/2023] Open
Abstract
Integration of detailed phenotype information with genetic data is well established to facilitate accurate diagnosis of hereditary disorders. As a rich source of phenotype information, electronic health records (EHRs) promise to empower diagnostic variant interpretation. However, how to accurately and efficiently extract phenotypes from heterogeneous EHR narratives remains a challenge. Here, we present EHR-Phenolyzer, a high-throughput EHR framework for extracting and analyzing phenotypes. EHR-Phenolyzer extracts and normalizes Human Phenotype Ontology (HPO) concepts from EHR narratives and then prioritizes genes with causal variants on the basis of the HPO-coded phenotype manifestations. We assessed EHR-Phenolyzer on 28 pediatric individuals with confirmed diagnoses of monogenic diseases and found that the genes with causal variants were ranked among the top 100 genes selected by EHR-Phenolyzer for 16/28 individuals (p < 2.2 × 10⁻¹⁶), supporting the value of phenotype-driven gene prioritization in diagnostic sequence interpretation. To assess the generalizability, we replicated this finding on an independent EHR dataset of ten individuals with a positive diagnosis from a different institution. We then assessed the broader utility by examining two additional EHR datasets, including 31 individuals who were suspected of having a Mendelian disease and underwent different types of genetic testing and 20 individuals with positive diagnoses of specific Mendelian etiologies of chronic kidney disease from exome sequencing. Finally, through several retrospective case studies, we demonstrated how combined analyses of genotype data and deep phenotype data from EHRs can expedite genetic diagnoses. In summary, EHR-Phenolyzer leverages EHR narratives to automate phenotype-driven analysis of clinical exomes or genomes, facilitating the broader implementation of genomic medicine.
|
34
|
Lin JR, Zhang Q, Cai Y, Morrow BE, Zhang ZD. Integrated rare variant-based risk gene prioritization in disease case-control sequencing studies. PLoS Genet 2017; 13:e1007142. [PMID: 29281626 PMCID: PMC5760082 DOI: 10.1371/journal.pgen.1007142] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/06/2017] [Revised: 01/09/2018] [Accepted: 12/01/2017] [Indexed: 12/17/2022] Open
Abstract
Rare variants of major effect play an important role in human complex diseases and can be discovered by sequencing-based genome-wide association studies. Here, we introduce an integrated approach that combines the rare variant association test with gene network and phenotype information to identify risk genes implicated by rare variants for human complex diseases. Our data integration method follows a 'discovery-driven' strategy without relying on prior knowledge about the disease and thus maintains the unbiased character of genome-wide association studies. Simulations reveal that our method can outperform a widely-used rare variant association test method by 2 to 3 times. In a case study of a small disease cohort, we uncovered putative risk genes and the corresponding rare variants that may act as genetic modifiers of congenital heart disease in 22q11.2 deletion syndrome patients. These variants were missed by a conventional approach that relied on the rare variant association test alone. Case-control sequencing studies are a promising design to uncover risk genes of human complex diseases implicated by rare variants. The recent development of different types of rare variant association tests has improved the statistical power to identify disease genes that harbor risk rare variants. However, none of the recent sequencing-based genome-wide association studies identified robust disease association of rare variants or genes based on them. Due to limited sample sizes that can be feasibly achieved in real applications, current rare variant association tests can only generate marginal association signals for most risk genes. Here we proposed an integrated method that combined association signals with orthogonal biological evidence to uncover risk genes in sequencing studies. Designed to address the lack-of-power issue, our method was shown to effectively uncover risk genes with marginal association signals in data simulation. 
Indeed, in the real application demonstrated in our case study, our method disclosed important risk genes of congenital heart disease in 22q11.2 deletion syndrome that were missed by the previous study.
Affiliation(s)
- Jhih-Rong Lin
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
| | - Quanwei Zhang
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
| | - Ying Cai
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
| | - Bernice E Morrow
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
| | - Zhengdong D Zhang
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
| |
|
35
|
Pengelly RJ, Alom T, Zhang Z, Hunt D, Ennis S, Collins A. Evaluating phenotype-driven approaches for genetic diagnoses from exomes in a clinical setting. Sci Rep 2017; 7:13509. [PMID: 29044180 PMCID: PMC5647373 DOI: 10.1038/s41598-017-13841-y] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 05/05/2017] [Accepted: 10/02/2017] [Indexed: 12/27/2022] Open
Abstract
Next generation sequencing is transforming clinical medicine and genome research, providing a powerful route to establishing molecular diagnoses for genetic conditions; however, challenges remain given the volume and complexity of genetic variation. A number of methods integrate patient phenotype and genotypic data to prioritise variants as potentially causal. Some methods have a clinical focus while others are more research-oriented. With clinical applications in mind, we compare results from alternative methods using 21 exomes for which the disease causal variant has been previously established through traditional clinical evaluation. In this case series we find that the PhenIX program is the most effective, ranking the true causal variant at between 1 and 10 in 85% of these cases. This is a significantly higher proportion than the combined results from five alternative methods tested (p = 0.003). The next best method is Exomiser (hiPHIVE), in which the causal variant is ranked 1–10 in 25% of cases. The widely different targets of these methods (more clinical focus, considering known Mendelian genes, in PhenIX, versus gene discovery in Exomiser) are perhaps not fully appreciated but may impact strongly on their utility for molecular diagnosis using clinical exome data.
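The headline figures in this abstract are top-k rates: the share of cases in which the known causal variant appears within the first k ranked results. A minimal sketch of that calculation follows. The function name `top_k_rate` and the ranks in `example_ranks` are hypothetical illustrations, not data from the study.

```python
from typing import Optional, Sequence


def top_k_rate(ranks: Sequence[Optional[int]], k: int = 10) -> float:
    """Fraction of cases whose causal variant is ranked within the top k.

    Each entry is the 1-based rank a tool gave the causal variant for one
    case, or None if the tool did not report the variant at all.
    """
    if not ranks:
        raise ValueError("at least one case is required")
    hits = sum(1 for r in ranks if r is not None and 1 <= r <= k)
    return hits / len(ranks)


# Hypothetical ranks for eight cases (made up for illustration only).
example_ranks = [1, 3, 12, None, 7, 2, 45, 10]

top10 = top_k_rate(example_ranks, k=10)
top1 = top_k_rate(example_ranks, k=1)
print(f"top-10 rate: {top10:.3f}, top-1 rate: {top1:.3f}")
```

Counting unranked cases as misses, rather than dropping them, keeps the denominator equal to the number of cases, which is the convention a clinical comparison of this kind would most plausibly need.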
Affiliation(s)
- Reuben J Pengelly
- Genetic Epidemiology and Genomic Informatics, Faculty of Medicine, University of Southampton, Duthie Building, Mailpoint 808, Tremona Road, Southampton, SO16 6YD, UK.
| | - Thahmina Alom
- Genetic Epidemiology and Genomic Informatics, Faculty of Medicine, University of Southampton, Duthie Building, Mailpoint 808, Tremona Road, Southampton, SO16 6YD, UK
| | - Zijian Zhang
- Genetic Epidemiology and Genomic Informatics, Faculty of Medicine, University of Southampton, Duthie Building, Mailpoint 808, Tremona Road, Southampton, SO16 6YD, UK
| | - David Hunt
- Wessex Clinical Genetics Service, Level G, Mailpoint 105, Princess Anne Hospital, Coxford Road, Southampton, SO16 5YA, UK
| | - Sarah Ennis
- Genetic Epidemiology and Genomic Informatics, Faculty of Medicine, University of Southampton, Duthie Building, Mailpoint 808, Tremona Road, Southampton, SO16 6YD, UK
| | - Andrew Collins
- Genetic Epidemiology and Genomic Informatics, Faculty of Medicine, University of Southampton, Duthie Building, Mailpoint 808, Tremona Road, Southampton, SO16 6YD, UK
| |
|
36
|
Gu C, Liao B, Li X, Cai L, Li Z, Li K, Yang J. Global network random walk for predicting potential human lncRNA-disease associations. Sci Rep 2017; 7:12442. [PMID: 28963512 PMCID: PMC5622075 DOI: 10.1038/s41598-017-12763-z] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 03/27/2017] [Accepted: 09/14/2017] [Indexed: 12/13/2022] Open
Abstract
There is growing evidence that the mutation and dysregulation of long non-coding RNA (lncRNA) are associated with numerous diseases, including cancers. However, experimental methods to identify associations between lncRNAs and diseases are expensive and time-consuming. Effective computational approaches to identify disease-related lncRNAs are in high demand and would benefit the detection of lncRNA biomarkers for disease diagnosis, treatment, and prevention. In light of some limitations of existing computational methods, we developed a global network random walk model for predicting lncRNA-disease associations (GrwLDA) to reveal the potential associations between lncRNAs and diseases. GrwLDA is a universal network-based method and does not require negative samples. This method can be applied to a disease with no known associated lncRNA (isolated disease) and to lncRNA with no known associated disease (novel lncRNA). The leave-one-out cross validation (LOOCV) method was implemented to evaluate the predictive performance of GrwLDA. As a result, GrwLDA obtained reliable AUCs of 0.9449, 0.8562, and 0.8374 for overall, novel lncRNA and isolated disease prediction, respectively, significantly outperforming previous methods. Case studies of colon, gastric, and kidney cancers were also implemented, and the top 5 disease-lncRNA associations were reported for each disease. Interestingly, 13 (out of the 15) associations were confirmed by literature mining.
Affiliation(s)
- Changlong Gu
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China
| | - Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China.
| | - Xiaoying Li
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China
| | - Lijun Cai
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China
| | - Zejun Li
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China.,School of Computer and Information Science, Hunan Institute of Technology, Hengyang, 412002, China
| | - Keqin Li
- Department of Computer Science, State University of New York, New Paltz, New York, 12561, USA
| | - Jialiang Yang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029, USA
| |
|
37
|
Isik Z, Ercan ME. Integration of RNA-Seq and RPPA data for survival time prediction in cancer patients. Comput Biol Med 2017; 89:397-404. [PMID: 28869900 DOI: 10.1016/j.compbiomed.2017.08.028] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/11/2017] [Revised: 08/20/2017] [Accepted: 08/25/2017] [Indexed: 10/19/2022]
Abstract
Integration of several types of patient data in a computational framework can accelerate the identification of more reliable biomarkers, especially for prognostic purposes. This study aims to identify biomarkers that can successfully predict the potential survival time of a cancer patient by integrating the transcriptomic (RNA-Seq), proteomic (RPPA), and protein-protein interaction (PPI) data. The proposed method -RPBioNet- employs a random walk-based algorithm that works on a PPI network to identify a limited number of protein biomarkers. Later, the method uses gene expression measurements of the selected biomarkers to train a classifier for the survival time prediction of patients. RPBioNet was applied to classify kidney renal clear cell carcinoma (KIRC), glioblastoma multiforme (GBM), and lung squamous cell carcinoma (LUSC) patients based on their survival time classes (long- or short-term). The RPBioNet method correctly identified the survival time classes of patients with between 66% and 78% average accuracy for three data sets. RPBioNet operates with only 20 to 50 biomarkers and can achieve on average 6% higher accuracy compared to the closest alternative method, which uses only RNA-Seq data in the biomarker selection. Further analysis of the most predictive biomarkers highlighted genes that are common for both cancer types, as they may be driver proteins responsible for cancer progression. The novelty of this study is the integration of a PPI network with mRNA and protein expression data to identify more accurate prognostic biomarkers that can be used for clinical purposes in the future.
Affiliation(s)
- Zerrin Isik
- Computer Engineering Department, Dokuz Eylul Universitesi, 35160, Izmir, Turkey.
| | - Muserref Ece Ercan
- Computer Engineering Department, Dokuz Eylul Universitesi, 35160, Izmir, Turkey
| |
|
38
|
Lysenko A, Boroevich KA, Tsunoda T. Arete - candidate gene prioritization using biological network topology with additional evidence types. BioData Min 2017; 10:22. [PMID: 28694847 PMCID: PMC5501438 DOI: 10.1186/s13040-017-0141-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 08/24/2016] [Accepted: 06/12/2017] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Refinement of candidate gene lists to select the most promising candidates for further experimental verification remains an essential step between high-throughput exploratory analysis and the discovery of specific causal genes. Given the qualitative and semantic complexity of biological data, successfully addressing this challenge requires development of flexible and interoperable solutions for making the best possible use of the largest possible fraction of all available data. RESULTS We have developed an easily accessible framework that links two established network-based gene prioritization approaches with a supporting isolation forest-based integrative ranking method. The defining feature of the method is that both topological information of the biological networks and additional sources of evidence can be considered at the same time. The implementation was realized as an app extension for the Cytoscape graph analysis suite, and therefore can further benefit from the synergy with other analysis methods available as part of this system. CONCLUSIONS We provide efficient reference implementations of two popular gene prioritization algorithms - DIAMOnD and random walk with restart for the Cytoscape system. An extension of those methods was also developed that allows outputs of these algorithms to be combined with additional data. To demonstrate the utility of our software, we present two example disease gene prioritization application cases and show how our tool can be used to evaluate these different approaches.
Affiliation(s)
- Artem Lysenko
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi, Yokohama, 230-0045 Japan
| | - Keith Anthony Boroevich
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi, Yokohama, 230-0045 Japan
| | - Tatsuhiko Tsunoda
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi, Yokohama, 230-0045 Japan.,Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo-ku, Tokyo, 113-8510 Japan.,CREST, JST, Tokyo, 113-8510 Japan
| |
|
39
|
Abstract
Biological networks are powerful resources for the discovery of genes and genetic modules that drive disease. Fundamental to network analysis is the concept that genes underlying the same phenotype tend to interact; this principle can be used to combine and to amplify signals from individual genes. Recently, numerous bioinformatic techniques have been proposed for genetic analysis using networks, based on random walks, information diffusion and electrical resistance. These approaches have been applied successfully to identify disease genes, genetic modules and drug targets. In fact, all these approaches are variations of a unifying mathematical machinery - network propagation - suggesting that it is a powerful data transformation method of broad utility in genetic research.
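One of the propagation schemes named in this abstract, the random walk with restart, can be sketched in a few lines. The example below is an illustrative sketch under stated assumptions, not the implementation of any cited tool: the five-node network, the gene labels A to E, the seed set and the restart probability of 0.3 are all hypothetical.

```python
from typing import Dict, Set


def random_walk_with_restart(
    adjacency: Dict[str, Set[str]],
    seeds: Set[str],
    restart: float = 0.3,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Propagate seed signal over an undirected, unweighted network.

    At each step a walker returns to the seed set with probability `restart`;
    otherwise it moves to a uniformly chosen neighbor of its current node.
    The returned scores approximate the stationary visiting probabilities.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    current = dict(start)
    for _ in range(max_iter):
        updated = {n: restart * start[n] for n in nodes}
        for n in nodes:
            neighbors = adjacency[n]
            if not neighbors:
                # An isolated node keeps its own (non-restart) probability mass.
                updated[n] += (1.0 - restart) * current[n]
                continue
            share = (1.0 - restart) * current[n] / len(neighbors)
            for m in neighbors:
                updated[m] += share
        change = sum(abs(updated[n] - current[n]) for n in nodes)
        current = updated
        if change < tol:
            break
    return current


# Hypothetical toy network: path A - B - C - D, with E attached to B.
toy_network = {
    "A": {"B"},
    "B": {"A", "C", "E"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": {"B"},
}

scores = random_walk_with_restart(toy_network, seeds={"A"})
# Rank non-seed genes by proximity to the seed.
ranked = sorted((g for g in scores if g != "A"), key=scores.get, reverse=True)
print(ranked)
```

Genes close to the seed in the network accumulate more probability mass than distant ones, which is the sense in which propagation amplifies signal from known disease genes.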
|
40
|
Caldera M, Buphamalai P, Müller F, Menche J. Interactome-based approaches to human disease. Curr Opin Syst Biol 2017. [DOI: 10.1016/j.coisb.2017.04.015] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 12/21/2022]
|
41
|
Requena T, Gallego-Martinez A, Lopez-Escamez JA. A pipeline combining multiple strategies for prioritizing heterozygous variants for the identification of candidate genes in exome datasets. Hum Genomics 2017; 11:11. [PMID: 28532469 PMCID: PMC5441048 DOI: 10.1186/s40246-017-0107-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/07/2017] [Accepted: 05/11/2017] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND The identification of disease-causing variants in autosomal dominant diseases using exome-sequencing data remains a difficult task in small pedigrees. We combined several strategies to improve filtering and prioritizing of heterozygous variants using exome-sequencing datasets in familial Meniere disease: an in-house Pathogenic Variant (PAVAR) score, the Variant Annotation Analysis and Search Tool (VAAST-Phevor), Exomiser-v2, CADD, and FATHMM. We also validated the method by a benchmarking procedure including causal mutations in synthetic exome datasets. RESULTS PAVAR and VAAST were able to select the same sets of candidate variants independently of the studied disease. In contrast, Exomiser V2 and VAAST-Phevor had a variable correlation depending on the phenotypic information available for the disease on each family. Nevertheless, all the selected diseases ranked a limited number of concordant variants in the top 10 ranking, using the three systems or other combined algorithm such as CADD or FATHMM. Benchmarking analyses confirmed that the combination of systems with different approaches improves the prediction of candidate variants compared with the use of a single method. The overall efficiency of combined tools ranges between 68 and 71% in the top 10 ranked variants. CONCLUSIONS Our pipeline prioritizes a short list of heterozygous variants in exome datasets based on the top 10 concordant variants combining multiple systems.
Affiliation(s)
- Teresa Requena
- Otology & Neurotology Group CTS495, Department of Genomic Medicine, GENYO - Centre for Genomics and Oncological Research – Pfizer/University of Granada/Junta de Andalucía, PTS, 18016 Granada, Spain
| | - Alvaro Gallego-Martinez
- Otology & Neurotology Group CTS495, Department of Genomic Medicine, GENYO - Centre for Genomics and Oncological Research – Pfizer/University of Granada/Junta de Andalucía, PTS, 18016 Granada, Spain
| | - Jose A. Lopez-Escamez
- Otology & Neurotology Group CTS495, Department of Genomic Medicine, GENYO - Centre for Genomics and Oncological Research – Pfizer/University of Granada/Junta de Andalucía, PTS, 18016 Granada, Spain
- Department of Otolaryngology, Complejo Hospitalario Universidad de Granada (CHUGRA), ibs.granada, 18014 Granada, Spain
| |
|
42
|
Molecular genetic analysis of consanguineous families with primary microcephaly identified pathogenic variants in the ASPM gene. J Genet 2017; 96:383-387. [DOI: 10.1007/s12041-017-0759-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022]
|
43
|
Köhler S, Robinson PN. [Diagnostics in human genetics : Integration of phenotypic and genomic data]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2017; 60:542-549. [PMID: 28293716 DOI: 10.1007/s00103-017-2538-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
The development of reliable methods for annotation of clinical phenotypes and algorithms to calculate similarity values for clinical phenotype profiles will be a major challenge for genomic personalized medicine, since combined analysis of phenotypic features and genetic variants can increase diagnostic yield, especially with exome or genome sequencing. The Human Phenotype Ontology project (HPO; www.human-phenotype-ontology.org ) provides an ontology for capturing phenotypic abnormalities in human disease in a precise and comprehensive fashion. The HPO not only enables reliable integration of disease-relevant information from numerous databases, but it also allows for similarity between patients or between patients and disease descriptions to be calculated algorithmically. The HPO thereby represents a solid foundation for differential diagnostic applications as well as for translational research and prioritization of novel disease genes in exome or genome sequencing projects.
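The algorithmic similarity between a patient and disease descriptions mentioned here can be illustrated with one widely used measure, Resnik similarity, combined with a best-match average. The abstract does not name a specific measure, so this choice is an assumption. The ontology, the term names and the disease annotations below are a hypothetical toy example and are not taken from the HPO.

```python
import math
from typing import Dict, Set

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "neuro": {"root"},
    "skeletal": {"root"},
    "seizure": {"neuro"},
    "ataxia": {"neuro"},
    "scoliosis": {"skeletal"},
}

# Hypothetical disease annotations (made up for illustration).
ANNOTATIONS: Dict[str, Set[str]] = {
    "disease_1": {"seizure", "ataxia"},
    "disease_2": {"seizure"},
    "disease_3": {"scoliosis"},
    "disease_4": {"ataxia", "scoliosis"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content() -> Dict[str, float]:
    """IC(t) = -log(fraction of diseases annotated with t or a descendant)."""
    total = len(ANNOTATIONS)
    ic: Dict[str, float] = {}
    for term in PARENTS:
        count = sum(
            1
            for terms in ANNOTATIONS.values()
            if any(term in ancestors(t) for t in terms)
        )
        if count:  # terms never used in an annotation get no IC value
            ic[term] = -math.log(count / total)
    return ic


IC = information_content()


def resnik(t1: str, t2: str) -> float:
    """IC of the most informative common ancestor of two terms."""
    common = ancestors(t1) & ancestors(t2)
    return max((IC.get(t, 0.0) for t in common), default=0.0)


def best_match_average(patient: Set[str], disease: Set[str]) -> float:
    """Average, over patient terms, of the best Resnik match in the disease."""
    return sum(max(resnik(p, d) for d in disease) for p in patient) / len(patient)


patient_terms = {"seizure", "ataxia"}
scores = {
    name: best_match_average(patient_terms, terms)
    for name, terms in ANNOTATIONS.items()
}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

Because the root term annotates every disease, its information content is zero, so two phenotypes that share only the root contribute nothing to the score. This is why the purely skeletal disease ranks last for a patient with neurological findings.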
Affiliation(s)
- Sebastian Köhler
- NeuroCure Cluster of Excellence, Charité-Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Deutschland.
| | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, 06032, Farmington, USA.,Institute for Systems Genomics, University of Connecticut, Farmington, USA
| |
|
44
|
Godard P, Page M. PCAN: phenotype consensus analysis to support disease-gene association. BMC Bioinformatics 2016; 17:518. [PMID: 27923364 PMCID: PMC5142268 DOI: 10.1186/s12859-016-1401-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/23/2016] [Accepted: 12/01/2016] [Indexed: 11/12/2022] Open
Abstract
Background Bridging genotype and phenotype is a fundamental biomedical challenge that underlies more effective target discovery and patient-tailored therapy. Approaches that can flexibly and intuitively integrate known gene-phenotype associations in the context of molecular signaling networks are vital to effectively prioritize and biologically interpret genes underlying disease traits of interest. Results We describe Phenotype Consensus Analysis (PCAN), a method to assess the consensus semantic similarity of phenotypes in a candidate gene’s signaling neighborhood. We demonstrate that significant phenotype consensus (p < 0.05) is observable for ~67% of 4,549 OMIM disease-gene associations, using a combination of high quality String interactions + Metabase pathways, and use Joubert Syndrome to demonstrate the ease with which a significant result can be interrogated to highlight discriminatory traits linked to mechanistically related genes. Conclusions We advocate phenotype consensus as an intuitive and versatile method to aid disease-gene association, which naturally lends itself to the mechanistic deconvolution of diverse phenotypes. We provide PCAN to the community as an R package (http://bioconductor.org/packages/PCAN/) to allow flexible configuration, extension and standalone use or integration to supplement existing gene prioritization workflows.
Affiliation(s)
- Patrice Godard
- Clarivate Analytics (formerly the IP & Science business of Thomson Reuters), 5901 Priestly Dr., #200, Carlsbad, CA, 92008, USA
| | - Matthew Page
- Translational Bioinformatics, UCB Pharma, 208 Bath Road, Slough, SL1 3WE, UK.
| |
|
45
|
Valkanas E, Schaffer K, Dunham C, Maduro V, du Souich C, Rupps R, Adams DR, Baradaran-Heravi A, Flynn E, Malicdan MC, Gahl WA, Toro C, Boerkoel CF. Phenotypic evolution of UNC80 loss of function. Am J Med Genet A 2016; 170:3106-3114. [PMID: 27513830 DOI: 10.1002/ajmg.a.37929] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/13/2016] [Accepted: 08/03/2016] [Indexed: 12/27/2022]
Abstract
Failure to thrive arises as a complication of a heterogeneous group of disorders. We describe two female siblings with spastic paraplegia and global developmental delay but also, atypically for the HSPs, poor weight gain classified as failure to thrive. After extensive clinical and biochemical investigations failed to identify the etiology, we used exome sequencing to identify biallelic UNC80 mutations (NM_032504.1:c.[3983-3_3994delinsA];[2431C>T]). The paternally inherited NM_032504.1:c.3983-3_3994delinsA is predicted to encode p.Ser1328Argfs*19 and the maternally inherited NM_032504.1:c.2431C>T is predicted to encode p.Arg811*. No UNC80 mRNA was detectable in patient cultured skin fibroblasts, suggesting UNC80 loss of function by nonsense mediated mRNA decay. Further supporting the UNC80 mutations as causative of these siblings' disorder, biallelic mutations in UNC80 have recently been described among individuals with an overlapping phenotype. This report expands the disease spectrum associated with UNC80 mutations. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Elise Valkanas
- NIH Undiagnosed Diseases Program, Common Fund, Office of the Director, NIH, National Institutes of Health, Bethesda, Maryland
| | - Katherine Schaffer
- NIH Undiagnosed Diseases Program, Common Fund, Office of the Director, NIH, National Institutes of Health, Bethesda, Maryland
| | - Christopher Dunham
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
| | - Valerie Maduro
- NIH Undiagnosed Diseases Program, Common Fund, Office of the Director, NIH, National Institutes of Health, Bethesda, Maryland
| | - Christèle du Souich
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada.,Child and Family Research Institute, University of British Columbia, Vancouver, BC, Canada
| | - Rosemarie Rupps
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada.,Child and Family Research Institute, University of British Columbia, Vancouver, BC, Canada
| | - David R Adams
- NIH Undiagnosed Diseases Program, Common Fund, Office of the Director, NIH, National Institutes of Health, Bethesda, Maryland
| | - Alireza Baradaran-Heravi
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada.,Child and Family Research Institute, University of British Columbia, Vancouver, BC, Canada
| | - Elise Flynn
- NIH Undiagnosed Diseases Program, Common Fund, Office of the Director, NIH, National Institutes of Health, Bethesda, Maryland
| | - May C Malicdan
- NIH Undiagnosed Diseases Program, Common Fund, Office of the Director, NIH, National Institutes of Health, Bethesda, Maryland
| | - William A Gahl
- NIH Undiagnosed Diseases Program, Common Fund, Office of the Director, NIH, National Institutes of Health, Bethesda, Maryland.,NHGRI, National Institutes of Health, Bethesda, Maryland
| | - Camilo Toro
- NIH Undiagnosed Diseases Program, Common Fund, Office of the Director, NIH, National Institutes of Health, Bethesda, Maryland
| | - Cornelius F Boerkoel
- NIH Undiagnosed Diseases Program, Common Fund, Office of the Director, NIH, National Institutes of Health, Bethesda, Maryland.,Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada.,Child and Family Research Institute, University of British Columbia, Vancouver, BC, Canada
| |
|
46
|
Bone WP, Washington NL, Buske OJ, Adams DR, Davis J, Draper D, Flynn ED, Girdea M, Godfrey R, Golas G, Groden C, Jacobsen J, Köhler S, Lee EMJ, Links AE, Markello TC, Mungall CJ, Nehrebecky M, Robinson PN, Sincan M, Soldatos AG, Tifft CJ, Toro C, Trang H, Valkanas E, Vasilevsky N, Wahl C, Wolfe LA, Boerkoel CF, Brudno M, Haendel MA, Gahl WA, Smedley D. Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency. Genet Med 2016; 18:608-17. [PMID: 26562225 PMCID: PMC4916229 DOI: 10.1038/gim.2015.137] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Received: 03/01/2015] [Accepted: 08/27/2015] [Indexed: 01/18/2023] Open
Abstract
PURPOSE Medical diagnosis and molecular or biochemical confirmation typically rely on the knowledge of the clinician. Although this is very difficult in extremely rare diseases, we hypothesized that the recording of patient phenotypes in Human Phenotype Ontology (HPO) terms and computationally ranking putative disease-associated sequence variants improves diagnosis, particularly for patients with atypical clinical profiles. METHODS Using simulated exomes and the National Institutes of Health Undiagnosed Diseases Program (UDP) patient cohort and associated exome sequence, we tested our hypothesis using Exomiser. Exomiser ranks candidate variants based on patient phenotype similarity to (i) known disease-gene phenotypes, (ii) model organism phenotypes of candidate orthologs, and (iii) phenotypes of protein-protein association neighbors. RESULTS Benchmarking showed Exomiser ranked the causal variant as the top hit in 97% of known disease-gene associations and ranked the correct seeded variant in up to 87% when detectable disease-gene associations were unavailable. Using UDP data, Exomiser ranked the causative variant(s) within the top 10 variants for 11 previously diagnosed variants and achieved a diagnosis for 4 of 23 cases undiagnosed by clinical evaluation. CONCLUSION Structured phenotyping of patients and computational analysis are effective adjuncts for diagnosing patients with genetic disorders.
Affiliation(s)
- William P. Bone
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - Nicole L. Washington
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Orion J. Buske
- Centre for Computational Medicine, Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
| | - David R. Adams
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Medical Genetics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
| | - Joie Davis
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - David Draper
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - Elise D. Flynn
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - Marta Girdea
- Centre for Computational Medicine, Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
| | - Rena Godfrey
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - Gretchen Golas
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - Catherine Groden
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - Julius Jacobsen
- Skarnes Faculty group, Wellcome Trust Sanger Institute, Hinxton, UK
| | - Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Elizabeth M. J. Lee
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - Amanda E. Links
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - Thomas C. Markello
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | | | - Michele Nehrebecky
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - Peter N. Robinson
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Murat Sincan
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - Ariane G. Soldatos
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - Cynthia J. Tifft
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Medical Genetics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
| | - Camilo Toro
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - Heather Trang
- Centre for Computational Medicine, Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
| | - Elise Valkanas
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - Nicole Vasilevsky
- Library; and Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
| | - Colleen Wahl
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - Lynne A. Wolfe
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - Cornelius F. Boerkoel
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
| | - Michael Brudno
- Centre for Computational Medicine, Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
| | - Melissa A. Haendel
- Library; and Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
| | - William A. Gahl
- Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health, Bethesda, Maryland, USA
- Medical Genetics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
| | - Damian Smedley
- Skarnes Faculty group, Wellcome Trust Sanger Institute, Hinxton, UK
| |
|
47
|
Mining for genes related to choroidal neovascularization based on the shortest path algorithm and protein interaction information. Biochim Biophys Acta Gen Subj 2016; 1860:2740-9. [PMID: 26987808 DOI: 10.1016/j.bbagen.2016.03.015] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/02/2016] [Revised: 03/05/2016] [Accepted: 03/10/2016] [Indexed: 12/24/2022]
Abstract
BACKGROUND Choroidal neovascularization (CNV) is a serious eye disease that may cause visual loss, especially for older people. Many factors have been proven to induce this disease including age, gender, obesity, and so on. However, until now, we have had limited knowledge on CNV's pathogenic mechanism. Discovering the genes that underlie this disease and performing extensive studies on them can help us to understand how CNV occurs and design effective treatments. METHODS In this study, we designed a computational method to identify novel CNV-related genes in a large protein network constructed using the protein-protein interaction information in STRING. The candidate genes were first extracted from the shortest paths connecting any two known CNV-related genes and then filtered by a permutation test and using knowledge of their linkages to known CNV-related genes. RESULTS A list of putative CNV-related candidate genes was obtained by our method. These genes are deemed to have strong relationships with CNV. CONCLUSIONS Extensive analyses of several of the putative genes such as ANK1, ITGA4, CD44 and others indicate that they are related to specific biological processes involved in CNV, implying they may be novel CNV-related genes. GENERAL SIGNIFICANCE The newfound putative CNV-related genes may provide new insights into CNV and help design more effective treatments. This article is part of a Special Issue entitled "System Genetics" Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
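The two steps named in the METHODS above (extracting candidates from shortest paths between known genes, then filtering by a permutation test) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an unweighted, undirected toy network, takes a single breadth-first shortest path per pair of seed genes, and uses a simple empirical p-value; the node names, the function names (build_graph, shortest_path, candidate_counts, permutation_pvalues) and the number of permutations are all hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, source, target):
    """Return one shortest path (list of nodes) found by BFS, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        # Sorted neighbours make the chosen path deterministic.
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def candidate_counts(graph, seeds):
    """Count how often each non-seed node lies inside a seed-to-seed path."""
    seed_set = set(seeds)
    counts = {}
    for a, b in combinations(sorted(seed_set), 2):
        path = shortest_path(graph, a, b)
        if path is None:
            continue
        for node in path[1:-1]:  # interior nodes only
            if node not in seed_set:
                counts[node] = counts.get(node, 0) + 1
    return counts


def permutation_pvalues(graph, seeds, n_permutations=200, rng_seed=0):
    """Empirical p-value per candidate, using random seed sets of equal size."""
    rng = random.Random(rng_seed)
    observed = candidate_counts(graph, seeds)
    nodes = sorted(graph)
    n_seeds = len(set(seeds))
    exceed = {candidate: 0 for candidate in observed}
    for _ in range(n_permutations):
        null = candidate_counts(graph, rng.sample(nodes, n_seeds))
        for candidate, count in observed.items():
            if null.get(candidate, 0) >= count:
                exceed[candidate] += 1
    # Add-one correction keeps every p-value strictly above zero.
    return {c: (exceed[c] + 1) / (n_permutations + 1) for c in observed}


# Hypothetical toy network: A, B and C play the role of known disease genes.
graph = build_graph([("A", "X"), ("X", "B"), ("B", "Y"), ("Y", "C"), ("C", "Z")])
seeds = ["A", "B", "C"]
print(candidate_counts(graph, seeds))
print(permutation_pvalues(graph, seeds))
```

On a real interaction network the edges would normally carry confidence scores, so a weighted shortest-path algorithm would replace the breadth-first search used here, and candidates would be kept only if their p-value fell below a chosen threshold.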
|
48
|
Chen L, Zhang YH, Huang T, Cai YD. Identifying novel protein phenotype annotations by hybridizing protein-protein interactions and protein sequence similarities. Mol Genet Genomics 2016; 291:913-34. [PMID: 26728152 DOI: 10.1007/s00438-015-1157-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 05/18/2015] [Accepted: 12/08/2015] [Indexed: 01/18/2023]
Abstract
Studies of protein phenotypes represent a central challenge of modern genetics in the post-genome era because effective and accurate investigation of protein phenotypes is one of the most critical procedures to identify functional biological processes at the microscale, which involves the analysis of multifactorial traits and has greatly contributed to the development of modern biology in the post-genome era. Therefore, we have developed a novel computational method that identifies novel proteins associated with certain phenotypes in yeast based on the protein-protein interaction network. Unlike some existing network-based computational methods that identify the phenotype of a query protein based on its direct neighbors in the local network, the proposed method identifies novel candidate proteins for a certain phenotype by considering all annotated proteins with this phenotype on the global network using a shortest path (SP) algorithm. The identified proteins are further filtered using both a permutation test and their interactions and sequence similarities to annotated proteins. We compared our method with another widely used method called random walk with restart (RWR). The biological functions of proteins for each phenotype identified by our SP method and the RWR method were analyzed and compared. The results confirmed a large proportion of our novel protein phenotype annotations, and the RWR method showed a higher false positive rate than the SP method. Our method is equally effective for the prediction of proteins involved in all eleven clustered yeast phenotypes, with quite a low false positive rate. Considering the universality and generalizability of our supporting materials and computing strategies, our method can further be applied to study other organisms, and the new functions we predicted can provide pertinent guidance for further experimental verification.
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China; College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China.
| | - Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China.
| |
|
49
|
Snider J, Kotlyar M, Saraon P, Yao Z, Jurisica I, Stagljar I. Fundamentals of protein interaction network mapping. Mol Syst Biol 2015; 11:848. [PMID: 26681426 PMCID: PMC4704491 DOI: 10.15252/msb.20156351] [Citation(s) in RCA: 201] [Impact Index Per Article: 20.1] [Indexed: 12/13/2022] Open
Abstract
Studying protein interaction networks of all proteins in an organism (“interactomes”) remains one of the major challenges in modern biomedicine. Such information is crucial to understanding cellular pathways and developing effective therapies for the treatment of human diseases. Over the past two decades, diverse biochemical, genetic, and cell biological methods have been developed to map interactomes. In this review, we highlight basic principles of interactome mapping. Specifically, we discuss the strengths and weaknesses of individual assays, how to select a method appropriate for the problem being studied, and provide general guidelines for carrying out the necessary follow‐up analyses. In addition, we discuss computational methods to predict, map, and visualize interactomes, and provide a summary of some of the most important interactome resources. We hope that this review serves as both a useful overview of the field and a guide to help more scientists actively employ these powerful approaches in their research.
Affiliation(s)
- Jamie Snider
- Donnelly Centre, Department of Biochemistry, Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Max Kotlyar
- Princess Margaret Cancer Center, IBM Life Sciences Discovery Centre, University Health Network, Ontario, Canada
| | - Punit Saraon
- Donnelly Centre, Department of Biochemistry, Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Zhong Yao
- Donnelly Centre, Department of Biochemistry, Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Igor Jurisica
- Princess Margaret Cancer Center, IBM Life Sciences Discovery Centre, University Health Network, Ontario, Canada
| | - Igor Stagljar
- Donnelly Centre, Department of Biochemistry, Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| |
|
50
|
Smedley D, Jacobsen JOB, Jäger M, Köhler S, Holtgrewe M, Schubach M, Siragusa E, Zemojtel T, Buske OJ, Washington NL, Bone WP, Haendel MA, Robinson PN. Next-generation diagnostics and disease-gene discovery with the Exomiser. Nat Protoc 2015; 10:2004-15. [PMID: 26562621 DOI: 10.1038/nprot.2015.124] [Citation(s) in RCA: 247] [Impact Index Per Article: 24.7] [Indexed: 12/23/2022]
Abstract
Exomiser is an application that prioritizes genes and variants in next-generation sequencing (NGS) projects for novel disease-gene discovery or differential diagnostics of Mendelian disease. Exomiser comprises a suite of algorithms for prioritizing exome sequences using random-walk analysis of protein interaction networks, clinical relevance and cross-species phenotype comparisons, as well as a wide range of other computational filters for variant frequency, predicted pathogenicity and pedigree analysis. In this protocol, we provide a detailed explanation of how to install Exomiser and use it to prioritize exome sequences in a number of scenarios. Exomiser requires ∼3 GB of RAM and roughly 15-90 s of computing time on a standard desktop computer to analyze a variant call format (VCF) file. Exomiser is freely available for academic use from http://www.sanger.ac.uk/science/tools/exomiser.
Affiliation(s)
- Damian Smedley
- Skarnes Faculty Group, Wellcome Trust Sanger Institute, Hinxton, UK
| | | | - Marten Jäger
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Sebastian Köhler
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Manuel Holtgrewe
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute for Health, Berlin, Germany
| | - Max Schubach
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Enrico Siragusa
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute for Health, Berlin, Germany; Max Planck Institute for Molecular Genetics, Berlin, Germany
| | - Tomasz Zemojtel
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland; Labor Berlin - Charité Vivantes, Humangenetik, Berlin, Germany
| | - Orion J Buske
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - Nicole L Washington
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - William P Bone
- The National Institutes of Health (NIH) Undiagnosed Diseases Program, Common Fund, Office of the Director, NIH, Bethesda, Maryland, USA
| | - Melissa A Haendel
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
| | - Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany; Max Planck Institute for Molecular Genetics, Berlin, Germany; Department of Mathematics and Computer Science, Institute for Bioinformatics, Freie Universität Berlin, Berlin, Germany
| |
|