51
Huang XT, Jia S, Gao L, Wu J. Reconstruction of human protein-coding gene functional association network based on machine learning. Brief Bioinform 2022; 23:6502555. [PMID: 35021191 DOI: 10.1093/bib/bbab552] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/16/2021] [Revised: 11/13/2021] [Accepted: 12/02/2021] [Indexed: 01/02/2023]
Abstract
Networks of molecular interactions are intrinsically dynamical systems within an organism. The interactions curated in molecular interaction databases are still incomplete and contain false positives introduced by high-throughput screening experiments. In this study, we propose a framework that integrates interactions between functionally associated protein-coding genes from 31 data sources to reconstruct a network with high coverage and quality. For each interaction, 369 features were constructed, covering properties of both the interaction and the genes involved. The training and validation sets were built from pathway interactions as positives and potential negative instances obtained with our proposed semi-supervised strategy. The random forest classification method was then applied to train and predict multiple times, yielding a score for each interaction. After setting a threshold estimated by a Binomial distribution, a Human protein-coding Gene Functional Association Network (HuGFAN) was reconstructed with 20 383 genes and 1 185 429 high-confidence interactions. HuGFAN was then compared with the other networks from the data sources with respect to network properties, suggesting that HuGFAN is more closely related to function and pathways. Finally, HuGFAN was applied to identify cancer driver genes with two well-known network-based methods (DriverNet and HotNet2), showing outstanding performance compared with other networks. HuGFAN and other supplementary files are freely available at https://github.com/xthuang226/HuGFAN.
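The abstract says only that the score threshold was "estimated by a Binomial distribution". A minimal sketch of how such a threshold could work, assuming (hypothetically) that each interaction is scored by how many of n repeated random forest runs predict it as positive, and that under the null hypothesis each run votes positive with probability 0.5: an interaction is kept when its vote count k has an upper-tail probability P(X >= k) below alpha. The helper names, n = 10, p = 0.5, alpha = 0.05 and the GENE_* pairs are illustrative assumptions, not HuGFAN's published settings.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def vote_threshold(n_runs, p_null=0.5, alpha=0.05):
    """Smallest vote count whose upper-tail probability under the null is below alpha.

    Hypothetical helper: returns n_runs + 1 (nothing passes) if no count is significant.
    """
    for k in range(n_runs + 1):
        if binomial_upper_tail(k, n_runs, p_null) < alpha:
            return k
    return n_runs + 1


# Hypothetical vote counts from 10 repeated runs (placeholder gene names).
votes = {"GENE_A-GENE_B": 10, "GENE_A-GENE_C": 6, "GENE_B-GENE_D": 9}
k = vote_threshold(10)
high_confidence = [pair for pair, v in votes.items() if v >= k]
print(k, high_confidence)
```

Under these assumptions an interaction needs at least 9 positive votes out of 10 to pass; a different number of runs, null probability or alpha would move the cut-off.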
Affiliation(s)
- Xiao-Tai Huang
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China
- Songwei Jia
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China
- Jing Wu
- School of Mechanical Engineering, Dongguan University of Technology, Dongguan, 523808, Guangdong, China

52
Wang S, Li J, Wang Y. M2PP: a novel computational model for predicting drug-targeted pathogenic proteins. BMC Bioinformatics 2022; 23:7. [PMID: 34983358 PMCID: PMC8728953 DOI: 10.1186/s12859-021-04522-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2021] [Accepted: 12/07/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Detecting pathogenic proteins is the fundamental way to understand the mechanisms of diseases and resist their invasion, which makes pathogenic protein prediction an urgent problem. Genome-wide prediction is not necessarily conducive to curing diseases rapidly, because developing new drugs specifically for a predicted pathogenic protein always requires major investments of time and money. To facilitate disease treatment, computational methods that predict pathogenic proteins already targeted by existing drugs should be exploited. RESULTS In this study, we proposed a novel computational model for predicting drug-targeted pathogenic proteins, named M2PP. Three types of features, based on neighborhood similarity information, drug-inferred information and path information, were extracted from our constructed heterogeneous network of target proteins, diseases and drugs. A random forest regression model was then trained to score unconfirmed target-disease pairs. Five-fold cross-validation was used to evaluate the model's prediction performance, and M2PP achieved better results than other state-of-the-art methods. In addition, M2PP accurately predicted high-ranked pathogenic proteins for common diseases, with public biomedical literature as supporting evidence, indicating its strong predictive ability. CONCLUSIONS M2PP is an effective and accurate model for predicting drug-targeted pathogenic proteins and could support future biological research.
Affiliation(s)
- Shiming Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001, China
- Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001, China

53
Giri R, Sharma RK. Analysis of protein association networks regulating the neuroactive metabolites production in Lactobacillus species. Enzyme Microb Technol 2021; 154:109978. [PMID: 34968825 DOI: 10.1016/j.enzmictec.2021.109978] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/01/2021] [Revised: 11/25/2021] [Accepted: 12/19/2021] [Indexed: 12/13/2022]
Abstract
The human population suffers widely from mental disorders and stress. Microbial metabolites may alter brain activity, which makes them a potentially effective approach to treating psychological distress. Microbial neuroactive metabolites such as trimethylamine, imidazolone propionate and taurine have previously been shown to alter brain activity. In the present study, the proteins regulating their production and activity were explored in Lactobacillus species using STRING (11.5) as a bioinformatic tool. Networks of urocanate hydratase, glycine radical enzyme and taurine ABC transporter protein (ATP-dependent transporter) were identified in Lactobacillus nodensis, Lactobacillus vini and Lactobacillus paraplantarum strains. Cluster analysis of the networks further yielded groups of homologous proteins most likely related to the reductive monocarboxylic acid cycle, pyruvate fermentation to acetate IV and the L-histidine degradation I pathway. The findings support the use and evaluation of selected Lactobacillus strains as psychoactive bacteria for the prevention and treatment of certain neurological and neurophysiological conditions.
Affiliation(s)
- Rajat Giri
- Department of Biosciences, Manipal University Jaipur, Jaipur 303007, Rajasthan, India
- Rakesh Kumar Sharma
- Department of Biosciences, Manipal University Jaipur, Jaipur 303007, Rajasthan, India

54
Zhang G, Li M, Deng H, Xu X, Liu X, Zhang W. SGNNMD: signed graph neural network for predicting deregulation types of miRNA-disease associations. Brief Bioinform 2021; 23:6455665. [PMID: 34875683 DOI: 10.1093/bib/bbab464] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/05/2021] [Revised: 10/08/2021] [Accepted: 10/11/2021] [Indexed: 11/13/2022]
Abstract
miRNAs are a class of small non-coding RNA molecules that play an important role in many biological processes, and determining miRNA-disease associations can benefit drug development and clinical diagnosis. Although great efforts have been made to develop miRNA-disease association prediction methods, little attention has been paid to in-depth classification of miRNA-disease associations, e.g. up- or down-regulation of miRNAs in diseases. In this paper, we model known miRNA-disease associations as a signed bipartite network, with miRNA nodes, disease nodes and two types of edges representing up- and down-regulation of miRNAs in diseases, and propose a signed graph neural network method (SGNNMD) for predicting deregulation types of miRNA-disease associations. SGNNMD extracts subgraphs around miRNA-disease pairs from the signed bipartite network, learns structural features of the subgraphs via a labeling algorithm and a neural network, and then combines them with biological features (i.e. miRNA-miRNA functional similarity and disease-disease semantic similarity) to build the prediction model. In computational experiments, SGNNMD achieves highly competitive performance compared with several baselines, including signed graph link prediction methods, multi-relation prediction methods and one existing deregulation type prediction method. Moreover, SGNNMD has good inductive capability and can generalize to miRNAs and diseases unseen during training.
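The abstract does not spell out SGNNMD's subgraph extraction or labeling algorithm. A minimal sketch, assuming an h-hop enclosing subgraph around the target pair (with the target link hidden, as is common in subgraph-based link prediction) and a simple per-node label of (hops to the miRNA, hops to the disease), could look like this. The toy graph, node names and helper functions are hypothetical; edge signs (+1 for up-regulation, -1 for down-regulation) are carried into the subgraph but not used for labeling.

```python
from collections import deque

# Hypothetical signed bipartite graph: (miRNA, disease) -> +1 (up) or -1 (down).
EDGES = {
    ("mir-1", "d1"): +1,
    ("mir-1", "d2"): -1,
    ("mir-2", "d2"): +1,
    ("mir-2", "d3"): -1,
    ("mir-3", "d3"): +1,
}


def adjacency(edges, skip=None):
    """Undirected adjacency sets, optionally hiding the (miRNA, disease) pair `skip`."""
    adj = {}
    for m, d in edges:
        if (m, d) == skip:
            continue
        adj.setdefault(m, set()).add(d)
        adj.setdefault(d, set()).add(m)
    return adj


def bfs_dist(adj, source, max_hops):
    """Hop distances from `source`, exploring at most `max_hops` hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def enclosing_subgraph(edges, mirna, disease, hops=2):
    """Labeled nodes and signed edges of the h-hop subgraph around a pair."""
    target = (mirna, disease)
    adj = adjacency(edges, skip=target)  # hide the link being predicted
    dm = bfs_dist(adj, mirna, hops)
    dd = bfs_dist(adj, disease, hops)
    nodes = set(dm) | set(dd)
    # Label = (hops to miRNA, hops to disease); None means farther than `hops`.
    labels = {n: (dm.get(n), dd.get(n)) for n in nodes}
    sub_edges = {
        (m, d): sign
        for (m, d), sign in edges.items()
        if (m, d) != target and m in nodes and d in nodes
    }
    return labels, sub_edges


labels, sub_edges = enclosing_subgraph(EDGES, "mir-1", "d2", hops=2)
print(labels)
print(sub_edges)
```

In a full model these labels would be encoded as node features for a graph neural network; that step is omitted here.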
Affiliation(s)
- Guangzhan Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Menglu Li
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Huan Deng
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xinran Xu
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xuan Liu
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China

55
Ding Y, Cui M, Qian J, Wang C, Shen Q, Ren H, Li L, Zhang F, Zhang R. Calculation of Similarity Between 26 Autoimmune Diseases Based on Three Measurements Including Network, Function, and Semantics. Front Genet 2021; 12:758041. [PMID: 34858474 PMCID: PMC8632457 DOI: 10.3389/fgene.2021.758041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2021] [Accepted: 10/27/2021] [Indexed: 11/13/2022]
Abstract
Autoimmune diseases (ADs) are a broad range of diseases in which the immune response to self-antigens causes tissue damage or dysfunction, and genetic susceptibility is regarded as the key etiological factor. Accumulating evidence suggests that different ADs share certain commonalities. However, theoretical research on the similarity between ADs is still limited. In this work, we first computed the genetic similarity between 26 ADs based on three measurements: network similarity (NetSim), functional similarity (FunSim), and semantic similarity (SemSim), and systematically identified three significant pairs of similar ADs: rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE), myasthenia gravis (MG) and autoimmune thyroiditis (AIT), and autoimmune polyendocrinopathies (AP) and uveomeningoencephalitic syndrome (Vogt-Koyanagi-Harada syndrome, VKH). We then investigated the gene ontology terms and pathways enriched in the three significant AD pairs through functional analysis. Cluster analysis of the similarity matrix of the 26 ADs placed the three significant AD pairs in three different disease clusters, and the ADs within each cluster may share high genetic similarity. We also detected risk genes shared among ADs belonging to the same disease cluster. Overall, our findings will provide significant insight into the genetic commonalities of different ADs and contribute to the discovery of novel biomarkers and the development of new therapeutic methods for ADs.
Affiliation(s)
- Yanjun Ding
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Department of Microbiology, WU Lien-Teh Institute, Harbin Medical University, Harbin, China
- Mintian Cui
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jun Qian
- Department of Microbiology, WU Lien-Teh Institute, Harbin Medical University, Harbin, China
- Chao Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Qi Shen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hongbiao Ren
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Liangshuang Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Fengmin Zhang
- Department of Microbiology, WU Lien-Teh Institute, Harbin Medical University, Harbin, China
- Ruijie Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China

56
A Bioinformatics Approach to Identifying Potential Biomarkers for Cryptosporidium parvum: A Coccidian Parasite Associated with Fetal Diarrhea. Vaccines (Basel) 2021; 9:vaccines9121427. [PMID: 34960172 PMCID: PMC8705633 DOI: 10.3390/vaccines9121427] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/26/2021] [Revised: 11/25/2021] [Accepted: 11/27/2021] [Indexed: 01/07/2023]
Abstract
Cryptosporidium parvum (C. parvum) is a protozoan parasite known to cause cryptosporidiosis in pre-weaned calves. Animals and patients with immunosuppression are at risk of developing the disease, which can cause potentially fatal diarrhoea. The present study aimed to construct a network biology framework based on the differentially expressed genes (DEGs) of C. parvum-infected subjects. Gene expression profiling of C. parvum-infected individuals can give a snapshot of the genes and transcripts actively expressed under infection conditions. In the present study, we analyzed microarray data sets and compared the gene expression profiles of patients with those of healthy controls from different data sets. Using a network medicine approach to identify the most influential genes in the gene interaction network, we uncovered essential genes and pathways related to C. parvum infection. We identified 164 differentially expressed genes (109 up- and 54 down-regulated DEGs) and subjected them to pathway and gene set enrichment analyses. The results underpin the identification of seven significant hub genes with high centrality values: ISG15, MX1, IFI44L, STAT1, IFIT1, OAS1, IFIT3, RSAD2, IFITM1, and IFI44. These genes are associated with diverse biological processes, including but not limited to host interaction, type 1 interferon production and response to IL-gamma. Furthermore, four genes (IFI44, IFIT3, IFITM1, and MX1) were also found to be involved in innate immunity, inflammation, apoptosis, phosphorylation, cell proliferation, and cell signaling. In conclusion, these results support the development and implementation of gene-profile-based tools to identify and treat Cryptosporidium parvum-related diseases at an early stage.
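The abstract ranks hub genes by "high centrality values" without naming the measure. A minimal sketch, assuming plain degree centrality on an undirected gene interaction network, ranks genes as follows; the toy edges and placeholder gene names are hypothetical and do not reproduce the study's network.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality: degree / (number of connected nodes - 1)."""
    # Treat the network as undirected; drop duplicate edges and self-loops.
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    degree = defaultdict(int)
    for pair in unique:
        a, b = tuple(pair)
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    if n < 2:
        return {}
    return {node: d / (n - 1) for node, d in degree.items()}


def top_hubs(edges, k):
    """Return the k genes with the highest degree centrality (ties broken by name)."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


# Hypothetical toy network with placeholder gene names.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
print(top_hubs(toy_edges, 2))
```

Degree is the simplest centrality; betweenness or closeness centrality can rank the same network differently, so the choice of measure matters when comparing hub lists.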

57
Burns JJR, Shealy BT, Greer MS, Hadish JA, McGowan MT, Biggs T, Smith MC, Feltus FA, Ficklin SP. Addressing noise in co-expression network construction. Brief Bioinform 2021; 23:6446269. [PMID: 34850822 PMCID: PMC8769892 DOI: 10.1093/bib/bbab495] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/08/2021] [Revised: 10/25/2021] [Accepted: 10/28/2021] [Indexed: 11/13/2022]
Abstract
Gene co-expression networks (GCNs) provide multiple benefits to molecular research, including hypothesis generation and biomarker discovery. Transcriptome profiles serve as input for GCN construction and are derived from increasingly large studies with samples across multiple experimental conditions, treatments, time points, genotypes, etc. In such experiments, the larger number of variables confounds the discovery of true network edges, causes edges to be excluded and inhibits the discovery of context- (or condition-) specific network edges. To demonstrate this problem, a 475-sample dataset is used to show that up to 97% of GCN edges can be misleading because the correlations are false or incorrect. False and incorrect correlations can occur when tests are applied without ensuring that their assumptions are met, and pairwise gene expression may not meet test assumptions if the expression of at least one gene in the pair is a function of multiple confounding variables. The ‘one-size-fits-all’ approach to GCN construction is therefore problematic for large, multivariable datasets. Recently, the Knowledge Independent Network Construction toolkit has been used in multiple studies to provide a dynamic approach to GCN construction that ensures statistical tests meet their assumptions and that confounding variables are addressed. Additionally, it can associate an experimental context with each edge of the network, resulting in context-specific GCNs (csGCNs). To help researchers recognize these challenges in GCN construction and create csGCNs, we provide a review of the workflow.
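The abstract's core claim is that pooling samples across conditions can create correlations that hold in no single condition. A minimal sketch with made-up expression values for two hypothetical genes under two hypothetical conditions ("ctrl" and "heat") shows a strong pooled Pearson correlation that vanishes within each condition. It illustrates the confounding problem only and is not the Knowledge Independent Network Construction toolkit's algorithm.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


# Made-up expression values: (condition, gene_x, gene_y). Within each condition
# the two genes are uncorrelated; the condition shifts both genes' levels.
samples = [
    ("ctrl", 1, 2), ("ctrl", 2, 1), ("ctrl", 3, 1), ("ctrl", 4, 2),
    ("heat", 11, 12), ("heat", 12, 11), ("heat", 13, 11), ("heat", 14, 12),
]


def correlation_by_condition(samples):
    """Pearson correlation computed separately within each condition."""
    result = {}
    for cond in sorted({c for c, _, _ in samples}):
        xs = [x for c, x, _ in samples if c == cond]
        ys = [y for c, _, y in samples if c == cond]
        result[cond] = pearson(xs, ys)
    return result


pooled = pearson([x for _, x, _ in samples], [y for _, _, y in samples])
print(f"pooled r = {pooled:.3f}")
print("per-condition r:", correlation_by_condition(samples))
```

The pooled edge here is driven entirely by the condition label, which is the kind of edge a context-aware construction method aims to flag or attribute to a specific condition.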
Affiliation(s)
- Joshua J R Burns
- Department of Horticulture, 149 Johnson Hall, Washington State University, Pullman, WA 99164, USA
- Benjamin T Shealy
- Department of Electrical & Computer Engineering, 105 Riggs Hall, Clemson University, Clemson, SC 29631, USA
- Mitchell S Greer
- School of Electrical Engineering and Computer Science, EME 102, Washington State University, Pullman, WA 99164, USA
- John A Hadish
- Molecular Plant Sciences Program, French Ad 324g, Washington State University, Pullman, WA 99164, USA
- Matthew T McGowan
- Molecular Plant Sciences Program, French Ad 324g, Washington State University, Pullman, WA 99164, USA
- Tyler Biggs
- Department of Horticulture, 149 Johnson Hall, Washington State University, Pullman, WA 99164, USA
- Melissa C Smith
- Department of Electrical & Computer Engineering, 105 Riggs Hall, Clemson University, Clemson, SC 29631, USA
- F Alex Feltus
- Department of Genetics and Biochemistry, 130 McGinty Court, Clemson University, Clemson, SC 29634, USA
- Biomedical Data Science & Informatics Program, 100 McAdams Hall, Clemson University, Clemson, SC 29634, USA
- Clemson Center for Human Genetics, 114 Gregor Mendel Circle, Greenwood, SC 29646, USA
- Stephen P Ficklin
- Department of Horticulture, 149 Johnson Hall, Washington State University, Pullman, WA 99164, USA
- School of Electrical Engineering and Computer Science, EME 102, Washington State University, Pullman, WA 99164, USA

58
Kotlyar M, Pastrello C, Ahmed Z, Chee J, Varyova Z, Jurisica I. IID 2021: towards context-specific protein interaction analyses by increased coverage, enhanced annotation and enrichment analysis. Nucleic Acids Res 2021; 50:D640-D647. [PMID: 34755877 PMCID: PMC8728267 DOI: 10.1093/nar/gkab1034] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 09/15/2021] [Revised: 10/13/2021] [Accepted: 11/03/2021] [Indexed: 01/02/2023]
Abstract
Improved bioassays have significantly increased the rate of identifying new protein-protein interactions (PPIs), and the number of detected human PPIs has greatly exceeded early estimates of human interactome size. These new PPIs provide a more complete view of disease mechanisms but precise understanding of how PPIs affect phenotype remains a challenge. It requires knowledge of PPI context (e.g. tissues, subcellular localizations), and functional roles, especially within pathways and protein complexes. The previous IID release focused on PPI context, providing networks with comprehensive tissue, disease, cellular localization, and druggability annotations. The current update adds developmental stages to the available contexts, and provides a way of assigning context to PPIs that could not be previously annotated due to insufficient data or incompatibility with available context categories (e.g. interactions between membrane and cytoplasmic proteins). This update also annotates PPIs with conservation across species, directionality in pathways, membership in large complexes, interaction stability (i.e. stable or transient), and mutation effects. Enrichment analysis is now available for all annotations, and includes multiple options; for example, context annotations can be analyzed with respect to PPIs or network proteins. In addition to tabular view or download, IID provides online network visualization. This update is available at http://ophid.utoronto.ca/iid.
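The abstract does not state which statistical test IID's enrichment analysis uses. A common choice for annotation enrichment is a one-sided hypergeometric test; a minimal sketch, assuming that test, computes the probability of observing at least k annotated PPIs in a selection of n PPIs drawn from a background of N, of which K carry the annotation. The function name and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided enrichment p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background size, K: background items with the annotation,
    n: selected items, k: selected items with the annotation.
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical example: 3 of 4 selected PPIs carry an annotation that only
# 5 of 20 background PPIs have.
p = hypergeom_enrichment_p(k=3, n=4, K=5, N=20)
print(f"enrichment p-value = {p:.4f}")
```

When many annotations are tested at once, such p-values would normally be corrected for multiple testing before being reported.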
Affiliation(s)
- Max Kotlyar
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute and Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, Toronto, ON M5T 0S8, Canada
- Chiara Pastrello
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute and Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, Toronto, ON M5T 0S8, Canada
- Zuhaib Ahmed
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute and Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, Toronto, ON M5T 0S8, Canada
- Justin Chee
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute and Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, Toronto, ON M5T 0S8, Canada
- Zofia Varyova
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute and Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, Toronto, ON M5T 0S8, Canada
- Igor Jurisica
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute and Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, Toronto, ON M5T 0S8, Canada
- Departments of Medical Biophysics and Computer Science, University of Toronto, Toronto, ON M5S 1A4, Canada
- Institute of Neuroimmunology, Slovak Academy of Sciences, Bratislava, Slovakia

59
Kim CY, Baek S, Cha J, Yang S, Kim E, Marcotte EM, Hart T, Lee I. HumanNet v3: an improved database of human gene networks for disease research. Nucleic Acids Res 2021; 50:D632-D639. [PMID: 34747468 PMCID: PMC8728227 DOI: 10.1093/nar/gkab1048] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 09/04/2021] [Revised: 10/06/2021] [Accepted: 10/18/2021] [Indexed: 01/02/2023]
Abstract
Network medicine has proven useful for dissecting the genetic organization of complex human diseases. We have previously published HumanNet, an integrated network of human genes for disease studies. Since the release of the last version of HumanNet, many large-scale protein–protein interaction datasets have accumulated in public repositories. Additionally, the numbers of research papers and functional annotations for gene–phenotype associations have increased significantly. Therefore, updating HumanNet is a timely task for further improving network-based research into diseases. Here, we present HumanNet v3 (https://www.inetbio.org/humannet/, covering 99.8% of human protein-coding genes), constructed from the expanded data with improved network inference algorithms. HumanNet v3 supports a three-tier model: HumanNet-PI (a protein–protein physical interaction network), HumanNet-FN (a functional gene network), and HumanNet-XC (a functional network extended by co-citation). Users can select a suitable tier of HumanNet for their study purpose. We showed that, in disease gene prediction, HumanNet v3 outperforms both the previous HumanNet version and other integrated human gene networks. Furthermore, we demonstrated that HumanNet provides a feasible approach for selecting host genes likely to be associated with COVID-19.
Affiliation(s)
- Chan Yeong Kim
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul 03722, Korea
- Seungbyn Baek
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul 03722, Korea
- Junha Cha
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul 03722, Korea
- Sunmo Yang
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul 03722, Korea
- Eiru Kim
- Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Edward M Marcotte
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, TX 78712, USA
- Department of Molecular Biosciences, University of Texas at Austin, TX 78712, USA
- Traver Hart
- Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Insuk Lee
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul 03722, Korea

60
Mohsen H, Gunasekharan V, Qing T, Seay M, Surovtseva Y, Negahban S, Szallasi Z, Pusztai L, Gerstein MB. Network propagation-based prioritization of long tail genes in 17 cancer types. Genome Biol 2021; 22:287. [PMID: 34620211 PMCID: PMC8496153 DOI: 10.1186/s13059-021-02504-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/01/2021] [Accepted: 09/17/2021] [Indexed: 12/15/2022]
Abstract
BACKGROUND The diversity of genomic alterations in cancer poses challenges to fully understanding the etiologies of the disease. Recent interest in infrequent mutations, in genes that reside in the "long tail" of the mutational distribution, has uncovered new genes with significant implications in cancer development. The study of cancer-relevant genes often requires integrative approaches that pool multiple types of biological data. Network propagation methods have demonstrated high efficacy in achieving this integration. Yet the majority of these methods focus their assessment on detecting known cancer genes or identifying altered subnetworks. In this paper, we introduce a network propagation approach that focuses entirely on prioritizing long tail genes with potential functional impact on cancer development. RESULTS We identify sets of often overlooked, rarely to moderately mutated genes whose biological interactions significantly propel their mutation-frequency-based rank upwards during propagation in 17 cancer types. We call these sets "upward mobility genes" and hypothesize that their significant rank improvement indicates functional importance. We report new cancer-pathway associations based on upward mobility genes that were not previously identified using driver genes alone, validate their role in cancer cell survival in vitro using extensive genome-wide RNAi and CRISPR data repositories, and further conduct in vitro functional screens that validate 18 previously unreported genes. CONCLUSION Our analysis extends the spectrum of cancer-relevant genes and identifies novel potential therapeutic targets.
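The abstract describes genes whose mutation-frequency rank moves upward during network propagation, but the propagation scheme is not specified here. A minimal sketch, assuming a random walk with restart (restart probability alpha = 0.5) on a column-normalized adjacency matrix, computes each gene's rank before and after propagation and reports the rank change. The network, gene names and mutation frequencies are hypothetical.

```python
def propagate(adj, seeds, alpha=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected graph given as symmetric adjacency sets.

    Iterates p <- alpha * p0 + (1 - alpha) * W p, where W is the column-normalized
    adjacency matrix and p0 is the normalized seed (mutation-frequency) vector.
    """
    nodes = sorted(adj)
    total = sum(seeds.get(n, 0.0) for n in nodes)
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Each neighbour m passes an equal share of its current score to n.
        new = {
            n: alpha * p0[n] + (1 - alpha) * sum(p[m] / len(adj[m]) for m in adj[n])
            for n in nodes
        }
        converged = sum(abs(new[n] - p[n]) for n in nodes) < tol
        p = new
        if converged:
            break
    return p


def ranks(scores):
    """Rank 1 = highest score; ties broken alphabetically."""
    order = sorted(scores, key=lambda g: (-scores[g], g))
    return {g: i + 1 for i, g in enumerate(order)}


# Hypothetical toy network: H is rarely mutated but linked to three mutated genes.
adj = {
    "H": {"A", "B", "C"}, "A": {"H"}, "B": {"H"}, "C": {"H"},
    "D": {"E"}, "E": {"D"},
}
mutation_freq = {"A": 0.3, "B": 0.3, "C": 0.3, "D": 0.4, "E": 0.05, "H": 0.01}

after = propagate(adj, mutation_freq)
r_before, r_after = ranks(mutation_freq), ranks(after)
# Positive values mean the gene climbed the ranking during propagation.
mobility = {g: r_before[g] - r_after[g] for g in adj}
print(sorted(mobility.items(), key=lambda kv: -kv[1]))
```

In this toy example the rarely mutated gene H, which connects three mutated genes, climbs from last to first place, the kind of shift the authors call upward mobility; the paper's test of whether a rank change is significant is not modeled here.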
Affiliation(s)
- Hussein Mohsen
- Computational Biology & Bioinformatics Program, Yale University, New Haven, CT, 06511, USA
- Tao Qing
- Breast Medical Oncology, Yale School of Medicine, New Haven, CT, 06511, USA
- Montrell Seay
- Yale Center for Molecular Discovery, Yale University, West Haven, CT, 06516, USA
- Yulia Surovtseva
- Yale Center for Molecular Discovery, Yale University, West Haven, CT, 06516, USA
- Sahand Negahban
- Department of Statistics & Data Science, Yale University, New Haven, CT, 06511, USA
- Zoltan Szallasi
- Children's Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA, 02115, USA
- Lajos Pusztai
- Breast Medical Oncology, Yale School of Medicine, New Haven, CT, 06511, USA
- Mark B Gerstein
- Computational Biology & Bioinformatics Program, Yale University, New Haven, CT, 06511, USA
- Department of Statistics & Data Science, Yale University, New Haven, CT, 06511, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06511, USA
- Department of Computer Science, Yale University, New Haven, CT, 06511, USA

61
Zheng F, Kelly MR, Ramms DJ, Heintschel ML, Tao K, Tutuncuoglu B, Lee JJ, Ono K, Foussard H, Chen M, Herrington KA, Silva E, Liu S, Chen J, Churas C, Wilson N, Kratz A, Pillich RT, Patel DN, Park J, Kuenzi B, Yu MK, Licon K, Pratt D, Kreisberg JF, Kim M, Swaney DL, Nan X, Fraley SI, Gutkind JS, Krogan NJ, Ideker T. Interpretation of cancer mutations using a multiscale map of protein systems. Science 2021; 374:eabf3067. [PMID: 34591613 PMCID: PMC9126298 DOI: 10.1126/science.abf3067] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Indexed: 12/14/2022]
Abstract
A major goal of cancer research is to understand how mutations distributed across diverse genes affect common cellular systems, including multiprotein complexes and assemblies. Two challenges—how to comprehensively map such systems and how to identify which are under mutational selection—have hindered this understanding. Accordingly, we created a comprehensive map of cancer protein systems integrating both new and published multi-omic interaction data at multiple scales of analysis. We then developed a unified statistical model that pinpoints 395 specific systems under mutational selection across 13 cancer types. This map, called NeST (Nested Systems in Tumors), incorporates canonical processes and notable discoveries, including a PIK3CA-actomyosin complex that inhibits phosphatidylinositol 3-kinase signaling and recurrent mutations in collagen complexes that promote tumor proliferation. These systems can be used as clinical biomarkers and implicate a total of 548 genes in cancer evolution and progression. This work shows how disparate tumor mutations converge on protein assemblies at different scales.
Affiliation(s)
- Fan Zheng
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Marcus R. Kelly
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Dana J. Ramms
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92093, USA
- Department of Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
- Marissa L. Heintschel
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
- Kai Tao
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR, 97239, USA
- Center for Spatial Systems Biomedicine, Oregon Health and Science University, Portland, OR, 97201, USA
- Beril Tutuncuoglu
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
- John J. Lee
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Keiichiro Ono
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Helene Foussard
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
- Michael Chen
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Kari A. Herrington
- Department of Biochemistry and Biophysics Center for Advanced Light Microscopy at UCSF, University of California San Francisco, San Francisco, CA, 94158, USA
- Erica Silva
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Sophie Liu
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Jing Chen
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Christopher Churas
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Nicholas Wilson
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Anton Kratz
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Rudolf T. Pillich
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Devin N. Patel
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Jisoo Park
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Brent Kuenzi
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Michael K. Yu
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Katherine Licon
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Dexter Pratt
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Jason F. Kreisberg
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Minkyu Kim
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
- Danielle L. Swaney
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
- Xiaolin Nan
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR, 97239, USA
- Center for Spatial Systems Biomedicine, Oregon Health and Science University, Portland, OR, 97201, USA
- Knight Cancer Early Detection Advanced Research Center, Oregon Health and Science University, Portland, OR, 97201, USA
- Stephanie I. Fraley
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
- J. Silvio Gutkind
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92093, USA
- Department of Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
- Nevan J. Krogan
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
- Trey Ideker
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
62
Li H, Xiao X, Wu X, Ye L, Ji G. scLINE: A multi-network integration framework based on network embedding for representation of single-cell RNA-seq data. J Biomed Inform 2021; 122:103899. [PMID: 34481921 DOI: 10.1016/j.jbi.2021.103899] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/26/2021] [Revised: 08/22/2021] [Accepted: 08/24/2021] [Indexed: 01/18/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is fast becoming a powerful technology that is revolutionizing biomedical studies related to development, immunology and cancer by providing genome-scale transcriptional profiles at unprecedented throughput and resolution. However, due to the low capture rate and frequent drop-out events in the sequencing process, scRNA-seq data suffer from extremely high sparsity and variability, which complicates data analysis. Here we propose a novel method called scLINE for learning low-dimensional representations of scRNA-seq data. scLINE is based on a network embedding model that jointly considers multiple gene-gene interaction networks, facilitating the incorporation of prior biological knowledge for signal extraction. We comprehensively evaluated scLINE on eight single-cell datasets. The results show that scLINE achieved comparable or higher performance than competing methods, including PCA, t-SNE and Isomap, in terms of internal validation metrics and clustering accuracy. The low-dimensional representations learned by scLINE are effective for downstream single-cell analyses such as visualization, clustering and cell typing. We have implemented scLINE as an easy-to-use R package that can be incorporated into other existing scRNA-seq analysis pipelines or tools for data preprocessing.
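The name and description suggest that scLINE builds on the LINE (Large-scale Information Network Embedding) model. Assuming that, the sketch below evaluates LINE's first-order proximity objective. In that objective, each weighted edge (i, j) contributes -w_ij · log σ(u_i · u_j), and the terms are summed over several gene-gene networks. Summing the per-network losses is an assumption made for this illustration; it is not a description of how scLINE actually combines its networks. The gene names and vectors are hypothetical. Real training would also use negative sampling and stochastic gradient updates, which are omitted here.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def first_order_loss(networks, emb):
    """LINE first-order objective, summed over several weighted networks (assumed combination).

    networks -- list of edge lists; each edge is (gene_a, gene_b, weight)
    emb      -- dict mapping gene name -> embedding vector (list of floats)
    """
    total = 0.0
    for edges in networks:
        for a, b, w in edges:
            dot = sum(x * y for x, y in zip(emb[a], emb[b]))
            # Connected genes with similar vectors incur a small penalty.
            total -= w * log_sigmoid(dot)
    return total


# Two hypothetical networks over the same three genes.
networks = [[("A", "B", 1.0)], [("A", "B", 0.5), ("B", "C", 1.0)]]
aligned = {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
opposed = {"A": [1.0, 0.0], "B": [-1.0, 0.0], "C": [0.0, 1.0]}
print(first_order_loss(networks, aligned) < first_order_loss(networks, opposed))
```

Embeddings that place linked genes close together score lower, which is what gradient descent on this objective would push towards.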
Affiliation(s)
- Huoyou Li
- School of Mathematics and Information Engineering, Longyan University, China
- Xuesong Xiao
- Department of Automation, Xiamen University, China
- Xiaohui Wu
- Department of Automation, Xiamen University, China.
- Lishan Ye
- Xiamen Health and Medical Big Data Center, Xiamen, Fujian, China.
- Guoli Ji
- Department of Automation, Xiamen University, China.
63
Pathak GA, Wendt FR, Goswami A, Koller D, De Angelis F, Polimanti R. ACE2 Netlas: In silico Functional Characterization and Drug-Gene Interactions of ACE2 Gene Network to Understand Its Potential Involvement in COVID-19 Susceptibility. Front Genet 2021; 12:698033. [PMID: 34512723 PMCID: PMC8429844 DOI: 10.3389/fgene.2021.698033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/07/2021] [Accepted: 07/29/2021] [Indexed: 12/15/2022]
Abstract
The angiotensin-converting enzyme 2 (ACE2) receptor has been identified as the key adhesion molecule for SARS-CoV-2 transmission. However, there is no evidence that human genetic variation in ACE2 is singularly responsible for COVID-19 susceptibility. Therefore, we performed an integrative multi-level characterization of genes that interact with ACE2 (the ACE2-gene network) to identify statistically enriched biological properties in the context of COVID-19. The phenome-wide association of 51 genes, including ACE2, with 4,756 traits categorized into 26 phenotype categories showed enrichment of immunological, respiratory, environmental, skeletal, dermatological, and metabolic domains (p < 4e-4). Transcriptomic regulation of the ACE2-gene network was enriched for tissue specificity in kidney, small intestine, and colon (p < 4.7e-4). Leveraging the drug-gene interaction database, we identified 47 drugs, including dexamethasone and spironolactone. Considering genetic variants within ± 10 kb of ACE2-network genes, we identified miRNAs whose binding sites may be altered as a consequence of genetic variation. The identified miRNAs showed statistical over-representation of inflammation, aging, diabetes, and heart conditions. Associations of genetic variants in the RORA, SLC12A6, and SLC6A19 genes were observed in a genome-wide association study (GWAS) of COVID-19 susceptibility. We also report that the GWAS-identified variant at the 3p21.31 locus serves as a trans-QTL for the RORA and RORC genes. Overall, functional characterization of the ACE2-gene network highlights several potential mechanisms of COVID-19 susceptibility. The data can also be accessed at https://gpwhiz.github.io/ACE2Netlas/.
Affiliation(s)
- Gita A. Pathak
- Division of Human Genetics, Department of Psychiatry, Yale School of Medicine, New Haven, CT, United States
- Veteran Affairs Connecticut Healthcare System, West Haven, CT, United States
- Frank R. Wendt
- Division of Human Genetics, Department of Psychiatry, Yale School of Medicine, New Haven, CT, United States
- Veteran Affairs Connecticut Healthcare System, West Haven, CT, United States
- Aranyak Goswami
- Division of Human Genetics, Department of Psychiatry, Yale School of Medicine, New Haven, CT, United States
- Veteran Affairs Connecticut Healthcare System, West Haven, CT, United States
- Dora Koller
- Division of Human Genetics, Department of Psychiatry, Yale School of Medicine, New Haven, CT, United States
- Veteran Affairs Connecticut Healthcare System, West Haven, CT, United States
- Flavio De Angelis
- Division of Human Genetics, Department of Psychiatry, Yale School of Medicine, New Haven, CT, United States
- Veteran Affairs Connecticut Healthcare System, West Haven, CT, United States
- Renato Polimanti
- Division of Human Genetics, Department of Psychiatry, Yale School of Medicine, New Haven, CT, United States
- Veteran Affairs Connecticut Healthcare System, West Haven, CT, United States
64
Gunning M, Pavlidis P. "Guilt by association" is not competitive with genetic association for identifying autism risk genes. Sci Rep 2021; 11:15950. [PMID: 34354131 PMCID: PMC8342445 DOI: 10.1038/s41598-021-95321-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/22/2021] [Accepted: 07/16/2021] [Indexed: 12/25/2022]
Abstract
Discovering genes involved in complex human genetic disorders is a major challenge. Many have suggested that machine learning (ML) algorithms using gene networks can supplement traditional genetic association-based approaches to predict or prioritize disease genes. However, questions have been raised about the utility of ML methods for this type of task because of biases within the data and poor real-world performance. Using autism spectrum disorder (ASD) as a test case, we sought to investigate the question: can machine learning aid in the discovery of disease genes? We collected 13 published ASD gene prioritization studies and evaluated their performance using known and novel high-confidence ASD genes. We also investigated their biases towards generic gene annotations, such as the number of publications associated with each gene. We found that ML methods that do not incorporate genetic information have limited utility for prioritizing ASD risk genes. These methods perform at a level comparable to generic measures of how likely a gene is to be involved in any condition, and do not outperform genetic association studies. Future efforts to discover disease genes should focus on developing and validating statistical models for genetic association, specifically for association between rare variants and disease, rather than on developing complex machine learning methods that use heterogeneous biological data of unknown reliability.
Affiliation(s)
- Margot Gunning
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Graduate Program in Bioinformatics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Paul Pavlidis
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada.
- Department of Psychiatry, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada.
- Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada.
65
Zhang XM, Liang L, Liu L, Tang MJ. Graph Neural Networks and Their Current Applications in Bioinformatics. Front Genet 2021; 12:690049. [PMID: 34394185 PMCID: PMC8360394 DOI: 10.3389/fgene.2021.690049] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 04/02/2021] [Accepted: 05/28/2021] [Indexed: 12/22/2022]
Abstract
Graph neural networks (GNNs), a branch of deep learning in non-Euclidean space, perform particularly well in various tasks that involve graph-structured data. With the rapid accumulation of biological network data, GNNs have also become an important tool in bioinformatics. In this research, a systematic survey of GNNs and their advances in bioinformatics is presented from multiple perspectives. We first introduce some commonly used GNN models and their basic principles. Then, three representative tasks are described, corresponding to the three levels of structural information that GNNs can learn: node classification, link prediction, and graph generation. Meanwhile, according to their specific applications to various omics data, we categorize and discuss related studies in three areas: disease prediction, drug discovery, and biomedical imaging. Based on this analysis, we outline the shortcomings of current studies and point out directions for future development. Although GNNs have achieved excellent results in many biological tasks, they still face challenges in low-quality data processing, methodology, and interpretability, and have a long road ahead. We believe that GNNs have the potential to become an excellent method for solving various biological problems in bioinformatics research.
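The survey introduces commonly used GNN models and their basic principles. As one illustration, chosen here and not taken from the survey, a single graph convolutional network (GCN) layer applies the rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). The plain-Python sketch below makes several simplifications: dense list-of-lists matrices, an undirected graph and a fixed weight matrix W. In practice W is learned by gradient descent and sparse matrix libraries are used.

```python
import math


def gcn_layer(adj, features, weights):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj      -- n x n symmetric 0/1 adjacency matrix (list of lists)
    features -- n x d node-feature matrix H
    weights  -- d x k weight matrix W (learned in practice, fixed here)
    """
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation stops high-degree nodes from dominating.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    d, k = len(features[0]), len(weights[0])
    # Aggregate: weighted sum of each node's own and its neighbours' features.
    agg = [[sum(norm[i][m] * features[m][f] for m in range(n)) for f in range(d)]
           for i in range(n)]
    # Transform with W, then apply the ReLU non-linearity.
    return [[max(0.0, sum(agg[i][f] * weights[f][o] for f in range(d))) for o in range(k)]
            for i in range(n)]


# Path graph 0-1-2 with a signal on node 0 only.
print(gcn_layer([[0, 1, 0], [1, 0, 1], [0, 1, 0]], [[1.0], [0.0], [0.0]], [[1.0]]))
```

After one layer, the signal has spread from node 0 to its direct neighbour only. Stacking k layers lets information travel k hops, which is how GCNs capture the node- and neighbourhood-level structure used for tasks such as node classification.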
Affiliation(s)
- Xiao-Meng Zhang
- School of Information, Yunnan Normal University, Kunming, China
- Li Liang
- School of Information, Yunnan Normal University, Kunming, China
- Lin Liu
- School of Information, Yunnan Normal University, Kunming, China
- Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, China
- Ming-Jing Tang
- Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, China
- School of Life Sciences, Yunnan Normal University, Kunming, China
66
Xie X, Kendzior MC, Ge X, Mainzer LS, Sinha S. VarSAn: associating pathways with a set of genomic variants using network analysis. Nucleic Acids Res 2021; 49:8471-8487. [PMID: 34313777 PMCID: PMC8421213 DOI: 10.1093/nar/gkab624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2020] [Revised: 05/18/2021] [Accepted: 07/20/2021] [Indexed: 02/01/2023]
Abstract
There is a pressing need today to mechanistically interpret sets of genomic variants associated with diseases. Here we present a tool called ‘VarSAn’ that uses a network analysis algorithm to identify pathways relevant to a given set of variants. VarSAn analyzes a configurable network whose nodes represent variants, genes and pathways, using a Random Walk with Restarts algorithm to rank pathways for relevance to the given variants, and reports P-values for pathway relevance. It treats non-coding and coding variants differently, properly accounts for the number of pathways impacted by each variant and identifies relevant pathways even if many variants do not directly impact genes of the pathway. We use VarSAn to identify pathways relevant to variants related to cancer and several other diseases, as well as drug response variation. We find VarSAn's pathway ranking to be complementary to the standard approach of enrichment tests on genes related to the query set. We adopt a novel benchmarking strategy to quantify its advantage over this baseline approach. Finally, we use VarSAn to discover key pathways, including the VEGFA-VEGFR2 pathway, related to de novo variants in patients with Hypoplastic Left Heart Syndrome, a rare and severe congenital heart defect.
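A generic Random Walk with Restarts (RWR) is sketched below to show how pathways can be ranked by their proximity to a set of seed variants. It is not VarSAn's code. Several things are assumptions: the toy variant-gene-pathway layout, the restart probability of 0.5 and the unweighted, undirected edges. VarSAn's handling of coding and non-coding variants and its P-value computation are not reproduced.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Visiting probabilities of a walk that keeps jumping back to the seed nodes.

    adj     -- n x n symmetric adjacency matrix (list of lists); no isolated nodes
    seeds   -- set of query node indices (e.g. the variant nodes)
    restart -- probability of jumping back to a seed at each step (assumed 0.5)
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    if any(d == 0 for d in degree):
        raise ValueError("isolated nodes are not supported in this sketch")
    # Restart distribution: uniform over the seed nodes.
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        # Follow a random edge with probability (1 - restart), otherwise restart at a seed.
        nxt = [(1.0 - restart) * sum(adj[i][j] * p[j] / degree[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy network: variant 0 - gene 1 - pathway 2, and gene 1 - gene 3 - pathway 4.
adj = [[0] * 5 for _ in range(5)]
for a, b in [(0, 1), (1, 2), (1, 3), (3, 4)]:
    adj[a][b] = adj[b][a] = 1
scores = random_walk_with_restart(adj, seeds={0})
print(sorted([2, 4], key=lambda node: -scores[node]))  # pathway 2 ranks first: [2, 4]
```

Pathway 2 outranks pathway 4 because it sits closer to the seed variant. Unlike a plain gene-set enrichment test, the walk can still credit a pathway whose genes are reached only indirectly through other genes.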
Affiliation(s)
- Xiaoman Xie
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Matthew C Kendzior
- National Center for Supercomputing Applications, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Xiyu Ge
- Department of Molecular and Integrative Physiology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Liudmila S Mainzer
- National Center for Supercomputing Applications, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Saurabh Sinha
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
- Cancer Center of Illinois, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
67
Shu J, Li Y, Wang S, Xi B, Ma J. Disease gene prediction with privileged information and heteroscedastic dropout. Bioinformatics 2021; 37:i410-i417. [PMID: 34252957 PMCID: PMC8275341 DOI: 10.1093/bioinformatics/btab310] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Accepted: 04/24/2021] [Indexed: 11/19/2022]
Abstract
Motivation Recently, machine learning models have achieved tremendous success in prioritizing candidate genes for genetic diseases. These models are able to accurately quantify the similarity among diseases and genes based on the intuition that similar genes are more likely to be associated with similar diseases. However, the genetic features these methods rely on are often hard to collect due to high experimental cost and various other technical limitations. Existing solutions to this problem significantly increase the risk of overfitting and decrease the generalizability of the models. Results In this work, we propose a graph neural network (GNN) version of the Learning under Privileged Information paradigm to predict new disease gene associations. Unlike previous gene prioritization approaches, our model does not require the genetic features to be the same at the training and test stages. If a genetic feature is hard to measure and therefore missing at the test stage, our model can still efficiently incorporate its information during the training process. To implement this, we develop a Heteroscedastic Gaussian Dropout algorithm, in which the dropout probability of the GNN model is determined by another GNN model with a mirrored architecture. To evaluate our method, we compared it with four state-of-the-art methods on the Online Mendelian Inheritance in Man dataset for prioritizing candidate disease genes. Extensive evaluations show that, when all features are available, our model improves prediction accuracy compared with the other methods. More importantly, our model can make very accurate predictions when >90% of the features are missing at the test stage. Availability and implementation Our method is implemented with Python 3.7 and PyTorch 1.5.0, and the method and data are freely available at: https://github.com/juanshu30/Disease-Gene-Prioritization-with-Privileged-Information-and-Heteroscedastic-Dropout.
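Gaussian dropout multiplies each activation by noise drawn from N(1, p/(1 - p)), where p is the drop probability. A heteroscedastic variant gives every unit its own p. The minimal sketch below assumes that reading of the abstract. In the paper the per-unit probabilities come from a mirrored GNN, whereas here they are passed in as a plain list (a hypothetical stand-in). The Learning under Privileged Information setup itself is not implemented.

```python
import math
import random


def heteroscedastic_gaussian_dropout(h, drop_probs, rng=None, training=True):
    """Gaussian dropout with a separate drop probability for every unit.

    h          -- activations (list of floats)
    drop_probs -- one probability in [0, 1) per activation; in the paper these come
                  from a mirrored GNN, here they are simply passed in (hypothetical)
    """
    if not training:
        return list(h)  # the noise has mean 1, so no rescaling is needed at test time
    rng = rng or random.Random()
    out = []
    for value, p in zip(h, drop_probs):
        if not 0.0 <= p < 1.0:
            raise ValueError("drop probability must lie in [0, 1)")
        sigma = math.sqrt(p / (1.0 - p))  # std of the multiplicative noise
        out.append(value * rng.gauss(1.0, sigma))
    return out


rng = random.Random(0)
print(heteroscedastic_gaussian_dropout([0.8, 0.8, 0.8], [0.0, 0.2, 0.6], rng=rng))
```

A unit with p = 0 passes through unchanged, while units with larger p receive noisier copies. Because the noise has mean 1, no rescaling is needed at inference.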
Affiliation(s)
- Juan Shu
- Department of Statistics, Purdue University, West Lafayette, IN 47906, USA
- Yu Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong 999077, China
- Sheng Wang
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Bowei Xi
- Department of Statistics, Purdue University, West Lafayette, IN 47906, USA
- Jianzhu Ma
- Institute for Artificial Intelligence, Peking University, Beijing 100871, China
68
SCMFMDA: Predicting microRNA-disease associations based on similarity constrained matrix factorization. PLoS Comput Biol 2021; 17:e1009165. [PMID: 34252084 PMCID: PMC8345837 DOI: 10.1371/journal.pcbi.1009165] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 04/05/2021] [Revised: 08/06/2021] [Accepted: 06/08/2021] [Indexed: 11/21/2022]
Abstract
miRNAs are small non-coding RNAs involved in a number of complex biological processes. Numerous studies have suggested that miRNAs are closely associated with many human diseases. In this study, we proposed a computational model based on Similarity Constrained Matrix Factorization for miRNA-Disease Association Prediction (SCMFMDA). To effectively combine different disease and miRNA similarity data, we applied a similarity network fusion algorithm to obtain an integrated disease similarity (composed of disease functional similarity, disease semantic similarity and disease Gaussian interaction profile kernel similarity) and an integrated miRNA similarity (composed of miRNA functional similarity, miRNA sequence similarity and miRNA Gaussian interaction profile kernel similarity). In addition, L2 regularization terms and similarity constraint terms were added to the traditional non-negative matrix factorization algorithm to predict disease-related miRNAs. SCMFMDA achieved AUCs of 0.9675 and 0.9447 based on global leave-one-out cross validation and five-fold cross validation, respectively. Furthermore, case studies on two common human diseases were carried out to demonstrate the prediction accuracy of SCMFMDA. Most of the top 50 predicted miRNAs were confirmed by experimental reports, indicating that SCMFMDA is effective for predicting relationships between miRNAs and diseases. Because miRNAs are closely associated with many human diseases, predicting potential miRNA-disease associations can contribute to the diagnosis and treatment of diseases. Several computational models for discovering unknown miRNA-disease associations have made such prediction more productive and effective.
We proposed SCMFMDA to obtain more accurate predictions by applying similarity network fusion to integrate multi-source disease and miRNA information and by using similarity constrained matrix factorization to make predictions based on biological information. Global leave-one-out cross validation and five-fold cross validation were applied to evaluate our model. SCMFMDA achieved AUCs of 0.9675 and 0.9447, respectively, which were clearly higher than those of previous computational models. Furthermore, we carried out case studies on significant human diseases, including colon neoplasms and lung neoplasms, in which 47 and 46 of the top 50 predicted miRNAs, respectively, were confirmed by experimental reports. All results indicate that SCMFMDA can be regarded as an effective way to discover unverified miRNA-disease connections.
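SCMFMDA uses Gaussian interaction profile (GIP) kernel similarity as one component of both its miRNA and disease similarities. The sketch below uses the commonly cited GIP formulation. It treats each miRNA's row of the binary miRNA-disease association matrix as its profile and sets the bandwidth to γ = γ' / (mean squared profile norm). The default γ' = 1 and the toy matrix are assumptions, and the paper's exact parameter choices are not stated in the abstract. Disease similarity is obtained the same way from the matrix's columns.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile (GIP) kernel between the rows of a 0/1 matrix.

    profiles    -- one interaction profile per entity (e.g. rows of a miRNA x disease matrix)
    gamma_prime -- bandwidth before normalisation (1.0 is a common default; assumed here)
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm  # scale bandwidth by the average profile size

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [[math.exp(-gamma * sq_dist(profiles[i], profiles[j])) for j in range(n)]
            for i in range(n)]


# Hypothetical 3 miRNA x 3 disease association matrix.
assoc = [[1, 0, 1],
         [1, 0, 0],
         [0, 1, 0]]
mirna_sim = gip_kernel(assoc)
disease_sim = gip_kernel([list(col) for col in zip(*assoc)])  # columns = diseases
print(mirna_sim[0])
```

miRNAs that share more known disease associations get similarities closer to 1. This topology-derived similarity is what the fusion step combines with the functional, semantic and sequence similarities.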
69
Muhammad SA, Qousain Naqvi ST, Nguyen T, Wu X, Munir F, Jamshed MB, Zhang Q. Cisplatin's potential for type 2 diabetes repositioning by inhibiting CDKN1A, FAS, and SESN1. Comput Biol Med 2021; 135:104640. [PMID: 34261004 DOI: 10.1016/j.compbiomed.2021.104640] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/02/2021] [Revised: 07/06/2021] [Accepted: 07/06/2021] [Indexed: 12/16/2022]
Abstract
Cisplatin is a DNA-damaging chemotherapeutic agent used to treat cancer. Based on cDNA dataset analysis, we investigated how cisplatin modifies gene expression and observed cisplatin-induced dysregulation and system-level variations relating to insulin resistance and type 2 diabetes mellitus (T2DM). T2DM is a multifactorial disease affecting 462 million people worldwide, and drug-induced T2DM is a serious issue. To understand this etiology, we designed an integrative, system-level study to identify associations between cisplatin-induced differentially expressed genes (DEGs) and T2DM. Among the differentially expressed genes, cisplatin downregulated the cyclin-dependent kinase inhibitor 1 (CDKN1A), Fas cell surface death receptor (FAS), and sestrin-1 (SESN1) genes, which are responsible for modifying signaling pathways including the p53, JAK-STAT, FOXO, MAPK, mTOR, PI3K-AKT, Toll-like receptor (TLR), adipocytokine, and insulin signaling pathways. These enriched pathways were closely associated with the disease. We observed significant gene signatures, including SMAD3, IRS, PDK1, PRKAA1, AKT, SOS, RAS, GRB2, MEK1/2, and ERK, interacting with the source genes. This study revealed the value of systems genetics for identifying the cisplatin-induced genetic variants responsible for the progression of T2DM. In addition, by cross-validating gene expression data for T2DM islets, we found that downregulation of the IRS and PRK families is critical in insulin and T2DM signaling pathways. Cisplatin, by inhibiting CDKN1A, FAS, and SESN1, promotes IRS and PRK activity in a similar way to rosiglitazone (a popular drug used for T2DM treatment). Our integrative, network-based approach can help in understanding the drug-induced pathophysiological mechanisms of diabetes.
Affiliation(s)
- Syed Aun Muhammad
- Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan.
- Thanh Nguyen
- Informatics Institute, School of Medicine, The University of Alabama, Birmingham, AL, USA
- Xiaogang Wu
- The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Fahad Munir
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China; Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Muhammad Babar Jamshed
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China; Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- QiYu Zhang
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China.
70
Genome-wide discovery of hidden genes mediating known drug-disease association using KDDANet. NPJ Genom Med 2021; 6:50. [PMID: 34131148 PMCID: PMC8206141 DOI: 10.1038/s41525-021-00216-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/30/2020] [Accepted: 05/25/2021] [Indexed: 11/09/2022]
Abstract
Many genes mediating Known Drug-Disease Associations (KDDAs) escape experimental detection. Identifying these genes (hidden genes) is of great significance for understanding disease pathogenesis and guiding drug repurposing. Here, we present a novel computational tool, called KDDANet, for systematically and accurately uncovering the hidden genes mediating KDDAs from the perspective of a genome-wide functional gene interaction network. KDDANet demonstrated competitive sensitivity and specificity in identifying genes that mediate KDDAs in comparison with existing state-of-the-art methods. Case studies on Alzheimer's disease (AD) and obesity uncovered the mechanistic relevance of KDDANet predictions. Furthermore, when applied to multiple types of cancer-omics datasets, KDDANet not only recapitulated known genes mediating cancer-related KDDAs, but also revealed novel candidates that offer new biological insights. Importantly, KDDANet can be used to discover the shared genes mediating multiple KDDAs. KDDANet can be accessed at http://www.kddanet.cn and the code can be freely downloaded at https://github.com/huayu1111/KDDANet .
71
Peng W, Du J, Dai W, Lan W. Predicting miRNA-Disease Association Based on Modularity Preserving Heterogeneous Network Embedding. Front Cell Dev Biol 2021; 9:603758. [PMID: 34178973 PMCID: PMC8223753 DOI: 10.3389/fcell.2021.603758] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/07/2020] [Accepted: 03/23/2021] [Indexed: 12/12/2022]
Abstract
MicroRNAs (miRNAs) are a category of small non-coding RNAs that profoundly impact various biological processes related to human disease. Inferring potential miRNA-disease associations benefits the study of human diseases, including disease prevention, disease diagnosis, and drug development. In this work, we propose a novel heterogeneous network embedding-based method called MDN-NMTF (Module-based Dynamic Neighborhood Non-negative Matrix Tri-Factorization) for predicting miRNA-disease associations. MDN-NMTF constructs a heterogeneous network from a disease similarity network, a miRNA similarity network and a known miRNA-disease association network. It then learns latent vector representations of miRNAs and diseases in the heterogeneous network. Finally, the association probability is computed as the product of the latent miRNA and disease vectors. MDN-NMTF not only integrates diverse biological information about miRNAs and diseases to predict miRNA-disease associations, but also considers the module properties of miRNAs and diseases while learning the vector representations, which maximally preserves the structural information and properties of the heterogeneous network. We also extend MDN-NMTF to a new version (called MDN-NMTF2) that uses modular information to further improve miRNA-disease association prediction. We applied our methods and four existing methods to predict miRNA-disease associations in four databases. The prediction results show that our methods improve miRNA-disease association prediction compared with the four existing methods.
Affiliation(s)
- Wei Peng: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China; Computer Technology Application Key Laboratory of Yunnan Province, Kunming University of Science and Technology, Kunming, China
- Jielin Du: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
- Wei Dai: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China; Computer Technology Application Key Laboratory of Yunnan Province, Kunming University of Science and Technology, Kunming, China
- Wei Lan: Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, China

72
Picart-Armada S, Thompson WK, Buil A, Perera-Lluna A. The effect of statistical normalization on network propagation scores. Bioinformatics 2021; 37:845-852. [PMID: 33070187 PMCID: PMC8097756 DOI: 10.1093/bioinformatics/btaa896] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/21/2020] [Revised: 09/18/2020] [Accepted: 10/07/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION Network diffusion and label propagation are fundamental tools in computational biology, with applications such as gene-disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, owing to concerns that network topology can bias diffusion scores. This raises the question of the statistical properties, and the presence of bias, of such diffusion processes in each of their applications. In this work, we characterized some common null models behind the permutation analysis and the statistical properties of the diffusion scores. We benchmarked seven diffusion scores on three case studies: synthetic signals on a yeast interactome, simulated differential gene expression on a protein-protein interaction network and prospective gene set prediction on another interaction network. For clarity, all the datasets were based on binary labels, but we also present theoretical results for quantitative labels. RESULTS Diffusion scores starting from binary labels were affected by the label codification and exhibited a problem-dependent topological bias that could be removed by statistical normalization. Parametric and non-parametric normalization addressed both points by being codification-independent and by equalizing the bias. We identified and quantified two sources of bias, the mean value and the variance, that yielded performance differences when normalizing the scores. We provided closed formulae for both and showed how the null covariance is related to the spectral properties of the graph. Although none of the proposed scores systematically outperformed the others, normalization was preferred when the sought positive labels were not aligned with the bias. We conclude that the decision on bias removal should be problem- and data-driven, i.e. based on a quantitative analysis of the bias and its relation to the positive entities.
AVAILABILITY The code is publicly available at https://github.com/b2slab/diffuBench and the data underlying this article are available at https://github.com/b2slab/retroData. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
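To make permutation-based normalization concrete, here is a minimal sketch, assuming a random-walk-with-restart diffusion on a small unweighted graph and an empirical z-score obtained by shuffling the binary labels across nodes. It illustrates the general approach discussed in the abstract and is not the authors' diffuBench code; the paper also derives closed-form null moments, which this Monte Carlo version only approximates. All function names and the defaults for `alpha`, `iters` and `n_perm` are hypothetical.

```python
import random
import statistics


def column_stochastic(adj):
    """Column-normalize a symmetric adjacency matrix (lists of lists)."""
    n = len(adj)
    deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / deg[j] if deg[j] else 0.0 for j in range(n)] for i in range(n)]


def diffuse(W, y, alpha=0.7, iters=50):
    """Raw diffusion score: iterate f <- alpha * W f + (1 - alpha) * y."""
    n = len(y)
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(W[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


def normalized_scores(W, y, n_perm=200, seed=0, **kw):
    """Return raw scores and z-scores against a label-permutation null."""
    rng = random.Random(seed)
    raw = diffuse(W, y, **kw)
    null = [[] for _ in y]
    for _ in range(n_perm):
        perm = list(y)
        rng.shuffle(perm)  # same number of positives, random placement
        for i, s in enumerate(diffuse(W, perm, **kw)):
            null[i].append(s)
    z = []
    for r, samples in zip(raw, null):
        mu, sd = statistics.fmean(samples), statistics.pstdev(samples)
        z.append((r - mu) / sd if sd > 0 else 0.0)
    return raw, z
```

A hub tends to collect high raw scores under almost any labelling, so its null mean is also high; subtracting the null mean and dividing by the null spread is what removes that topological bias, at the cost of Monte Carlo noise when `n_perm` is small.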
Affiliation(s)
- Sergio Picart-Armada: B2SLab, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, CIBER-BBN, Barcelona, 08028, Spain; Esplugues de Llobregat, Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, Barcelona, 08950, Spain
- Wesley K Thompson: Mental Health Center Sct. Hans, 4000 Roskilde, Denmark; Department of Family Medicine and Public Health, University of California, San Diego, La Jolla, CA, USA
- Alfonso Buil: Mental Health Center Sct. Hans, 4000 Roskilde, Denmark
- Alexandre Perera-Lluna: B2SLab, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, CIBER-BBN, Barcelona, 08028, Spain; Esplugues de Llobregat, Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, Barcelona, 08950, Spain

73
Li Y, Wang K, Wang G. Evaluating Disease Similarity Based on Gene Network Reconstruction and Representation. Bioinformatics 2021; 37:3579-3587. [PMID: 33978702 DOI: 10.1093/bioinformatics/btab252] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/22/2020] [Revised: 03/01/2021] [Accepted: 04/28/2021] [Indexed: 11/13/2022]
Abstract
MOTIVATION Quantifying the associations between diseases is of great significance for increasing our understanding of disease biology, improving disease diagnosis, and repositioning and developing drugs. Therefore, research on disease similarity has received a lot of attention in bioinformatics in recent years. Previous work has shown that combining ontologies (such as the Disease Ontology and the Gene Ontology) with disease-gene interactions is worthwhile for elucidating diseases and disease associations. However, most existing methods are based either on the overlap between disease-related gene sets or on distances within the ontology's hierarchy. The diseases in these methods are represented by discrete or sparse feature vectors, which cannot capture the deep semantic information of diseases. Recently, deep representation learning has been widely studied and gradually applied to various fields of bioinformatics. Based on the hypothesis that a disease's representation depends on the representations of its related genes, we propose a disease representation model that uses two of the most representative gene resources, HumanNet and the Gene Ontology, to construct a new gene network and learn gene (and disease) representations. The similarity between two diseases is computed as the cosine similarity of their corresponding representations. RESULTS We propose a novel approach to computing disease similarity that integrates two important factors, disease-related genes and the gene ontology hierarchy, to learn disease representations based on deep representation learning. Under the same experimental settings, our method achieves an AUC of 0.8074, improving on the most competitive baseline method by 10.1%. The quantitative and qualitative experimental results show that our model can learn effective disease representations and significantly improves the accuracy of disease similarity computation.
AVAILABILITY The research shows that this method has certain applicability in the prediction of gene-related diseases, the migration of disease treatment methods, drug development, and so on. SUPPLEMENTARY INFORMATION Supplementary data are available at https://github.com/catly/disease_similarity.
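The final step, comparing two diseases by the cosine similarity of their representations, can be sketched as follows. The abstract does not say how gene representations are aggregated into a disease representation, so averaging the embeddings of a disease's genes is an assumption here, as are the toy gene names (`G1`, `G2`, `G3`) and vectors; learning the embeddings from HumanNet and the Gene Ontology is out of scope.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (assumed aggregation)."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def disease_similarity(genes_a, genes_b, gene_emb):
    """Cosine similarity between the averaged gene embeddings of two diseases."""
    va = mean_vector([gene_emb[g] for g in genes_a])
    vb = mean_vector([gene_emb[g] for g in genes_b])
    return cosine(va, vb)


# hypothetical 3-dimensional gene embeddings
gene_emb = {
    "G1": [1.0, 0.0, 0.0],
    "G2": [0.8, 0.2, 0.0],
    "G3": [0.0, 0.0, 1.0],
}
print(disease_similarity(["G1", "G2"], ["G2"], gene_emb))
print(disease_similarity(["G1"], ["G3"], gene_emb))  # 0.0
```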
Affiliation(s)
- Yang Li: College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150004, China
- Keqi Wang: College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150004, China
- Guohua Wang: College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150004, China

74
Drew K, Wallingford JB, Marcotte EM. hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies. Mol Syst Biol 2021; 17:e10016. [PMID: 33973408 PMCID: PMC8111494 DOI: 10.15252/msb.202010016] [Citation(s) in RCA: 79] [Impact Index Per Article: 19.8] [Received: 09/23/2020] [Revised: 04/08/2021] [Accepted: 04/09/2021] [Indexed: 12/30/2022]
Abstract
A general principle of biology is the self-assembly of proteins into functional complexes. Characterizing their composition is therefore required for understanding cellular functions. Unfortunately, we lack a comprehensive catalogue of the protein complexes in human cells. To address this gap, we developed a machine learning framework to identify protein complexes in over 15,000 mass spectrometry experiments, which resulted in the identification of nearly 7,000 physical assemblies. We show that our resource, hu.MAP 2.0, is more accurate and comprehensive than previous state-of-the-art high-throughput protein complex resources and gives rise to many new hypotheses, including for 274 completely uncharacterized proteins. Further, we identify 253 promiscuous proteins that participate in multiple complexes, pointing to possible moonlighting roles. We have made hu.MAP 2.0 easily searchable in a web interface (http://humap2.proteincomplexes.org/), which will be a valuable resource for researchers across a broad range of interests, including systems biology, structural biology and molecular explanations of disease.
Affiliation(s)
- Kevin Drew: Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, USA. Present address: Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL, USA
- John B Wallingford: Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, USA
- Edward M Marcotte: Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, USA

75
Liu M, Liu Y, Wu MC, Hsu L, He Q. A method for subtype analysis with somatic mutations. Bioinformatics 2021; 37:50-56. [PMID: 33416828 PMCID: PMC11394914 DOI: 10.1093/bioinformatics/btaa1090] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/23/2020] [Revised: 12/15/2020] [Accepted: 12/22/2020] [Indexed: 12/11/2022]
Abstract
MOTIVATION Cancer is a highly heterogeneous disease, and virtually all types of cancer have subtypes. Understanding the association between cancer subtypes and genetic variation is fundamental to the development of targeted therapies for patients. Somatic mutations play important roles in tumor development and have emerged as a new type of genetic variation for studying associations with cancer subtypes. However, the low prevalence of individual mutations poses a tremendous challenge for the related statistical analysis. RESULTS In this article, we propose an approach, subtype analysis with somatic mutations (SASOM), for the association analysis of cancer subtypes with somatic mutations. Our approach tests the association between a set of somatic mutations (from a genetic pathway) and subtypes, while incorporating functional information about the mutations into the analysis. We further propose a robust p-value combination procedure, DAPC, to synthesize statistical significance from different sources. Simulation studies show that the proposed approach controls the type I error correctly and tends to be more powerful than possible alternative methods. In a real data application, we examine the somatic mutations in a cutaneous melanoma dataset and identify a genetic pathway that is associated with immune-related subtypes. AVAILABILITY AND IMPLEMENTATION The SASOM R package is available at https://github.com/rksyouyou/SASOM-pkg. R scripts and data are available at https://github.com/rksyouyou/SASOM-analysis. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Meiling Liu: Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Yang Liu: Department of Mathematics and Statistics, Wright State University, Dayton, OH 45435, USA
- Michael C Wu: Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Li Hsu: Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Qianchuan He: Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA

76
Yan C, Duan G, Wu FX, Pan Y, Wang J. MCHMDA: Predicting Microbe-Disease Associations Based on Similarities and Low-Rank Matrix Completion. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:611-620. [PMID: 31295117 DOI: 10.1109/tcbb.2019.2926716] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 06/09/2023]
Abstract
With the development of high-throughput sequencing technology and microbiology, many studies have shown that microbes are associated with human diseases such as obesity and liver cancer. Therefore, identifying associations between microbes and diseases has become an important research topic in bioinformatics. The emergence of microbe-disease association databases has provided an unprecedented opportunity to develop computational methods for predicting microbe-disease associations. In this study, we propose a low-rank matrix completion method (called MCHMDA) to predict microbe-disease associations by integrating microbe and disease similarities and known microbe-disease associations into a heterogeneous network. The microbe similarity is computed as the Gaussian Interaction Profile (GIP) kernel similarity based on the known microbe-disease associations. We then further improve the microbe similarity by taking into account the organs of the human body that these microbes inhabit. The disease similarity is computed as the average of the disease GIP similarity, the disease symptom-based similarity and the disease functional similarity. Next, we construct a heterogeneous microbe-disease association network by integrating the microbe similarity network, the disease similarity network and the known microbe-disease association network. Finally, a matrix completion method computes the association scores of unknown microbe-disease pairs using the fast Singular Value Thresholding (SVT) algorithm. Using 5-fold cross-validation (5CV) and leave-one-out cross-validation (LOOCV), we evaluate the prediction performance of MCHMDA and of other state-of-the-art methods, including BRWMDA, NGRHMDA, LRLSHMDA and KATZHMDA. On the benchmark dataset HMDAD, the experimental results show that MCHMDA outperforms the other methods in terms of the area under the receiver operating characteristic curve (AUC).
MCHMDA achieves AUC values of 0.9251 and 0.9495 in 5CV and LOOCV, respectively, the highest among the competing methods. In addition, we demonstrate the generality of MCHMDA's predictions on an expanded microbe-disease association dataset (HMDAD-SUP). Finally, case studies demonstrate its predictive ability in practical applications.
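The Gaussian Interaction Profile kernel used above has a standard closed form: for interaction profiles y_i and y_j (rows of the association matrix), K(i, j) = exp(-γ ‖y_i − y_j‖²), with bandwidth γ = γ′ / ((1/n) Σ_k ‖y_k‖²). The sketch below computes it, together with the plain element-wise average used for the disease similarity, assuming γ′ = 1 (a common default, not a value taken from the paper). The organ-based refinement of microbe similarity and the SVT matrix-completion step are not reproduced, and the function names are hypothetical.

```python
import math


def gip_kernel(A, gamma_prime=1.0):
    """GIP kernel between the rows of a 0/1 association matrix A."""
    n = len(A)
    sq_norms = [sum(x * x for x in row) for row in A]
    mean_sq = sum(sq_norms) / n
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(A[i], A[j])))
             for j in range(n)] for i in range(n)]


def transpose(A):
    return [list(r) for r in zip(*A)]


def average_similarity(*mats):
    """Element-wise mean of equally sized similarity matrices."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(cols)]
            for i in range(rows)]


# toy microbe x disease associations
A = [[1, 0],
     [1, 0],
     [0, 1]]
microbe_gip = gip_kernel(A)             # microbes compared by their disease profiles
disease_gip = gip_kernel(transpose(A))  # diseases compared by their microbe profiles
```

Following the description above, `disease_gip` would then be averaged with the symptom-based and functional similarities, e.g. `average_similarity(disease_gip, symptom_sim, functional_sim)`, where `symptom_sim` and `functional_sim` are hypothetical matrices from external sources.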

77
Zhang W, Li Z, Guo W, Yang W, Huang F. A Fast Linear Neighborhood Similarity-Based Network Link Inference Method to Predict MicroRNA-Disease Associations. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:405-415. [PMID: 31369383 DOI: 10.1109/tcbb.2019.2931546] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Indexed: 06/10/2023]
Abstract
Increasing evidence has revealed that microRNAs (miRNAs) play critical roles in important biological processes. Identifying disease-related miRNAs is critical for understanding the molecular mechanisms of human diseases. Most existing computational methods require diverse features to predict miRNA-disease associations. However, such features are not available for all miRNAs or diseases. In addition, most methods cannot predict links for miRNAs or diseases without association information. In this paper, we propose a fast linear neighborhood similarity-based network link inference method, named FLNSNLI, to predict miRNA-disease associations. First, known miRNA-disease associations are formulated as a bipartite network, and miRNAs (or diseases) are represented by their association profiles. Second, miRNA-miRNA similarity and disease-disease similarity are calculated from the association profiles using the fast linear neighborhood similarity measure. Third, the label propagation algorithm is run separately on the miRNA side and on the disease side to score candidate miRNA-disease associations. Finally, FLNSNLI combines the two sets of scores with a weighted average to make predictions. Moreover, we develop a link complementing approach and extend FLNSNLI to predict links for miRNAs (or diseases) without known associations. In computational experiments, FLNSNLI achieves high accuracy and outperforms other state-of-the-art methods. More importantly, FLNSNLI requires less information yet still performs well. Case studies on three widely studied diseases show that FLNSNLI is useful for miRNA-disease association prediction.
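The two-sided label propagation and weighted averaging described above can be sketched as follows. One substitution should be kept in mind: the fast linear neighborhood similarity, which involves solving a constrained reconstruction problem, is replaced here by a simple row-normalized cosine similarity of association profiles. The propagation rule F ← αWF + (1 − α)Y, the value α = 0.5, the equal weighting of the two sides and all function names are illustrative assumptions, and the link-complementing extension is omitted.

```python
import math


def transpose(A):
    return [list(r) for r in zip(*A)]


def profile_similarity(P):
    """Row-normalized cosine similarity between rows of P (stand-in for LNS)."""
    n = len(P)
    norms = [math.sqrt(sum(x * x for x in p)) for p in P]
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and norms[i] and norms[j]:
                S[i][j] = sum(a * b for a, b in zip(P[i], P[j])) / (norms[i] * norms[j])
        total = sum(S[i])
        if total:  # rows without any neighbour stay all zero
            S[i] = [x / total for x in S[i]]
    return S


def propagate(W, Y, alpha=0.5, iters=50):
    """Label propagation: iterate F <- alpha * W F + (1 - alpha) * Y."""
    n, c = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(W[i][k] * F[k][j] for k in range(n)) + (1 - alpha) * Y[i][j]
              for j in range(c)] for i in range(n)]
    return F


def predict(Y, alpha=0.5, weight=0.5):
    """Weighted average of miRNA-side and disease-side propagation scores."""
    f_mirna = propagate(profile_similarity(Y), Y, alpha)
    Yt = transpose(Y)
    f_disease = transpose(propagate(profile_similarity(Yt), Yt, alpha))
    return [[weight * a + (1 - weight) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(f_mirna, f_disease)]
```

Here `Y` is the miRNA x disease 0/1 matrix; the disease side is handled by transposing it, propagating, and transposing back so both score matrices align before averaging.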

78
Prediction of miRNA-Disease Association Using Deep Collaborative Filtering. Biomed Res Int 2021; 2021:6652948. [PMID: 33681362 PMCID: PMC7929672 DOI: 10.1155/2021/6652948] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/01/2020] [Revised: 02/01/2021] [Accepted: 02/10/2021] [Indexed: 12/12/2022]
Abstract
Existing studies have shown that miRNAs are related to human diseases through the regulation of gene expression. Identifying miRNA-disease associations will contribute to the diagnosis, treatment and prognosis of diseases. The experimental identification of miRNA-disease associations is time-consuming, very expensive and has a high failure rate. In recent years, many researchers have predicted potential associations between miRNAs and diseases using computational approaches. In this paper, we proposed a novel method based on deep collaborative filtering, called DCFMDA, to predict potential miRNA-disease associations. To improve prediction performance, we integrated neural network matrix factorization (NNMF) and a multilayer perceptron (MLP) in a deep collaborative filtering framework. We used known miRNA-disease associations to capture miRNA-disease interaction features with NNMF, and used miRNA similarity and disease similarity to extract miRNA and disease feature vectors, respectively, with the MLP. Finally, we merged the outputs of the NNMF and the MLP to obtain the prediction matrix. The experimental results indicate that, compared with other existing computational methods, our method achieves an AUC of 0.9466 in 10-fold cross-validation. In addition, case studies show that DCFMDA can effectively predict candidate miRNAs for breast neoplasms, colon neoplasms, kidney neoplasms, leukemia and lymphoma.

79
Joodaki M, Ghadiri N, Maleki Z, Lotfi Shahreza M. A scalable random walk with restart on heterogeneous networks with Apache Spark for ranking disease-related genes through type-II fuzzy data fusion. J Biomed Inform 2021; 115:103688. [PMID: 33545331 DOI: 10.1016/j.jbi.2021.103688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/02/2020] [Revised: 01/10/2021] [Accepted: 01/23/2021] [Indexed: 12/11/2022]
Abstract
One of the key goals of biology and medical science is to find disease-related genes. Recent research uses gene/protein networks to find such genes. Because these networks contain false positive interactions, the results are often neither accurate nor reliable. Integrating multiple gene/protein networks can overcome this drawback, yielding a network with fewer false positive interactions. The integration method plays a crucial role in the quality of the constructed network. In this paper, we integrate several sources to build a reliable heterogeneous network, i.e., a network that includes nodes of different types. From different gene/protein sources, four gene-gene similarity networks are first constructed and then integrated by applying the type-II fuzzy voter scheme. The resulting gene-gene network is linked to a disease-disease similarity network (itself the result of integrating four sources) through a bipartite disease-gene network. We propose a novel algorithm, random walk with restart on the heterogeneous network with fuzzy fusion (RWRHN-FF). Disease-related genes are identified by running RWRHN-FF over the heterogeneous network. Experimental results using leave-one-out cross-validation indicate that RWRHN-FF outperforms existing methods. The proposed algorithm can be applied to find new genes for prostate, breast, gastric and colon cancers. Since the RWRHN-FF algorithm converges slowly on large heterogeneous networks, we propose a parallel implementation of RWRHN-FF on the Apache Spark platform for high-throughput and reliable network inference. Experiments on heterogeneous networks of different sizes indicate faster convergence than non-distributed implementations.
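A random walk with restart on a heterogeneous gene-disease network can be sketched as follows. This is a simplified, assumption-based illustration rather than RWRHN-FF: the gene-gene, disease-disease and bipartite gene-disease matrices are stacked into one block adjacency and column-normalized (instead of using separate jump probabilities between the layers), the type-II fuzzy fusion step is omitted, and the restart probability of 0.7 and all function names are hypothetical. Genes are ranked by their steady-state probability after restarting from the query disease.

```python
def block_adjacency(GG, DD, GD):
    """Stack gene-gene (g x g), disease-disease (d x d) and gene-disease (g x d) blocks."""
    g, d = len(GG), len(DD)
    top = [GG[i] + GD[i] for i in range(g)]
    bottom = [[GD[i][j] for i in range(g)] + DD[j] for j in range(d)]
    return top + bottom


def column_stochastic(A):
    """Divide each column by its sum so that every column sums to 1."""
    n = len(A)
    col = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return [[A[i][j] / col[j] if col[j] else 0.0 for j in range(n)] for i in range(n)]


def rwr(W, p0, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) W p + restart * p0 until the L1 change is below tol."""
    n = len(p0)
    p = list(p0)
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# toy example: genes 0-1-2-3 form a path; one disease linked to gene 0
GG = [[0, 1, 0, 0],
      [1, 0, 1, 0],
      [0, 1, 0, 1],
      [0, 0, 1, 0]]
DD = [[0]]
GD = [[1], [0], [0], [0]]
W = column_stochastic(block_adjacency(GG, DD, GD))
p = rwr(W, [0, 0, 0, 0, 1])  # restart at the disease node
gene_scores = p[:4]
```

Because every column of `W` sums to 1 and the restart vector sums to 1, the probability mass stays at 1 throughout, which makes the scores directly comparable across genes.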
Affiliation(s)
- Mehdi Joodaki: Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
- Nasser Ghadiri: Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
- Zeinab Maleki: Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran

80
Broyde J, Simpson DR, Murray D, Paull EO, Chu BW, Tagore S, Jones SJ, Griffin AT, Giorgi FM, Lachmann A, Jackson P, Sweet-Cordero EA, Honig B, Califano A. Oncoprotein-specific molecular interaction maps (SigMaps) for cancer network analyses. Nat Biotechnol 2021; 39:215-224. [PMID: 32929263 PMCID: PMC7878435 DOI: 10.1038/s41587-020-0652-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/26/2018] [Accepted: 07/23/2020] [Indexed: 02/08/2023]
Abstract
Tumor-specific elucidation of physical and functional oncoprotein interactions could improve the characterization of tumorigenic mechanisms and the prediction of therapeutic response. Current interaction models and pathways, however, lack context specificity and are not oncoprotein specific. We introduce SigMaps as context-specific networks comprising the modulators, effectors and cognate binding partners of a specific oncoprotein. SigMaps are reconstructed de novo by integrating diverse evidence sources, including protein structure, gene expression and mutational profiles, via the OncoSig machine learning framework. We first generated a KRAS-specific SigMap for lung adenocarcinoma, which recapitulated published KRAS biology, identified novel synthetic lethal proteins that were experimentally validated in three-dimensional spheroid models and established uncharacterized crosstalk with RAB/RHO. To show that OncoSig is generalizable, we first inferred SigMaps for the ten most mutated human oncoproteins and then for the full repertoire of 715 proteins in the COSMIC Cancer Gene Census. Taken together, these SigMaps show that the cell's regulatory and signaling architecture is highly tissue specific.
Affiliation(s)
- Joshua Broyde: Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- David R Simpson: Division of Pediatric Hematology/Oncology, Department of Pediatrics, UCSF Benioff Children's Hospital, San Francisco, CA, USA
- Diana Murray: Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Evan O Paull: Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Brennan W Chu: Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Somnath Tagore: Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Sunny J Jones: Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Aaron T Griffin: Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Federico M Giorgi: Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Alexander Lachmann: Mount Sinai Center for Bioinformatics; Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Peter Jackson: Baxter Laboratory, Department of Microbiology & Immunology, Stanford University, Palo Alto, CA, USA; Department of Pathology, Stanford University, Palo Alto, CA, USA
- E Alejandro Sweet-Cordero: Division of Pediatric Hematology/Oncology, Department of Pediatrics, UCSF Benioff Children's Hospital, San Francisco, CA, USA
- Barry Honig: Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA; Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA; Department of Medicine, Columbia University, New York, NY, USA; Zuckerman Mind Brain and Behavior Institute, Columbia University, New York, NY, USA
- Andrea Califano: Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA; Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA; Department of Medicine, Columbia University, New York, NY, USA; JP Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY, USA; Department of Biomedical Informatics, Columbia University, New York, NY, USA; Institute for Cancer Genetics, Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA; Motor Neuron Center and Columbia Initiative in Stem Cells, Columbia University, New York, NY, USA

81

82
Wang H, Tang J, Ding Y, Guo F. Exploring associations of non-coding RNAs in human diseases via three-matrix factorization with hypergraph-regular terms on center kernel alignment. Brief Bioinform 2021; 22:6095847. [PMID: 33443536 DOI: 10.1093/bib/bbaa409] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 09/23/2020] [Revised: 11/05/2020] [Accepted: 12/11/2020] [Indexed: 12/25/2022]
Abstract
Accurately identifying associations between non-coding RNAs (ncRNAs) and diseases could be of great help to human biomedical research. However, traditional techniques are applied to only one type of ncRNA or to a specific disease, and experimental methods are time-consuming and expensive. A growing number of computational tools have been proposed to detect new associations based on known ncRNA and disease information. Because ncRNAs (circRNAs, miRNAs and lncRNAs) are closely related to the progression of various human diseases, it is critical to develop effective computational predictors for ncRNA-disease association prediction. In this paper, we propose a new computational method, three-matrix factorization with hypergraph regularization terms (HGRTMF) based on centered kernel alignment (CKA), for identifying general ncRNA-disease associations. When constructing the similarity matrices, various types of similarity matrices are applicable to circRNAs, miRNAs and lncRNAs. Our method achieves excellent performance on five datasets involving three types of ncRNAs. In the tests, we obtain best area under the curve (AUC) scores of 0.9832, 0.9775, 0.9023, 0.8809 and 0.9185 via 5-fold cross-validation, and 0.9832, 0.9836, 0.9198, 0.9459 and 0.9275 via leave-one-out cross-validation, on the five datasets. Furthermore, our novel method (CKA-HGRTMF) is also able to discover new associations between ncRNAs and diseases accurately. Availability: code and data are available at https://github.com/hzwh6910/ncRNA2Disease.git. Contact: fguo@tju.edu.cn.
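Centered kernel alignment measures how well two kernel matrices agree after centering: A(K₁, K₂) = ⟨HK₁H, HK₂H⟩_F / (‖HK₁H‖_F ‖HK₂H‖_F), with H = I − 11ᵀ/n. The sketch below computes it and, as a deliberately simple heuristic, weights candidate similarity kernels in proportion to their alignment with the ideal kernel YYᵀ built from known associations. The paper's CKA-based kernel weighting is obtained by optimization, so these weights are an assumption for illustration only, and the function names are hypothetical.

```python
import math


def center(K):
    """Double-center a kernel: H K H with H = I - 11^T / n."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def frobenius_inner(A, B):
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def cka(K1, K2):
    """Centered kernel alignment between two n x n kernel matrices."""
    C1, C2 = center(K1), center(K2)
    denom = math.sqrt(frobenius_inner(C1, C1) * frobenius_inner(C2, C2))
    return frobenius_inner(C1, C2) / denom if denom else 0.0


def ideal_kernel(Y):
    """Y Y^T: entry (i, j) counts the associations rows i and j share."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in Y] for ri in Y]


def alignment_weights(kernels, Y):
    """Heuristic: weights proportional to (non-negative) alignment with Y Y^T."""
    target = ideal_kernel(Y)
    scores = [max(cka(K, target), 0.0) for K in kernels]
    total = sum(scores)
    return [s / total for s in scores] if total else [1.0 / len(kernels)] * len(kernels)


Y = [[1, 0], [1, 0], [0, 1]]  # toy ncRNA x disease associations
ideal = ideal_kernel(Y)
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
weights = alignment_weights([ideal, identity], Y)
```

A kernel that already matches the association structure gets the larger weight, which is the intuition behind aligning similarity kernels to the known associations before factorization.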
Affiliation(s)
- Hao Wang: School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jijun Tang: School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China; School of Computational Science and Engineering, University of South Carolina, Columbia, USA
- Yijie Ding: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Fei Guo: School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China

83
Scelsi MA, Napolioni V, Greicius MD, Altmann A. Network propagation of rare variants in Alzheimer's disease reveals tissue-specific hub genes and communities. PLoS Comput Biol 2021; 17:e1008517. [PMID: 33411734 PMCID: PMC7817020 DOI: 10.1371/journal.pcbi.1008517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2020] [Revised: 01/20/2021] [Accepted: 11/10/2020] [Indexed: 11/18/2022]
Abstract
State-of-the-art rare variant association testing methods aggregate the contribution of rare variants in biologically relevant genomic regions to boost statistical power. However, testing single genes separately considers neither the complex interaction landscape of genes nor the downstream effects of non-synonymous variants on protein structure and function. Here we present NETwork Propagation-based Assessment of Genetic Events (NETPAGE), an integrative approach for investigating the biological pathways through which rare variation results in complex disease phenotypes. We applied NETPAGE to sporadic, late-onset Alzheimer's disease (AD), using whole-genome sequencing from the AD Neuroimaging Initiative (ADNI) cohort as well as whole-exome sequencing from the AD Sequencing Project (ADSP). NETPAGE is based on network propagation, a framework that models information flow on a graph and simulates the percolation of genetic variation through tissue-specific gene interaction networks. The result of network propagation is a set of smoothed gene scores that can be tested for association with disease status through sparse regression. Applying NETPAGE to AD enabled the identification of a set of connected genes whose smoothed variation profile was robustly associated with case-control status, based on gene interactions in the hippocampus. Additionally, smoothed scores significantly correlated with risk of conversion to AD in subjects with Mild Cognitive Impairment (MCI). Lastly, we investigated tissue-specific transcriptional dysregulation of the core genes in two independent RNA-seq datasets, as well as significant enrichment for gene sets with known connections to AD. We present a framework that enables enhanced genetic association testing for a wide range of traits, diseases and sample sizes.
Collapse
Affiliation(s)
- Marzia Antonella Scelsi
- Centre for Medical Image Computing, Department of Medical Physics and Biomedical Engineering, University College London, London, United Kingdom
- Valerio Napolioni
- Functional Imaging in Neuropsychiatric Disorders (FIND) Lab, Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, California, United States of America
- Michael D Greicius
- Functional Imaging in Neuropsychiatric Disorders (FIND) Lab, Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, California, United States of America
- Andre Altmann
- Centre for Medical Image Computing, Department of Medical Physics and Biomedical Engineering, University College London, London, United Kingdom
84
Reyna MA, Chitra U, Elyanow R, Raphael BJ. NetMix: A Network-Structured Mixture Model for Reduced-Bias Estimation of Altered Subnetworks. J Comput Biol 2021; 28:469-484. [PMID: 33400606 DOI: 10.1089/cmb.2020.0435] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/30/2022]
Abstract
A classic problem in computational biology is the identification of altered subnetworks: subnetworks of an interaction network that contain genes/proteins that are differentially expressed, highly mutated, or otherwise aberrant compared with other genes/proteins. Numerous methods have been developed to solve this problem under various assumptions, but the statistical properties of these methods are often unknown. For example, some widely used methods are reported to output very large subnetworks that are difficult to interpret biologically. In this work, we formulate the identification of altered subnetworks as the problem of estimating the parameters of a class of probability distributions that we call the Altered Subset Distribution (ASD). We derive a connection between a popular method, jActiveModules, and the maximum likelihood estimator (MLE) of the ASD. We show that the MLE is statistically biased, explaining the large subnetworks output by jActiveModules. Based on these insights, we introduce NetMix, an algorithm that uses Gaussian mixture models to obtain less biased estimates of the parameters of the ASD. We demonstrate that NetMix outperforms existing methods in identifying altered subnetworks on both simulated and real data, including the identification of differentially expressed genes from both microarray and RNA-seq experiments and the identification of cancer driver genes in somatic mutation data.
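A minimal expectation-maximization (EM) sketch shows how a mixture model estimates the altered fraction. It assumes gene scores are z-scores drawn from a background N(0, 1) component and an altered N(μ, 1) component. It estimates the altered fraction π and the altered mean μ; π times the number of genes then estimates how many genes are altered. The function names, starting values and simulated data are hypothetical. NetMix's subsequent search for a connected subnetwork of that size is not shown.

```python
import math
import random


def normal_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_altered_mixture(scores, mu=2.0, pi=0.1, n_iter=200):
    """EM for a two-component mixture: background N(0, 1) and altered N(mu, 1).

    Hypothetical helper, not NetMix's implementation. Returns the estimated
    altered fraction pi, the altered mean mu, and each score's posterior
    probability of belonging to the altered component.
    """
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each score came from the altered component.
        resp = []
        for x in scores:
            altered = pi * normal_pdf(x, mu)
            background = (1.0 - pi) * normal_pdf(x, 0.0)
            resp.append(altered / (altered + background))
        # M-step: only pi and mu are free; the background stays fixed at N(0, 1).
        total = sum(resp)
        pi = total / len(scores)
        mu = sum(r * x for r, x in zip(resp, scores)) / total
    return pi, mu, resp


# Simulated scores: 900 background genes and 100 altered genes.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(900)] + [rng.gauss(3.0, 1.0) for _ in range(100)]
pi_hat, mu_hat, _ = fit_altered_mixture(scores)
print(f"estimated altered genes: {pi_hat * len(scores):.0f}, altered mean: {mu_hat:.2f}")
```

The background component is held fixed so that only the altered fraction and mean are learned. This mirrors the abstract's framing of estimating the parameters of the altered subset rather than maximizing a subnetwork score.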
Affiliation(s)
- Matthew A Reyna
- Department of Biomedical Informatics, Emory University, Atlanta, Georgia, USA
- Uthsav Chitra
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
- Rebecca Elyanow
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
- Department of Computer Science, Brown University, Providence, Rhode Island, USA
- Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
85
Qin R, Duan L, Zheng H, Li-Ling J, Song K, Zhang Y. An Ontology-Independent Representation Learning for Similar Disease Detection Based on Multi-Layer Similarity Network. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:183-193. [PMID: 31536013 DOI: 10.1109/tcbb.2019.2941475] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/10/2023]
Abstract
Identifying similar diseases has significant implications for revealing the etiology and pathogenesis of diseases and for further biomedical research. Currently, most methods for measuring disease similarity use either associations between ontological disease concepts or functional interactions between disease-related genes. These methods depend heavily on the ontology, which is not always available, and on the choice of datasets. Moreover, many methods have the drawback of using only a single metric to evaluate disease similarity from an individual data source, which may lead to biased conclusions because other aspects are ignored. In this study, we propose a novel ontology-independent framework, RADAR, that learns representations of diseases to deduce their similarities from an integrative perspective. By leveraging the associations between diseases and disease-related biomedical entities, disease similarity networks were built under various metrics. A multi-layer disease similarity network was then constructed by integrating the disease similarity networks derived from multiple data sources, and representations learned on this network provided a comprehensive evaluation of disease similarities. The performance of RADAR was assessed on a benchmark disease set and 100 random disease sets. Experimental results demonstrated that RADAR can detect similar diseases effectively.
86
Liu Y, Guo Y, Liu X, Wang C, Guo M. Pathogenic gene prediction based on network embedding. Brief Bioinform 2020; 22:6053103. [PMID: 33367541 DOI: 10.1093/bib/bbaa353] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/10/2020] [Revised: 11/02/2020] [Accepted: 11/03/2020] [Indexed: 11/13/2022]
Abstract
In disease research, the study of gene-disease correlation has always been an important topic. Large-scale connected datasets are now emerging in biology. We use known correlations between entities, which may come from different sets, to build a heterogeneous biological network. We then propose a new network embedding algorithm that calculates the correlation between diseases and genes and uses the resulting scores to predict pathogenic genes. Several experiments compare our method with other state-of-the-art methods. The results show that our method performs better than the traditional methods.
Affiliation(s)
- Yang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yuchen Guo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xiaoyan Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
87
Barel G, Herwig R. NetCore: a network propagation approach using node coreness. Nucleic Acids Res 2020; 48:e98. [PMID: 32735660 PMCID: PMC7515737 DOI: 10.1093/nar/gkaa639] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 04/16/2020] [Revised: 06/22/2020] [Accepted: 07/21/2020] [Indexed: 02/07/2023]
Abstract
We present NetCore, a novel network propagation approach based on node coreness, for phenotype–genotype associations and module identification. NetCore addresses the node degree bias in PPI networks by using node coreness in the random walk with restart procedure, and achieves improved re-ranking of genes after propagation. Furthermore, NetCore implements a semi-supervised approach to identify phenotype-associated network modules, which anchors the identification of novel candidate genes at known genes associated with the phenotype. We evaluated NetCore on gene sets from 11 different GWAS traits and, using cross-validation, showed improved performance compared with standard degree-based network propagation. We also applied NetCore to identify disease genes and modules for Schizophrenia GWAS data and pan-cancer mutation data, and showed its benefits over existing network propagation approaches. We provide an easy-to-use implementation, together with a high-confidence PPI network extracted from ConsensusPathDB. Both can be applied to various types of genomics data to obtain a re-ranking of genes and functionally relevant network modules.
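Below is a minimal sketch of a coreness-weighted random walk with restart. It assumes that a walker at gene i moves to neighbor j with probability proportional to the coreness (k-core number) of j rather than uniformly, and restarts at the seed genes with a fixed probability. This weighting scheme, the helper names and the toy network are assumptions for illustration, not NetCore's released code. The module-identification step is omitted.

```python
def core_numbers(adj):
    """k-core number of every node, by repeatedly removing a node of minimum degree."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=lambda u: degree[u])
        k = max(k, degree[v])  # the core level never decreases during peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def rwr_coreness(adj, seeds, restart=0.5, tol=1e-12, max_iter=1000):
    """Random walk with restart whose transitions favor high-coreness neighbors."""
    core = core_numbers(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    p = dict(p0)
    for _ in range(max_iter):
        new = {v: restart * p0[v] for v in adj}
        for i, nbrs in adj.items():
            norm = sum(core[j] for j in nbrs)
            for j in nbrs:
                # Move from i to j with probability core[j] / (sum of neighbor coreness).
                new[j] += (1.0 - restart) * p[i] * core[j] / norm
        converged = max(abs(new[v] - p[v]) for v in adj) < tol
        p = new
        if converged:
            break
    return p


# Toy network: triangle A-B-C (2-core) plus a pendant gene D attached to A (1-core).
net = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print(core_numbers(net))
print(rwr_coreness(net, seeds={"B"}))
```

From gene A, a degree-based walk would step to B, C and D with equal probability. With coreness weights, the low-coreness pendant D receives only 1/5 of the flow. This is the kind of hub-independent re-weighting the abstract describes.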
Affiliation(s)
- Gal Barel
- Department of Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany
- Ralf Herwig
- Department of Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany
88
Pathak GA, Wendt FR, Goswami A, Angelis FD, Polimanti R. ACE2 Netlas: In-silico functional characterization and drug-gene interactions of ACE2 gene network to understand its potential involvement in COVID-19 susceptibility. medRxiv 2020:2020.10.27.20220665. [PMID: 33140059 PMCID: PMC7605570 DOI: 10.1101/2020.10.27.20220665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Abstract
The angiotensin-converting enzyme-2 (ACE2) receptor has been identified as the key adhesion molecule for the transmission of SARS-CoV-2. However, there is no evidence that human genetic variation in ACE2 is singularly responsible for COVID-19 susceptibility. Therefore, we performed a multi-level characterization of genes that interact with ACE2 (the ACE2-gene network) for their over-represented biological properties in the context of COVID-19. The phenome-wide association of 51 genes, including ACE2, with 4,756 traits categorized into 26 phenotype categories showed enrichment of immunological, respiratory, environmental, skeletal, dermatological, and metabolic domains (p<4e-4). Transcriptomic regulation of the ACE2-gene network was enriched for tissue specificity in kidney, small intestine, and colon (p<4.7e-4). Leveraging the drug-gene interaction database, we identified 47 drugs, including dexamethasone and spironolactone. For genetic variants within ± 10 kb of ACE2-network genes, we characterized functional consequences, among them effects on miRNA binding-site targets. MiRNAs affected by ACE2-network variants showed statistical over-representation of inflammation, aging, diabetes, and heart conditions. Among variants mapped to the ACE2 network, we observed COVID-19-related associations in the RORA, SLC12A6 and SLC6A19 genes. Overall, functional characterization of the ACE2-gene network highlights several potential mechanisms of COVID-19 susceptibility. The data can also be accessed at https://gpwhiz.github.io/ACE2Netlas/.
Affiliation(s)
- Gita A Pathak
- Yale School of Medicine, Department of Psychiatry, Division of Human Genetics, New Haven, CT; Veteran Affairs Connecticut Healthcare System, West Haven, CT
- Frank R Wendt
- Yale School of Medicine, Department of Psychiatry, Division of Human Genetics, New Haven, CT; Veteran Affairs Connecticut Healthcare System, West Haven, CT
- Aranyak Goswami
- Yale School of Medicine, Department of Psychiatry, Division of Human Genetics, New Haven, CT; Veteran Affairs Connecticut Healthcare System, West Haven, CT
- Flavio De Angelis
- Yale School of Medicine, Department of Psychiatry, Division of Human Genetics, New Haven, CT; Veteran Affairs Connecticut Healthcare System, West Haven, CT
- Renato Polimanti
- Yale School of Medicine, Department of Psychiatry, Division of Human Genetics, New Haven, CT; Veteran Affairs Connecticut Healthcare System, West Haven, CT
89
Guerra C, Joshi S, Lu Y, Palini F, Ferraro Petrillo U, Rossignac J. Rank-Similarity Measures for Comparing Gene Prioritizations: A Case Study in Autism. J Comput Biol 2020; 28:283-295. [PMID: 33103913 DOI: 10.1089/cmb.2020.0244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
We discuss the challenge of comparing three gene prioritization methods: network propagation, integer linear programming rank aggregation (RA), and statistical RA. These methods are based on different biological categories and estimate disease-gene association. Previously proposed comparison schemes are based on three measures of performance: the receiver operating characteristic (ROC) curve, the area under the curve (AUC), and the median rank ratio. Although these measures may capture important aspects of gene prioritization performance, they may fail to capture important differences in the rankings of individual genes. We suggest that comparison schemes could be improved by also considering recently proposed measures of similarity between gene rankings. We tested this suggestion on comparison schemes for prioritizations of autism-associated genes obtained using brain- and tissue-specific data. Our results show that our similarity measures are effective at clustering brain regions by their relevance to autism.
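The abstract does not define the recently proposed similarity measures. The sketch below therefore substitutes two standard ways of comparing gene rankings. The first is the normalized Kendall tau distance, the fraction of gene pairs that two rankings order differently. The second is the overlap of the top-k genes. Neither is claimed to be the authors' measure, and the gene names are placeholders.

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b):
    """Fraction of gene pairs ordered differently by two rankings.

    rank_a, rank_b: lists of the same genes, best-ranked first.
    Returns 0.0 for identical rankings and 1.0 for fully reversed ones.
    """
    pos_a = {g: i for i, g in enumerate(rank_a)}
    pos_b = {g: i for i, g in enumerate(rank_b)}
    pairs = list(combinations(rank_a, 2))
    discordant = sum(
        1 for g, h in pairs
        if (pos_a[g] - pos_a[h]) * (pos_b[g] - pos_b[h]) < 0
    )
    return discordant / len(pairs)


def top_k_overlap(rank_a, rank_b, k):
    """Share of the k best-ranked genes that both rankings have in common."""
    return len(set(rank_a[:k]) & set(rank_b[:k])) / k


prioritization_1 = ["G1", "G2", "G3", "G4"]
prioritization_2 = ["G2", "G1", "G4", "G3"]
print(kendall_tau_distance(prioritization_1, prioritization_2))
print(top_k_overlap(prioritization_1, prioritization_2, 3))
```

Unlike ROC or AUC, both quantities compare the orderings of individual genes directly. This is the gap the abstract points to in existing comparison schemes.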
Affiliation(s)
- Concettina Guerra
- Georgia Institute of Technology College of Computing, School of Interactive Computing, Atlanta, Georgia, USA
- Sarang Joshi
- Georgia Institute of Technology College of Computing, School of Interactive Computing, Atlanta, Georgia, USA
- Yinquan Lu
- Georgia Institute of Technology College of Computing, School of Interactive Computing, Atlanta, Georgia, USA
- Francesco Palini
- Dipartimento di Scienze Statistiche, Università di Roma-La Sapienza, Rome, Italy
- Jarek Rossignac
- Georgia Institute of Technology College of Computing, School of Interactive Computing, Atlanta, Georgia, USA
90
Dozmorov MG, Cresswell KG, Bacanu SA, Craver C, Reimers M, Kendler KS. A method for estimating coherence of molecular mechanisms in major human disease and traits. BMC Bioinformatics 2020; 21:473. [PMID: 33087046 PMCID: PMC7579960 DOI: 10.1186/s12859-020-03821-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2020] [Accepted: 10/15/2020] [Indexed: 11/29/2022]
Abstract
BACKGROUND Phenotypes such as height and intelligence are thought to be a product of the collective effects of multiple phenotype-associated genes and the interactions among their protein products. A high degree of interaction suggests coherent molecular mechanisms, whereas a low degree suggests random ones. Comparing the degree of interaction may help us better understand the coherence of phenotype-specific molecular mechanisms and the potential for therapeutic intervention. However, direct comparison is difficult because phenotype-associated gene networks differ in size and configuration. METHODS We introduce a metric for measuring the coherence of molecular-interaction networks as a slope of internal versus external distributions of the degree of interactions. The internal degree distribution is defined by interaction counts within a phenotype-specific gene network, while the external degree distribution counts interactions with other genes in the whole protein-protein interaction (PPI) network. We present a novel method for normalizing the coherence estimates, making them directly comparable. RESULTS Using the STRING and BioGRID PPI databases, we compared the coherence of 116 phenotype-associated gene sets from the GWAS Catalog against size-matched KEGG pathways (the reference for high coherence) and random networks (the lower limit of coherence). We observed a range of coherence estimates for each category of phenotypes. Metabolic traits and diseases were the most coherent, while psychiatric disorders and intelligence-related traits were the least coherent. We demonstrate that coherence and modularity measures capture distinct network properties. CONCLUSIONS We present a general-purpose method for estimating and comparing the coherence of molecular-interaction gene networks that accounts for differences in network size and shape. Our results highlight gaps in our current knowledge of the genetics and molecular mechanisms of complex phenotypes and suggest priorities for future GWASs.
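One possible reading of the coherence metric is sketched below. For each gene in a phenotype-associated set, the internal degree counts PPI partners inside the set and the external degree counts partners elsewhere in the network. Coherence is then taken as the ordinary least-squares slope of internal on external degree across the set's genes. That regression form, the helper names and the toy network are all assumptions. The published metric, and its normalization against size-matched KEGG pathways and random networks, may differ.

```python
def internal_external_degrees(adj, gene_set):
    """Per-gene counts of interaction partners inside and outside gene_set (sorted gene order)."""
    internal, external = [], []
    for g in sorted(gene_set):
        nbrs = adj.get(g, set())
        inside = sum(1 for n in nbrs if n in gene_set)
        internal.append(inside)
        external.append(len(nbrs) - inside)
    return internal, external


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x (assumed coherence estimate)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("external degrees are all equal; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Toy PPI network and a hypothetical three-gene phenotype set {A, B, C}.
ppi = {"A": {"B", "C", "X"}, "B": {"A", "X", "Y"}, "C": {"A"}, "X": {"A", "B"}, "Y": {"B"}}
internal, external = internal_external_degrees(ppi, {"A", "B", "C"})
print(internal, external, ols_slope(external, internal))
```

Under this reading, a set whose genes gain proportionally more partners inside the set than outside it yields a steeper slope. A random gene set would tend toward a flat one.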
Affiliation(s)
- Mikhail G. Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA USA
- Department of Pathology, Virginia Commonwealth University, Richmond, VA USA
- Kellen G. Cresswell
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA USA
- Silviu-Alin Bacanu
- Virginia Institute for Psychiatric and Behavioral Genetics and the Department of Psychiatry, Virginia Commonwealth University, Richmond, VA USA
- Carl Craver
- Philosophy-Neuroscience-Psychology Program, Washington University in St. Louis, St. Louis, MO USA
- Mark Reimers
- Department of Physiology, Michigan State University, East Lansing, MI USA
- Department of Biomedical Engineering, Michigan State University, East Lansing, MI USA
- Kenneth S. Kendler
- Virginia Institute for Psychiatric and Behavioral Genetics and the Department of Psychiatry, Virginia Commonwealth University, Richmond, VA USA
91
Discover novel disease-associated genes based on regulatory networks of long-range chromatin interactions. Methods 2020; 189:22-33. [PMID: 33096239 DOI: 10.1016/j.ymeth.2020.10.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/25/2020] [Revised: 08/29/2020] [Accepted: 10/18/2020] [Indexed: 02/01/2023]
Abstract
Identifying the genes and non-coding genetic variants that are associated with complex diseases, together with the underlying mechanisms, is one of the most important questions in functional genomics. Because of limited statistical power and the lack of mechanistic modeling, traditional genome-wide association studies (GWAS) cannot fully address this question. Based on multi-omics data integration, cell-type-specific regulatory networks can be built to improve GWAS analysis. In this study, we developed a new computational infrastructure, APRIL, that incorporates 3D chromatin interactions into regulatory network construction. This extends the networks to include long-range cis-regulatory links between non-coding GWAS SNPs and target genes. Combinatorial transcription factors that co-regulate groups of genes are also inferred to further expand the networks with trans-regulation. APRIL incorporates a suite of machine learning predictions and statistical tests to predict novel disease-associated genes based on the expanded regulatory networks. Important features of non-coding regulatory elements and genetic variants are prioritized in the network-based predictions, providing systems-level insights into the mechanisms of transcriptional dysregulation associated with complex diseases.
92
Ruan P, Wang S. DiSNEP: a Disease-Specific gene Network Enhancement to improve Prioritizing candidate disease genes. Brief Bioinform 2020; 22:5925270. [PMID: 33064143 DOI: 10.1093/bib/bbaa241] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 01/16/2020] [Revised: 07/25/2020] [Accepted: 08/29/2020] [Indexed: 12/27/2022]
Abstract
Biological network-based strategies are useful for prioritizing genes associated with diseases. Several comprehensive human gene networks, such as STRING, GIANT and HumanNet, have been developed and used in network-assisted algorithms to identify disease-associated genes. However, none of these networks is disease-specific, so they may not accurately reflect gene interactions for a specific disease. Aiming to improve disease gene prioritization using networks, we propose a Disease-Specific Network Enhancement Prioritization (DiSNEP) framework. DiSNEP first enhances a comprehensive gene network specifically for a disease through a diffusion process on a gene-gene similarity matrix derived from disease omics data. The enhanced disease-specific gene network thus better reflects true gene interactions for the disease and may improve the subsequent prioritization of disease-associated genes. In simulations, DiSNEP, which uses an enhanced disease-specific network, prioritizes more true signal genes than comparison methods that use a general gene network or no prioritization. We applied DiSNEP to prioritize cancer-associated gene expression and DNA methylation signal genes for five cancer types from The Cancer Genome Atlas (TCGA) project. According to the DisGeNET database, more of the candidate genes prioritized by DiSNEP are cancer-related than those prioritized by the comparison methods. This held consistently across all five cancer types considered and for both gene expression and DNA methylation signal genes.
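The enhancement step can be illustrated, under stated assumptions, by a generic diffusion. It repeatedly passes a general network W through a row-normalized disease similarity matrix S, W ← αSWSᵀ + (1 − α)W₀, so that genes with similar omics profiles come to share edges. This update rule, `alpha = 0.8`, the helper names and the three-gene matrices are assumptions. DiSNEP's published diffusion and its prioritization step are not reproduced here.

```python
def matmul(a, b):
    """Product of two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def row_normalize(m):
    """Scale each row to sum to 1 (all-zero rows are left unchanged)."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else list(row))
    return out


def enhance_network(w0, sim, alpha=0.8, n_iter=100):
    """Diffuse a general gene network w0 through a disease similarity matrix sim.

    Hypothetical update, not DiSNEP's published rule:
    W <- alpha * S W S^T + (1 - alpha) * W0, with S the row-normalized similarity.
    """
    s = row_normalize(sim)
    s_t = transpose(s)
    n = len(w0)
    w = [row[:] for row in w0]
    for _ in range(n_iter):
        diffused = matmul(matmul(s, w), s_t)
        w = [[alpha * diffused[i][j] + (1 - alpha) * w0[i][j] for j in range(n)]
             for i in range(n)]
    return w


# Three genes: only 0-1 is linked in the general network, but genes 1 and 2
# have highly similar disease omics profiles.
general = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
similarity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.9], [0.0, 0.9, 1.0]]
enhanced = enhance_network(general, similarity)
print(round(enhanced[0][2], 3))
```

The edge between genes 0 and 2, absent from the general network, receives positive weight because gene 2 resembles gene 1, the direct partner of gene 0. Because SWSᵀ is symmetric whenever W is, the enhanced network stays undirected.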
93
Chen YX, Rong Y, Jiang F, Chen JB, Duan YY, Dong SS, Zhu DL, Chen H, Yang TL, Dai Z, Guo Y. An integrative multi-omics network-based approach identifies key regulators for breast cancer. Comput Struct Biotechnol J 2020; 18:2826-2835. [PMID: 33133424 PMCID: PMC7585874 DOI: 10.1016/j.csbj.2020.10.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/29/2020] [Revised: 09/13/2020] [Accepted: 10/01/2020] [Indexed: 02/06/2023]
Abstract
Although genome-wide association studies (GWASs) have successfully identified thousands of risk variants for complex human diseases, understanding the biological function and molecular mechanisms of the associated SNPs remains challenging. Here we developed a framework named integrative multi-omics network-based approach (IMNA) to identify potential key genes in regulatory networks. IMNA integrates molecular interactions across multiple biological scales, including GWAS signals, gene expression-based signatures, chromatin interactions, and protein interactions from the network topology. We applied this approach to breast cancer and prioritized key genes involved in regulatory networks. We also developed an abnormal gene expression score (AGES) signature based on the gene expression deviation of the top 20 rank-ordered genes in breast cancer. AGES values are associated with genetic variants, tumor properties and patient survival outcomes. Among the top 20 genes, RNASEH2A was identified as a new candidate gene for breast cancer. Thus, our integrative network-based approach provides a genetics-driven framework to unveil tissue-specific interactions across multiple biological scales and to reveal potential key regulatory genes for breast cancer. The approach can also be applied to other complex diseases, such as ovarian cancer, to unravel underlying mechanisms and help develop therapeutic targets.
Affiliation(s)
- Yi-Xiao Chen
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi Province 710049, PR China
- Yu Rong
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi Province 710049, PR China
- Feng Jiang
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi Province 710049, PR China
- Jia-Bin Chen
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi Province 710049, PR China
- Yuan-Yuan Duan
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi Province 710049, PR China
- Shan-Shan Dong
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi Province 710049, PR China
- Dong-Li Zhu
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi Province 710049, PR China
- Research Institute of Xi'an Jiaotong University, Zhejiang Province 311215, PR China
- Hao Chen
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi Province 710049, PR China
- Tie-Lin Yang
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi Province 710049, PR China
- Research Institute of Xi'an Jiaotong University, Zhejiang Province 311215, PR China
- Zhijun Dai
- Department of Breast Surgery, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province 310003, PR China
- Yan Guo
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi Province 710049, PR China
94
Ye P, Ye W, Ye C, Li S, Ye L, Ji G, Wu X. scHinter: imputing dropout events for single-cell RNA-seq data with limited sample size. Bioinformatics 2020; 36:789-797. [PMID: 31392316 DOI: 10.1093/bioinformatics/btz627] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/25/2019] [Revised: 07/18/2019] [Accepted: 08/06/2019] [Indexed: 01/18/2023]
Abstract
MOTIVATION Single-cell RNA sequencing (scRNA-seq) is fast becoming a powerful technique for studying dynamic gene regulation at unprecedented resolution. However, scRNA-seq data suffer from an extremely high dropout rate and cell-to-cell variability, demanding new methods to recover lost gene expression. Various dropout imputation approaches for scRNA-seq are available, but most studies focus on data with a medium or large number of cells. Few have explicitly investigated how performance differs across sample sizes or whether these approaches apply to small or imbalanced data. It is imperative to develop new imputation approaches that generalize better to data with various sample sizes. RESULTS We propose a method called scHinter for imputing dropout events in scRNA-seq data, with special emphasis on data with limited sample size. scHinter incorporates a voting-based ensemble distance and leverages the synthetic minority oversampling technique (SMOTE) for random interpolation. A hierarchical framework is also embedded in scHinter to increase the reliability of imputation for small samples. We demonstrated the ability of scHinter to recover gene expression measurements across a wide spectrum of scRNA-seq datasets with varied sample sizes, and we comprehensively examined the impact of sample size and cluster number on imputation. Evaluation across diverse scRNA-seq datasets with imbalanced or limited sample size showed that scHinter achieved higher and more robust performance than competing approaches, including MAGIC, scImpute, SAVER and netSmooth. AVAILABILITY AND IMPLEMENTATION Freely available for download at https://github.com/BMILAB/scHinter. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
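The SMOTE-style random interpolation that scHinter builds on can be sketched as follows. A synthetic profile is placed at a random point on the segment between a cell and one of its nearest neighbors. This minimal sketch assumes plain Euclidean distance in place of scHinter's voting-based ensemble distance. It omits the hierarchical framework and the way synthetic profiles feed into dropout imputation. The function name and toy data are hypothetical.

```python
import math
import random


def smote_interpolate(cells, index, k=2, rng=random):
    """Synthesize one expression profile between cells[index] and a random near neighbor.

    SMOTE-style interpolation: x_new = x + u * (neighbor - x), u ~ Uniform(0, 1).
    Plain Euclidean distance stands in for scHinter's voting-based ensemble distance.
    """
    x = cells[index]
    others = [c for i, c in enumerate(cells) if i != index]
    others.sort(key=lambda c: math.dist(x, c))  # nearest cells first
    neighbor = rng.choice(others[:k])
    u = rng.random()
    return [xi + u * (ni - xi) for xi, ni in zip(x, neighbor)]


# Three toy cells with two genes each; the synthetic cell lies between cell 0
# and its single nearest neighbor, cell 1.
toy_cells = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
print(smote_interpolate(toy_cells, 0, k=1, rng=random.Random(1)))
```

Interpolating only toward near neighbors keeps synthetic profiles inside the local expression neighborhood. This is why the technique is attractive when a cluster has too few cells.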
Affiliation(s)
- Pengchao Ye
- Department of Automation, Fujian 361005, China; National Institute for Data Science in Health and Medicine, Fujian 361005, China
- Wenbin Ye
- Department of Automation, Fujian 361005, China; National Institute for Data Science in Health and Medicine, Fujian 361005, China
- Congting Ye
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361005, China
- Shuchao Li
- Department of Automation, Fujian 361005, China; National Institute for Data Science in Health and Medicine, Fujian 361005, China
- Lishan Ye
- Zhongshan Hospital of Xiamen University, Xiamen, Fujian 361004, China
- Guoli Ji
- Department of Automation, Fujian 361005, China; National Institute for Data Science in Health and Medicine, Fujian 361005, China
- Xiaohui Wu
- Department of Automation, Fujian 361005, China; National Institute for Data Science in Health and Medicine, Fujian 361005, China
95
Zeng X, Lin Y, He Y, Lu L, Min X, Rodriguez-Paton A. Deep Collaborative Filtering for Prediction of Disease Genes. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1639-1647. [PMID: 30932845 DOI: 10.1109/tcbb.2019.2907536] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 06/09/2023]
Abstract
Accurate prioritization of potential disease genes is a fundamental challenge in biomedical research. Various algorithms have been developed to address this problem. Inductive Matrix Completion (IMC) is one of the most reliable models, owing to its well-established framework and its superior performance in predicting gene-disease associations. However, IMC does not extract deep features hierarchically, which might limit the quality of recovery. Deep learning architectures obtain high-level representations and handle the noise and outliers present in large-scale biological datasets. We therefore introduce such an architecture into the side information of genes in our Deep Collaborative Filtering (DCF) model. Further, to compensate for the lack of negative examples, we apply a Positive-Unlabeled (PU) learning formulation to low-rank matrix completion. Our approach achieves substantially improved performance over other state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database. It is 10 percent more efficient than standard IMC in detecting a true association and significantly outperforms other alternatives in terms of precision-recall at the top-k predictions. Moreover, we validate the approach on diseases with no previously known gene associations and on newly reported OMIM associations. The experimental results show that DCF remains satisfactory for ranking novel disease phenotypes as well as for mining unexplored relationships. The source code and the data are available at https://github.com/xzenglab/DCF.
96
Hernaez M, Blatti C, Gevaert O. Comparison of single and module-based methods for modeling gene regulatory networks. Bioinformatics 2020; 36:558-567. [PMID: 31287491 DOI: 10.1093/bioinformatics/btz549] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/05/2018] [Revised: 06/11/2019] [Accepted: 07/06/2019] [Indexed: 01/02/2023]
Abstract
MOTIVATION Gene regulatory networks describe the regulatory relationships among genes, and developing methods for reverse engineering these networks is an ongoing challenge in computational biology. The majority of the initially proposed methods for gene regulatory network discovery create a network of genes and then mine it in order to uncover previously unknown regulatory processes. More recent approaches have focused on inferring modules of co-regulated genes, linking these modules with regulatory genes and then mining them to discover new molecular biology. RESULTS In this work we analyze module-based network approaches to build gene regulatory networks, and compare their performance to single gene network approaches. In the process, we propose a novel approach to estimate gene regulatory networks drawing from the module-based methods. We show that generating modules of co-expressed genes which are predicted by a sparse set of regulators using a variational Bayes method, and then building a bipartite graph on the generated modules using sparse regression, yields more informative networks than previous single and module-based network approaches as measured by: (i) the rate of enriched gene sets, (ii) a network topology assessment, (iii) ChIP-Seq evidence and (iv) the KnowEnG Knowledge Network collection of previously characterized gene-gene interactions. AVAILABILITY AND IMPLEMENTATION The code is written in R and can be downloaded from https://github.com/mikelhernaez/linker. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mikel Hernaez
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Charles Blatti
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Olivier Gevaert
- The Stanford Center of Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University; Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
97
Biswas S, Pal S, Majumder PP, Bhattacharjee S. A framework for pathway knowledge driven prioritization in genome-wide association studies. Genet Epidemiol 2020; 44:841-853. [PMID: 32779262 PMCID: PMC7116354 DOI: 10.1002/gepi.22345] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/07/2020] [Revised: 06/18/2020] [Accepted: 07/10/2020] [Indexed: 12/27/2022]
Abstract
Many variants with low frequencies or with low to modest effects likely remain unidentified in genome-wide association studies (GWAS) because of stringent genome-wide thresholds for detection. To improve the power of detection, prioritization of variants based on their functional annotations and epigenetic landmarks has been used successfully. Here, we propose a novel method of prioritization of a GWAS by exploiting gene-level knowledge (e.g., annotations to pathways and ontologies) and show that it further improves power. Often, disease-associated variants are found near genes that are co-involved in specific biological pathways relevant to the disease process. Using this knowledge to conduct a prioritized scan increases the power to detect loci that map to genes clustered in a few specific pathways. We have developed a computationally scalable framework based on penalized logistic regression (termed GKnowMTest: Genomic Knowledge-guided Multiple Testing) to enable a prioritized, pathway-guided GWAS scan with a very large number of gene-level annotations. We demonstrate that the proposed strategy improves overall power and maintains global control of the Type 1 error. Our method works on genome-wide summary-level data and a user-specified list of pathways (e.g., those extracted from large pathway databases without reference to the biology of a specific disease). It automatically reweights the input p values by incorporating the pathway enrichments "adaptively learned" from the data, using a cross-validation technique to avoid overfitting. We used whole-genome simulations and some publicly available GWAS data sets to illustrate the application of our method. The GKnowMTest framework has been implemented as a user-friendly open-source R package.
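GKnowMTest learns its weights from pathway enrichments by penalized logistic regression with cross-validation, which is not reproduced here. The sketch below only illustrates the general principle that lets prior weights raise power while keeping the global Type 1 error (family-wise error rate) at alpha: a standard weighted Bonferroni rule that rejects hypothesis i when p_i <= alpha * w_i / m, with nonnegative weights rescaled to average 1. Whether GKnowMTest applies exactly this rule is an assumption, and the weights in the example are hypothetical stand-ins for pathway-informed priors.

```python
from typing import List, Sequence


def normalize_weights(raw: Sequence[float]) -> List[float]:
    """Rescale nonnegative weights so that they average to 1 (i.e. sum to m)."""
    if any(w < 0 for w in raw):
        raise ValueError("weights must be nonnegative")
    total = sum(raw)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    m = len(raw)
    return [w * m / total for w in raw]


def weighted_bonferroni(pvalues: Sequence[float], raw_weights: Sequence[float],
                        alpha: float = 0.05) -> List[bool]:
    """Reject H_i when p_i <= alpha * w_i / m.

    Because the weights sum to m, the union bound gives
    FWER <= sum_i alpha * w_i / m = alpha, whatever the dependence between tests.
    """
    if len(pvalues) != len(raw_weights):
        raise ValueError("pvalues and weights must have the same length")
    m = len(pvalues)
    weights = normalize_weights(raw_weights)
    return [p <= alpha * w / m for p, w in zip(pvalues, weights)]


if __name__ == "__main__":
    pvals = [0.02, 0.01, 0.3, 0.0001]
    prior = [2.0, 1.0, 0.5, 0.5]  # hypothetical pathway-informed weights
    print(weighted_bonferroni(pvals, prior))         # [True, True, False, True]
    print(weighted_bonferroni(pvals, [1, 1, 1, 1]))  # [False, True, False, True]
```

Up-weighting the first test (for example, because its gene falls in an enriched pathway) lets it pass at p = 0.02, which plain Bonferroni (all weights 1, threshold 0.0125) would miss; the price is a stricter threshold for the down-weighted tests.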
Affiliation(s)
- Soumen Pal
- National Institute of Biomedical Genomics, Kalyani, India
98
Perscheid C. Integrative biomarker detection on high-dimensional gene expression data sets: a survey on prior knowledge approaches. Brief Bioinform 2020; 22:5881664. [PMID: 32761115 DOI: 10.1093/bib/bbaa151] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/17/2020] [Revised: 06/15/2020] [Accepted: 06/16/2020] [Indexed: 02/06/2023] Open
Abstract
Gene expression data provide the expression levels of tens of thousands of genes across several hundred samples. These data are analyzed to detect biomarkers that can be of prognostic or diagnostic use. Traditionally, biomarker detection on gene expression data is framed as a gene selection task: the vast number of genes is reduced to a few relevant ones that achieve the best performance for the respective use case. Traditional approaches select genes based on their statistical significance in the data set, which raises issues of robustness, redundancy and true biological relevance of the selected genes. Integrative analyses typically address these shortcomings by integrating multiple data types measured on the same samples, e.g. gene expression and methylation data. When only gene expression data are available, integrative analyses instead use curated information on biological processes from public knowledge bases. With knowledge bases providing an ever-increasing amount of curated biological knowledge, such prior knowledge approaches become more powerful. This paper provides a thorough overview of the status quo of biomarker detection on gene expression data with prior biological knowledge. We discuss current shortcomings of traditional approaches, review recent external knowledge bases, provide a classification and qualitative comparison of existing prior knowledge approaches and discuss open challenges for this kind of gene selection.
Affiliation(s)
- Cindy Perscheid
- Hasso Plattner Institute, University of Potsdam, Potsdam, 14482, Germany
99
Polak D, Sanui T, Nishimura F, Shapira L. Diabetes as a risk factor for periodontal disease-plausible mechanisms. Periodontol 2000 2020; 83:46-58. [PMID: 32385872 DOI: 10.1111/prd.12298] [Citation(s) in RCA: 81] [Impact Index Per Article: 16.2] [Indexed: 02/06/2023]
Abstract
The present narrative review examines the scientific evidence on the biological mechanisms that may link periodontitis and diabetes as a source of comorbidity. Publications on periodontitis and diabetes in humans, animals and in vitro were screened for relevance. Periodontal microbiome studies indicate a possible association between altered glucose metabolism in prediabetes and diabetes and changes in the periodontal microbiome. In parallel, hyperglycemia increases the expression of pathogen receptors, which amplifies the host response to the dysbiotic microbiome. Hyperglycemia also promotes a pro-inflammatory response, either independently or via the advanced glycation end product/receptor for advanced glycation end product (AGE/RAGE) pathway. These processes activate cellular tissue-destruction functions, which further enhance pro-inflammatory cytokine expression and alter the RANKL/osteoprotegerin ratio, promoting the formation and activation of osteoclasts. The evidence supports a role for several pathogenic mechanisms in a truly causal comorbidity between poorly controlled diabetes and periodontitis. However, further research is needed to better understand these mechanisms and to explore additional ones.
Affiliation(s)
- David Polak
- Department of Periodontology, Hebrew University-Hadassah Faculty of Dental Medicine, Jerusalem, Israel
- Terukazu Sanui
- Section of Periodontology, Division of Oral Rehabilitation, Kyushu University Faculty of Dental Science, Fukuoka, Japan
- Fusanori Nishimura
- Section of Periodontology, Division of Oral Rehabilitation, Kyushu University Faculty of Dental Science, Fukuoka, Japan
- Lior Shapira
- Department of Periodontology, Hebrew University-Hadassah Faculty of Dental Medicine, Jerusalem, Israel
100
Rong G, Zhang Y, Ma Y, Chen S, Wang Y. The Clinical and Molecular Characterization of Gastric Cancer Patients in Qinghai-Tibetan Plateau. Front Oncol 2020; 10:1033. [PMID: 32695679 PMCID: PMC7339979 DOI: 10.3389/fonc.2020.01033] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/08/2020] [Accepted: 05/26/2020] [Indexed: 01/05/2023] Open
Abstract
In 2018, gastric cancer was the fifth most common malignancy and the third deadliest cancer in the world (738,000 deaths). The analysis of its molecular characteristics has been complicated by histological and intratumor heterogeneity. Furthermore, previous studies indicate that the incidence of gastric cancer shows wide geographical variation. As the largest and highest region in China, the Qinghai-Tibetan Plateau (QTP) is one of the important global biodiversity hotspots. Here, to better understand the mechanism of gastric cancer and to offer targeted therapeutic strategies designed specifically for patients in QTP, we collected tumor and blood samples from 30 patients with primary gastric adenocarcinoma at Qinghai Provincial People's Hospital. We discuss the clinical and molecular characteristics of these patients that have been ascribed to the unique features of this region, including high altitude (average elevation around 4,000 m above sea level), a multi-ethnic population, and specific lifestyles and habits (such as excessive consumption of beef and mutton, and problematic alcohol and cigarette use). By comparison with Western gastric cancer patients from the TCGA data portal, several characteristics unique to patients in QTP are suggested. These include a high incidence in younger people, tumors located mostly in the gastric body, SNPs detected mostly on chromosome 7, and markedly different molecular atlases between ethnic minority groups and Han Chinese. These characteristics provide an unprecedented opportunity to improve the diagnosis and prognosis of gastric cancer in QTP. Furthermore, to suggest targeted therapeutics designed specifically for these 30 patients, we introduced an adapted kernel-based learning model and a compilation of pharmacogenomic data from 462 patient-derived tumor cells (PDCs) that represent the diverse genetic and molecular backgrounds of cancer patients.
In conclusion, our study offers an important opportunity to better understand the mechanism of gastric cancer in QTP and to guide optimal patient-tailored therapy for these patients.
Affiliation(s)
- Guanghong Rong
- Department of Gastroenterology, Qinghai Provincial People's Hospital, Xining, China
- Yongxia Zhang
- Department of Gynecology, Qinghai Provincial People's Hospital, Xining, China
- Yingcai Ma
- Department of Gastroenterology, Qinghai Provincial People's Hospital, Xining, China
- Shilong Chen
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China; Institute of Sanjiangyuan National Park, Chinese Academy of Sciences, Xining, China
- Yongcui Wang
- Institute of Sanjiangyuan National Park, Chinese Academy of Sciences, Xining, China; Qinghai Provincial Key Laboratory of Crop Molecular Breeding, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China