1
Laman Trip DS, van Oostrum M, Memon D, Frommelt F, Baptista D, Panneerselvam K, Bradley G, Licata L, Hermjakob H, Orchard S, Trynka G, McDonagh EM, Fossati A, Aebersold R, Gstaiger M, Wollscheid B, Beltrao P. A tissue-specific atlas of protein-protein associations enables prioritization of candidate disease genes. Nat Biotechnol 2025:10.1038/s41587-025-02659-z. [PMID: 40316700 DOI: 10.1038/s41587-025-02659-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2024] [Accepted: 03/28/2025] [Indexed: 05/04/2025]
Abstract
Despite progress in mapping protein-protein interactions, their tissue specificity is understudied. Here, given that protein coabundance is predictive of functional association, we compiled and analyzed protein abundance data of 7,811 proteomic samples from 11 human tissues to produce an atlas of tissue-specific protein associations. We find that this method recapitulates known protein complexes and the larger structural organization of the cell. Interactions of stable protein complexes are well preserved across tissues, while cell-type-specific cellular structures, such as synaptic components, are found to represent a substantial driver of differences between tissues. Over 25% of associations are tissue specific, of which <7% are because of differences in gene expression. We validate protein associations for the brain through cofractionation experiments in synaptosomes, curation of brain-derived pulldown data and AlphaFold2 modeling. We also construct a network of brain interactions for schizophrenia-related genes, indicating that our approach can functionally prioritize candidate disease genes in loci linked to brain disorders.
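The coabundance idea in this abstract can be illustrated with a minimal sketch. It assumes that an association score is simply the Pearson correlation of two proteins' abundances across samples; the abstract does not describe the authors' actual scoring pipeline, and the protein names and values below are hypothetical toy data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no coabundance signal
    return cov / sqrt(var_x * var_y)


def coabundance_scores(abundance):
    """Score every protein pair by correlation across samples.

    `abundance` maps a protein name to its abundance in each sample
    (same sample order for every protein).
    """
    return {
        (p, q): pearson(abundance[p], abundance[q])
        for p, q in combinations(sorted(abundance), 2)
    }


# Hypothetical toy data: PROT_A and PROT_B rise together, PROT_C does not.
toy = {
    "PROT_A": [1.0, 2.0, 3.0, 4.0],
    "PROT_B": [2.1, 3.9, 6.2, 8.0],
    "PROT_C": [4.0, 1.0, 3.5, 2.0],
}
scores = coabundance_scores(toy)
```

Computing such scores separately for the samples of each tissue would give one score table per tissue, which is the kind of object a tissue-specific comparison needs.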
Affiliation(s)
- Diederik S Laman Trip
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland.
- Marc van Oostrum
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
  - Department of Health Sciences and Technology, Institute of Translational Medicine, ETH Zurich, Zurich, Switzerland
  - Biozentrum, University of Basel, Basel, Switzerland
- Danish Memon
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
- Fabian Frommelt
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Delora Baptista
  - Gulbenkian Institute for Molecular Medicine, Oeiras, Portugal
- Kalpana Panneerselvam
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
- Glyn Bradley
  - Computational Biology, Functional Genomics, GSK, Stevenage, UK
- Luana Licata
  - Department of Biology, University of Rome Tor Vergata, Rome, Italy
- Henning Hermjakob
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
- Sandra Orchard
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
- Gosia Trynka
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Ellen M McDonagh
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Andrea Fossati
  - Science for Life Laboratory, Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Solna, Sweden
- Ruedi Aebersold
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Matthias Gstaiger
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Bernd Wollscheid
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
  - Department of Health Sciences and Technology, Institute of Translational Medicine, ETH Zurich, Zurich, Switzerland
- Pedro Beltrao
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland.
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK.
  - Open Targets, Wellcome Genome Campus, Cambridge, UK.
  - Gulbenkian Institute for Molecular Medicine, Oeiras, Portugal.
2
Karathia H, Hannenhalli S, Alves R. The Functional Comparison of Eukaryotic Proteomes: Implications for Choosing an Appropriate Model Organism to Probe Human Biology. Methods Mol Biol 2025; 2859:163-179. [PMID: 39436601 DOI: 10.1007/978-1-0716-4152-1_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2024]
Abstract
Phenotypic differences between species are, in significant part, determined by their proteomic diversity. The link between proteomic and phenotypic diversity can be best understood in the context of the various pathways and biological processes in which proteins participate. While the conservation pattern for individual proteins across species is expected to follow the phylogenetic relationships among the species, the diversity patterns of individual pathways may not: certain pathways may be much more conserved among distantly related species than two closely related species, owing to the ecological histories of the species. Thus, a pathway-centric analysis of proteome conservation and diversity has important implications for the appropriate choice of a model organism when investigating specific aspects of human biology. Exploiting the complete genome sequences and protein-coding gene annotations, here we perform a comprehensive gene-set-centric analysis of proteomic diversity between humans and 54 eukaryotic organisms, resulting in a catalog of organisms that are most similar to humans in terms of specific pathways, processes, expression patterns, and diseases. We corroborate our findings using species-specific mass spectrometry data. Our analysis provides a general framework to identify conserved and unique pathways in a group of organisms and a resource to prioritize appropriate model systems to study a specific biological system in a reference organism such as humans.
Affiliation(s)
- Hiren Karathia
  - Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research (FNLCR), Frederick, MD, USA.
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA.
- Sridhar Hannenhalli
  - Cancer Data Science Lab, CCR, National Cancer Institute (NIH), Bethesda, MD, USA
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
- Rui Alves
  - Ciencies Mediques Basiques, University of Lleida, Lleida, Catalonia, Spain
  - IRBLleida, Lleida, Spain
3
Mousavi Z, Arvanitis M, Duong T, Brody JA, Battle A, Sotoodehnia N, Shojaie A, Arking DE, Bader JS. Prioritization of causal genes from genome-wide association studies by Bayesian data integration across loci. PLoS Comput Biol 2025; 21:e1012725. [PMID: 39774334 PMCID: PMC11741684 DOI: 10.1371/journal.pcbi.1012725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Revised: 01/17/2025] [Accepted: 12/16/2024] [Indexed: 01/11/2025] Open
Abstract
MOTIVATION: Genome-wide association studies (GWAS) have identified genetic variants, usually single-nucleotide polymorphisms (SNPs), associated with human traits, including disease and disease risk. These variants (or causal variants in linkage disequilibrium with them) usually affect the regulation or function of a nearby gene. A GWAS locus can span many genes, however, and prioritizing which gene or genes in a locus are most likely to be causal remains a challenge. Better prioritization and prediction of causal genes could reveal disease mechanisms and suggest interventions. RESULTS: We describe a new Bayesian method, termed SigNet for significance networks, that combines information both within and across loci to identify the most likely causal gene at each locus. The SigNet method builds on existing methods that focus on individual loci with evidence from gene distance and expression quantitative trait loci (eQTL) by sharing information across loci using protein-protein and gene regulatory interaction network data. In an application to cardiac electrophysiology with 226 GWAS loci, only 46 (20%) have within-locus evidence from Mendelian genes, protein-coding changes, or colocalization with eQTL signals. At the remaining 180 loci lacking functional information, SigNet selects 56 genes other than the minimum distance gene, equal to 31% of the information-poor loci and 25% of the GWAS loci overall. Assessment by pathway enrichment demonstrates improved performance by SigNet. Review of individual loci shows literature evidence for genes selected by SigNet, including PMP22 as a novel causal gene candidate.
Affiliation(s)
- Zeinab Mousavi
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America
- Marios Arvanitis
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- ThuyVy Duong
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Jennifer A. Brody
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, Washington, United States of America
- Alexis Battle
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, Maryland, United States of America
- Nona Sotoodehnia
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, Washington, United States of America
- Ali Shojaie
  - Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Dan E. Arking
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Joel S. Bader
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
4
Zhang C, Li W, Deng M, Jiang Y, Cui X, Chen P. SIG: Graph-Based Cancer Subtype Stratification With Gene Mutation Structural Information. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:1752-1764. [PMID: 38875076 DOI: 10.1109/tcbb.2024.3414498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2024]
Abstract
Somatic tumors have a high-dimensional, sparse, and small sample size nature, making cancer subtype stratification based on somatic genomic data a challenge. Current methods for improving cancer clustering performance focus on dimension reduction, integrating multi-omics data, or generating realistic samples, yet ignore the associations between mutated genes within the patient-gene matrix. We refer to these associations as gene mutation structural information, which implicitly includes cancer subtype information and can enhance subtype clustering. We introduce a novel method for cancer subtype clustering called SIG (Structural Information within Graph). As cancer is driven by a combination of genes, we establish associations between mutated genes within the same patient sample, pair by pair, and use a graph to represent them. An association between two mutated genes corresponds to an edge in the graph. We then merge these associations among all mutated genes to obtain a structural information graph, which enriches the gene network and improves its relevance to cancer clustering. We integrate the somatic tumor genome with the enriched gene network and propagate it to cluster patients with mutations in similar network regions. Our method achieves superior clustering performance compared to state-of-the-art methods, as demonstrated by clustering experiments on ovarian and LUAD datasets.
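The pairwise graph construction described in this abstract can be sketched as follows. This is an illustration only: weighting each edge by the number of patients in which the two genes are co-mutated is an assumption (the abstract only says the associations are merged), and the function name and patient data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_mutation_graph(patient_mutations):
    """Build an undirected co-mutation graph from a patient -> genes mapping.

    Each pair of genes mutated in the same patient contributes one edge.
    Edges from all patients are merged, and the stored count is the number
    of patients supporting the edge (an assumed weighting scheme).
    """
    edges = Counter()
    for genes in patient_mutations.values():
        # Sort so that each undirected edge has a single canonical key.
        for gene_a, gene_b in combinations(sorted(set(genes)), 2):
            edges[(gene_a, gene_b)] += 1
    return edges


# Hypothetical toy cohort.
cohort = {
    "P1": ["TP53", "BRCA1", "KRAS"],
    "P2": ["TP53", "KRAS"],
    "P3": ["EGFR"],
}
graph = co_mutation_graph(cohort)
```

A patient with a single mutated gene, such as P3 here, contributes no edge, which is one reason sparse mutation profiles are hard to cluster on their own.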
5
Hernández-Lemus E, Ochoa S. Methods for multi-omic data integration in cancer research. Front Genet 2024; 15:1425456. [PMID: 39364009 PMCID: PMC11446849 DOI: 10.3389/fgene.2024.1425456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Accepted: 08/28/2024] [Indexed: 10/05/2024] Open
Abstract
Multi-omics data integration is a term that refers to the process of combining and analyzing data from different omic experimental sources, such as genomics, transcriptomics, methylation assays, and microRNA sequencing, among others. Such data integration approaches have the potential to provide a more comprehensive functional understanding of biological systems and have numerous applications in areas such as disease diagnosis, prognosis and therapy. However, quantitative integration of multi-omic data is a complex task that requires the use of highly specialized methods and approaches. Here, we discuss a number of data integration methods that have been developed with multi-omics data in view, including statistical methods, machine learning approaches, and network-based approaches. We also discuss the challenges and limitations of such methods and provide examples of their applications in the literature. Overall, this review aims to provide an overview of the current state of the field and highlight potential directions for future research.
Affiliation(s)
- Enrique Hernández-Lemus
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Soledad Ochoa
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Department of Obstetrics and Gynecology, Cedars-Sinai Medical Center, Los Angeles, CA, United States
6
Wang J, Hancock ER. The Ihara zeta function as a partition function for network structure characterisation. Sci Rep 2024; 14:18386. [PMID: 39117698 PMCID: PMC11310400 DOI: 10.1038/s41598-024-68882-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Accepted: 07/29/2024] [Indexed: 08/10/2024] Open
Abstract
Statistical characterizations of complex network structures can be obtained from both the Ihara Zeta function (in terms of prime cycle frequencies) and the partition function from statistical mechanics. However, these two representations are usually regarded as separate tools for network analysis, without exploiting the potential synergies between them. In this paper, we establish a link between the Ihara Zeta function from algebraic graph theory and the partition function from statistical mechanics, and exploit this relationship to obtain a deeper structural characterisation of network structure. Specifically, the relationship allows us to explore the connection between the microscopic structure and the macroscopic characterisation of a network. We derive thermodynamic quantities describing the network, such as entropy, and show how these are related to the frequencies of prime cycles of various lengths. In particular, the n-th order partial derivative of the Ihara Zeta function can be used to compute the number of prime cycles in a network, which in turn is related to the partition function of Bose-Einstein statistics. The corresponding derived entropy allows us to explore a phase transition in the network structure with critical points at high and low-temperature limits. Numerical experiments and empirical data are presented to evaluate the qualitative and quantitative performance of the resulting structural network characterisations.
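For orientation, the standard textbook definitions behind this abstract are summarised below. These are general facts about the Ihara zeta function of a finite connected graph, not the specific derivation of the paper; the symbols (A for the adjacency matrix, D for the diagonal degree matrix, L(p) for the length of a prime cycle p, N_m for the number of closed walks of length m without backtracking or tails) are notation chosen here.

```latex
% Euler product over equivalence classes [p] of prime cycles
\zeta_G(u) = \prod_{[p]} \left(1 - u^{L(p)}\right)^{-1}

% Ihara-Bass determinant formula for a graph with |V| vertices and |E| edges
\zeta_G(u)^{-1} = \left(1 - u^{2}\right)^{|E| - |V|} \det\!\left(I - uA + u^{2}(D - I)\right)

% Logarithmic derivative as a generating function for cycle counts
u \frac{d}{du} \log \zeta_G(u) = \sum_{m \ge 1} N_m \, u^{m}
```

The third identity is what makes derivatives of the zeta function a route to cycle frequencies: the coefficients of the series count closed non-backtracking walks by length.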
Affiliation(s)
- Jianjia Wang
  - School of AI and Advanced Computing, Xi'an Jiaotong-Liverpool University, Suzhou, 215412, China.
- Edwin R Hancock
  - Department of Computer Science, University of York, York, YO10 5GH, UK
7
Chen J, Zhang CH, Tao T, Zhang X, Lin Y, Wang FB, Liu HF, Liu J. A-to-I RNA co-editing predicts clinical outcomes and is associated with immune cells infiltration in hepatocellular carcinoma. Commun Biol 2024; 7:838. [PMID: 38982182 PMCID: PMC11233613 DOI: 10.1038/s42003-024-06520-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 06/28/2024] [Indexed: 07/11/2024] Open
Abstract
Aberrant RNA editing has emerged as a pivotal factor in the pathogenesis of hepatocellular carcinoma (HCC), but the impact of RNA co-editing within HCC remains underexplored. We used a multi-step algorithm to construct an RNA co-editing network in HCC, and found that HCC-related RNA editings are predominantly centralized within the network. Furthermore, five pairs of risk RNA co-editing events were significantly correlated with the overall survival in HCC. Based on the presence of risk RNA co-editings, HCC patients were categorized into high-risk and low-risk groups. Disparities in immune cell infiltrations were observed between the two groups, with the high-risk group exhibiting a greater abundance of exhausted T cells. Additionally, seven genes associated with risk RNA co-editing pairs were identified, whose expression effectively differentiates HCC tumor samples from normal ones. Our research offers an innovative perspective on the etiology and potential therapeutics for HCC.
Affiliation(s)
- Juan Chen
  - School of Food and Biological Engineering, Hefei University of Technology, Hefei, 230009, China
- Cheng-Hui Zhang
  - School of Food and Biological Engineering, Hefei University of Technology, Hefei, 230009, China
- Tao Tao
  - School of Food and Biological Engineering, Hefei University of Technology, Hefei, 230009, China
- Xian Zhang
  - School of Food and Biological Engineering, Hefei University of Technology, Hefei, 230009, China
- Yan Lin
  - School of Food and Biological Engineering, Hefei University of Technology, Hefei, 230009, China
- Fang-Bin Wang
  - School of Food and Biological Engineering, Hefei University of Technology, Hefei, 230009, China
- Hui-Fang Liu
  - Department of Endocrinology, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430014, Hubei, China.
- Jian Liu
  - School of Food and Biological Engineering, Hefei University of Technology, Hefei, 230009, China.
  - Engineering Research Center of Bio-process, Ministry of Education, Hefei University of Technology, Hefei, 230009, China.
8
Lu D, Li J, Zheng C, Liu J, Zhang Q. HGTMDA: A Hypergraph Learning Approach with Improved GCN-Transformer for miRNA-Disease Association Prediction. Bioengineering (Basel) 2024; 11:680. [PMID: 39061762 PMCID: PMC11273495 DOI: 10.3390/bioengineering11070680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 06/14/2024] [Accepted: 07/02/2024] [Indexed: 07/28/2024] Open
Abstract
Accumulating scientific evidence highlights the pivotal role of miRNA-disease association research in elucidating disease pathogenesis and developing innovative diagnostics. Consequently, accurately identifying disease-associated miRNAs has emerged as a prominent research topic in bioinformatics. Advances in graph neural networks (GNNs) have catalyzed methodological breakthroughs in this field. However, existing methods are often plagued by data noise and struggle to effectively integrate local and global information, which hinders their predictive performance. To address this, we introduce HGTMDA, an innovative hypergraph learning framework that incorporates random walk with restart-based association masking and an enhanced GCN-Transformer model to infer miRNA-disease associations. HGTMDA starts by constructing multiple homogeneous similarity networks. A novel enhancement of our approach is the introduction of a restart-based random walk association masking strategy. By stochastically masking a subset of association data and integrating it with a GCN enhanced by an attention mechanism, this strategy enables better capture of key information, leading to improved information utilization and reduced impact of noisy data. Next, we build an miRNA-disease heterogeneous hypergraph and adopt an improved GCN-Transformer encoder to extract local and global information effectively. Lastly, we utilize a combined Dice cross-entropy (DCE) loss function to guide the model training and optimize its performance. To evaluate the performance of HGTMDA, comprehensive comparisons were conducted with state-of-the-art methods. Additionally, in-depth case studies on lung cancer and colorectal cancer were performed. The results demonstrate HGTMDA's outstanding performance across various metrics and its exceptional effectiveness in real-world application scenarios, highlighting the advantages and value of this method.
Affiliation(s)
- Daying Lu
  - School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China
9
Evans LW, Durbin-Johnson B, Sutton KJ, Yam P, Bouzid YY, Cervantes E, Bonnel E, Stephenson CB, Bennett BJ. Specific circulating miRNAs are associated with plasma lipids in a healthy American cohort. Physiol Genomics 2024; 56:492-505. [PMID: 38557280 PMCID: PMC11368566 DOI: 10.1152/physiolgenomics.00087.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 02/20/2024] [Accepted: 03/27/2024] [Indexed: 04/04/2024] Open
Abstract
Low-density lipoprotein cholesterol (LDL-c) is both a therapeutic target and a risk factor for cardiovascular disease (CVD). MicroRNA (miRNA) has been shown to regulate cholesterol homeostasis, and miRNA in blood circulation has been linked to hypercholesterolemia. However, few studies to date have associated miRNA with phenotypes like LDL-c in a healthy population. To this end, we analyzed circulating miRNA in relation to LDL-c in a healthy cohort of 353 participants using two separate bioinformatic approaches. The first approach found that miR-15b-5p and miR-16-5p were upregulated in individuals with at-risk levels of LDL-c. The second approach identified two miRNA clusters, one that positively and a second that negatively correlated with LDL-c. Included in the cluster that positively correlated with LDL-c were miR-15b-5p and miR-16-5p, as well as other miRNA from the miR-15/107, miR-30, and let-7 families. Cross-species analyses suggested that several miRNAs that associated with LDL-c are conserved between mice and humans. Finally, we examined the influence of diet on circulating miRNA. Our results robustly linked circulating miRNA with LDL-c, suggesting that miRNA could be used as biomarkers for hypercholesterolemia or targets for developing cholesterol-lowering drugs. NEW & NOTEWORTHY: This study explored the association between circulating microRNA (miRNA) and low-density lipoprotein cholesterol (LDL-c) in a healthy population of 353 participants. Two miRNAs, miR-15b-5p and miR-16-5p, were upregulated in individuals with at-risk LDL-c levels. Several miRNA clusters were positively and negatively correlated with LDL-c and are known to target mRNA involved in lipid metabolism. The study also investigated the influence of diet on circulating miRNA, suggesting potential biomarkers for hypercholesterolemia.
Affiliation(s)
- Levi W Evans
  - USDA-ARS-Western Human Nutrition Research Center, Davis, California, United States
- Blythe Durbin-Johnson
  - Division of Biostatistics, University of California, Davis, California, United States
- Kristen J Sutton
  - Department of Nutrition, University of California, Davis, California, United States
- Phoebe Yam
  - Department of Nutrition, University of California, Davis, California, United States
- Yasmine Y Bouzid
  - Department of Nutrition, University of California, Davis, California, United States
- Eduardo Cervantes
  - Department of Nutrition, University of California, Davis, California, United States
- Ellen Bonnel
  - Department of Nutrition, University of California, Davis, California, United States
- Charles B Stephenson
  - USDA-ARS-Western Human Nutrition Research Center, Davis, California, United States
  - Department of Nutrition, University of California, Davis, California, United States
- Brian J Bennett
  - USDA-ARS-Western Human Nutrition Research Center, Davis, California, United States
  - Department of Nutrition, University of California, Davis, California, United States
10
Xie J, Rao J, Xie J, Zhao H, Yang Y. Predicting disease-gene associations through self-supervised mutual infomax graph convolution network. Comput Biol Med 2024; 170:108048. [PMID: 38310804 DOI: 10.1016/j.compbiomed.2024.108048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 12/19/2023] [Accepted: 01/26/2024] [Indexed: 02/06/2024]
Abstract
Illuminating associations between diseases and genes can help reveal the pathogenesis of syndromes and contribute to treatments, but a large number of associations remain unexplored. To identify novel disease-gene associations, many computational methods have been developed using disease and gene-related prior knowledge. However, these methods remain of relatively inferior performance due to the limited external data sources and the inevitable noise among the prior knowledge. In this study, we have developed a new method, Self-Supervised Mutual Infomax Graph Convolution Network (MiGCN), to predict disease-gene associations under the guidance of external disease-disease and gene-gene collaborative graphs. The noise within the collaborative graphs was eliminated by maximizing the mutual information between nodes and neighbors through a graphical mutual infomax layer. In parallel, the node interactions were strengthened by a novel informative message passing layer to improve the learning ability of the graph neural network. Extensive experiments showed that our model achieved a performance improvement over the state-of-the-art method of more than 8% on AUC. The datasets, source codes and trained models of MiGCN are available at https://github.com/biomed-AI/MiGCN.
Affiliation(s)
- Jiancong Xie
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510000, China
- Jiahua Rao
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510000, China
- Junjie Xie
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510000, China
- Huiying Zhao
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510000, China.
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510000, China.
11
Chang Z, Zhu R, Liu J, Shang J, Dai L. HGSMDA: miRNA-Disease Association Prediction Based on HyperGCN and Sørensen-Dice Loss. Noncoding RNA 2024; 10:9. [PMID: 38392964 PMCID: PMC10893088 DOI: 10.3390/ncrna10010009] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/15/2023] [Revised: 01/19/2024] [Accepted: 01/24/2024] [Indexed: 02/25/2024] Open
Abstract
Biological research has demonstrated the significance of identifying miRNA-disease associations in the context of disease prevention, diagnosis, and treatment. However, the utilization of experimental approaches involving biological subjects to infer these associations is both costly and inefficient. Consequently, there is a pressing need to devise novel approaches that offer enhanced accuracy and effectiveness. Presently, the predominant methods employed for predicting disease associations rely on Graph Convolutional Network (GCN) techniques. However, the Graph Convolutional Network algorithm, which is locally aggregated, solely incorporates information from the immediate neighboring nodes of a given node at each layer. Consequently, GCN cannot simultaneously aggregate information from multiple nodes. This constraint significantly impacts the predictive efficacy of the model. To tackle this problem, we propose a novel approach, based on HyperGCN and Sørensen-Dice loss (HGSMDA), for predicting associations between miRNAs and diseases. In the initial phase, we developed multiple networks to represent the similarity between miRNAs and diseases and employed GCNs to extract information from diverse perspectives. Subsequently, we draw on HyperGCN to construct a miRNA-disease heteromorphic hypergraph using hypernodes and train GCN on the graph to aggregate information. Finally, we utilized the Sørensen-Dice loss function to evaluate the degree of similarity between the predicted outcomes and the ground truth values, thereby enabling the prediction of associations between miRNAs and diseases. In order to assess the soundness of our methodology, an extensive series of experiments was conducted employing the Human MicroRNA Disease Database (HMDD v3.2) as the dataset. The experimental outcomes unequivocally indicate that HGSMDA exhibits remarkable efficacy when compared to alternative methodologies.
Furthermore, the predictive capacity of HGSMDA was corroborated through a case study focused on colon cancer. These findings strongly imply that HGSMDA represents a dependable and valid framework, thereby offering a novel avenue for investigating the intricate association between miRNAs and diseases.
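As a reminder of what a Sørensen-Dice loss measures, here is a minimal sketch of the common "soft" formulation, one minus the Dice coefficient computed on predicted probabilities. The smoothing constant `eps` and the function name are assumptions for illustration; the abstract does not give the exact variant used in HGSMDA.

```python
def soft_dice_loss(predicted, target, eps=1e-7):
    """Return 1 - Dice coefficient between predictions and binary labels.

    `predicted` holds probabilities in [0, 1], `target` holds 0/1 labels.
    `eps` (an assumed smoothing term) avoids division by zero when both
    vectors are all zeros.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    intersection = sum(p * t for p, t in zip(predicted, target))
    total = sum(predicted) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


# Perfect agreement gives a loss near 0, complete disagreement a loss near 1.
perfect = soft_dice_loss([1.0, 0.0, 1.0], [1, 0, 1])
disjoint = soft_dice_loss([0.0, 1.0, 0.0], [1, 0, 1])
```

Because the loss is driven by the overlap with the positive labels, it is less dominated by the many unobserved (negative) pairs than a plain per-entry loss, which is a common motivation for using it on sparse association matrices.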
Affiliation(s)
- Rong Zhu
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China; (Z.C.); (J.L.); (J.S.); (L.D.)
12
Visonà G, Bouzigon E, Demenais F, Schweikert G. Network propagation for GWAS analysis: a practical guide to leveraging molecular networks for disease gene discovery. Brief Bioinform 2024; 25:bbae014. [PMID: 38340090 PMCID: PMC10858647 DOI: 10.1093/bib/bbae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 12/28/2023] [Accepted: 01/08/2024] [Indexed: 02/12/2024] Open
Abstract
MOTIVATION: Genome-wide association studies (GWAS) have enabled large-scale analysis of the role of genetic variants in human disease. Despite impressive methodological advances, subsequent clinical interpretation and application remain challenging when GWAS suffer from a lack of statistical power. In recent years, however, the use of information diffusion algorithms with molecular networks has led to fruitful insights on disease genes. RESULTS: We present an overview of the design choices and pitfalls that prove crucial in the application of network propagation methods to GWAS summary statistics. We highlight general trends from the literature, and present benchmark experiments to expand on these insights, selecting three diseases and five molecular networks as case studies. We verify that the use of gene-level scores based on GWAS P-values offers advantages over the selection of a set of 'seed' disease genes not weighted by the associated P-values if the GWAS summary statistics are of sufficient quality. Beyond that, the size and the density of the networks prove to be important factors for consideration. Finally, we explore several ensemble methods and show that combining multiple networks may improve the network propagation approach.
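To make the idea of propagating gene-level scores over a molecular network concrete, here is a minimal sketch of one common diffusion scheme, random walk with restart, in NumPy. It is an illustration under stated assumptions rather than the benchmark code from the paper: the function name `propagate`, the restart probability of 0.5, the toy four-gene path network and the use of -log10(P) as seed weights are all choices made for this example.

```python
import numpy as np

def propagate(adjacency, seed_scores, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected network.

    adjacency   : symmetric (n x n) matrix of edge weights
    seed_scores : non-negative gene-level scores (e.g. -log10 of GWAS P-values)
    restart     : probability of jumping back to the seed distribution
    """
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=0)
    degree[degree == 0] = 1.0          # isolated nodes: avoid division by zero
    W = A / degree                     # column-normalised transition matrix
    p0 = np.asarray(seed_scores, dtype=float)
    p0 = p0 / p0.sum()                 # seed distribution sums to 1
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical path network: gene0 - gene1 - gene2 - gene3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
p_values = np.array([1e-8, 0.5, 0.5, 0.5])   # made-up gene-level P-values
weighted = propagate(A, -np.log10(p_values))  # seeds weighted by P-value
binary = propagate(A, [1, 0, 0, 0])           # unweighted single seed gene
```

The `weighted` call corresponds to seeding with P-value-based gene scores, while the `binary` call corresponds to an unweighted seed set; comparing the two on real data is the kind of design choice the abstract discusses.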
Affiliation(s)
- Giovanni Visonà
- Empirical Inference, Max-Planck Institute for Intelligent Systems, Tübingen 72076, Germany
|
13
|
Zhang Y, Chu Y, Lin S, Xiong Y, Wei DQ. ReHoGCNES-MDA: prediction of miRNA-disease associations using homogenous graph convolutional networks based on regular graph with random edge sampler. Brief Bioinform 2024; 25:bbae103. [PMID: 38517693 PMCID: PMC10959163 DOI: 10.1093/bib/bbae103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 02/04/2024] [Accepted: 02/23/2024] [Indexed: 03/24/2024] Open
Abstract
Numerous investigations increasingly indicate the significance of microRNA (miRNA) in human diseases. Hence, unearthing associations between miRNA and diseases can contribute to precise diagnosis and efficacious remediation of medical conditions. The detection of miRNA-disease linkages via computational techniques utilizing biological information has emerged as a cost-effective and highly efficient approach. Here, we introduced a computational framework named ReHoGCNES, designed for prospective miRNA-disease association prediction (ReHoGCNES-MDA). This method constructs a homogenous graph convolutional network with regular graph structure (ReHoGCN) encompassing a disease similarity network, a miRNA similarity network and the known MDA network, and was then tested on four experimental tasks. A random edge sampler strategy was utilized to expedite processes and diminish training complexity. Experimental results demonstrate that the proposed ReHoGCNES-MDA method outperforms both the homogenous graph convolutional network and the heterogeneous graph convolutional network with non-regular graph structure in all four tasks, which implicitly reveals that a steady degree distribution of a graph does play an important role in the enhancement of model performance. Besides, ReHoGCNES-MDA is superior to several machine learning algorithms and state-of-the-art methods on MDA prediction. Furthermore, three case studies were conducted to further demonstrate the predictive ability of ReHoGCNES. Consequently, 93.3% (breast neoplasms), 90% (prostate neoplasms) and 93.3% (prostate neoplasms) of the top 30 forecasted miRNAs were validated by public databases. Hence, ReHoGCNES-MDA might serve as a dependable and beneficial model for predicting possible MDAs.
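The abstract does not describe how the random edge sampler works internally, so the following is only a generic sketch of the idea: at each training step, a uniformly random subset of edges is kept, which shrinks the graph the network has to process. The function name `sample_edges`, the `fraction` parameter and the 2 x E edge-list layout are assumptions for this illustration.

```python
import numpy as np

def sample_edges(edge_index, fraction, rng):
    """Keep a uniformly random subset of edges.

    edge_index : integer array of shape (2, E); column k is edge (source, target)
    fraction   : share of edges to keep, in (0, 1]
    rng        : numpy random Generator, passed in for reproducibility
    """
    edge_index = np.asarray(edge_index)
    n_edges = edge_index.shape[1]
    n_keep = max(1, int(round(fraction * n_edges)))
    keep = rng.choice(n_edges, size=n_keep, replace=False)
    return edge_index[:, np.sort(keep)]  # preserve the original edge order

# Hypothetical toy graph with five edges; keep 60% of them for one epoch.
edges = np.array([[0, 0, 1, 2, 3],
                  [1, 2, 2, 3, 4]])
rng = np.random.default_rng(0)
subgraph_edges = sample_edges(edges, 0.6, rng)
```

Drawing a fresh subset each epoch lets every edge contribute over the course of training while keeping the cost of each step lower than using the full edge set.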
Affiliation(s)
- Yufang Zhang
- School of Mathematical Sciences and SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai 200240, China
- Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China
- Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, Henan, 473006, China
| | - Yanyi Chu
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Shenggeng Lin
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
| | - Dong-Qing Wei
- Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China
- Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, Henan, 473006, China
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
|
14
|
Dobbs Spendlove M, Gibson TM, McCain S, Stone BC, Gill T, Pickett BE. Pathway2Targets: an open-source pathway-based approach to repurpose therapeutic drugs and prioritize human targets. PeerJ 2023; 11:e16088. [PMID: 37790614 PMCID: PMC10544355 DOI: 10.7717/peerj.16088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 08/22/2023] [Indexed: 10/05/2023] Open
Abstract
Background: Recent efforts to repurpose existing drugs to different indications have been accompanied by a number of computational methods, which incorporate protein-protein interaction networks and signaling pathways, to aid with prioritizing existing targets and/or drugs. However, many of these existing methods are focused on integrating additional data that are only available for a small subset of diseases or conditions. Methods: We have designed and implemented a new R-based open-source target prioritization and repurposing method that integrates both canonical intracellular signaling information from five public pathway databases and target information from public sources including OpenTargets.org. The Pathway2Targets algorithm takes a list of significant pathways as input, then retrieves and integrates public data for all targets within those pathways for a given condition. It also incorporates a weighting scheme that is customizable by the user to support a variety of use cases including target prioritization, drug repurposing, and identifying novel targets that are biologically relevant for a different indication. Results: As a proof of concept, we applied this algorithm to a public colorectal cancer RNA-sequencing dataset with 144 case and control samples. Our analysis identified 430 targets and ~700 unique drugs based on differential gene expression and signaling pathway enrichment. We found that our highest-ranked predicted targets were significantly enriched in targets with FDA-approved therapeutics for colorectal cancer (p-value < 0.025) that included EGFR, VEGFA, and PTGS2. Interestingly, there was no statistically significant enrichment of targets for other cancers in this same list suggesting high specificity of the results. We also adjusted the weighting scheme to prioritize more novel targets for CRC.
This second analysis revealed epidermal growth factor receptor (EGFR), phosphoinositide-3-kinase (PI3K), and two mitogen-activated protein kinases (MAPK14 and MAPK3). These observations suggest that our open-source method with a customizable weighting scheme can accurately prioritize targets that are specific and relevant to the disease or condition of interest, as well as targets that are at earlier stages of development. We anticipate that this method will complement other approaches to repurpose drugs for a variety of indications, which can contribute to the improvement of the quality of life and overall health of such patients.
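The customizable weighting scheme described in this abstract can be pictured as a weighted sum over per-target evidence features, where changing the weights changes the ranking. The sketch below is written in Python purely for illustration (the published tool is R-based), and every name in it is hypothetical: the feature keys `n_pathways` and `approved_drugs`, the placeholder targets `TARGET_A` and `TARGET_B`, and the weight values are not taken from Pathway2Targets.

```python
def rank_targets(targets, weights):
    """Rank targets by a weighted sum of their evidence features.

    targets : dict mapping target name -> dict of feature name -> value
    weights : dict mapping feature name -> weight (missing features count as 0)
    """
    scored = []
    for name, features in targets.items():
        score = sum(weights.get(key, 0.0) * value
                    for key, value in features.items())
        scored.append((name, score))
    # Highest score first.
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical evidence table for two placeholder targets.
targets = {
    "TARGET_A": {"n_pathways": 3, "approved_drugs": 2},
    "TARGET_B": {"n_pathways": 5, "approved_drugs": 0},
}

# Reward existing drugs when looking for repurposing candidates ...
repurposing = rank_targets(targets, {"n_pathways": 1.0, "approved_drugs": 2.0})
# ... and penalise them when looking for more novel targets.
novelty = rank_targets(targets, {"n_pathways": 1.0, "approved_drugs": -2.0})
```

With these made-up numbers the two weightings order the targets differently, which mirrors how the abstract describes re-running the analysis with adjusted weights to surface targets at earlier stages of development.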
Affiliation(s)
- Mauri Dobbs Spendlove
- Microbiology and Molecular Biology, Brigham Young University, Provo, UT, United States of America
| | - Trenton M. Gibson
- Microbiology and Molecular Biology, Brigham Young University, Provo, UT, United States of America
| | - Shaney McCain
- Microbiology and Molecular Biology, Brigham Young University, Provo, UT, United States of America
| | - Benjamin C. Stone
- Microbiology and Molecular Biology, Brigham Young University, Provo, UT, United States of America
| | | | - Brett E. Pickett
- Microbiology and Molecular Biology, Brigham Young University, Provo, UT, United States of America
|
15
|
Woicik A, Zhang M, Xu H, Mostafavi S, Wang S. Gemini: memory-efficient integration of hundreds of gene networks with high-order pooling. Bioinformatics 2023; 39:i504-i512. [PMID: 37387142 DOI: 10.1093/bioinformatics/btad247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION: The exponential growth of genomic sequencing data has created ever-expanding repositories of gene networks. Unsupervised network integration methods are critical to learn informative representations for each gene, which are later used as features for downstream applications. However, these network integration methods must be scalable to account for the increasing number of networks and robust to an uneven distribution of network types within hundreds of gene networks. RESULTS: To address these needs, we present Gemini, a novel network integration method that uses memory-efficient high-order pooling to represent and weight each network according to its uniqueness. Gemini then mitigates the uneven network distribution through mixing up existing networks to create many new networks. We find that Gemini leads to more than a 10% improvement in F1 score, 15% improvement in micro-AUPRC, and 63% improvement in macro-AUPRC for human protein function prediction by integrating hundreds of networks from BioGRID, and that Gemini's performance significantly improves when more networks are added to the input network collection, while Mashup and BIONIC embeddings' performance deteriorates. Gemini thereby enables memory-efficient and informative network integration for large gene networks and can be used to massively integrate and analyze networks in other domains. AVAILABILITY AND IMPLEMENTATION: Gemini can be accessed at: https://github.com/MinxZ/Gemini.
Affiliation(s)
- Addie Woicik
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
| | - Mingxin Zhang
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
| | - Hanwen Xu
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
| | - Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
| | - Sheng Wang
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
|
16
|
Szebényi K, Barrio-Hernandez I, Gibbons GM, Biasetti L, Troakes C, Beltrao P, Lakatos A. A human proteogenomic-cellular framework identifies KIF5A as a modulator of astrocyte process integrity with relevance to ALS. Commun Biol 2023; 6:678. [PMID: 37386082 PMCID: PMC10310856 DOI: 10.1038/s42003-023-05041-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2022] [Accepted: 06/13/2023] [Indexed: 07/01/2023] Open
Abstract
Genome-wide association studies identified several disease-causing mutations in neurodegenerative diseases, including amyotrophic lateral sclerosis (ALS). However, the contribution of genetic variants to pathway disturbances and their cell type-specific variations, especially in glia, is poorly understood. We integrated ALS GWAS-linked gene networks with human astrocyte-specific multi-omics datasets to elucidate pathognomonic signatures. This integration predicts that KIF5A, a motor protein kinesin-1 heavy-chain isoform, previously detected only in neurons, can also potentiate disease pathways in astrocytes. Using postmortem tissue and super-resolution structured illumination microscopy in cell-based perturbation platforms, we provide evidence that KIF5A is present in astrocyte processes and its deficiency disrupts structural integrity and mitochondrial transport. We show that this may underlie cytoskeletal and trafficking changes in SOD1 ALS astrocytes characterised by low KIF5A levels, which can be rescued by c-Jun N-terminal Kinase-1 (JNK1), a kinesin transport regulator. Altogether, our pipeline reveals a mechanism controlling astrocyte process integrity, a prerequisite for synapse maintenance, and suggests a targetable loss-of-function in ALS.
Affiliation(s)
- Kornélia Szebényi
- John van Geest Centre for Brain Repair, Department of Clinical Neurosciences, University of Cambridge, Cambridge Biomedical Campus, Cambridge, CB2 0PY, UK
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, 1117, Hungary
| | | | - George M Gibbons
- John van Geest Centre for Brain Repair, Department of Clinical Neurosciences, University of Cambridge, Cambridge Biomedical Campus, Cambridge, CB2 0PY, UK
| | - Luca Biasetti
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 8AF, UK
| | - Claire Troakes
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 8AF, UK
| | - Pedro Beltrao
- European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK.
- Institute of Molecular Systems Biology, ETH Zürich, Zürich, 8093, Switzerland.
| | - András Lakatos
- John van Geest Centre for Brain Repair, Department of Clinical Neurosciences, University of Cambridge, Cambridge Biomedical Campus, Cambridge, CB2 0PY, UK.
- Wellcome Trust-MRC Cambridge Stem Cell Institute, Cambridge Biomedical Campus, Cambridge, CB2 0AW, UK.
|
17
|
Zhao S, Li H, Jing X, Zhang X, Li R, Li Y, Liu C, Chen J, Li G, Zheng W, Li Q, Wang X, Wang L, Sun Y, Xu Y, Wang S. Identifying subgroups of patients with type 2 diabetes based on real-world traditional Chinese medicine electronic medical records. Front Pharmacol 2023; 14:1210667. [PMID: 37456755 PMCID: PMC10339739 DOI: 10.3389/fphar.2023.1210667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2023] [Accepted: 06/15/2023] [Indexed: 07/18/2023] Open
Abstract
Introduction: Type 2 diabetes (T2D) is a multifactorial complex chronic disease with a high prevalence worldwide, and Type 2 diabetes patients with different comorbidities often present multiple phenotypes in the clinic. Thus, there is a pressing need to improve understanding of the complexity of the clinical Type 2 diabetes population to help identify more accurate disease subtypes for personalized treatment. Methods: Here, utilizing the traditional Chinese medicine (TCM) clinical electronic medical records (EMRs) of 2137 Type 2 diabetes inpatients, we followed a heterogeneous medical record network (HEMnet) framework to construct heterogeneous medical record networks by integrating the clinical features from the electronic medical records, molecular interaction networks and domain knowledge. Results: Of the 2137 Type 2 diabetes patients, 1347 were male (63.03%), and 790 were female (36.97%). Using the HEMnet method, we obtained eight non-overlapping patient subgroups. For example, in H3, Poria, Astragali Radix, Glycyrrhizae Radix et Rhizoma, Cinnamomi Ramulus, and Liriopes Radix were identified as significant botanical drugs. Cardiovascular diseases (CVDs) were found to be significant comorbidities. Furthermore, enrichment analysis showed that there were six overlapping pathways and eight overlapping Gene Ontology terms among the herbs, comorbidities, and Type 2 diabetes in H3. Discussion: Our results demonstrate that identification of the Type 2 diabetes subgroup based on the HEMnet method can provide important guidance for the clinical use of herbal prescriptions and that this method can be used for other complex diseases.
Affiliation(s)
- Shuai Zhao
- Department of Endocrinology, Second Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Hengfei Li
- Department of Infectious Diseases, Hubei Provincial Hospital of Traditional Chinese Medicine (Affiliated Hospital of Hubei University of Chinese Medicine, Hubei Province Academy of Traditional Chinese Medicine), Wuhan, China
| | - Xuan Jing
- Hebei Provincial Hospital of Traditional Chinese Medicine, Shijiazhuang, China
| | - Xuebin Zhang
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
| | - Ronghua Li
- Department of Endocrinology, Second Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Yinghao Li
- Institute of Traditional Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Chenguang Liu
- Department of Endocrinology, Second Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Jie Chen
- Department of Endocrinology, Second Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Guoxia Li
- Department of Endocrinology, Second Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Wenfei Zheng
- Department of Endocrinology, Second Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Qian Li
- Department of Nursing, Second Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Xue Wang
- Department of Endocrinology, Second Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Letian Wang
- Institute of Traditional Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Yuanyuan Sun
- Department of Obstetrics and Gynecology, Weifang Fangzi District People’s Hospital, Weifang, China
| | - Yunsheng Xu
- Department of Endocrinology, Second Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Shihua Wang
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
|
18
|
García-Cárdenas JM, Armendáriz-Castillo I, García-Cárdenas N, Pesantez-Coronel D, López-Cortés A, Indacochea A, Guerrero S. Data mining identifies novel RNA-binding proteins involved in colon and rectal carcinomas. Front Cell Dev Biol 2023; 11:1088057. [PMID: 37384253 PMCID: PMC10293682 DOI: 10.3389/fcell.2023.1088057] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/03/2022] [Accepted: 02/13/2023] [Indexed: 06/30/2023] Open
Abstract
Colorectal adenocarcinoma (COREAD) is the second most deadly cancer and third most frequently encountered malignancy worldwide. Despite efforts in molecular subtyping and subsequent personalized COREAD treatments, multidisciplinary evidence suggests separating COREAD into colon cancer (COAD) and rectal cancer (READ). This new perspective could improve diagnosis and treatment of both carcinomas. RNA-binding proteins (RBPs), as critical regulators of every hallmark of cancer, could fulfill the need to identify sensitive biomarkers for COAD and READ separately. To detect new RBPs involved in COAD and READ progression, here we used a multidata integration strategy to prioritize tumorigenic RBPs. We analyzed and integrated 1) RBP genomic and transcriptomic alterations from 488 COAD and 155 READ patients, 2) ∼10,000 raw associations between RBPs and cancer genes, 3) ∼15,000 immunostainings, and 4) loss-of-function screens performed in 102 COREAD cell lines. Thus, we unraveled new putative roles of NOP56, RBM12, NAT10, FKBP1A, EMG1, and CSE1L in COAD and READ progression. Interestingly, FKBP1A and EMG1 have never been linked to either of these carcinomas but presented tumorigenic features in other cancer types. Subsequent survival analyses highlighted the clinical relevance of FKBP1A, NOP56, and NAT10 mRNA expression to predict poor prognosis in COREAD and COAD patients. Further research should be performed to validate their clinical potential and to elucidate their molecular mechanisms in these malignancies.
Affiliation(s)
- Jennyfer M. García-Cárdenas
- Laboratorio de Ciencia de Datos Biomédicos, Escuela de Medicina, Facultad de Ciencias Médicas de la Salud y de la Vida, Universidad Internacional del Ecuador, Quito, Ecuador
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Madrid, Spain
| | - Isaac Armendáriz-Castillo
- Laboratorio de Ciencia de Datos Biomédicos, Escuela de Medicina, Facultad de Ciencias Médicas de la Salud y de la Vida, Universidad Internacional del Ecuador, Quito, Ecuador
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Madrid, Spain
- Facultad de Ingenierías y Ciencias Aplicadas, Universidad Internacional SEK, Quito, Ecuador
| | | | - David Pesantez-Coronel
- Medical Oncology Department Hospital Clinic and Translational Genomics and Targeted Therapies in Solid Tumors, IDIBAPS, Barcelona, Spain
| | - Andrés López-Cortés
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Madrid, Spain
- Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador
| | - Alberto Indacochea
- Medical Oncology Department Hospital Clinic and Translational Genomics and Targeted Therapies in Solid Tumors, IDIBAPS, Barcelona, Spain
| | - Santiago Guerrero
- Laboratorio de Ciencia de Datos Biomédicos, Escuela de Medicina, Facultad de Ciencias Médicas de la Salud y de la Vida, Universidad Internacional del Ecuador, Quito, Ecuador
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Madrid, Spain
|
19
|
Li B, Altelaar M, van Breukelen B. Identification of Protein Complexes by Integrating Protein Abundance and Interaction Features Using a Deep Learning Strategy. Int J Mol Sci 2023; 24:7884. [PMID: 37175590 PMCID: PMC10178578 DOI: 10.3390/ijms24097884] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/23/2023] [Revised: 04/23/2023] [Accepted: 04/24/2023] [Indexed: 05/15/2023] Open
Abstract
Many essential cellular functions are carried out by multi-protein complexes that can be characterized by their protein-protein interactions. The interactions between protein subunits are critically dependent on the strengths of their interactions and their cellular abundances, both of which span orders of magnitude. Despite many efforts devoted to the global discovery of protein complexes by integrating large-scale protein abundance and interaction features, there is still room for improvement. Here, we integrated >7000 quantitative proteomic samples with three published affinity purification/co-fractionation mass spectrometry datasets into a deep learning framework to predict protein-protein interactions (PPIs), followed by the identification of protein complexes using a two-stage clustering strategy. Our deep-learning-technique-based classifier significantly outperformed recently published machine learning prediction models and in the process captured 5010 complexes containing over 9000 unique proteins. The vast majority of proteins in our predicted complexes exhibited low or no tissue specificity, which is an indication that the observed complexes tend to be ubiquitously expressed throughout all cell types and tissues. Interestingly, our combined approach increased the model sensitivity for low abundant proteins, which amongst other things allowed us to detect the interaction of MCM10, which connects to the replicative helicase complex via the MCM6 protein. The integration of protein abundances and their interaction features using a deep learning approach provided a comprehensive map of protein-protein interactions and a unique perspective on possible novel protein complexes.
Affiliation(s)
- Bohui Li
- Biomolecular Mass Spectrometry and Proteomics, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
| | - Maarten Altelaar
- Biomolecular Mass Spectrometry and Proteomics, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
- Mass Spectrometry and Proteomics Facility, The Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
| | - Bas van Breukelen
- Biomolecular Mass Spectrometry and Proteomics, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
|
20
|
Barrio-Hernandez I, Schwartzentruber J, Shrivastava A, Del-Toro N, Gonzalez A, Zhang Q, Mountjoy E, Suveges D, Ochoa D, Ghoussaini M, Bradley G, Hermjakob H, Orchard S, Dunham I, Anderson CA, Porras P, Beltrao P. Network expansion of genetic associations defines a pleiotropy map of human cell biology. Nat Genet 2023; 55:389-398. [PMID: 36823319 PMCID: PMC10011132 DOI: 10.1038/s41588-023-01327-9] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 07/19/2021] [Accepted: 01/30/2023] [Indexed: 02/25/2023]
Abstract
Interacting proteins tend to have similar functions, influencing the same organismal traits. Interaction networks can be used to expand the list of candidate trait-associated genes from genome-wide association studies. Here, we performed network-based expansion of trait-associated genes for 1,002 human traits showing that this recovers known disease genes or drug targets. The similarity of network expansion scores identifies groups of traits likely to share an underlying genetic and biological process. We identified 73 pleiotropic gene modules linked to multiple traits, enriched in genes involved in processes such as protein ubiquitination and RNA processing. In contrast to gene deletion studies, pleiotropy as defined here captures specifically multicellular-related processes. We show examples of modules linked to human diseases enriched in genes with known pathogenic variants that can be used to map targets of approved drugs for repurposing. Finally, we illustrate the use of network expansion scores to study genes at inflammatory bowel disease genome-wide association study loci, and implicate inflammatory bowel disease-relevant genes with strong functional and genetic support.
Affiliation(s)
- Inigo Barrio-Hernandez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Jeremy Schwartzentruber
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
| | - Anjali Shrivastava
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Noemi Del-Toro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Asier Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Qian Zhang
- Wellcome Sanger Institute, Cambridge, UK
| | - Edward Mountjoy
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Daniel Suveges
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - David Ochoa
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Maya Ghoussaini
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Glyn Bradley
- Computational Biology, Genomic Sciences, GSK, Stevenage, UK
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Ian Dunham
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
| | - Carl A Anderson
- Open Targets, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
| | - Pablo Porras
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Pedro Beltrao
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.
- Open Targets, Cambridge, UK.
- Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland.
|
21
|
Wang Z, Gu Y, Zheng S, Yang L, Li J. MGREL: A multi-graph representation learning-based ensemble learning method for gene-disease association prediction. Comput Biol Med 2023; 155:106642. [PMID: 36805231 DOI: 10.1016/j.compbiomed.2023.106642] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/24/2022] [Revised: 01/15/2023] [Accepted: 02/05/2023] [Indexed: 02/12/2023]
Abstract
The identification of gene-disease associations plays an important role in the exploration of pathogenic mechanisms and therapeutic targets. Computational methods have been regarded as an effective way to discover the potential gene-disease associations in recent years. However, most of them ignored the combination of abundant genetic, therapeutic information, and gene-disease network topology. To this end, we re-organized the current gene-disease association benchmark dataset by extracting the newest gene-disease associations from the OMIM database. Then, we developed a multi-graph representation learning-based ensemble model, named MGREL to predict gene-disease associations. MGREL integrated two feature generation channels to extract gene and disease features, including a knowledge extraction channel which learned high-order representations from genetic and therapeutic information, and a graph learning channel which acquired network topological representations through multiple advanced graph representation learning methods. Then, an ensemble learning method with 5 machine learning models was used as the classifier to predict the gene-disease association. Comprehensive experiments have demonstrated the significant performance achieved by MGREL compared to 5 state-of-the-art methods. For the major measurements (AUC = 0.925, AUPR = 0.935), the relative improvements of MGREL compared to the suboptimal methods are 3.24%, and 2.75%, respectively. MGREL also achieved impressive improvements in the challenging tasks of predicting potential associations for unknown genes/diseases. In addition, case studies implied potential applications for MGREL in the discovery of potential therapeutic targets.
Affiliation(s)
- Ziyang Wang
- Institute of Medical Information IMI, Chinese Academy of Medical Sciences and Peking Union Medical College CAMS & PUMC, Beijing, 100020, China
- Yaowen Gu
- Institute of Medical Information IMI, Chinese Academy of Medical Sciences and Peking Union Medical College CAMS & PUMC, Beijing, 100020, China
- Si Zheng
- Institute of Medical Information IMI, Chinese Academy of Medical Sciences and Peking Union Medical College CAMS & PUMC, Beijing, 100020, China; Institute for Artificial Intelligence, Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing, 100084, China
- Lin Yang
- Institute of Medical Information IMI, Chinese Academy of Medical Sciences and Peking Union Medical College CAMS & PUMC, Beijing, 100020, China
- Jiao Li
- Institute of Medical Information IMI, Chinese Academy of Medical Sciences and Peking Union Medical College CAMS & PUMC, Beijing, 100020, China.
|
22
|
Mapping the common gene networks that underlie related diseases. Nat Protoc 2023:10.1038/s41596-022-00797-1. [PMID: 36653526 DOI: 10.1038/s41596-022-00797-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/15/2022] [Accepted: 11/21/2022] [Indexed: 01/19/2023]
Abstract
A longstanding goal of biomedicine is to understand how alterations in molecular and cellular networks give rise to the spectrum of human diseases. For diseases with shared etiology, understanding the common causes allows for improved diagnosis of each disease, development of new therapies and more comprehensive identification of disease genes. Accordingly, this protocol describes how to evaluate the extent to which two diseases, each characterized by a set of mapped genes, are colocalized in a reference gene interaction network. This procedure uses network propagation to measure the network 'distance' between gene sets. For colocalized diseases, the network can be further analyzed to extract common gene communities at progressive granularities. In particular, we show how to: (1) obtain input gene sets and a reference gene interaction network; (2) identify common subnetworks of genes that encompass or are in close proximity to all gene sets; (3) use multiscale community detection to identify systems and pathways represented by each common subnetwork to generate a network colocalized systems map; (4) validate identified genes and systems using a mouse variant database; and (5) visualize and further investigate select genes, interactions and systems for relevance to phenotype(s) of interest. We demonstrate the utility of this approach by identifying shared biological mechanisms underlying autism and congenital heart disease. However, this protocol is general and can be applied to any gene sets attributed to diseases or other phenotypes with suspected joint association. A typical NetColoc run takes less than an hour. Software and documentation are available at https://github.com/ucsd-ccbb/NetColoc .
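The network propagation idea mentioned in this abstract can be illustrated with a small sketch. The code below is a generic random walk with restart on a toy network, not NetColoc's own implementation or API; the function name `propagate`, the restart value, the toy adjacency matrix and the use of an elementwise product as a "shared heat" score are all assumptions made for illustration.

```python
import numpy as np


def propagate(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart from a seed gene set on an undirected network."""
    adjacency = np.asarray(adjacency, dtype=float)
    degree = adjacency.sum(axis=0)
    degree[degree == 0] = 1.0            # avoid division by zero for isolated nodes
    transition = adjacency / degree      # column-normalised: each column sums to 1
    p0 = np.zeros(adjacency.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)   # heat starts evenly on the seed genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy path network 0-1-2-3-4 (hypothetical): disease A maps to gene 0, disease B to gene 4.
adjacency = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
heat_a = propagate(adjacency, seeds=[0])
heat_b = propagate(adjacency, seeds=[4])
# Genes that receive heat from both seed sets are candidates for a shared subnetwork.
shared = heat_a * heat_b
```

Genes with a high value in `shared` are close to both seed sets in the network, which is the intuition behind asking whether two diseases are colocalized; the published protocol should be consulted for its actual scoring and significance testing.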
|
23
|
Vora DS, Kalakoti Y, Sundar D. Computational Methods and Deep Learning for Elucidating Protein Interaction Networks. Methods Mol Biol 2023; 2553:285-323. [PMID: 36227550 DOI: 10.1007/978-1-0716-2617-7_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Protein interactions play a critical role in all biological processes, but experimental identification of protein interactions is a time- and resource-intensive process. The advances in next-generation sequencing and multi-omics technologies have greatly benefited large-scale predictions of protein interactions using machine learning methods. A wide range of tools have been developed to predict protein-protein, protein-nucleic acid, and protein-drug interactions. Here, we discuss the applications, methods, and challenges faced when employing the various prediction methods. We also briefly describe ways to overcome the challenges and prospective future developments in the field of protein interaction biology.
Affiliation(s)
- Dhvani Sandip Vora
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
- Yogesh Kalakoti
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
- Durai Sundar
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India.
- School of Artificial Intelligence, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India.
|
24
|
Liu Y, Han J, Kong T, Xiao N, Mei Q, Liu J. DriverMP enables improved identification of cancer driver genes. Gigascience 2023; 12:giad106. [PMID: 38091511 PMCID: PMC10716827 DOI: 10.1093/gigascience/giad106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 10/30/2023] [Accepted: 11/22/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND Cancer is widely regarded as a complex disease primarily driven by genetic mutations. A critical concern and significant obstacle lies in discerning driver genes amid an extensive array of passenger genes. FINDINGS We present a new method termed DriverMP for effectively prioritizing altered genes on a cancer-type level by considering mutated gene pairs. It is designed to first apply nonsilent somatic mutation data, protein‒protein interaction network data, and differential gene expression data to prioritize mutated gene pairs, and then individual mutated genes are prioritized based on prioritized mutated gene pairs. Application of this method in 10 cancer datasets from The Cancer Genome Atlas demonstrated its great improvements over all the compared state-of-the-art methods in identifying known driver genes. Then, a comprehensive analysis demonstrated the reliability of the novel driver genes that are strongly supported by clinical experiments, disease enrichment, or biological pathway analysis. CONCLUSIONS The new method, DriverMP, which is able to identify driver genes by effectively integrating the advantages of multiple kinds of cancer data, is available at https://github.com/LiuYangyangSDU/DriverMP. In addition, we have developed a novel driver gene database for 10 cancer types and an online service that can be freely accessed without registration for users. The DriverMP method, the database of novel drivers, and the user-friendly online server are expected to contribute to new diagnostic and therapeutic opportunities for cancers.
Affiliation(s)
- Yangyang Liu
- School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
- Jiyun Han
- School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
- Tongxin Kong
- School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
- Nannan Xiao
- School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
- Qinglin Mei
- MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Juntao Liu
- School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
|
25
|
Li P, Tiwari P, Xu J, Qian Y, Ai C, Ding Y, Guo F. Sparse regularized joint projection model for identifying associations of non-coding RNAs and human diseases. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
26
|
Barrio-Hernandez I, Beltrao P. Network analysis of genome-wide association studies for drug target prioritisation. Curr Opin Chem Biol 2022; 71:102206. [PMID: 36087372 DOI: 10.1016/j.cbpa.2022.102206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Revised: 07/29/2022] [Accepted: 08/05/2022] [Indexed: 01/27/2023]
Abstract
Over the past decades, genome-wide association studies (GWAS) have led to a dramatic expansion of genetic variants implicated in human traits and diseases. These advances are expected to result in new drug targets, but the identification of causal genes and the cell biology underlying human diseases from GWAS remains challenging. Here, we review protein interaction network-based methods to analyse GWAS data. These approaches can rank candidate drug targets at GWAS-associated loci or among interactors of disease genes without direct genetic support. These methods identify the cell biology affected in common across diseases, offering opportunities for drug repurposing, and can be combined with expression data to identify focal tissues and cell types. Going forward, we expect that these methods will further improve from advances in the characterisation of context-specific interaction networks and the joint analysis of rare and common genetic signals.
Affiliation(s)
- Inigo Barrio-Hernandez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, CB10 1SD, UK; Open Targets, Wellcome Genome Campus, Cambridge, CB10 1SA, UK.
- Pedro Beltrao
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, CB10 1SD, UK; Open Targets, Wellcome Genome Campus, Cambridge, CB10 1SA, UK; Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, 8093, Switzerland.
|
27
|
Tan J, Li X, Zhang L, Du Z. Recent advances in machine learning methods for predicting LncRNA and disease associations. Front Cell Infect Microbiol 2022; 12:1071972. [PMID: 36530425 PMCID: PMC9748103 DOI: 10.3389/fcimb.2022.1071972] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/17/2022] [Accepted: 11/11/2022] [Indexed: 12/03/2022]
Abstract
Long non-coding RNAs (lncRNAs) are involved in almost the entire cell life cycle through different mechanisms and play an important role in many key biological processes. Mutations and dysregulation of lncRNAs have been implicated in many complex human diseases. Therefore, identifying the relationship between lncRNAs and diseases not only contributes to biologists' understanding of disease mechanisms, but also provides new ideas and solutions for disease diagnosis, treatment, prognosis and prevention. Since the existing experimental methods for predicting lncRNA-disease associations (LDAs) are expensive and time consuming, machine learning methods for predicting lncRNA-disease associations have become increasingly popular among researchers. In this review, we summarize some of the human diseases studied by LDAs prediction models, association and similarity features of LDAs prediction, performance evaluation methods of models and some advanced machine learning prediction models of LDAs. Finally, we discuss the potential limitations of machine learning-based methods for LDAs prediction and provide some ideas for designing new prediction models.
|
28
|
Bi Y, Wang P. Exploring drought-responsive crucial genes in Sorghum. iScience 2022; 25:105347. [PMID: 36325072 PMCID: PMC9619295 DOI: 10.1016/j.isci.2022.105347] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/07/2022] [Revised: 09/18/2022] [Accepted: 10/11/2022] [Indexed: 12/11/2022]
Abstract
Drought severely affects global food production. Sorghum is a typical drought-resistant model crop. Based on RNA-seq data for Sorghum with multiple time points and the gray correlation coefficient, this paper first selects candidate genes via a mean variance test and constructs weighted gene differential co-expression networks (WGDCNs); then, based on the guilt-by-rewiring principle, the WGDCNs and a hidden Markov random field model, drought-responsive crucial genes are identified for each of five developmental stages. Enrichment and sequence alignment analyses reveal that the screened genes may play critical functional roles in drought responsiveness. A multilayer differential co-expression network for the screened genes reveals that Sorghum is very sensitive to pre-flowering drought. Furthermore, a crucial gene regulatory module is established, which regulates drought responsiveness via plant hormone signal transduction, MAPK cascades, and transcriptional regulation. The proposed method can effectively mine crucial genes from RNA-seq data, which has implications for breeding new varieties with improved drought tolerance.
- We design a method that unites gene rewiring network and Markov random field model
- Drought-responsive genes for five developmental stages of Sorghum are explored
- A multilayer network reveals that Sorghum is very sensitive to pre-flowering drought
- A drought-responsive crucial gene regulatory module is established for Sorghum
|
29
|
COVID-GWAB: A Web-Based Prediction of COVID-19 Host Genes via Network Boosting of Genome-Wide Association Data. Biomolecules 2022; 12:biom12101446. [PMID: 36291657 PMCID: PMC9599684 DOI: 10.3390/biom12101446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 10/01/2022] [Accepted: 10/02/2022] [Indexed: 11/17/2022]
Abstract
Host genetics affect both the susceptibility and response to viral infection. Searching for host genes that contribute to COVID-19, the Host Genetics Initiative (HGI) was formed to investigate the genetic factors involved in COVID-19 via genome-wide association studies (GWAS). The GWAS suffer from limited statistical power and in general, only a few genes can pass the conventional significance thresholds. This statistical limitation may be overcome by boosting weak association signals through integrating independent functional information such as molecular interactions. Additionally, the boosted results can be evaluated by various independent data for further connections to COVID-19. We present COVID-GWAB, a web-based tool to boost original GWAS signals from COVID-19 patients by taking the signals of the interactome neighbors. COVID-GWAB takes summary statistics from the COVID-19 HGI or user input data and reprioritizes candidate host genes for COVID-19 using HumanNet, a co-functional human gene network. The current version of COVID-GWAB provides the pre-processed data of releases 5, 6, and 7 of the HGI. Additionally, COVID-GWAB provides web interfaces for a summary of augmented GWAS signals, prediction evaluations by appearance frequency in COVID-19 literature, single-cell transcriptome data, and associated pathways. The web server also enables browsing the candidate gene networks.
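The idea of boosting a weak association signal with the signals of network neighbors can be sketched as follows. The abstract does not give the scoring formula, so the additive rule, the `weight` parameter, the gene labels and the toy scores below are hypothetical and only show the general principle; they are not the COVID-GWAB or HumanNet method.

```python
def boost_scores(gene_scores, network, weight=0.5):
    """Add a weighted sum of neighbour scores to each gene's own score."""
    boosted = {}
    for gene, score in gene_scores.items():
        # Neighbours without a GWAS score contribute nothing.
        neighbour_signal = sum(gene_scores.get(n, 0.0) for n in network.get(gene, ()))
        boosted[gene] = score + weight * neighbour_signal
    return boosted


# Hypothetical association scores (for example, -log10 p-values) and a toy network.
gene_scores = {"A": 5.0, "B": 4.0, "C": 1.0, "D": 1.2}
network = {"A": ["C"], "B": ["C"], "C": ["A", "B"], "D": []}

boosted = boost_scores(gene_scores, network)
ranking = sorted(boosted, key=boosted.get, reverse=True)
```

In this toy case gene C has a weaker original signal than gene D, but it is connected to two strongly associated genes and therefore ranks above D after boosting.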
|
30
|
Xie X, Wang Y, Sheng N, Zhang S, Cao Y, Fu Y. Predicting miRNA-disease associations based on multi-view information fusion. Front Genet 2022; 13:979815. [PMID: 36238163 PMCID: PMC9552014 DOI: 10.3389/fgene.2022.979815] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/28/2022] [Accepted: 08/16/2022] [Indexed: 11/13/2022]
Abstract
MicroRNAs (miRNAs) play an important role in various biological processes, and their abnormal expression can lead to the occurrence of diseases. Exploring the potential relationships between miRNAs and diseases can contribute to the diagnosis and treatment of complex diseases. The growing number of databases storing miRNA and disease information provides opportunities to develop computational methods for discovering unobserved disease-related miRNAs, but there are still challenges in how to effectively learn and fuse information from multi-source data. In this study, we propose a multi-view information fusion based method for miRNA-disease association (MDA) prediction, named MVIFMDA. First, multiple heterogeneous networks are constructed by combining the known MDAs with different similarities of miRNAs and diseases based on multi-source information. Second, the topology features of miRNAs and diseases are obtained by applying a graph convolutional network to each heterogeneous network view. Moreover, we design an attention strategy at the topology representation level to adaptively fuse representations that include different structural information. Meanwhile, we learn the attribute representations of miRNAs and diseases from their similarity attribute views with convolutional neural networks. Finally, the complicated associations between miRNAs and diseases are reconstructed by applying a bilinear decoder to the combined features, which combine topology and attribute representations. Experimental results on the public dataset demonstrate that our proposed model consistently outperforms baseline methods. The case studies further show the ability of the MVIFMDA model to infer underlying associations between miRNAs and diseases.
Affiliation(s)
- Xuping Xie
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Yan Wang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- School of Artificial Intelligence, Jilin University, Changchun, China
- *Correspondence: Yan Wang,
- Nan Sheng
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Shuangquan Zhang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Yangkun Cao
- School of Artificial Intelligence, Jilin University, Changchun, China
- Yuan Fu
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, United Kingdom
|
31
|
Zhai S, Li X, Wu Y, Shi X, Ji B, Qiu C. Identifying potential microRNA biomarkers for colon cancer and colorectal cancer through bound nuclear norm regularization. Front Genet 2022; 13:980437. [PMID: 36313468 PMCID: PMC9614659 DOI: 10.3389/fgene.2022.980437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Accepted: 08/01/2022] [Indexed: 11/17/2022]
Abstract
Colon cancer and colorectal cancer are two common causes of cancer-related death worldwide. Identification of potential biomarkers for the two cancers can help us to evaluate their initiation, progression and therapeutic response. In this study, we propose a new microRNA-disease association identification method, BNNRMDA, to discover potential microRNA biomarkers for the two cancers. BNNRMDA combines disease semantic similarity and Gaussian Association Profile Kernel (GAPK) similarity, microRNA function similarity and GAPK similarity, and the bound nuclear norm regularization model. Compared with five other classical microRNA-disease association identification methods (MIDPE, MIDP, RLSMDA, GRNMF, and LPLNS), BNNRMDA obtains the highest AUC of 0.9071, demonstrating its strong microRNA-disease association identification performance. BNNRMDA is applied to discover possible microRNA biomarkers for colon cancer and colorectal cancer. The results show that all 73 known microRNAs associated with colon cancer in the HMDD database have the highest association scores with colon cancer and are ranked in the top 73. Among 137 known microRNAs associated with colorectal cancer in the HMDD database, 129 microRNAs have the highest association scores with colorectal cancer and are ranked in the top 129. In addition, we predict that hsa-miR-103a could be a potential biomarker of colon cancer and hsa-mir-193b and hsa-mir-7days could be potential biomarkers of colorectal cancer.
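The abstract names a Gaussian Association Profile Kernel similarity without giving its formula. A minimal sketch follows, assuming the commonly used Gaussian profile kernel form, in which the similarity of two microRNAs is exp(-gamma * squared distance between their association profiles) and gamma is a bandwidth divided by the mean squared profile norm. The function name, the `gamma_prime` parameter and the toy association matrix are assumptions, not details taken from BNNRMDA.

```python
import numpy as np


def gaussian_profile_kernel(assoc, gamma_prime=1.0):
    """Gaussian kernel between the rows (association profiles) of a binary matrix."""
    assoc = np.asarray(assoc, dtype=float)
    sq_norms = (assoc ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    # Normalise the bandwidth by the average profile norm (assumed convention).
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * assoc @ assoc.T
    sq_dist = np.maximum(sq_dist, 0.0)   # guard against tiny negative round-off
    return np.exp(-gamma * sq_dist)


# Toy matrix: rows are microRNAs, columns are diseases, 1 marks a known association.
assoc = np.array([
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
])
similarity = gaussian_profile_kernel(assoc)
```

The first two microRNAs share an identical profile and get similarity 1, while the third, which shares no disease with them, gets a lower value. Applying the same function to the transposed matrix would give a disease-by-disease similarity.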
Affiliation(s)
- Shengyong Zhai
- Department of General Surgery, Weifang People’s Hospital, Shandong, China
- Xiaoling Li
- The Second Department of Oncology, Beidahuang Industry Group General Hospital, Harbin, China
- Heilongjiang Second Cancer Hospital, Harbin, China
- Yan Wu
- Geneis Beijing Co., Ltd., Beijing, China
- Xiaoli Shi
- Geneis Beijing Co., Ltd., Beijing, China
- Binbin Ji
- Geneis Beijing Co., Ltd., Beijing, China
- Chun Qiu
- Department of Oncology, Hainan General Hospital, Haikou, China
- *Correspondence: Chun Qiu,
|
32
|
Loers JU, Vermeirssen V. SUBATOMIC: a SUbgraph BAsed mulTi-OMIcs clustering framework to analyze integrated multi-edge networks. BMC Bioinformatics 2022; 23:363. [PMID: 36064320 PMCID: PMC9442970 DOI: 10.1186/s12859-022-04908-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/17/2022] [Accepted: 08/24/2022] [Indexed: 11/02/2022]
Abstract
BACKGROUND Representing the complex interplay between different types of biomolecules across different omics layers in multi-omics networks bears great potential to gain a deep mechanistic understanding of gene regulation and disease. However, multi-omics networks easily grow into giant hairball structures that hamper biological interpretation. Module detection methods can decompose these networks into smaller interpretable modules. However, these methods are not adapted to deal with multi-omics data nor consider topological features. When deriving very large modules or ignoring the broader network context, interpretability remains limited. To address these issues, we developed a SUbgraph BAsed mulTi-OMIcs Clustering framework (SUBATOMIC), which infers small and interpretable modules with a specific topology while keeping track of connections to other modules and regulators. RESULTS SUBATOMIC groups specific molecular interactions in composite network subgraphs of two and three nodes and clusters them into topological modules. These are functionally annotated, visualized and overlaid with expression profiles to go from static to dynamic modules. To preserve the larger network context, SUBATOMIC investigates statistically the connections in between modules as well as between modules and regulators such as miRNAs and transcription factors. We applied SUBATOMIC to analyze a composite Homo sapiens network containing transcription factor-target gene, miRNA-target gene, protein-protein, homologous and co-functional interactions from different databases. We derived and annotated 5586 modules with diverse topological, functional and regulatory properties. We created novel functional hypotheses for unannotated genes. Furthermore, we integrated modules with condition specific expression data to study the influence of hypoxia in three cancer cell lines. 
We developed two prioritization strategies to identify the most relevant modules in specific biological contexts: one considering GO term enrichments and one calculating an activity score reflecting the degree of differential expression. Both strategies yielded modules specifically reacting to low oxygen levels. CONCLUSIONS We developed the SUBATOMIC framework that generates interpretable modules from integrated multi-omics networks and applied it to hypoxia in cancer. SUBATOMIC can infer and contextualize modules, explore condition or disease specific modules, identify regulators and functionally related modules, and derive novel gene functions for uncharacterized genes. The software is available at https://github.com/CBIGR/SUBATOMIC .
Affiliation(s)
- Jens Uwe Loers
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium.
- Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium.
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium.
- Vanessa Vermeirssen
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium.
- Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium.
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium.
|
33
|
Huang Z, Wang Y, Ma X. Clustering of Cancer Attributed Networks by Dynamically and Jointly Factorizing Multi-Layer Graphs. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2737-2748. [PMID: 34143738 DOI: 10.1109/tcbb.2021.3090586] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 02/02/2023]
Abstract
The accumulated omic data provide an opportunity to explore the mechanisms of cancers and pose a challenge for their integrative analysis. Although extensive efforts have been devoted to addressing this issue, current algorithms perform poorly because of the complexity of patterns and the heterogeneity of data. In this study, the goal is to propose an effective and efficient algorithm (called NMF-DEC) to identify clusters by integrating interactome and transcriptome data. By treating the expression profiles of genes as attributes of vertices in the gene interaction networks, we transform the integrative analysis of omic data into clustering of attributed networks. To circumvent the heterogeneity, we construct a similarity network for the attributes of genes and cast the task as a common module detection problem in multi-layer networks. NMF-DEC explores the relation between attributes and the topological structure of networks by jointly factorizing the similarity and interaction networks with the same basis. In this optimization, the interaction network is dynamically updated and the information of attributes is dynamically incorporated, providing a better strategy to characterize the structure of modules in attributed networks. Extensive experiments indicate that, compared with state-of-the-art baselines, NMF-DEC is more accurate on social networks and shows better performance on cancer attributed networks, implying the superiority of the proposed method for the integrative analysis of omic data.
|
34
|
Dutta T, Mitra S, Saha A, Ganguly K, Pyne T, Sengupta M. A comprehensive meta-analysis and prioritization study to identify vitiligo associated coding and non-coding SNV candidates using web-based bioinformatics tools. Sci Rep 2022; 12:14543. [PMID: 36008553 PMCID: PMC9411560 DOI: 10.1038/s41598-022-18766-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/10/2021] [Accepted: 08/18/2022] [Indexed: 11/19/2022]
Abstract
Vitiligo is a prevalent depigmentation disorder affecting around 1% of the general population. So far, various Genome Wide Association Studies (GWAS) and Candidate Gene Association Studies (CGAS) have identified several single nucleotide variants (SNVs) as risk factors for vitiligo. Nonetheless, little has been discerned regarding their direct functional significance to the disease pathogenesis. In this study, we performed extensive data mining and downstream analysis using several experimentally validated datasets such as GTEx Portal and web tools such as rSNPBase, RegulomeDB, HaploReg and STRING to prioritize 13 SNVs from a set of 291 SNVs that have been previously reported to be associated with vitiligo. We also prioritized their underlying/target genes and attempted to annotate their functional contribution to vitiligo pathogenesis. Our analysis revealed that genes such as FGFR1OP, SUOX, CDK5RAP1 and RERE, which have never previously been implicated in vitiligo, have strong potential to contribute to the disease pathogenesis. The study is the first of its kind to prioritize and functionally annotate vitiligo-associated GWAS and CGAS SNVs and their underlying/target genes, based on functional data available in public domain databases.
Affiliation(s)
- Tithi Dutta
- Department of Genetics, University of Calcutta, 35 Ballygunge Circular Road, Kolkata, 700019, India
- Sayantan Mitra
- Department of Genetics, CVM University, Aribas, Aribas Campus, New Vallabh Vidyanagar, Anand, Gujarat, 388121, India
- Arpan Saha
- Department of Genetics, University of Calcutta, 35 Ballygunge Circular Road, Kolkata, 700019, India
- Kausik Ganguly
- Department of Genetics, University of Calcutta, 35 Ballygunge Circular Road, Kolkata, 700019, India
- Tushar Pyne
- Department of Genetics, University of Calcutta, 35 Ballygunge Circular Road, Kolkata, 700019, India
- Mainak Sengupta
- Department of Genetics, University of Calcutta, 35 Ballygunge Circular Road, Kolkata, 700019, India.
|
35
|
Li W, Zhang H, Li M, Han M, Yin Y. MGEGFP: a multi-view graph embedding method for gene function prediction based on adaptive estimation with GCN. Brief Bioinform 2022; 23:6659744. [PMID: 35947989 DOI: 10.1093/bib/bbac333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 07/02/2022] [Accepted: 07/21/2022] [Indexed: 11/14/2022]
Abstract
In recent years, a number of computational approaches have been proposed to effectively integrate multiple heterogeneous biological networks, and have shown impressive performance for inferring gene function. However, the previous methods do not fully represent the critical neighborhood relationship between genes during the feature learning process. Furthermore, it is difficult to accurately estimate the contributions of different views for multi-view integration. In this paper, we propose MGEGFP, a multi-view graph embedding method based on adaptive estimation with Graph Convolutional Network (GCN), to learn high-quality gene representations among multiple interaction networks for function prediction. First, we design a dual-channel GCN encoder to disentangle the view-specific information and the consensus pattern across diverse networks. With the aid of the disentangled representations, we develop a multi-gate module to adaptively estimate the contributions of different views during each reconstruction process and make full use of the multiplexity advantages, where a diversity preservation constraint is designed to prevent over-fitting. To validate the effectiveness of our model, we conduct experiments on networks from the STRING database for both yeast and human datasets, and compare the performance with seven state-of-the-art methods on five evaluation metrics. Moreover, the ablation study demonstrates the important contribution of the designed dual-channel encoder, multi-gate module and diversity preservation constraint in MGEGFP. The experimental results confirm the superiority of our proposed method and suggest that MGEGFP can be a useful tool for gene function prediction.
Affiliation(s)
- Wei Li
- College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China
| | - Han Zhang
- College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China
| | - Minghe Li
- College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China
| | - Mingjing Han
- College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China
| | - Yanbin Yin
- Department of Food Science and Technology, University of Nebraska - Lincoln, 1400 R Street, 68588, Nebraska, USA
| |
|
36
|
Wang C, Shi J, Cai J, Zhang Y, Zheng X, Zhang N. DriverRWH: discovering cancer driver genes by random walk on a gene mutation hypergraph. BMC Bioinformatics 2022; 23:277. [PMID: 35831792 PMCID: PMC9281118 DOI: 10.1186/s12859-022-04788-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/21/2021] [Accepted: 06/08/2022] [Indexed: 12/24/2022]
Abstract
Background: Recent advances in next-generation sequencing technologies have helped investigators generate massive amounts of cancer genomic data. A critical challenge in cancer genomics is the identification of a few cancer driver genes whose mutations cause tumor growth. However, the majority of existing computational approaches underuse the co-occurrence mutation information of individuals, which is deemed important in tumorigenesis and tumor progression, resulting in a high rate of false positives. Results: To make full use of co-mutation information, we present a random walk algorithm referred to as DriverRWH on a weighted gene mutation hypergraph model, using somatic mutation data and molecular interaction network data to prioritize candidate driver genes. Applied to tumor samples of different cancer types from The Cancer Genome Atlas, DriverRWH shows significantly better performance than state-of-the-art prioritization methods in terms of the area under the curve scores and the cumulative number of known driver genes recovered in top-ranked candidate genes. In addition, DriverRWH discovers several potential drivers, which are enriched in cancer-related pathways. DriverRWH recovers approximately 50% of known driver genes in the top 30 ranked candidate genes for more than half of the cancer types. DriverRWH is also highly robust to perturbations in the mutation data and gene functional network data. Conclusion: DriverRWH is effective across various cancer types in prioritizing cancer driver genes and provides considerable improvement over other tools with a better balance of precision and sensitivity. It can be a useful tool for detecting potential driver genes and can facilitate targeted cancer therapies. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04788-7.
Affiliation(s)
- Chenye Wang
- School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
| | - Junhan Shi
- School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
| | - Jiansheng Cai
- Department of Mathematics, Weifang University, Weifang, 261061, Shandong, China
| | - Yusen Zhang
- School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
| | - Xiaoqi Zheng
- Department of Mathematics, Shanghai Normal University, Shanghai, 200234, China
| | - Naiqian Zhang
- School of Mathematics and Statistics, Shandong University, Weihai, 264209, China.
| |
|
37
|
Ai C, Yang H, Ding Y, Tang J, Guo F. A multi-layer multi-kernel neural network for determining associations between non-coding RNAs and diseases. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/22/2022]
|
38
|
Vo DHT, McGleave G, Overton IM. Immune Cell Networks Uncover Candidate Biomarkers of Melanoma Immunotherapy Response. J Pers Med 2022; 12:jpm12060958. [PMID: 35743743 PMCID: PMC9225330 DOI: 10.3390/jpm12060958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Revised: 06/05/2022] [Accepted: 06/08/2022] [Indexed: 11/30/2022]
Abstract
The therapeutic activation of antitumour immunity by immune checkpoint inhibitors (ICIs) is a significant advance in cancer medicine, not least due to the prospect of long-term remission. However, many patients are unresponsive to ICI therapy and may experience serious side effects; companion biomarkers are urgently needed to help inform ICI prescribing decisions. We present the IMMUNETS networks of gene coregulation in five key immune cell types and their application to interrogate control of nivolumab response in advanced melanoma cohorts. The results evidence a role for each of the IMMUNETS cell types in ICI response and in driving tumour clearance with independent cohorts from TCGA. As expected, ‘immune hot’ status, including T cell proliferation, correlates with response to first-line ICI therapy. Genes regulated in NK, dendritic, and B cells are the most prominent discriminators of nivolumab response in patients that had previously progressed on another ICI. Multivariate analysis controlling for tumour stage and age highlights CIITA and IKZF3 as candidate prognostic biomarkers. IMMUNETS provide a resource for network biology, enabling context-specific analysis of immune components in orthogonal datasets. Overall, our results illuminate the relationship between the tumour microenvironment and clinical trajectories, with potential implications for precision medicine.
Affiliation(s)
- Duong H. T. Vo
- The Patrick G Johnston Centre for Cancer Research, Queen’s University Belfast, 97 Lisburn Road, Belfast BT9 7AE, UK; (D.H.T.V.); (G.M.)
- Health Data Research Wales and Northern Ireland, Queen’s University Belfast, 97 Lisburn Road, Belfast BT9 7AE, UK
| | - Gerard McGleave
- The Patrick G Johnston Centre for Cancer Research, Queen’s University Belfast, 97 Lisburn Road, Belfast BT9 7AE, UK; (D.H.T.V.); (G.M.)
- Health Data Research Wales and Northern Ireland, Queen’s University Belfast, 97 Lisburn Road, Belfast BT9 7AE, UK
| | - Ian M. Overton
- The Patrick G Johnston Centre for Cancer Research, Queen’s University Belfast, 97 Lisburn Road, Belfast BT9 7AE, UK; (D.H.T.V.); (G.M.)
- Health Data Research Wales and Northern Ireland, Queen’s University Belfast, 97 Lisburn Road, Belfast BT9 7AE, UK
| |
|
39
|
Network assisted analysis of de novo variants using protein-protein interaction information identified 46 candidate genes for congenital heart disease. PLoS Genet 2022; 18:e1010252. [PMID: 35671298 PMCID: PMC9205499 DOI: 10.1371/journal.pgen.1010252] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/13/2021] [Revised: 06/17/2022] [Accepted: 05/12/2022] [Indexed: 11/19/2022]
Abstract
De novo variants (DNVs) with deleterious effects have proved informative in identifying risk genes for early-onset diseases such as congenital heart disease (CHD). A number of statistical methods have been proposed for family-based studies or case/control studies to identify risk genes by screening genes with more DNVs than expected by chance in Whole Exome Sequencing (WES) studies. However, the statistical power is still limited for cohorts with thousands of subjects. Under the hypothesis that connected genes in protein-protein interaction (PPI) networks are more likely to share similar disease association status, we developed a Markov Random Field model that can leverage information from publicly available PPI databases to increase power in identifying risk genes. We identified 46 candidate genes with at least 1 DNV in the CHD study cohort, including 18 known human CHD genes and 35 highly expressed genes in mouse developing heart. Our results may shed new insight on the shared protein functionality among risk genes for CHD. The topologic information in a pathway may be informative to identify functionally interrelated genes and help improve statistical power in DNV studies. Under the hypothesis that connected genes in PPI networks are more likely to share similar disease association status, we developed a novel statistical model that can leverage information from publicly available PPI databases. Through simulation studies under multiple settings, we proved our method can increase statistical power in identifying additional risk genes compared to methods without using the PPI network information. We then applied our method to a real example for CHD DNV data, and then visualized the subnetwork of candidate genes to find potential functional gene clusters for CHD.
|
40
|
Zhong J, Zhou W, Kang J, Fang Z, Xie M, Xiao Q, Peng W. DNRLCNN: A CNN Framework for Identifying MiRNA-Disease Associations Using Latent Feature Matrix Extraction with Positive Samples. Interdiscip Sci 2022; 14:607-622. [PMID: 35428965 DOI: 10.1007/s12539-022-00509-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/09/2021] [Revised: 02/24/2022] [Accepted: 03/01/2022] [Indexed: 06/14/2023]
Abstract
Emerging evidence indicates that miRNAs have strong relationships with many human diseases. Investigating these associations will help elucidate the activities of miRNAs and pathogenesis mechanisms, and provide new opportunities for disease diagnosis and drug discovery. Therefore, it is important to identify potential associations between miRNAs and diseases. The existing databases of miRNA-disease associations (MDAs) only provide the known MDAs, which can be regarded as positive samples. However, the unknown MDAs cannot be regarded as reliable negative samples. To deal with this uncertainty, we proposed a convolutional neural network (CNN) framework, named DNRLCNN, based on a latent feature matrix extracted from positive samples only, to predict MDAs. First, by including only the positive samples in the calculation process, we captured the latent feature matrix for complex interactions between miRNAs and diseases in a low-dimensional space. Then, we constructed a feature vector for each miRNA and disease pair based on the feature representation. Finally, we applied a modified CNN to the feature vector to predict MDAs. As a result, our model achieves better performance than other state-of-the-art CNN-based methods in fivefold cross-validation on both the miRNA-disease association prediction task (average AUC of 0.9030) and the miRNA-phenotype association prediction task (average AUC of 0.9442). In addition, we carried out case studies on two human diseases: all of the top-50 predicted miRNAs for lung neoplasms are confirmed by the HMDD v3.2 and dbDEMC 2.0 databases, and 98% of the top-50 predicted miRNAs for heart failure are confirmed. The experimental results show that our model has the capability of inferring potential disease-related miRNAs.
Affiliation(s)
- Jiancheng Zhong
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
| | - Wubin Zhou
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China
| | - Jiedong Kang
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China
| | - Zhuo Fang
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China
| | - Minzhu Xie
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China
| | - Qiu Xiao
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China.
| | - Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500, China.
| |
|
41
|
Sonawane AR, Aikawa E, Aikawa M. Connections for Matters of the Heart: Network Medicine in Cardiovascular Diseases. Front Cardiovasc Med 2022; 9:873582. [PMID: 35665246 PMCID: PMC9160390 DOI: 10.3389/fcvm.2022.873582] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/11/2022] [Accepted: 04/19/2022] [Indexed: 01/18/2023]
Abstract
Cardiovascular diseases (CVD) are diverse disorders affecting the heart and vasculature in millions of people worldwide. Like other fields, CVD research has benefitted from the deluge of multiomics biomedical data. Current CVD research focuses on disease etiologies and mechanisms, identifying disease biomarkers, developing appropriate therapies and drugs, and stratifying patients into correct disease endotypes. Systems biology offers an alternative to traditional reductionist approaches and provides impetus for a comprehensive outlook toward diseases. As a focus area, network medicine specifically aids the translational aspect of in silico research. This review discusses the approach of network medicine and its application to CVD research.
Affiliation(s)
- Abhijeet Rajendra Sonawane
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Center for Excellence in Vascular Biology, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
| | - Elena Aikawa
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Center for Excellence in Vascular Biology, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
| | - Masanori Aikawa
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Center for Excellence in Vascular Biology, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
| |
|
42
|
Gupta C, Chandrashekar P, Jin T, He C, Khullar S, Chang Q, Wang D. Bringing machine learning to research on intellectual and developmental disabilities: taking inspiration from neurological diseases. J Neurodev Disord 2022; 14:28. [PMID: 35501679 PMCID: PMC9059371 DOI: 10.1186/s11689-022-09438-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/19/2021] [Accepted: 04/07/2022] [Indexed: 12/31/2022]
Abstract
Intellectual and Developmental Disabilities (IDDs), such as Down syndrome, Fragile X syndrome, Rett syndrome, and autism spectrum disorder, usually manifest at birth or early childhood. IDDs are characterized by significant impairment in intellectual and adaptive functioning, and both genetic and environmental factors underpin IDD biology. Molecular and genetic stratification of IDDs remain challenging mainly due to overlapping factors and comorbidity. Advances in high throughput sequencing, imaging, and tools to record behavioral data at scale have greatly enhanced our understanding of the molecular, cellular, structural, and environmental basis of some IDDs. Fueled by the "big data" revolution, artificial intelligence (AI) and machine learning (ML) technologies have brought a whole new paradigm shift in computational biology. Evidently, the ML-driven approach to clinical diagnoses has the potential to augment classical methods that use symptoms and external observations, hoping to push the personalized treatment plan forward. Therefore, integrative analyses and applications of ML technology have a direct bearing on discoveries in IDDs. The application of ML to IDDs can potentially improve screening and early diagnosis, advance our understanding of the complexity of comorbidity, and accelerate the identification of biomarkers for clinical research and drug development. For more than five decades, the IDDRC network has supported a nexus of investigators at centers across the USA, all striving to understand the interplay between various factors underlying IDDs. In this review, we introduced fast-increasing multi-modal data types, highlighted example studies that employed ML technologies to illuminate factors and biological mechanisms underlying IDDs, as well as recent advances in ML technologies and their applications to IDDs and other neurological diseases. 
We discussed various molecular, clinical, and environmental data collection modes, including genetic, imaging, phenotypical, and behavioral data types, along with multiple repositories that store and share such data. Furthermore, we outlined some fundamental concepts of machine learning algorithms and presented our opinion on specific gaps that will need to be filled to accomplish, for example, reliable implementation of ML-based diagnosis technology in IDD clinics. We anticipate that this review will guide researchers to formulate AI and ML-based approaches to investigate IDDs and related conditions.
Affiliation(s)
- Chirag Gupta
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Pramod Chandrashekar
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Ting Jin
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Chenfeng He
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Saniya Khullar
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Qiang Chang
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Medical Genetics, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Neurology, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, 53705, USA
| | - Daifeng Wang
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA.
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA.
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA.
| |
|
43
|
Wang Y, Juan L, Peng J, Wang T, Zang T, Wang Y. Explore potential disease related metabolites based on latent factor model. BMC Genomics 2022; 23:269. [PMID: 35387615 PMCID: PMC8985251 DOI: 10.1186/s12864-022-08504-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/24/2022] [Accepted: 03/25/2022] [Indexed: 11/17/2022]
Abstract
Background: In biological systems, metabolomics can not only contribute to the discovery of metabolic signatures for disease diagnosis, but also help illustrate the underlying molecular disease-causing mechanisms. Therefore, identification of disease-related metabolites is of great significance for comprehensively understanding the pathogenesis of diseases and improving clinical medicine. Results: In this paper, we propose a disease and literature driven metabolism prediction model (DLMPM) to identify the potential associations between metabolites and diseases based on a latent factor model. We build the disease glossary with disease terms from different databases and an association matrix based on the mapping between diseases and metabolites. The similarity of diseases and metabolites is used to complete the association matrix. Finally, we predict potential associations between metabolites and diseases based on the matrix decomposition method. In total, 1,406 direct associations between diseases and metabolites are found. There are 119,206 unknown associations between diseases and metabolites predicted with a coverage rate of 80.88%. Subsequently, we extract training sets and testing sets based on data increment from the database of disease-related metabolites and assess the performance of DLMPM on 19 diseases. As a result, DLMPM is shown to be successful in predicting potential metabolic signatures for human diseases with an average AUC value of 82.33%. Conclusion: In this paper, a computational model is proposed for exploring metabolite-disease pairs and has good performance in predicting potential metabolites related to diseases through adequate validation. The results show that DLMPM performs better in prioritizing candidate disease-related metabolites than previous methods and would be helpful for researchers to reveal more information about human diseases.
Affiliation(s)
- Yongtian Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China; Key Laboratory of Big Data Storage and Management Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xi'an, China.
| | - Liran Juan
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China; Key Laboratory of Big Data Storage and Management Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xi'an, China
| | - Tao Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China; Key Laboratory of Big Data Storage and Management Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xi'an, China
| | - Tianyi Zang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
| |
|
44
|
Erdogan F, Radu TB, Orlova A, Qadree AK, de Araujo ED, Israelian J, Valent P, Mustjoki SM, Herling M, Moriggl R, Gunning PT. JAK-STAT core cancer pathway: An integrative cancer interactome analysis. J Cell Mol Med 2022; 26:2049-2062. [PMID: 35229974 PMCID: PMC8980946 DOI: 10.1111/jcmm.17228] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 10/25/2021] [Revised: 12/14/2021] [Accepted: 12/22/2021] [Indexed: 12/25/2022]
Abstract
Through a comprehensive review and in silico analysis of reported data on STAT-linked diseases, we analysed the communication pathways and interactome of the seven STATs in major cancer categories and proposed rational targeting approaches for therapeutic intervention to disrupt critical pathways and addictions to hyperactive JAK/STAT in neoplastic states. Although all STATs follow a similar molecular activation pathway, STAT1, STAT2, STAT4 and STAT6 exert specific biological profiles associated with a more restricted pattern of activation by cytokines. STAT3 and STAT5A as well as STAT5B have pleiotropic roles in the body and can act as critical oncogenes that promote many processes involved in cancer development. STAT1, STAT3 and STAT5 also possess tumour suppressive action in certain mutational and cancer type contexts. Here, we demonstrated member-specific STAT activity in major cancer types. Through systems biology approaches, we found surprising roles for EGFR family members and for the interplay of the sex steroid hormone receptor ESR1 with oncogenic STAT function, and proposed new drug targeting approaches for oncogenic STAT pathway addiction.
Affiliation(s)
- Fettah Erdogan
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
| | - Tudor Bogdan Radu
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
| | - Anna Orlova
- Institute of Animal Breeding and Genetics, University of Veterinary Medicine, Vienna, Austria
| | - Abdul Khawazak Qadree
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
| | - Elvin Dominic de Araujo
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, Ontario, Canada
| | - Johan Israelian
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
| | - Peter Valent
- Division of Hematology and Hemostaseology, Department of Internal Medicine I, Medical University of Vienna, Vienna, Austria
- Ludwig Boltzmann Institute for Hematology and Oncology, Medical University of Vienna, Vienna, Austria
| | - Satu M. Mustjoki
- Translational Immunology Research Program and Department of Clinical Chemistry and Hematology, University of Helsinki, Helsinki, Finland
- Hematology Research Unit, Helsinki University Hospital Comprehensive Cancer Center, Helsinki, Finland
- iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland
| | - Marco Herling
- Department of Hematology, Cellular Therapy, and Hemostaseology, University of Leipzig, Leipzig, Germany
| | - Richard Moriggl
- Institute of Animal Breeding and Genetics, University of Veterinary Medicine, Vienna, Austria
| | - Patrick Thomas Gunning
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
| |
|
45
|
García-Cárdenas JM, Armendáriz-Castillo I, Pérez-Villa A, Indacochea A, Jácome-Alvarado A, López-Cortés A, Guerrero S. Integrated In Silico Analyses Identify PUF60 and SF3A3 as New Spliceosome-Related Breast Cancer RNA-Binding Proteins. BIOLOGY 2022; 11:biology11040481. [PMID: 35453681 PMCID: PMC9030152 DOI: 10.3390/biology11040481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Revised: 03/12/2022] [Accepted: 03/18/2022] [Indexed: 12/24/2022]
Abstract
More women are diagnosed with breast cancer (BC) than any other type of cancer. Although large-scale efforts have completely redefined cancer, a cure remains unattainable. In that respect, new molecular functions of the cell should be investigated, such as post-transcriptional regulation. RNA-binding proteins (RBPs) are emerging as critical post-transcriptional modulators of tumorigenesis, but only a few have clear roles in BC. To recognize new putative breast cancer RNA-binding proteins, we performed integrated in silico analyses of all human RBPs (n = 1392) in three major cancer databases and identified five putative BC RBPs (PUF60, TFRC, KPNB1, NSF, and SF3A3), which showed robust oncogenic features related to their genomic alterations, immunohistochemical changes, high interconnectivity with cancer driver genes (CDGs), and tumor vulnerabilities. Interestingly, some of these RBPs have never been studied in BC, but their oncogenic functions have been described in other cancer types. Subsequent analyses revealed PUF60 and SF3A3 as central elements of a spliceosome-related cluster involving RBPs and CDGs. Further research should focus on the mechanisms by which these proteins could promote breast tumorigenesis, with the potential to reveal new therapeutic pathways along with novel drug-development strategies.
Affiliation(s)
- Jennyfer M. García-Cárdenas
- Escuela de Medicina, Facultad de Ciencias Médicas de la Salud y de la Vida, Universidad Internacional del Ecuador, Quito 170113, Ecuador; (J.M.G.-C.); (A.J.-A.)
- Facultade de Ciencias, Universidade da Coruña, 15071 A Coruna, Spain
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), 28001 Madrid, Spain; (I.A.-C.); (A.P.-V.)
| | - Isaac Armendáriz-Castillo
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), 28001 Madrid, Spain; (I.A.-C.); (A.P.-V.)
- Instituto Nacional de Investigación en Salud Pública, Quito 170136, Ecuador
- Facultad de Ingenierías y Ciencias Aplicadas, Universidad Internacional SEK, Quito 170302, Ecuador
| | - Andy Pérez-Villa
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), 28001 Madrid, Spain; (I.A.-C.); (A.P.-V.)
| | - Alberto Indacochea
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, 08003 Barcelona, Spain;
| | - Andrea Jácome-Alvarado
- Escuela de Medicina, Facultad de Ciencias Médicas de la Salud y de la Vida, Universidad Internacional del Ecuador, Quito 170113, Ecuador; (J.M.G.-C.); (A.J.-A.)
| | - Andrés López-Cortés
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), 28001 Madrid, Spain; (I.A.-C.); (A.P.-V.)
- Programa de Investigación en Salud Global, Facultad de Ciencias de la Salud, Universidad Internacional SEK, Quito 170302, Ecuador
- Facultad de Medicina, Universidad de Las Américas, Quito 170124, Ecuador
- Correspondence: (A.L.-C.); (S.G.)
| | - Santiago Guerrero
- Escuela de Medicina, Facultad de Ciencias Médicas de la Salud y de la Vida, Universidad Internacional del Ecuador, Quito 170113, Ecuador; (J.M.G.-C.); (A.J.-A.)
- Facultade de Ciencias, Universidade da Coruña, 15071 A Coruna, Spain
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), 28001 Madrid, Spain; (I.A.-C.); (A.P.-V.)
- Correspondence: (A.L.-C.); (S.G.)
| |
|
46
|
Shi X, Teng H, Shi L, Bi W, Wei W, Mao F, Sun Z. Comprehensive evaluation of computational methods for predicting cancer driver genes. Brief Bioinform 2022; 23:bbab548. [PMID: 35037014 PMCID: PMC8921613 DOI: 10.1093/bib/bbab548] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/21/2021] [Revised: 11/19/2021] [Accepted: 11/29/2021] [Indexed: 12/17/2022]
Abstract
Optimal methods could effectively improve the accuracy of predicting and identifying candidate driver genes. Various computational methods based on mutational frequency, network and function approaches have been developed to identify mutation driver genes in cancer genomes. However, a comprehensive evaluation of the performance levels of network-, function- and frequency-based methods is lacking. In the present study, we assessed and compared eight performance criteria for eight network-based, one function-based and three frequency-based algorithms using eight benchmark datasets. Under different conditions, the performance of approaches varied in terms of network, measurement and sample size. The frequency-based driverMAPS and network-based HotNet2 methods showed the best overall performance. Network-based algorithms using protein-protein interaction networks outperformed the function- and the frequency-based approaches. Precision, F1 score and Matthews correlation coefficient were low for most approaches. Thus, most of these algorithms require stringent cutoffs to correctly distinguish driver and non-driver genes. We constructed a website named Cancer Driver Catalog (http://159.226.67.237/sun/cancer_driver/), wherein we integrated the gene scores predicted by the foregoing software programs. This resource provides valuable guidance for cancer researchers and clinical oncologists prioritizing cancer driver gene candidates by using an optimal tool.
Affiliation(s)
- Xiaohui Shi
- Beijing Institutes of Life Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing 100080, China
- Huajing Teng
- Department of Radiation Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education) at Peking University Cancer Hospital and Institute, Beijing 100080, China
- Leisheng Shi
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100080, China
- Wenjian Bi
- Department of Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing 100080, China
- Wenqing Wei
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100080, China
- Fengbiao Mao
- Institute of Medical Innovation and Research, Peking University Third Hospital, Beijing 100080, China
- Zhongsheng Sun
- Beijing Institutes of Life Science, Chinese Academy of Sciences, CAS Center for Excellence in Biotic Interactions and State Key Laboratory of Integrated Management of Pest Insects and Rodents, University of Chinese Academy of Sciences, Institute of Genomic Medicine, Wenzhou Medical University, IBMC-BGI Center, the Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Beijing 100080, China
|
47
|
Yu L, Zheng Y, Ju B, Ao C, Gao L. Research progress of miRNA-disease association prediction and comparison of related algorithms. Brief Bioinform 2022; 23:6542222. [PMID: 35246678 DOI: 10.1093/bib/bbac066] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/25/2021] [Revised: 01/30/2022] [Accepted: 02/08/2022] [Indexed: 11/13/2022]
Abstract
With an in-depth understanding of noncoding ribonucleic acid (RNA), many studies have shown that microRNA (miRNA) plays an important role in human diseases. Because traditional biological experiments are time-consuming and laborious, new calculation methods have recently been developed to predict associations between miRNA and diseases. In this review, we collected various miRNA-disease association prediction models proposed in recent years and used two common data sets to evaluate the performance of the prediction models. First, we systematically summarized the commonly used databases and similarity data for predicting miRNA-disease associations, and then divided the various calculation models into four categories for summary and detailed introduction. In this study, two independent datasets (D5430 and D6088) were compiled to systematically evaluate 11 publicly available prediction tools for miRNA-disease associations. The experimental results indicate that the methods based on information dissemination and the method based on scoring function require shorter running time. The method based on matrix transformation often requires a longer running time, but the overall prediction result is better than the previous two methods. We hope that the summary of work related to miRNA and disease will provide comprehensive knowledge for predicting the relationship between miRNA and disease and contribute to advanced computation tools in the future.
Affiliation(s)
- Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Yujia Zheng
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Bingyi Ju
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Chunyan Ao
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, China
|
48
|
Ma Y. DeepMNE: Deep Multi-network Embedding for lncRNA-Disease Association prediction. IEEE J Biomed Health Inform 2022; 26:3539-3549. [PMID: 35180094 DOI: 10.1109/jbhi.2022.3152619] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 11/06/2022]
Abstract
Long non-coding RNA (lncRNA) participates in various biological processes, hence its mutations and disorders play an important role in the pathogenesis of multiple human diseases. Identifying disease-related lncRNAs is crucial for the diagnosis, prevention, and treatment of diseases. Although a large number of computational approaches have been developed, effectively integrating multi-omics data and accurately predicting potential lncRNA-disease associations remains a challenge, especially regarding new lncRNAs and new diseases. In this work, we propose a new method with deep multi-network embedding, called DeepMNE, to discover potential lncRNA-disease associations, especially for novel diseases and lncRNAs. DeepMNE extracts multi-omics data to describe diseases and lncRNAs, and proposes a network fusion method based on deep learning to integrate multi-source information. Moreover, DeepMNE complements the sparse association network and uses kernel neighborhood similarity to construct disease similarity and lncRNA similarity networks. Furthermore, a graph embedding method is adopted to predict potential associations. Experimental results demonstrate that compared to other state-of-the-art methods, DeepMNE has a higher predictive performance on new associations, new lncRNAs and new diseases. Besides, DeepMNE also elicits a considerable predictive performance on perturbed datasets. Additionally, the results of two different types of case studies indicate that DeepMNE can be used as an effective tool for disease-related lncRNA prediction. The code of DeepMNE is freely available at https://github.com/Mayingjun20179/DeepMNE.
|
49
|
Genome-Wide Association Study of Root System Architecture in Maize. Genes (Basel) 2022; 13:genes13020181. [PMID: 35205226 PMCID: PMC8872597 DOI: 10.3390/genes13020181] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/06/2022] [Revised: 01/16/2022] [Accepted: 01/18/2022] [Indexed: 01/05/2023]
Abstract
Roots are important plant organs for the absorption of water and nutrients. To date, there have been few genome-wide association studies of maize root system architecture (RSA) in the field. The genetic basis of maize RSA is poorly understood, and the maize RSA-related genes that have been cloned are very limited. Here, 421 maize inbred lines of an association panel were planted to measure the root systems at the maturity stage, and a genome-wide association study was performed. There was a strong correlation among eight RSA traits, and the RSA traits were highly correlated with the aboveground plant architecture traits (e.g., plant height and ear leaf length, r = 0.13–0.25, p < 0.05). The RSA traits of the stiff stalk subgroup (SS) showed lower values than those of the non-stiff stalk subgroup (NSS) and tropical/subtropical subgroup (TST). Using the RSA traits, the genome-wide association study identified 63 SNPs and 189 candidate genes. Among them, nine candidate genes co-localized between RSA and aboveground architecture traits. A further co-expression analysis identified 88 candidate genes having high confidence levels. Furthermore, we identified four highly reliable RSA candidate genes, GRMZM2G099797, GRMZM2G354338, GRMZM2G085042, and GRMZM5G812926. This research provides theoretical support for the genetic improvement of maize root systems, and it identified candidate genes that may act as genetic resources for breeding.
|
50
|
Williams EG, Pfister N, Roy S, Statzer C, Haverty J, Ingels J, Bohl C, Hasan M, Čuklina J, Bühlmann P, Zamboni N, Lu L, Ewald CY, Williams RW, Aebersold R. Multiomic profiling of the liver across diets and age in a diverse mouse population. Cell Syst 2022; 13:43-57.e6. [PMID: 34666007 PMCID: PMC8776606 DOI: 10.1016/j.cels.2021.09.005] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/24/2021] [Revised: 07/12/2021] [Accepted: 09/14/2021] [Indexed: 01/21/2023]
Abstract
We profiled the liver transcriptome, proteome, and metabolome in 347 individuals from 58 isogenic strains of the BXD mouse population across age (7 to 24 months) and diet (low or high fat) to link molecular variations to metabolic traits. Several hundred genes are affected by diet and/or age at the transcript and protein levels. Orthologs of two aging-associated genes, St7 and Ctsd, were knocked down in C. elegans, reducing longevity in wild-type and mutant long-lived strains. The multiomics data were analyzed as segregating gene networks according to each independent variable, providing causal insight into dietary and aging effects. Candidates were cross-examined in an independent diversity outbred mouse liver dataset segregating for similar diets, with ∼80%-90% of diet-related candidate genes found in common across datasets. Together, we have developed a large multiomics resource for multivariate analysis of complex traits and demonstrate a methodology for moving from observational associations to causal connections.
Affiliation(s)
- Evan G Williams
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg.
- Niklas Pfister
- Department of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark
- Suheeta Roy
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Cyril Statzer
- Department of Health Sciences and Technology, ETH Zürich, Zurich, Switzerland
- Jack Haverty
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Jesse Ingels
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Casey Bohl
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Moaraj Hasan
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zurich, Switzerland
- Jelena Čuklina
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zurich, Switzerland
- Peter Bühlmann
- Department of Mathematics, Seminar for Statistics, ETH Zürich, Zurich, Switzerland
- Nicola Zamboni
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zurich, Switzerland
- Lu Lu
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Collin Y Ewald
- Department of Health Sciences and Technology, ETH Zürich, Zurich, Switzerland
- Robert W Williams
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zurich, Switzerland; Faculty of Science, University of Zürich, Zurich, Switzerland
|