1
Wang Y, Zhou B, Ru J, Meng X, Wang Y, Liu W. Advances in computational methods for identifying cancer driver genes. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:21643-21669. [PMID: 38124614 DOI: 10.3934/mbe.2023958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Cancer driver genes (CDGs) are crucial in cancer prevention, diagnosis and treatment. This study employed computational methods for identifying CDGs, categorizing them into four groups. The major frameworks for each of these four categories were summarized. Additionally, we systematically gathered data from public databases and biological networks, and we elaborated on computational methods for identifying CDGs using the aforementioned databases. Further, we summarized the algorithms, mainly involving statistics and machine learning, used for identifying CDGs. Notably, the performances of nine typical identification methods for eight types of cancer were compared to analyze the applicability areas of these methods. Finally, we discussed the challenges and prospects associated with methods for identifying CDGs. The present study revealed that the network-based algorithms and machine learning-based methods demonstrated superior performance.
Affiliation(s)
- Ying Wang
  - School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Bohao Zhou
  - School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Jidong Ru
  - School of Textile Garment and Design, Changshu Institute of Technology, Changshu 215500, China
- Xianglian Meng
  - School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
- Yundong Wang
  - School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Wenjie Liu
  - School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
2
Mirzaei G. Constructing gene similarity networks using co-occurrence probabilities. BMC Genomics 2023; 24:697. [PMID: 37990157 PMCID: PMC10662556 DOI: 10.1186/s12864-023-09780-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Accepted: 11/01/2023] [Indexed: 11/23/2023] Open
Abstract
Gene similarity networks play an important role in unraveling the intricate associations within diverse cancer types. Conventionally, gauging the similarity between genes has been approached through experimental methodologies involving chemical and molecular analyses, or through the lens of mathematical techniques. However, in our work, we have pioneered a distinctive mathematical framework, one rooted in the co-occurrence of attribute values and single point mutations, thereby establishing a novel approach for quantifying the dissimilarity or similarity among genes. Central to our approach is the recognition of mutations as key players in the evolutionary trajectory of cancer. Anchored in this understanding, our methodology hinges on the consideration of two categorical attributes: mutation type and nucleotide change. These attributes are pivotal, as they encapsulate the critical variations that can precipitate substantial changes in gene behavior and ultimately influence disease progression. Our study takes on the challenge of formulating similarity measures that are intrinsic to genes' categorical data. Taking into account the co-occurrence probability of attribute values within single point mutations, our innovative mathematical approach surpasses the boundaries of conventional methods. We thereby provide a robust and comprehensive means to assess gene similarity and take a significant step forward in refining the tools available for uncovering the subtle yet impactful associations within the complex realm of gene interactions in cancer.
Affiliation(s)
- Golrokh Mirzaei
  - Department of Computer Science and Engineering, The Ohio State University, Marion, USA
3
Demir Karaman E, Işık Z. Multi-Omics Data Analysis Identifies Prognostic Biomarkers across Cancers. Med Sci (Basel) 2023; 11:44. [PMID: 37489460 PMCID: PMC10366886 DOI: 10.3390/medsci11030044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 06/18/2023] [Accepted: 06/20/2023] [Indexed: 07/26/2023] Open
Abstract
Combining omics data from different layers using integrative methods provides a better understanding of the biology of a complex disease such as cancer. The discovery of biomarkers related to cancer development or prognosis helps to find more effective treatment options. This study integrates multi-omics data of different cancer types with a network-based approach to explore common gene modules among different tumors by running community detection methods on the integrated network. The common modules were evaluated by several biological metrics adapted to cancer. Then, a new prognostic scoring method was developed by weighting mRNA expression, methylation, and mutation status of genes. The survival analysis yielded statistically significant results for the GNG11, CBX2, CDKN3, ARHGEF10, CLN8, SEC61G and PTDSS1 genes. The literature search reveals that the identified biomarkers are associated with the same or different types of cancers. Our method not only identifies known cancer-specific biomarker genes but also proposes new potential biomarkers. Thus, this study provides a rationale for identifying new gene targets and expanding treatment options across cancer types.
Affiliation(s)
- Ezgi Demir Karaman
  - Department of Computer Engineering, Institute of Natural and Applied Sciences, Dokuz Eylul University, Izmir 35390, Turkey
- Zerrin Işık
  - Department of Computer Engineering, Faculty of Engineering, Dokuz Eylul University, Izmir 35390, Turkey
4
Shah E, Maji P. Multi-View Kernel Learning for Identification of Disease Genes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2278-2290. [PMID: 37027602 DOI: 10.1109/tcbb.2023.3247033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Gene expression data sets and protein-protein interaction (PPI) networks are two heterogeneous data sources that have been extensively studied, due to their ability to capture the co-expression patterns among genes and their topological connections. Although they depict different traits of the data, both of them tend to group co-functional genes together. This phenomenon agrees with the basic assumption of multi-view kernel learning, according to which different views of the data contain a similar inherent cluster structure. Based on this inference, a new multi-view kernel learning based disease gene identification algorithm, termed as DiGId, is put forward. A novel multi-view kernel learning approach is proposed that aims to learn a consensus kernel, which efficiently captures the heterogeneous information of individual views as well as depicts the underlying inherent cluster structure. Some low-rank constraints are imposed on the learned multi-view kernel, so that it can effectively be partitioned into k or fewer clusters. The learned joint cluster structure is used to curate a set of potential disease genes. Moreover, a novel approach is put forward to quantify the importance of each view. In order to demonstrate the effectiveness of the proposed approach in capturing the relevant information depicted by individual views, an extensive analysis is performed on four different cancer-related gene expression data sets and PPI network, considering different similarity measures.
5
Wang Y, Tang S, Ma R, Zamit I, Wei Y, Pan Y. Multi-modal intermediate integrative methods in neuropsychiatric disorders: A review. Comput Struct Biotechnol J 2022; 20:6149-6162. [PMID: 36420153 PMCID: PMC9674886 DOI: 10.1016/j.csbj.2022.11.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/06/2022] [Revised: 11/04/2022] [Accepted: 11/04/2022] [Indexed: 11/09/2022] Open
Abstract
The etiology of neuropsychiatric disorders involves complex biological processes at different omics layers, such as genomics, transcriptomics, epigenetics, proteomics, and metabolomics. The advent of high-throughput technology, as well as the availability of large open-source datasets, has ushered in a new era in systems biology, necessitating the integration of various types of omics data. The complexity of biological mechanisms, the limitations of integrative strategies, and the heterogeneity of multi-omics data have all presented significant challenges to computational scientists. In comparison to early and late integration, intermediate integration may transform each data type into appropriate intermediate representations using various data transformation techniques, allowing it to capture more complementary information contained in each omics layer and highlight new interactions across omics layers. Here, we reviewed multi-modal intermediate integrative techniques based on component analysis, matrix factorization, similarity network, multiple kernel learning, Bayesian network, artificial neural networks, and graph transformation, as well as their applications in neuropsychiatric domains. We depicted advancements in these approaches and compared the strengths and weaknesses of each method examined. We believe that our findings will aid researchers in their understanding of the transformation and integration of multi-omics data in neuropsychiatric disorders.
Affiliation(s)
- Yanlin Wang
  - Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
- Shi Tang
  - Li Chiu Kong Family Sleep Assessment Unit, Department of Psychiatry, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong Special Administrative Region
- Ruimin Ma
  - Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
- Ibrahim Zamit
  - Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
  - University of Chinese Academy of Sciences, Beijing, China
- Yanjie Wei
  - Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
- Yi Pan
  - Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
6
Predicting Parkinson disease related genes based on PyFeat and gradient boosted decision tree. Sci Rep 2022; 12:10004. [PMID: 35705654 PMCID: PMC9200794 DOI: 10.1038/s41598-022-14127-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2022] [Accepted: 06/01/2022] [Indexed: 11/10/2022] Open
Abstract
Identifying genes related to Parkinson’s disease (PD) is an active research topic in biomedical analysis, which plays a critical role in diagnosis and treatment. Recently, many studies have proposed different techniques for predicting disease-related genes. However, few of these techniques are designed or developed for PD gene prediction. Most of these PD techniques are developed to identify only protein genes and discard long noncoding RNA (lncRNA) genes, which play an essential role in biological processes and the transformation and development of diseases. This paper proposes a novel prediction system to identify protein and lncRNA genes related to PD that can aid in an early diagnosis. First, we preprocessed the genes into DNA FASTA sequences from the University of California Santa Cruz (UCSC) genome browser and removed the redundancies. Second, we extracted some significant features of DNA FASTA sequences using the PyFeat method with AdaBoost as feature selection. These selected features achieved promising results compared with features extracted by some state-of-the-art feature extraction techniques. Finally, the features were fed to the gradient-boosted decision tree (GBDT) to diagnose different tested cases. Seven performance metrics were used to evaluate the performance of the proposed system. The proposed system achieved an average accuracy of 78.6%, an area under the curve (AUC) of 84.5%, an area under the precision-recall curve (AUPR) of 85.3%, an F1-score of 78.3%, a Matthews correlation coefficient (MCC) of 0.575, a sensitivity (SEN) of 77.1%, and a specificity (SPC) of 80.2%. The experiments demonstrate promising results compared with other systems. The predicted top-rank protein and lncRNA genes are verified based on a literature review.
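The threshold-based metrics named in this abstract (accuracy, sensitivity, specificity, F1-score, MCC) all derive from a binary confusion matrix. The sketch below shows the standard formulas only; the function name `binary_metrics` and the counts are hypothetical and are not taken from the paper, whose confusion matrices are not given here.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # The MCC denominator is zero when any marginal total is zero; return 0.0 then.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts chosen for illustration; not data from the study.
metrics = binary_metrics(tp=77, fp=20, tn=80, fn=23)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC and AUPR are not covered by this sketch because they require ranked prediction scores rather than a single confusion matrix.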
7
Zhang Y, Duan L, Zheng H, Li-Ling J, Qin R, Chen Z, He C, Wang T. Mining Similar Aspects for Gene Similarity Explanation Based on Gene Information Network. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1734-1746. [PMID: 33259307 DOI: 10.1109/tcbb.2020.3041559] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/12/2023]
Abstract
Analysis of gene similarity not only can provide information on the understanding of the biological roles and functions of a gene, but may also reveal the relationships among various genes. In this paper, we introduce a novel idea of mining similar aspects from a gene information network, i.e., for a given gene pair, we want to know in which aspects (meta paths) they are most similar from the perspective of the gene information network. We defined a similarity metric based on the set of meta paths connecting the query genes in the gene information network and used the rank of similarity of a gene pair in a meta path set to measure the similarity significance in that aspect. A minimal set of gene meta paths where the query gene pair ranks the highest is a similar aspect, and the similar aspect of a query gene pair is far from trivial. We proposed a novel method, SCENARIO, to investigate minimal similar aspects. Our empirical study on the gene information network, constructed from six public gene-related databases, verified that our proposed method is effective, efficient, and useful.
8
Shah E, Maji P. Scalable Non-Linear Graph Fusion for Prioritizing Cancer-Causing Genes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1130-1143. [PMID: 32966220 DOI: 10.1109/tcbb.2020.3026219] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/11/2023]
Abstract
In the past few decades, both gene expression data and protein-protein interaction (PPI) networks have been extensively studied, due to their ability to depict important characteristics of disease-associated genes. In this regard, the paper presents a new gene prioritization algorithm to identify and prioritize cancer-causing genes, judiciously integrating the complementary information obtained from the two data sources. The proposed algorithm selects disease-causing genes by maximizing the importance of selected genes and the functional similarity among them. A new quantitative index is introduced to evaluate the importance of a gene. It considers whether a gene exhibits a differential expression pattern across sick and healthy individuals and has strong connectivity in the PPI network, both of which are important characteristics of a potential biomarker. As disease-associated genes are expected to have similar expression profiles and topological structures, a scalable non-linear graph fusion technique, termed ScaNGraF, is proposed to learn a disease-dependent functional similarity network from the co-expression and common-neighbor-based similarity networks. The proposed ScaNGraF, which is based on a message passing algorithm, efficiently combines the shared and complementary information provided by different data sources at significantly lower computational cost. A new measure, termed DiCoIN, is introduced to evaluate the quality of a learned affinity network. The performance of the proposed graph fusion technique and gene selection algorithm is extensively compared with that of some existing methods, using several cancer data sets.
9
Acharya S, Cui L, Pan Y. A Refined 3-in-1 Fused Protein Similarity Measure: Application in Threshold-Free Hub Detection. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:192-206. [PMID: 32070994 DOI: 10.1109/tcbb.2020.2973563] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
An exhaustive literature survey shows that finding protein/gene similarity is an important step towards solving widespread bioinformatics problems, such as predicting protein-protein interactions, analyzing Protein-Protein Interaction Networks (PPINs), gene prioritization, and disease gene/protein detection. In this article, we have proposed an improved 3-in-1 fused protein similarity measure called FuSim-II. It is built upon combining the weighted average of biological knowledge extracted from three potential genomic/proteomic resources: Gene Ontology (GO), PPIN, and protein sequence. Furthermore, we have shown the application of the proposed measure in detecting potential hub-proteins from a given PPIN. To that end, we have proposed a multi-objective clustering-based protein hub detection framework with FuSim-II working as the underlying proximity measure. The PPINs of H. sapiens and M. musculus are chosen for experimental purposes. Unlike most of the existing hub-detection methods, the proposed technique does not require any protein degree cut-off or threshold to define hubs. A thorough assessment of efficiency between the proposed measure and eight existing protein similarity measures, along with eight single/multi-objective clustering methods, has been carried out. Internal cluster validity indices such as Silhouette and Davies-Bouldin (DB) are deployed for the analytical study. Also, a comparative performance analysis between the proposed algorithm and five existing hub-protein detection algorithms is conducted through an essentiality enrichment study. The reported results show the improved performance of FuSim-II over existing protein similarity measures in terms of identifying functionally related proteins as well as relevant hub-proteins. Supplementary material is available at http://csse.szu.edu.cn/staff/cuilz/eng/index.html.
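The abstract describes the fused measure as a weighted average over three similarity sources. A minimal sketch of a generic weighted average follows; the function name `fused_similarity`, the equal default weights, and the example scores are assumptions for illustration and do not reproduce the actual FuSim-II weighting scheme, which is not specified here.

```python
def fused_similarity(go_sim, ppi_sim, seq_sim, weights=(1.0, 1.0, 1.0)):
    """Weighted average of three pairwise similarity scores, each assumed to lie in [0, 1].

    go_sim, ppi_sim and seq_sim stand for Gene Ontology, PPI-network and
    sequence-based similarity; the weights are hypothetical.
    """
    w_go, w_ppi, w_seq = weights
    total = w_go + w_ppi + w_seq
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_go * go_sim + w_ppi * ppi_sim + w_seq * seq_sim) / total


# Hypothetical scores for one protein pair.
equal = fused_similarity(0.9, 0.6, 0.3)
go_heavy = fused_similarity(0.9, 0.6, 0.3, weights=(2.0, 1.0, 1.0))
print(equal, go_heavy)
```

Dividing by the weight total keeps the fused score in the same range as its inputs, so it can serve directly as a proximity measure for clustering.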
10
Xiang J, Zhang J, Zheng R, Li X, Li M. NIDM: network impulsive dynamics on multiplex biological network for disease-gene prediction. Brief Bioinform 2021; 22:6236070. [PMID: 33866352 DOI: 10.1093/bib/bbab080] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/12/2021] [Revised: 02/11/2021] [Accepted: 02/21/2021] [Indexed: 12/12/2022] Open
Abstract
The prediction of genes related to diseases is important to the study of the diseases due to high cost and time consumption of biological experiments. Network propagation is a popular strategy for disease-gene prediction. However, existing methods focus on the stable solution of dynamics while ignoring the useful information hidden in the dynamical process, and it is still a challenge to make use of multiple types of physical/functional relationships between proteins/genes to effectively predict disease-related genes. Therefore, we proposed a framework of network impulsive dynamics on multiplex biological network (NIDM) to predict disease-related genes, along with four variants of NIDM models and four kinds of impulsive dynamical signatures (IDSs). NIDM is to identify disease-related genes by mining the dynamical responses of nodes to impulsive signals being exerted at specific nodes. By a series of experimental evaluations in various types of biological networks, we confirmed the advantage of multiplex network and the important roles of functional associations in disease-gene prediction, demonstrated superior performance of NIDM compared with four types of network-based algorithms and then gave the effective recommendations of NIDM models and IDS signatures. To facilitate the prioritization and analysis of (candidate) genes associated to specific diseases, we developed a user-friendly web server, which provides three kinds of filtering patterns for genes, network visualization, enrichment analysis and a wealth of external links (http://bioinformatics.csu.edu.cn/DGP/NID.jsp). NIDM is a protocol for disease-gene prediction integrating different types of biological networks, which may become a very useful computational tool for the study of disease-related genes.
Affiliation(s)
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, Hunan, China
- Jiashuai Zhang
  - School of Computer Science and Engineering, Central South University, Hunan, China
- Ruiqing Zheng
  - School of Computer Science and Engineering, Central South University, China
- Xingyi Li
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, China
11
Joodaki M, Ghadiri N, Maleki Z, Lotfi Shahreza M. A scalable random walk with restart on heterogeneous networks with Apache Spark for ranking disease-related genes through type-II fuzzy data fusion. J Biomed Inform 2021; 115:103688. [PMID: 33545331 DOI: 10.1016/j.jbi.2021.103688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/02/2020] [Revised: 01/10/2021] [Accepted: 01/23/2021] [Indexed: 12/11/2022]
Abstract
One of the effective missions of biology and medical science is to find disease-related genes. Recent research uses gene/protein networks to find such genes. Due to false positive interactions in these networks, the results often are not accurate and reliable. Integrating multiple gene/protein networks could overcome this drawback, causing a network with fewer false positive interactions. The integration method plays a crucial role in the quality of the constructed network. In this paper, we integrate several sources to build a reliable heterogeneous network, i.e., a network that includes nodes of different types. Due to the different gene/protein sources, four gene-gene similarity networks are constructed first and integrated by applying the type-II fuzzy voter scheme. The resulting gene-gene network is linked to a disease-disease similarity network (as the outcome of integrating four sources) through a two-part disease-gene network. We propose a novel algorithm, namely random walk with restart on the heterogeneous network method with fuzzy fusion (RWRHN-FF). Through running RWRHN-FF over the heterogeneous network, disease-related genes are determined. Experimental results using the leave-one-out cross-validation indicate that RWRHN-FF outperforms existing methods. The proposed algorithm can be applied to find new genes for prostate, breast, gastric, and colon cancers. Since the RWRHN-FF algorithm converges slowly on large heterogeneous networks, we propose a parallel implementation of the RWRHN-FF algorithm on the Apache Spark platform for high-throughput and reliable network inference. Experiments run on heterogeneous networks of different sizes indicate faster convergence compared to other non-distributed modes of implementation.
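Random walk with restart, the propagation step this abstract builds on, can be sketched on a single small network. The code below is a plain, single-machine illustration under stated assumptions: the toy graph, the node names, and the restart probability of 0.3 are hypothetical, and the sketch includes neither the heterogeneous gene-disease network, nor the type-II fuzzy fusion, nor the Apache Spark parallelization of RWRHN-FF.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until the L1 change drops below tol.

    adjacency: dict mapping node -> {neighbour: edge weight}; assumed symmetric,
    with every node present as a key and no isolated nodes.
    seeds: set of known disease genes where the walker restarts.
    """
    nodes = sorted(adjacency)
    degree = {u: sum(adjacency[u].values()) for u in nodes}
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            if degree[u] == 0:
                continue  # an isolated node has nowhere to send its probability mass
            for v, w in adjacency[u].items():
                # Move from u to v with probability proportional to the edge weight.
                nxt[v] += (1.0 - restart) * p[u] * w / degree[u]
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network with edges A-B, A-C, B-C, C-D and unit weights.
graph = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
scores = random_walk_with_restart(graph, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Candidate genes are then ranked by their steady-state score, so nodes that are close to the seed set in the network rise to the top.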
Affiliation(s)
- Mehdi Joodaki
  - Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
- Nasser Ghadiri
  - Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
- Zeinab Maleki
  - Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
12
Merid SK, Bustamante M, Standl M, Sunyer J, Heinrich J, Lemonnier N, Aguilar D, Antó JM, Bousquet J, Santa-Marina L, Lertxundi A, Bergström A, Kull I, Wheelock ÅM, Koppelman GH, Melén E, Gruzieva O. Integration of gene expression and DNA methylation identifies epigenetically controlled modules related to PM2.5 exposure. ENVIRONMENT INTERNATIONAL 2021; 146:106248. [PMID: 33212358 DOI: 10.1016/j.envint.2020.106248] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 07/02/2020] [Revised: 09/24/2020] [Accepted: 10/25/2020] [Indexed: 05/28/2023]
Abstract
Air pollution has been associated with adverse health effects across the life-course. Although underlying mechanisms are unclear, several studies suggested pollutant-induced changes in transcriptomic profiles. In this meta-analysis of transcriptome-wide association studies of 656 children and adolescents from three European cohorts participating in the MeDALL Consortium, we found two differentially expressed transcript clusters (FDR p < 0.05) associated with exposure to particulate matter < 2.5 µm in diameter (PM2.5) at birth, one of them mapping to the MIR1296 gene. Further, by integrating gene expression with DNA methylation using Functional Epigenetic Modules algorithms, we identified 9 and 6 modules in relation to PM2.5 exposure at birth and at current address, respectively (including NR1I2, MAPK6, TAF8 and SCARA3). In conclusion, PM2.5 exposure at birth was linked to differential gene expression in children and adolescents. Importantly, we identified several significant interactome hotspots of gene modules of relevance for complex diseases in relation to PM2.5 exposure.
Affiliation(s)
- Simon Kebede Merid
  - Department of Clinical Sciences and Education, Karolinska Institutet, Södersjukhuset, Stockholm, Sweden
- Mariona Bustamante
  - ISGlobal, Institute for Global Health, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Spanish Consortium for Research on Epidemiology and Public Health (CIBERESP), Madrid, Spain
- Marie Standl
  - Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Jordi Sunyer
  - ISGlobal, Institute for Global Health, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Spanish Consortium for Research on Epidemiology and Public Health (CIBERESP), Madrid, Spain; IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain
- Joachim Heinrich
  - Institute and Clinic for Occupational, Social and Environmental Medicine, University Hospital, LMU Munich, Ziemssenstraße 1, 80336 Munich, Germany; Allergy and Lung Health Unit, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, Australia
- Nathanaël Lemonnier
  - Institute for Advanced Biosciences, UGA-INSERM U1209-CNRS UMR5309, Allée des Alpes, France
- Daniel Aguilar
  - Biomedical Research Networking Center in Hepatic and Digestive Diseases (CIBEREHD), Instituto de Salud Carlos III, Barcelona, Spain
- Josep Maria Antó
  - ISGlobal, Institute for Global Health, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Spanish Consortium for Research on Epidemiology and Public Health (CIBERESP), Madrid, Spain; IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain
- Jean Bousquet
  - Charité, Universitätsmedizin Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Comprehensive Allergy Center, Department of Dermatology and Allergy, Berlin, Germany; University Hospital, Montpellier, France; MACVIA-France, Montpellier, France
- Loreto Santa-Marina
  - Health Research Institute-BIODONOSTIA, Basque Country, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Spain; Health Department of Basque Government, Sub-directorate of Public Health of Gipuzkoa, 20013 San Sebastian, Spain
- Aitana Lertxundi
  - Health Research Institute-BIODONOSTIA, Basque Country, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Spain; Preventive Medicine and Public Health Department, University of Basque Country (UPV/EHU), Spain
- Anna Bergström
  - Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden; Centre for Occupational and Environmental Medicine, Region Stockholm, Sweden
- Inger Kull
  - Department of Clinical Sciences and Education, Karolinska Institutet, Södersjukhuset, Stockholm, Sweden; Sachs Children's Hospital, Stockholm, Sweden
- Åsa M Wheelock
  - Respiratory Medicine Unit, Department of Medicine and Center for Molecular Medicine, Karolinska Institutet, Solna, Stockholm, Sweden
- Gerard H Koppelman
  - University of Groningen, University Medical Center Groningen, Beatrix Children's Hospital, Department of Pediatric Pulmonology and Pediatric Allergology, Groningen, the Netherlands; University of Groningen, University Medical Center Groningen, Groningen Research Institute for Asthma and COPD (GRIAC), Groningen, the Netherlands
- Erik Melén
  - Department of Clinical Sciences and Education, Karolinska Institutet, Södersjukhuset, Stockholm, Sweden; Sachs Children's Hospital, Stockholm, Sweden
- Olena Gruzieva
  - Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden; Centre for Occupational and Environmental Medicine, Region Stockholm, Sweden
13
Fernando PC, Mabee PM, Zeng E. Integration of anatomy ontology data with protein-protein interaction networks improves the candidate gene prediction accuracy for anatomical entities. BMC Bioinformatics 2020; 21:442. [PMID: 33028186 PMCID: PMC7542696 DOI: 10.1186/s12859-020-03773-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/22/2020] [Accepted: 09/22/2020] [Indexed: 01/04/2023] Open
Abstract
Background: Identification of genes responsible for anatomical entities is a major requirement in many fields including developmental biology, medicine, and agriculture. Current wet lab techniques used for this purpose, such as gene knockout, are resource- and time-intensive. Protein–protein interaction (PPI) networks are frequently used to predict disease genes for humans and gene candidates for molecular functions, but they are rarely used to predict genes for anatomical entities. Moreover, PPI networks suffer from network quality issues, which can limit their use in predicting candidate genes. Therefore, we developed an integrative framework to improve the candidate gene prediction accuracy for anatomical entities by combining existing experimental knowledge about gene-anatomical entity relationships with PPI networks using anatomy ontology annotations. We hypothesized that this integration improves the quality of the PPI networks by reducing the number of false positive and false negative interactions and is better optimized to predict candidate genes for anatomical entities. We used existing Uberon anatomical entity annotations for zebrafish and mouse genes to construct gene networks by calculating semantic similarity between the genes. These anatomy-based gene networks were semantic networks, as they were constructed based on the anatomy ontology annotations that were obtained from the experimental data in the literature. We integrated these anatomy-based gene networks with mouse and zebrafish PPI networks retrieved from the STRING database and compared the performance of their network-based candidate gene predictions.
Results: In evaluations of candidate gene prediction performance under four semantic similarity calculation methods (Lin, Resnik, Schlicker, and Wang), the integrated networks, which were semantically improved PPI networks, outperformed the PPI networks alone for both zebrafish and mouse, achieving higher area under the curve values for both receiver operating characteristic and precision-recall curves.
Conclusion: Integration of existing experimental knowledge about gene-anatomical entity relationships with PPI networks via anatomy ontology improved the candidate gene prediction accuracy and optimized the networks for predicting candidate genes for anatomical entities.
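As an illustration of two of the semantic similarity measures named in this abstract, the sketch below computes Resnik and Lin similarity between ontology terms using their standard definitions. The toy ontology, the term names, the gene annotations, and the best-match gene-level aggregation are all assumptions made for this example; they are not real Uberon data and are not taken from the paper, whose exact procedure the abstract does not describe.

```python
import math

# Toy ontology (child -> parents). Term names are illustrative, not real Uberon IDs.
PARENTS = {
    "anatomical_entity": [],
    "fin": ["anatomical_entity"],
    "pectoral_fin": ["fin"],
    "pelvic_fin": ["fin"],
    "heart": ["anatomical_entity"],
}

# Hypothetical gene -> annotated terms.
ANNOTATIONS = {
    "geneA": {"pectoral_fin"},
    "geneB": {"pelvic_fin"},
    "geneC": {"heart"},
    "geneD": {"pectoral_fin"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of genes annotated to t or to any descendant of t)."""
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for t in terms:
            covered |= ancestors(t)
        for t in covered:
            counts[t] += 1
    n = len(ANNOTATIONS)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(a, b):
    """Resnik: IC of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(IC.get(t, 0.0) for t in common)


def lin(a, b):
    """Lin: Resnik similarity normalised by the ICs of the two terms."""
    denom = IC.get(a, 0.0) + IC.get(b, 0.0)
    # Both terms carry no information (e.g. the root): return 0.0 by convention here.
    return 2 * resnik(a, b) / denom if denom > 0 else 0.0


def gene_similarity(g1, g2, term_sim=lin):
    """Best-match aggregation over term pairs (an assumed, simple choice)."""
    return max(term_sim(a, b) for a in ANNOTATIONS[g1] for b in ANNOTATIONS[g2])
```

In a setup like this, pairwise `gene_similarity` scores could serve as edge weights of an anatomy-based gene network; how such weights are combined with STRING PPI edges is left open here because the abstract does not specify it.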
Affiliation(s)
- Pasan C Fernando
- Department of Biology, University of South Dakota, Vermillion, SD, USA.
- Paula M Mabee
- Department of Biology, University of South Dakota, Vermillion, SD, USA
- National Ecological Observatory Network, Battelle Memorial Institute, 1685 38th St., Suite 100, Boulder, CO, 80301, USA
- Erliang Zeng
- Division of Biostatistics and Computational Biology, College of Dentistry, University of Iowa, Iowa City, IA, USA
- Department of Preventive and Community Dentistry, College of Dentistry, University of Iowa, Iowa City, IA, USA
- Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA, USA
- Department of Biomedical Engineering, College of Engineering, University of Iowa, Iowa City, IA, USA
|
14
|
Drug-Drug Interaction Predicting by Neural Network Using Integrated Similarity. Sci Rep 2019; 9:13645. [PMID: 31541145 PMCID: PMC6754439 DOI: 10.1038/s41598-019-50121-3] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Received: 05/09/2019] [Accepted: 09/06/2019] [Indexed: 01/04/2023]
Abstract
Drug-Drug Interaction (DDI) prediction is one of the most critical issues in drug development and health. Proposing appropriate computational methods for predicting unknown DDIs with high precision is challenging. We proposed "NDD: Neural network-based method for drug-drug interaction prediction" for predicting unknown DDIs using various information about drugs. Multiple drug similarities based on drug substructure, target, side effect, off-label side effect, pathway, transporter, and indication data are calculated. First, NDD uses a heuristic similarity selection process and then integrates the selected similarities with a nonlinear similarity fusion method to obtain high-level features. Afterward, it uses a neural network for interaction prediction. The similarity selection and similarity integration parts of NDD have been proposed in previous studies of other problems. Our novelty is to combine these parts with a new neural network architecture and apply these approaches in the context of DDI prediction. We compared NDD with six machine learning classifiers and six state-of-the-art graph-based methods on three benchmark datasets. NDD achieved superior performance in cross-validation, with AUPR ranging from 0.830 to 0.947, AUC from 0.954 to 0.994, and F-measure from 0.772 to 0.902. Moreover, cumulative evidence from case studies on numerous drug pairs further confirms the ability of NDD to predict unknown DDIs. The evaluations corroborate that NDD is an efficient method for predicting unknown DDIs. The data and implementation of NDD are available at https://github.com/nrohani/NDD.
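For readers unfamiliar with two of the metrics reported in this abstract, the following is a minimal sketch of how AUC and F-measure can be computed from binary labels and predicted scores. It uses the standard textbook definitions; the function names, the 0.5 decision threshold, and the toy data are assumptions for illustration and are not drawn from the NDD implementation.

```python
def f_measure(y_true, y_pred):
    """F-measure: harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = interacting drug pair, 0 = non-interacting pair.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

auc = roc_auc(labels, scores)
f1 = f_measure(labels, preds)
```

The pairwise AUC computation is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be preferable on benchmark-sized datasets.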
|
15
|
Sonawane AR, Weiss ST, Glass K, Sharma A. Network Medicine in the Age of Biomedical Big Data. Front Genet 2019; 10:294. [PMID: 31031797 PMCID: PMC6470635 DOI: 10.3389/fgene.2019.00294] [Citation(s) in RCA: 128] [Impact Index Per Article: 21.3] [Received: 12/26/2018] [Accepted: 03/19/2019] [Indexed: 12/13/2022]
Abstract
Network medicine is an emerging area of research dealing with molecular and genetic interactions, network biomarkers of disease, and therapeutic target discovery. Large-scale biomedical data generation offers a unique opportunity to assess the effect and impact of cellular heterogeneity and environmental perturbations on the observed phenotype. Marrying the two, network medicine with biomedical data provides a framework to build meaningful models and extract impactful results at a network level. In this review, we survey existing network types and biomedical data sources. More importantly, we delve into ways in which the network medicine approach, aided by phenotype-specific biomedical data, can be gainfully applied. We provide three paradigms, mainly dealing with three major biological network archetypes: protein-protein interaction, expression-based, and gene regulatory networks. For each of these paradigms, we discuss a broad overview of philosophies under which various network methods work. We also provide a few examples in each paradigm as a test case of its successful application. Finally, we delineate several opportunities and challenges in the field of network medicine. We hope this review provides a lexicon for researchers from biological sciences and network theory to come on the same page to work on research areas that require interdisciplinary expertise. Taken together, the understanding gained from combining biomedical data with networks can be useful for characterizing disease etiologies and identifying therapeutic targets, which, in turn, will lead to better preventive medicine with translational impact on personalized healthcare.
Affiliation(s)
- Abhijeet R. Sonawane
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Scott T. Weiss
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Amitabh Sharma
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Center for Interdisciplinary Cardiovascular Sciences, Cardiovascular Division, Brigham and Women’s Hospital, Boston, MA, United States
|
16
|
Predicting disease-genes based on network information loss and protein complexes in heterogeneous network. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2018.12.008] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 01/12/2023]
|
17
|
Jarrell JT, Gao L, Cohen DS, Huang X. Network Medicine for Alzheimer's Disease and Traditional Chinese Medicine. Molecules 2018; 23:1143. [PMID: 29751596 PMCID: PMC6099497 DOI: 10.3390/molecules23051143] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 04/17/2018] [Revised: 05/07/2018] [Accepted: 05/09/2018] [Indexed: 12/20/2022]
Abstract
Alzheimer’s Disease (AD) is a neurodegenerative condition that currently has no known cure. The principles of the expanding field of network medicine (NM) have recently been applied to AD research. The main principle of NM proposes that diseases are much more complicated than one mutation in one gene, and that they involve different genes, connections between genes, and pathways that may span multiple diseases, together forming full-scale disease networks. AD research applying NM principles has suggested that functional network connectivity, myelination, myeloid cells, and genes and pathways may play an integral role in AD progression and in the search for a cure. Different aspects of AD pathology could be potential targets for drug therapy to slow down or stop the disease from advancing, but more research is needed to reach definitive conclusions. Additionally, the holistic approaches of network pharmacology in traditional Chinese medicine (TCM) research may be viable options for AD treatment, and may lead to an effective cure for AD in the future.
Affiliation(s)
- Juliet T Jarrell
- Neurochemistry Laboratory, Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Charlestown, MA 02129, USA.
- Li Gao
- Modern Research Center for Traditional Chinese Medicine, Shanxi University, Taiyuan 030006, China.
- David S Cohen
- Neurochemistry Laboratory, Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Charlestown, MA 02129, USA.
- Xudong Huang
- Neurochemistry Laboratory, Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Charlestown, MA 02129, USA.
|