1
Zeng P, Huang H, Li D. Combining bioinformatics, network pharmacology, and artificial intelligence to predict the mechanism of resveratrol in the treatment of rheumatoid arthritis. Heliyon 2024; 10:e37371. [PMID: 39309832 PMCID: PMC11416256 DOI: 10.1016/j.heliyon.2024.e37371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2024] [Revised: 08/07/2024] [Accepted: 09/02/2024] [Indexed: 09/25/2024] Open
Abstract
Background: Rheumatoid arthritis (RA) is a chronic autoimmune disorder that causes joint inflammation and destruction, resulting in significant physical and economic burdens. Finding effective and targeted therapy for RA remains a top priority. Resveratrol is a potential candidate with anti-inflammatory and immunomodulatory properties for RA treatment. This study aims to determine the therapeutic targets and signaling pathways of resveratrol in the treatment of RA.
Methods: The GSE205962 dataset, downloaded from the Gene Expression Omnibus (GEO) database, was used to obtain the differentially expressed genes (DEGs) in blood samples from patients and healthy individuals. The PharmMapper database and Cytoscape (v3.9.1) were used to construct the resveratrol pharmacophore target network. Gene functional enrichment analysis, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, was performed with the BiNGO plug-in of Cytoscape and the DAVID online tool. The intersection of the resveratrol target genes and the DEGs was taken as the set of potential therapeutic genes (PT-genes). The Protein-Protein Interaction (PPI) network of PT-genes was constructed using the STRING tool, and the key therapeutic genes (KT-genes) were determined using the cytoHubba plug-in based on the Maximal Clique Centrality (MCC) algorithm. Molecular docking validation of resveratrol and therapeutic targets was performed based on the protein structures of KT-genes predicted by AlphaFold.
Results: A total of 2202 DEGs and 47 PT-genes were identified. GO analysis showed that the three groups of genes (the DEGs, the resveratrol target genes, and the PT-genes) had similar top-five functional enrichment results. PT-genes were closely related to metabolic pathways, pathways in cancer, proteoglycans in cancer, the insulin signaling pathway, and the chemokine signaling pathway. Up to 36% of the KEGG-enriched pathways were shared between the DEGs and the resveratrol target genes. The nine KT-genes were ABL1, ANXA5, CASP3, HSP90AA1, LCK, MAP2K1, MAPK1, PIK3R1, and RAC1, and the lowest free energies indicating resveratrol/protein affinity were -8.4, -7.4, -6.4, -6.7, -8.0, -7.9, -7.4, -6.7, and -7.9, respectively.
Conclusion: Nine KT-genes were identified and validated as the most promising therapeutic targets in the treatment of RA with resveratrol, which provides new insights into therapeutic mechanisms and may improve the efficiency of drug development.
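The reported docking results can be tabulated directly. The following minimal Python sketch pairs the nine KT-genes with the lowest free energies listed in the abstract, in the order given, and sorts them so that the most negative value (the strongest predicted affinity) comes first. The variable names are illustrative and not taken from the paper, and since the abstract states no unit for the energies, none is assumed here.

```python
# Gene symbols and lowest free energies as listed in the abstract (same order).
kt_genes = ["ABL1", "ANXA5", "CASP3", "HSP90AA1", "LCK",
            "MAP2K1", "MAPK1", "PIK3R1", "RAC1"]
energies = [-8.4, -7.4, -6.4, -6.7, -8.0, -7.9, -7.4, -6.7, -7.9]

# Pair each gene with its energy; a more negative value means stronger predicted binding.
docking = dict(zip(kt_genes, energies))

# Sort ascending so the most negative energy is listed first.
ranked = sorted(docking.items(), key=lambda item: item[1])

for gene, energy in ranked:
    print(f"{gene:10s} {energy:5.1f}")
```

Sorting this way puts ABL1 at the top of the list and CASP3 at the bottom, which is simply a restatement of the numbers in the abstract.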
Affiliation(s)
- Piaoqi Zeng
  - Department of Rheumatology, Ganzhou People's Hospital, Hongqi Avenue, Zhanggong District, Ganzhou City, 341000, Jiangxi Province, China
- Haohan Huang
  - Department of Orthopaedics, Gongli Hospital of Shanghai Pudong New Area, 219 Miaopu Rd, Shanghai 200011, China
- Dongsheng Li
  - Department of Rheumatology, Ganzhou People's Hospital, Hongqi Avenue, Zhanggong District, Ganzhou City, 341000, Jiangxi Province, China
2
Jagodnik KM, Shvili Y, Bartal A. HetIG-PreDiG: A Heterogeneous Integrated Graph Model for Predicting Human Disease Genes based on gene expression. PLoS One 2023; 18:e0280839. [PMID: 36791052 PMCID: PMC9931161 DOI: 10.1371/journal.pone.0280839] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/12/2022] [Accepted: 01/10/2023] [Indexed: 02/16/2023] Open
Abstract
Graph analytical approaches permit identifying novel genes involved in complex diseases, but are limited by (i) inferring structural network similarity of connected gene nodes, ignoring potentially relevant unconnected nodes; (ii) using homogeneous graphs, missing gene-disease associations' complexity; (iii) relying on disease/gene-phenotype associations' similarities, involving highly incomplete data; (iv) using binary classification, with gene-disease edges as positive training samples, and non-associated gene and disease nodes as negative samples that may include currently unknown disease genes; or (v) reporting predicted novel associations without systematically evaluating their accuracy. Addressing these limitations, we develop the Heterogeneous Integrated Graph for Predicting Disease Genes (HetIG-PreDiG) model that includes gene-gene, gene-disease, and gene-tissue associations. We predict novel disease genes using low-dimensional representation of nodes accounting for network structure, and extending beyond network structure using the developed Gene-Disease Prioritization Score (GDPS) reflecting the degree of gene-disease association via gene co-expression data. For negative training samples, we select non-associated gene and disease nodes with lower GDPS that are less likely to be affiliated. We evaluate the developed model's success in predicting novel disease genes by analyzing the prediction probabilities of gene-disease associations. HetIG-PreDiG successfully predicts (Micro-F1 = 0.95) gene-disease associations, outperforming baseline models, and is validated using published literature, thus advancing our understanding of complex genetic diseases.
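The negative-sampling idea described above (choosing non-associated gene-disease pairs with lower scores as negatives) can be sketched as follows. This is a minimal illustration, not the HetIG-PreDiG implementation: the abstract does not give the GDPS formula, so the scores, gene names, disease names, and the function `select_negatives` are all hypothetical.

```python
def select_negatives(candidate_scores, n_negatives):
    """Return the n candidate pairs with the lowest association scores."""
    ranked = sorted(candidate_scores.items(), key=lambda kv: kv[1])
    return [pair for pair, _ in ranked[:n_negatives]]


# Hypothetical (gene, disease) pairs with no known association,
# each with a made-up prioritization score standing in for GDPS.
candidates = {
    ("GENE_A", "disease_1"): 0.82,
    ("GENE_B", "disease_1"): 0.05,
    ("GENE_C", "disease_2"): 0.40,
    ("GENE_D", "disease_2"): 0.11,
}

# Keep the two lowest-scoring pairs as negative training samples.
negatives = select_negatives(candidates, n_negatives=2)
print(negatives)
```

Preferring low-scoring pairs is intended to reduce the chance that an as-yet-unknown disease gene is mislabeled as a negative example.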
Affiliation(s)
- Kathleen M. Jagodnik
  - The School of Business Administration, Bar-Ilan University, Ramat Gan, Israel
  - Department of Psychiatry, Harvard Medical School, Boston, MA, United States of America
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA, United States of America
- Yael Shvili
  - Department of Surgery A, Meir Medical Center, Kfar Sava, Israel
- Alon Bartal
  - The School of Business Administration, Bar-Ilan University, Ramat Gan, Israel
3
Affiliation(s)
- Xing Chen
  - Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou 221116, China
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Li Huang
  - The Future Laboratory, Tsinghua University, Beijing 100084, China
4
Qumsiyeh E, Showe L, Yousef M. GediNET for discovering gene associations across diseases using knowledge based machine learning approach. Sci Rep 2022; 12:19955. [PMID: 36402891 PMCID: PMC9675776 DOI: 10.1038/s41598-022-24421-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 11/15/2022] [Indexed: 11/21/2022] Open
Abstract
The most common approaches to discovering genes associated with specific diseases are based on machine learning and use a variety of feature selection techniques to identify significant genes that can serve as biomarkers for a given disease. More recently, integrating prior knowledge-based approaches into this process has shown significant promise in the discovery of new biomarkers with potential translational applications. In this study, we developed a novel approach, GediNET, that integrates prior biological knowledge to form gene Groups that are shown to be associated with a specific disease, such as a cancer. The novelty of GediNET is that it then also allows the discovery of significant associations between that specific disease and other diseases. The initial step in this process involves the identification of gene Groups. The Groups are then subjected to a Scoring component to identify the top-performing classification Groups. The top-ranked gene Groups are then used to train a Machine Learning Model. The process of Grouping, Scoring and Modelling (G-S-M) is used by GediNET to identify other diseases that are similarly associated with this signature. GediNET identifies these relationships through Disease-Disease Association (DDA) based machine learning. DDA explores novel associations between diseases and identifies relationships which could be used to further improve approaches to diagnosis, prognosis, and treatment. The GediNET KNIME workflow can be downloaded from https://github.com/malikyousef/GediNET.git or https://kni.me/w/3kH1SQV_mMUsMTS.
Affiliation(s)
- Emma Qumsiyeh
  - Information Technology Engineering, Al-Quds University, Abu Dis, Palestine
- Louise Showe
  - The Wistar Institute, Philadelphia, PA, 19104, USA
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, 13206, Zefat, Israel
  - Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
5
Zhou H, Tian J, Sun H, Fu J, Lin N, Yuan D, Zhou L, Xia M, Sun L. Systematic Identification of Genomic Markers for Guiding Iron Oxide Nanoparticles in Cervical Cancer Based on Translational Bioinformatics. Int J Nanomedicine 2022; 17:2823-2841. [PMID: 35791307 PMCID: PMC9250777 DOI: 10.2147/ijn.s361483] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/24/2022] [Accepted: 06/07/2022] [Indexed: 12/12/2022] Open
Abstract
Purpose: The magnetic iron oxide nanoparticle (MNP) drug delivery system is a promising novel therapeutic option for cancer treatment. Material issues such as fabrication and functionalized modification have been investigated; however, the pharmacologic mechanisms of bare MNPs inside cancer cells remain obscure. This study aimed to explore a systems pharmacology approach to understand the reaction of the whole cell to MNPs and to suggest drug selection in MNP delivery systems to exert synergistic or additive anti-cancer effects.
Methods: HeLa and SiHa cell lines were used to estimate the properties of bare MNPs in cervical cancer through 3-[4,5-dimethylthiazol-2-yl]-2,5 diphenyl tetrazolium bromide (MTT) and enzyme activity assays and cellular fluorescence imaging. A systems pharmacology approach was applied by combining bioinformatics data mining with clinical data analysis, without a predefined hypothesis. Key genes of the MNP onco-pharmacologic mechanism in cervical cancer were identified and further validated through transcriptome analysis with quantitative reverse transcription PCR (qRT-PCR).
Results: Low cytotoxic activity and cell internalization of MNPs in HeLa and SiHa cells were observed. Lysosomal function was found to be impaired after MNP treatment. Protein tyrosine kinase 2 beta (PTK2B), liprin-alpha-4 (PPFIA4), mothers against decapentaplegic homolog 7 (SMAD7), and interleukin (IL) 1B were identified as key genes relevant for MNP pharmacology, clinical features, somatic mutation, and immune infiltration. The four key genes also exhibited significant correlations with the lysosome gene set. The qRT-PCR results showed significant alterations in the expression of the four key genes after MNP treatment in HeLa and SiHa cells.
Conclusion: Our research suggests that treatment of HeLa and SiHa cells with bare MNPs induced significant expression changes in PTK2B, PPFIA4, SMAD7, and IL1B, which play crucial roles in cervical cancer development and progression. Interactions of the key genes with specific anti-cancer drugs must be considered in the rational design of MNP drug delivery systems.
Affiliation(s)
- Haohan Zhou
  - Key Laboratory of Pathobiology, Ministry of Education, Department of Pathophysiology, College of Basic Medical Sciences, Jilin University, Changchun, 130021, People's Republic of China
  - Department of Orthopaedic Oncology, Changzheng Hospital, Second Military Medical University, Shanghai, 200000, People's Republic of China
- Jiayi Tian
  - First Hospital, Jilin University, Changchun, 130021, People's Republic of China
- Hongyu Sun
  - Key Laboratory of Pathobiology, Ministry of Education, Department of Pathophysiology, College of Basic Medical Sciences, Jilin University, Changchun, 130021, People's Republic of China
- Jiaying Fu
  - Key Laboratory of Pathobiology, Ministry of Education, Department of Pathophysiology, College of Basic Medical Sciences, Jilin University, Changchun, 130021, People's Republic of China
- Nan Lin
  - Key Laboratory of Pathobiology, Ministry of Education, Department of Pathophysiology, College of Basic Medical Sciences, Jilin University, Changchun, 130021, People's Republic of China
- Danni Yuan
  - Key Laboratory of Pathobiology, Ministry of Education, Department of Pathophysiology, College of Basic Medical Sciences, Jilin University, Changchun, 130021, People's Republic of China
- Li Zhou
  - First Hospital, Jilin University, Changchun, 130021, People's Republic of China
- Meihui Xia
  - First Hospital, Jilin University, Changchun, 130021, People's Republic of China
- Liankun Sun
  - Key Laboratory of Pathobiology, Ministry of Education, Department of Pathophysiology, College of Basic Medical Sciences, Jilin University, Changchun, 130021, People's Republic of China
6
Network-Based Approaches for Disease-Gene Association Prediction Using Protein-Protein Interaction Networks. Int J Mol Sci 2022; 23:ijms23137411. [PMID: 35806415 PMCID: PMC9266751 DOI: 10.3390/ijms23137411] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/09/2022] [Revised: 06/25/2022] [Accepted: 06/30/2022] [Indexed: 01/02/2023] Open
Abstract
Genome-wide association studies (GWAS) can be used to infer genome intervals that are involved in genetic diseases. However, investigating a large number of putative mutations for GWAS is resource- and time-intensive. Network-based computational approaches are being used for efficient disease-gene association prediction. Network-based methods are based on the underlying assumption that the genes causing the same diseases are located close to each other in a molecular network, such as a protein-protein interaction (PPI) network. In this survey, we provide an overview of network-based disease-gene association prediction methods based on three categories: graph-theoretic algorithms, machine learning algorithms, and an integration of these two. We experimented with six selected methods to compare their prediction performance using a heterogeneous network constructed by combining a genome-wide weighted PPI network, an ontology-based disease network, and disease-gene associations. The experiment was conducted in two different settings according to the presence and absence of known disease-associated genes. The results revealed that HerGePred, an integrative method, outperformed in the presence of known disease-associated genes, whereas PRINCE, which adopted a network propagation algorithm, was the most competitive in the absence of known disease-associated genes. Overall, the results demonstrated that the integrative methods performed better than the methods using graph-theory only, and the methods using a heterogeneous network performed better than those using a homogeneous PPI network only.
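The "closeness in the network" assumption behind these methods can be illustrated with a small network propagation sketch. The code below is a generic iterative propagation with a degree-normalized adjacency and a restart to the seed (known disease gene) scores; it is not PRINCE's or HerGePred's actual implementation, and the toy graph, the function name `propagate`, and the parameter values `alpha` and `iters` are assumptions chosen for illustration.

```python
def propagate(adj, seeds, alpha=0.5, iters=50):
    """Spread seed scores over an undirected graph.

    adj   : dict mapping each node to the set of its neighbours
    seeds : nodes known to be associated with the disease (prior score 1.0)
    alpha : fraction of a node's score taken from its neighbours each step
    """
    nodes = sorted(adj)
    deg = {n: len(adj[n]) for n in nodes}
    prior = {n: (1.0 if n in seeds else 0.0) for n in nodes}
    score = dict(prior)
    for _ in range(iters):
        new_score = {}
        for n in nodes:
            # Neighbour contributions, normalized by the degrees of both endpoints.
            spread = sum(score[m] / (deg[n] * deg[m]) ** 0.5 for m in adj[n])
            new_score[n] = alpha * spread + (1 - alpha) * prior[n]
        score = new_score
    return score


# Toy path-shaped PPI network A - B - C - D with one known disease gene, A.
ppi = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = propagate(ppi, seeds={"A"})
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy network the candidates rank by their distance from the seed gene, which is the behaviour the network-based assumption predicts.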
7
Xiang J, Meng X, Zhao Y, Wu FX, Li M. HyMM: hybrid method for disease-gene prediction by integrating multiscale module structure. Brief Bioinform 2022; 23:6547263. [PMID: 35275996 DOI: 10.1093/bib/bbac072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 01/18/2022] [Accepted: 02/13/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Identifying disease-related genes is an important issue in computational biology. Module structure widely exists in biomolecule networks, and complex diseases are usually thought to be caused by perturbations of local neighborhoods in the networks, which can provide useful insights for the study of disease-related genes. However, mining and effectively using the module structure remains challenging in tasks such as disease-gene prediction.
RESULTS: We propose a hybrid disease-gene prediction method integrating multiscale module structure (HyMM), which can use multiscale information from local to global structure to predict disease-related genes more effectively. HyMM extracts module partitions from local to global scales by multiscale modularity optimization with exponential sampling, and estimates the disease relatedness of genes in partitions by the abundance of disease-related genes within modules. Then, a probabilistic model for integrating gene rankings is designed to combine multiple predictions derived from multiscale module partitions and network propagation, and a parameter estimation strategy based on functional information is proposed to further enhance HyMM's predictive power. Through a series of experiments, we reveal the importance of module partitions at different scales, and verify the stable and good performance of HyMM compared with eight other state-of-the-art methods, as well as its further performance improvement derived from the parameter estimation.
CONCLUSIONS: The results confirm that HyMM is an effective framework for integrating multiscale module structure to enhance the ability to predict disease-related genes, which may provide useful insights for the study of multiscale module structure and its application in tasks such as disease-gene prediction.
Affiliation(s)
- Ju Xiang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
  - Department of Basic Medical Sciences & Academician Workstation, Changsha Medical University, Changsha, Hunan 410219, China
- Xiangmao Meng
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yichao Zhao
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Fang-Xiang Wu
  - Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
- Min Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
8
Thummadi NB, Vishnu E, Subbiah EV, Manimaran P. A graph centrality-based approach for candidate gene prediction for type 1 diabetes. Immunol Res 2021; 69:422-428. [PMID: 34297307 DOI: 10.1007/s12026-021-09217-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2021] [Accepted: 07/15/2021] [Indexed: 10/20/2022]
Abstract
Type 1 diabetes mellitus (T1DM), or insulin-dependent diabetes, is an autoimmune disease that can be life-threatening. In most cases, cytotoxic T lymphocytes (CTLs) promote killing of the islets of Langerhans in the pancreas, which harbour insulin-producing beta cells. The trigger for the autoimmune attack is still unclear; therefore, identifying and targeting candidate genes is imperative to hinder its deleterious effects. In the present study, we focused on the identification of new candidate genes for T1DM. We exclusively selected immune-related genes, as they play a crucial role in T1DM. We constructed and analysed a human immunome signalling network (a directed network) to identify new candidate genes through various graph centrality measures combined with Gene Ontology (GO). As a result, we identified 4 new candidate genes which may act as potential drug targets for T1DM. We further validated their disease relevance through a literature survey and pathway analysis and found that 3 out of the 4 predicted genes mirrored their well-established roles as potential targets for T1DM.
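As a rough illustration of ranking genes in a directed signalling network by centrality, the sketch below computes in-degree, out-degree, and total degree. The abstract does not say which centrality measures were used, so degree is only one simple example, and the edge list and gene labels are hypothetical.

```python
from collections import Counter

# Hypothetical directed signalling edges (source gene -> target gene).
edges = [
    ("G1", "G2"),
    ("G1", "G3"),
    ("G2", "G3"),
    ("G4", "G3"),
    ("G3", "G5"),
]

# Count outgoing and incoming edges per gene.
out_deg = Counter(src for src, _ in edges)
in_deg = Counter(dst for _, dst in edges)

# Total degree as a simple centrality score for every gene in the network.
nodes = sorted({gene for edge in edges for gene in edge})
total_deg = {gene: in_deg[gene] + out_deg[gene] for gene in nodes}

# The highest-scoring gene would be the first candidate to inspect further.
top_gene = max(nodes, key=lambda gene: total_deg[gene])
print(top_gene, total_deg[top_gene])
```

In practice, a ranking like this would be one input among several, to be combined with annotation evidence such as GO terms before a gene is proposed as a candidate.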
Affiliation(s)
- N B Thummadi
  - Department of Animal Biology, University of Hyderabad, Gachibowli, Hyderabad, 500046, India
- E Vishnu
  - School of Physics, University of Hyderabad, Gachibowli, Hyderabad, 500046, Telangana, India
- E V Subbiah
  - Department of Sports Biosciences, Central University of Rajasthan, Kishangarh, Ajmer, 305817, India
- P Manimaran
  - School of Physics, University of Hyderabad, Gachibowli, Hyderabad, 500046, Telangana, India
9
Hormozdiari F, Jung J, Eskin E, Joo JWJ. MARS: leveraging allelic heterogeneity to increase power of association testing. Genome Biol 2021; 22:128. [PMID: 33931127 PMCID: PMC8086090 DOI: 10.1186/s13059-021-02353-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/07/2020] [Accepted: 04/15/2021] [Indexed: 11/10/2022] Open
Abstract
In standard genome-wide association studies (GWAS), the standard association test is underpowered to detect associations between loci with multiple causal variants with small effect sizes. We propose a statistical method, Model-based Association test Reflecting causal Status (MARS), that finds associations between variants in risk loci and a phenotype, considering the causal status of variants, only requiring the existing summary statistics to detect associated risk loci. Utilizing extensive simulated data and real data, we show that MARS increases the power of detecting true associated risk loci compared to previous approaches that consider multiple variants, while controlling the type I error.
Affiliation(s)
- Farhad Hormozdiari
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, 02115 MA USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA USA
- Junghyun Jung
  - Department of Life Science, Dongguk University-Seoul, Seoul, 04620 South Korea
- Eleazar Eskin
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, 90095 CA USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, 90095 CA USA
- Jong Wha J. Joo
  - Department of Computer Science and Engineering, Dongguk University-Seoul, Seoul, 04620 South Korea
10
Le DH. Machine learning-based approaches for disease gene prediction. Brief Funct Genomics 2020; 19:350-363. [PMID: 32567652 DOI: 10.1093/bfgp/elaa013] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/25/2020] [Revised: 04/30/2020] [Accepted: 05/09/2020] [Indexed: 12/20/2022] Open
Abstract
Disease gene prediction is an essential issue in biomedical research. In the early days, annotation-based approaches were proposed for this problem. With the development of high-throughput technologies, interaction data between genes/proteins have grown quickly and now cover almost the entire genome and proteome; thus, network-based methods for the problem have become prominent. In parallel, machine learning techniques, which formulate the problem as a classification task, have also been proposed. Here, we first present a roadmap of the machine learning-based methods for disease gene prediction. In the beginning, the problem was usually approached using binary classification, where the positive and negative training sample sets consist of disease genes and non-disease genes, respectively. The disease genes are those known to be associated with diseases, whereas the non-disease genes were randomly selected from those not yet known to be associated with diseases. However, the latter may contain unknown disease genes. To overcome this uncertainty in defining the non-disease genes, more realistic approaches have been proposed for the problem, such as unary and semi-supervised classification. Recently, more advanced methods, including ensemble learning, matrix factorization and deep learning, have been proposed for the problem. Second, 12 representative machine learning-based methods for disease gene prediction were examined and compared in terms of prediction performance and running time. Finally, their advantages, disadvantages, interpretability and trust were also analyzed and discussed.
Affiliation(s)
- Duc-Hau Le
  - Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam
11
Liu HC, Peng YS, Lee HC. miRDRN-miRNA disease regulatory network: a tool for exploring disease and tissue-specific microRNA regulatory networks. PeerJ 2019; 7:e7309. [PMID: 31404401 PMCID: PMC6688598 DOI: 10.7717/peerj.7309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2019] [Accepted: 06/17/2019] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND: MicroRNA (miRNA) regulates cellular processes by acting on specific target genes, and cellular processes proceed through multiple interactions, often organized into pathways, among genes and gene products. Hundreds of miRNAs and their target genes have been identified, as have many miRNA-disease associations. These, together with huge amounts of data on gene annotation, biological pathways, and protein-protein interactions, are available in public databases. Here, using such data, we built a database and web service platform, miRNA disease regulatory network (miRDRN), for users to construct disease and tissue-specific miRNA-protein regulatory networks, with which they may explore disease-related molecular and pathway associations, or find new ones, and possibly discover new modes of drug action.
METHODS: Data on disease-miRNA association, miRNA-target association and validation, gene-tissue association, gene-tumor association, biological pathways, human protein interaction, gene ID, gene ontology, gene annotation, and product were collected from publicly available databases and integrated. A large set of miRNA target-specific regulatory sub-pathways (RSPs) having the form (T, G1, G2) was built from the integrated data and stored, where T is a miRNA-associated target gene, G1 is a gene/protein interacting with T, and G2 is a gene/protein interacting with G1. Each sequence (T, G1, G2) was assigned a p-value weighted by the participation of the three genes in molecular interactions and reaction pathways.
RESULTS: A web service platform, miRDRN (http://mirdrn.ncu.edu.tw/mirdrn/), was built. The database part of miRDRN currently stores 6,973,875 p-valued RSPs associated with 116 diseases in 78 tissue types, built from 207 disease-associated miRNAs regulating 389 genes. miRDRN also provides facilities for the user to construct disease and tissue-specific miRNA regulatory networks from the RSPs it stores, and to download and/or visualize parts or all of the product. Users may use miRDRN to explore a single disease, or a disease pair to gain insights into comorbidity. As demonstrations, miRDRN was applied: to explore the single disease colorectal cancer (CRC), in which 26 novel potential CRC target genes were identified; to study the comorbidity of the disease pair Alzheimer's disease-Type 2 diabetes, in which 18 novel potential comorbid genes were identified; and to explore possible causes that may shed light on recent failures of late-phase trials of anti-AD BACE1 inhibitor drugs, in which genes downstream of BACE1 whose suppression may affect signal transduction were identified.
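The (T, G1, G2) sub-pathway structure described in the methods can be sketched as a two-step walk over an interaction map. The code below is a minimal illustration, not miRDRN's implementation: the function `build_rsps`, the toy interaction data, and the choice to skip triples that loop straight back to T are assumptions, and the p-value weighting mentioned in the abstract is omitted because its formula is not given there.

```python
def build_rsps(targets, interactions):
    """Enumerate (T, G1, G2) triples: G1 interacts with T, G2 interacts with G1."""
    rsps = []
    for t in sorted(targets):
        for g1 in sorted(interactions.get(t, ())):
            for g2 in sorted(interactions.get(g1, ())):
                # Assumption for this sketch: skip walks that return directly to T.
                if g2 != t:
                    rsps.append((t, g1, g2))
    return rsps


# Hypothetical interaction map (gene -> set of interacting genes).
interactions = {
    "T1": {"A", "B"},
    "A": {"T1", "C"},
    "B": {"T1"},
    "C": {"A"},
}

# T1 stands in for a miRNA-associated target gene.
rsps = build_rsps({"T1"}, interactions)
print(rsps)
```

Here only the walk T1 to A to C survives, because B has no interaction partner other than T1 itself.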
Affiliation(s)
- Hsueh-Chuan Liu
  - Department of Biomedical Sciences and Engineering, National Central University, Taoyuan City, Taiwan
- Yi-Shian Peng
  - Department of Biomedical Sciences and Engineering, National Central University, Taoyuan City, Taiwan
- Hoong-Chien Lee
  - Department of Biomedical Sciences and Engineering, National Central University, Taoyuan City, Taiwan
  - Department of Physics, Chung Yuan Christian University, Zhongli District, Taoyuan City, Taiwan
12
Hu K, Hu JB, Tang L, Xiang J, Ma JL, Gao YY, Li HJ, Zhang Y. Predicting disease-related genes by path structure and community structure in protein–protein networks. Journal of Statistical Mechanics: Theory and Experiment 2018; 2018:100001. [DOI: 10.1088/1742-5468/aae02b] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 01/03/2025]
13
Makarov V, Gorlin A. Computational method for discovery of biomarker signatures from large, complex data sets. Comput Biol Chem 2018; 76:161-168. [DOI: 10.1016/j.compbiolchem.2018.07.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/23/2017] [Revised: 07/02/2018] [Accepted: 07/04/2018] [Indexed: 11/30/2022]
14
Yea SJ, Kim BY, Kim C, Yi MY. A framework for the targeted selection of herbs with similar efficacy by exploiting drug repositioning technique and curated biomedical knowledge. Journal of Ethnopharmacology 2017; 208:117-128. [PMID: 28687508 DOI: 10.1016/j.jep.2017.06.048] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/28/2017] [Revised: 06/27/2017] [Accepted: 06/27/2017] [Indexed: 06/07/2023]
Abstract
ETHNOPHARMACOLOGICAL RELEVANCE Plants have been the most important natural resources for traditional medicine and for the modern pharmaceutical industry. There has been demand for alternative medicinal herbs with similar efficacy. Due to the very low probability of discovering useful compounds by random screening, researchers have advocated for using targeted selection approaches. Furthermore, because drug repositioning can speed up the process of drug development, an integrated technique that exploits chemical, genetic, and disease information has been recently developed. Building upon these findings, in this paper, we propose a novel framework for the targeted selection of herbs with similar efficacy by exploiting drug repositioning technique and curated modern scientific biomedical knowledge, with the goal of improving the possibility of inferring the traditional empirical ethno-pharmacological knowledge. MATERIALS AND METHODS To rank candidate herbs on the basis of similarities against a target herb, we proposed and evaluated a framework comprising four layers: links, extract, similarity, and model. In the framework, multiple databases are linked to build an herb-compound-protein-disease network composed of one tripartite network and two bipartite networks, allowing comprehensive and detailed information to be extracted. Further, various similarity scores between herbs are calculated, and then prediction models are trained and tested on the basis of these similarity features. RESULTS The proposed framework has been found to be feasible in terms of link loss. Out of the 50 similarities, the best one enhanced the performance of ranking herbs with similar efficacy by about 120-320% compared with our previous study. Also, the prediction model showed improved performance by about 180-480%.
While building the prediction model, we identified the compound information as being the most important knowledge source and structural similarity as the most useful measure. CONCLUSIONS In the proposed framework, we took the knowledge of herbal medicine, chemistry, biology, and medicine into consideration to rank herbs with similar efficacy among candidates. The experimental results demonstrated that the framework outperformed the baselines and identified the important knowledge source and useful similarity measure.
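As a rough illustration of the similarity layer described in this abstract, the sketch below ranks candidate herbs by the Jaccard similarity of their compound sets against a target herb. The herb names, compound identifiers, and the choice of Jaccard similarity are assumptions made for illustration only; the abstract does not say which of the 50 similarities this corresponds to.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(target, herb_compounds):
    """Rank every herb except `target` by compound-set similarity to the target."""
    target_set = herb_compounds[target]
    scores = {
        herb: jaccard(target_set, compounds)
        for herb, compounds in herb_compounds.items()
        if herb != target
    }
    # Highest similarity first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical herb -> compound-identifier sets (not taken from the paper).
HERB_COMPOUNDS = {
    "herb_A": {"c1", "c2", "c3"},
    "herb_B": {"c2", "c3", "c4"},
    "herb_C": {"c5"},
    "herb_D": {"c1", "c2", "c3", "c6"},
}

ranking = rank_candidates("herb_A", HERB_COMPOUNDS)
```

In a fuller pipeline, scores like these would be one feature among many fed to the prediction model layer.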
Affiliation(s)
- Sang-Jun Yea
- Graduate School of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology, Republic of Korea; K-herb Research Center, Korea Institute of Oriental Medicine, Republic of Korea
- Bu-Yeo Kim
- KM Convergence Research Division, Korea Institute of Oriental Medicine, Republic of Korea
- Chul Kim
- K-herb Research Center, Korea Institute of Oriental Medicine, Republic of Korea.
- Mun Yong Yi
- Graduate School of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology, Republic of Korea.
15
Babbi G, Martelli PL, Profiti G, Bovo S, Savojardo C, Casadio R. eDGAR: a database of Disease-Gene Associations with annotated Relationships among genes. BMC Genomics 2017; 18:554. [PMID: 28812536 PMCID: PMC5558190 DOI: 10.1186/s12864-017-3911-3] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Indexed: 02/02/2023]
Abstract
BACKGROUND Genetic investigations, boosted by modern sequencing techniques, allow dissecting the genetic component of different phenotypic traits. These efforts result in the compilation of lists of genes related to diseases and show that an increasing number of diseases is associated with multiple genes. Investigating functional relations among genes associated with the same disease contributes to highlighting molecular mechanisms of the pathogenesis. RESULTS We present eDGAR, a database collecting and organizing the data on gene/disease associations as derived from OMIM, Humsavar and ClinVar. For each disease-associated gene, eDGAR collects information on its annotation. Specifically, for lists of genes, eDGAR provides information on: i) interactions retrieved from PDB, BIOGRID and STRING; ii) co-occurrence in stable and functional structural complexes; iii) shared Gene Ontology annotations; iv) shared KEGG and REACTOME pathways; v) enriched functional annotations computed with NET-GE; vi) regulatory interactions derived from TRRUST; vii) localization on chromosomes and/or co-localisation in neighboring loci. The present release of eDGAR includes 2672 diseases, related to 3658 different genes, for a total number of 5729 gene-disease associations. 71% of the genes are linked to 621 multigenic diseases and eDGAR highlights their common GO terms, KEGG/REACTOME pathways, physical and regulatory interactions. eDGAR includes a network based enrichment method for detecting statistically significant functional terms associated to groups of genes. CONCLUSIONS eDGAR offers a resource to analyze disease-gene associations. In multigenic diseases genes can share physical interactions and/or co-occurrence in the same functional processes. eDGAR is freely available at: edgar.biocomp.unibo.it.
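To make the idea of a statistically significant functional term concrete, here is a minimal sketch of a standard hypergeometric over-representation test. This is a generic technique swapped in for illustration; it is not the network-based NET-GE method the abstract refers to, and the function name and numbers are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes from `population` genes,
    of which `annotated` carry the functional term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total


# Hypothetical numbers: 10 genes in total, 4 annotated with a term,
# a disease gene list of 3 genes, 2 of which carry the term.
p_value = hypergeom_pvalue(population=10, annotated=4, sample=3, hits=2)
```

A small p-value suggests the term is shared by the gene list more often than chance would predict; in practice the p-values would also need multiple-testing correction.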
Affiliation(s)
- Giulia Babbi
- Biocomputing Group, BiGeA, University of Bologna, Bologna, Italy
- Giuseppe Profiti
- Biocomputing Group, BiGeA, University of Bologna, Bologna, Italy
- Samuele Bovo
- Biocomputing Group, BiGeA, University of Bologna, Bologna, Italy
- Rita Casadio
- Biocomputing Group, BiGeA, University of Bologna, Bologna, Italy; Interdepartmental Center «Giorgio Prodi» for Cancer Research, University of Bologna, Bologna, Italy
16
Kurowski BG, Treble-Barna A, Pitzer AJ, Wade SL, Martin LJ, Chima RS, Jegga A. Applying Systems Biology Methodology To Identify Genetic Factors Possibly Associated with Recovery after Traumatic Brain Injury. J Neurotrauma 2017; 34:2280-2290. [PMID: 28301983 DOI: 10.1089/neu.2016.4856] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 12/30/2022]
Abstract
Traumatic brain injury (TBI) is one of the leading causes of morbidity and mortality worldwide. It is linked with a number of medical, neurological, cognitive, and behavioral sequelae. The influence of genetic factors on the biology and related recovery after TBI is poorly understood. Studies that seek to elucidate the impact of genetic influences on neurorecovery after TBI will lead to better individualization of prognosis and inform development of novel treatments, which are considerably lacking. Current genetic studies related to TBI have focused on specific candidate genes. The objectives of this study were to use a system biology-based approach to identify biologic processes over-represented with genetic variants previously implicated in clinical outcomes after TBI and identify unique genes potentially related to recovery after TBI. After performing a systematic review to identify genes in the literature associated with clinical outcomes, we used the genes identified to perform a systems biology-based integrative computational analysis to ascertain the interactions between molecular components and to develop models for regulation and function of genes involved in TBI recovery. The analysis identified over-representation of genetic variants primarily in two biologic processes: response to injury (cell proliferation, cell death, inflammatory response, and cellular metabolism) and neurocognitive and behavioral reserve (brain development, cognition, and behavior). Overall, this study demonstrates the use of a systems biology-based approach to identify unique/novel genes or sets of genes important to the recovery process. Findings from this systems biology-based approach provide additional insight into the potential impact of genetic variants on the underlying complex biological processes important to TBI recovery and may inform the development of empirical genetic-related studies for TBI. 
Future studies that combine systems biology methodology and genomic, proteomic, and epigenetic approaches are needed in TBI.
Affiliation(s)
- Brad G Kurowski
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center and University of Cincinnati College of Medicine, Cincinnati, Ohio
- Amery Treble-Barna
- Division of Physical Medicine and Rehabilitation, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Alexis J Pitzer
- Department of Psychology, Xavier University, Cincinnati, Ohio
- Shari L Wade
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center and University of Cincinnati College of Medicine, Cincinnati, Ohio
- Lisa J Martin
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center and University of Cincinnati College of Medicine, Cincinnati, Ohio
- Ranjit S Chima
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center and University of Cincinnati College of Medicine, Cincinnati, Ohio
- Anil Jegga
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center and University of Cincinnati College of Medicine, Cincinnati, Ohio
17
Abstract
Background Identifying the genes associated with human diseases is crucial for disease diagnosis and drug design. Computational approaches, especially network-based approaches, have recently been developed to identify disease-related genes effectively from existing biomedical networks. Meanwhile, advances in biotechnology enable researchers to produce multi-omics data, enriching our understanding of human diseases and revealing the complex relationships between genes and diseases. However, none of the existing computational approaches is able to integrate the huge amount of omics data into a weighted integrated network and utilize it to enhance disease-related gene discovery. Results We propose a new network-based disease gene prediction method called SLN-SRW (Simplified Laplacian Normalization-Supervised Random Walk) to generate and model the edge weights of a new biomedical network that integrates biomedical data from heterogeneous sources, thus enhancing disease-related gene discovery. Conclusions The experimental results show that SLN-SRW significantly improves the performance of disease gene prediction on both the real and the synthetic data sets. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3263-4) contains supplementary material, which is available to authorized users.
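For readers unfamiliar with random-walk gene prioritization, the sketch below shows a plain random walk with restart over a symmetrically normalized weighted network. It is a generic illustration and not the SLN-SRW method itself: the supervised learning of edge weights is omitted, the normalization W[i][j] / sqrt(d_i * d_j) is an assumed stand-in for the paper's simplified Laplacian normalization, and the toy network is hypothetical.

```python
import math


def normalize(weights):
    """Symmetric degree normalization: W[i][j] / sqrt(d_i * d_j)."""
    n = len(weights)
    degree = [sum(row) for row in weights]
    return [
        [
            weights[i][j] / math.sqrt(degree[i] * degree[j])
            if degree[i] > 0 and degree[j] > 0
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def random_walk_with_restart(weights, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * S p + restart * p0 until the change is below tol."""
    n = len(weights)
    s = normalize(weights)
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(s[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical 4-node chain 0 - 1 - 2 - 3 with edge weights 1.0, 0.5, 1.0.
WEIGHTS = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]

scores = random_walk_with_restart(WEIGHTS, seeds={0})
```

Nodes closer to the seed (a known disease gene) receive higher scores, which is the guilt-by-association intuition behind this family of methods.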
18
Kaalia R, Ghosh I. Semantics based approach for analyzing disease-target associations. J Biomed Inform 2016; 62:125-35. [PMID: 27349858 DOI: 10.1016/j.jbi.2016.06.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/29/2016] [Revised: 06/23/2016] [Accepted: 06/24/2016] [Indexed: 12/16/2022]
Abstract
BACKGROUND A complex disease is caused by heterogeneous biological interactions between genes and their products along with the influence of environmental factors. There have been many attempts for understanding the cause of these diseases using experimental, statistical and computational methods. In the present work the objective is to address the challenge of representation and integration of information from heterogeneous biomedical aspects of a complex disease using semantics based approach. METHODS Semantic web technology is used to design Disease Association Ontology (DAO-db) for representation and integration of disease associated information with diabetes as the case study. The functional associations of disease genes are integrated using RDF graphs of DAO-db. Three semantic web based scoring algorithms (PageRank, HITS (Hyperlink Induced Topic Search) and HITS with semantic weights) are used to score the gene nodes on the basis of their functional interactions in the graph. RESULTS Disease Association Ontology for Diabetes (DAO-db) provides a standard ontology-driven platform for describing genes, proteins, pathways involved in diabetes and for integrating functional associations from various interaction levels (gene-disease, gene-pathway, gene-function, gene-cellular component and protein-protein interactions). An automatic instance loader module is also developed in present work that helps in adding instances to DAO-db on a large scale. CONCLUSIONS Our ontology provides a framework for querying and analyzing the disease associated information in the form of RDF graphs. The above developed methodology is used to predict novel potential targets involved in diabetes disease from the long list of loose (statistically associated) gene-disease associations.
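The abstract above scores gene nodes with PageRank and HITS. As a minimal sketch of the first of these, the following power-iteration PageRank runs on a small directed graph. The gene names and edges are hypothetical, and the plain edge list stands in for the RDF graphs of DAO-db; the semantic weighting used in the paper is not modeled.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        if sum(abs(new_rank[n] - rank[n]) for n in nodes) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical functional-interaction edges between gene nodes.
EDGES = [
    ("GENE_A", "GENE_C"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_A"),
    ("GENE_D", "GENE_C"),
]

gene_rank = pagerank(EDGES)
top_gene = max(gene_rank, key=gene_rank.get)
```

Here the node with the most incoming links ends up with the highest score, which is the sense in which such algorithms prioritize well-connected genes.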
Affiliation(s)
- Rama Kaalia
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
- Indira Ghosh
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India.
19
Mullen J, Cockell SJ, Woollard P, Wipat A. An Integrated Data Driven Approach to Drug Repositioning Using Gene-Disease Associations. PLoS One 2016; 11:e0155811. [PMID: 27196054 PMCID: PMC4873016 DOI: 10.1371/journal.pone.0155811] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 12/29/2015] [Accepted: 05/04/2016] [Indexed: 12/31/2022]
Abstract
Drug development is increasing in cost whilst decreasing in productivity. There is a general acceptance that the current paradigm of R&D needs to change. One alternative approach is drug repositioning. With target-based approaches utilised heavily in the field of drug discovery, it becomes increasingly necessary to have a systematic method to rank gene-disease associations. Although methods already exist to collect, integrate and score these associations, they are often not a reliable reflection of expert knowledge. Furthermore, the amount of data available in all areas covered by bioinformatics is increasing dramatically year on year. It thus makes sense to move away from more generalised hypothesis driven approaches to research towards one that allows data to generate their own hypotheses. We introduce an integrated, data driven approach to drug repositioning. We first apply a Bayesian statistics approach to rank 309,885 gene-disease associations using existing knowledge. Ranked associations are then integrated with other biological data to produce a semantically-rich drug discovery network. Using this network, we show how our approach identifies diseases of the central nervous system (CNS) to be an area of interest. CNS disorders are identified due to the low numbers of such disorders that currently have marketed treatments, in comparison to other therapeutic areas. We then systematically mine our network for semantic subgraphs that allow us to infer drug-disease relations that are not captured in the network. We identify and rank 275,934 drug-disease has_indication associations after filtering those that are more likely to be side effects, whilst commenting on the top ranked associations in more detail. The dataset has been created in Neo4j and is available for download at https://bitbucket.org/ncl-intbio/genediseaserepositioning along with a Java implementation of the searching algorithm.
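The inference step described above, deriving drug-disease relations from paths through ranked gene-disease associations, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the drug, gene, and disease names, the scores, and the rule of taking the best-scoring connecting gene are all hypothetical, and the authors' Neo4j subgraph mining and side-effect filtering are not reproduced.

```python
def infer_indications(drug_targets, gene_disease_scores, known_indications):
    """Infer (drug, disease) pairs via drug -> target gene -> disease paths.

    Each inferred pair is scored by its best-scoring connecting gene,
    and pairs already recorded as known indications are skipped.
    """
    inferred = {}
    for drug, genes in drug_targets.items():
        for gene in genes:
            for disease, score in gene_disease_scores.get(gene, {}).items():
                pair = (drug, disease)
                if pair in known_indications:
                    continue
                inferred[pair] = max(inferred.get(pair, 0.0), score)
    # Highest score first; ties broken by name for a stable ordering.
    return sorted(inferred.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical inputs for illustration.
DRUG_TARGETS = {"drug_X": ["GENE_1", "GENE_2"], "drug_Y": ["GENE_2"]}
GENE_DISEASE_SCORES = {
    "GENE_1": {"disease_P": 0.9},
    "GENE_2": {"disease_P": 0.4, "disease_Q": 0.7},
}
KNOWN_INDICATIONS = {("drug_X", "disease_P")}

candidates = infer_indications(DRUG_TARGETS, GENE_DISEASE_SCORES, KNOWN_INDICATIONS)
```

Only pairs absent from the known indications are returned, mirroring the goal of surfacing relations not already captured in the network.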
Affiliation(s)
- Joseph Mullen
- Interdisciplinary Computing and Complex BioSystems (ICOS) Research Group, School of Computing Science, Newcastle University, Newcastle upon Tyne, United Kingdom
- Simon J. Cockell
- Bioinformatics Support Unit, Newcastle University, Newcastle upon Tyne, United Kingdom
- Peter Woollard
- Computational Biology Department, Quantitative Sciences, GlaxoSmithKline Research & Development Ltd, Stevenage, Hertfordshire, United Kingdom
- Anil Wipat
- Interdisciplinary Computing and Complex BioSystems (ICOS) Research Group, School of Computing Science, Newcastle University, Newcastle upon Tyne, United Kingdom
20
Abstract
MOTIVATION Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. RESULTS To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e^-4) and 81.3% (P < e^-12) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn's disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn's disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn's disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. AVAILABILITY AND IMPLEMENTATION nlp.case.edu/public/data/DMN
Affiliation(s)
- Yang Chen
- Department of Electrical Engineering and Computer Science, Department of Epidemiology and Biostatistics and Department of Family Medicine and Community Health, Case Western Reserve University, Cleveland, OH 44106, USA
- Li Li
- Department of Electrical Engineering and Computer Science, Department of Epidemiology and Biostatistics and Department of Family Medicine and Community Health, Case Western Reserve University, Cleveland, OH 44106, USA
- Guo-Qiang Zhang
- Department of Electrical Engineering and Computer Science, Department of Epidemiology and Biostatistics and Department of Family Medicine and Community Health, Case Western Reserve University, Cleveland, OH 44106, USA
- Rong Xu
- Department of Electrical Engineering and Computer Science, Department of Epidemiology and Biostatistics and Department of Family Medicine and Community Health, Case Western Reserve University, Cleveland, OH 44106, USA
21
Abascal MF, Besso MJ, Rosso M, Mencucci MV, Aparicio E, Szapiro G, Furlong LI, Vazquez-Levin MH. CDH1/E-cadherin and solid tumors. An updated gene-disease association analysis using bioinformatics tools. Comput Biol Chem 2015; 60:9-20. [PMID: 26674224 DOI: 10.1016/j.compbiolchem.2015.10.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 05/24/2015] [Revised: 10/17/2015] [Accepted: 10/19/2015] [Indexed: 12/13/2022]
Abstract
Cancer is a group of diseases that causes millions of deaths worldwide. Among cancers, Solid Tumors (ST) stand out due to their high incidence and mortality rates. Disruption of cell-cell adhesion is highly relevant during tumor progression. Epithelial-cadherin (protein: E-cadherin, gene: CDH1) is a key molecule in cell-cell adhesion; its abnormal expression and/or function(s) contribute to tumor progression, and it is altered in ST. A systematic study was carried out to gather and summarize current knowledge on CDH1/E-cadherin and ST using bioinformatics resources. The DisGeNET database was exploited to survey CDH1-associated diseases. Reported mutations in specific ST were obtained by interrogating COSMIC and IntOGen tools. CDH1 Single Nucleotide Polymorphisms (SNPs) were retrieved from the dbSNP database. DisGeNET analysis identified 609 genes annotated to ST, among which CDH1 was listed. Using CDH1 as query term, 26 disease concepts were found, 21 of which were neoplasms-related terms. Using DisGeNET ALL Databases, 172 disease concepts were identified. Of those, 80 ST disease-related terms were subjected to manual curation and 75/80 (93.75%) associations were validated. On selected ST, 489 CDH1 somatic mutations were listed in COSMIC and IntOGen databases. Breast neoplasms had the highest CDH1-mutation rate. CDH1 was positioned among the 20 genes with highest mutation frequency and was confirmed as driver gene in breast cancer. Over 14,000 SNPs for CDH1 were found in the dbSNP database. This report used DisGeNET to gather/compile current knowledge on gene-disease association for CDH1/E-cadherin and ST; data curation expanded the number of terms that relate them. An updated list of CDH1 somatic mutations was obtained with COSMIC and IntOGen databases and of SNPs from dbSNP. This information can be used to further understand the role of CDH1/E-cadherin in health and disease.
Affiliation(s)
- María Florencia Abascal
- Laboratory of Cell-Cell Interaction in Cancer and Reproduction, Instituto de Biología & Medicina Experimental (IBYME), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Fundación IBYME (FIBYME), Vuelta de Obligado 2490, Zip Code C1428ADN, Buenos Aires, Argentina.
- María José Besso
- Laboratory of Cell-Cell Interaction in Cancer and Reproduction, Instituto de Biología & Medicina Experimental (IBYME), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Fundación IBYME (FIBYME), Vuelta de Obligado 2490, Zip Code C1428ADN, Buenos Aires, Argentina.
- Marina Rosso
- Laboratory of Cell-Cell Interaction in Cancer and Reproduction, Instituto de Biología & Medicina Experimental (IBYME), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Fundación IBYME (FIBYME), Vuelta de Obligado 2490, Zip Code C1428ADN, Buenos Aires, Argentina.
- María Victoria Mencucci
- Laboratory of Cell-Cell Interaction in Cancer and Reproduction, Instituto de Biología & Medicina Experimental (IBYME), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Fundación IBYME (FIBYME), Vuelta de Obligado 2490, Zip Code C1428ADN, Buenos Aires, Argentina.
- Evangelina Aparicio
- Laboratory of Cell-Cell Interaction in Cancer and Reproduction, Instituto de Biología & Medicina Experimental (IBYME), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Fundación IBYME (FIBYME), Vuelta de Obligado 2490, Zip Code C1428ADN, Buenos Aires, Argentina.
- Gala Szapiro
- Laboratory of Cell-Cell Interaction in Cancer and Reproduction, Instituto de Biología & Medicina Experimental (IBYME), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Fundación IBYME (FIBYME), Vuelta de Obligado 2490, Zip Code C1428ADN, Buenos Aires, Argentina.
- Laura Inés Furlong
- Research Programme on Biomedical Informatics (GRIB) (IMIM), DCEXS, Universitat Pompeu Fabra, C/Dr Aiguader 88, Zip Code 08003, Barcelona, Spain.
- Mónica Hebe Vazquez-Levin
- Laboratory of Cell-Cell Interaction in Cancer and Reproduction, Instituto de Biología & Medicina Experimental (IBYME), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Fundación IBYME (FIBYME), Vuelta de Obligado 2490, Zip Code C1428ADN, Buenos Aires, Argentina.
22
Shameer K, Tripathi LP, Kalari KR, Dudley JT, Sowdhamini R. Interpreting functional effects of coding variants: challenges in proteome-scale prediction, annotation and assessment. Brief Bioinform 2015; 17:841-62. [PMID: 26494363 DOI: 10.1093/bib/bbv084] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 05/14/2015] [Indexed: 12/20/2022]
Abstract
Accurate assessment of genetic variation in human DNA sequencing studies remains a nontrivial challenge in clinical genomics and genome informatics. Ascribing functional roles and/or clinical significances to single nucleotide variants identified from a next-generation sequencing study is an important step in genome interpretation. Experimental characterization of all the observed functional variants is yet impractical; thus, the prediction of functional and/or regulatory impacts of the various mutations using in silico approaches is an important step toward the identification of functionally significant or clinically actionable variants. The relationships between genotypes and the expressed phenotypes are multilayered and biologically complex; such relationships present numerous challenges and at the same time offer various opportunities for the design of in silico variant assessment strategies. Over the past decade, many bioinformatics algorithms have been developed to predict functional consequences of single nucleotide variants in the protein coding regions. In this review, we provide an overview of the bioinformatics resources for the prediction, annotation and visualization of coding single nucleotide variants. We discuss the currently available approaches and major challenges from the perspective of protein sequence, structure, function and interactions that require consideration when interpreting the impact of putatively functional variants. We also discuss the relevance of incorporating integrated workflows for predicting the biomedical impact of the functionally important variations encoded in a genome, exome or transcriptome. Finally, we propose a framework to classify variant assessment approaches and strategies for incorporation of variant assessment within electronic health records.
23
Network-based ranking methods for prediction of novel disease associated microRNAs. Comput Biol Chem 2015; 58:139-48. [DOI: 10.1016/j.compbiolchem.2015.07.003] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 12/24/2014] [Revised: 06/25/2015] [Accepted: 07/09/2015] [Indexed: 12/18/2022]
24
Castiblanco J, Anaya JM. Genetics and vaccines in the era of personalized medicine. Curr Genomics 2015; 16:47-59. [PMID: 25937813 PMCID: PMC4412964 DOI: 10.2174/1389202916666141223220551] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 09/08/2014] [Revised: 12/22/2014] [Accepted: 12/23/2014] [Indexed: 12/17/2022]
Abstract
Vaccines represent the most successful and sustainable tactic to prevent and counteract infection. A vaccine generally improves immunity to a particular disease upon administration by inducing specific protective and efficient immune responses in all of the receiving population. The main known factors influencing the observed heterogeneity for immune responses induced by vaccines are gender, age, co-morbidity, immune system, and genetic background. This review is mainly focused on the effect of genetic status on vaccine immune responses and how this could contribute to the development of novel vaccine candidates that could be better directed and predicted relative to the genetic history of an individual and/or population. The text offers a brief history of vaccinology as a field, a description of the genetic status of the most relevant and studied genes and their functionality and correlation with exposure to specific vaccines, followed by an inside look into autoimmunity as a concern when designing vaccines as well as perspectives and conclusions looking towards an era of personalized and predictive vaccinology instead of a one size fits all approach.
Affiliation(s)
- John Castiblanco
- Center for Autoimmune Diseases Research (CREA), School of Medicine and Health Sciences, Universidad del Rosario, Carrera 24 #63-C-69, Bogota, Colombia; Doctoral Program in Biomedical Sciences, Universidad del Rosario, Bogotá, Colombia
- Juan-Manuel Anaya
- Center for Autoimmune Diseases Research (CREA), School of Medicine and Health Sciences, Universidad del Rosario, Carrera 24 #63-C-69, Bogota, Colombia
25
Le DH, Xuan Hoai N, Kwon YK. A Comparative Study of Classification-Based Machine Learning Methods for Novel Disease Gene Prediction. Advances in Intelligent Systems and Computing 2015. [DOI: 10.1007/978-3-319-11680-8_46] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2022]
26
Tsafnat G, Jasch D, Misra A, Choong MK, Lin FPY, Coiera E. Gene-disease association with literature based enrichment. J Biomed Inform 2014; 49:221-6. [PMID: 24681202 DOI: 10.1016/j.jbi.2014.03.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/08/2013] [Revised: 02/09/2014] [Accepted: 03/02/2014] [Indexed: 10/25/2022]
Abstract
MOTIVATION Gene set enrichment analysis (GSEA) annotates gene microarray data with functional information from the biomedical literature to improve gene-disease association prediction. We hypothesize that supplementing GSEA with comprehensive gene function catalogs built automatically using information extracted from the scientific literature will significantly enhance GSEA prediction quality. METHODS Gold standard gene sets for breast cancer (BrCa) and colorectal cancer (CRC) were derived from the literature. Two gene function catalogs (CMeSH and CUMLS) were automatically generated in two steps: (1) using Entrez Gene to associate all recorded human genes with PubMed article IDs; (2) using the genes mentioned in each PubMed article and associating each with the article's MeSH terms (in CMeSH) and extracted UMLS concepts (in CUMLS). Microarray data from the Gene Expression Omnibus for BrCa and CRC were then annotated using CMeSH and CUMLS and, for comparison, also with several pre-existing catalogs (C2, C4 and C5 from the Molecular Signatures Database). Ranking was done using a standard GSEA implementation (GSEA-p). Gene function predictions for enriched array data were evaluated against the gold standard by measuring area under the receiver operating characteristic curve (AUC). RESULTS Comparison of ranking using the literature enrichment catalogs, the pre-existing catalogs as well as five randomly generated catalogs shows that the literature derived enrichment catalogs are more effective. The AUC for BrCa using the unenriched gene expression dataset was 0.43, increasing to 0.89 after gene set enrichment with CUMLS. The AUC for CRC using the unenriched gene expression dataset was 0.54, increasing to 0.9 after enrichment with CMeSH. C2 increased AUC (BrCa 0.76, CRC 0.71) but C4 and C5 performed poorly (between 0.35 and 0.5). The randomly generated catalogs also performed poorly, equivalent to random guessing.
DISCUSSION Gene set enrichment significantly improved prediction of gene-disease association. Selection of enrichment catalog had a substantial effect on prediction accuracy. The literature based catalogs performed better than the MSigDB catalogs, possibly because they are more recent. Catalogs generated automatically from the literature can be kept up to date. CONCLUSION Prediction of gene-disease association is a fundamental task in biomedical research. GSEA provides a promising method when using literature-based enrichment catalogs. AVAILABILITY The literature based catalogs generated and used in this study are available from http://www2.chi.unsw.edu.au/literature-enrichment.
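Since the evaluation above rests on AUC, a short sketch of how that number can be computed from ranked predictions may help. It uses the standard pairwise interpretation of AUC (the fraction of positive/negative pairs ordered correctly, with ties counted as half); the scores and gold-standard labels below are made up for illustration and are not data from the study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical gene scores and gold-standard membership (1 = in gold standard).
SCORES = [0.9, 0.8, 0.7, 0.6, 0.5]
LABELS = [1, 0, 1, 0, 0]

auc = roc_auc(SCORES, LABELS)
```

An AUC of 0.5 corresponds to random guessing, which is why the randomly generated catalogs in the abstract serve as a baseline.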
Affiliation(s)
- Guy Tsafnat
- Centre for Health Informatics, University of New South Wales, Sydney, Australia.
- Dennis Jasch
- Centre for Health Informatics, University of New South Wales, Sydney, Australia
- Agam Misra
- Centre for Health Informatics, University of New South Wales, Sydney, Australia
- Miew Keen Choong
- Centre for Health Informatics, University of New South Wales, Sydney, Australia
- Frank P-Y Lin
- Centre for Health Informatics, University of New South Wales, Sydney, Australia
- Enrico Coiera
- Centre for Health Informatics, University of New South Wales, Sydney, Australia
27
Guney E, Garcia-Garcia J, Oliva B. GUILDify: a web server for phenotypic characterization of genes through biological data integration and network-based prioritization algorithms. Bioinformatics 2014; 30:1789-90. [PMID: 24532728 DOI: 10.1093/bioinformatics/btu092] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 01/15/2023]
Abstract
SUMMARY Determining genetic factors underlying various phenotypes is hindered by the involvement of multiple genes acting cooperatively. Over the past years, disease-gene prioritization has been central to identifying genes implicated in human disorders. Special attention has been paid to using physical interactions between the proteins encoded by the genes to link them with diseases. Such methods exploit the guilt-by-association principle in the protein interaction network to uncover novel disease-gene associations. These methods rely on the proximity of a gene in the network to the genes associated with a phenotype and require a set of initial associations. Here, we present GUILDify, an easy-to-use web server for the phenotypic characterization of genes. GUILDify offers a prioritization approach based on the protein-protein interaction network where the initial phenotype-gene associations are retrieved via free text search on biological databases. The GUILDify web server does not restrict the prioritization to any predefined phenotype, supports multiple species and accepts user-specified genes. It also prioritizes drugs based on the ranking of their targets, opening opportunities for repurposing drugs for novel therapies. AVAILABILITY AND IMPLEMENTATION Available online at http://sbi.imim.es/GUILDify.php
Affiliation(s)
- Emre Guney
- Departament de Ciencies Experimentals i de la Salut, Structural Bioinformatics Group (GRIB-IMIM), Universitat Pompeu Fabra, Barcelona, 08003 Catalonia, Spain
- Javier Garcia-Garcia
- Departament de Ciencies Experimentals i de la Salut, Structural Bioinformatics Group (GRIB-IMIM), Universitat Pompeu Fabra, Barcelona, 08003 Catalonia, Spain
- Baldo Oliva
- Departament de Ciencies Experimentals i de la Salut, Structural Bioinformatics Group (GRIB-IMIM), Universitat Pompeu Fabra, Barcelona, 08003 Catalonia, Spain
|
28
|
Castiblanco J, Arcos-Burgos M, Anaya JM. What is next after the genes for autoimmunity? BMC Med 2013; 11:197. [PMID: 24107170 PMCID: PMC3765994 DOI: 10.1186/1741-7015-11-197] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 04/08/2013] [Accepted: 08/12/2013] [Indexed: 11/28/2022] Open
Abstract
Clinical pathologies draw us to envisage disease as either an independent entity or a diverse set of traits governed by common physiopathological mechanisms, prompted by environmental assaults throughout life. Autoimmune diseases are not an exception, given they represent a diverse collection of diseases in terms of their demographic profile and primary clinical manifestations. Although they are pleiotropic outcomes of non-specific disease genes underlying similar immunogenetic mechanisms, research generally focuses on a single disease. Drastic technologic advances are leading research to organize clinical genomic multidisciplinary approaches to decipher the nature of human biological systems. Once the currently costly omic-based technologies become universally accessible, the way will be paved for a cleaner picture to risk quantification, prevention, prognosis and diagnosis, allowing us to clearly define better phenotypes always ensuring the integrity of the individuals studied. However, making accurate predictions for most autoimmune diseases is an ambitious challenge, since the understanding of these pathologies is far from complete. Herein, some pitfalls and challenges of the genetics of autoimmune diseases are reviewed, and an approximation to the future of research in this field is presented.
Affiliation(s)
- John Castiblanco
- Center for Autoimmune Diseases Research (CREA), School of Medicine and Health Sciences, Universidad del Rosario, Carrera 24 #63-C-69, Bogota, Colombia.
|
29
|
Xu R, Li L, Wang Q. Towards building a disease-phenotype knowledge base: extracting disease-manifestation relationship from literature. Bioinformatics 2013; 29:2186-94. [PMID: 23828786 DOI: 10.1093/bioinformatics/btt359] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Indexed: 01/28/2023] Open
Abstract
MOTIVATION Systems approaches to studying phenotypic relationships among diseases are emerging as an active area of research for both novel disease gene discovery and drug repurposing. Currently, systematic study of disease phenotypic relationships on a phenome-wide scale is limited because large-scale machine-understandable disease-phenotype relationship knowledge bases are often unavailable. Here, we present an automatic approach to extract disease-manifestation (D-M) pairs (one specific type of disease-phenotype relationship) from the wide body of published biomedical literature. DATA AND METHODS Our method leverages external knowledge and limits the amount of human effort required. For the text corpus, we used 119 085 682 MEDLINE sentences (21 354 075 citations). First, we used D-M pairs from existing biomedical ontologies as prior knowledge to automatically discover D-M-specific syntactic patterns. We then extracted additional pairs from MEDLINE using the learned patterns. Finally, we analysed correlations between disease manifestations and disease-associated genes and drugs to demonstrate the potential of this newly created knowledge base in disease gene discovery and drug repurposing. RESULTS In total, we extracted 121 359 unique D-M pairs with a high precision of 0.924. Among the extracted pairs, 120 419 (99.2%) have not been captured in existing structured knowledge sources. We have shown that disease manifestations correlate positively with both disease-associated genes and drug treatments. CONCLUSIONS The main contribution of our study is the creation of a large-scale and accurate D-M phenotype relationship knowledge base. This unique knowledge base, when combined with existing phenotypic, genetic and proteomic datasets, can have profound implications in our deeper understanding of disease etiology and in rapid drug repurposing. AVAILABILITY http://nlp.case.edu/public/data/DMPatternUMLS/
Affiliation(s)
- Rong Xu
- Medical Informatics Program, Center for Clinical Investigation, Case Western Reserve University, Cleveland, OH 44106, USA.
|
30
|
Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nat Rev Genet 2013; 14:483-95. [PMID: 23752797 DOI: 10.1038/nrg3461] [Citation(s) in RCA: 745] [Impact Index Per Article: 62.1] [Indexed: 12/12/2022]
Abstract
Genome-wide association studies have identified many variants that each affects multiple traits, particularly across autoimmune diseases, cancers and neuropsychiatric disorders, suggesting that pleiotropic effects on human complex traits may be widespread. However, systematic detection of such effects is challenging and requires new methodologies and frameworks for interpreting cross-phenotype results. In this Review, we discuss the evidence for pleiotropy in contemporary genetic mapping studies, new and established analytical approaches to identifying pleiotropic effects, sources of spurious cross-phenotype effects and study design considerations. We also outline the molecular and clinical implications of such findings and discuss future directions of research.
Affiliation(s)
- Nadia Solovieff
- Center for Human Genetics Research, Massachusetts General Hospital, 185 Cambridge Street, Boston, Massachusetts 02114, USA
|
31
|
Peterson TA, Park D, Kann MG. A protein domain-centric approach for the comparative analysis of human and yeast phenotypically relevant mutations. BMC Genomics 2013; 14 Suppl 3:S5. [PMID: 23819456 PMCID: PMC3665522 DOI: 10.1186/1471-2164-14-s3-s5] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022] Open
Abstract
Background The body of disease mutations with known phenotypic relevance continues to increase and is expected to do so even faster with the advent of new experimental techniques such as whole-genome sequencing coupled with disease association studies. However, genomic association studies are limited by the molecular complexity of the phenotype being studied and the population size needed to have adequate statistical power. One way to circumvent this problem, which is critical for the study of rare diseases, is to study the molecular patterns emerging from functional studies of existing disease mutations. Current gene-centric analyses to study mutations in coding regions are limited by their inability to account for the functional modularity of the protein. Previous studies of the functional patterns of known human disease mutations have shown a significant tendency to cluster at protein domain positions, namely position-based domain hotspots of disease mutations. However, the limited number of known disease mutations remains the main factor hindering the advancement of mutation studies at a functional level. In this paper, we address this problem by incorporating mutations known to be disruptive of phenotypes in other species. Focusing on two evolutionarily distant organisms, human and yeast, we describe the first inter-species analysis of mutations of phenotypic relevance at the protein domain level. Results The results of this analysis reveal that phenotypic mutations from yeast cluster at specific positions on protein domains, a characteristic previously revealed to be displayed by human disease mutations. We found over one hundred domain hotspots in yeast with approximately 50% in the exact same domain position as known human disease mutations. 
Conclusions We describe an analysis using protein domains as a framework for transferring functional information by studying domain hotspots in human and yeast and relating phenotypic changes in yeast to diseases in human. This first-of-a-kind study of phenotypically relevant yeast mutations in relation to human disease mutations demonstrates the utility of a multi-species analysis for advancing the understanding of the relationship between genetic mutations and phenotypic changes at the organismal level.
Affiliation(s)
- Thomas A Peterson
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD, USA
|
32
|
Le DH, Kwon YK. Neighbor-favoring weight reinforcement to improve random walk-based disease gene prioritization. Comput Biol Chem 2013; 44:1-8. [PMID: 23434623 DOI: 10.1016/j.compbiolchem.2013.01.001] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 10/09/2012] [Revised: 01/11/2013] [Accepted: 01/16/2013] [Indexed: 01/13/2023]
Abstract
BACKGROUND Finding candidate genes associated with a disease is an important issue in biomedical research. Recently, many network-based methods have been proposed that implicitly utilize the modularity principle, which states that genes causing the same or similar diseases tend to form physical or functional modules in gene/protein relationship networks. Of these methods, the random walk with restart (RWR) algorithm is considered to be a state-of-the-art approach, but the modularity principle has not been fully considered in traditional RWR approaches. Therefore, we propose a novel method called ORIENT (neighbor-favoring weight reinforcement) to improve the performance of RWR through proper intensification of the weights of interactions close to the known disease genes. RESULTS Through extensive simulations over hundreds of diseases, we observed that our approach performs better than the traditional RWR algorithm. In particular, our method worked best when the weights of interactions involving only the nearest neighbor genes of the disease genes were intensified. Interestingly, the performance of our approach was negatively related to the probability with which the random walk will restart, whereas the performance of RWR without the weight-reinforcement was positively related in dense gene/protein relationship networks. We further found that the density of the disease gene-projected sub-graph and the number of paths between the disease genes in a gene/protein relationship network may be explanatory variables for the RWR performance. Finally, a comparison with other well-known gene prioritization tools including Endeavour, ToppGene, and BioGraph, revealed that our approach shows significantly better performance. CONCLUSION Taken together, these findings provide insight to efficiently guide RWR in disease gene prioritization.
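For readers unfamiliar with the baseline this abstract builds on, the following is a minimal sketch of a plain random walk with restart (RWR) on an unweighted, undirected network. It does not implement the neighbor-favoring weight reinforcement proposed in the paper, and the function name, the toy network and the restart probability of 0.5 are assumptions chosen for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5,
                             tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a random walk with restart.

    adjacency: dict mapping node -> list of neighbour nodes (undirected,
               every neighbour must also be a key of the dict).
    seeds:     set of known disease genes; the walker restarts uniformly
               among them with probability `restart` at each step.
    """
    nodes = list(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart component: jump back to the seed genes.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk component: spread remaining probability evenly to neighbours.
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                nxt[u] += (1.0 - restart) * p[u]  # isolated node keeps its mass
                continue
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network: a chain A - B - C - D with A as the known disease gene.
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
rwr_scores = random_walk_with_restart(toy_network, seeds={"A"})
ranking = sorted((g for g in rwr_scores if g != "A"),
                 key=rwr_scores.get, reverse=True)
print(ranking)
```

Candidate genes are ranked by their steady-state probability, so genes closer to the seed set in the network tend to rank higher.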
Affiliation(s)
- Duc-Hau Le
- School of Electrical Engineering, University of Ulsan, 93 Daehak-ro, Nam-gu, Ulsan 680-749, Republic of Korea.
|
33
|
Jimenez-Lopez JC, Gachomo EW, Sharma S, Kotchoni SO. Genome sequencing and next-generation sequence data analysis: A comprehensive compilation of bioinformatics tools and databases. American Journal of Molecular Biology 2013. [DOI: 10.4236/ajmb.2013.32016] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
|
34
|
Piro RM, Molineris I, Di Cunto F, Eils R, König R. Disease-gene discovery by integration of 3D gene expression and transcription factor binding affinities. Bioinformatics 2012; 29:468-75. [PMID: 23267172 DOI: 10.1093/bioinformatics/bts720] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/19/2023]
Abstract
MOTIVATION The computational evaluation of candidate genes for hereditary disorders is a non-trivial task. Several excellent methods for disease-gene prediction have been developed in the past 2 decades, exploiting widely differing data sources to infer disease-relevant functional relationships between candidate genes and disorders. We have shown recently that spatially mapped, i.e. 3D, gene expression data from the mouse brain can be successfully used to prioritize candidate genes for human Mendelian disorders of the central nervous system. RESULTS We improved our previous work 2-fold: (i) we demonstrate that condition-independent transcription factor binding affinities of the candidate genes' promoters are relevant for disease-gene prediction and can be integrated with our previous approach to significantly enhance its predictive power; and (ii) we define a novel similarity measure, termed Relative Intensity Overlap, for both 3D gene expression patterns and binding affinity profiles that better exploits their disease-relevant information content. Finally, we present novel disease-gene predictions for eight loci associated with different syndromes of unknown molecular basis that are characterized by mental retardation.
Affiliation(s)
- Rosario M Piro
- Department of Theoretical Bioinformatics, German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ), University of Heidelberg, Im 69120 Heidelberg, Germany.
|
35
|
Abstract
Identification of gene-disease associations is crucial to understanding disease mechanisms. A rapid increase in the biomedical literature, led by advances in genome-scale technologies, makes it challenging for manually curated annotation databases to characterize gene-disease associations effectively and in a timely manner. We propose an automatic method, the Disease Ontology Annotation Framework (DOAF), to provide a comprehensive annotation of the human genome using the computable Disease Ontology (DO), the NCBO Annotator service and NCBI Gene Reference Into Function (GeneRIF). DOAF can keep the resulting knowledgebase current by periodically executing an automatic pipeline to re-annotate the human genome using the latest DO and GeneRIF releases at any frequency, such as daily or monthly. Further, DOAF provides a computable and programmable environment that enables large-scale and integrative analysis by working with external analytic software or online service platforms. A user-friendly web interface (doa.nubic.northwestern.edu) is implemented to allow users to efficiently query, download, and view disease annotations and the underlying evidence.
|
36
|
|
37
|
Computational tools for prioritizing candidate genes: boosting disease gene discovery. Nat Rev Genet 2012; 13:523-36. [DOI: 10.1038/nrg3253] [Citation(s) in RCA: 332] [Impact Index Per Article: 25.5] [Indexed: 12/14/2022]
|
38
|
Dannenfelser R, Clark NR, Ma'ayan A. Genes2FANs: connecting genes through functional association networks. BMC Bioinformatics 2012; 13:156. [PMID: 22748121 PMCID: PMC3472228 DOI: 10.1186/1471-2105-13-156] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 12/28/2011] [Accepted: 05/25/2012] [Indexed: 01/04/2023] Open
Abstract
Background Protein-protein, cell signaling, metabolic, and transcriptional interaction networks are useful for identifying connections between lists of experimentally identified genes/proteins. However, besides physical or co-expression interactions there are many ways in which pairs of genes, or their protein products, can be associated. By systematically incorporating knowledge on shared properties of genes from diverse sources to build functional association networks (FANs), researchers may be able to identify additional functional interactions between groups of genes that are not readily apparent. Results Genes2FANs is a web based tool and a database that utilizes 14 carefully constructed FANs and a large-scale protein-protein interaction (PPI) network to build subnetworks that connect lists of human and mouse genes. The FANs are created from mammalian gene set libraries where mouse genes are converted to their human orthologs. The tool takes as input a list of human or mouse Entrez gene symbols to produce a subnetwork and a ranked list of intermediate genes that are used to connect the query input list. In addition, users can enter any PubMed search term and then the system automatically converts the returned results to gene lists using GeneRIF. This gene list is then used as input to generate a subnetwork from the user’s PubMed query. As a case study, we applied Genes2FANs to connect disease genes from 90 well-studied disorders. We find an inverse correlation between the counts of links connecting disease genes through PPI and links connecting diseases genes through FANs, separating diseases into two categories. Conclusions Genes2FANs is a useful tool for interpreting the relationships between gene/protein lists in the context of their various functions and networks. Combining functional association interactions with physical PPIs can be useful for revealing new biology and help form hypotheses for further experimentation. 
Our finding that disease genes in many cancers are mostly connected through PPIs whereas other complex diseases, such as autism and type-2 diabetes, are mostly connected through FANs without PPIs, can guide better strategies for disease gene discovery. Genes2FANs is available at: http://actin.pharm.mssm.edu/genes2FANs.
Affiliation(s)
- Ruth Dannenfelser
- Department of Pharmacology and Systems Therapeutics, Systems Biology Center of New York, Mount Sinai School of Medicine, New York, NY 10029, USA
|
39
|
Doncheva NT, Kacprowski T, Albrecht M. Recent approaches to the prioritization of candidate disease genes. Wiley Interdiscip Rev Syst Biol Med 2012; 4:429-42. [PMID: 22689539 DOI: 10.1002/wsbm.1177] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Indexed: 11/10/2022]
Abstract
Many efforts are still devoted to the discovery of genes involved with specific phenotypes, in particular, diseases. High-throughput techniques are thus applied frequently to detect dozens or even hundreds of candidate genes. However, the experimental validation of many candidates is often an expensive and time-consuming task. Therefore, a great variety of computational approaches has been developed to support the identification of the most promising candidates for follow-up studies. The biomedical knowledge already available about the disease of interest and related genes is commonly exploited to find new gene-disease associations and to prioritize candidates. In this review, we highlight recent methodological advances in this research field of candidate gene prioritization. We focus on approaches that use network information and integrate heterogeneous data sources. Furthermore, we discuss current benchmarking procedures for evaluating and comparing different prioritization methods.
|
40
|
Kou Y, Betancur C, Xu H, Buxbaum JD, Ma'ayan A. Network- and attribute-based classifiers can prioritize genes and pathways for autism spectrum disorders and intellectual disability. Am J Med Genet C Semin Med Genet 2012; 160C:130-42. [PMID: 22499558 DOI: 10.1002/ajmg.c.31330] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 11/10/2022]
Abstract
Autism spectrum disorders (ASD) are a group of related neurodevelopmental disorders with significant combined prevalence (∼1%) and high heritability. Dozens of individually rare genes and loci associated with high-risk for ASD have been identified, which overlap extensively with genes for intellectual disability (ID). However, studies indicate that there may be hundreds of genes that remain to be identified. The advent of inexpensive massively parallel nucleotide sequencing can reveal the genetic underpinnings of heritable complex diseases, including ASD and ID. However, whole exome sequencing (WES) and whole genome sequencing (WGS) provides an embarrassment of riches, where many candidate variants emerge. It has been argued that genetic variation for ASD and ID will cluster in genes involved in distinct pathways and protein complexes. For this reason, computational methods that prioritize candidate genes based on additional functional information such as protein-protein interactions or association with specific canonical or empirical pathways, or other attributes, can be useful. In this study we applied several supervised learning approaches to prioritize ASD or ID disease gene candidates based on curated lists of known ASD and ID disease genes. We implemented two network-based classifiers and one attribute-based classifier to show that we can rank and classify known, and predict new, genes for these neurodevelopmental disorders. We also show that ID and ASD share common pathways that perturb an overlapping synaptic regulatory subnetwork. We also show that features relating to neuronal phenotypes in mouse knockouts can help in classifying neurodevelopmental genes. Our methods can be applied broadly to other diseases helping in prioritizing newly identified genetic variation that emerge from disease gene discovery based on WES and WGS.
Affiliation(s)
- Yan Kou
- Department of Psychiatry, Mount Sinai School of Medicine, One Gustave L Levy Place, Box 1668, New York, NY 10029, USA
|
41
|
Le DH, Kwon YK. GPEC: a Cytoscape plug-in for random walk-based gene prioritization and biomedical evidence collection. Comput Biol Chem 2012; 37:17-23. [PMID: 22430954 DOI: 10.1016/j.compbiolchem.2012.02.004] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 11/25/2011] [Revised: 01/10/2012] [Accepted: 02/20/2012] [Indexed: 11/18/2022]
Abstract
Finding genes associated with a disease is an important issue in the biomedical area and many gene prioritization methods have been proposed for this goal. Among these, network-based approaches have recently been proposed and have outperformed functional annotation-based ones. Here, we introduce a novel Cytoscape plug-in, GPEC, to help identify putative genes likely to be associated with specific diseases or pathways. In the plug-in, gene prioritization is performed through a random walk with restart algorithm, a state-of-the-art network-based method, along with a gene/protein relationship network. The plug-in also allows users to efficiently collect biomedical evidence for highly ranked candidate genes. A set of known genes, candidate genes and a gene/protein relationship network can be provided in a flexible way.
Affiliation(s)
- Duc-Hau Le
- School of Computer Science and Engineering, Water Resources University, 175 Tay Son, Dong Da, Hanoi, Vietnam.
|
42
|
Xia J, Sun J, Jia P, Zhao Z. Do cancer proteins really interact strongly in the human protein-protein interaction network? Comput Biol Chem 2012; 35:121-5. [PMID: 21666777 DOI: 10.1016/j.compbiolchem.2011.04.005] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 01/13/2023]
Abstract
Protein-protein interaction (PPI) network analysis has been widely applied in the investigation of the mechanisms of diseases, especially cancer. Recent studies revealed that cancer proteins tend to interact more strongly than other categories of proteins, even essential proteins, in the human interactome. However, it remains unclear whether this observation was introduced by the bias towards more cancer studies in humans. Here, we examined this important issue by comparing network characteristics of cancer proteins with three other sets of proteins in four organisms, three of which (fly, worm, and yeast) have interactomes that are essentially not biased towards cancer or other diseases. We confirmed that cancer proteins had stronger connectivity, shorter distance, and larger betweenness centrality than non-cancer disease proteins, essential proteins, and control proteins. Our statistical evaluation indicated that such observations were overall unlikely to be attributable to random events. Considering the large size and high quality of the PPI data in the four organisms, the conclusion that cancer proteins interact strongly in the PPI networks is reliable and robust. This conclusion suggests that perturbation of cancer proteins might cause major changes of cellular systems and result in abnormal cell function leading to cancer.
Affiliation(s)
- Junfeng Xia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
|
43
|
Piro RM, Di Cunto F. Computational approaches to disease-gene prediction: rationale, classification and successes. FEBS J 2012; 279:678-96. [PMID: 22221742 DOI: 10.1111/j.1742-4658.2012.08471.x] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.1] [Indexed: 11/27/2022]
Abstract
The identification of genes involved in human hereditary diseases often requires the time-consuming and expensive examination of a great number of possible candidate genes, since genome-wide techniques such as linkage analysis and association studies frequently select many hundreds of 'positional' candidates. Even considering the positive impact of next-generation sequencing technologies, the prioritization of candidate genes may be an important step for disease-gene identification. In this paper we develop a basic classification scheme for computational approaches to disease-gene prediction and apply it to exhaustively review bioinformatics tools that have been developed for this purpose, focusing on conceptual aspects rather than technical detail and performance. Finally, we discuss some past successes obtained by computational approaches to illustrate their beneficial contribution to medical research.
Affiliation(s)
- Rosario M Piro
- Department of Theoretical Bioinformatics, German Cancer Research Center, (DKFZ), Heidelberg, Germany.
|
44
|
Hsu CL, Huang YH, Hsu CT, Yang UC. Prioritizing disease candidate genes by a gene interconnectedness-based approach. BMC Genomics 2011; 12 Suppl 3:S25. [PMID: 22369140 PMCID: PMC3333184 DOI: 10.1186/1471-2164-12-s3-s25] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 11/29/2022] Open
Abstract
Background Genome-wide disease-gene finding approaches may sometimes provide us with a long list of candidate genes. Since using purely experimental approaches to verify all candidates could be expensive, a number of network-based methods have been developed to prioritize candidates. Such tools usually have a set of parameters pre-trained using available network data. This means that re-training network-based tools may be required when existing biological networks are updated or when networks from different sources are to be tried. Results We developed a parameter-free method, interconnectedness (ICN), to rank candidate genes by assessing their closeness to known disease genes in a network. ICN was tested using 1,993 known disease-gene associations and achieved a success rate of ~44% using a protein-protein interaction network under a test scenario of simulated linkage analysis. This performance is comparable with those of other well-known methods, and ICN outperforms other methods when a candidate disease gene is not directly linked to known disease genes in a network. Interestingly, we show that a combined scoring strategy could enable ICN to achieve an even better performance (~50%) than other methods used alone. Conclusions ICN, a user-friendly method, can complement other network-based methods well in the context of prioritizing candidate disease genes.
Affiliation(s)
- Chia-Lang Hsu
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei City, Taiwan 11221, Republic of China
|
45
|
Levy R, Sobolev V, Edelman M. First- and second-shell metal binding residues in human proteins are disproportionately associated with disease-related SNPs. Hum Mutat 2011; 32:1309-18. [DOI: 10.1002/humu.21573] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 03/17/2011] [Accepted: 07/06/2011] [Indexed: 11/10/2022]
|
46
|
Wang X, Gulbahce N, Yu H. Network-based methods for human disease gene prediction. Brief Funct Genomics 2011; 10:280-93. [PMID: 21764832 DOI: 10.1093/bfgp/elr024] [Citation(s) in RCA: 144] [Impact Index Per Article: 10.3] [Indexed: 01/08/2023] Open
Abstract
Despite considerable progress in disease gene discovery, we are far from uncovering the underlying cellular mechanisms of diseases, since complex traits, and even many Mendelian diseases, cannot be explained by simple genotype-phenotype relationships. More recently, an increasingly accepted view is that human diseases result from perturbations of cellular systems, especially molecular networks. Genes associated with the same or similar diseases commonly reside in the same neighborhood of molecular networks. Such observations form the basis for a large collection of computational approaches to find previously unknown genes associated with certain diseases. The majority of the methods are based on protein interactome networks, with integration of other large-scale genomic data or disease phenotype information, to infer how likely it is that a gene is associated with a disease. Here, we review recent state-of-the-art network-based methods used for prioritizing disease genes as well as unraveling the molecular basis of human diseases.
Affiliation(s)
- Xiujuan Wang
- Department of Biological Statistics and Computational Biology and Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14850, USA
|
47
|
Bauer-Mehren A, Bundschus M, Rautschka M, Mayer MA, Sanz F, Furlong LI. Gene-disease network analysis reveals functional modules in mendelian, complex and environmental diseases. PLoS One 2011; 6:e20284. [PMID: 21695124 PMCID: PMC3114846 DOI: 10.1371/journal.pone.0020284] [Citation(s) in RCA: 119] [Impact Index Per Article: 8.5] [Received: 01/11/2011] [Accepted: 04/27/2011] [Indexed: 02/05/2023]
Abstract
Background Scientists have long been trying to understand the molecular mechanisms of diseases in order to design preventive and therapeutic strategies. For some diseases, it has become evident that it is not enough to obtain a catalogue of the disease-related genes; it is also necessary to uncover how disruptions of molecular networks in the cell give rise to disease phenotypes. Moreover, with the unprecedented wealth of information available, even obtaining such a catalogue is extremely difficult. Principal Findings We developed a comprehensive gene-disease association database by integrating associations from several sources that cover different biomedical aspects of diseases. In particular, we focus on the current knowledge of human genetic diseases including mendelian, complex and environmental diseases. To assess the concept of modularity of human diseases, we performed a systematic study of the emergent properties of human gene-disease networks by means of network topology and functional annotation analysis. The results indicate a highly shared genetic origin of human diseases and show that for most diseases, including mendelian, complex and environmental diseases, functional modules exist. Moreover, a core set of biological pathways is found to be associated with most human diseases. We obtained similar results when studying clusters of diseases, suggesting that related diseases might arise due to dysfunction of common biological processes in the cell. Conclusions For the first time, we include mendelian, complex and environmental diseases in an integrated gene-disease association database and show that the concept of modularity applies to all of them. We furthermore provide a functional analysis of disease-related modules providing important new biological insights, which might not be discovered when considering each of the gene-disease association repositories independently. Hence, we present a suitable framework for the study of how genetic and environmental factors, such as drugs, contribute to diseases. Availability The gene-disease networks used in this study and part of the analysis are available at http://ibi.imim.es/DisGeNET/DisGeNETweb.html#Download.
Affiliation(s)
- Anna Bauer-Mehren
- Research Programme on Biomedical Informatics (GRIB), IMIM (Hospital del Mar Research Institute), Universitat Pompeu Fabra, Barcelona, Spain
| | - Markus Bundschus
- Institute for Computer Science, Ludwig-Maximilians-University Munich, Munich, Germany
| | - Michael Rautschka
- Research Programme on Biomedical Informatics (GRIB), IMIM (Hospital del Mar Research Institute), Universitat Pompeu Fabra, Barcelona, Spain
| | - Miguel A. Mayer
- Research Programme on Biomedical Informatics (GRIB), IMIM (Hospital del Mar Research Institute), Universitat Pompeu Fabra, Barcelona, Spain
| | - Ferran Sanz
- Research Programme on Biomedical Informatics (GRIB), IMIM (Hospital del Mar Research Institute), Universitat Pompeu Fabra, Barcelona, Spain
| | - Laura I. Furlong
- Research Programme on Biomedical Informatics (GRIB), IMIM (Hospital del Mar Research Institute), Universitat Pompeu Fabra, Barcelona, Spain
|
48
|
Abstract
Despite increasing sequencing capacity, genetic disease investigation still frequently results in the identification of loci containing multiple candidate disease genes that need to be tested for involvement in the disease. This process can be expedited by prioritizing the candidates prior to testing. Over the last decade, a large number of computational methods and tools have been developed to assist the clinical geneticist in prioritizing candidate disease genes. In this chapter, we give an overview of computational tools that can be used for this purpose, all of which are freely available over the web.
Affiliation(s)
- Martin Oti
- Structural and Computational Biology Division, Victor Chang Cardiac Research Institute, 2010, Darlinghurst, NSW, Australia.
|
49
|
Schlicker A, Lengauer T, Albrecht M. Improving disease gene prioritization using the semantic similarity of Gene Ontology terms. Bioinformatics 2010; 26:i561-7. [PMID: 20823322 PMCID: PMC2935448 DOI: 10.1093/bioinformatics/btq384] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.9] [Indexed: 12/15/2022]
Abstract
MOTIVATION Many hereditary human diseases are polygenic, resulting from sequence alterations in multiple genes. Genomic linkage and association studies are commonly performed for identifying disease-related genes. Such studies often yield lists of up to several hundred candidate genes, which have to be prioritized and validated further. Recent studies discovered that genes involved in phenotypically similar diseases are often functionally related on the molecular level. RESULTS Here, we introduce MedSim, a novel approach for ranking candidate genes for a particular disease based on functional comparisons involving the Gene Ontology. MedSim uses functional annotations of known disease genes for assessing the similarity of diseases as well as the disease relevance of candidate genes. We benchmarked our approach with genes known to be involved in 99 diseases taken from the OMIM database. Using artificial quantitative trait loci, MedSim achieved excellent performance with an area under the ROC curve of up to 0.90 and a sensitivity of over 70% at 90% specificity when classifying gene products according to their disease relatedness. This performance is comparable or even superior to related methods in the field, albeit using less and thus more easily accessible information. AVAILABILITY MedSim is offered as part of our FunSimMat web service (http://www.funsimmat.de).
Affiliation(s)
- Andreas Schlicker
- Max Planck Institute for Informatics, Department of Computational Biology and Applied Algorithmics, Saarbrücken, Germany
|
50
|
Bauer-Mehren A, Rautschka M, Sanz F, Furlong LI. DisGeNET: a Cytoscape plugin to visualize, integrate, search and analyze gene-disease networks. Bioinformatics 2010; 26:2924-6. [PMID: 20861032 DOI: 10.1093/bioinformatics/btq538] [Citation(s) in RCA: 160] [Impact Index Per Article: 10.7] [Indexed: 11/12/2022]
Abstract
DisGeNET is a plugin for Cytoscape to query and analyze human gene-disease networks. DisGeNET allows user-friendly access to a new gene-disease database that we have developed by integrating data from several public sources. DisGeNET permits queries restricted to (i) the original data source, (ii) the association type, (iii) the disease class or (iv) specific gene(s)/disease(s). It represents gene-disease associations in terms of bipartite graphs and provides gene-centric and disease-centric views of the data. It assists the user in the interpretation and exploration of the genetic basis of human diseases through a variety of built-in functions. Moreover, DisGeNET permits multicolouring of nodes (genes/diseases) according to standard disease classification for expedient visualization. AVAILABILITY DisGeNET is compatible with Cytoscape 2.6.3 and 2.7.0; please visit http://ibi.imim.es/DisGeNET/DisGeNETweb.html for the installation guide, user tutorial and download.
Affiliation(s)
- Anna Bauer-Mehren
- Research Programme on Biomedical Informatics (GRIB) IMIM, DCEXS, Universitat Pompeu Fabra, C/Dr. Aiguader 88, 08003 Barcelona, Spain
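The abstract above says gene-disease associations are represented as bipartite graphs with gene-centric and disease-centric views. One common way to derive such views is a one-mode projection of the bipartite graph; the sketch below illustrates that technique under the assumption that it resembles what is meant, since the abstract does not say how the plugin computes its views. The association pairs and the function names `build_bipartite` and `project` are hypothetical and are not part of the DisGeNET plugin.

```python
from collections import defaultdict
from itertools import combinations


def build_bipartite(associations):
    """Index (gene, disease) pairs from both sides of the bipartite graph."""
    gene_to_diseases = defaultdict(set)
    disease_to_genes = defaultdict(set)
    for gene, disease in associations:
        gene_to_diseases[gene].add(disease)
        disease_to_genes[disease].add(gene)
    return gene_to_diseases, disease_to_genes


def project(neighbors):
    """One-mode projection.

    `neighbors` maps each node on one side to its partners on the other side.
    Two partners are linked if they share at least one node; the edge weight
    counts how many nodes they share.
    """
    weights = defaultdict(int)
    for partners in neighbors.values():
        for a, b in combinations(sorted(partners), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical associations for illustration only.
associations = [
    ("GENE1", "diseaseX"),
    ("GENE1", "diseaseY"),
    ("GENE2", "diseaseY"),
    ("GENE3", "diseaseY"),
    ("GENE3", "diseaseZ"),
]

g2d, d2g = build_bipartite(associations)
disease_view = project(g2d)  # diseases linked by shared genes
gene_view = project(d2g)     # genes linked by shared diseases
print(disease_view)
print(gene_view)
```

Keeping both indexes makes either view a single call to the same projection function, at the cost of storing each association twice.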
|