1
Choi E, Song J, Lee Y, Jeong Y, Jang W. Prioritizing susceptibility genes for the prognosis of male-pattern baldness with transcriptome-wide association study. Hum Genomics 2024; 18:34. [PMID: 38566255 PMCID: PMC10985920 DOI: 10.1186/s40246-024-00591-y] [Received: 01/23/2024] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
BACKGROUND Male-pattern baldness (MPB) is the most common cause of hair loss in men. It can be categorized into three types: type 2 (T2), type 3 (T3), and type 4 (T4), with type 1 (T1) considered normal. Although various MPB-associated genetic variants have been suggested, to the best of our knowledge no comprehensive study linking these variants to gene expression regulation has been performed. RESULTS In this study, we prioritized MPB-related tissue panels using tissue-specific enrichment analysis and used single-tissue panels from Genotype-Tissue Expression (GTEx) version 8, as well as cross-tissue panels from context-specific genetics. Through a transcriptome-wide association study and colocalization analysis, we identified 52, 75, and 144 MPB associations for T2, T3, and T4, respectively. To assess the causality of MPB genes, we performed a conditional and joint analysis, which revealed 10, 11, and 54 putatively causal genes for T2, T3, and T4, respectively. Finally, we conducted drug repositioning and identified potential drug candidates connected to MPB-associated genes. CONCLUSIONS Overall, through an integrative analysis of gene expression and genotype data, we identified robust MPB susceptibility genes that may help uncover the underlying molecular mechanisms, as well as novel drug candidates that may alleviate MPB.
Affiliation(s)
- Eunyoung Choi
- Department of Life Sciences, Dongguk University, Seoul, 04620, Republic of Korea
- Jaeseung Song
- Department of Life Sciences, Dongguk University, Seoul, 04620, Republic of Korea
- Yubin Lee
- Department of Life Sciences, Dongguk University, Seoul, 04620, Republic of Korea
- Yeonbin Jeong
- Department of Life Sciences, Dongguk University, Seoul, 04620, Republic of Korea
- Wonhee Jang
- Department of Life Sciences, Dongguk University, Seoul, 04620, Republic of Korea.
2
Tsare EPG, Klapa MI, Moschonas NK. Protein-protein interaction network-based integration of GWAS and functional data for blood pressure regulation analysis. Hum Genomics 2024; 18:15. [PMID: 38326862 DOI: 10.1186/s40246-023-00565-6] [Received: 08/08/2023] [Accepted: 11/12/2023] [Indexed: 02/09/2024]
Abstract
BACKGROUND It is valuable to analyze genome-wide association study (GWAS) data for a complex disease phenotype in the context of the protein-protein interaction (PPI) network, because the related pathophysiology results from the function of interacting polyprotein pathways. The analysis may include the design and curation of a phenotype-specific GWAS meta-database incorporating genotypic and eQTL data linked to PPI and other biological datasets, and the development of systematic workflows for PPI network-based data integration toward protein and pathway prioritization. Here, we pursued this analysis for blood pressure (BP) regulation. METHODS The relational schema of the BP-GWAS meta-database, implemented in Microsoft SQL Server, enabled the combined storage of GWAS data and attributes mined from the GWAS Catalog and the literature, Ensembl-defined SNP-transcript associations, and GTEx eQTL data. The BP-protein interactome was reconstructed from the PICKLE PPI meta-database by extending the GWAS-deduced network with the shortest paths connecting all GWAS proteins into one component. The shortest-path intermediates were considered BP-related. For protein prioritization, we combined a new integrated GWAS-based scoring scheme with two network-based criteria: one considering the protein's role in the interactome reconstructed by shortest paths (RbSP), and a novel one promoting the common neighbors of GWAS-prioritized proteins. Prioritized proteins were ranked by the number of criteria they satisfied. RESULTS The meta-database includes 6687 variants linked with 1167 BP-associated protein-coding genes. The GWAS-deduced PPI network includes 1065 proteins, 672 of which form a connected component. The RbSP interactome contains 1443 additional, network-deduced proteins and indicates that essentially all BP-GWAS proteins are at most second neighbors. The prioritized BP-protein set was derived from the union of the proteins ranked most BP-significant by any of the GWAS-based or network-based criteria. It included 335 proteins, ~2/3 of which were deduced from the BP PPI network extension and 126 of which were prioritized by at least two criteria. ESR1 was the only protein satisfying all three criteria; it was followed in the top 10 by INSR, PTN11, CDK6, CSK, NOS3, SH2B3, ATP2B1, FES and FINC, each satisfying two. Pathway analysis of the RbSP interactome revealed numerous bioprocesses with functional support for an association with BP, extending our understanding of BP regulation. CONCLUSIONS The implemented workflow could be used for other multifactorial diseases.
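The shortest-path extension and the ranking by number of satisfied criteria described above can be illustrated with a minimal Python sketch. It assumes an undirected PPI network stored as an adjacency dictionary and keeps one breadth-first shortest path per pair of GWAS proteins; the abstract does not say whether one or all shortest paths per pair were used, so that choice, the toy network, and every function name here (`build_graph`, `bfs_path`, `shortest_path_intermediates`, `rank_by_criteria`) are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency dict {node: set(neighbors)} from edge pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_path(graph, start, goal):
    """Return one shortest path from start to goal (list of nodes), or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        # Sorting neighbors makes the chosen path deterministic when ties exist.
        for nb in sorted(graph.get(node, ())):
            if nb not in parents:
                parents[nb] = node
                queue.append(nb)
    return None


def shortest_path_intermediates(graph, seeds):
    """Nodes on one shortest path between any two seeds, excluding the seeds (assumption)."""
    on_paths = set()
    for a, b in combinations(sorted(seeds), 2):
        path = bfs_path(graph, a, b)
        if path is not None:
            on_paths.update(path)
    return on_paths - set(seeds)


def rank_by_criteria(criteria):
    """Rank items by how many criteria (each a set of selected IDs) they satisfy."""
    counts = {}
    for selected in criteria:
        for item in selected:
            counts[item] = counts.get(item, 0) + 1
    # Most criteria first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    toy = build_graph([("A", "X"), ("X", "B"), ("B", "Y"), ("Y", "C"), ("A", "Z")])
    print(sorted(shortest_path_intermediates(toy, {"A", "B", "C"})))  # ['X', 'Y']
```

On the toy network, X and Y lie on shortest paths between the seed proteins A, B and C and are flagged as intermediates, while Z, attached only to A, is not.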
Affiliation(s)
- Evridiki-Pandora G Tsare
- Department of General Biology, School of Medicine, University of Patras, Patras, Greece
- Metabolic Engineering and Systems Biology Laboratory, Institute of Chemical Engineering Sciences, Foundation for Research and Technology-Hellas (FORTH/ICE-HT), Patras, Greece
- Maria I Klapa
- Metabolic Engineering and Systems Biology Laboratory, Institute of Chemical Engineering Sciences, Foundation for Research and Technology-Hellas (FORTH/ICE-HT), Patras, Greece.
- Nicholas K Moschonas
- Department of General Biology, School of Medicine, University of Patras, Patras, Greece.
- Metabolic Engineering and Systems Biology Laboratory, Institute of Chemical Engineering Sciences, Foundation for Research and Technology-Hellas (FORTH/ICE-HT), Patras, Greece.
3
Zhu X, Ma S, Wong WH. Genetic effects of sequence-conserved enhancer-like elements on human complex traits. Genome Biol 2024; 25:1. [PMID: 38167462 PMCID: PMC10759394 DOI: 10.1186/s13059-023-03142-1] [Received: 09/02/2022] [Accepted: 12/08/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND The vast majority of findings from human genome-wide association studies (GWAS) map to non-coding sequences, complicating their mechanistic interpretation and clinical translation. Non-coding sequences that are evolutionarily conserved and biochemically active could offer clues to the mechanisms underpinning GWAS discoveries. However, the genetic effects of such sequences have not been systematically examined across a wide range of human tissues and traits, hampering progress toward fully understanding the regulatory causes of human complex traits. RESULTS Here we develop a simple yet effective strategy to identify functional elements exhibiting high levels of human-mouse sequence conservation and enhancer-like biochemical activity, which scales well to 313 epigenomic datasets across 106 human tissues and cell types. Combined with 468 GWAS of European (EUR) and East Asian (EAS) ancestries, these elements show tissue-specific enrichments of heritability and causal variants for many traits, which are significantly stronger than enrichments based on enhancers without sequence conservation. These elements also help prioritize candidate genes that are functionally relevant to body mass index (BMI) and schizophrenia but were not reported in previous GWAS with large sample sizes. CONCLUSIONS Our findings provide a comprehensive assessment of how sequence-conserved enhancer-like elements affect complex traits in diverse tissues and demonstrate a generalizable strategy of integrating evolutionary and biochemical data to elucidate human disease genetics.
Affiliation(s)
- Xiang Zhu
- Department of Statistics, The Pennsylvania State University, 326 Thomas Building, University Park, 16802, PA, USA.
- Huck Institutes of the Life Sciences, The Pennsylvania State University, 201 Huck Life Sciences Building, University Park, 16802, PA, USA.
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, 94305, CA, USA.
- Shining Ma
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, 94305, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, 1265 Welch Road MC5464, Stanford, 94305, CA, USA
- Wing Hung Wong
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, 94305, CA, USA.
- Department of Biomedical Data Science, Stanford University School of Medicine, 1265 Welch Road MC5464, Stanford, 94305, CA, USA.
4
Tsai MJ, Jeong S, Yu F, Chen TF, Li PH, Juan HF, Huang JH, Hsu YH. Translating GWAS Findings to Inform Drug Repositioning Strategies for COVID-19 Treatment. Res Sq 2023:rs.3.rs-3443080. [PMID: 37886583 PMCID: PMC10602133 DOI: 10.21203/rs.3.rs-3443080/v1] [Indexed: 10/28/2023]
Abstract
We developed a computational framework that integrates genome-wide association studies (GWAS) and post-GWAS analyses to facilitate drug repurposing for COVID-19 treatment. The approach combines transcriptome-wide associations, polygenic priority scoring, 3D genomics, viral-host protein-protein interactions, and small-molecule docking. Through GWAS, we identified nine druggable host genes associated with COVID-19 severity and SARS-CoV-2 infection, all of which show differential expression in COVID-19 patients. These genes include IFNAR1, IFNAR2, TYK2, IL10RB, CXCR6, CCR9, and OAS1. We performed an extensive molecular docking analysis of these targets using 553 small molecules from five therapeutically enriched categories: antibacterials, antivirals, antineoplastics, immunosuppressants, and anti-inflammatories. This analysis, which comprised over 20,000 individual docking analyses, enabled the identification of several promising drug candidates. All results are available via the DockCoV2 database (https://dockcov2.org/drugs/). The computational framework ultimately identified nine potential drug candidates: Peginterferon alfa-2b, Interferon alfa-2b, Interferon beta-1b, Ruxolitinib, Dactinomycin, Rolitetracycline, Irinotecan, Vinblastine, and Oritavancin. Although it currently focuses on COVID-19, the proposed framework can be applied more broadly to assist drug repurposing efforts for a variety of diseases. Overall, this study underscores the potential of human genetic studies and the utility of a computational framework for drug repurposing in the context of COVID-19 treatment, providing a valuable resource for researchers in this field.
5
Stanzick KJ, Stark KJ, Gorski M, Schödel J, Krüger R, Kronenberg F, Warth R, Heid IM, Winkler TW. KidneyGPS: a user-friendly web application to help prioritize kidney function genes and variants based on evidence from genome-wide association studies. BMC Bioinformatics 2023; 24:355. [PMID: 37735349 PMCID: PMC10512588 DOI: 10.1186/s12859-023-05472-0] [Received: 02/17/2023] [Accepted: 09/08/2023] [Indexed: 09/23/2023]
Abstract
BACKGROUND Genome-wide association studies (GWAS) have identified hundreds of genetic loci associated with kidney function. By combining these findings with post-GWAS information (e.g., statistical fine-mapping to identify independent association signals and to narrow down signals to causal variants, or different sources of annotation data), new hypotheses regarding physiology and disease aetiology can be obtained. These hypotheses need to be tested in laboratory experiments, for example, to identify new therapeutic targets. For this purpose, the evidence obtained from GWAS and post-GWAS analyses must be processed and presented in a way that is easily accessible to kidney researchers without specific GWAS expertise. MAIN Here we present KidneyGPS, a user-friendly web application that combines genetic variant associations for estimated glomerular filtration rate (eGFR) from the Chronic Kidney Disease Genetics consortium with annotation of (i) genetic variants with functional or regulatory effects ("SNP-to-gene" mapping), (ii) genes with kidney phenotypes in mice or humans ("gene-to-phenotype"), and (iii) druggability of genes (to support repurposing). KidneyGPS adopts a comprehensive approach, summarizing evidence for all 5906 genes in the 424 previously identified GWAS loci for eGFR and the 35,885 variants in the 99% credible sets of 594 independent signals. KidneyGPS provides user-friendly access to this wealth of information through search functions for genes, variants, and regions. KidneyGPS also provides a function ("GPS tab") to generate lists of genes with specific characteristics, thus enabling customizable Gene Prioritisation (GPS). These characteristics can be as broad as any gene in the 424 loci with a known kidney phenotype in mice or humans, or highly focussed on genes mapping to genetic variants or signals with particularly high statistical support. KidneyGPS is implemented with RShiny in a modularized fashion to facilitate updates of the input data (https://kidneygps.ur.de/gps/). CONCLUSION With its focus on kidney function-related evidence, KidneyGPS fills a gap between large general platforms for accessing GWAS and post-GWAS results and the specific needs of the kidney research community. This makes KidneyGPS an important platform to help kidney researchers translate in silico research results into in vitro or in vivo research.
Affiliation(s)
- Kira J Stanzick
- Department of Genetic Epidemiology, University of Regensburg, Regensburg, Germany
- Klaus J Stark
- Department of Genetic Epidemiology, University of Regensburg, Regensburg, Germany
- Mathias Gorski
- Department of Genetic Epidemiology, University of Regensburg, Regensburg, Germany
- Johannes Schödel
- Department of Nephrology and Hypertension, Friedrich-Alexander Universität Erlangen-Nürnberg, and Uniklinikum Erlangen, Erlangen, Germany
- René Krüger
- Department of Nephrology and Hypertension, Friedrich-Alexander Universität Erlangen-Nürnberg, and Uniklinikum Erlangen, Erlangen, Germany
- Florian Kronenberg
- Department of Genetics, Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
- Richard Warth
- Medical Cell Biology, University of Regensburg, Regensburg, Germany
- Iris M Heid
- Department of Genetic Epidemiology, University of Regensburg, Regensburg, Germany.
- Thomas W Winkler
- Department of Genetic Epidemiology, University of Regensburg, Regensburg, Germany.
6
Hongyao HE, Chun JI, Xiaoyan G, Fangfang L, Jing Z, Lin Z, Pengxiang Z, Zengchun L. Associative gene networks reveal novel candidates important for ADHD and dyslexia comorbidity. BMC Med Genomics 2023; 16:208. [PMID: 37667328 PMCID: PMC10478365 DOI: 10.1186/s12920-023-01502-1] [Received: 03/29/2022] [Accepted: 03/26/2023] [Indexed: 09/06/2023]
Abstract
BACKGROUND Attention deficit hyperactivity disorder (ADHD) is commonly associated with developmental dyslexia (DD); both are prevalent and complex pediatric neurodevelopmental disorders that significantly affect children's learning and development. Clinically, the comorbidity incidence of DD and ADHD is between 25 and 48%. Children with both DD and ADHD may have more severe cognitive deficits, poorer schooling outcomes, and a higher risk of social and emotional management disorders. Furthermore, patients with this comorbidity are frequently treated for a single condition in clinical settings, and the therapeutic outcome is poor. The comorbidity complicates the development of effective treatments and is often a major problem in diagnosis and treatment. In this study, we developed a bioinformatic methodology for analyzing the comorbidity of these two diseases. The search for candidate genes related to the comorbid conditions of ADHD and DD can help elucidate the molecular mechanisms underlying the comorbid condition and can also be useful for genotyping and identifying new drug targets. RESULTS Using the ANDSystem tool, we reconstructed and analyzed gene networks associated with ADHD and dyslexia. The gene network of ADHD included 599 genes/proteins and 148,978 interactions, while that of dyslexia included 167 genes/proteins and 27,083 interactions. When the ANDSystem and GeneCards data were combined, a total of 213 genes/proteins for ADHD and dyslexia were found. We proposed an approach for ranking genes implicated in the comorbid condition of the two diseases. The approach is based on ten criteria for ranking genes by their importance, including relevance scores of association between disease and genes, standard gene prioritization methods, and original criteria that take into account the characteristics of an associative gene network and the presence of known polymorphisms in the analyzed genes. Among the top 20 highest-priority genes, DRD2, DRD4, CNTNAP2 and GRIN2B are reported in the literature as directly linked with the comorbidity of ADHD and dyslexia. According to the proposed approach, the genes OPRM1, CHRNA4 and SNCA had the highest priority in the development of comorbidity of these two diseases. Additionally, the most relevant genes were found to be involved in biological processes related to signal transduction, positive regulation of transcription from RNA polymerase II promoters, chemical synaptic transmission, response to drugs, ion transmembrane transport, nervous system development, cell adhesion, and neuron migration. CONCLUSIONS Reconstruction and analysis of gene networks is a powerful tool for studying the molecular mechanisms of comorbid conditions. The proposed method for ranking genes by their importance for the comorbid condition of ADHD and dyslexia was used to predict genes that play key roles in the development of this comorbidity. The results can be used to plan experiments to identify novel candidate genes and to search for novel pharmacological targets.
Affiliation(s)
- H E Hongyao
- Medical College of Shihezi University, Shihezi, China
- J I Chun
- Medical College of Shihezi University, Shihezi, China
- Gao Xiaoyan
- Medical College of Shihezi University, Shihezi, China
- Liu Fangfang
- Medical College of Shihezi University, Shihezi, China
- Zhang Jing
- Medical College of Shihezi University, Shihezi, China
- Zhong Lin
- Medical College of Shihezi University, Shihezi, China
- Zuo Pengxiang
- Medical College of Shihezi University, Shihezi, China.
- Li Zengchun
- Medical College of Shihezi University, Shihezi, China.
7
Pinakhina D, Loboda A, Sergushichev A, Artomov M. Gene, cell type, and drug prioritization analysis suggest genetic basis for the utility of diuretics in treating Alzheimer disease. HGG Adv 2023; 4:100203. [PMID: 37250495 PMCID: PMC10209737 DOI: 10.1016/j.xhgg.2023.100203] [Received: 08/09/2022] [Accepted: 04/25/2023] [Indexed: 05/31/2023]
Abstract
We introduce GCDPipe, a user-friendly tool for risk gene, cell type, and drug prioritization for complex traits. It uses gene-level GWAS-derived data and gene expression data to train a model that identifies disease risk genes and relevant cell types. Gene prioritization information is then coupled with known drug target data to search for applicable drug agents based on their estimated functional effects on the identified risk genes. We illustrate the utility of our approach in different settings: identification of cell types implicated in disease pathogenesis was tested in inflammatory bowel disease (IBD) and Alzheimer disease (AD), and gene target and drug prioritization was tested in IBD and schizophrenia. The analysis of phenotypes with known disease-affected cell types and/or existing drug candidates shows that GCDPipe is an effective tool for unifying genetic risk factors with cellular context and known drug targets. Next, analysis of the AD data with GCDPipe suggested that gene targets of diuretics, an Anatomical Therapeutic Chemical drug subgroup, are significantly enriched among the genes prioritized by GCDPipe, indicating a possible effect of diuretics on the course of the disease.
Affiliation(s)
- Daria Pinakhina
- ITMO University, 197101 Saint Petersburg, Russia
- Bekhterev National Medical Research Center, 192019 Saint Petersburg, Russia
- Alexander Loboda
- ITMO University, 197101 Saint Petersburg, Russia
- Almazov National Medical Research Center, 191014 Saint Petersburg, Russia
- Mykyta Artomov
- ITMO University, 197101 Saint Petersburg, Russia
- Broad Institute, Cambridge, MA 02142, USA
- Massachusetts General Hospital, Boston, MA 02114, USA
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
8
Zhang L, Fan S, Vera J, Lai X. A network medicine approach for identifying diagnostic and prognostic biomarkers and exploring drug repurposing in human cancer. Comput Struct Biotechnol J 2022; 21:34-45. [PMID: 36514340 PMCID: PMC9732137 DOI: 10.1016/j.csbj.2022.11.037] [Received: 07/30/2022] [Revised: 11/18/2022] [Accepted: 11/18/2022] [Indexed: 12/03/2022]
Abstract
Cancer is a heterogeneous disease mainly driven by abnormal gene perturbations in regulatory networks. It is therefore appealing to identify the common and specific perturbed genes across multiple cancer networks. We developed an integrative network medicine approach to identify novel biomarkers and investigate drug repurposing across cancer types. We used a network-based method to prioritize genes in cancer-specific networks reconstructed from human transcriptome and interactome data. The prioritized genes show extensive perturbation and strong regulatory interactions with other highly perturbed genes, suggesting a vital contribution to tumorigenesis and tumor progression; we therefore regard them as cancer genes. The detected cancer genes perform remarkably well in discriminating tumors from normal tissues and in predicting the survival times of cancer patients. Finally, we developed a network proximity approach to systematically screen drugs and identified dozens of candidates with repurposing potential in several cancer types. Taken together, we demonstrated the power of the network medicine approach to identify novel biomarkers and repurposable drugs in multiple cancer types. We have also made the data and code freely accessible to ensure the reproducibility and reusability of the computational workflow.
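The abstract does not define its network proximity measure. A common choice in network medicine is the "closest" distance: the mean, over a drug's targets, of the shortest-path distance to the nearest disease gene. The Python sketch below computes that quantity with a multi-source breadth-first search; it is an illustrative assumption, not the authors' code, and the function names and toy network are hypothetical.

```python
from collections import deque


def distances_to_set(graph, sources):
    """Multi-source BFS: hop distance from each reachable node to its nearest source."""
    dist = {s: 0 for s in sources if s in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def closest_proximity(graph, disease_genes, drug_targets):
    """Mean, over drug targets, of the distance to the nearest disease gene.

    Targets absent from the network or unreachable from every disease gene are
    skipped (an assumption of this sketch); returns None if no target remains.
    """
    dist = distances_to_set(graph, disease_genes)
    reachable = [dist[t] for t in drug_targets if t in dist]
    if not reachable:
        return None
    return sum(reachable) / len(reachable)


if __name__ == "__main__":
    # Hypothetical path-shaped network G1 - G2 - G3 - G4 - G5.
    edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G4", "G5")]
    net = {}
    for u, v in edges:
        net.setdefault(u, set()).add(v)
        net.setdefault(v, set()).add(u)
    print(closest_proximity(net, {"G1"}, {"G3", "G5"}))  # 3.0
```

Smaller values place a drug's targets closer to the disease genes. In practice, such raw distances are usually compared against randomly sampled target sets of matched degree to obtain a z-score; that calibration step is omitted here.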
Affiliation(s)
- Le Zhang
- College of Computer Science, Sichuan University, Chengdu, China
- Shiwei Fan
- College of Computer Science, Sichuan University, Chengdu, China
- Julio Vera
- Laboratory of Systems Tumor Immunology, Department of Dermatology, Universitätsklinikum Erlangen and Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Deutsches Zentrum Immuntherapie, Erlangen, Germany
- Comprehensive Cancer Center Erlangen, Erlangen, Germany
- Xin Lai
- Laboratory of Systems Tumor Immunology, Department of Dermatology, Universitätsklinikum Erlangen and Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Deutsches Zentrum Immuntherapie, Erlangen, Germany
- Comprehensive Cancer Center Erlangen, Erlangen, Germany
- BioMediTech, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Corresponding author at: Universitätsklinikum Erlangen, Erlangen, Germany; Tampere University, Tampere, Finland.
9
Azadifar S, Ahmadi A. A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning. BMC Bioinformatics 2022; 23:422. [PMID: 36241966 PMCID: PMC9563530 DOI: 10.1186/s12859-022-04954-x] [Received: 07/12/2022] [Accepted: 09/20/2022] [Indexed: 11/18/2022]
Abstract
Background Selecting and prioritizing candidate disease genes is necessary before conducting laboratory studies, because identifying disease genes from a large number of candidates by laboratory methods is costly and time-consuming. There are many machine learning-based gene prioritization methods. These methods differ in various aspects, including the feature vectors of genes, the datasets used and their structures, and the learning model. Important challenges for these methods are creating suitable gene feature vectors and learning models for diverse data with non-Euclidean structures, including graphs, as well as the lack of negative data. Graph neural networks have recently emerged in machine learning and related fields and have demonstrated superior performance on a broad range of problems. Methods In this study, we present a new semi-supervised learning method based on graph convolutional networks that uses a novel way of constructing feature vectors for each gene. In the proposed method, we first construct three feature vectors for each gene using terms from the Gene Ontology (GO) database. Then, we train a graph convolutional network on these vectors using protein–protein interaction (PPI) network data to identify candidate disease genes. Our model learns hidden-layer representations that encode both local graph structure and node features. The method simultaneously considers the topological information of the biological network (e.g., PPI) and other sources of evidence. Finally, we validate the method to demonstrate its efficiency. Results Several experiments on 16 diseases were performed to evaluate the proposed method's performance. The experiments demonstrate that our proposed method achieves the best results in terms of precision, area under the ROC curve (AUC), and F1 score when compared with eight state-of-the-art network and machine learning-based disease gene prioritization methods. Conclusion This study shows that the proposed semi-supervised learning method appropriately classifies and ranks candidate disease genes using a graph convolutional network and an innovative method to create three feature vectors for genes based on the molecular function, cellular component, and biological process terms from GO data.
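The abstract does not specify the network architecture. For orientation, the sketch below implements one propagation step of the widely used graph convolutional layer of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 · H · W), in plain Python. The adjacency matrix stands in for a PPI network and the rows of `features` for per-gene feature vectors (for example, built from GO terms); all inputs and function names are hypothetical, and the authors' model may differ.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense, symmetric 0/1 adjacency matrix."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # degrees including the added self-loop
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of x (n x k) and y (k x m)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(A_norm @ features @ weights)."""
    propagated = matmul(normalized_adjacency(adj), matmul(features, weights))
    return [[max(0.0, v) for v in row] for row in propagated]


if __name__ == "__main__":
    adj = [[0, 1], [1, 0]]               # two interacting genes
    features = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical per-gene feature vectors
    weights = [[1.0, 0.0], [0.0, 1.0]]   # identity weights for illustration
    print(gcn_layer(adj, features, weights))  # [[0.5, 0.5], [0.5, 0.5]]
```

In a semi-supervised setting, several such layers are stacked and the training loss is computed only on genes with known labels, while predictions are read off for the unlabeled candidates.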
Affiliation(s)
- Saeid Azadifar
- Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran.
- Ali Ahmadi
- Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran
10
Ji Y, Chen R, Wang Q, Wei Q, Tao R, Li B. A Bayesian framework to integrate multi-level genome-scale data for Autism risk gene prioritization. BMC Bioinformatics 2022; 23:146. [PMID: 35459094 PMCID: PMC9034518 DOI: 10.1186/s12859-022-04616-y] [Received: 05/23/2021] [Accepted: 02/15/2022] [Indexed: 12/03/2022]
Abstract
Background Autism spectrum disorder (ASD) is a group of complex neurodevelopmental disorders with a strong genetic basis. Large-scale sequencing studies have identified over one hundred ASD risk genes. Nevertheless, the vast majority of ASD risk genes remain to be discovered, as more than 1000 genes are estimated to be involved in ASD risk. Prioritization of risk genes is an effective strategy to increase the power of identifying novel risk genes in genetic studies of ASD. As ASD risk genes are likely to exhibit distinct properties from multiple angles, we reason that integrating multiple levels of genomic data is a powerful approach to pinpoint genuine ASD risk genes. Results We present BNScore, a Bayesian model selection framework that probabilistically prioritizes ASD risk genes by explicitly integrating evidence from sequencing-identified ASD genes, biological annotations, and a gene functional network. We demonstrate the validity of our approach and its improved performance over existing methods by examining the resulting top candidate ASD risk genes against sets of high-confidence benchmark genes and large-scale ASD genome-wide association studies. We assess the tissue-, cell type- and developmental stage-specific expression properties of the top-prioritized genes and find strong expression specificity in brain tissues, striatal medium spiny neurons, and fetal developmental stages. Conclusions In summary, we show that by integrating sequencing findings, functional annotation profiles, and a gene-gene functional network, BNScore provides competitive performance compared with current state-of-the-art methods in prioritizing ASD genes. Our method offers a general and flexible strategy for risk gene prioritization that can potentially be applied to other complex traits as well. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04616-y.
Affiliation(s)
- Ying Ji
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37212, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA
- Rui Chen
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37212, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA
- Quan Wang
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37212, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA
- Qiang Wei
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37212, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA
- Ran Tao
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA; Department of Biostatistics, Vanderbilt University, Nashville, TN, 37212, USA
- Bingshan Li
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37212, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA
11
Dashti H, Dehzangi I, Bayati M, Breen J, Beheshti A, Lovell N, Rabiee HR, Alinejad-Rokny H. Integrative analysis of mutated genes and mutational processes reveals novel mutational biomarkers in colorectal cancer. BMC Bioinformatics 2022; 23:138. [PMID: 35439935 PMCID: PMC9017053 DOI: 10.1186/s12859-022-04652-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/09/2021] [Accepted: 03/24/2022] [Indexed: 12/14/2022]
Abstract
BACKGROUND Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths worldwide. Recent studies have observed causative mutations in susceptibility genes related to colorectal cancer in 10 to 15% of patients. This highlights the importance of identifying mutations for early detection of this cancer and for more effective treatment of high-risk individuals. Mutations are considered a key focus of cancer research. Many studies have performed cancer subtyping based on the type of frequently mutated genes or the proportion of mutational processes. However, to the best of our knowledge, these features have never been combined for this task. This highlights the potential to introduce better and more inclusive subtype classification approaches that use a wider range of related features to enable biomarker discovery and thus inform drug development for CRC. RESULTS In this study, we develop a new pipeline for colorectal cancer subtype identification. It is based on a novel concept called 'gene-motif', which merges mutated gene information with the tri-nucleotide motif of mutated sites. We apply our pipeline to the International Cancer Genome Consortium (ICGC) CRC samples and identify, for the first time, 3131 gene-motif combinations that are significantly mutated in 536 ICGC colorectal cancer samples. Using these features, we identify seven CRC subtypes with distinguishable phenotypes and biomarkers, including unique cancer-related signaling pathways, most of which have targeted treatment options currently available. Interestingly, we also identify several genes that are mutated in multiple subtypes but with unique sequence contexts. CONCLUSION Our results highlight the importance of considering both the mutation type and the mutated genes in identifying cancer subtypes and cancer biomarkers.
The new CRC subtypes presented in this study demonstrate distinct phenotypic properties that can be effectively used to develop new treatments. By knowing the genes and phenotypes associated with the subtypes, a personalized treatment plan can be developed that considers the specific phenotypes associated with a patient's genomic lesion.
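The 'gene-motif' feature can be illustrated with a minimal Python sketch. Each single-base substitution becomes one key that combines the gene symbol with the trinucleotide context of the mutated site, and keys are counted per sample. Several details are assumptions for illustration: the key format, the folding of contexts onto a C or T reference base (a common mutational-signature convention), and all function names. This is not the authors' pipeline, and their significance testing and subtyping steps are not shown.

```python
from collections import Counter

# Base-complement table used to reverse-complement DNA strings.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gene_motif_key(gene: str, context: str, alt: str) -> str:
    """Merge a gene symbol with the trinucleotide motif of a substitution.

    `context` is the reference trinucleotide centred on the mutated base and
    `alt` is the substituted base. Assumption: motifs are folded onto a C/T
    reference base by reverse-complementing, as in SBS-96 conventions.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or not set(context) <= set("ACGT"):
        raise ValueError("expected a trinucleotide of A/C/G/T")
    if len(alt) != 1 or alt not in "ACGT" or alt == context[1]:
        raise ValueError("expected a single alt base different from the reference")
    if context[1] in "AG":
        context, alt = revcomp(context), revcomp(alt)
    return f"{gene}:{context[0]}[{context[1]}>{alt}]{context[2]}"


def count_gene_motifs(mutations):
    """Count gene-motif features for one sample's (gene, context, alt) tuples."""
    return Counter(gene_motif_key(gene, ctx, alt) for gene, ctx, alt in mutations)


# Hypothetical mutations for one sample (illustrative input, not real data).
sample = [("APC", "TCA", "T"), ("KRAS", "TGA", "A"), ("APC", "TCA", "T")]
print(count_gene_motifs(sample))
```

In this toy input, the two genes share the motif T[C>T]A but still yield separate features. Pairing the gene with its sequence context is what lets the same gene appear with different motifs in different subtypes.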
Affiliation(s)
- Hamed Dashti
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, 11365, Tehran, Iran
- Iman Dehzangi
- Center for Computational and Integrative Biology (CCIB), Rutgers University, Camden, NJ, 08102, USA
- Masroor Bayati
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, 11365, Tehran, Iran
- James Breen
- South Australian Health and Medical Research Institute, Adelaide, SA, 5000, Australia; Robinson Research Institute, University of Adelaide, Adelaide, SA, 5006, Australia; Bioinformatics Hub, University of Adelaide, Adelaide, SA, 5006, Australia
- Amin Beheshti
- Department of Computing, Macquarie University, Sydney, NSW, 2109, Australia
- Nigel Lovell
- Tyree Institute of Health Engineering and The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, 2052, Australia
- Hamid R Rabiee
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, 11365, Tehran, Iran
- Hamid Alinejad-Rokny
- BioMedical Machine Learning Lab, The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, 2052, Australia; UNSW Data Science Hub, The University of New South Wales, Sydney, NSW, 2052, Australia; Health Data Analytics Program, AI-Enabled Processes (AIP) Research Centre, Macquarie University, Sydney, 2109, Australia
12
Manshaei R, DeLong S, Andric V, Joshi E, Okello JBA, Dhir P, Somerville C, Farncombe KM, Kalbfleisch K, Jobling RK, Scherer SW, Kim RH, Hosseini SM. GeneTerpret: a customizable multilayer approach to genomic variant prioritization and interpretation. BMC Med Genomics 2022; 15:31. [PMID: 35180879 PMCID: PMC8857790 DOI: 10.1186/s12920-022-01166-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Accepted: 01/25/2022] [Indexed: 11/16/2022]
Abstract
Background Variant interpretation is the main bottleneck in medical genomic sequencing efforts. It usually involves genome analysts manually searching through a multitude of independent databases, often with the aid of several, mostly independent, computational tools. To streamline variant interpretation, we developed the GeneTerpret platform. It collates data from current interpretation tools and databases and applies a phenotype-driven query to categorize the variants identified in the genome(s). The platform assigns quantitative validity scores to genes by querying and assembling genotype–phenotype data, sequence homology, molecular interactions, expression data, and animal models. It also uses the American College of Medical Genetics and Genomics (ACMG) criteria to categorize variants into five tiers of pathogenicity. The final output is a prioritized list of potentially causal variants/genes.
Results We tested GeneTerpret by comparing its output with expert-curated genes (ClinGen’s gene-validity database) and variant pathogenicity reports (DECIPHER database). Output from GeneTerpret was 97.2% and 83.5% concordant with these expert-curated sources, respectively. Similar concordance was observed when GeneTerpret’s performance was compared with our internal expert-interpreted clinical datasets. Conclusions GeneTerpret is a flexible platform designed to streamline the genome interpretation process through a unique interface, with improved ease, speed and accuracy. This modular and customizable system allows users to tailor the component programs in the analysis process to their preferences. GeneTerpret is available online at https://geneterpret.com. Supplementary Information The online version contains supplementary material available at 10.1186/s12920-022-01166-3.
Affiliation(s)
- Roozbeh Manshaei
- Ted Rogers Centre for Heart Research, Cardiac Genome Clinic, The Hospital for Sick Children, Toronto, ON, Canada
- Sean DeLong
- Ted Rogers Centre for Heart Research, Cardiac Genome Clinic, The Hospital for Sick Children, Toronto, ON, Canada; Department of Electrical Engineering and Computer Science, York University, Toronto, ON, Canada
- Veronica Andric
- Ted Rogers Centre for Heart Research, Cardiac Genome Clinic, The Hospital for Sick Children, Toronto, ON, Canada
- Esha Joshi
- Ted Rogers Centre for Heart Research, Cardiac Genome Clinic, The Hospital for Sick Children, Toronto, ON, Canada; Department of Molecular Genetics, Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- John B A Okello
- Ted Rogers Centre for Heart Research, Cardiac Genome Clinic, The Hospital for Sick Children, Toronto, ON, Canada; MIT Sloan School of Management, Massachusetts Institute of Technology, 100 Main Street, Cambridge, MA, 02142, USA
- Priya Dhir
- Ted Rogers Centre for Heart Research, Cardiac Genome Clinic, The Hospital for Sick Children, Toronto, ON, Canada; Faculty of Medicine, University of Toronto, Toronto, ON, M5S1A8, Canada
- Cherith Somerville
- Ted Rogers Centre for Heart Research, Cardiac Genome Clinic, The Hospital for Sick Children, Toronto, ON, Canada
- Kirsten M Farncombe
- Ted Rogers Centre for Heart Research, Toronto General Hospital Research Institute, University Health Network, Toronto, ON, Canada
- Kelsey Kalbfleisch
- Ted Rogers Centre for Heart Research, Cardiac Genome Clinic, The Hospital for Sick Children, Toronto, ON, Canada; Division of Clinical and Metabolic Genetics, The Hospital for Sick Children, Toronto, ON, Canada
- Rebekah K Jobling
- Ted Rogers Centre for Heart Research, Cardiac Genome Clinic, The Hospital for Sick Children, Toronto, ON, Canada; Genome Diagnostics, Department of Pediatric Laboratory Medicine, The Hospital for Sick Children, Toronto, ON, Canada; Division of Clinical and Metabolic Genetics, The Hospital for Sick Children, Toronto, ON, Canada
- Stephen W Scherer
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada; Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada; Centre for Genetic Medicine, The Hospital for Sick Children, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Raymond H Kim
- Division of Clinical and Metabolic Genetics, The Hospital for Sick Children, Toronto, ON, Canada; Fred A. Litwin Family Centre in Genetic Medicine, University Health Network, Department of Medicine, University of Toronto, Toronto, ON, Canada
- S Mohsen Hosseini
- Department of Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
13
Novikova G, Andrews SJ, Renton AE, Marcora E. Beyond association: successes and challenges in linking non-coding genetic variation to functional consequences that modulate Alzheimer's disease risk. Mol Neurodegener 2021; 16:27. [PMID: 33882988 PMCID: PMC8061035 DOI: 10.1186/s13024-021-00449-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/19/2020] [Accepted: 04/13/2021] [Indexed: 02/06/2023]
Abstract
Alzheimer's disease (AD) is the most common type of dementia, affecting millions of people worldwide; however, no disease-modifying treatments are currently available. Genome-wide association studies (GWASs) have identified more than 40 loci associated with AD risk. However, most of the disease-associated variants reside in non-coding regions of the genome, making it difficult to elucidate how they affect disease susceptibility. Nonetheless, identification of the regulatory elements, genes, pathways and cell type/tissue(s) impacted by these variants to modulate AD risk is critical to our understanding of disease pathogenesis and ability to develop effective therapeutics. In this review, we provide an overview of the methods and approaches used in the field to identify the functional effects of AD risk variants in the causal path to disease risk modification as well as describe the most recent findings. We first discuss efforts in cell type/tissue prioritization followed by recent progress in candidate causal variant and gene nomination. We discuss statistical methods for fine-mapping as well as approaches that integrate multiple levels of evidence, such as epigenomic and transcriptomic data, to identify causal variants and risk mechanisms of AD-associated loci. Additionally, we discuss experimental approaches and data resources that will be needed to validate and further elucidate the effects of these variants and genes on biological pathways, cellular phenotypes and disease risk. Finally, we discuss future steps that need to be taken to ensure that AD GWAS functional mapping efforts lead to novel findings and bring us closer to finding effective treatments for this devastating disease.
Affiliation(s)
- Gloriia Novikova
- Ronald M. Loeb Center for Alzheimer's Disease, Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Shea J Andrews
- Ronald M. Loeb Center for Alzheimer's Disease, Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alan E Renton
- Ronald M. Loeb Center for Alzheimer's Disease, Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Edoardo Marcora
- Ronald M. Loeb Center for Alzheimer's Disease, Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
14
Heidari R, Akbariqomi M, Asgari Y, Ebrahimi D, Alinejad-Rokny H. A systematic review of long non-coding RNAs with a potential role in breast cancer. Mutat Res Rev Mutat Res 2021; 787:108375. [PMID: 34083033 DOI: 10.1016/j.mrrev.2021.108375] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/16/2020] [Revised: 04/07/2021] [Accepted: 04/12/2021] [Indexed: 12/13/2022]
Abstract
The human transcriptome contains many non-coding RNAs (ncRNAs), which play important roles in gene regulation. Long non-coding RNAs (lncRNAs) are an important class of ncRNAs with lengths between 200 and 200,000 bases. Unlike mRNAs, lncRNAs lack protein-coding features, namely open reading frames and start and stop codons. LncRNAs have been reported to play a role in the pathogenesis and progression of many cancers, including breast cancer (BC), acting as tumor suppressors or oncogenes. In this review, we systematically mined the literature to identify 65 BC-related lncRNAs. We then performed an integrative bioinformatics analysis to identify 14 lncRNAs with a potential regulatory role in BC. The biological functions of these 14 lncRNAs, their regulatory mechanisms, and their roles in the initiation and progression of BC are discussed in this review. Additionally, we elaborate on the current and future applications of lncRNAs as diagnostic and/or therapeutic biomarkers in BC.
Affiliation(s)
- Reza Heidari
- Department of Medical Biotechnology, School of Advanced Technologies in Medicine, Tehran University of Medical Sciences, Tehran, Iran; Department of Molecular Medicine, School of Advanced Technologies in Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Mostafa Akbariqomi
- Department of Molecular Medicine, School of Advanced Technologies in Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Yazdan Asgari
- Department of Medical Biotechnology, School of Advanced Technologies in Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Diako Ebrahimi
- Biomedical Informatics Lab, Texas Biomedical Research Institute, San Antonio, TX, 78227, United States
- Hamid Alinejad-Rokny
- BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, 2052, Australia; Core Member of UNSW Data Science Hub, The University of New South Wales (UNSW Sydney), Sydney, NSW, 2052, Australia; Health Data Analytics Program Leader, AI-enabled Processes (AIP) Research Centre, Macquarie University, Sydney, 2109, Australia
15
Khatoon F, Prasad K, Kumar V. Neurological manifestations of COVID-19: available evidences and a new paradigm. J Neurovirol 2020; 26:619-630. [PMID: 32839951 PMCID: PMC7444681 DOI: 10.1007/s13365-020-00895-4] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 06/16/2020] [Revised: 07/17/2020] [Accepted: 08/14/2020] [Indexed: 01/01/2023]
Abstract
The recent pandemic of coronavirus disease 2019 (COVID-19) is a pathogenic and highly transmissible viral infection caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). During the ongoing pandemic, many emerging reports have suggested that SARS-CoV-2 has harmful effects on neurological function and can even cause serious neurological damage. The neurological symptoms associated with COVID-19 include headache, dizziness, depression, anosmia, encephalitis, stroke, epileptic seizures, and Guillain-Barré syndrome, among many others. Involvement of the CNS may be associated with poor prognosis and disease worsening. Here, we review the evidence of nervous system involvement and the currently known neurological manifestations of COVID-19 infections caused by SARS-CoV-2. We prioritize the 332 human targets of SARS-CoV-2 according to their association with brain-related disease and identify 73 candidate genes. We then prioritize these 73 genes according to their spatio-temporal expression in different regions of the brain and through evolutionary intolerance analysis. The prioritized genes could be considered potential indicators of COVID-19-associated neurological symptoms. They could thus serve as possible therapeutic targets for the prevention and treatment of CNS manifestations in COVID-19 patients.
Affiliation(s)
- Fatima Khatoon
- Amity Institute of Neuropsychology & Neurosciences, Amity University, Noida, Uttar Pradesh, 201303, India
- Kartikay Prasad
- Amity Institute of Neuropsychology & Neurosciences, Amity University, Noida, Uttar Pradesh, 201303, India
- Vijay Kumar
- Amity Institute of Neuropsychology & Neurosciences, Amity University, Noida, Uttar Pradesh, 201303, India
16
Seth S, Debnath S, Chakraborty N. In silico analysis of functional linkage among arsenic induced MATE genes in rice. Biotechnol Rep (Amst) 2020; 26:e00390. [PMID: 32435604 PMCID: PMC7231838 DOI: 10.1016/j.btre.2019.e00390] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/02/2019] [Revised: 10/21/2019] [Accepted: 10/21/2019] [Indexed: 10/27/2022]
Abstract
MATE genes play an important role in cellular detoxification processes. A previous transcriptomics study identified nine MATE genes. Candidate gene prioritization then found 29 new genes that interact with these nine guide genes. A total of 38 genes were therefore analyzed here to predict a concise model by gene prioritization. These genes were further analyzed in the Rice Interactions Viewer programme, and, based on high ICV, 10 new genes were found to interact among themselves at the protein level. Surprisingly, only five genes were found to play a key role at the protein level. These 15 genes were analyzed for their interaction with soil-available inorganic arsenic species. For these genes, maximum expression levels were found mostly at the young inflorescence and seed development stages. These genes may therefore have a direct role in sequestering arsenic from cells, thereby protecting the developing embryo within the seed.
Affiliation(s)
- Snigdhamayee Seth
- Department of Genetics & Plant Breeding, Palli Siksha Bhavana (Institute of Agriculture), Visva-Bharati, Sriniketan, 731236, India
- Sandip Debnath
- Department of Genetics & Plant Breeding, Palli Siksha Bhavana (Institute of Agriculture), Visva-Bharati, Sriniketan, 731236, India
- N.R. Chakraborty
- Department of Genetics & Plant Breeding, Palli Siksha Bhavana (Institute of Agriculture), Visva-Bharati, Sriniketan, 731236, India
17
Hur B, Kang D, Lee S, Moon JH, Lee G, Kim S. Venn-diaNet : venn diagram based network propagation analysis framework for comparing multiple biological experiments. BMC Bioinformatics 2019; 20:667. [PMID: 31881980 PMCID: PMC6941187 DOI: 10.1186/s12859-019-3302-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 11/19/2019] [Accepted: 12/02/2019] [Indexed: 12/13/2022]
Abstract
BACKGROUND The main research topic of this paper is how to compare multiple biological experiments using transcriptome data, where each experiment is designed and measured to compare control and treated samples. Multiple biological experiments are usually compared in terms of the number of differentially expressed genes (DEGs) in an arbitrary combination of experiments. This process is usually facilitated with Venn diagrams, but several issues arise when Venn diagrams are used to compare and analyze multiple experiments in terms of DEGs. First, current Venn diagram tools do not provide systematic analysis to prioritize genes. Because these tools generally do not focus on gene prioritization, genes located in the segments of a Venn diagram (especially the intersection) are usually difficult to rank. Second, elucidating the phenotypic difference only with lists of DEGs and expression values is challenging when the experimental designs involve combinations of treatments. For example, without an informative system it is very difficult to find the synergistic effect of a combination of treatments. RESULTS We introduce Venn-diaNet, a Venn diagram based analysis framework that uses network propagation on a protein-protein interaction network to prioritize genes from experiments that have multiple DEG lists. We suggest that the two issues can be effectively handled by ranking or prioritizing genes with segments of a Venn diagram. Users can easily compare multiple DEG lists through gene rankings, which are easy to understand and can be coupled with additional analyses for their own purposes. Our system provides a web-based interface to select seed genes in any area of a Venn diagram. It then performs network propagation analysis to measure the influence of the selected seed genes in terms of a ranked list of DEGs.
CONCLUSIONS We suggest that our system can logically guide the selection of seed genes without additional prior knowledge, which frees users from the seed-selection issue of network propagation. We showed that Venn-diaNet can reproduce the research findings reported in the original papers, whose designs compared two, three and eight experiments. Venn-diaNet is freely available at: http://biohealth.snu.ac.kr/software/venndianet.
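The Venn-diaNet workflow described above can be sketched in plain Python. Seed genes are picked from one Venn-diagram segment, propagated over a protein-protein interaction network, and the DEGs are then ranked by propagated score. Random walk with restart is used here as a common form of network propagation. The abstract does not specify Venn-diaNet's exact propagation scheme. The function names, restart probability and toy network are hypothetical.

```python
def venn_segment(deg_lists, include, exclude=()):
    """Genes shared by every DEG list in `include` and absent from every list in `exclude`."""
    genes = set.intersection(*(set(deg_lists[name]) for name in include))
    for name in exclude:
        genes -= set(deg_lists[name])
    return genes


def random_walk_with_restart(edges, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Propagate seed scores over an undirected network.

    Iterates p <- (1 - restart) * W p + restart * p0, where W spreads each
    node's score evenly over its neighbours and p0 is uniform over the seeds.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seeds = {s for s in seeds if s in neighbours}
    if not seeds:
        raise ValueError("none of the seed genes is in the network")
    p0 = {node: (1.0 / len(seeds) if node in seeds else 0.0) for node in neighbours}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {node: restart * p0[node] for node in neighbours}
        for node, score in p.items():
            share = (1 - restart) * score / len(neighbours[node])
            for other in neighbours[node]:
                nxt[other] += share
        converged = sum(abs(nxt[n] - p[n]) for n in neighbours) < tol
        p = nxt
        if converged:
            break
    return p


def rank_degs(scores, degs):
    """Order DEGs by propagated score (highest first); genes absent from the network score 0."""
    return sorted(degs, key=lambda g: (-scores.get(g, 0.0), g))


# Toy example with hypothetical genes: seeds come from the A-and-B segment.
ppi = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
deg_lists = {"A": ["G1", "G2", "G4"], "B": ["G1", "G3"]}
seeds = venn_segment(deg_lists, include=["A", "B"])
scores = random_walk_with_restart(ppi, seeds)
print(rank_degs(scores, deg_lists["A"]))  # ['G1', 'G2', 'G4']
```

Choosing a different segment changes only the `include`/`exclude` arguments. For example, `include=["A"], exclude=["B"]` selects the genes specific to experiment A. Each choice of seeds gives a separate ranking.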
Affiliation(s)
- Benjamin Hur
- Interdisciplinary Program in Bioinformatics, Seoul National University, 1 Gwanak-ro, Seoul, Korea
- Dongwon Kang
- Department of Computer Science and Engineering, 1 Gwanak-ro, Seoul, Korea
- Sangseon Lee
- Department of Computer Science and Engineering, 1 Gwanak-ro, Seoul, Korea
- Ji Hwan Moon
- Interdisciplinary Program in Bioinformatics, Seoul National University, 1 Gwanak-ro, Seoul, Korea
- Gung Lee
- National Creative Research Initiatives Center for Adipose Tissue Remodeling, Institute of Molecular Biology and Genetics, Department of Biological Sciences, Seoul National University, 1 Gwanak-ro, Seoul, Korea
- Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, 1 Gwanak-ro, Seoul, Korea; Department of Computer Science and Engineering, 1 Gwanak-ro, Seoul, Korea; Bioinformatics Institute, Seoul National University, 1 Gwanak-ro, Seoul, Korea
18
Jiang Y, Wu C, Zhang Y, Zhang S, Yu S, Lei P, Lu Q, Xi Y, Wang H, Song Z. GTX.Digest.VCF: an online NGS data interpretation system based on intelligent gene ranking and large-scale text mining. BMC Med Genomics 2019; 12:193. [PMID: 31856831 PMCID: PMC6923899 DOI: 10.1186/s12920-019-0637-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/19/2019] [Accepted: 11/26/2019] [Indexed: 02/07/2023]
Abstract
Background An important task in the interpretation of sequencing data is to highlight pathogenic genes (or detrimental variants) in the field of Mendelian diseases. This task remains challenging despite the recent rapid development of genomics and bioinformatics. A typical interpretation workflow includes annotation, filtration, manual inspection and literature review. These steps are time-consuming and error-prone in the absence of systematic support. We therefore developed GTX.Digest.VCF, an online DNA sequencing interpretation system. It prioritizes genes and variants for novel disease-gene relation discovery and integrates text mining results to provide literature evidence for the discovery. Its phenotype-driven ranking and biological data mining approach significantly speed up the whole interpretation process. Results The GTX.Digest.VCF system is freely available as a web portal at http://vcf.gtxlab.com for academic research. Evaluation on the DDD project dataset demonstrates an accuracy of 77% (235 out of 305 cases) for the top-50 genes and an accuracy of 41.6% (127 out of 305 cases) for the top-5 genes. Conclusions GTX.Digest.VCF provides an intelligent web portal for genomics data interpretation via the integration of bioinformatics tools, distributed parallel computing, and biomedical text mining. It can facilitate the application of genomic analytics in clinical research and practice.
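The two accuracy figures above read as top-k hit rates: the fraction of evaluated cases whose causal gene appears within the first k ranked genes. Below is a short Python sketch of that metric, followed by a check of the reported percentages from the stated counts. The function name and the convention of recording an unranked causal gene as `None` are assumptions.

```python
from typing import Optional, Sequence


def top_k_accuracy(causal_ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of cases whose causal gene has a 1-based rank <= k.

    `None` marks a case in which the causal gene was not ranked at all.
    """
    if not causal_ranks:
        raise ValueError("no cases supplied")
    hits = sum(1 for rank in causal_ranks if rank is not None and rank <= k)
    return hits / len(causal_ranks)


# Reported DDD results: 235 and 127 hits out of 305 cases.
print(f"top-50: {235 / 305:.1%}")  # top-50: 77.0%
print(f"top-5:  {127 / 305:.1%}")  # top-5:  41.6%
```

Every case whose causal gene is in the top 5 is also in the top 50, so top-k accuracy can only grow with k. This is consistent with the reported 41.6% and 77%.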
Affiliation(s)
- Chengkun Wu
- State Key Laboratory of High-Performance Computing, College of Computer, National University of Defense Technology, Changsha, 410073, China
- Yanghui Zhang
- NHC key laboratory of birth defects research, prevention and treatment (Hunan Provincial Maternal and Child Health Care Hospital), NO.53 Xiangchun Road, Changsha, 410008, Hunan, China
- Shaowei Zhang
- Genetalks Biotech. Co., Ltd., Changsha, 410000, China
- Shuojun Yu
- Genetalks Biotech. Co., Ltd., Changsha, 410000, China
- Peng Lei
- Genetalks Biotech. Co., Ltd., Changsha, 410000, China
- Qin Lu
- Genetalks Biotech. Co., Ltd., Changsha, 410000, China
- Yanwei Xi
- Cytogenetics and Human Molecular Genetics Laboratories, Royal University Hospital, Saskatoon, SK, Canada
- Hua Wang
- NHC key laboratory of birth defects research, prevention and treatment (Hunan Provincial Maternal and Child Health Care Hospital), NO.53 Xiangchun Road, Changsha, 410008, Hunan, China; Hunan Provincial Maternal and Child Health Care Hospital, Changsha, 410073, China
- Zhuo Song
- Genetalks Biotech. Co., Ltd., Changsha, 410000, China
19
Tyler AL, Raza A, Krementsov DN, Case LK, Huang R, Ma RZ, Blankenhorn EP, Teuscher C, Mahoney JM. Network-Based Functional Prediction Augments Genetic Association To Predict Candidate Genes for Histamine Hypersensitivity in Mice. G3 (Bethesda) 2019; 9:4223-33. [PMID: 31645420 DOI: 10.1534/g3.119.400740] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/15/2022]
Abstract
Genetic mapping is a primary tool of genetics in model organisms; however, many quantitative trait loci (QTL) contain tens or hundreds of positional candidate genes. Prioritizing these genes for validation is often ad hoc and biased by previous findings. Here we present a technique for prioritizing positional candidates based on computationally inferred gene function. Our method uses machine learning with functional genomic networks, whose links encode functional associations among genes, to identify network-based signatures of functional association with a trait of interest. We demonstrate the method by functionally ranking positional candidates in a large locus on mouse Chr 6 (45.9 Mb to 127.8 Mb) associated with histamine hypersensitivity (Histh). Histh is characterized by systemic vascular leakage and edema in response to histamine challenge, which can lead to multiple organ failure and death. Although Histh risk is strongly influenced by genetics, little is known about its underlying molecular or genetic causes, due to the genetic and physiological complexity of the trait. To dissect this complexity, we ranked genes in the Histh locus by predicting functional association with multiple Histh-related processes. We integrated these predictions with new single nucleotide polymorphism (SNP) association data derived from a survey of 23 inbred mouse strains and congenic mapping data. The top-ranked genes included Cxcl12, Ret, Cacna1c, and Cntn3, all of which had strong functional associations and were proximal to SNPs segregating with Histh. These results demonstrate the power of network-based computational methods to nominate highly plausible quantitative trait genes even in challenging cases involving large QTL and extreme trait complexity.
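The positional step described above restricts attention to genes inside the QTL and orders them by a functional-association score. It is simple enough to sketch in Python, with these limits:

- The scores are taken as given inputs. The sketch does not reproduce the network-based machine-learning model that produces them.
- It omits the integration with SNP and congenic mapping data.
- The gene records are hypothetical.

Only the Chr 6 interval comes from the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int  # start coordinate in bp
    end: int    # end coordinate in bp (inclusive)


# Histh locus from the abstract: mouse Chr 6, 45.9 Mb to 127.8 Mb.
HISTH_LOCUS: Tuple[str, int, int] = ("6", 45_900_000, 127_800_000)


def positional_candidates(genes: List[Gene], chrom: str, start: int, end: int) -> List[Gene]:
    """Genes on `chrom` whose span overlaps the interval [start, end]."""
    return [g for g in genes if g.chrom == chrom and g.start <= end and g.end >= start]


def rank_positional_candidates(
    genes: List[Gene], scores: Dict[str, float], locus: Tuple[str, int, int] = HISTH_LOCUS
) -> List[str]:
    """Symbols of positional candidates, highest functional score first.

    Genes without a score are treated as 0.0; ties are broken by symbol.
    """
    chrom, start, end = locus
    candidates = positional_candidates(genes, chrom, start, end)
    ranked = sorted(candidates, key=lambda g: (-scores.get(g.symbol, 0.0), g.symbol))
    return [g.symbol for g in ranked]
```

Genes outside the interval are dropped before scoring, even if their score is high. That mirrors the division of labour in the abstract: mapping defines the candidate set, and functional prediction orders it.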
20
Zolotareva O, Kleine M. A Survey of Gene Prioritization Tools for Mendelian and Complex Human Diseases. J Integr Bioinform 2019; 16:/j/jib.ahead-of-print/jib-2018-0069/jib-2018-0069.xml. [PMID: 31494632 PMCID: PMC7074139 DOI: 10.1515/jib-2018-0069] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/06/2018] [Accepted: 07/12/2019] [Indexed: 12/16/2022]
Abstract
Modern high-throughput experiments provide us with numerous potential associations between genes and diseases. Experimental validation of all the discovered associations, let alone all the possible interactions between them, is time-consuming and expensive. To facilitate the discovery of causative genes, various approaches have been developed for prioritizing genes according to their relevance to a given disease. In this article, we explain the gene prioritization problem and provide an overview of computational tools for gene prioritization. Of about a hundred published gene prioritization tools, we select and briefly describe the 14 most up-to-date and user-friendly ones. We also discuss the advantages and disadvantages of existing tools, the challenges of their validation, and directions for future research.
Affiliation(s)
- Olga Zolotareva
- Bielefeld University, Faculty of Technology and Center for Biotechnology, International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes" and Genome Informatics, Universitätsstraße 25, Bielefeld, Germany
- Maren Kleine
- Bielefeld University, Faculty of Technology, Bioinformatics/Medical Informatics Department, Universitätsstraße 25, Bielefeld, Germany
21
Saik OV, Nimaev VV, Usmonov DB, Demenkov PS, Ivanisenko TV, Lavrik IN, Ivanisenko VA. Prioritization of genes involved in endothelial cell apoptosis by their implication in lymphedema using an analysis of associative gene networks with ANDSystem. BMC Med Genomics 2019; 12:47. [PMID: 30871556 PMCID: PMC6417156 DOI: 10.1186/s12920-019-0492-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 01/01/2023]
Abstract
BACKGROUND Currently, more than 150 million people worldwide suffer from lymphedema, a chronic progressive disease characterized by high-protein edema of various parts of the body due to defects in lymphatic drainage. The molecular-genetic mechanisms of the disease are still poorly understood. The onset of clinical manifestations of primary lymphedema in middle age and the development of secondary lymphedema after breast cancer treatment can be genetically determined. Disruption of endothelial cell apoptosis can be considered one of the factors contributing to the development of lymphedema. However, the relationship between genes associated with lymphedema and genes involved in endothelial apoptosis has not previously been studied in an associative gene network. METHODS In the current work, we used well-known methods (ToppGene and Endeavour), as well as methods we developed previously, to prioritize genes involved in endothelial apoptosis and to find potential participants in the molecular-genetic mechanisms of lymphedema among them. Our original prioritization methods took into account the overrepresented Gene Ontology biological processes, the centrality of vertices in the associative gene network describing the interactions of endothelial apoptosis genes with genes associated with lymphedema, and the association of the analyzed genes with diseases that are comorbid with lymphedema. RESULTS Prioritization quality was assessed by testing whether the top-priority genes were enriched in genes known to interact simultaneously with lymphedema and endothelial cell apoptosis, and in genes differentially expressed in a murine model of lymphedema. In particular, among genes involved in endothelial apoptosis, KDR, TNF, TEK, BMPR2, SERPINE1, IL10, CD40LG, CCL2, FASLG and ABL1 had the highest priority.
The identified priority genes can be considered candidates for genotyping in studies searching for associations with lymphedema. CONCLUSIONS Analysis of the interactions of these genes in the associative gene network of lymphedema can improve understanding of how endothelial apoptosis and lymphangiogenesis interact, and shed light on the role of disturbances in these processes in the development of edema, chronic inflammation and connective tissue transformation during disease progression.
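The quality check described above (are the top-ranked genes enriched in independently known genes?) is often quantified with a one-sided hypergeometric test. The abstract does not state which statistic the authors used, so the sketch below is an assumption; the gene lists and the `top_k_enrichment` helper are hypothetical.

```python
from math import comb


def top_k_enrichment(ranked_genes, reference_genes, top_k):
    """One-sided hypergeometric test for reference genes among the top_k ranked genes.

    Returns (observed overlap, P(X >= observed)), where X is the overlap expected
    if the top_k genes were drawn at random from the ranked list.
    """
    population = len(ranked_genes)
    reference = set(reference_genes) & set(ranked_genes)
    successes = len(reference)
    observed = sum(1 for gene in ranked_genes[:top_k] if gene in reference)

    # Sum the hypergeometric probability mass from the observed count upwards.
    tail = sum(
        comb(successes, x) * comb(population - successes, top_k - x)
        for x in range(observed, min(successes, top_k) + 1)
    )
    return observed, tail / comb(population, top_k)


ranked = [f"G{i}" for i in range(1, 11)]  # ten hypothetical genes, best first
observed, p_value = top_k_enrichment(ranked, {"G1", "G2"}, top_k=2)
print(observed, p_value)  # overlap of 2, p = 1/45
```

A small p-value indicates that the known genes sit near the top of the ranking more often than a random ordering would place them.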
Affiliation(s)
- Olga V. Saik
- Laboratory of Computer-Assisted Proteomics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090 Russia
- Novosibirsk State University, st. Pirogova 1, Novosibirsk, 630090 Russia
- Vadim V. Nimaev
- Laboratory of Surgical Lymphology and Lymphodetoxication, Research Institute of Clinical and Experimental Lymphology – Branch of the Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, st. Timakova 2, Novosibirsk, 630117 Russia
- Novosibirsk State University, st. Pirogova 1, Novosibirsk, 630090 Russia
- Dilovarkhuja B. Usmonov
- Novosibirsk State University, st. Pirogova 1, Novosibirsk, 630090 Russia
- Department of Neurosurgery, Ya. L. Tsivyan Novosibirsk Research Institute of Traumatology and Orthopedics, Ministry of Health of the Russian Federation, st. Frunze 17, Novosibirsk, 630091 Russia
- Pavel S. Demenkov
- Laboratory of Computer-Assisted Proteomics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090 Russia
- Novosibirsk State University, st. Pirogova 1, Novosibirsk, 630090 Russia
- Timofey V. Ivanisenko
- Laboratory of Computer-Assisted Proteomics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090 Russia
- Novosibirsk State University, st. Pirogova 1, Novosibirsk, 630090 Russia
- Inna N. Lavrik
- Laboratory of Computer-Assisted Proteomics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090 Russia
- Translational Inflammation Research, Institute of Experimental Internal Medicine, Otto von Guericke University Magdeburg, Medical Faculty, Pfalzer Platz 28, 39106 Magdeburg, Germany
- Vladimir A. Ivanisenko
- Laboratory of Computer-Assisted Proteomics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090 Russia
- Novosibirsk State University, st. Pirogova 1, Novosibirsk, 630090 Russia
22
Abstract
Most biological processes, including diseases, are multifactorial and determined by a complex interplay of various genetic and environmental factors. This chapter provides a user guide to data querying, analysis, and visualization with TargetMine and its associated auxiliary toolkit. We also discuss some of the data queries commonly used by researchers interested in gene set analysis within a data warehouse framework. Overall, TargetMine provides a convenient web browser-based interface that enables users to discover new hypotheses interactively, by analyzing omics data with complex searches that require no scripting or programming, and by presenting the results in an easy-to-comprehend format.
Affiliation(s)
- Yi-An Chen
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Ibaraki, Osaka, Japan
- Lokesh P Tripathi
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Ibaraki, Osaka, Japan
- Kenji Mizuguchi
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Ibaraki, Osaka, Japan
23
Teng Y, Ding Y, Zhang M, Chen X, Wang X, Yu H, Liu C, Lv H, Zhang R. Genome-wide haplotype association study identifies risk genes for non-small cell lung cancer. J Theor Biol 2018; 456:84-90. [PMID: 30096405 DOI: 10.1016/j.jtbi.2018.08.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 02/05/2018] [Revised: 08/05/2018] [Accepted: 08/06/2018] [Indexed: 02/07/2023]
Abstract
Lung cancer is the leading cause of cancer-related death worldwide. Most lung cancer is non-small cell lung cancer (NSCLC), in which malignant cells form in the lung epithelium. Mutations in multiple genes and environmental factors both contribute to NSCLC, and although some NSCLC susceptibility genes have been characterized, the pathogenesis of this disease remains unclear. To identify genes conferring NSCLC risk and determine their associated pathological mechanism, we combined genome-wide haplotype association analysis with gene prioritization using 224,677 SNPs in 37 NSCLC cell lines and 116 unrelated European individuals. Five candidate genes were identified: ESR1, TGFBR1, INSR, CDH3, and MAP3K5. All of these except CDH3 have previously been implicated in NSCLC, so CDH3 can be considered a novel indicator of NSCLC risk. Functional annotation confirmed the relationship between these five genes and NSCLC. Our findings shed light on the underlying pathological mechanisms of NSCLC and provide information to guide future approaches to diagnosing and treating NSCLC.
24
Rao A, VG S, Joseph T, Kotte S, Sivadasan N, Srinivasan R. Phenotype-driven gene prioritization for rare diseases using graph convolution on heterogeneous networks. BMC Med Genomics 2018; 11:57. [PMID: 29980210 PMCID: PMC6035401 DOI: 10.1186/s12920-018-0372-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 09/11/2017] [Accepted: 05/31/2018] [Indexed: 12/29/2022]
Abstract
BACKGROUND One of the major goals of genomic medicine is the identification of causal genomic variants in a patient and their relation to the observed clinical phenotypes. Prioritizing the genomic variants by considering only the genotype information usually identifies a few hundred potential variants. Narrowing these down further to find the causal disease genes and relating them to the observed clinical phenotypes remains a significant challenge, especially for rare diseases. METHODS We propose a phenotype-driven gene prioritization approach using heterogeneous networks in the context of rare diseases. To this end, we first built a heterogeneous network consisting of ontological associations as well as curated associations involving genes, diseases, phenotypes and pathways from multiple sources. Motivated by the recent progress in spectral graph convolutions, we developed a graph convolution-based technique to infer new phenotype-gene associations from this initial set of associations. We included these inferred associations in the initial network and termed this integrated network HANRD (Heterogeneous Association Network for Rare Diseases). We validated this approach on 230 recently published rare disease clinical cases using the case phenotypes as input. RESULTS When HANRD was queried with the case phenotypes as input, the causal genes were captured within the Top-50 for more than 31% of the cases and within the Top-200 for more than 56% of the cases. The results showed improved performance compared with other state-of-the-art tools. CONCLUSIONS In this study, we showed that the heterogeneous network HANRD, consisting of curated, ontological and inferred associations, helped improve causal gene identification in rare diseases. HANRD allows future enhancements by supporting incorporation of new entity types and additional information sources.
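The HANRD results are reported as the fraction of cases whose causal gene appears within the Top-50 or Top-200 of the ranking. A minimal sketch of that metric, assuming each case is given as a ranked gene list plus one known causal gene (the input format, gene names and the `top_k_hit_rate` name are hypothetical):

```python
def top_k_hit_rate(cases, k):
    """Fraction of cases whose causal gene appears among the first k ranked genes.

    cases: list of (ranked_genes, causal_gene) pairs.
    """
    if not cases:
        raise ValueError("at least one case is required")
    hits = sum(1 for ranked_genes, causal in cases if causal in ranked_genes[:k])
    return hits / len(cases)


cases = [
    (["GENE1", "GENE2", "GENE3"], "GENE1"),  # causal gene ranked first
    (["GENE2", "GENE3", "GENE1"], "GENE1"),  # causal gene ranked third
    (["GENE3", "GENE1", "GENE2"], "GENE9"),  # causal gene missing from the ranking
]
for k in (1, 3):
    print(k, top_k_hit_rate(cases, k))
```

Reporting several cut-offs, as the abstract does with Top-50 and Top-200, shows how quickly recall grows as more candidates are reviewed.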
Affiliation(s)
- Aditya Rao
- TCS Research and Innovation, Hyderabad, 500081 India
- Saipradeep VG
- TCS Research and Innovation, Hyderabad, 500081 India
- Thomas Joseph
- TCS Research and Innovation, Hyderabad, 500081 India
- Sujatha Kotte
- TCS Research and Innovation, Hyderabad, 500081 India
25
Su L, Liu G, Bai T, Meng X, Ma Q. MGOGP: a gene module-based heuristic algorithm for cancer-related gene prioritization. BMC Bioinformatics 2018; 19:215. [PMID: 29871590 DOI: 10.1186/s12859-018-2216-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/13/2017] [Accepted: 05/23/2018] [Indexed: 01/13/2023]
Abstract
Background Prioritizing genes according to their associations with a cancer allows researchers to explore genes in more informed ways. So far, gene-centric or network-centric gene prioritization methods have predominated. Genes and their protein products carry out cellular processes in the context of functional modules. Dysfunctional gene modules have previously been reported to be associated with cancer. However, gene module information has seldom been considered in cancer-related gene prioritization. Results In this study, we propose a novel method, MGOGP (Module and Gene Ontology-based Gene Prioritization), for cancer-related gene prioritization. Unlike other methods, MGOGP ranks genes using information on both individual genes and their affiliated modules, and uses a Gene Ontology (GO)-based fuzzy measure value as well as known cancer-related genes as heuristics. The performance of the proposed method is comprehensively validated on both breast cancer and prostate cancer datasets and by comparison with other methods. Results show that MGOGP outperforms other methods and successfully prioritizes more genes with literature-confirmed evidence. Conclusions This work will aid researchers in understanding the genetic architecture of complex diseases, and improve the accuracy of diagnosis and the effectiveness of therapy. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2216-0) contains supplementary material, which is available to authorized users.
26
Iourov IY, Zelenova MA, Vorsanova SG, Voinova VV, Yurov YB. 4q21.2q21.3 Duplication: Molecular and Neuropsychological Aspects. Curr Genomics 2018; 19:173-178. [PMID: 29606904 PMCID: PMC5850505 DOI: 10.2174/1389202918666170717161426] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 10/02/2016] [Revised: 11/20/2016] [Accepted: 01/20/2017] [Indexed: 12/15/2022]
Abstract
During the last decades, a large number of microduplications and microdeletions associated with intellectual disability (ID) and related neuropsychiatric diseases have been discovered. However, due to natural limitations, a significant proportion of them have not been the focus of multidisciplinary approaches. Here, we study a previously undescribed chromosome 4q21.2q21.3 microduplication, using molecular cytogenetic (cytogenomic) and gene expression (meta-)analyses together with neuropsychological assessment to prioritize genes, evaluate cognitive abilities and estimate the genomic mechanisms of brain dysfunction. We showed that duplication at 4q21.2q21.3 is associated with moderate ID, cognitive deficits, developmental delay, language impairment, memory and attention problems, facial dysmorphisms, congenital heart defect and dentinogenesis imperfecta. Gene-expression meta-analysis prioritized the following genes: ENOPH1, AFF1, DSPP, SPARCL1, and SPP1. Furthermore, genotype/phenotype correlations allowed the attribution of each gene gain to each phenotypic feature. Neuropsychological testing showed visual-perceptual and fine motor skill deficits, reduced attention span, deficits of the nominative function and problems in processing both visual and aural information. Finally, we conclude that emerging approaches, including molecular cytogenetic, bioinformatic (genome/epigenome meta-analysis) and neuropsychological methods, are required for comprehensive neurological, genetic and neuropsychological descriptions of new genomic rearrangements/diseases associated with ID.
Affiliation(s)
- Ivan Y Iourov
- Mental Health Research Center, Moscow, Russian Federation; Separated Structural Unit "Clinical Research Institute of Pediatrics named after Y.E Veltishev", Pirogov Russian National Research Medical University, Ministry of Health, Moscow, Russian Federation; Department of Medical Genetics, Russian Medical Academy of Postgraduate Education, Ministry of Health, Moscow, Russian Federation
- Maria A Zelenova
- Mental Health Research Center, Moscow, Russian Federation; Separated Structural Unit "Clinical Research Institute of Pediatrics named after Y.E Veltishev", Pirogov Russian National Research Medical University, Ministry of Health, Moscow, Russian Federation; Moscow State University of Psychology and Education, Moscow, Russian Federation
- Svetlana G Vorsanova
- Mental Health Research Center, Moscow, Russian Federation; Separated Structural Unit "Clinical Research Institute of Pediatrics named after Y.E Veltishev", Pirogov Russian National Research Medical University, Ministry of Health, Moscow, Russian Federation; Moscow State University of Psychology and Education, Moscow, Russian Federation
- Victoria V Voinova
- Mental Health Research Center, Moscow, Russian Federation; Separated Structural Unit "Clinical Research Institute of Pediatrics named after Y.E Veltishev", Pirogov Russian National Research Medical University, Ministry of Health, Moscow, Russian Federation; Moscow State University of Psychology and Education, Moscow, Russian Federation
- Yuri B Yurov
- Mental Health Research Center, Moscow, Russian Federation; Separated Structural Unit "Clinical Research Institute of Pediatrics named after Y.E Veltishev", Pirogov Russian National Research Medical University, Ministry of Health, Moscow, Russian Federation; Moscow State University of Psychology and Education, Moscow, Russian Federation
27
Saik OV, Demenkov PS, Ivanisenko TV, Bragina EY, Freidin MB, Goncharova IA, Dosenko VE, Zolotareva OI, Hofestaedt R, Lavrik IN, Rogaev EI, Ivanisenko VA. Novel candidate genes important for asthma and hypertension comorbidity revealed from associative gene networks. BMC Med Genomics 2018; 11:15. [PMID: 29504915 PMCID: PMC6389037 DOI: 10.1186/s12920-018-0331-4] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 12/17/2022]
Abstract
BACKGROUND Hypertension and bronchial asthma are major public health problems. As of 2014, approximately one billion adults, or ~22% of the world population, had hypertension. As of 2011, 235-330 million people globally were affected by asthma, and approximately 250,000-345,000 people died each year from the disease. The development of effective therapies against these diseases is complicated by their comorbidity, which often poses a major problem for their diagnosis and treatment. Hence, in this study we developed a bioinformatics methodology for analyzing the comorbidity of these two diseases. The search for candidate genes related to the comorbid conditions of asthma and hypertension can help elucidate the molecular mechanisms underlying the comorbid condition of these two diseases, and can also be useful for genotyping and identifying new drug targets. RESULTS Using ANDSystem, gene networks associated with asthma and hypertension were reconstructed and analyzed. The gene network of asthma included 755 genes/proteins and 62,603 interactions, while the gene network of hypertension included 713 genes/proteins and 45,479 interactions. Two hundred and five genes/proteins and 9638 interactions were shared between asthma and hypertension. An approach for ranking genes implicated in the comorbid condition of two diseases was proposed. The approach is based on nine criteria for ranking genes by their importance, including standard methods of gene prioritization (Endeavour, ToppGene) as well as original criteria that take into account the characteristics of an associative gene network and the presence of known polymorphisms in the analyzed genes. According to the proposed approach, the genes IL10, TLR4, and CAT had the highest priority in the development of comorbidity of these two diseases.
Additionally, the list of top genes was found to be enriched in apoptotic genes and in genes involved in biological processes related to the functioning of the central nervous system. CONCLUSIONS Reconstruction and analysis of gene networks is a productive tool for studying the molecular mechanisms of comorbid conditions. Applying the proposed method to rank genes by their importance to the comorbid condition of asthma and hypertension predicted 10 genes that play a key role in the development of the comorbid condition. The results can be used to plan experiments for identifying novel candidate genes and to search for novel pharmacological targets.
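The abstract describes ranking genes with nine criteria but does not say how the criteria are combined. One simple, commonly used option is to average each gene's rank across criteria; the sketch below assumes that scheme (a higher criterion score means higher priority) and uses hypothetical criterion names, genes and scores.

```python
def aggregate_mean_rank(criteria_scores):
    """Combine several per-gene scoring criteria by mean rank.

    criteria_scores: dict mapping criterion name -> dict of gene -> score,
    where a higher score means higher priority. A gene missing from a
    criterion is ranked last for that criterion.
    """
    genes = set()
    for scores in criteria_scores.values():
        genes.update(scores)

    rank_sums = {gene: 0 for gene in genes}
    for scores in criteria_scores.values():
        # Rank 1 = best; missing genes get -inf and therefore sort last.
        ordered = sorted(genes, key=lambda g: (-scores.get(g, float("-inf")), g))
        for position, gene in enumerate(ordered, start=1):
            rank_sums[gene] += position

    mean_rank = {gene: total / len(criteria_scores) for gene, total in rank_sums.items()}
    # Lowest mean rank first; ties broken alphabetically.
    return sorted(genes, key=lambda g: (mean_rank[g], g)), mean_rank


criteria = {
    "network_centrality": {"GENE_X": 0.9, "GENE_Y": 0.5, "GENE_Z": 0.1},
    "tool_score": {"GENE_X": 0.2, "GENE_Y": 0.8, "GENE_Z": 0.4},
}
order, mean_rank = aggregate_mean_rank(criteria)
print(order, mean_rank)
```

Rank averaging sidesteps the fact that different criteria (a centrality value, a tool's p-value, a polymorphism count) live on incomparable scales.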
Affiliation(s)
- Olga V. Saik
- Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Novosibirsk, Russia
- Pavel S. Demenkov
- Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Novosibirsk, Russia
- Timofey V. Ivanisenko
- Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Novosibirsk, Russia
- Elena Yu Bragina
- Research Institute of Medical Genetics, Tomsk NRMC, Tomsk, Russia
- Maxim B. Freidin
- Research Institute of Medical Genetics, Tomsk NRMC, Tomsk, Russia
- Olga I. Zolotareva
- Bielefeld University, International Research Training Group “Computational Methods for the Analysis of the Diversity and Dynamics of Genomes”, Bielefeld, Germany
- Ralf Hofestaedt
- Bielefeld University, Technical Faculty, AG Bioinformatics and Medical Informatics, Bielefeld, Germany
- Inna N. Lavrik
- Department of Translational Inflammation, Institute of Experimental Internal Medicine, Otto von Guericke University, Magdeburg, Germany
- Evgeny I. Rogaev
- Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Novosibirsk, Russia
- University of Massachusetts Medical School, Worcester, MA USA
- Department of Genomics and Human Genetics, Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Center for Genetics and Genetic Technologies, Faculty of Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Vladimir A. Ivanisenko
- Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Novosibirsk, Russia
28
Zhang Y, Liu J, Liu X, Fan X, Hong Y, Wang Y, Huang Y, Xie M. Prioritizing disease genes with an improved dual label propagation framework. BMC Bioinformatics 2018; 19:47. [PMID: 29422030 PMCID: PMC5806269 DOI: 10.1186/s12859-018-2040-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 06/30/2017] [Accepted: 01/24/2018] [Indexed: 12/21/2022]
Abstract
BACKGROUND Disease gene prioritization aims to identify potential disease-causing genes for a given phenotype, which can help reveal the inherited basis of human diseases and facilitate drug development. Our work is motivated by the label propagation algorithm and by the false-positive protein-protein interactions that exist in the dataset. To the best of our knowledge, false-positive protein-protein interactions have not previously been considered in disease gene prioritization. Label propagation has been successfully applied to prioritize disease-causing genes in previous network-based methods. These network-based methods use basic label propagation, i.e. random walk, on networks to prioritize disease genes in different ways. However, none of these methods can handle datasets containing many false-positive protein-protein interactions, because they use the PPI network as a fixed input. This characteristic of the data source may cause large deviations in the results. RESULTS A novel network-based framework, IDLP, is proposed to prioritize candidate disease genes. IDLP effectively propagates labels throughout the PPI network and the phenotype similarity network, and it avoids failure when few disease genes are known. Meanwhile, IDLP models the bias caused by false-positive protein interactions and other potential factors by treating the PPI network matrix and the phenotype similarity matrix as matrices to be learnt. By correcting noise in the training matrices, it significantly improves performance. We conducted extensive experiments on OMIM datasets, in which IDLP demonstrated its effectiveness compared with eight state-of-the-art approaches. The robustness of IDLP was also validated through experiments with a perturbed PPI network.
Furthermore, we searched the literature to verify that the new genes predicted by IDLP are associated with the given diseases; the high prediction accuracy shows that IDLP can be a powerful tool to help biologists discover new disease genes. CONCLUSIONS The IDLP model is an effective method for disease gene prioritization, particularly for querying phenotypes without known associated genes, which would be greatly helpful for identifying disease genes for less-studied phenotypes. AVAILABILITY https://github.com/nkiip/IDLP.
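For readers unfamiliar with the baseline that IDLP extends, the sketch below shows basic label propagation on a fixed, symmetrically normalized network, iterating F ← αSF + (1 − α)Y until the scores stop changing. IDLP additionally learns corrected PPI and phenotype-similarity matrices, which this sketch does not attempt; the toy network, α = 0.8 and the `label_propagation` name are assumptions.

```python
import math


def label_propagation(nodes, edges, seeds, alpha=0.8, max_iter=200, tol=1e-10):
    """Basic label propagation F <- alpha * S @ F + (1 - alpha) * Y on a fixed graph.

    nodes: list of gene names; edges: (gene_a, gene_b, weight) triples;
    seeds: known disease genes (label 1); all other genes start at 0.
    """
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    weights = [[0.0] * n for _ in range(n)]
    for a, b, w in edges:
        i, j = index[a], index[b]
        weights[i][j] = weights[j][i] = w

    # Symmetric normalisation S = D^-1/2 W D^-1/2 (isolated nodes get zero rows).
    degree = [sum(row) for row in weights]
    norm = [
        [weights[i][j] / math.sqrt(degree[i] * degree[j]) if degree[i] and degree[j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]

    prior = [1.0 if node in seeds else 0.0 for node in nodes]
    scores = prior[:]
    for _ in range(max_iter):
        updated = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n)) + (1 - alpha) * prior[i]
            for i in range(n)
        ]
        change = max(abs(u - s) for u, s in zip(updated, scores))
        scores = updated
        if change < tol:
            break
    return dict(zip(nodes, scores))


nodes = ["SEED", "A", "B", "C", "ISOLATED"]
edges = [("SEED", "A", 1.0), ("A", "B", 1.0), ("B", "C", 1.0)]
scores = label_propagation(nodes, edges, seeds={"SEED"})
print(sorted(scores, key=scores.get, reverse=True))  # ['SEED', 'A', 'B', 'C', 'ISOLATED']
```

Because the network is fixed input here, any false-positive edge passes score to its neighbours unchecked, which is the limitation the IDLP abstract sets out to address.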
Affiliation(s)
- Yaogong Zhang
- College of Software, Nankai University, TianJin, 300350, China
- Jiahui Liu
- College of Software, Nankai University, TianJin, 300350, China
- Xiaohu Liu
- College of Software, Nankai University, TianJin, 300350, China
- Xin Fan
- College of Software, Nankai University, TianJin, 300350, China
- Yuxiang Hong
- College of Software, Nankai University, TianJin, 300350, China
- Yuan Wang
- School of Computer Science and Information Engineering, Tianjin University of Science and Technology, TianJin, 300222, China
- YaLou Huang
- College of Software, Nankai University, TianJin, 300350, China
- MaoQiang Xie
- College of Software, Nankai University, TianJin, 300350, China
29
Zampieri G, Tran DV, Donini M, Navarin N, Aiolli F, Sperduti A, Valle G. Scuba: scalable kernel-based gene prioritization. BMC Bioinformatics 2018; 19:23. [PMID: 29370760 PMCID: PMC5785908 DOI: 10.1186/s12859-018-2025-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 12/30/2016] [Accepted: 01/15/2018] [Indexed: 01/01/2023]
Abstract
BACKGROUND The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for integrating heterogeneous biological knowledge; however, their practical implementation is often precluded by their limited scalability. RESULTS We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large-scale predictions are required. Importantly, it can efficiently handle both a large number of candidate genes and an arbitrary number of data sources. As a direct consequence of its scalability, Scuba also integrates a new efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. CONCLUSIONS Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when input data are highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba.
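To make the idea of integrating heterogeneous data sources through kernels concrete, the sketch below combines per-source gene-similarity kernels with fixed weights and scores each candidate by its average combined similarity to the known disease genes. Scuba itself learns the kernel weights by optimizing the margin distribution, which is not reproduced here; the uniform weights, toy kernels and function names are assumptions.

```python
def combine_kernels(kernels, weights=None):
    """Weighted sum of square kernel matrices (lists of lists) over the same gene order."""
    if weights is None:
        weights = [1.0 / len(kernels)] * len(kernels)  # uniform weights as a baseline
    size = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(size)]
        for i in range(size)
    ]


def kernel_scores(genes, kernels, known_genes, weights=None):
    """Score each non-seed gene by its mean combined similarity to the known disease genes."""
    combined = combine_kernels(kernels, weights)
    index = {gene: i for i, gene in enumerate(genes)}
    seed_idx = [index[g] for g in known_genes]
    return {
        gene: sum(combined[index[gene]][s] for s in seed_idx) / len(seed_idx)
        for gene in genes
        if gene not in known_genes
    }


genes = ["DISEASE1", "CAND1", "CAND2"]
# Two hypothetical data sources, e.g. expression similarity and shared annotations.
k_expression = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
k_annotation = [[1.0, 0.7, 0.3], [0.7, 1.0, 0.0], [0.3, 0.0, 1.0]]
scores = kernel_scores(genes, [k_expression, k_annotation], known_genes={"DISEASE1"})
print(scores)
```

Fixed uniform weights treat every data source as equally informative; the point of multiple kernel learning, as in Scuba, is to replace them with weights fitted to the known disease genes.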
Affiliation(s)
- Guido Zampieri
- CRIBI Biotechnology Center, University of Padova, viale G. Colombo, 3, Padova, Italy; Department of Women's and Children's Health, University of Padova, via Giustiniani, 3, Padova, Italy
- Dinh Van Tran
- Department of Mathematics, University of Padova, via Trieste, 63, Padova, Italy
- Michele Donini
- Istituto Italiano di Tecnologia, Via Morego, 30, Genoa, Italy
- Nicolò Navarin
- Department of Mathematics, University of Padova, via Trieste, 63, Padova, Italy
- Fabio Aiolli
- Department of Mathematics, University of Padova, via Trieste, 63, Padova, Italy
- Alessandro Sperduti
- Department of Mathematics, University of Padova, via Trieste, 63, Padova, Italy
- Giorgio Valle
- CRIBI Biotechnology Center, University of Padova, viale G. Colombo, 3, Padova, Italy; Department of Biology, University of Padova, viale G. Colombo, 3, Padova, Italy
30
Emad A, Cairns J, Kalari KR, Wang L, Sinha S. Knowledge-guided gene prioritization reveals new insights into the mechanisms of chemoresistance. Genome Biol 2017; 18:153. [PMID: 28800781 PMCID: PMC5554409 DOI: 10.1186/s13059-017-1282-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 04/17/2017] [Accepted: 07/18/2017] [Indexed: 02/08/2023]
Abstract
BACKGROUND Identification of genes whose basal mRNA expression predicts the sensitivity of tumor cells to cytotoxic treatments can play an important role in individualized cancer medicine. It enables detailed characterization of the mechanism of action of drugs. Furthermore, screening the expression of these genes in the tumor tissue may suggest the best course of chemotherapy or a combination of drugs to overcome drug resistance. RESULTS We developed a computational method called ProGENI to identify genes most associated with the variation of drug response across different individuals, based on gene expression data. In contrast to existing methods, ProGENI also utilizes prior knowledge of protein-protein and genetic interactions, using random walk techniques. Analysis of two relatively new and large datasets, including gene expression data on hundreds of cell lines and their cytotoxic responses to a large compendium of drugs, reveals a significant improvement in the prediction of drug sensitivity using genes identified by ProGENI compared to other methods. Our siRNA knockdown experiments on ProGENI-identified genes confirmed the role of many new genes in sensitivity to three chemotherapy drugs: cisplatin, docetaxel, and doxorubicin. Based on such experiments and an extensive literature survey, we demonstrate that about 73% of our top predicted genes modulate drug response in selected cancer cell lines. In addition, global analysis of genes associated with groups of drugs uncovered pathways of cytotoxic response shared by each group. CONCLUSIONS Our results suggest that knowledge-guided prioritization of genes using ProGENI gives new insight into mechanisms of drug resistance and identifies genes that may be targeted to overcome this phenomenon.
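A loose sketch of the two ingredients the ProGENI abstract names: scoring genes by how strongly their basal expression correlates with drug response, then spreading those scores over an interaction network with a random walk with restart. The abstract does not describe how ProGENI actually combines these steps, so the combination below, the restart probability of 0.5, and all names and data are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def random_walk_with_restart(nodes, edges, restart_scores, restart_prob=0.5, max_iter=500, tol=1e-12):
    """Stationary distribution of a walk that jumps back to the (normalised) restart scores."""
    index = {node: i for i, node in enumerate(nodes)}
    neighbours = [[] for _ in nodes]
    for a, b in edges:
        neighbours[index[a]].append(index[b])
        neighbours[index[b]].append(index[a])

    total = sum(restart_scores.get(node, 0.0) for node in nodes)
    if total == 0:
        raise ValueError("restart scores must not all be zero")
    restart = [restart_scores.get(node, 0.0) / total for node in nodes]

    prob = restart[:]
    for _ in range(max_iter):
        spread = [0.0] * len(nodes)
        for i, nbrs in enumerate(neighbours):
            if nbrs:
                for j in nbrs:
                    spread[j] += prob[i] / len(nbrs)
            else:
                spread[i] += prob[i]  # isolated genes keep their own mass
        updated = [(1 - restart_prob) * s + restart_prob * r for s, r in zip(spread, restart)]
        change = max(abs(u - p) for u, p in zip(updated, prob))
        prob = updated
        if change < tol:
            break
    return dict(zip(nodes, prob))


# Hypothetical basal expression of four genes across four cell lines, plus drug response.
response = [1.0, 2.0, 3.0, 4.0]
expression = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [4.0, 3.0, 2.0, 1.0],
    "G3": [1.0, 1.0, 1.0, 1.0],
    "G4": [2.0, 1.0, 2.0, 1.0],
}
correlation = {gene: abs(pearson(values, response)) for gene, values in expression.items()}
scores = random_walk_with_restart(list(expression), [("G1", "G3")], correlation)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy run G3 has no correlation with the response but receives a non-zero score through its interaction with G1, which is the kind of network-informed prioritization the abstract contrasts with expression-only methods.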
Affiliation(s)
- Amin Emad
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA
- Junmei Cairns
- Department of Molecular Pharmacology and Experimental Therapeutics, Gonda 19, Mayo Clinic Rochester, 200, 1st St. SW, Rochester, MN 55905 USA
- Krishna R. Kalari
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905 USA
- Liewei Wang
- Department of Molecular Pharmacology and Experimental Therapeutics, Gonda 19, Mayo Clinic Rochester, 200, 1st St. SW, Rochester, MN 55905 USA
- Saurabh Sinha
- Department of Computer Science and Institute of Genomic Biology, University of Illinois at Urbana-Champaign, 2122 Siebel Center, 201N. Goodwin Ave, Urbana, IL 61801 USA
31
González-Pérez S, Pazos F, Chagoyen M. Factors affecting interactome-based prediction of human genes associated with clinical signs. BMC Bioinformatics 2017; 18:340. [PMID: 28715999 PMCID: PMC5514523 DOI: 10.1186/s12859-017-1754-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/19/2017] [Accepted: 07/12/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND Clinical signs are a fundamental aspect of human pathologies. While disease diagnosis is problematic or impossible in many cases, signs are easier to perceive and categorize. Clinical signs are increasingly used, together with molecular networks, to prioritize detected variants in clinical genomics pipelines, even if the patient is still undiagnosed. Here we analyze the ability of these network-based methods to predict genes that underlie clinical signs from the human interactome. RESULTS Our analysis reveals that these approaches can locate genes associated with clinical signs with variable performance that depends on the sign and associated disease. We analyzed several clinical and biological factors that explain these variable results, including the number of genes involved (mono- vs. oligogenic diseases), mode of inheritance, type of clinical sign and gene product function. CONCLUSIONS Our results indicate that the characteristics of the clinical signs and their related diseases should be considered when interpreting the results of network-prediction methods, such as those aimed at discovering disease-related genes and variants. These results are important due to the increasing use of clinical signs as an alternative to diseases for studying the molecular basis of human pathologies.
Affiliation(s)
- Sara González-Pérez
- Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Darwin 3, 28049, Madrid, Spain
- Florencio Pazos
- Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Darwin 3, 28049, Madrid, Spain
- Mónica Chagoyen
- Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Darwin 3, 28049, Madrid, Spain
32
Le DH, Pham VH. HGPEC: a Cytoscape app for prediction of novel disease-gene and disease-disease associations and evidence collection based on a random walk on heterogeneous network. BMC Syst Biol 2017; 11:61. [PMID: 28619054 PMCID: PMC5472867 DOI: 10.1186/s12918-017-0437-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 04/03/2017] [Accepted: 05/31/2017] [Indexed: 12/31/2022]
Abstract
Background Finding gene-disease and disease-disease associations plays an important role in biomedicine, and many prioritization methods have been proposed for this goal. Among them, approaches based on a heterogeneous network of genes and diseases are considered state of the art: they achieve high prediction performance and can be used for diseases with or without a known molecular basis. Results Here, we developed a Cytoscape app, HGPEC, based on a random walk with restart algorithm on a heterogeneous network of genes and diseases. This app can prioritize candidate genes and diseases by employing a heterogeneous network consisting of a gene/protein network and a phenotypic disease similarity network. Based on the rankings, novel disease-gene and disease-disease associations can be identified. These associations can be supported with network- and rank-based visualization as well as evidence and annotations from biomedical data. A case study on the prediction of novel breast cancer-associated genes and diseases demonstrates the abilities of HGPEC. In addition, we showed that HGPEC performs better than other tools for prioritizing candidate disease genes. Conclusions Taken together, our app is expected to effectively predict novel disease-gene and disease-disease associations and to support network- and rank-based visualization as well as biomedical evidence for such associations.
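A random walk with restart on a heterogeneous gene-disease network, the technique HGPEC builds on, can be sketched as follows. This is a simplified illustration, not HGPEC's implementation: it assumes that gene-gene, disease-disease and gene-disease edges are merged into one unweighted, undirected graph (published heterogeneous-network variants weight intra- and inter-layer transitions separately), and the restart probability of 0.7, the helper names and the toy network are hypothetical.

```python
def build_graph(gene_edges, disease_edges, gene_disease_links):
    """Merge the three edge types into one undirected graph {node: set_of_neighbours}."""
    graph = {}
    for u, v in [*gene_edges, *disease_edges, *gene_disease_links]:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def rwr(graph, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state visiting probability of each node for a walker that
    jumps back to a uniformly chosen seed node with probability `restart`."""
    seeds = {s for s in seeds if s in graph}
    if not seeds:
        raise ValueError("no seed node is present in the graph")
    r = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in graph}
    p = dict(r)
    for _ in range(max_iter):
        nxt = {v: restart * r[v] for v in graph}
        for u, nbrs in graph.items():
            # Every node has at least one neighbour because it came from an edge.
            for v in nbrs:
                nxt[v] += (1 - restart) * p[u] / len(nbrs)
        converged = sum(abs(nxt[v] - p[v]) for v in graph) < tol
        p = nxt
        if converged:
            break
    return p


if __name__ == "__main__":
    gene_edges = [("G1", "G2"), ("G1", "G3"), ("G3", "G4")]  # gene/protein network
    disease_edges = [("D1", "D2")]                           # phenotypic similarity
    links = [("G1", "D1"), ("G4", "D2")]                     # known associations
    graph = build_graph(gene_edges, disease_edges, links)
    scores = rwr(graph, seeds={"D1"})                        # query disease as seed
    genes = sorted((n for n in graph if n.startswith("G")), key=scores.get, reverse=True)
    print(genes)
```

In this toy network, G4 connects to the query disease D1 only through the similar disease D2, yet it is expected to rank above G2 and G3, which are reached through the known gene G1. Capturing this kind of indirect, disease-similarity evidence is the motivation for walking on a heterogeneous rather than a gene-only network.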
Affiliation(s)
- Duc-Hau Le
- Vinmec Research Institute of Stem Cell and Gene Technology, 458 Minh Khai, Hai Ba Trung, Hanoi, Vietnam; Thuyloi University, 175 Tay Son, Dong Da, Hanoi, Vietnam
- Van-Huy Pham
- Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
33
Hur B, Lim S, Chae H, Seo S, Lee S, Kang J, Kim S. CLIP-GENE: a web service of the condition specific context-laid integrative analysis for gene prioritization in mouse TF knockout experiments. Biol Direct 2016; 11:57. [PMID: 27776539 DOI: 10.1186/s13062-016-0158-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/02/2016] [Accepted: 10/10/2016] [Indexed: 02/06/2023]
Abstract
MOTIVATION Transcriptome data from gene knockout experiments in mice are widely used to investigate gene functions and their relationship to phenotypes. When a gene is knocked out, it is important to identify which genes are affected by the knockout. Existing methods, including differentially expressed gene (DEG) methods, can be used for this analysis. However, they require cutoff values to select candidate genes, which can produce either too many false positives or too many false negatives. This hurdle can be addressed by improving the accuracy of gene selection, by providing a method to rank candidate genes effectively, or both. Prioritization of candidate genes should consider the goals or context of the knockout experiment. As of now, there are no tools designed for both selecting and prioritizing genes from mouse knockout data, so a new tool is needed. RESULTS In this study, we present CLIP-GENE, a web service that selects gene markers by utilizing differentially expressed genes, a mouse transcription factor (TF) network, and single nucleotide variant information. A protein-protein interaction network and literature information are then used to find genes that are relevant to the phenotypic differences. A novel feature is that researchers can specify their contexts or hypotheses as a set of keywords, and genes are ranked according to these user-specified contexts. We believe that CLIP-GENE will be useful in characterizing functions of TFs in mouse experiments. AVAILABILITY http://epigenomics.snu.ac.kr/CLIP-GENE REVIEWERS: This article was reviewed by Dr. Lee and Dr. Pongor.
34
Stelzer G, Plaschkes I, Oz-Levi D, Alkelai A, Olender T, Zimmerman S, Twik M, Belinky F, Fishilevich S, Nudel R, Guan-Golan Y, Warshawsky D, Dahary D, Kohn A, Mazor Y, Kaplan S, Iny Stein T, Baris HN, Rappaport N, Safran M, Lancet D. VarElect: the phenotype-based variation prioritizer of the GeneCards Suite. BMC Genomics 2016; 17 Suppl 2:444. [PMID: 27357693 PMCID: PMC4928145 DOI: 10.1186/s12864-016-2722-2] [Citation(s) in RCA: 126] [Impact Index Per Article: 15.8] [Indexed: 01/28/2023]
Abstract
Background Next generation sequencing (NGS) provides a key technology for deciphering the genetic underpinnings of human diseases. Typical NGS analyses of a patient reveal tens of thousands of non-reference coding variants, but only one or very few are expected to be significant for the relevant disorder. In a filtering stage, one employs family segregation, rarity in the population, predicted protein impact and evolutionary conservation as a means of shortening the variation list. However, narrowing down further towards culprit disease genes usually entails laborious seeking of gene-phenotype relationships, consulting numerous separate databases. Thus, a major challenge is to transition from the few hundred shortlisted genes to the most viable disease-causing candidates. Results We describe a novel tool, VarElect (http://ve.genecards.org), a comprehensive phenotype-dependent variant/gene prioritizer, based on the widely used GeneCards, which helps rapidly identify causal mutations with extensive evidence. The GeneCards suite offers an effective and speedy alternative, whereby >120 gene-centric, automatically mined data sources are jointly available for the task. VarElect capitalizes on this wealth of information, as well as on GeneCards' powerful free-text Boolean search and scoring capabilities, proficiently matching variant-containing genes to submitted disease/symptom keywords. The tool also leverages the rich disease and pathway information of MalaCards, the human disease database, and PathCards, the unified pathway (SuperPaths) database, both within the GeneCards Suite. The VarElect algorithm infers direct as well as indirect links between genes and phenotypes, the latter benefitting from GeneCards' diverse gene-to-gene data links in GenesLikeMe. Finally, our tool offers an extensive gene-phenotype evidence portrayal ("MiniCards") and hyperlinks to the parent databases. 
Conclusions We demonstrate that VarElect compares favorably with several often-used NGS phenotyping tools, thus providing a robust facility for ranking genes, pointing out their likelihood to be related to a patient's disease. VarElect's capacity to automatically process numerous NGS cases, either in stand-alone format or in VCF-analyzer mode (TGex and VarAnnot), is indispensable for emerging clinical projects that involve thousands of whole exome/genome NGS analyses.
Affiliation(s)
- Gil Stelzer
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel; LifeMap Sciences Ltd, Tel Aviv, Israel
- Danit Oz-Levi
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Anna Alkelai
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Tsviya Olender
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Shahar Zimmerman
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Michal Twik
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Frida Belinky
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Simon Fishilevich
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Ron Nudel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Dvir Dahary
- LifeMap Sciences Ltd, Tel Aviv, Israel; Toldot Genetics Ltd, Hod Hasharon, Israel
- Asher Kohn
- LifeMap Sciences Inc, Marshfield, MA, 02050, USA
- Tsippi Iny Stein
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Hagit N Baris
- The Genetics Institute, Rambam Health Care Campus, Haifa, Israel; Rappaport School of Medicine, Technion, Haifa, Israel
- Noa Rappaport
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Marilyn Safran
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Doron Lancet
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
35
Wang L, Zhang C, Watkins J, Jin Y, McNutt M, Yin Y. SoftPanel: a website for grouping diseases and related disorders for generation of customized panels. BMC Bioinformatics 2016; 17:153. [PMID: 27044653 PMCID: PMC4820874 DOI: 10.1186/s12859-016-0998-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/29/2015] [Accepted: 03/23/2016] [Indexed: 12/16/2022]
Abstract
BACKGROUND Targeted next-generation sequencing is playing an increasingly important role in biological research and clinical diagnosis by allowing researchers to sequence high-priority genes at much higher depths and at a fraction of the cost of whole genome or exome sequencing. However, in designing the panel of genes to be sequenced, investigators need to consider the tradeoff between the better sensitivity of a broad panel and the higher specificity of a potentially more relevant panel. Although tools to prioritize candidate disease genes have been developed, the great majority of these require prior knowledge and a set of seed genes as input, which is only possible for diseases with a known genetic etiology. RESULTS To meet the demands of both researchers and clinicians, we have developed a user-friendly website called SoftPanel. This website allows users to input a single disorder or a disorder group and generate a panel of genes predicted to underlie the disorder of interest. Various methods of retrieval, including a keyword search, browsing of an arborized list of International Classification of Diseases, 10th revision (ICD-10) codes, or using disorder phenotypic similarities, can be combined to define a group of disorders and the genes known to be associated with them. Moreover, SoftPanel enables users to expand or refine a gene list by utilizing several biological data resources. In addition to providing users with the facility to create a "hard" panel that contains an exact gene list for targeted sequencing, SoftPanel also enables generation of a "soft" panel of genes, which may be used to further filter a significantly altered set of genes identified through whole genome or whole exome sequencing. The service and data provided by SoftPanel can be accessed at http://www.isb.pku.edu.cn/SoftPanel/ . A tutorial page is included for trying out sample data and interpreting results. 
CONCLUSION SoftPanel provides a convenient and powerful tool for creating a targeted panel of potential disease genes while supporting different forms of input. SoftPanel may be utilized in both genomics research and personalized medicine.
Affiliation(s)
- Likun Wang
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
- Cong Zhang
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
- Johnathan Watkins
- Institute for Mathematical and Molecular Biomedicine, King's College London, Guy's Campus, London, SE1 1UL, UK; Department of Research Oncology, King's College London, Guy's Campus, Great Maze Pond, London, SE1 9RT, UK
- Yan Jin
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
- Michael McNutt
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
- Yuxin Yin
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
36
Jayaraman A, Jamil K, Khan HA. Identifying new targets in leukemogenesis using computational approaches. Saudi J Biol Sci 2015; 22:610-22. [PMID: 26288567 PMCID: PMC4537869 DOI: 10.1016/j.sjbs.2015.01.012] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/26/2014] [Revised: 01/04/2015] [Accepted: 01/12/2015] [Indexed: 02/08/2023]
Abstract
There is a need to identify novel targets in acute lymphoblastic leukemia (ALL), a hematopoietic cancer affecting children, both to improve our understanding of the disease biology and to develop new therapeutics. Hence, the aim of our study was to find new target genes using in silico approaches. We retrieved the top 10% overexpressed genes from the Oncomine public-domain microarray expression database, shortlisting 530 overexpressed genes. Then, using the online prioritization tools ENDEAVOUR, DIR and ToppGene, we found fifty-four genes common to all three tools, which formed our candidate leukemogenic genes for this study. As per the protocol, we selected thirty training genes from PubMed. The prioritized and training genes were then used to construct a STRING functional association network, which was further analyzed with the cytoHubba hub analysis tool to investigate new genes that could serve as drug targets in leukemia. Analysis of this STRING protein network led to the identification of two hub genes, SMAD2 and CDK9, which had not previously been implicated in leukemogenesis. Filtering the several hundred genes in the network, we also found MEN1, HDAC1 and LCK, re-emphasizing the important role of these genes in leukemogenesis. This is the first report on these five additional signature genes in leukemogenesis. We propose these as new targets for developing novel therapeutics and as biomarkers in leukemogenesis, which could be important for prognosis and diagnosis.
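The two computational steps in this workflow, keeping only genes shortlisted by every prioritization tool and scoring network nodes to find hubs, can be sketched in a few lines. The helper names and toy gene identifiers are hypothetical; degree is only one of the hub-scoring methods cytoHubba offers, and this sketch does not reproduce the STRING network or the scoring the authors used.

```python
from collections import Counter


def consensus(*tool_outputs):
    """Genes reported by every prioritization tool (set intersection)."""
    return set.intersection(*(set(out) for out in tool_outputs))


def degree_hubs(edges, top_n):
    """Top-n nodes by degree in an undirected network, ties broken alphabetically.

    Duplicate edges (in either direction) and self-loops are ignored.
    """
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    degree = Counter(node for edge in unique for node in edge)
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [node for node, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical shortlists from three tools.
    tool_a = ["g1", "g2", "g3", "g4"]
    tool_b = ["g2", "g3", "g4", "g5"]
    tool_c = ["g3", "g4", "g2", "g6"]
    candidates = consensus(tool_a, tool_b, tool_c)  # contains g2, g3 and g4
    # Toy interaction network over the candidates plus training genes t1 and t2.
    edges = [("g2", "g3"), ("g3", "t1"), ("g3", "t2"), ("g4", "t1"), ("g3", "g4")]
    print(sorted(candidates), degree_hubs(edges, top_n=2))
```

Deduplicating edges with `frozenset` matters because interaction databases often list the same pair in both directions, which would otherwise double-count degrees.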
Affiliation(s)
- Archana Jayaraman
- Centre for Biotechnology and Bioinformatics, School of Life Sciences, Jawaharlal Nehru Institute of Advanced Studies (JNIAS), Secunderabad, Telangana, India
- Center for Biotechnology, Jawaharlal Nehru Technological University (JNTUH), Kukatpally, Hyderabad, Telangana, India
- Kaiser Jamil
- Centre for Biotechnology and Bioinformatics, School of Life Sciences, Jawaharlal Nehru Institute of Advanced Studies (JNIAS), Secunderabad, Telangana, India
- Corresponding author at: Centre for Biotechnology and Bioinformatics, School of Life Sciences, Jawaharlal Nehru Institute of Advanced Studies (JNIAS), Buddha Bhawan, 6th Floor, M.G. Road, Secunderabad 500003, Telangana, India. Tel.: +91 9676872626; fax: +91 40 27541551.
- Haseeb A. Khan
- Department of Biochemistry, College of Sciences, Bldg. 5, King Saud University, P.O. Box 2455, Riyadh, Saudi Arabia