1
Lin CH, Konecki DM, Liu M, Wilson SJ, Nassar H, Wilkins AD, Gleich DF, Lichtarge O. Multimodal network diffusion predicts future disease-gene-chemical associations. Bioinformatics 2019; 35:1536-1543. [PMID: 30304494 PMCID: PMC6499233 DOI: 10.1093/bioinformatics/bty858] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/23/2018] [Revised: 09/14/2018] [Accepted: 10/08/2018] [Indexed: 01/05/2023]
Abstract
MOTIVATION Precision medicine is an emerging field with hopes to improve patient treatment and reduce morbidity and mortality. To these ends, computational approaches have predicted associations among genes, chemicals and diseases. Such efforts, however, were often limited to using just some available association types. This lowers prediction coverage and, since prior evidence shows that integrating heterogeneous data is likely beneficial, it may limit accuracy. Therefore, we systematically tested whether using more association types improves prediction. RESULTS We study multimodal networks linking diseases, genes and chemicals (drugs) by applying three diffusion algorithms and varying information content. Ten-fold cross-validation shows that these networks are internally consistent, both within and across association types. Also, diffusion methods recovered missing edges, even if all the edges from an entire mode of association were removed. This suggests that information is transferable between these association types. As a realistic validation, time-stamped experiments simulated the predictions of future associations based solely on information known prior to a given date. The results show that many future published results are predictable from current associations. Moreover, in most cases, using more association types increases prediction coverage without significantly decreasing sensitivity and specificity. In case studies, literature-supported validation shows that these predictions mimic human-formulated hypotheses. Overall, this study suggests that diffusion over a more comprehensive multimodal network will generate more useful hypotheses of associations among diseases, genes and chemicals, which may guide the development of precision therapies. AVAILABILITY AND IMPLEMENTATION Code and data are available at https://github.com/LichtargeLab/multimodal-network-diffusion. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chih-Hsu Lin
- Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, USA
- Daniel M Konecki
- Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, USA
- Meng Liu
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Stephen J Wilson
- Department of Biochemistry and Molecular Biology, Houston, TX, USA
- Huda Nassar
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Angela D Wilkins
- Departments of Molecular and Human Genetics, and Pharmacology, Houston, TX, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA
- David F Gleich
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Olivier Lichtarge
- Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, USA
- Department of Biochemistry and Molecular Biology, Houston, TX, USA
- Departments of Molecular and Human Genetics, and Pharmacology, Houston, TX, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA
2
Mahfouz A, Huisman SMH, Lelieveldt BPF, Reinders MJT. Brain transcriptome atlases: a computational perspective. Brain Struct Funct 2017; 222:1557-1580. [PMID: 27909802 PMCID: PMC5406417 DOI: 10.1007/s00429-016-1338-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 05/25/2016] [Accepted: 11/15/2016] [Indexed: 01/31/2023]
Abstract
The immense complexity of the mammalian brain is largely reflected in the underlying molecular signatures of its billions of cells. Brain transcriptome atlases provide valuable insights into gene expression patterns across different brain areas throughout the course of development. Such atlases allow researchers to probe the molecular mechanisms which define neuronal identities, neuroanatomy, and patterns of connectivity. Despite the immense effort put into generating such atlases, to answer fundamental questions in neuroscience, an even greater effort is needed to develop methods to probe the resulting high-dimensional multivariate data. We provide a comprehensive overview of the various computational methods used to analyze brain transcriptome atlases.
Affiliation(s)
- Ahmed Mahfouz
- Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Laboratory, Delft University of Technology, Delft, The Netherlands
- Sjoerd M H Huisman
- Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Laboratory, Delft University of Technology, Delft, The Netherlands
- Boudewijn P F Lelieveldt
- Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Laboratory, Delft University of Technology, Delft, The Netherlands
- Marcel J T Reinders
- Delft Bioinformatics Laboratory, Delft University of Technology, Delft, The Netherlands
3
Cogill S, Wang L. Support vector machine model of developmental brain gene expression data for prioritization of Autism risk gene candidates. Bioinformatics 2016; 32:3611-3618. [PMID: 27506227 DOI: 10.1093/bioinformatics/btw498] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 05/23/2016] [Revised: 07/20/2016] [Accepted: 07/21/2016] [Indexed: 12/11/2022]
Abstract
MOTIVATION Autism spectrum disorders (ASD) are a group of neurodevelopmental disorders with clinical heterogeneity and a substantial polygenic component. High-throughput methods for ASD risk gene identification produce numerous candidate genes that are time-consuming and expensive to validate. Prioritization methods can identify high-confidence candidates. Previous ASD gene prioritization methods have focused on a priori knowledge, which excludes genes with little functional annotation or no protein product such as long non-coding RNAs (lncRNAs). RESULTS We have developed a support vector machine (SVM) model, trained using brain developmental gene expression data, for the classification and prioritization of ASD risk genes. The selected feature model had a mean accuracy of 76.7%, mean specificity of 77.2% and mean sensitivity of 74.4%. Gene lists composed of an ASD risk gene and adjacent genes were ranked using the model's decision function output. The known ASD risk genes were ranked on average in the 77.4th, 78.4th and 80.7th percentile for sets of 101, 201 and 401 genes respectively. Of 10,840 lncRNA genes, 63 were classified as ASD-associated candidates with a confidence greater than 0.95. Genes previously associated with brain development and neurodevelopmental disorders were prioritized highly within the lncRNA gene list. CONTACT liangjw@clemson.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- S Cogill
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA
- L Wang
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA
4
Dinai Y, Wolf L, Assaf Y. Combined neuroimaging and gene expression analysis of the genetic basis of brain plasticity indicates across species homology. Hum Brain Mapp 2014; 35:5888-902. [PMID: 25053200 DOI: 10.1002/hbm.22592] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/18/2013] [Revised: 05/22/2014] [Accepted: 07/14/2014] [Indexed: 12/29/2022]
Abstract
Brain plasticity and memory formation depend on the expression of a large number of genes. This relationship has been studied using several experimental approaches, and researchers have identified genes regulating plasticity through a variety of mechanisms. Despite this effort, a great deal remains unknown regarding the role of different genes in brain plasticity. Previous studies usually focused on specific brain structures, and many of the genes influencing plasticity have yet to be identified. In this work, we integrate results of in vivo neuroimaging studies of plasticity with whole-brain gene expression data for the study of neuroplasticity. Brain regions found in the imaging study to be involved in plasticity are first spatially mapped to the anatomical framework of the genetic database. Feature ranking methods are then applied to identify genes that are differentially expressed in these regions. We find that many of our highly ranked genes are involved in synaptic transmission and that some of these genes have been previously associated with learning and memory. We show these results to be consistent when applying our method to gene expression data from four human subjects. Finally, by performing similar experiments in mice, we reveal significant cross-species correlation in the ranking of genes. In addition to the identification of plasticity-related candidate genes, our results also demonstrate the potential of data integration approaches as a tool to link high-level phenomena such as learning and memory to underlying molecular mechanisms.
Affiliation(s)
- Yonatan Dinai
- Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
5
Gene prioritization of resistant rice gene against Xanthomas oryzae pv. oryzae by using text mining technologies. BioMed Research International 2013; 2013:853043. [PMID: 24371834 PMCID: PMC3859262 DOI: 10.1155/2013/853043] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 05/04/2013] [Revised: 10/26/2013] [Accepted: 11/10/2013] [Indexed: 01/24/2023]
Abstract
To effectively assess the possibility of the unknown rice protein resistant to Xanthomonas oryzae pv. oryzae, a hybrid strategy is proposed to enhance gene prioritization by combining text mining technologies with a sequence-based approach. The text mining technique of term frequency inverse document frequency is used to measure the importance of distinguished terms which reflect biomedical activity in rice before candidate genes are screened and vital terms are produced. Afterwards, a built-in classifier under the chaos games representation algorithm is used to sieve the best possible candidate gene. Our experiment results show that the combination of these two methods achieves enhanced gene prioritization.
6
Abstract
This community page describes the database and associated Web application that comprise the Allen Human Brain Atlas, an open online resource that integrates genomic and anatomic human brain data.
7
Piro RM, Molineris I, Di Cunto F, Eils R, König R. Disease-gene discovery by integration of 3D gene expression and transcription factor binding affinities. Bioinformatics 2012; 29:468-75. [PMID: 23267172 DOI: 10.1093/bioinformatics/bts720] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/19/2023]
Abstract
MOTIVATION The computational evaluation of candidate genes for hereditary disorders is a non-trivial task. Several excellent methods for disease-gene prediction have been developed in the past 2 decades, exploiting widely differing data sources to infer disease-relevant functional relationships between candidate genes and disorders. We have shown recently that spatially mapped, i.e. 3D, gene expression data from the mouse brain can be successfully used to prioritize candidate genes for human Mendelian disorders of the central nervous system. RESULTS We improved our previous work 2-fold: (i) we demonstrate that condition-independent transcription factor binding affinities of the candidate genes' promoters are relevant for disease-gene prediction and can be integrated with our previous approach to significantly enhance its predictive power; and (ii) we define a novel similarity measure, termed Relative Intensity Overlap, for both 3D gene expression patterns and binding affinity profiles that better exploits their disease-relevant information content. Finally, we present novel disease-gene predictions for eight loci associated with different syndromes of unknown molecular basis that are characterized by mental retardation.
Affiliation(s)
- Rosario M Piro
- Department of Theoretical Bioinformatics, German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ), University of Heidelberg, Im 69120 Heidelberg, Germany.
8
Arias CR, Yeh HY, Soo VW. Biomarker identification for prostate cancer and lymph node metastasis from microarray data and protein interaction network using gene prioritization method. ScientificWorldJournal 2012; 2012:842727. [PMID: 22654636 PMCID: PMC3354662 DOI: 10.1100/2012/842727] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/02/2011] [Accepted: 12/26/2011] [Indexed: 01/20/2023]
Abstract
Finding a gene related to a genetic disease is not a trivial task. Computational methods are therefore needed to give the biomedical community clues about which genes are more likely to be related to a specific disease and could serve as biomarkers. We address the biomarker identification problem with a gene prioritization method called gene prioritization from microarray data based on shortest paths, extended with structural and biological properties and edge flux using voting scheme (GP-MIDAS-VXEF). The method finds relevant interactions in protein interaction networks, scores the genes using shortest paths and topological analysis, and integrates the results using a voting scheme and a biological boosting. We ran two experiments: one on primary prostate tumor and normal samples, and the other on primary prostate tumors with and without lymph node metastasis. We used 137 true prostate cancer genes as a benchmark. In the first experiment, GP-MIDAS-VXEF outperformed all the other state-of-the-art methods in the benchmark by retrieving the most true related genes from the candidate set among the top 50 scores. We applied the same technique to infer significant biomarkers in prostate cancer with lymph node metastasis, which is not well established.
Affiliation(s)
- Carlos Roberto Arias
- Institute of Information Systems and Applications, National Tsing Hua University, Hsinchu 30013, Taiwan.
9
Piro RM, Di Cunto F. Computational approaches to disease-gene prediction: rationale, classification and successes. FEBS J 2012; 279:678-96. [PMID: 22221742 DOI: 10.1111/j.1742-4658.2012.08471.x] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.1] [Indexed: 11/27/2022]
Abstract
The identification of genes involved in human hereditary diseases often requires the time-consuming and expensive examination of a great number of possible candidate genes, since genome-wide techniques such as linkage analysis and association studies frequently select many hundreds of 'positional' candidates. Even considering the positive impact of next-generation sequencing technologies, the prioritization of candidate genes may be an important step for disease-gene identification. In this paper we develop a basic classification scheme for computational approaches to disease-gene prediction and apply it to exhaustively review bioinformatics tools that have been developed for this purpose, focusing on conceptual aspects rather than technical detail and performance. Finally, we discuss some past successes obtained by computational approaches to illustrate their beneficial contribution to medical research.
Affiliation(s)
- Rosario M Piro
- Department of Theoretical Bioinformatics, German Cancer Research Center, (DKFZ), Heidelberg, Germany.
10
Piro RM, Molineris I, Ala U, Di Cunto F. Evaluation of candidate genes from orphan FEB and GEFS+ loci by analysis of human brain gene expression atlases. PLoS One 2011; 6:e23149. [PMID: 21858011 PMCID: PMC3157479 DOI: 10.1371/journal.pone.0023149] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 04/20/2011] [Accepted: 07/07/2011] [Indexed: 12/19/2022]
Abstract
Febrile seizures, or febrile convulsions (FEB), represent the most common form of childhood seizures and are believed to be influenced by variations in several susceptibility genes. Most of the associated loci, however, remain ‘orphan’, i.e. the susceptibility genes they contain still remain to be identified. Further orphan loci have been mapped for a related disorder, genetic (generalized) epilepsy with febrile seizures plus (GEFS+). We show that both spatially mapped and ‘traditional’ gene expression data from the human brain can be successfully employed to predict the most promising candidate genes for FEB and GEFS+, apply our prediction method to the remaining orphan loci and discuss the validity of the predictions. For several of the orphan FEB/GEFS+ loci we propose excellent, and not always obvious, candidates for mutation screening in order to aid in gaining a better understanding of the genetic origin of the susceptibility to seizures.
Affiliation(s)
- Rosario M Piro
- Molecular Biotechnology Center and Department of Genetics, Biology and Biochemistry, University of Torino, Torino, Italy.
11
Differential expression pattern-based prioritization of candidate genes through integrating disease-specific expression data. Genomics 2011; 98:64-71. [DOI: 10.1016/j.ygeno.2011.04.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 08/17/2010] [Revised: 03/11/2011] [Accepted: 04/01/2011] [Indexed: 01/30/2023]
12
A computational approach to candidate gene prioritization for X-linked mental retardation using annotation-based binary filtering and motif-based linear discriminatory analysis. Biol Direct 2011; 6:30. [PMID: 21668950 PMCID: PMC3142252 DOI: 10.1186/1745-6150-6-30] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 01/20/2011] [Accepted: 06/13/2011] [Indexed: 01/07/2023]
Abstract
Background Several computational candidate gene selection and prioritization methods have recently been developed. These in silico selection and prioritization techniques are usually based on two central approaches: the examination of similarities to known disease genes and/or the evaluation of functional annotation of genes. Each of these approaches has its own caveats. Here we employ a previously described method of candidate gene prioritization based mainly on gene annotation, together with a technique based on the evaluation of pertinent sequence motifs or signatures, in an attempt to refine the gene prioritization approach. We apply this approach to X-linked mental retardation (XLMR), a group of heterogeneous disorders for which some of the underlying genetics is known. Results The gene annotation-based binary filtering method yielded a ranked list of putative XLMR candidate genes with good plausibility of being associated with the development of mental retardation. In parallel, a motif finding approach based on linear discriminatory analysis (LDA) was employed to identify short sequence patterns that may discriminate XLMR from non-XLMR genes. High rates (>80%) of correct classification were achieved, suggesting that the identification of these motifs effectively captures genomic signals associated with XLMR vs. non-XLMR genes. The computational tools developed for the motif-based LDA are integrated into the freely available genomic analysis portal Galaxy (http://main.g2.bx.psu.edu/). Nine genes (APLN, ZC4H2, MAGED4, MAGED4B, RAP2C, FAM156A, FAM156B, TBL1X, and UXT) were highlighted as highly ranked XLMR candidates. Conclusions The combination of gene annotation information and sequence motif-orientated computational candidate gene prediction methods highlights an added benefit in generating a list of plausible candidate genes, as has been demonstrated for XLMR.
Reviewers: This article was reviewed by Dr Barbara Bardoni (nominated by Prof Juergen Brosius); Prof Neil Smalheiser and Dr Dustin Holloway (nominated by Prof Charles DeLisi).