1. Nguyen T, Yue Z, Slominski R, Welner R, Zhang J, Chen JY. WINNER: A network biology tool for biomolecular characterization and prioritization. Front Big Data 2022; 5:1016606. [DOI: 10.3389/fdata.2022.1016606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Accepted: 10/14/2022] [Indexed: 11/06/2022]
Abstract
Background and contribution: In network biology, molecular functions can be characterized by network-based inference, or “guilt-by-associations.” PageRank-like tools have been applied to biomolecular interaction networks to estimate the relative significance of all molecules in the network. However, there is a great deal of inherent noise in widely accessible data sets for gene-to-gene associations or protein-protein interactions. How to develop robust tests to expand, filter, and rank molecular entities in disease-specific networks remains an ad hoc data analysis process.

Results: We describe a new biomolecular characterization and prioritization tool called Weighted In-Network Node Expansion and Ranking (WINNER). It takes any molecular interaction network data as input and generates an optionally expanded network with all the nodes ranked according to their relevance to one another in the network. To help users assess the robustness of results, WINNER provides two different types of statistics. The first type is a node-expansion p-value, which helps evaluate the statistical significance of adding “non-seed” molecules to the original biomolecular interaction network consisting of “seed” molecules and molecular interactions. The second type is a node-ranking p-value, which helps evaluate the relative statistical significance of the contribution of each node to the overall network architecture. We validated the robustness of WINNER in ranking top molecules by spiking noise into several network permutation experiments. We found that node degree-preserving randomization of the gene network produced normally distributed ranking scores, which outperform those made with other gene network randomization techniques. Furthermore, we validated that a more significant proportion of the WINNER-ranked genes was associated with disease biology than for existing methods such as PageRank. We demonstrated the performance of WINNER with a few case studies, including Alzheimer's disease, breast cancer, myocardial infarction, and triple-negative breast cancer (TNBC). In all these case studies, the expanded and top-ranked genes identified by WINNER reveal disease biology more significantly than those identified by other gene prioritizing software tools, including Ingenuity Pathway Analysis (IPA) and DiAMOND.

Conclusion: WINNER ranking strongly correlates with other ranking methods when the network covers sufficient node and edge information, indicating a high network quality. WINNER users can use this new tool to robustly evaluate a list of candidate genes, proteins, or metabolites produced from high-throughput biology experiments, as long as gene/protein/metabolic network information is available.
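The abstract above combines a PageRank-like ranking with degree-preserving network randomization to attach a p-value to each node's rank. The following is a minimal sketch of that general idea, not WINNER's actual algorithm: it assumes an unweighted, undirected network, plain PageRank, double-edge swaps for the null model, and an empirical one-sided p-value. The function names, the toy edge list, and all parameter values are hypothetical.

```python
import random
from collections import defaultdict

def pagerank(edges, damping=0.85, iters=100):
    """Power-iteration PageRank on an undirected graph given as (u, v) pairs."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    nodes = sorted(nbrs)
    n = len(nodes)
    score = {x: 1.0 / n for x in nodes}
    for _ in range(iters):
        nxt = {x: (1.0 - damping) / n for x in nodes}
        for u in nodes:
            share = damping * score[u] / len(nbrs[u])
            for v in nbrs[u]:
                nxt[v] += share
        score = nxt
    return score

def degree_preserving_shuffle(edges, n_swaps, rng):
    """Rewire with double-edge swaps (a,b),(c,d) -> (a,d),(c,b); node degrees stay fixed."""
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # a shared endpoint would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # the swap would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

def ranking_p_values(edges, n_perm=200, seed=0):
    """Empirical p-value per node: how often a randomized network scores it at least as high."""
    rng = random.Random(seed)
    observed = pagerank(edges)
    exceed = {x: 0 for x in observed}
    for _ in range(n_perm):
        null = pagerank(degree_preserving_shuffle(edges, 10 * len(edges), rng))
        for x in observed:
            if null[x] >= observed[x] - 1e-12:
                exceed[x] += 1
    return observed, {x: (exceed[x] + 1) / (n_perm + 1) for x in observed}

# Hypothetical toy network; node labels carry no biological meaning.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("D", "E"), ("E", "F"), ("F", "G"), ("G", "D")]
scores, p_values = ranking_p_values(edges)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3), round(p_values[node], 3))
```

Because the null model keeps every node's degree, a small p-value here suggests that a node ranks highly for reasons beyond its degree alone, which is one plausible reading of why the abstract favors degree-preserving randomization.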
2. Lagisetty Y, Bourquard T, Al-Ramahi I, Mangleburg CG, Mota S, Soleimani S, Shulman JM, Botas J, Lee K, Lichtarge O. Identification of risk genes for Alzheimer's disease by gene embedding. Cell Genomics 2022; 2:100162. [PMID: 36268052 PMCID: PMC9581494 DOI: 10.1016/j.xgen.2022.100162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Most disease-gene association methods do not account for gene-gene interactions, even though these play a crucial role in complex, polygenic diseases like Alzheimer's disease (AD). To discover new genes whose interactions may contribute to pathology, we introduce GeneEMBED. This approach compares the functional perturbations induced in gene interaction network neighborhoods by coding variants from disease versus healthy subjects. In two independent AD cohorts of 5,169 exomes and 969 genomes, GeneEMBED identified novel candidates. These genes were differentially expressed in post mortem AD brains and modulated neurological phenotypes in mice. Four that were differentially overexpressed and modified neurodegeneration in vivo are PLEC, UTRN, TP53, and POLD1. Notably, TP53 and POLD1 are involved in DNA break repair and inhibited by approved drugs. While these data show proof of concept in AD, GeneEMBED is a general approach that should be broadly applicable to identify genes relevant to risk mechanisms and therapy of other complex diseases.
Affiliation(s)
- Yashwanth Lagisetty: Department of Biology and Pharmacology, UTHealth McGovern Medical School, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Thomas Bourquard: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Ismael Al-Ramahi: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX 77030, USA; Center for Alzheimer’s and Neurodegenerative Diseases, Baylor College of Medicine, Houston, TX 77030, USA
- Carl Grant Mangleburg: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Samantha Mota: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Shirin Soleimani: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Joshua M. Shulman: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX 77030, USA; Center for Alzheimer’s and Neurodegenerative Diseases, Baylor College of Medicine, Houston, TX 77030, USA; Department of Neurology, Baylor College of Medicine, Houston, TX 77030, USA; Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA
- Juan Botas: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX 77030, USA; Center for Alzheimer’s and Neurodegenerative Diseases, Baylor College of Medicine, Houston, TX 77030, USA
- Kwanghyuk Lee: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Olivier Lichtarge (corresponding author): Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Center for Alzheimer’s and Neurodegenerative Diseases, Baylor College of Medicine, Houston, TX 77030, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA
3. Subramanian A, Zakeri P, Mousa M, Alnaqbi H, Alshamsi FY, Bettoni L, Damiani E, Alsafar H, Saeys Y, Carmeliet P. Angiogenesis goes computational – The future way forward to discover new angiogenic targets? Comput Struct Biotechnol J 2022; 20:5235-5255. [PMID: 36187917 PMCID: PMC9508490 DOI: 10.1016/j.csbj.2022.09.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 09/09/2022] [Accepted: 09/09/2022] [Indexed: 11/26/2022]
Abstract
Multi-omics technologies are being increasingly utilized in angiogenesis research. Yet, computational methods have not been widely used for angiogenic target discovery and prioritization in this field, partly because (wet-lab) vascular biologists are insufficiently familiar with computational biology tools and the opportunities they may offer. With this review, written for vascular biologists who lack expertise in computational methods, we aspire to break boundaries between both fields and to illustrate the potential of these tools for future angiogenic target discovery. We provide a comprehensive survey of currently available computational approaches that may be useful in prioritizing candidate genes, predicting associated mechanisms, and identifying their specificity to endothelial cell subtypes. We specifically highlight tools that use flexible, machine learning frameworks for large-scale data integration and gene prioritization. For each purpose-oriented category of tools, we describe underlying conceptual principles, highlight interesting applications and discuss limitations. Finally, we will discuss challenges and recommend some guidelines which can help to optimize the process of accurate target discovery.
4. Chi X, Sartor MA, Lee S, Anurag M, Patil S, Hall P, Wexler M, Wang XS. Universal concept signature analysis: genome-wide quantification of new biological and pathological functions of genes and pathways. Brief Bioinform 2021; 21:1717-1732. [PMID: 31631213 DOI: 10.1093/bib/bbz093] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/11/2019] [Revised: 05/23/2019] [Accepted: 07/05/2019] [Indexed: 12/12/2022]
Abstract
Identifying new gene functions and pathways underlying diseases and biological processes is a major challenge in genomics research. In particular, most methods for interpreting the pathways characteristic of an experimental gene list defined by genomic data are limited by their dependence on assessing the overlapping genes or their interactome topology, which cannot account for the variety of functional relations. This is particularly problematic for pathway discovery from single-cell genomics with low gene coverage or for interpreting complex pathway changes, such as those during a change of cell state. Here, we exploited the comprehensive sets of molecular concepts that combine ontologies, pathways, interactions and domains to help inform the functional relations. We first developed a universal concept signature (uniConSig) analysis for genome-wide quantification of new gene functions underlying biological or pathological processes based on the signature molecular concepts computed from known functional gene lists. We then further developed a novel concept signature enrichment analysis (CSEA) for deep functional assessment of the pathways enriched in an experimental gene list. This method is grounded on the framework of shared concept signatures between gene sets at multiple functional levels, thus overcoming the limitations of the current methods. Through meta-analysis of transcriptomic data sets of cancer cell line models and single hematopoietic stem cells, we demonstrate the broad applications of CSEA to pathway discovery from gene expression and single-cell transcriptomic data sets for genetic perturbations and change of cell states, which complements the current modalities. The R modules for uniConSig analysis and CSEA are available through https://github.com/wangxlab/uniConSig.
Affiliation(s)
- Xu Chi: UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15232, U.S.A.; Department of Pathology, University of Pittsburgh, Pittsburgh, PA, 15232, U.S.A.; Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, 15206, U.S.A.; CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Maureen A Sartor: Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, U.S.A.
- Sanghoon Lee: UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15232, U.S.A.; Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, 15206, U.S.A.
- Meenakshi Anurag: Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, U.S.A.
- Snehal Patil: Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, U.S.A.
- Pelle Hall: Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, U.S.A.
- Matthew Wexler: UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15232, U.S.A.; Department of Pathology, University of Pittsburgh, Pittsburgh, PA, 15232, U.S.A.; Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, 15206, U.S.A.
- Xiao-Song Wang: UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15232, U.S.A.; Department of Pathology, University of Pittsburgh, Pittsburgh, PA, 15232, U.S.A.; Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, 15206, U.S.A.; Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, U.S.A.
5. Kumar G, Kumar R, Pal MK, Pramanik N, Lahiri T, Gupta A, Pandey S. APT: An Automated Probe Tracker From Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1864-1874. [PMID: 31825870 DOI: 10.1109/tcbb.2019.2958345] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Among the currently available semi-automatic tools for detecting diagnostic probes relevant to a pathophysiological condition, ArrayMining and NCBI's GEO2R are the most popular. A shortcoming of both tools is that they list probes ordered by their individual statistical significance, differing only in the statistical methods they use. While the more recent tool, GEO2R, outputs either the top 250 or all genes according to its own ranking mechanism, ArrayMining requires the user to specify the number of probes. This study provides a way to select a probe set automatically by voting over the outputs of three statistical methods: the t-test, the Mann-Whitney test, and the empirical Bayes moderated t-test. It was also intriguing to find that the parameters of these statistical methods can be represented as a mathematical function of the group Fisher's discriminant ratio of a disease-control expression data pair. This fully automatic method, APT, includes reported probes with 88.97 percent success, compared with 80.40 and 87.60 percent for ArrayMining and GEO2R, respectively. Furthermore, in 10-fold cross-validation and on 5 new test cases, APT performs better than both ArrayMining and GEO2R in sensitivity and specificity.
6. Yue Z, Yan D, Guo G, Chen JY. Biological Network Mining. Methods Mol Biol 2021; 2328:139-151. [PMID: 34251623 DOI: 10.1007/978-1-0716-1534-8_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
In this book chapter, we introduce a pipeline to mine significant biomedical entities (or bioentities) in biological networks. Our focus is on prioritizing both bioentities themselves and the associations between bioentities in order to reveal their biological functions. We will introduce three tools BEERE, WIPER, and PAGER 2.0 that can be used together for network analysis and function interpretation: (1) BEERE is a network analysis tool for "Biomedical Entity Expansion, Ranking and Explorations," (2) WIPER is an entity-to-entity association ranking tool, and (3) PAGER 2.0 is a service for gene enrichment analysis.
Affiliation(s)
- Zongliang Yue: The University of Alabama at Birmingham, Birmingham, AL, USA
- Da Yan: The University of Alabama at Birmingham, Birmingham, AL, USA
- Guimu Guo: The University of Alabama at Birmingham, Birmingham, AL, USA
- Jake Y Chen: The University of Alabama at Birmingham, Birmingham, AL, USA
7. Chang HC, Chu CP, Lin SJ, Hsiao CK. Network hub-node prioritization of gene regulation with intra-network association. BMC Bioinformatics 2020; 21:101. [PMID: 32164570 PMCID: PMC7069025 DOI: 10.1186/s12859-020-3444-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/15/2019] [Accepted: 03/06/2020] [Indexed: 11/10/2022]
Abstract
Background: To identify and prioritize the influential hub genes in a gene-set or biological pathway, most analyses rely on calculation of marginal effects or tests of statistical significance. These procedures may be inappropriate since hub nodes are common connection points and therefore may interact with other nodes more often than non-hub nodes do. Such dependence among gene nodes can be conjectured based on the topology of the pathway network or the correlation between them.

Results: Here we develop a pathway activity score incorporating the marginal (local) effects of gene nodes as well as intra-network affinity measures. This score summarizes the expression levels in a gene-set/pathway for each sample, with weights on local and network information, respectively. The score is next used to examine the impact of each node through a leave-one-out evaluation. To illustrate the procedure, two cancer studies, one involving RNA-Seq from breast cancer patients with high-grade ductal carcinoma in situ and one microarray expression data from ovarian cancer patients, are used to assess the performance of the procedure, and to compare with existing methods, both ones that do and do not take into consideration correlation and network information. The hub nodes identified by the proposed procedure in the two cancer studies are known influential genes; some have been included in standard treatments and some are currently considered in clinical trials for target therapy. The results from simulation studies show that when marginal effects are mild or weak, the proposed procedure can still identify causal nodes, whereas methods relying only on marginal effect size cannot.

Conclusions: The NetworkHub procedure proposed in this research can effectively utilize the network information in combination with local effects derived from marker values, and provide a useful and complementary list of recommendations for prioritizing causal hubs.
Affiliation(s)
- Hung-Ching Chang: Division of Biostatistics, Institute of Epidemiology and Preventive Medicine, National Taiwan University, No. 17, Xu-Zhou Road, Taipei, 10055, Taiwan
- Chiao-Pei Chu: Division of Biostatistics, Institute of Epidemiology and Preventive Medicine, National Taiwan University, No. 17, Xu-Zhou Road, Taipei, 10055, Taiwan
- Shu-Ju Lin: Institute of Statistical Science, Academia Sinica, Taipei, 11529, Taiwan
- Chuhsing Kate Hsiao: Division of Biostatistics, Institute of Epidemiology and Preventive Medicine, National Taiwan University, No. 17, Xu-Zhou Road, Taipei, 10055, Taiwan; Bioinformatics and Biostatistics Core, Center of Genomic Medicine, National Taiwan University, Taipei, 10055, Taiwan
8. Biological Network Approaches and Applications in Rare Disease Studies. Genes (Basel) 2019; 10:genes10100797. [PMID: 31614842 PMCID: PMC6827097 DOI: 10.3390/genes10100797] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 09/03/2019] [Revised: 09/30/2019] [Accepted: 10/10/2019] [Indexed: 12/26/2022]
Abstract
Network biology has the capability to integrate, represent, interpret, and model complex biological systems by collectively accommodating biological omics data, biological interactions and associations, graph theory, statistical measures, and visualizations. Biological networks have recently been shown to be very useful for studies that decipher biological mechanisms and disease etiologies and for studies that predict therapeutic responses, at both the molecular and system levels. In this review, we briefly summarize the general framework of biological network studies, including data resources, network construction methods, statistical measures, network topological properties, and visualization tools. We also introduce several recent biological network applications and methods for the studies of rare diseases.
9. Yue Z, Willey CD, Hjelmeland AB, Chen JY. BEERE: a web server for biomedical entity expansion, ranking and explorations. Nucleic Acids Res 2019; 47:W578-W586. [PMID: 31114876 PMCID: PMC6602520 DOI: 10.1093/nar/gkz428] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/14/2019] [Revised: 05/04/2019] [Accepted: 05/20/2019] [Indexed: 12/02/2022]
Abstract
BEERE (Biomedical Entity Expansion, Ranking and Explorations) is a new web-based data analysis tool to help biomedical researchers characterize any input list of genes/proteins, biomedical terms or their combinations, i.e. ‘biomedical entities’, in the context of existing literature. Specifically, BEERE first aims to help users examine the credibility of known entity-to-entity associative or semantic relationships supported by database or literature references from the user input of a gene/term list. Then, it will help users uncover the relative importance of each entity (a gene or a term) within the user input by computing the ranking scores of all entities. Finally, it will help users hypothesize new gene functions or genotype–phenotype associations through an interactive visual interface of the constructed global entity relationship network. The output from BEERE includes: a list of the original entities matched with known relationships in databases; any expanded entities that may be generated from the analysis; the ranks and ranking scores reported with statistical significance for each entity; and an interactive graphical display of the gene or term network within data provenance annotations that link to external data sources. The web server is free and open to all users with no login requirement and can be accessed at http://discovery.informatics.uab.edu/beere/.
Affiliation(s)
- Zongliang Yue: Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35233, USA
- Christopher D Willey: Department of Radiation Oncology, School of Medicine, the University of Alabama at Birmingham, AL 35233, USA
- Anita B Hjelmeland: Department of Cell, Developmental and Integrative Biology, School of Medicine, the University of Alabama at Birmingham, AL 35233, USA
- Jake Y Chen: Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35233, USA
10. GUILDify v2.0: A Tool to Identify Molecular Networks Underlying Human Diseases, Their Comorbidities and Their Druggable Targets. J Mol Biol 2019; 431:2477-2484. [DOI: 10.1016/j.jmb.2019.02.027] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 10/31/2018] [Revised: 02/08/2019] [Accepted: 02/26/2019] [Indexed: 01/24/2023]
11. Oulas A, Minadakis G, Zachariou M, Sokratous K, Bourdakou MM, Spyrou GM. Systems Bioinformatics: increasing precision of computational diagnostics and therapeutics through network-based approaches. Brief Bioinform 2019; 20:806-824. [PMID: 29186305 PMCID: PMC6585387 DOI: 10.1093/bib/bbx151] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 07/17/2017] [Revised: 10/17/2017] [Indexed: 02/01/2023]
Abstract
Systems Bioinformatics is a relatively new approach, which lies in the intersection of systems biology and classical bioinformatics. It focuses on integrating information across different levels using a bottom-up approach as in systems biology with a data-driven top-down approach as in bioinformatics. The advent of omics technologies has provided the stepping-stone for the emergence of Systems Bioinformatics. These technologies provide a spectrum of information ranging from genomics, transcriptomics and proteomics to epigenomics, pharmacogenomics, metagenomics and metabolomics. Systems Bioinformatics is the framework in which systems approaches are applied to such data, setting the level of resolution as well as the boundary of the system of interest and studying the emerging properties of the system as a whole rather than the sum of the properties derived from the system's individual components. A key approach in Systems Bioinformatics is the construction of multiple networks representing each level of the omics spectrum and their integration in a layered network that exchanges information within and between layers. Here, we provide evidence on how Systems Bioinformatics enhances computational therapeutics and diagnostics, hence paving the way to precision medicine. The aim of this review is to familiarize the reader with the emerging field of Systems Bioinformatics and to provide a comprehensive overview of its current state-of-the-art methods and technologies. Moreover, we provide examples of success stories and case studies that utilize such methods and tools to significantly advance research in the fields of systems biology and systems medicine.
Affiliation(s)
- Anastasis Oulas: Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- George Minadakis: Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Margarita Zachariou: Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Kleitos Sokratous: Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Marilena M Bourdakou: Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- George M Spyrou: Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
12. Su L, Liu G, Wang J, Xu D. A rectified factor network based biclustering method for detecting cancer-related coding genes and miRNAs, and their interactions. Methods 2019; 166:22-30. [PMID: 31121299 DOI: 10.1016/j.ymeth.2019.05.010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/16/2018] [Revised: 04/14/2019] [Accepted: 05/13/2019] [Indexed: 12/12/2022]
Abstract
Detecting cancer-related genes and their interactions is a crucial task in cancer research. For this purpose, we proposed an efficient method to detect coding genes, microRNAs (miRNAs), and their interactions related to a particular cancer or a cancer subtype using their expression data from the same set of samples. First, biclusters specific to a particular type of cancer are detected based on rectified factor networks and ranked according to their associations with general cancers. Second, coding genes and miRNAs in each bicluster are prioritized by considering their differential expression and differential correlation values, protein-protein interaction data, and potential cancer markers. Finally, a rank fusion process is used to obtain the final comprehensive rank by combining multiple ranking results. We applied our proposed method to breast cancer datasets. Results show that our method outperforms other methods in detecting breast cancer-related coding genes and miRNAs. Furthermore, our method is very efficient in computing time: it can handle tens of thousands of genes/miRNAs and hundreds of patients in hours on a desktop. This work may aid researchers in studying the genetic architecture of complex diseases and in improving the accuracy of diagnosis.
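The abstract above ends its pipeline with a rank fusion step that combines several per-criterion rankings into one, without saying which fusion rule is used. As an illustration only, the sketch below assumes a simple Borda count; the function name `borda_fuse` and the gene/miRNA labels are hypothetical placeholders, not data or methods from the paper.

```python
def borda_fuse(rankings):
    """Fuse ranked lists (best first) by Borda count; ties are broken alphabetically."""
    items = sorted(set().union(*rankings))
    n = len(items)
    points = {x: 0 for x in items}
    for ranking in rankings:
        for pos, item in enumerate(ranking):
            points[item] += n - pos  # top position earns n points, next n - 1, ...
        # items absent from a ranking earn no points from it
    return sorted(items, key=lambda x: (-points[x], x))

# Hypothetical rankings of the same four candidates under three criteria.
by_expression = ["geneA", "geneB", "miR1", "geneC"]
by_correlation = ["geneB", "geneA", "geneC", "miR1"]
by_network = ["geneB", "miR1", "geneA", "geneC"]

fused = borda_fuse([by_expression, by_correlation, by_network])
print(fused)  # ['geneB', 'geneA', 'miR1', 'geneC']
```

A Borda count rewards candidates that rank consistently well across criteria, so one outlying ranking cannot dominate the fused list. Weighted or score-based fusion rules are equally plausible choices.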
Affiliation(s)
- Lingtao Su: Department of Computer Science and Technology, Jilin University, Changchun 130012, China; Department of Electrical Engineering & Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Guixia Liu: Department of Computer Science and Technology, Jilin University, Changchun 130012, China
- Juexin Wang: Department of Electrical Engineering & Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Dong Xu: Department of Electrical Engineering & Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
13. Ranking Cancer Proteins by Integrating PPI Network and Protein Expression Profiles. BioMed Research International 2019; 2019:3907195. [PMID: 30723737 PMCID: PMC6339728 DOI: 10.1155/2019/3907195] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/26/2018] [Revised: 12/06/2018] [Accepted: 12/12/2018] [Indexed: 12/16/2022]
Abstract
Proteomics, the large-scale analysis of proteins, is contributing greatly to understanding gene function in the postgenomic era. However, disease protein ranking using shotgun proteomics data has not been fully evaluated. In this study, we prioritized disease-related proteins by integrating the protein-protein interaction (PPI) network and protein differential expression profiles from colon and rectal cancer (CRC) or breast cancer (BC) proteomics. We applied Local Ranking (LR) and Global Ranking (GR) methods on the network with three kinds of protein sets as a priori knowledge: known disease proteins (KDPs) collected from the Online Mendelian Inheritance in Man (OMIM) database, differentially expressed proteins (DEPs), and the collection of KDPs and their direct neighborhood with differential expression (eKDPs). Cross-validation showed that the GR method outperformed the LR method, and that using eKDPs as the initial training set gave significantly higher accuracy than using the other two a priori sets. We then validated the top-ranked proteins using RNAi-based loss-of-function screens in the DepMap database. The results showed that 75% of the top 20 proteins in CRC are necessary for tumor survival. In summary, network-based Global Ranking with protein differential expression can efficiently prioritize cancer-related proteins and discover new candidate cancer genes or proteins.
14. MGOGP: a gene module-based heuristic algorithm for cancer-related gene prioritization. BMC Bioinformatics 2018; 19:215. [PMID: 29871590 PMCID: PMC5989416 DOI: 10.1186/s12859-018-2216-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/13/2017] [Accepted: 05/23/2018] [Indexed: 01/13/2023]
Abstract
Background: Prioritizing genes according to their associations with a cancer allows researchers to explore genes in more informed ways. To date, gene-centric or network-centric gene prioritization methods have predominated. Genes and their protein products carry out cellular processes in the context of functional modules. Dysfunctional gene modules have been previously reported to have associations with cancer. However, gene module information has seldom been considered in cancer-related gene prioritization.

Results: In this study, we propose a novel method, MGOGP (Module and Gene Ontology-based Gene Prioritization), for cancer-related gene prioritization. Unlike other methods, MGOGP ranks genes by considering information on both individual genes and their affiliated modules, and utilizes a Gene Ontology (GO) based fuzzy measure value as well as known cancer-related genes as heuristics. The performance of the proposed method is comprehensively validated by using both breast cancer and prostate cancer datasets, and by comparison with other methods. Results show that MGOGP outperforms other methods, and successfully prioritizes more genes with literature-confirmed evidence.

Conclusions: This work will aid researchers in understanding the genetic architecture of complex diseases, and improve the accuracy of diagnosis and the effectiveness of therapy.

Electronic supplementary material: The online version of this article (10.1186/s12859-018-2216-0) contains supplementary material, which is available to authorized users.
|
15
|
Xie B, Laxman B, Hashemifar S, Stern R, Gilliam TC, Maltsev N, White SR. Chemokine expression in the early response to injury in human airway epithelial cells. PLoS One 2018; 13:e0193334. [PMID: 29534074 PMCID: PMC5849294 DOI: 10.1371/journal.pone.0193334] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/25/2017] [Accepted: 02/08/2018] [Indexed: 12/22/2022] Open
Abstract
Basal airway epithelial cells (AEC) constitute stem/progenitor cells within the central airways and respond to mucosal injury in an ordered sequence of spreading, migration, proliferation, and differentiation into the needed cell types. However, dynamic gene transcription in the early events after mucosal injury has not been studied in AEC. We examined gene expression using microarrays following mechanical injury (MI) in primary human AEC grown in submersion culture to generate basal cells and in the air-liquid interface to generate differentiated AEC (dAEC) that include goblet and ciliated cells. A select group of ~150 genes was differentially expressed (DE) within 2-24 hr after MI, and enrichment analysis of these genes showed over-representation of functional categories related to inflammatory cytokines and chemokines. Network-based gene prioritization and network reconstruction using the PINTA heat kernel diffusion algorithm demonstrated highly connected networks that were richer in differentiated AEC compared to basal cells. Similar experiments done in basal AEC collected from asthmatic donor lungs demonstrated substantial changes in DE genes and functional categories related to inflammation compared to basal AEC from normal donors. In dAEC, similar but more modest differences were observed. We demonstrate that the AEC transcription signature after MI identifies genes and pathways that are important to the initiation and perpetuation of airway mucosal inflammation. Gene expression occurs quickly after injury, is more profound in differentiated AEC, and is altered in AEC from asthmatic airways. Our data suggest that the early response to injury is substantially different in asthmatic airways, particularly in basal airway epithelial cells.
Affiliation(s)
- Bingqing Xie
- Department of Human Genetics, University of Chicago, Chicago, IL, United States of America
- Illinois Institute of Technology, Chicago, IL, United States of America
| | - Bharathi Laxman
- Department of Medicine, University of Chicago, Chicago, IL, United States of America
| | - Somaye Hashemifar
- Department of Human Genetics, University of Chicago, Chicago, IL, United States of America
- Toyota Technological Institute at Chicago, Chicago, IL, United States of America
| | - Randi Stern
- Department of Medicine, University of Chicago, Chicago, IL, United States of America
| | - T. Conrad Gilliam
- Department of Human Genetics, University of Chicago, Chicago, IL, United States of America
| | - Natalia Maltsev
- Department of Human Genetics, University of Chicago, Chicago, IL, United States of America
| | - Steven R. White
- Department of Medicine, University of Chicago, Chicago, IL, United States of America
|
16
|
|
17
|
Disease genes prioritizing mechanisms: a comprehensive and systematic literature review. ACTA ACUST UNITED AC 2017. [DOI: 10.1007/s13721-017-0154-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 10/19/2022]
|
18
|
Guala D, Sonnhammer ELL. A large-scale benchmark of gene prioritization methods. Sci Rep 2017; 7:46598. [PMID: 28429739 PMCID: PMC5399445 DOI: 10.1038/srep46598] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 06/09/2016] [Accepted: 03/22/2017] [Indexed: 11/16/2022] Open
Abstract
In order to maximize the use of results from high-throughput experimental studies, e.g. GWAS, for identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology (GO) contains genes clustered around annotation terms. This intrinsic property of GO can be utilized in the construction of robust benchmarks that are objective with respect to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, utilizing the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion (NetRank and two implementations of Random Walk with Restart), and MaxLink, which utilizes network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand, and helping to guide the development of more accurate methods.
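Random Walk with Restart, one of the diffusion methods named in this benchmark, can be illustrated with a minimal sketch. The function name, the restart probability of 0.3, and the five-node toy network are assumptions chosen for illustration; they are not taken from the benchmarked implementations or from the FunCoup network.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that jumps back
    to the seed nodes with probability `restart` at every step."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # isolated nodes: avoid division by zero
    transition = adj / col_sums        # column-normalized transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # walker starts uniformly on the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (transition @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Hypothetical toy network: a path 0 - 1 - 2 - 3 - 4 with gene 0 as the only seed.
toy_adj = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
toy_scores = random_walk_with_restart(toy_adj, seeds=[0])
toy_ranking = [int(i) for i in np.argsort(-toy_scores)]  # candidate order, best first
```

In this toy case the scores fall off with distance from the seed, which is the "guilt-by-association" behavior that diffusion-based prioritization relies on.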
Affiliation(s)
- Dimitri Guala
- Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
| | - Erik L L Sonnhammer
- Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
|
19
|
GenePANDA-a novel network-based gene prioritizing tool for complex diseases. Sci Rep 2017; 7:43258. [PMID: 28252032 PMCID: PMC5333103 DOI: 10.1038/srep43258] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 09/06/2016] [Accepted: 01/23/2017] [Indexed: 02/08/2023] Open
Abstract
Here we describe GenePANDA, a novel network-based tool for prioritizing candidate disease genes. GenePANDA assesses whether a gene is likely a candidate disease gene based on its relative distance to known disease genes in a functional association network. A unique feature of GenePANDA is the introduction of adjusted network distance derived by normalizing the raw network distance between two genes with their respective mean raw network distance to all other genes in the network. The use of adjusted network distance significantly improves GenePANDA’s performance on prioritizing complex disease genes. GenePANDA achieves superior performance over five previously published algorithms for prioritizing disease genes. Finally, GenePANDA can assist in prioritizing functionally important SNPs identified by GWAS.
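The idea of an adjusted network distance can be sketched as follows. The abstract states only that the raw distance between two genes is normalized by their respective mean raw distances to all other genes; dividing by the geometric mean of the two means, using unweighted shortest paths, and the toy network itself are assumptions made here for illustration, not the published implementation.

```python
from collections import deque
from math import sqrt


def bfs_distances(graph, source):
    """Unweighted shortest-path distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def adjusted_distances(graph):
    """Raw distance divided by the geometric mean of each gene's mean raw
    distance to all other genes (assumed form of the normalization).
    Assumes a connected, undirected graph with at least two nodes."""
    raw = {gene: bfs_distances(graph, gene) for gene in graph}
    mean = {
        gene: sum(d for other, d in raw[gene].items() if other != gene)
        / (len(raw[gene]) - 1)
        for gene in graph
    }
    return {
        (a, b): raw[a][b] / sqrt(mean[a] * mean[b])
        for a in graph
        for b in raw[a]
        if a != b
    }


# Hypothetical toy network: a hub linked to a, b, c, with d hanging off c.
toy_graph = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": ["c"],
}
toy_adjusted = adjusted_distances(toy_graph)
```

Under this normalization a hub gene, which is close to everything, no longer looks uniformly "near" every candidate: its small mean distance inflates its adjusted distances relative to peripheral genes with the same raw distance.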
|
20
|
D'Souza M, Sulakhe D, Wang S, Xie B, Hashemifar S, Taylor A, Dubchak I, Conrad Gilliam T, Maltsev N. Strategic Integration of Multiple Bioinformatics Resources for System Level Analysis of Biological Networks. Methods Mol Biol 2017; 1613:85-99. [PMID: 28849559 DOI: 10.1007/978-1-4939-7027-8_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/07/2023]
Abstract
Recent technological advances in genomics allow the production of biological data at unprecedented tera- and petabyte scales. Efficient mining of these vast and complex datasets for the needs of biomedical research critically depends on a seamless integration of the clinical, genomic, and experimental information with prior knowledge about genotype-phenotype relationships. Such experimental data accumulated in publicly available databases should be accessible to a variety of algorithms and analytical pipelines that drive computational analysis and data mining. We present an integrated computational platform Lynx (Sulakhe et al., Nucleic Acids Res 44:D882-D887, 2016) (http://lynx.cri.uchicago.edu), a web-based database and knowledge extraction engine. It provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization. It gives public access to the Lynx integrated knowledge base (LynxKB) and its analytical tools via user-friendly web services and interfaces. The Lynx service-oriented architecture supports annotation and analysis of high-throughput experimental data. Lynx tools assist the user in extracting meaningful knowledge from LynxKB and experimental data, and in the generation of weighted hypotheses regarding the genes and molecular mechanisms contributing to human phenotypes or conditions of interest. The goal of this integrated platform is to support the end-to-end analytical needs of various translational projects.
Affiliation(s)
- Mark D'Souza
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA.
- Argonne National Laboratory, Building 221, Room: A142, 9700 South Cass Avenue, Argonne, IL, 60439, USA.
| | - Dinanath Sulakhe
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, 60637, USA
| | - Sheng Wang
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, 60637, USA
| | - Bing Xie
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Department of Computer Science, Illinois Institute of Technology, Chicago, IL, 60616, USA
| | - Somaye Hashemifar
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, 60637, USA
| | - Andrew Taylor
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
| | - Inna Dubchak
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America, Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
| | - T Conrad Gilliam
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, 60637, USA
| | - Natalia Maltsev
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, 60637, USA
|
21
|
CLIP-GENE: a web service of the condition specific context-laid integrative analysis for gene prioritization in mouse TF knockout experiments. Biol Direct 2016; 11:57. [PMID: 27776539 PMCID: PMC5078909 DOI: 10.1186/s13062-016-0158-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/02/2016] [Accepted: 10/10/2016] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION: Transcriptome data from gene knockout experiments in mouse are widely used to investigate the functions of genes and their relationship to phenotypes. When a gene is knocked out, it is important to identify which genes are affected by the knockout. Existing methods, including differentially expressed gene (DEG) methods, can be used for the analysis. However, existing methods require cutoff values to select candidate genes, which can produce either too many false positives or too many false negatives. This hurdle can be addressed by improving the accuracy of gene selection, by providing a method to rank candidate genes effectively, or both. Prioritization of candidate genes should consider the goals or context of the knockout experiment. As of now, there are no tools designed for both selecting and prioritizing genes from mouse knockout data; hence the need for a new tool. RESULTS: In this study, we present CLIP-GENE, a web service that selects gene markers by utilizing differentially expressed genes, a mouse transcription factor (TF) network, and single nucleotide variant information. Then, a protein-protein interaction network and literature information are utilized to find genes that are relevant to the phenotypic differences. One novel feature allows researchers to specify their contexts or hypotheses as a set of keywords, so that genes are ranked according to the contexts the user specifies. We believe that CLIP-GENE will be useful in characterizing functions of TFs in mouse experiments. AVAILABILITY: http://epigenomics.snu.ac.kr/CLIP-GENE REVIEWERS: This article was reviewed by Dr. Lee and Dr. Pongor.
|
22
|
Cardozo T, Gupta P, Ni E, Young LM, Tivon D, Felsovalyi K. Data sources for in vivo molecular profiling of human phenotypes. WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE 2016; 8:472-484. [PMID: 27599755 DOI: 10.1002/wsbm.1354] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/01/2016] [Revised: 06/26/2016] [Accepted: 06/27/2016] [Indexed: 11/08/2022]
Abstract
Molecular profiling of human diseases has been approached at the genetic (DNA), expression (RNA), and proteomic (protein) levels. An important goal of these efforts is to map observed molecular patterns to specific, mechanistic organic entities, such as loci in the genome, individual RNA molecules or defined proteins or protein assemblies. Importantly, such maps have been historically approached in the more intuitive context of a theoretical individual cell, but diseases are better described in reality using an in vivo framework, namely a library of several tissue-specific maps. In this article, we review the existing data atlases that can be used for this purpose and identify critical gaps that could move the field forward from cellular to in vivo dimensions. WIREs Syst Biol Med 2016, 8:472-484. doi: 10.1002/wsbm.1354
Affiliation(s)
- Timothy Cardozo
- Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine, New York, NY, USA.
| | - Priyanka Gupta
- Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine, New York, NY, USA; GeneCentrix Inc., New York, NY, USA
| | - Eric Ni
- Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine, New York, NY, USA; GeneCentrix Inc., New York, NY, USA
| | - Lauren M Young
- Department of Pathology, NYU School of Medicine, New York, NY, USA
|
23
|
Jiang J, Li W, Liang B, Xie R, Chen B, Huang H, Li Y, He Y, Lv J, He W, Chen L. A Novel Prioritization Method in Identifying Recurrent Venous Thromboembolism-Related Genes. PLoS One 2016; 11:e0153006. [PMID: 27050193 PMCID: PMC4822849 DOI: 10.1371/journal.pone.0153006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/29/2015] [Accepted: 03/21/2016] [Indexed: 12/13/2022] Open
Abstract
Identifying the genes involved in venous thromboembolism (VTE) recurrence is important not only for understanding the pathogenesis but also for discovering therapeutic targets. We proposed a novel prioritization method called Function-Interaction-Pearson (FIP), which creates gene-disease similarity scores to prioritize candidate genes underlying VTE. The scores were calculated by integrating and optimizing three types of resources: gene expression, gene ontology, and protein-protein interaction. As a result, 124 of the top 200 prioritized candidate genes had been confirmed in the literature, among which there were 34 antithrombotic drug targets. Compared with two well-known gene prioritization tools, Endeavour and ToppNet, FIP was shown to have better performance. The approach provides a valuable alternative for drug target discovery and disease therapy.
Affiliation(s)
- Jing Jiang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
| | - Wan Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
| | - Binhua Liang
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada
| | - Ruiqiang Xie
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
| | - Binbin Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
| | - Hao Huang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
| | - Yiran Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
| | - Yuehan He
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
| | - Junjie Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
| | - Weiming He
- Institute of Opto-electronics, Harbin Institute of Technology, Harbin, Hei Longjiang Province, China
- * E-mail: (LC); (WH)
| | - Lina Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
- * E-mail: (LC); (WH)
|
24
|
Discovering gene re-ranking efficiency and conserved gene-gene relationships derived from gene co-expression network analysis on breast cancer data. Sci Rep 2016; 6:20518. [PMID: 26892392 PMCID: PMC4759568 DOI: 10.1038/srep20518] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 10/16/2015] [Accepted: 01/05/2016] [Indexed: 12/18/2022] Open
Abstract
Systemic approaches are essential in the discovery of disease-specific genes, offering a different perspective and new tools for the analysis of several types of molecular relationships, such as gene co-expression or protein-protein interactions. However, due to a lack of experimental information, this analysis is not fully applicable. The aim of this study is to reveal the multi-potent contribution of statistical network inference methods in highlighting significant genes and interactions. We have investigated the ability of statistical co-expression networks to highlight and prioritize genes for breast cancer subtypes and stages in terms of: (i) classification efficiency, (ii) gene network pattern conservation, (iii) indication of involved molecular mechanisms and (iv) systems level momentum to drug repurposing pipelines. We have found that statistical network inference methods are advantageous in gene prioritization, can contribute to meaningful network signature discovery, give insights into disease-related mechanisms, and boost drug discovery pipelines from a systems point of view.
|
25
|
Sulakhe D, Xie B, Taylor A, D'Souza M, Balasubramanian S, Hashemifar S, White S, Dave UJ, Agam G, Xu J, Wang S, Gilliam TC, Maltsev N. Lynx: a knowledge base and an analytical workbench for integrative medicine. Nucleic Acids Res 2016; 44:D882-7. [PMID: 26590263 PMCID: PMC4702889 DOI: 10.1093/nar/gkv1257] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/15/2015] [Revised: 10/29/2015] [Accepted: 10/30/2015] [Indexed: 01/29/2023] Open
Abstract
Lynx (http://lynx.ci.uchicago.edu) is a web-based database and a knowledge extraction engine. It supports annotation and analysis of high-throughput experimental data and generation of weighted hypotheses regarding genes and molecular mechanisms contributing to human phenotypes or conditions of interest. Since the last release, the Lynx knowledge base (LynxKB) has been periodically updated with the latest versions of the existing databases and supplemented with additional information from public databases. These additions have enriched the data annotations provided by Lynx and improved the performance of Lynx analytical tools. Moreover, the Lynx analytical workbench has been supplemented with new tools for reconstruction of co-expression networks and feature-and-network-based prioritization of genetic factors and molecular mechanisms. These developments facilitate the extraction of meaningful knowledge from experimental data and LynxKB. The Service Oriented Architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces.
Affiliation(s)
- Dinanath Sulakhe
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL 60637, USA Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL 60637, USA
| | - Bingqing Xie
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL 60637, USA Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60616, USA
| | - Andrew Taylor
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL 60637, USA
| | - Mark D'Souza
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL 60637, USA
| | - Sandhya Balasubramanian
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL 60637, USA
| | - Somaye Hashemifar
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL 60637, USA
| | - Steven White
- Department of Medicine, University of Chicago, 5841 S. Maryland Avenue, Chicago, IL 60637, USA
| | - Utpal J Dave
- Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL 60637, USA
| | - Gady Agam
- Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60616, USA
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL 60637, USA
| | - Sheng Wang
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL 60637, USA Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL 60637, USA
| | - T Conrad Gilliam
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL 60637, USA Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL 60637, USA
| | - Natalia Maltsev
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL 60637, USA Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL 60637, USA
|
26
|
NetRanker: A network-based gene ranking tool using protein-protein interaction and gene expression data. BIOCHIP JOURNAL 2015. [DOI: 10.1007/s13206-015-9407-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/07/2023]
|
27
|
Antanaviciute A, Daly C, Crinnion LA, Markham AF, Watson CM, Bonthron DT, Carr IM. GeneTIER: prioritization of candidate disease genes using tissue-specific gene expression profiles. Bioinformatics 2015; 31:2728-35. [PMID: 25861967 PMCID: PMC4528628 DOI: 10.1093/bioinformatics/btv196] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 08/18/2014] [Accepted: 04/01/2015] [Indexed: 12/12/2022] Open
Abstract
Motivation: In attempts to determine the genetic causes of human disease, researchers are often faced with a large number of candidate genes. Linkage studies can point to a genomic region containing hundreds of genes, while the high-throughput sequencing approach will often identify a great number of non-synonymous genetic variants. Since systematic experimental verification of each such candidate gene is not feasible, a method is needed to decide which genes are worth investigating further. Computational gene prioritization presents itself as a solution to this problem, systematically analyzing and sorting each gene from the most to least likely to be the disease-causing gene, in a fraction of the time it would take a researcher to perform such queries manually. Results: Here, we present Gene TIssue Expression Ranker (GeneTIER), a new web-based application for candidate gene prioritization. GeneTIER replaces knowledge-based inference traditionally used in candidate disease gene prioritization applications with experimental data from tissue-specific gene expression datasets and thus largely overcomes the bias toward the better characterized genes/diseases that commonly afflicts other methods. We show that our approach is capable of accurate candidate gene prioritization and illustrate its strengths and weaknesses using case study examples. Availability and Implementation: Freely available on the web at http://dna.leeds.ac.uk/GeneTIER/. Contact: umaan@leeds.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Agne Antanaviciute
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds, St James's University Hospital and
| | - Catherine Daly
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds, St James's University Hospital and
| | - Laura A Crinnion
- Yorkshire Regional Genetics Service, St James's University Hospital, Leeds, UK
| | - Alexander F Markham
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds, St James's University Hospital and
| | | | - David T Bonthron
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds, St James's University Hospital and
| | - Ian M Carr
- Section of Genetics, Institute of Biomedical and Clinical Sciences, School of Medicine, University of Leeds, St James's University Hospital and
|
28
|
Xie B, Agam G, Balasubramanian S, Xu J, Gilliam TC, Maltsev N, Börnigen D. Disease gene prioritization using network and feature. J Comput Biol 2015; 22:313-23. [PMID: 25844670 PMCID: PMC4808289 DOI: 10.1089/cmb.2015.0001] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 01/15/2023] Open
Abstract
Identifying high-confidence candidate genes that are causative for disease phenotypes, from the large lists of variations produced by high-throughput genomics, can be both time-consuming and costly. The development of novel computational approaches, utilizing existing biological knowledge for the prioritization of such candidate genes, can improve the efficiency and accuracy of biomedical data analysis. It can also reduce the cost of such studies by avoiding experimental validations of irrelevant candidates. In this study, we address this challenge by proposing a novel gene prioritization approach that ranks promising candidate genes that are likely to be involved in a disease or phenotype under study. This algorithm is based on a modified conditional random field (CRF) model that simultaneously makes use of both gene annotations and gene interactions, while preserving their original representation. We validated our approach on two independent disease benchmark studies by ranking candidate genes using network and feature information. Our results showed both a high area under the curve (AUC) value (0.86) and, more importantly, a high partial AUC (pAUC) value (0.1296), and revealed higher accuracy and precision at the top predictions as compared with other well-performing gene prioritization tools, such as Endeavour (AUC 0.82, pAUC 0.083) and PINTA (AUC 0.76, pAUC 0.066). We were able to detect more target genes (9/18/19/27) in the top positions (1/5/10/20) compared to Endeavour (3/11/14/23) and PINTA (6/10/13/18). To demonstrate its usability, we applied our method to a case study for the prediction of molecular mechanisms contributing to intellectual disability and autism. Our approach was able to correctly recover genes related to both disorders and provide suggestions for possible additional candidates based on their rankings and functional annotations.
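The three kinds of figures reported above (AUC, partial AUC, and the number of target genes recovered in the top k positions) can be computed from any ranked candidate list, as in the rough sketch below. The scores, labels, and the false-positive-rate cutoff of 0.5 are made-up illustrative values; the abstract does not state which cutoff the authors used for pAUC, and the sketch assumes untied scores.

```python
def roc_points(scores, labels):
    """ROC curve as (FPR, TPR) points, sweeping the ranking from top to bottom.
    Assumes distinct scores and at least one positive and one negative label."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr].
    max_fpr=1.0 gives the full AUC; smaller values give an unnormalized pAUC."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at the cutoff by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def hits_in_top_k(scores, labels, k):
    """Number of known target genes among the k best-scored candidates."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sum(labels[i] for i in order[:k])


# Hypothetical example: six candidates, label 1 marks a known disease gene.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 0, 1, 1, 0, 0]
toy_curve = roc_points(toy_scores, toy_labels)
toy_auc = area_under(toy_curve)
toy_pauc = area_under(toy_curve, max_fpr=0.5)
toy_top3 = hits_in_top_k(toy_scores, toy_labels, k=3)
```

Partial AUC and top-k counts both emphasize the head of the ranking, which is why the abstract treats them as more informative than the full AUC when only a handful of candidates can be followed up experimentally.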
Affiliation(s)
- Bingqing Xie
- Department of Computer Science, Illinois Institute of Technology, Chicago, Illinois
| | - Gady Agam
- Department of Computer Science, Illinois Institute of Technology, Chicago, Illinois
| | | | - Jinbo Xu
- Toyota Technological Institute of Chicago, Chicago, Illinois
| | - T. Conrad Gilliam
- Department of Human Genetics, University of Chicago, Chicago, Illinois
| | - Natalia Maltsev
- Department of Human Genetics, University of Chicago, Chicago, Illinois
| | - Daniela Börnigen
- Department of Human Genetics, University of Chicago, Chicago, Illinois
- Toyota Technological Institute of Chicago, Chicago, Illinois
|
29
|
Soul J, Hardingham TE, Boot-Handford RP, Schwartz JM. PhenomeExpress: a refined network analysis of expression datasets by inclusion of known disease phenotypes. Sci Rep 2015; 5:8117. [PMID: 25631385 PMCID: PMC4822650 DOI: 10.1038/srep08117] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 09/26/2014] [Accepted: 12/19/2014] [Indexed: 12/19/2022] Open
Abstract
We describe a new method, PhenomeExpress, for the analysis of transcriptomic datasets to identify pathogenic disease mechanisms. Our analysis method includes input from both protein-protein interaction and phenotype similarity networks. This introduces valuable information from disease relevant phenotypes, which aids the identification of sub-networks that are significantly enriched in differentially expressed genes and are related to the disease relevant phenotypes. This contrasts with many active sub-network detection methods, which rely solely on protein-protein interaction networks derived from compounded data of many unrelated biological conditions and which are therefore not specific to the context of the experiment. PhenomeExpress thus exploits readily available animal model and human disease phenotype information. It combines this prior evidence of disease phenotypes with the experimentally derived disease data sets to provide a more targeted analysis. Two case studies, in subchondral bone in osteoarthritis and in Pax5 in acute lymphoblastic leukaemia, demonstrate that PhenomeExpress identifies core disease pathways in both mouse and human disease expression datasets derived from different technologies. We also validate the approach by comparison to state-of-the-art active sub-network detection methods, which reveals how it may enhance the detection of molecular phenotypes and provide a more detailed context to those previously identified as possible candidates.
Affiliation(s)
- Jamie Soul
- Wellcome Trust Centre for Cell-Matrix Research, Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK
- Timothy E Hardingham
- Wellcome Trust Centre for Cell-Matrix Research, Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK
- Raymond P Boot-Handford
- Wellcome Trust Centre for Cell-Matrix Research, Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK
- Jean-Marc Schwartz
- Wellcome Trust Centre for Cell-Matrix Research, Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK
|
30
|
Dubchak I, Balasubramanian S, Wang S, Meyden C, Sulakhe D, Poliakov A, Börnigen D, Xie B, Taylor A, Ma J, Paciorkowski AR, Mirzaa GM, Dave P, Agam G, Xu J, Al-Gazali L, Mason CE, Ross ME, Maltsev N, Gilliam TC. An integrative computational approach for prioritization of genomic variants. PLoS One 2014; 9:e114903. [PMID: 25506935 PMCID: PMC4266634 DOI: 10.1371/journal.pone.0114903] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2014] [Accepted: 11/15/2014] [Indexed: 12/27/2022] Open
Abstract
An essential step in the discovery of molecular mechanisms contributing to disease phenotypes and efficient experimental planning is the development of weighted hypotheses that estimate the functional effects of sequence variants discovered by high-throughput genomics. With the increasing specialization of bioinformatics resources, creating analytical workflows that seamlessly integrate data and bioinformatics tools developed by multiple groups becomes inevitable. Here we present a case study that uses a distributed analytical environment integrating four complementary specialized resources, namely the Lynx platform, VISTA RViewer, the Developmental Brain Disorders Database (DBDB), and the RaptorX server, to identify high-confidence candidate genes contributing to the pathogenesis of spina bifida. The analysis resulted in the prediction and validation of deleterious mutations in the SLC19A placental transporter in mothers of the affected children; these mutations cause narrowing of the outlet channel and therefore reduce the folate permeation rate. The described approach also enabled correct identification of several genes previously shown to contribute to the pathogenesis of spina bifida and suggested additional genes for experimental validation. The study demonstrates that the seamless integration of bioinformatics resources enables fast and efficient prioritization and characterization of genomic factors and molecular networks contributing to the phenotypes of interest.
Affiliation(s)
- Inna Dubchak
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Department of Energy Joint Genome Institute, Walnut Creek, California, United States of America
- * E-mail: (ID); (NM)
- Sandhya Balasubramanian
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Cem Meyden
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, United States of America
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, United States of America
- Feil Family Brain and Mind Research Institute, Weill Cornell Medical College, New York, New York, United States of America
- Dinanath Sulakhe
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Computation Institute, University of Chicago/Argonne National Laboratory, Chicago, Illinois, United States of America
- Alexander Poliakov
- Department of Energy Joint Genome Institute, Walnut Creek, California, United States of America
- Daniela Börnigen
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Bingqing Xie
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Department of Computer Science, Illinois Institute of Technology, Chicago, Illinois, United States of America
- Andrew Taylor
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Jianzhu Ma
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Alex R. Paciorkowski
- Departments of Neurology, Pediatrics, and Biomedical Genetics and Center for Neural Development and Disease, University of Rochester Medical Center, Rochester, New York, United States of America
- Ghayda M. Mirzaa
- Seattle Children's Research Institute and Department of Pediatrics, University of Washington, Seattle, Washington, United States of America
- Paul Dave
- Computation Institute, University of Chicago/Argonne National Laboratory, Chicago, Illinois, United States of America
- Gady Agam
- Department of Computer Science, Illinois Institute of Technology, Chicago, Illinois, United States of America
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Lihadh Al-Gazali
- Department of Pediatrics, Faculty of Medicine and Health Sciences, United Arab Emirates University, Al-Ain, UAE
- Christopher E. Mason
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, United States of America
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, United States of America
- Feil Family Brain and Mind Research Institute, Weill Cornell Medical College, New York, New York, United States of America
- M. Elizabeth Ross
- Laboratory of Neurogenetics and Development, Weill Cornell Medical College, New York, New York, United States of America
- Natalia Maltsev
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Computation Institute, University of Chicago/Argonne National Laboratory, Chicago, Illinois, United States of America
- * E-mail: (ID); (NM)
- T. Conrad Gilliam
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Computation Institute, University of Chicago/Argonne National Laboratory, Chicago, Illinois, United States of America
|
31
|
Wang Q, Zhang S, Pang S, Zhang M, Wang B, Liu Q, Li J. GroupRank: rank candidate genes in PPI network by differentially expressed gene groups. PLoS One 2014; 9:e110406. [PMID: 25330105 PMCID: PMC4199715 DOI: 10.1371/journal.pone.0110406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2014] [Accepted: 09/19/2014] [Indexed: 11/25/2022] Open
Abstract
Many cell activities are organized as a network, and genes are clustered into co-expressed groups if they have the same or closely related biological function or are co-regulated. In this study, we assumed that a strong candidate disease gene is more likely to lie close to gene groups whose members are all coordinately differentially expressed than to individual differentially expressed genes. On that basis we developed a novel disease gene prioritization method, GroupRank, which integrates gene co-expression and differential expression information generated from microarray data with a PPI network. GroupRank ranks a candidate gene highly if it is differentially expressed between disease and control or is close to differentially co-expressed groups in the PPI network. We tested our method on data sets of lung, kidney, leukemia and breast cancer. The results revealed that GroupRank could efficiently prioritize disease genes, with a significantly improved AUC value compared with the previous method, which does not consider co-expressed gene groups in the PPI network. Moreover, functional analyses of the major contributing gene group in the gene prioritization of kidney cancer verified that GroupRank not only ranks disease genes efficiently but can also help identify and understand possible mechanisms in important physiological and pathological processes of disease.
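The idea of ranking a candidate by its closeness to differentially expressed gene groups in a PPI network can be illustrated with a small sketch. This is not the GroupRank algorithm, whose scoring is not given in the abstract; the toy network, the group weight, and the `1 / (1 + distance)` decay are all assumptions chosen for illustration.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Return hop distances from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def proximity_score(adjacency, candidate, groups):
    """Score a candidate by its closest differentially expressed group.

    groups maps a group name to (member_genes, group_weight). The weights
    and the 1 / (1 + distance) decay are illustrative assumptions.
    """
    distances = bfs_distances(adjacency, candidate)
    best = 0.0
    for members, weight in groups.values():
        reachable = [distances[g] for g in members if g in distances]
        if reachable:
            best = max(best, weight / (1 + min(reachable)))
    return best


# Hypothetical toy PPI network (undirected, stored as adjacency sets).
ppi = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": set(),  # isolated gene, unreachable from any group
}
# One hypothetical differentially co-expressed group containing gene D.
groups = {"g1": ({"D"}, 1.0)}

ranking = sorted(ppi, key=lambda gene: proximity_score(ppi, gene, groups), reverse=True)
```

In this toy setup, genes nearer to the group rank higher and the isolated gene scores zero. A real method would also need a principled way to weight groups, for example by the strength of their differential co-expression.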
Affiliation(s)
- Qing Wang
- Department of Bioinformatics & Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Siyi Zhang
- Department of Bioinformatics & Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Shichao Pang
- Department of Bioinformatics & Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Menghuan Zhang
- Department of Bioinformatics & Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Bo Wang
- Department of Bioinformatics & Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Qi Liu
- Department of Bioinformatics & Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Center for Quantitative Sciences, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- * E-mail: (QL); (JL)
- Jing Li
- Department of Bioinformatics & Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Center for Bioinformation Technology, Shanghai, China
- * E-mail: (QL); (JL)
|
32
|
Gusareva ES, Van Steen K. Practical aspects of genome-wide association interaction analysis. Hum Genet 2014; 133:1343-58. [DOI: 10.1007/s00439-014-1480-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2014] [Accepted: 08/18/2014] [Indexed: 12/31/2022]
|
33
|
Sulakhe D, Taylor A, Balasubramanian S, Feng B, Xie B, Börnigen D, Dave UJ, Foster IT, Gilliam TC, Maltsev N. Lynx web services for annotations and systems analysis of multi-gene disorders. Nucleic Acids Res 2014; 42:W473-7. [PMID: 24948611 PMCID: PMC4086124 DOI: 10.1093/nar/gku517] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023] Open
Abstract
Lynx is a web-based integrated systems biology platform that supports annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Lynx has integrated multiple classes of biomedical data (genomic, proteomic, pathways, phenotypic, toxicogenomic, contextual and others) from various public databases as well as manually curated data from our group and collaborators (LynxKB). Lynx provides tools for gene list enrichment analysis using multiple functional annotations and network-based gene prioritization. Lynx provides access to the integrated database and the analytical tools via REST based Web Services (http://lynx.ci.uchicago.edu/webservices.html). This comprises data retrieval services for specific functional annotations, services to search across the complete LynxKB (powered by Lucene), and services to access the analytical tools built within the Lynx platform.
Affiliation(s)
- Dinanath Sulakhe
- Computation Institute, University of Chicago/Argonne National Laboratory, Chicago, IL 60637, USA
- Andrew Taylor
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Bo Feng
- Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60616, USA
- Bingqing Xie
- Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60616, USA
- Daniela Börnigen
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
- Utpal J Dave
- Computation Institute, University of Chicago/Argonne National Laboratory, Chicago, IL 60637, USA
- Ian T Foster
- Computation Institute, University of Chicago/Argonne National Laboratory, Chicago, IL 60637, USA
- T Conrad Gilliam
- Computation Institute, University of Chicago/Argonne National Laboratory, Chicago, IL 60637, USA
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Natalia Maltsev
- Computation Institute, University of Chicago/Argonne National Laboratory, Chicago, IL 60637, USA
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
|
34
|
Abstract
Bioinformatics aids in the understanding of the biological processes of living beings and the genetic architecture of human diseases. The discovery of disease-related genes improves the diagnosis and therapy design for the disease. To save the cost and time involved in the experimental verification of the candidate genes, computational methods are employed for ranking the genes according to their likelihood of being associated with the disease. Only top-ranked genes are then verified experimentally. A variety of methods have been conceived by the researchers for the prioritization of the disease candidate genes, which differ in the data source being used or the scoring function used for ranking the genes. A review of various aspects of computational disease gene prioritization and its research issues is presented in this article. The aspects covered are gene prioritization process, data sources used, types of prioritization methods, and performance assessment methods. This article provides a brief overview and acts as a quick guide for disease gene prioritization.
Affiliation(s)
- Nivit Gill
- Punjabi University Regional Centre For IT and Management, Mohali, Punjab, India
|
35
|
Jadamba E, Shin M. A novel approach to significant pathway identification using pathway interaction network from PPI data. BIOCHIP JOURNAL 2014. [DOI: 10.1007/s13206-014-8104-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|
36
|
Qabaja A, Alshalalfa M, Alanazi E, Alhajj R. Prediction of novel drug indications using network driven biological data prioritization and integration. J Cheminform 2014; 6:1. [PMID: 24397863 PMCID: PMC3896815 DOI: 10.1186/1758-2946-6-1] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2013] [Accepted: 11/28/2013] [Indexed: 11/23/2022] Open
Abstract
Background With the rapid development of high-throughput genomic technologies and the accumulation of genome-wide datasets for gene expression profiling and biological networks, the impact of diseases and drugs on gene expression can be comprehensively characterized. Drug repositioning offers the possibility of reduced risks in the drug discovery process, thus it is an essential step in drug development. Results Computational prediction of drug-disease interactions using gene expression profiling datasets and biological networks is a new direction in drug repositioning that has gained increasing interest. We developed a computational framework to build disease-drug networks using drug- and disease-specific subnetworks. The framework incorporates protein networks to refine drug and disease associated genes and prioritize genes in disease and drug specific networks. For each drug and disease we built multiple networks using gene expression profiling and text mining. Finally a logistic regression model was used to build functional associations between drugs and diseases. Conclusions We found that representing drugs and diseases by genes with high centrality degree in gene networks is the most promising representation of drug or disease subnetworks.
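The conclusion that high-degree genes best represent a drug or disease subnetwork can be made concrete with a short sketch. The abstract does not spell out the paper's pipeline, so the code below only illustrates normalized degree centrality on an undirected gene network; the edge list and gene names are hypothetical, and the input is assumed to contain no duplicate edges or self-loops.

```python
def degree_centrality(edges):
    """Return normalized degree centrality for an undirected edge list.

    Each node's degree is divided by (n - 1), the maximum possible degree
    in a simple graph with n nodes. Assumes no duplicate edges or self-loops.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    scale = n - 1 if n > 1 else 1
    return {gene: count / scale for gene, count in degree.items()}


def top_genes(edges, k):
    """Return the k genes with the highest degree centrality (ties by name)."""
    centrality = degree_centrality(edges)
    return sorted(centrality, key=lambda gene: (-centrality[gene], gene))[:k]


# Hypothetical subnetwork: G1 is a hub, G3 links the hub to G5.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G3", "G5")]

representative = top_genes(edges, 2)
```

Here the two most central genes would stand in for the whole subnetwork. Other centrality measures, such as betweenness or closeness, could be swapped in by replacing `degree_centrality`.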
Affiliation(s)
- Mohammed Alshalalfa
- Department of Computer Science, University of Calgary, Calgary, Alberta, Canada.
|
37
|
High-Throughput Translational Medicine: Challenges and Solutions. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2014; 799:39-67. [DOI: 10.1007/978-1-4614-8778-4_3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]
|
38
|
Wu C, Zhu J, Zhang X. Network-based differential gene expression analysis suggests cell cycle related genes regulated by E2F1 underlie the molecular difference between smoker and non-smoker lung adenocarcinoma. BMC Bioinformatics 2013; 14:365. [PMID: 24341432 PMCID: PMC3878503 DOI: 10.1186/1471-2105-14-365] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2013] [Accepted: 12/12/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Differential gene expression (DGE) analysis is commonly used to reveal the deregulated molecular mechanisms of complex diseases. However, traditional DGE analysis (e.g., the t test or the rank sum test) tests each gene independently without considering interactions between them. Top-ranked differentially regulated genes prioritized by the analysis may not directly relate to the coherent molecular changes underlying complex diseases. Joint analyses of co-expression and DGE have been applied to reveal the deregulated molecular modules underlying complex diseases. Most of these methods consist of separate steps: first to identify gene-gene relationships under the studied phenotype, then to integrate them with gene expression changes for prioritizing signature genes, or vice versa. A method is warranted that can simultaneously consider gene-gene co-expression strength and the corresponding expression level changes so that both types of information can be leveraged optimally. RESULTS In this paper, we develop a gene module based method for differential gene expression analysis, named network-based differential gene expression (nDGE) analysis, a one-step integrative process for prioritizing deregulated genes and grouping them into gene modules. We demonstrate that nDGE outperforms existing methods in prioritizing deregulated genes and discovering deregulated gene modules using simulated data sets. When tested on a series of smoker and non-smoker lung adenocarcinoma data sets, we show that top differentially regulated genes identified by the rank sum test in different sets are not consistent, while top-ranked genes defined by nDGE in different data sets significantly overlap. nDGE results suggest that a differentially regulated gene module, which is enriched for cell cycle related genes and E2F1 targeted genes, plays a role in the molecular differences between smoker and non-smoker lung adenocarcinoma.
CONCLUSIONS In this paper, we develop nDGE to prioritize deregulated genes and group them into gene modules by simultaneously considering gene expression level changes and gene-gene co-regulations. When applied to both simulated and empirical data, nDGE outperforms the traditional DGE method. More specifically, when applied to smoker and non-smoker lung cancer sets, nDGE results illustrate the molecular differences between smoker and non-smoker lung cancer.
Affiliation(s)
- Jun Zhu
- Department of Genetics and Genomic Sciences, Icahn Institute of Genomics and Multiscale Biology, Mount Sinai School of Medicine, New York, NY, USA.
|
39
|
Abstract
MOTIVATION Several types of studies, including genome-wide association studies and RNA interference screens, strive to link genes to diseases. Although these approaches have had some success, genetic variants are often only present in a small subset of the population, and screens are noisy with low overlap between experiments in different labs. Neither provides a mechanistic model explaining how identified genes impact the disease of interest or the dynamics of the pathways those genes regulate. Such mechanistic models could be used to accurately predict downstream effects of knocking down pathway members and allow comprehensive exploration of the effects of targeting pairs or higher-order combinations of genes. RESULTS We developed methods to model the activation of signaling and dynamic regulatory networks involved in disease progression. Our model, SDREM, integrates static and time series data to link proteins and the pathways they regulate in these networks. SDREM uses prior information about proteins' likelihood of involvement in a disease (e.g. from screens) to improve the quality of the predicted signaling pathways. We used our algorithms to study the human immune response to H1N1 influenza infection. The resulting networks correctly identified many of the known pathways and transcriptional regulators of this disease. Furthermore, they accurately predict RNA interference effects and can be used to infer genetic interactions, greatly improving over other methods suggested for this task. Applying our method to the more pathogenic H5N1 influenza allowed us to identify several strain-specific targets of this infection. AVAILABILITY SDREM is available from http://sb.cs.cmu.edu/sdrem. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anthony Gitter
- Computer Science Department and Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
|
40
|
Sulakhe D, Balasubramanian S, Xie B, Feng B, Taylor A, Wang S, Berrocal E, Dave U, Xu J, Börnigen D, Gilliam TC, Maltsev N. Lynx: a database and knowledge extraction engine for integrative medicine. Nucleic Acids Res 2013; 42:D1007-12. [PMID: 24270788 PMCID: PMC3965040 DOI: 10.1093/nar/gkt1166] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open
Abstract
We have developed Lynx (http://lynx.ci.uchicago.edu)—a web-based database and a knowledge extraction engine, supporting annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization to assist the user in extracting meaningful knowledge from LynxKB and experimental data, whereas its service-oriented architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces.
Affiliation(s)
- Dinanath Sulakhe
- Computation Institute, the University of Chicago, Chicago, IL 60637, USA, Department of Human Genetics, the University of Chicago, Chicago, IL 60637, USA, Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60616, USA and Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
|
41
|
Van den broeck A, Vankelecom H, Van Delm W, Gremeaux L, Wouters J, Allemeersch J, Govaere O, Roskams T, Topal B. Human pancreatic cancer contains a side population expressing cancer stem cell-associated and prognostic genes. PLoS One 2013; 8:e73968. [PMID: 24069258 PMCID: PMC3775803 DOI: 10.1371/journal.pone.0073968] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2013] [Accepted: 07/23/2013] [Indexed: 12/17/2022] Open
Abstract
In many types of cancers, a side population (SP) has been identified based on high efflux capacity, thereby enriching for chemoresistant cells as well as for candidate cancer stem cells (CSC). Here, we explored whether human pancreatic ductal adenocarcinoma (PDAC) contains a SP, and whether its gene expression profile is associated with chemoresistance, CSC and prognosis. After dispersion into single cells and incubation with Hoechst dye, we analyzed human PDAC resection specimens using flow cytometry (FACS). We identified a SP and main population (MP) in all human PDAC resection specimens (n = 52) analyzed, but detected immune (CD45+) and endothelial (CD31+) cells in this fraction together with tumor cells. The SP and MP cells, or more purified fractions depleted of CD31+/CD45+ cells (pSP and pMP), were sorted by FACS and subjected to whole-genome expression analysis. This revealed upregulation of genes associated with therapy resistance and of markers identified before in putative pancreatic CSC. pSP gene signatures of 32 or 10 up- or downregulated genes were developed and tested for discriminatory competence between pSP and pMP in different sets of PDAC samples. The prognostic value of the pSP genes was validated in a large independent series of PDAC patients (n = 78) using nCounter analysis of expression (in tumor versus surrounding pancreatic tissue) and Cox regression for disease-free and overall survival. Of these genes, expression levels of ABCB1 and CXCR4 were correlated with worse patient survival. Thus, our study for the first time demonstrates that human PDAC contains a SP. This tumor subpopulation may represent a valuable therapeutic target given its chemoresistance- and CSC-associated gene expression characteristics with potential prognostic value.
MESH Headings
- ATP Binding Cassette Transporter, Subfamily B
- ATP Binding Cassette Transporter, Subfamily B, Member 1/genetics
- ATP Binding Cassette Transporter, Subfamily B, Member 1/metabolism
- Adult
- Aged
- Aged, 80 and over
- Carcinoma, Pancreatic Ductal/genetics
- Carcinoma, Pancreatic Ductal/metabolism
- Carcinoma, Pancreatic Ductal/mortality
- Case-Control Studies
- Female
- Gene Expression Profiling
- Humans
- Immunophenotyping
- Male
- Middle Aged
- Neoplastic Stem Cells/metabolism
- Pancreatic Neoplasms/genetics
- Pancreatic Neoplasms/metabolism
- Pancreatic Neoplasms/mortality
- Prognosis
- Receptors, CXCR4/genetics
- Receptors, CXCR4/metabolism
- Side-Population Cells/metabolism
Affiliation(s)
- Anke Van den broeck
- Department of Abdominal Surgery, University Hospitals Leuven, Leuven, Belgium
- Laboratory of Tissue Plasticity, Research Unit of Embryo and Stem Cells, Department of Development & Regeneration, University of Leuven (KU Leuven), Leuven, Belgium
- Hugo Vankelecom
- Laboratory of Tissue Plasticity, Research Unit of Embryo and Stem Cells, Department of Development & Regeneration, University of Leuven (KU Leuven), Leuven, Belgium
- Wouter Van Delm
- VIB Nucleomics Core, University of Leuven (KU Leuven), Leuven, Belgium
- Lies Gremeaux
- Laboratory of Tissue Plasticity, Research Unit of Embryo and Stem Cells, Department of Development & Regeneration, University of Leuven (KU Leuven), Leuven, Belgium
- Jasper Wouters
- Laboratory of Tissue Plasticity, Research Unit of Embryo and Stem Cells, Department of Development & Regeneration, University of Leuven (KU Leuven), Leuven, Belgium
- Joke Allemeersch
- VIB Nucleomics Core, University of Leuven (KU Leuven), Leuven, Belgium
- Olivier Govaere
- Department of Pathology, University Hospitals Leuven, Leuven, Belgium
- Tania Roskams
- Department of Pathology, University Hospitals Leuven, Leuven, Belgium
- Baki Topal
- Department of Abdominal Surgery, University Hospitals Leuven, Leuven, Belgium
- * E-mail:
|
42
|
Zhang X, Greenlee MHW, Serb JM. EnRICH: Extraction and Ranking using Integration and Criteria Heuristics. BMC SYSTEMS BIOLOGY 2013; 7:4. [PMID: 23320748 PMCID: PMC3564850 DOI: 10.1186/1752-0509-7-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/11/2012] [Accepted: 01/07/2013] [Indexed: 11/10/2022]
Abstract
Background High throughput screening technologies enable biologists to generate candidate genes at a rate that, due to time and cost constraints, cannot be studied by experimental approaches in the laboratory. Thus, it has become increasingly important to prioritize candidate genes for experiments. To accomplish this, researchers need to apply selection requirements based on their knowledge, which necessitates qualitative integration of heterogeneous data sources and filtration using multiple criteria. A similar approach can also be applied to putative candidate gene relationships. While automation can assist in this routine and imperative procedure, flexibility of data sources and criteria must not be sacrificed. A tool that can optimize the trade-off between automation and flexibility to simultaneously filter and qualitatively integrate data is needed to prioritize candidate genes and generate composite networks from heterogeneous data sources. Results We developed the Java application EnRICH (Extraction and Ranking using Integration and Criteria Heuristics) to address this need. Here we present a case study in which we used EnRICH to integrate and filter multiple candidate gene lists in order to identify potential retinal disease genes. As a result of this procedure, a candidate pool of several hundred genes was narrowed down to five candidate genes, of which four are confirmed retinal disease genes and one is associated with a retinal disease state. Conclusions We developed a platform-independent tool that is able to qualitatively integrate multiple heterogeneous datasets and use different selection criteria to filter each of them, provided the datasets are tables that have distinct identifiers (required) and attributes (optional). With the flexibility to specify data sources and filtering criteria, EnRICH automatically prioritizes candidate genes or gene relationships for biologists based on their specific requirements. Here, we also demonstrate that this tool can be effectively and easily used to apply highly specific user-defined criteria and can efficiently identify high quality candidate genes from relatively sparse datasets.
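The filter-then-integrate workflow described above (tables with a required identifier and optional attributes, each filtered by its own criterion) can be sketched as follows. This is not EnRICH's code, which is a Java application; the table contents, column names, and thresholds are hypothetical, and integration is assumed here to mean keeping identifiers that pass every table's filter.

```python
def filter_table(rows, criterion):
    """Return the identifiers of rows that satisfy the criterion."""
    return {row["id"] for row in rows if criterion(row)}


def integrate(tables_with_criteria):
    """Intersect the identifier sets that survive each table's own filter."""
    surviving = [filter_table(rows, criterion) for rows, criterion in tables_with_criteria]
    return set.intersection(*surviving) if surviving else set()


# Hypothetical candidate gene tables; "id" is the required identifier column.
expression = [
    {"id": "gene1", "fold_change": 2.5},
    {"id": "gene2", "fold_change": 1.1},
    {"id": "gene3", "fold_change": 3.0},
]
linkage = [
    {"id": "gene1", "in_interval": True},
    {"id": "gene3", "in_interval": False},
    {"id": "gene4", "in_interval": True},
]

# Each table gets its own user-defined criterion (both thresholds are assumptions).
candidates = integrate([
    (expression, lambda row: row["fold_change"] >= 2.0),
    (linkage, lambda row: row["in_interval"]),
])
```

Only `gene1` passes both filters in this toy data. Keeping the criteria as plain functions mirrors the flexibility the abstract emphasizes: swapping a data source or a threshold does not change the integration step.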
Affiliation(s)
- Xia Zhang
- Department of Biomedical Sciences, 2008 Veterinary Medicine, Iowa State University, Ames, IA 50010, USA
|
43
|
Systems genetics in "-omics" era: current and future development. Theory Biosci 2012; 132:1-16. [PMID: 23138757 DOI: 10.1007/s12064-012-0168-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2012] [Accepted: 10/25/2012] [Indexed: 02/06/2023]
Abstract
Systems genetics is an emerging discipline that integrates high-throughput expression profiling technology and systems biology approaches to reveal the molecular mechanisms of complex traits. It will improve our understanding of gene functions in biochemical pathways and of genetic interactions between biological molecules. With the rapid advances of microarray analysis technologies, bioinformatics is extensively used in studies of gene functions, SNP-SNP genetic interactions, LD block-block interactions, miRNA-mRNA interactions, DNA-protein interactions, protein-protein interactions, and functional mapping for LD blocks. Built on a bioinformatics panel that can integrate "-omics" datasets to extract systems knowledge and useful information for explaining the molecular mechanisms of complex traits, systems genetics aims to enhance our understanding of biological processes. Systems biology has provided systems-level recognition of various biological phenomena and laid the scientific groundwork for the development of systems genetics. In addition, next-generation sequencing technology and post-genome-wide association studies empower the discovery of new genes and rare variants. The integration of different strategies will help to propose novel hypotheses and refine the theoretical framework of systems genetics, which will contribute to its future development and open up a whole new area of genetics.
|
44
|
Masoudi-Nejad A, Meshkin A, Haji-Eghrari B, Bidkhori G. RETRACTED ARTICLE: Candidate gene prioritization. Mol Genet Genomics 2012; 287:679-98. [DOI: 10.1007/s00438-012-0710-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/12/2012] [Accepted: 07/12/2012] [Indexed: 01/16/2023]
|
45
|
Kubrycht J, Sigler K, Souček P. Virtual interactomics of proteins from biochemical standpoint. Mol Biol Int 2012; 2012:976385. [PMID: 22928109 PMCID: PMC3423939 DOI: 10.1155/2012/976385] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/27/2012] [Revised: 05/18/2012] [Accepted: 05/18/2012] [Indexed: 12/24/2022]
Abstract
Virtual interactomics is a rapidly developing scientific area on the boundary between bioinformatics and interactomics. Protein-related virtual interactomics comprises tools for prediction, simulation, and networking of the majority of interactions important for structural and individual reproduction, differentiation, recognition, signaling, regulation, and metabolic pathways of cells and organisms. Here, we describe the main areas of virtual protein interactomics, that is, structurally based comparative analysis and prediction of functionally important interacting sites, mimotope-assisted and combined epitope prediction, molecular (protein) docking studies, and investigation of protein interaction networks. Detailed information about some interesting methodological approaches and online accessible programs or databases is presented in our tables. A considerable part of the text deals with searches for common conserved or functionally convergent protein regions and subgraphs of conserved interaction networks, new outstanding trends, and clinically interesting results. In agreement with the presented data and relationships, virtual interactomic tools improve our scientific knowledge, help us to formulate working hypotheses, and frequently also mediate in silico simulations of varying importance.
Affiliation(s)
- Jaroslav Kubrycht
- Department of Physiology, Second Medical School, Charles University, 150 00 Prague, Czech Republic
- Karel Sigler
- Laboratory of Cell Biology, Institute of Microbiology, Academy of Sciences of the Czech Republic, 142 20 Prague, Czech Republic
- Pavel Souček
- Toxicogenomics Unit, National Institute of Public Health, 100 42 Prague, Czech Republic
|
46
|
Wu C, Zhu J, Zhang X. Integrating gene expression and protein-protein interaction network to prioritize cancer-associated genes. BMC Bioinformatics 2012; 13:182. [PMID: 22838965 PMCID: PMC3464615 DOI: 10.1186/1471-2105-13-182] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Received: 03/07/2012] [Accepted: 07/17/2012] [Indexed: 01/09/2023]
Abstract
BACKGROUND: To understand the roles they play in complex diseases, genes need to be investigated in the networks they are involved in. Integration of gene expression and network data is a promising approach to prioritize disease-associated genes. Some methods have been developed in this field, but the problem is still far from being solved. RESULTS: In this paper, we developed a method, Networked Gene Prioritizer (NGP), to prioritize cancer-associated genes. Applications on several breast cancer and lung cancer datasets demonstrated that NGP performs better than the existing methods. It provides stable top-ranking genes between independent datasets. The genes top-ranked by NGP are enriched in cancer-associated pathways. The genes top-ranked by NGP (PLK1, MCM2, MCM3, MCM7, MCM10 and SKP2) might coordinate to promote cell cycle-related processes in cancer cells but not in normal cells. CONCLUSIONS: In this paper, we have developed a method named NGP to prioritize cancer-associated genes. Our results demonstrated that NGP performs better than the existing methods.
Affiliation(s)
- Chao Wu
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, PR China.
|
47
|
Computational tools for prioritizing candidate genes: boosting disease gene discovery. Nat Rev Genet 2012; 13:523-36. [DOI: 10.1038/nrg3253] [Citation(s) in RCA: 332] [Impact Index Per Article: 27.7] [Indexed: 12/14/2022]
|
48
|
Britto R, Sallou O, Collin O, Michaux G, Primig M, Chalmel F. GPSy: a cross-species gene prioritization system for conserved biological processes--application in male gamete development. Nucleic Acids Res 2012; 40:W458-65. [PMID: 22570409 PMCID: PMC3394256 DOI: 10.1093/nar/gks380] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 12/18/2022]
Abstract
We present gene prioritization system (GPSy), a cross-species gene prioritization system that facilitates the arduous but critical task of prioritizing genes for follow-up functional analyses. GPSy’s modular design with regard to species, data sets and scoring strategies enables users to formulate queries in a highly flexible manner. Currently, the system encompasses 20 topics related to conserved biological processes including male gamete development discussed in this article. The web server-based tool is freely available at http://gpsy.genouest.org.
|
49
|
Panagiotou G, Taboureau O. The impact of network biology in pharmacology and toxicology. SAR and QSAR in Environmental Research 2012; 23:221-235. [PMID: 22352466 DOI: 10.1080/1062936x.2012.657237] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 05/31/2023]
Abstract
With the need to investigate alternative approaches and emerging technologies in order to increase drug efficacy and reduce adverse drug effects, network biology offers a novel way of approaching drug discovery by considering the effect of a molecule and protein's function in a global physiological environment. By studying drug action across multiple scales of complexity, from molecular to cellular and tissue level, network-based computational methods have the potential to improve our understanding of the impact of chemicals in human health. In this review we present the available large-scale databases and tools that allow integration and analysis of such information for understanding the properties of small molecules in the context of cellular networks. With the recent advances in the omics area, global integrative approaches are necessary to cope with the massive amounts of data, and biomedical researchers are urged to implement new types of analyses that can lead to new therapeutic interventions with increased safety and efficacy compared with existing medications.
Affiliation(s)
- G Panagiotou
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
|
50
|
Capriotti E, Nehrt NL, Kann MG, Bromberg Y. Bioinformatics for personal genome interpretation. Brief Bioinform 2012; 13:495-512. [PMID: 22247263 DOI: 10.1093/bib/bbr070] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Indexed: 01/02/2023]
Abstract
An international consortium released the first draft sequence of the human genome 10 years ago. Although the analysis of this data has suggested the genetic underpinnings of many diseases, we have not yet been able to fully quantify the relationship between genotype and phenotype. Thus, a major current effort of the scientific community focuses on evaluating individual predispositions to specific phenotypic traits given their genetic backgrounds. Many resources aim to identify and annotate the specific genes responsible for the observed phenotypes. Some of these use intra-species genetic variability as a means for better understanding this relationship. In addition, several online resources are now dedicated to collecting single nucleotide variants and other types of variants, and annotating their functional effects and associations with phenotypic traits. This information has enabled researchers to develop bioinformatics tools to analyze the rapidly increasing amount of newly extracted variation data and to predict the effect of uncharacterized variants. In this work, we review the most important developments in the field: the databases and bioinformatics tools that will be of utmost importance in our concerted effort to interpret the human variome.
Affiliation(s)
- Emidio Capriotti
- Department of Mathematics and Computer Science, University of Balearic Islands, ctra. de Valldemossa Km 7.5, Palma de Mallorca, 07122 Spain.
|