451
|
Schmitt T, Ogris C, Sonnhammer ELL. FunCoup 3.0: database of genome-wide functional coupling networks. Nucleic Acids Res 2013; 42:D380-8. [PMID: 24185702 PMCID: PMC3965084 DOI: 10.1093/nar/gkt984] [Citation(s) in RCA: 90] [Impact Index Per Article: 7.5] [Indexed: 12/30/2022] Open
Abstract
We present an update of the FunCoup database (http://FunCoup.sbc.su.se) of functional couplings, or functional associations, between genes and gene products. Identifying these functional couplings is an important step in the understanding of higher level mechanisms performed by complex cellular processes. FunCoup distinguishes between four classes of couplings: participation in the same signaling cascade, participation in the same metabolic process, co-membership in a protein complex and physical interaction. For each of these four classes, several types of experimental and statistical evidence are combined by Bayesian integration to predict genome-wide functional coupling networks. The FunCoup framework has been completely re-implemented to allow for more frequent future updates. It contains many improvements, such as a regularization procedure to automatically downweight redundant evidences and a novel method to incorporate phylogenetic profile similarity. Several datasets have been updated and new data have been added in FunCoup 3.0. Furthermore, we have developed a new Web site, which provides powerful tools to explore the predicted networks and to retrieve detailed information about the data underlying each prediction.
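The Bayesian integration mentioned in this abstract can be illustrated with a minimal naive-Bayes sketch. This is not FunCoup's actual implementation: the function name `integrate_evidence`, the prior, the likelihood ratios, and the per-evidence weights (standing in for the down-weighting of redundant evidence) are all assumptions made for illustration.

```python
import math


def integrate_evidence(prior, likelihood_ratios, weights=None):
    """Combine evidence likelihood ratios into a posterior probability.

    prior: assumed prior probability that two genes are functionally coupled.
    likelihood_ratios: one ratio P(evidence | coupled) / P(evidence | not coupled)
        per evidence type (hypothetical values).
    weights: optional factors in [0, 1] that shrink the contribution of
        redundant evidence (an illustrative stand-in for regularization).
    """
    if weights is None:
        weights = [1.0] * len(likelihood_ratios)
    # Work in log-odds so independent evidence simply adds up.
    log_odds = math.log(prior / (1.0 - prior))
    for ratio, weight in zip(likelihood_ratios, weights):
        log_odds += weight * math.log(ratio)
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical example: two evidence types, the second half-weighted.
posterior = integrate_evidence(0.5, [3.0, 2.0], weights=[1.0, 0.5])
```

Setting a weight to 0 removes that evidence type entirely, and a weight of 1 treats it as fully independent.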
Affiliation(s)
- Thomas Schmitt
- Stockholm Bioinformatics Centre, Science for Life Laboratory, Box 1031, Solna SE-17121, Sweden, Department of Biochemistry and Biophysics, Stockholm University and Swedish eScience Research Center
|
452
|
Dorn C, Grunert M, Sperling SR. Application of high-throughput sequencing for studying genomic variations in congenital heart disease. Brief Funct Genomics 2013; 13:51-65. [PMID: 24095982 DOI: 10.1093/bfgp/elt040] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 12/30/2022] Open
Abstract
Congenital heart diseases (CHD) represent the most common birth defect in humans. The majority of cases are caused by a combination of complex genetic alterations and environmental influences. In the past, many disease-causing mutations have been identified; however, there is still a large proportion of cardiac malformations whose precise origin is unknown. High-throughput sequencing technologies established in recent years offer novel opportunities to further study the genetic background underlying the disease. In this review, we provide a roadmap for designing and analyzing high-throughput sequencing studies focused on CHD, but also with general applicability to other complex diseases. The three main next-generation sequencing (NGS) platforms, including their particular advantages and disadvantages, are presented. To identify potentially disease-related genomic variations and genes, different filtering steps and gene prioritization strategies are discussed. In addition, available control datasets based on NGS are summarized. Finally, we provide an overview of current studies already using NGS technologies and showing that these techniques will help to further unravel the complex genetics underlying CHD.
Affiliation(s)
- Cornelia Dorn
- Department of Cardiovascular Genetics, Experimental and Clinical Research Center (ECRC), Charité-University Medicine Berlin and Max Delbrück Center (MDC) for Molecular Medicine, Lindenberger Weg 80, 13125 Berlin, Germany. Department of Biochemistry, Free University Berlin, Berlin, Germany. Tel.: +49-(0)30-450540123; Fax: +49-(0)30-84131699.
|
453
|
Human disease locus discovery and mapping to molecular pathways through phylogenetic profiling. Mol Syst Biol 2013; 9:692. [PMID: 24084807 PMCID: PMC3817400 DOI: 10.1038/msb.2013.50] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Received: 06/21/2013] [Accepted: 08/29/2013] [Indexed: 12/16/2022] Open
Abstract
By analyzing the conservation of human proteins across 87 species, we sorted proteins into clusters of coevolution. Some clusters are enriched for genes assigned to particular human diseases or molecular pathways; the other genes in the same cluster may function in related pathways and diseases.
Many genes that were thought to map to different diseases are actually coevolved together and mapped into the same phylogenetic clusters. Many molecular pathways map to the same phylogenetic clusters as genes associated with specific human diseases. Focusing on proteins coevolved with the microphthalmia-associated transcription factor (MITF), we identified the Notch pathway suppressor of hairless (RBP-Jk/SuH) transcription factor, and showed that RBP-Jk functions as an MITF cofactor. Our analysis thus establishes a connectivity between different diseases and pathways, linking diseases phenotypes and functional gene groups.
Genes with common profiles of the presence and absence in disparate genomes tend to function in the same pathway. By mapping all human genes into about 1000 clusters of genes with similar patterns of conservation across eukaryotic phylogeny, we determined that sets of genes associated with particular diseases have similar phylogenetic profiles. By focusing on those human phylogenetic gene clusters that significantly overlap some of the thousands of human gene sets defined by their coexpression or annotation to pathways or other molecular attributes, we reveal the evolutionary map that connects molecular pathways and human diseases. The other genes in the phylogenetic clusters enriched for particular known disease genes or molecular pathways identify candidate genes for roles in those same disorders and pathways. Focusing on proteins coevolved with the microphthalmia-associated transcription factor (MITF), we identified the Notch pathway suppressor of hairless (RBP-Jk/SuH) transcription factor, and showed that RBP-Jk functions as an MITF cofactor.
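The idea of comparing presence/absence profiles across genomes can be sketched in a few lines. This is a toy illustration, not the clustering procedure used in the paper: the gene names, the five-species profiles, and the choice of Jaccard similarity are all assumptions.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two binary presence/absence profiles."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    union = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return shared / union if union else 0.0


def rank_by_profile(query, profiles):
    """Rank all other genes by profile similarity to the query gene."""
    scored = [
        (jaccard(profiles[query], profile), gene)
        for gene, profile in profiles.items()
        if gene != query
    ]
    # Highest similarity first; ties broken alphabetically by gene name.
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Hypothetical profiles: 1 = ortholog present in that species, 0 = absent.
profiles = {
    "GENE_A": [1, 1, 0, 1, 0],
    "GENE_B": [1, 1, 0, 1, 0],
    "GENE_C": [1, 0, 1, 0, 1],
    "GENE_D": [0, 1, 0, 1, 0],
}
ranking = rank_by_profile("GENE_A", profiles)
```

Genes at the top of such a ranking share the query's pattern of conservation and would be candidates for functioning in the same pathway.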
|
454
|
Abstract
High-throughput technologies have been applied to investigate the underlying mechanisms of complex diseases, identify disease associations and help to improve treatment. However, it is challenging to derive biological insight from conventional single-gene-based analysis of "omics" data from high-throughput experiments due to sample and patient heterogeneity. To address these challenges, many novel pathway- and network-based approaches were developed to integrate various "omics" data, such as gene expression, copy number alteration, genome-wide association studies, and interaction data. This review will cover recent methodological developments in pathway analysis for the detection of dysregulated interactions and disease-associated subnetworks, prioritization of candidate disease genes, and disease classifications. For each application, we will also discuss the associated challenges and potential future directions.
|
455
|
Zuberi K, Franz M, Rodriguez H, Montojo J, Lopes CT, Bader GD, Morris Q. GeneMANIA prediction server 2013 update. Nucleic Acids Res 2013; 41:W115-22. [PMID: 23794635 PMCID: PMC3692113 DOI: 10.1093/nar/gkt533] [Citation(s) in RCA: 311] [Impact Index Per Article: 25.9] [Indexed: 01/18/2023] Open
Abstract
GeneMANIA (http://www.genemania.org) is a flexible user-friendly web interface for generating hypotheses about gene function, analyzing gene lists and prioritizing genes for functional assays. Given a query gene list, GeneMANIA extends the list with functionally similar genes that it identifies using available genomics and proteomics data. GeneMANIA also reports weights that indicate the predictive value of each selected data set for the query. GeneMANIA can also be used in a function prediction setting: given a query gene, GeneMANIA finds a small set of genes that are most likely to share function with that gene based on their interactions with it. Enriched Gene Ontology categories among this set can sometimes point to the function of the gene. Seven organisms are currently supported (Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, Homo sapiens, Rattus norvegicus and Saccharomyces cerevisiae), and hundreds of data sets have been collected from GEO, BioGRID, IRefIndex and I2D, as well as organism-specific functional genomics data sets. Users can customize their search by selecting specific data sets to query and by uploading their own data sets to analyze.
Affiliation(s)
- Khalid Zuberi
- The Donnelly Centre, University of Toronto, Ontario, Canada
|
456
|
Yang JS, Kim J, Park S, Jeon J, Shin YE, Kim S. Spatial and functional organization of mitochondrial protein network. Sci Rep 2013; 3:1403. [PMID: 23466738 PMCID: PMC3590558 DOI: 10.1038/srep01403] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 11/03/2012] [Accepted: 02/21/2013] [Indexed: 12/24/2022] Open
Abstract
Characterizing the spatial organization of the human mitochondrial proteome will enhance our understanding of mitochondrial functions at the molecular level and provide key insight into protein-disease associations. However, the sub-organellar location and possible association with mitochondrial diseases are not annotated for most mitochondrial proteins. Here, we characterized the functional and spatial organization of mitochondrial proteins by assessing their position in the Mitochondrial Protein Functional (MPF) network. Network position was assigned to the MPF network and facilitated the determination of sub-organellar location and functional organization of mitochondrial proteins. Moreover, network position successfully identified candidate disease genes of several mitochondrial disorders. Thus, our data support the use of network position as a novel method to explore the molecular function and pathogenesis of mitochondrial proteins.
Affiliation(s)
- Jae-Seong Yang
- School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology, San 31, Hyoja-dong, Nam-gu, Pohang, Gyeongbuk, Korea, 790-784
|
457
|
Abstract
We present a comprehensive toolkit for post-processing, visualization and advanced analysis of GWAS results. In the spirit of comparable tools for gene-expression analysis, we attempt to unify and simplify several procedures that are essential for the interpretation of GWAS results. This includes the generation of advanced Manhattan and regional association plots including rare variant display as well as novel interaction network analysis tools for the investigation of systems-biology aspects. Our package supports virtually all model organisms and represents the first cohesive implementation of such tools for the popular language R. Previous software of that range is dispersed over a wide range of platforms and mostly not adaptable for custom work pipelines. We demonstrate the utility of this package by providing an example workflow on a publicly available dataset.
|
458
|
Li W, Chen L, He W, Li W, Qu X, Liang B, Gao Q, Feng C, Jia X, Lv Y, Zhang S, Li X. Prioritizing disease candidate proteins in cardiomyopathy-specific protein-protein interaction networks based on "guilt by association" analysis. PLoS One 2013; 8:e71191. [PMID: 23940716 PMCID: PMC3733802 DOI: 10.1371/journal.pone.0071191] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 01/28/2013] [Accepted: 06/28/2013] [Indexed: 01/12/2023] Open
Abstract
The cardiomyopathies are a group of heart muscle diseases which can be inherited (familial). Identifying potential disease-related proteins is important to understand mechanisms of cardiomyopathies. Experimental identification of cardiomyopathies is costly and labour-intensive. In contrast, bioinformatics approaches have a competitive advantage over experimental methods. Based on “guilt by association” analysis, we prioritized candidate proteins involved in human cardiomyopathies. We first built weighted human cardiomyopathy-specific protein-protein interaction networks for three subtypes of cardiomyopathies using the known disease proteins from Online Mendelian Inheritance in Man as seeds. We then developed a method for prioritizing disease candidate proteins to rank candidate proteins in the network based on “guilt by association” analysis. It was found that most candidate proteins with high scores shared disease-related pathways with disease seed proteins. These top-ranked candidate proteins were related to the corresponding disease subtypes, and were potential disease-related proteins. Cross-validation and comparison with other methods indicated that our approach could be used for the identification of potentially novel disease proteins, which may provide insights into cardiomyopathy-related mechanisms in a more comprehensive and integrated way.
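A "guilt by association" ranking on a weighted interaction network can be sketched as follows. This is a simplified illustration under stated assumptions, not the scoring scheme of the paper: the protein names, the edge weights, and the rule "score = sum of edge weights to seed proteins" are all hypothetical.

```python
def rank_candidates(edges, seeds):
    """Score each non-seed protein by its total edge weight to seed proteins.

    edges: dict mapping an undirected pair (u, v) to an interaction weight.
    seeds: set of known disease proteins.
    Returns (protein, score) pairs, highest score first.
    """
    scores = {}
    for (u, v), weight in edges.items():
        # Only edges linking exactly one seed to one non-seed contribute.
        if u in seeds and v not in seeds:
            scores[v] = scores.get(v, 0.0) + weight
        elif v in seeds and u not in seeds:
            scores[u] = scores.get(u, 0.0) + weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical network: S1 and S2 are seed proteins, A to C are candidates.
edges = {
    ("S1", "A"): 0.75,
    ("S2", "A"): 0.5,
    ("S1", "B"): 0.25,
    ("B", "C"): 1.0,
}
ranked = rank_candidates(edges, {"S1", "S2"})
```

Protein C receives no score in this sketch because it has no direct edge to a seed; methods that propagate scores through the network would also rank such indirect neighbours.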
Affiliation(s)
- Wan Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Lina Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- * E-mail: (LC); (XL)
- Weiming He
- Institute of Opto-electronics, Harbin Institute of Technology, Harbin, Heilongjiang Province, China
- Weiguo Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xiaoli Qu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Binhua Liang
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada
- Qianping Gao
- Department of Cardiology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Chenchen Feng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xu Jia
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Yana Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Siya Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xia Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- * E-mail: (LC); (XL)
|
459
|
Kim YN, Kim S, Kim IY, Shin JH, Cho S, Yi SS, Kim WK, Kim KS, Lee S, Seong JK. Transcriptomic analysis of insulin-sensitive tissues from anti-diabetic drug treated ZDF rats, a T2DM animal model. PLoS One 2013; 8:e69624. [PMID: 23922760 PMCID: PMC3724940 DOI: 10.1371/journal.pone.0069624] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/12/2013] [Accepted: 06/12/2013] [Indexed: 12/11/2022] Open
Abstract
Gene expression changes have been associated with type 2 diabetes mellitus (T2DM); however, the alterations are not fully understood. We investigated the effects of anti-diabetic drugs on gene expression in Zucker diabetic fatty (ZDF) rats using oligonucleotide microarray technology to identify gene expression changes occurring in T2DM. Global gene expression in the pancreas, adipose tissue, skeletal muscle, and liver was profiled from Zucker lean control (ZLC) and anti-diabetic drug treated ZDF rats compared with those in ZDF rats. We showed that anti-diabetic drugs regulate the expression of a large number of genes. We provided a more integrated view of the diabetic changes by examining the gene expression networks. The resulting sub-networks allowed us to identify several biological processes that were significantly enriched by the anti-diabetic drug treatment, including oxidative phosphorylation (OXPHOS), systemic lupus erythematosus, and the chemokine signaling pathway. Among them, we found that white adipose tissue from ZDF rats showed decreased expression of a set of OXPHOS genes that were normalized by rosiglitazone treatment accompanied by rescued blood glucose levels. In conclusion, we suggest that alterations in OXPHOS gene expression in white adipose tissue may play a role in the pathogenesis and drug-mediated recovery of T2DM through a comprehensive gene expression network study after multi-drug treatment of ZDF rats.
Affiliation(s)
- Yo Na Kim
- Laboratory of Developmental Biology and Genomics, College of Veterinary Medicine, Research Institute for Veterinary Science, BK21 Program for Veterinary Science, Seoul National University, Seoul, Korea
- Sangok Kim
- Ewha Research Center for Systems Biology, Division of Molecular and Life Sciences, Ewha Womans University, Seoul, Korea
- Il-Yong Kim
- Laboratory of Developmental Biology and Genomics, College of Veterinary Medicine, Research Institute for Veterinary Science, BK21 Program for Veterinary Science, Seoul National University, Seoul, Korea
- Jae Hoon Shin
- Laboratory of Developmental Biology and Genomics, College of Veterinary Medicine, Research Institute for Veterinary Science, BK21 Program for Veterinary Science, Seoul National University, Seoul, Korea
- Sooyoung Cho
- Laboratory of Developmental Biology and Genomics, College of Veterinary Medicine, Research Institute for Veterinary Science, BK21 Program for Veterinary Science, Seoul National University, Seoul, Korea
- Interdisciplinary Program for Bioinformatics, Program for Cancer Biology and BIO-MAX Institute, Seoul National University, Seoul, Korea
- Sun Shin Yi
- Department of Biomedical Laboratory Science, College of Medical Sciences, Soonchunhyang University, Asan, Chungnam, Korea
- Wan Kyu Kim
- Ewha Research Center for Systems Biology, Division of Molecular and Life Sciences, Ewha Womans University, Seoul, Korea
- Kyung-Sub Kim
- Department of Biochemistry and Molecular Biology, Integrated Genomic Research Center for Metabolic Regulation, Institute of Genetic Science, Brain Korea 21 Project for Medical Science, Yonsei University College of Medicine, Seoul, Korea
- Sanghyuk Lee
- Ewha Research Center for Systems Biology, Division of Molecular and Life Sciences, Ewha Womans University, Seoul, Korea
- Je Kyung Seong
- Laboratory of Developmental Biology and Genomics, College of Veterinary Medicine, Research Institute for Veterinary Science, BK21 Program for Veterinary Science, Seoul National University, Seoul, Korea
- Interdisciplinary Program for Bioinformatics, Program for Cancer Biology and BIO-MAX Institute, Seoul National University, Seoul, Korea
|
460
|
Halldórsson BV, Sharan R. Network-based interpretation of genomic variation data. J Mol Biol 2013; 425:3964-9. [PMID: 23886866 DOI: 10.1016/j.jmb.2013.07.026] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 05/01/2013] [Revised: 07/02/2013] [Accepted: 07/16/2013] [Indexed: 02/02/2023]
Abstract
Advances in sequencing technologies are allowing genome-wide association studies at an ever-growing scale. The interpretation of these studies requires dealing with statistical and combinatorial challenges, owing to the multi-factorial nature of human diseases and the huge space of genomic markers that are being monitored. Recently, it was proposed that using protein-protein interaction network information could help in tackling these challenges by restricting attention to markers or combinations of markers that map to close proteins in the network. In this review, we survey techniques for integrating genomic variation data with network information to improve our understanding of complex diseases and reveal meaningful associations.
Affiliation(s)
- Bjarni V Halldórsson
- School of Science and Engineering, Reykjavík University, 101 Reykjavík, Iceland.
|
461
|
Abstract
Genetic studies in immune-mediated diseases have yielded a large number of disease-associated loci. Here we review the progress being made in 12 such diseases, for which 199 independently associated non-HLA loci have been identified by genome-wide association studies since 2007. It is striking that many of the loci are not unique to a single disease but shared between different immune-mediated diseases. The challenge now is to understand how the unique and shared genetic factors can provide insight into the underlying disease biology. We annotated disease-associated variants using the Encyclopedia of DNA Elements (ENCODE) database and demonstrate that, of the predisposing disease variants, the majority have the potential to be regulatory. We also demonstrate that many of these variants affect the expression of nearby genes. Furthermore, we summarize results from the Immunochip, a custom array, which allows a detailed comparison between five of the diseases that have so far been analyzed using this platform.
Affiliation(s)
- Isis Ricaño-Ponce
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands;
|
462
|
Zhou X, Chen P, Wei Q, Shen X, Chen X. Human interactome resource and gene set linkage analysis for the functional interpretation of biologically meaningful gene sets. Bioinformatics 2013; 29:2024-31. [PMID: 23782618 DOI: 10.1093/bioinformatics/btt353] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 01/17/2023]
Abstract
MOTIVATION: A molecular interaction network can be viewed as a network in which genes with related functions are connected. Therefore, at a systems level, connections between individual genes in a molecular interaction network can be used to infer the collective functional linkages between biologically meaningful gene sets. RESULTS: We present the human interactome resource and the gene set linkage analysis (GSLA) tool for the functional interpretation of biologically meaningful gene sets observed in experiments. GSLA determines whether an observed gene set has significant functional linkages to established biological processes. When an observed gene set is not enriched by known biological processes, traditional enrichment-based interpretation methods cannot produce functional insights, but GSLA can still evaluate whether those genes work in concert to regulate specific biological processes, thereby suggesting the functional implications of the observed gene set. The quality of human interactome resource and the utility of GSLA are illustrated with multiple assessments. AVAILABILITY: http://www.cls.zju.edu.cn/hir/
Affiliation(s)
- Xi Zhou
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, P.R. China
|
463
|
Baranzini S, Khankhanian P, Patsopoulos N, Li M, Stankovich J, Cotsapas C, Søndergaard H, Ban M, Barizzone N, Bergamaschi L, Booth D, Buck D, Cavalla P, Celius E, Comabella M, Comi G, Compston A, Cournu-Rebeix I, D’alfonso S, Damotte V, Din L, Dubois B, Elovaara I, Esposito F, Fontaine B, Franke A, Goris A, Gourraud PA, Graetz C, Guerini F, Guillot-Noel L, Hafler D, Hakonarson H, Hall P, Hamsten A, Harbo H, Hemmer B, Hillert J, Kemppinen A, Kockum I, Koivisto K, Larsson M, Lathrop M, Leone M, Lill C, Macciardi F, Martin R, Martinelli V, Martinelli-Boneschi F, McCauley J, Myhr KM, Naldi P, Olsson T, Oturai A, Pericak-Vance M, Perla F, Reunanen M, Saarela J, Saker-Delye S, Salvetti M, Sellebjerg F, Sørensen P, Spurkland A, Stewart G, Taylor B, Tienari P, Winkelmann J, Zipp F, Ivinson A, Haines J, Sawcer S, DeJager P, Hauser S, Oksenberg J. Network-based multiple sclerosis pathway analysis with GWAS data from 15,000 cases and 30,000 controls. Am J Hum Genet 2013; 92:854-65. [PMID: 23731539 PMCID: PMC3958952 DOI: 10.1016/j.ajhg.2013.04.019] [Citation(s) in RCA: 121] [Impact Index Per Article: 10.1] [Received: 02/09/2013] [Revised: 04/04/2013] [Accepted: 04/23/2013] [Indexed: 02/03/2023] Open
Abstract
Multiple sclerosis (MS) is an inflammatory CNS disease with a substantial genetic component, originally mapped to only the human leukocyte antigen (HLA) region. In the last 5 years, a total of seven genome-wide association studies and one meta-analysis successfully identified 57 non-HLA susceptibility loci. Here, we merged nominal statistical evidence of association and physical evidence of interaction to conduct a protein-interaction-network-based pathway analysis (PINBPA) on two large genetic MS studies comprising a total of 15,317 cases and 29,529 controls. The distribution of nominally significant loci at the gene level matched the patterns of extended linkage disequilibrium in regions of interest. We found that products of genome-wide significantly associated genes are more likely to interact physically and belong to the same or related pathways. We next searched for subnetworks (modules) of genes (and their encoded proteins) enriched with nominally associated loci within each study and identified those modules in common between the two studies. We demonstrate that these modules are more likely to contain genes with bona fide susceptibility variants and, in addition, identify several high-confidence candidates (including BCL10, CD48, REL, TRAF3, and TEC). PINBPA is a powerful approach to gaining further insights into the biology of associated genes and to prioritizing candidates for subsequent genetic studies of complex traits.
|
464
|
Ma RCW, Hu C, Tam CH, Zhang R, Kwan P, Leung TF, Thomas GN, Go MJ, Hara K, Sim X, Ho JSK, Wang C, Li H, Lu L, Wang Y, Li JW, Wang Y, Lam VKL, Wang J, Yu W, Kim YJ, Ng DP, Fujita H, Panoutsopoulou K, Day-Williams AG, Lee HM, Ng ACW, Fang YJ, Kong APS, Jiang F, Ma X, Hou X, Tang S, Lu J, Yamauchi T, Tsui SKW, Woo J, Leung PC, Zhang X, Tang NLS, Sy HY, Liu J, Wong TY, Lee JY, Maeda S, Xu G, Cherny SS, Chan TF, Ng MCY, Xiang K, Morris AP, Keildson S, Hu R, Ji L, Lin X, Cho YS, Kadowaki T, Tai ES, Zeggini E, McCarthy MI, Hon KL, Baum L, Tomlinson B, So WY, Bao Y, Chan JCN, Jia W. Genome-wide association study in a Chinese population identifies a susceptibility locus for type 2 diabetes at 7q32 near PAX4. Diabetologia 2013; 56:1291-305. [PMID: 23532257 PMCID: PMC3648687 DOI: 10.1007/s00125-013-2874-4] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Received: 10/21/2012] [Accepted: 01/31/2013] [Indexed: 12/18/2022]
Abstract
AIMS/HYPOTHESIS: Most genetic variants identified for type 2 diabetes have been discovered in European populations. We performed genome-wide association studies (GWAS) in a Chinese population with the aim of identifying novel variants for type 2 diabetes in Asians. METHODS: We performed a meta-analysis of three GWAS comprising 684 patients with type 2 diabetes and 955 controls of Southern Han Chinese descent. We followed up the top signals in two independent Southern Han Chinese cohorts (totalling 10,383 cases and 6,974 controls), and performed in silico replication in multiple populations. RESULTS: We identified CDKN2A/B and four novel type 2 diabetes association signals with p < 1 × 10⁻⁵ from the meta-analysis. Thirteen variants within these four loci were followed up in two independent Chinese cohorts, and rs10229583 at 7q32 was found to be associated with type 2 diabetes in a combined analysis of 11,067 cases and 7,929 controls (p_meta = 2.6 × 10⁻⁸; OR [95% CI] 1.18 [1.11, 1.25]). In silico replication revealed consistent associations across multiethnic groups, including five East Asian populations (p_meta = 2.3 × 10⁻¹⁰) and a population of European descent (p = 8.6 × 10⁻³). The rs10229583 risk variant was associated with elevated fasting plasma glucose, impaired beta cell function in controls, and an earlier age at diagnosis for the cases. The novel variant lies within an islet-selective cluster of open regulatory elements. There was significant heterogeneity of effect between Han Chinese and individuals of European descent, Malaysians and Indians. CONCLUSIONS/INTERPRETATION: Our study identifies rs10229583 near PAX4 as a novel locus for type 2 diabetes in Chinese and other populations and provides new insights into the pathogenesis of type 2 diabetes.
Affiliation(s)
- R. C. W. Ma
- Department of Medicine and Therapeutics, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, SAR People’s Republic of China
- Hong Kong Institute of Diabetes and Obesity, Chinese University of Hong Kong, Hong Kong, SAR People’s Republic of China
- Li Ka Shing Institute of Life Sciences, Chinese University of Hong Kong, Hong Kong, SAR People’s Republic of China
- C. Hu
- Department of Endocrinology and Metabolism, Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Key Clinical Center for Metabolic Disease, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, 600 Yishan Road, Shanghai, 200233 People’s Republic of China
- Shanghai Jiao Tong University Affiliated Sixth People’s Hospital South Campus, Shanghai, People’s Republic of China
- C. H. Tam
- Department of Medicine and Therapeutics, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, SAR People’s Republic of China
- R. Zhang
- Department of Endocrinology and Metabolism, Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Key Clinical Center for Metabolic Disease, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, 600 Yishan Road, Shanghai, 200233 People’s Republic of China
- P. Kwan
- Department of Medicine and Therapeutics, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, SAR People’s Republic of China
- T. F. Leung
- Department of Paediatrics, Chinese University of Hong Kong, Hong Kong, People’s Republic of China
- G. N. Thomas
- Department of Public Health, Epidemiology and Biostatistics, University of Birmingham, Birmingham, UK
- M. J. Go
- Center for Genome Science, National Institute of Health, Osong Health Technology Administration Complex, Gangoe-myeon, Yeonje-ri, Cheongwon-gun, Chungcheongbuk-do Republic of Korea
- K. Hara
- Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, University of Tokyo, Tokyo, Japan
- Department of Integrated Molecular Science on Metabolic Diseases, University of Tokyo Hospital, Tokyo, Japan
| | - X. Sim
- Centre for Molecular Epidemiology, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Republic of Singapore
- Center for Statistical Genetics and Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
| | - J. S. K. Ho
- Department of Medicine and Therapeutics, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, SAR People’s Republic of China
| | - C. Wang
- Department of Endocrinology and Metabolism, Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Key Clinical Center for Metabolic Disease, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, 600 Yishan Road, Shanghai, 200233 People’s Republic of China
| | - H. Li
- Key Laboratory of Nutrition and Metabolism, Institute for Nutritional Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Graduate School of the Chinese Academy of Sciences, Shanghai, People’s Republic of China
| | - L. Lu
- Key Laboratory of Nutrition and Metabolism, Institute for Nutritional Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Graduate School of the Chinese Academy of Sciences, Shanghai, People’s Republic of China
| | - Y. Wang
- Key Laboratory of Nutrition and Metabolism, Institute for Nutritional Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Graduate School of the Chinese Academy of Sciences, Shanghai, People’s Republic of China
| | - J. W. Li
- School of Life Sciences, Chinese University of Hong Kong, Hong Kong, SAR People’s Republic of China
| | - Y. Wang
- Department of Medicine and Therapeutics, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, SAR People’s Republic of China
| | - V. K. L. Lam
- Department of Medicine and Therapeutics, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, SAR People’s Republic of China
| | - J. Wang
- Department of Endocrinology and Metabolism, Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Key Clinical Center for Metabolic Disease, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, 600 Yishan Road, Shanghai, 200233 People’s Republic of China
| | - W. Yu
- Department of Endocrinology and Metabolism, Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Key Clinical Center for Metabolic Disease, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, 600 Yishan Road, Shanghai, 200233 People’s Republic of China
| | - Y. J. Kim
- Center for Genome Science, National Institute of Health, Osong Health Technology Administration Complex, Gangoe-myeon, Yeonje-ri, Cheongwon-gun, Chungcheongbuk-do, Republic of Korea
| | - D. P. Ng
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Republic of Singapore
| | - H. Fujita
- Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, University of Tokyo, Tokyo, Japan
| | - K. Panoutsopoulou
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - A. G. Day-Williams
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - H. M. Lee
- Department of Medicine and Therapeutics, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, SAR People’s Republic of China
| | - A. C. W. Ng
- Department of Medicine and Therapeutics, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, SAR People’s Republic of China
| | - Y-J. Fang
- Department of Colorectal Surgery, State Key Laboratory of Oncology in South China, Sun Yat-sen University Cancer Center, Guangzhou, People’s Republic of China
| | - A. P. S. Kong
- Department of Medicine and Therapeutics, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, SAR People’s Republic of China
| | - F. Jiang
- Department of Endocrinology and Metabolism, Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Key Clinical Center for Metabolic Disease, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, 600 Yishan Road, Shanghai, 200233 People’s Republic of China
| | - X. Ma
- Department of Endocrinology and Metabolism, Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Key Clinical Center for Metabolic Disease, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, 600 Yishan Road, Shanghai, 200233 People’s Republic of China
| | - X. Hou
- Department of Endocrinology and Metabolism, Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Key Clinical Center for Metabolic Disease, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, 600 Yishan Road, Shanghai, 200233 People’s Republic of China
| | - S. Tang
- Department of Endocrinology and Metabolism, Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Key Clinical Center for Metabolic Disease, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, 600 Yishan Road, Shanghai, 200233 People’s Republic of China
| | - J. Lu
- Department of Endocrinology and Metabolism, Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Key Clinical Center for Metabolic Disease, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, 600 Yishan Road, Shanghai, 200233 People’s Republic of China
| | - T. Yamauchi
- Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, University of Tokyo, Tokyo, Japan
| | - S. K. W. Tsui
- School of Biomedical Sciences, Chinese University of Hong Kong, Hong Kong, SAR People’s Republic of China
| | - J. Woo
- Department of Medicine and Therapeutics, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, SAR People’s Republic of China
| | - P. C. Leung
- Department of Orthopaedics, Chinese University of Hong Kong, Hong Kong, SAR People’s Republic of China
| | - X. Zhang
- Shanghai Jiao Tong University Affiliated Sixth People’s Hospital South Campus, Shanghai, People’s Republic of China
| | - N. L. S. Tang
- Department of Chemical Pathology, Chinese University of Hong Kong, Hong Kong, SAR People’s Republic of China
| | - H. Y. Sy
- Department of Paediatrics, Chinese University of Hong Kong, Hong Kong, People’s Republic of China
| | - J. Liu
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Republic of Singapore
| | - T. Y. Wong
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Republic of Singapore
- Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore
- Centre for Eye Research Australia, University of Melbourne, East Melbourne, VIC, Australia
| | - J. Y. Lee
- Center for Genome Science, National Institute of Health, Osong Health Technology Administration Complex, Gangoe-myeon, Yeonje-ri, Cheongwon-gun, Chungcheongbuk-do, Republic of Korea
| | - S. Maeda
- Laboratory for Endocrinology and Metabolism, RIKEN Center for Genomic Medicine, Yokohama, Japan
| | - G. Xu
- Department of Medicine and Therapeutics, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, SAR People’s Republic of China
| | - S. S. Cherny
- Department of Psychiatry and State Key Laboratory of Brain and Cognitive Sciences, University of Hong Kong, Hong Kong, SAR People’s Republic of China
| | - T. F. Chan
- School of Life Sciences, Chinese University of Hong Kong, Hong Kong, SAR People’s Republic of China
| | - M. C. Y. Ng
- Center for Genomics and Personalized Medicine Research, Center for Diabetes Research, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - K. Xiang
- Department of Endocrinology and Metabolism, Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Key Clinical Center for Metabolic Disease, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, 600 Yishan Road, Shanghai, 200233 People’s Republic of China
| | - A. P. Morris
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - S. Keildson
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - R. Hu
- Institute of Endocrinology and Diabetology, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, People’s Republic of China
| | - L. Ji
- Department of Endocrinology and Metabolism, Peking University People’s Hospital, Beijing, People’s Republic of China
| | - X. Lin
- Key Laboratory of Nutrition and Metabolism, Institute for Nutritional Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Graduate School of the Chinese Academy of Sciences, Shanghai, People’s Republic of China
| | - Y. S. Cho
- Department of Biomedical Science, Hallym University, Chuncheon, Gangwon-do, Republic of Korea
| | - T. Kadowaki
- Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, University of Tokyo, Tokyo, Japan
| | - E. S. Tai
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore
- Graduate Medical School, Duke-National University of Singapore, Singapore, Republic of Singapore
| | - E. Zeggini
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - M. I. McCarthy
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Churchill Hospital, Oxford, UK
| | - K. L. Hon
- Department of Paediatrics, Chinese University of Hong Kong, Hong Kong, People’s Republic of China
| | - L. Baum
- School of Pharmacy, Chinese University of Hong Kong, Hong Kong, SAR People’s Republic of China
| | - B. Tomlinson
- Department of Medicine and Therapeutics, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, SAR People’s Republic of China
| | - W. Y. So
- Department of Medicine and Therapeutics, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, SAR People’s Republic of China
| | - Y. Bao
- Department of Endocrinology and Metabolism, Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Key Clinical Center for Metabolic Disease, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, 600 Yishan Road, Shanghai, 200233 People’s Republic of China
| | - J. C. N. Chan
- Department of Medicine and Therapeutics, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, SAR People’s Republic of China
- Hong Kong Institute of Diabetes and Obesity, Chinese University of Hong Kong, Hong Kong, SAR People’s Republic of China
- Li Ka Shing Institute of Life Sciences, Chinese University of Hong Kong, Hong Kong, SAR People’s Republic of China
| | - W. Jia
- Department of Endocrinology and Metabolism, Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Key Clinical Center for Metabolic Disease, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, 600 Yishan Road, Shanghai, 200233 People’s Republic of China
|
465
|
Valsesia A, Macé A, Jacquemont S, Beckmann JS, Kutalik Z. The Growing Importance of CNVs: New Insights for Detection and Clinical Interpretation. Front Genet 2013; 4:92. [PMID: 23750167 PMCID: PMC3667386 DOI: 10.3389/fgene.2013.00092] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Received: 02/10/2013] [Accepted: 05/04/2013] [Indexed: 02/03/2023] Open
Abstract
Differences between genomes can be due to single nucleotide variants, translocations, inversions, and copy number variants (CNVs, gain or loss of DNA). The latter can range from sub-microscopic events to complete chromosomal aneuploidies. Small CNVs are often benign but those larger than 500 kb are strongly associated with morbid consequences such as developmental disorders and cancer. Detecting CNVs within and between populations is essential to better understand the plasticity of our genome and to elucidate its possible contribution to disease. Hence there is a need for better-tailored and more robust tools for the detection and genome-wide analyses of CNVs. While a link between a given CNV and a disease may have often been established, the relative CNV contribution to disease progression and impact on drug response is not necessarily understood. In this review we discuss the progress, challenges, and limitations that occur at different stages of CNV analysis from the detection (using DNA microarrays and next-generation sequencing) and identification of recurrent CNVs to the association with phenotypes. We emphasize the importance of germline CNVs and propose strategies to aid clinicians to better interpret structural variations and assess their clinical implications.
Affiliation(s)
- Armand Valsesia
- Genetics Core, Nestlé Institute of Health Sciences Lausanne, Switzerland
|
466
|
Lehne B, Schlitt T. Breaking free from the chains of pathway annotation: de novo pathway discovery for the analysis of disease processes. Pharmacogenomics 2013; 13:1967-78. [PMID: 23215889 DOI: 10.2217/pgs.12.170] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/14/2023] Open
Abstract
Interpreting the biological implications of high-throughput experiments such as gene-expression studies, genome-wide association studies and large-scale sequencing studies is not trivial. Gene-set and pathway analyses are useful tools to support the interpretation of such experiments, but rely on curated pathways or gene sets. The recent development of de novo pathway discovery methods aims to overcome this limitation. This article provides an overview of the methods currently available and reviews the advantages and challenges of this approach. In detail, it highlights the particular issues of de novo pathway discovery based on genome-wide association studies data, for which multiple different strategies have been proposed.
Affiliation(s)
- Benjamin Lehne
- Bioinformatics Group, Department of Medical & Molecular Genetics, 8th Floor Tower Wing Guy's Hospital, London SE1 9RT, UK
|
467
|
Kim E, Kim H, Lee I. JiffyNet: a web-based instant protein network modeler for newly sequenced species. Nucleic Acids Res 2013; 41:W192-7. [PMID: 23685435 PMCID: PMC3692116 DOI: 10.1093/nar/gkt419] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022] Open
Abstract
Revolutionary DNA sequencing technology has enabled affordable genome sequencing for numerous species. Thousands of species already have completely decoded genomes, and tens of thousands more are in progress. Naturally, parallel expansion of the functional parts list library is anticipated, yet genome-level understanding of function also requires maps of functional relationships, such as functional protein networks. Such networks have been constructed for many sequenced species including common model organisms. Nevertheless, the majority of species with sequenced genomes still have no protein network models available. Moreover, biologists might want to obtain protein networks for their species of interest on completion of the genome projects. Therefore, there is high demand for accessible means to automatically construct genome-scale protein networks based on sequence information from genome projects only. Here, we present a public web server, JiffyNet, specifically designed to instantly construct genome-scale protein networks based on associalogs (functional associations transferred from a template network by orthology) for a query species with only protein sequences provided. Assessment of the networks by JiffyNet demonstrated generally high predictive ability for pathway annotations. Furthermore, JiffyNet provides network visualization and analysis pages for a wide variety of molecular concepts to facilitate network-guided hypothesis generation. JiffyNet is freely accessible at http://www.jiffynet.org.
Affiliation(s)
- Eiru Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, 120-749, Korea
|
468
|
Singh-Blom UM, Natarajan N, Tewari A, Woods JO, Dhillon IS, Marcotte EM. Prediction and validation of gene-disease associations using methods inspired by social network analyses. PLoS One 2013; 8:e58977. [PMID: 23650495 PMCID: PMC3641094 DOI: 10.1371/journal.pone.0058977] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.1] [Received: 09/24/2012] [Accepted: 02/12/2013] [Indexed: 11/30/2022] Open
Abstract
Correctly identifying associations of genes with diseases has long been a goal in biology. With the emergence of large-scale gene-phenotype association datasets in biology, we can leverage statistical and machine learning methods to help us achieve this goal. In this paper, we present two methods for predicting gene-disease associations based on functional gene associations and gene-phenotype associations in model organisms. The first method, the Katz measure, is motivated by its success in social network link prediction, and is very closely related to some of the recent methods proposed for gene-disease association inference. The second method, called Catapult (Combining dATa Across species using Positive-Unlabeled Learning Techniques), is a supervised machine learning method that uses a biased support vector machine where the features are derived from walks in a heterogeneous gene-trait network. We study the performance of the proposed methods and related state-of-the-art methods using two different evaluation strategies, on two distinct data sets, namely OMIM phenotypes and drug-target interactions. Finally, by measuring the performance of the methods using two different evaluation strategies, we show that even though both methods perform very well, the Katz measure is better at identifying associations between traits and poorly studied genes, whereas Catapult is better suited to correctly identifying gene-trait associations overall. The authors want to thank Jon Laurent and Kris McGary for some of the data used, and Li and Patra for making their code available. Most of Ambuj Tewari's contribution to this work happened while he was a postdoctoral fellow at the University of Texas at Austin.
Affiliation(s)
- U. Martin Singh-Blom
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, Texas, United States of America
- Department of Medicine, Karolinska Institutet, Solna, Stockholm, Sweden
| | - Nagarajan Natarajan
- Department of Computer Science, University of Texas, Austin, Texas, United States of America
| | - Ambuj Tewari
- Department of Statistics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - John O. Woods
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, Texas, United States of America
| | - Inderjit S. Dhillon
- Department of Computer Science, University of Texas, Austin, Texas, United States of America
- * E-mail: (EMM); (ISD)
| | - Edward M. Marcotte
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, Texas, United States of America
- Department of Chemistry and Biochemistry, University of Texas, Austin, Texas, United States of America
- * E-mail: (EMM); (ISD)
|
469
|
Lee I. Network approaches to the genetic dissection of phenotypes in animals and humans. Anim Cells Syst (Seoul) 2013. [DOI: 10.1080/19768354.2013.789076] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022] Open
|
470
|
The human gene connectome as a map of short cuts for morbid allele discovery. Proc Natl Acad Sci U S A 2013; 110:5558-63. [PMID: 23509278 DOI: 10.1073/pnas.1218167110] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.1] [Indexed: 01/01/2023] Open
Abstract
High-throughput genomic data reveal thousands of gene variants per patient, and it is often difficult to determine which of these variants underlies disease in a given individual. However, at the population level, there may be some degree of phenotypic homogeneity, with alterations of specific physiological pathways underlying the pathogenesis of a particular disease. We describe here the human gene connectome (HGC) as a unique approach for human mendelian genetic research, facilitating the interpretation of abundant genetic data from patients with the same disease, and guiding subsequent experimental investigations. We first defined the set of the shortest plausible biological distances, routes, and degrees of separation between all pairs of human genes by applying a shortest distance algorithm to the full human gene network. We then designed a hypothesis-driven application of the HGC, in which we generated a Toll-like receptor 3-specific connectome useful for the genetic dissection of inborn errors of Toll-like receptor 3 immunity. In addition, we developed a functional genomic alignment approach from the HGC. In functional genomic alignment, the genes are clustered according to biological distance (rather than the traditional molecular evolutionary genetic distance), as estimated from the HGC. Finally, we compared the HGC with three state-of-the-art methods: String, FunCoup, and HumanNet. We demonstrated that the existing methods are more suitable for polygenic studies, whereas HGC approaches are more suitable for monogenic studies. The HGC and functional genomic alignment data and computer programs are freely available to noncommercial users from http://lab.rockefeller.edu/casanova/HGC and should facilitate the genome-wide selection of disease-causing candidate alleles for experimental validation.
|
471
|
Quantitative genetic-interaction mapping in mammalian cells. Nat Methods 2013; 10:432-7. [PMID: 23407553 DOI: 10.1038/nmeth.2398] [Citation(s) in RCA: 99] [Impact Index Per Article: 8.3] [Received: 09/07/2012] [Accepted: 01/17/2013] [Indexed: 12/14/2022]
Abstract
Mapping genetic interactions (GIs) by simultaneously perturbing pairs of genes is a powerful tool for understanding complex biological phenomena. Here we describe an experimental platform for generating quantitative GI maps in mammalian cells using a combinatorial RNA interference strategy. We performed ∼11,000 pairwise knockdowns in mouse fibroblasts, focusing on 130 factors involved in chromatin regulation to create a GI map. Comparison of the GI and protein-protein interaction (PPI) data revealed that pairs of genes exhibiting positive GIs and/or similar genetic profiles were predictive of the corresponding proteins being physically associated. The mammalian GI map identified pathways and complexes but also resolved functionally distinct submodules within larger protein complexes. By integrating GI and PPI data, we created a functional map of chromatin complexes in mouse fibroblasts, revealing that the PAF complex is a central player in the mammalian chromatin landscape.
|
472
|
Dand N, Sprengel F, Ahlers V, Schlitt T. BioGranat-IG: a network analysis tool to suggest mechanisms of genetic heterogeneity from exome-sequencing data. Bioinformatics 2013; 29:733-41. [PMID: 23361329 DOI: 10.1093/bioinformatics/btt045] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 01/04/2023]
Abstract
MOTIVATION: Recent exome-sequencing studies have successfully identified disease-causing sequence variants for several rare monogenic diseases by examining variants common to a group of patients. However, current data analysis strategies cope poorly with confounding factors such as genetic heterogeneity, incomplete penetrance, individuals lacking data and involvement of several genes. RESULTS: We introduce BioGranat-IG, an analysis strategy that incorporates the information contained in biological networks into the analysis of exome-sequencing data. To identify genes that may have a disease-causing role, we label all nodes of the network according to the individuals that carry a sequence variant and subsequently identify small subnetworks linked to all or most individuals. Using simulated exome-sequencing data, we demonstrate that BioGranat-IG is able to recover the genes responsible for two diseases known to be caused by variants in an underlying complex. We also examine the performance of BioGranat-IG under various conditions likely to be faced by the user, and show that its network-based approach is more powerful than a set-cover-based approach.
Affiliation(s)
- Nick Dand
- Department of Medical and Molecular Genetics, King's College London, London SE1 9RT, UK
|
473
|
Abstract
To what extent can variation in phenotypic traits such as disease risk be accurately predicted in individuals? In this Review, I highlight recent studies in model organisms that are relevant both to the challenge of accurately predicting phenotypic variation from individual genome sequences ('whole-genome reverse genetics') and for understanding why, in many cases, this may be impossible. These studies argue that only by combining genetic knowledge with in vivo measurements of biological states will it be possible to make accurate genetic predictions for individual humans.
|
474
|
Peng J, Chen J, Wang Y. Identifying cross-category relations in gene ontology and constructing genome-specific term association networks. BMC Bioinformatics 2013; 14 Suppl 2:S15. [PMID: 23368677 PMCID: PMC3549802 DOI: 10.1186/1471-2105-14-s2-s15] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022] Open
Abstract
Background: Gene Ontology (GO) has been widely used in biological databases, annotation projects, and computational analyses. Although the three GO categories are structured as independent ontologies, the biological relationships across the categories are not negligible for biological reasoning and knowledge integration. However, the existing cross-category ontology term similarity measures are either developed by utilizing the GO data only or based on manually curated term name similarities, ignoring the fact that GO is evolving quickly and the gene annotations are far from complete. Results: In this paper we introduce a new cross-category similarity measurement called CroGO by incorporating genome-specific gene co-function network data. The performance study showed that our measurement outperforms the existing algorithms. We also generated genome-specific term association networks for yeast and human. An enrichment-based test showed our networks are better than those generated by the other measures. Conclusions: The genome-specific term association networks constructed using CroGO provided a platform to enable a more consistent use of GO. In the networks, the frequently occurring MF-centered hubs indicate that a molecular function may be shared by different genes in multiple biological processes, or a set of genes with the same functions may participate in distinct biological processes. Common subgraphs in multiple organisms also revealed conserved GO term relationships. Software and data are available online at http://www.msu.edu/~jinchen/CroGO.
Affiliation(s)
- Jiajie Peng
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
475
|
Systems biology approach reveals genome to phenome correlation in type 2 diabetes. PLoS One 2013; 8:e53522. [PMID: 23308243 PMCID: PMC3538588 DOI: 10.1371/journal.pone.0053522] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Received: 09/12/2012] [Accepted: 12/03/2012] [Indexed: 12/11/2022] Open
Abstract
Genome-wide association studies (GWASs) have discovered associations of several loci with Type 2 diabetes (T2D), a common complex disease characterized by impaired insulin secretion by pancreatic β cells and insulin signaling in target tissues. However, the effect of genetic risk variants on continuous glycemic measures in nondiabetic subjects mainly elucidates perturbation of insulin secretion. Also, the disease-associated genes do not clearly converge on functional categories consistent with the known aspects of T2D pathophysiology. We used a systems biology approach to unravel genome to phenome correlation in T2D. We first examined enrichment of pathways in genes identified in T2D GWASs at genome-wide or lower levels of significance. Genes at the lower significance threshold showed enrichment of the insulin secretion related pathway. Notably, the physical and genetic interaction network of these genes showed robust enrichment of insulin signaling and other T2D pathophysiology related pathways including insulin secretion. The network also overrepresented genes reported to interact with insulin secretion and insulin action targeting antidiabetic drugs. The drug interacting genes themselves showed overrepresentation of insulin signaling and other T2D relevant pathways. Next, we generated genome-wide expression profiles of multiple insulin responsive tissues from nondiabetic and diabetic patients. Remarkably, the differentially expressed genes showed significant overlap with the network genes, with the intersection showing enrichment of insulin signaling and other pathways consistent with T2D pathophysiology. Literature search led our genomic, interactomic, transcriptomic and toxicogenomic evidence to converge on TGF-beta signaling, a pathway known to play a crucial role in pancreatic islets development and function, and insulin signaling. Cumulatively, we find that GWAS genes relate directly to insulin secretion and indirectly, through collaborating with other genes, to insulin resistance. This seems to support the epidemiological evidence that environmentally triggered insulin resistance interacts with genetically programmed β cell dysfunction to precipitate diabetes.
|
476
|
Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, Lin J, Minguez P, Bork P, von Mering C, Jensen LJ. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 2013; 41:D808-15. [PMID: 23203871 PMCID: PMC3531103 DOI: 10.1093/nar/gks1094] [Citation(s) in RCA: 3321] [Impact Index Per Article: 276.8] [Received: 09/15/2012] [Revised: 10/15/2012] [Accepted: 10/18/2012] [Indexed: 12/12/2022] Open
Abstract
Complete knowledge of all direct and indirect interactions between proteins in a given cell would represent an important milestone towards a comprehensive description of cellular mechanisms and functions. Although this goal is still elusive, considerable progress has been made, particularly for certain model organisms and functional systems. Currently, protein interactions and associations are annotated at various levels of detail in online resources, ranging from raw data repositories to highly formalized pathway databases. For many applications, a global view of all the available interaction data is desirable, including lower-quality data and/or computational predictions. The STRING database (http://string-db.org/) aims to provide such a global perspective for as many organisms as feasible. Known and predicted associations are scored and integrated, resulting in comprehensive protein networks covering >1100 organisms. Here, we describe the update to version 9.1 of STRING, introducing several improvements: (i) we extend the automated mining of scientific texts for interaction information, to now also include full-text articles; (ii) we entirely re-designed the algorithm for transferring interactions from one model organism to the other; and (iii) we provide users with statistical information on any functional enrichment observed in their networks.
Affiliation(s)
- Andrea Franceschini, Damian Szklarczyk, Sune Frankild, Michael Kuhn, Milan Simonovic, Alexander Roth, Jianyi Lin, Pablo Minguez, Peer Bork, Christian von Mering, Lars J. Jensen
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark, Biotechnology Center, Technical University Dresden, Germany, Department of Computer Science, University of Milan, Italy, European Molecular Biology Laboratory, Heidelberg and Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
|
477
|
Kim YA, Salari R, Wuchty S, Przytycka TM. Module cover - a new approach to genotype-phenotype studies. Pac Symp Biocomput 2013:135-46. [PMID: 23424119 PMCID: PMC3595055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Uncovering and interpreting phenotype/genotype relationships are among the most challenging open questions in disease studies. Set cover approaches are explicitly designed to provide a representative set for diverse disease cases and thus are valuable in studies of heterogeneous datasets. At the same time, pathway-centric methods have emerged as key approaches that significantly empower studies of genotype-phenotype relationships. Combining the utility of set cover techniques with the power of network-centric approaches, we designed a novel approach that extends the concept of set cover to network module cover. We developed two alternative methods to solve the module cover problem: (i) an integrated method that simultaneously determines network modules and optimizes the coverage of disease cases; and (ii) a two-step method where we first determined a candidate set of network modules and subsequently selected modules that provided the best coverage of the disease cases. The integrated method showed superior performance in the context of our application. We demonstrated the utility of the module cover approach for the identification of groups of related genes whose activity is perturbed in a coherent way by specific genomic alterations, allowing the interpretation of the heterogeneity of cancer cases.
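To make the set cover idea in this abstract concrete, the following is a minimal sketch of the classic greedy set cover heuristic, applied to hypothetical candidate modules and disease cases. It is not the authors' module cover method (neither the integrated nor the two-step variant); the function name, module names and case labels are assumptions chosen for illustration.

```python
def greedy_set_cover(universe, candidate_sets):
    """Classic greedy heuristic: repeatedly pick the set covering most uncovered items.

    universe: iterable of items to cover (here: hypothetical disease cases).
    candidate_sets: dict mapping a name (here: hypothetical module) to a set of items.
    Returns (chosen names in pick order, items that no candidate could cover).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Iterate names in sorted order so ties are broken deterministically.
        best = max(sorted(candidate_sets),
                   key=lambda name: len(candidate_sets[name] & uncovered))
        gain = candidate_sets[best] & uncovered
        if not gain:
            break  # the remaining items are not covered by any candidate
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Hypothetical toy data: which disease cases each candidate module "explains".
cases = {"c1", "c2", "c3", "c4", "c5"}
modules = {
    "m1": {"c1", "c2", "c3"},
    "m2": {"c3", "c4"},
    "m3": {"c4", "c5"},
    "m4": {"c1"},
}
chosen, left_over = greedy_set_cover(cases, modules)
print(chosen, left_over)
```

On this toy input the heuristic picks `m1` first (three cases) and then `m3` (the two remaining cases), which corresponds to the selection step of a two-step strategy once candidate modules are fixed.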
Affiliation(s)
- Yoo-Ah Kim
- National Center for Biotechnology Information, NLM, NIH, Bethesda, MD 2089, USA
| | | | - Stefan Wuchty
- National Center for Biotechnology Information, NLM, NIH, Bethesda, MD 2089, USA
| | - Teresa M. Przytycka
- National Center for Biotechnology Information, NLM, NIH, Bethesda, MD 2089, USA
|
478
|
Abstract
High-throughput methods for screening of physical and functional interactions now provide the means to study virus-host interactions on a genome scale. The limited coverage of these methods and the large size and uncertain quality of the identified interaction sets, however, require sophisticated computational approaches to obtain novel insights and hypotheses on virus infection processes from these interactions. Here, we describe the central steps of bioinformatics methods applied most commonly for this task and highlight important aspects that need to be considered and potential pitfalls that should be avoided.
Affiliation(s)
- Susanne M. Bailer
- University of Stuttgart Institute of Interfacial Process, Stuttgart, Germany
| | - Diana Lieber
- Ulm University Medical Center Institute of Virology, Ulm, Germany
|
479
|
Mostafavi S, Goldenberg A, Morris Q. Labeling nodes using three degrees of propagation. PLoS One 2012; 7:e51947. [PMID: 23284828 PMCID: PMC3532359 DOI: 10.1371/journal.pone.0051947] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 06/05/2012] [Accepted: 11/12/2012] [Indexed: 02/07/2023] Open
Abstract
The properties (or labels) of nodes in networks can often be predicted based on their proximity and their connections to other labeled nodes. So-called "label propagation algorithms" predict the labels of unlabeled nodes by propagating information about local label density iteratively through the network. These algorithms are fast, simple and scale to large networks but nonetheless regularly perform better than slower and much more complex algorithms on benchmark problems. We show here, however, that these algorithms have an intrinsic limitation that prevents them from adapting to some common patterns of network node labeling; we introduce a new algorithm, 3Prop, that retains all their advantages but is much more adaptive. As we show, 3Prop performs very well on node labeling problems ill-suited to label propagation, including predicting gene function in protein and genetic interaction networks and gender in friendship networks, and also performs slightly better on problems already well-suited to label propagation such as labeling blogs and patents based on their citation networks. 3Prop gains its adaptability by assigning separate weights to label information from different steps of the propagation. Surprisingly, we found that for many networks, the third iteration of label propagation receives a negative weight.
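The idea of weighting each propagation step separately can be sketched as follows. This is an illustrative toy, not the 3Prop algorithm itself: the abstract does not say how the weights are obtained, so the hand-picked weights below (including a negative third weight, mirroring the abstract's observation) and the function name `weighted_propagation` are assumptions.

```python
def weighted_propagation(adjacency, labels, weights):
    """Score nodes as a weighted sum of successive label propagation steps.

    adjacency: dict node -> list of neighbouring nodes (undirected graph).
    labels: dict node -> initial label value (missing nodes count as 0).
    weights: one weight per propagation step (hypothetical, hand-picked here).
    """
    nodes = sorted(adjacency)
    current = {n: float(labels.get(n, 0.0)) for n in nodes}
    scores = {n: 0.0 for n in nodes}
    for w in weights:
        # One step: each node takes the mean value of its neighbours.
        current = {
            n: (sum(current[m] for m in adjacency[n]) / len(adjacency[n]))
            if adjacency[n] else 0.0
            for n in nodes
        }
        # Accumulate this step's contribution with its own weight.
        for n in nodes:
            scores[n] += w * current[n]
    return scores


# Toy path graph a - b - c - d with a single labeled node "a".
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = weighted_propagation(graph, {"a": 1.0}, weights=[1.0, 0.5, -0.25])
print(scores)
```

With a single shared weight per step this reduces to ordinary truncated label propagation; letting the weights differ, and even change sign, is what gives the extra flexibility described in the abstract.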
Affiliation(s)
- Sara Mostafavi
- Department of Computer Science, Stanford University, Palo Alto, California, United States of America
| | - Anna Goldenberg
- Sick Kids Research Institute, and Department of Computer Science, University of Toronto, Toronto, Canada
| | - Quaid Morris
- Department of Molecular Genetics, Department of Computer Science, and the Donnelly Centre, University of Toronto, Toronto, Canada
|
480
|
Abstract
Modern experimental strategies often generate genome-scale measurements of human tissues or cell lines in various physiological states. Investigators often use these datasets individually to help elucidate molecular mechanisms of human diseases. Here we discuss approaches that effectively weight and integrate hundreds of heterogeneous datasets into gene-gene networks that focus on a specific process or disease. Diverse and systematic genome-scale measurements provide such approaches both a great deal of power and a number of challenges. We discuss some such challenges as well as methods to address them. We also raise important considerations for the assessment and evaluation of such approaches. When carefully applied, these integrative data-driven methods can make novel high-quality predictions that can transform our understanding of the molecular basis of human disease.
Affiliation(s)
- Casey S. Greene
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
| | - Olga G. Troyanskaya
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
|
481
|
Wang PI, Hwang S, Kincaid RP, Sullivan CS, Lee I, Marcotte EM. RIDDLE: reflective diffusion and local extension reveal functional associations for unannotated gene sets via proximity in a gene network. Genome Biol 2012; 13:R125. [PMID: 23268829 PMCID: PMC4056375 DOI: 10.1186/gb-2012-13-12-r125] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 06/27/2012] [Accepted: 12/26/2012] [Indexed: 01/08/2023] Open
Abstract
The growing availability of large-scale functional networks has promoted the development of many successful techniques for predicting functions of genes. Here we extend these network-based principles and techniques to functionally characterize whole sets of genes. We present RIDDLE (Reflective Diffusion and Local Extension), which applies well-developed guilt-by-association principles to a human gene network to identify associations of gene sets. RIDDLE is particularly adept at characterizing sets with no annotations, a major challenge where most traditional set analyses fail. Notably, RIDDLE found microRNA-450a to be strongly implicated in ocular diseases and development. A web application is available at http://www.functionalnet.org/RIDDLE.
|
482
|
Liu Y, Maxwell S, Feng T, Zhu X, Elston RC, Koyutürk M, Chance MR. Gene, pathway and network frameworks to identify epistatic interactions of single nucleotide polymorphisms derived from GWAS data. BMC Syst Biol 2012; 6 Suppl 3:S15. [PMID: 23281810 PMCID: PMC3524014 DOI: 10.1186/1752-0509-6-s3-s15] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 12/26/2022]
Abstract
Background: Interactions among genomic loci (also known as epistasis) have been suggested as one of the potential sources of missing heritability in single locus analysis of genome-wide association studies (GWAS). The computational burden of searching for interactions is compounded by the extremely low threshold for identifying significant p-values due to multiple hypothesis testing corrections. Utilizing prior biological knowledge to restrict the set of candidate SNP pairs to be tested can alleviate this problem, but systematic studies that investigate the relative merits of integrating different biological frameworks and GWAS data have not been conducted.
Results: We developed four biologically based frameworks to identify pairwise interactions among candidate SNP pairs as follows: (1) for each human protein-coding gene, a set of SNPs associated with that gene was constructed providing a gene-based interaction model, (2) for each known biological pathway, a set of SNPs associated with the genes in the pathway was constructed providing a pathway-based interaction model, (3) a set of SNPs associated with genes in a disease-related subnetwork provides a network-based interaction model, and (4) a framework is based on the function of SNPs. The last approach uses expression SNPs (eSNPs or eQTLs), which are SNPs or loci that have defined effects on the abundance of transcripts of other genes. We constructed pairs of eSNPs and SNPs located in the target genes whose expression is regulated by eSNPs. For all four frameworks the SNP sets were exhaustively tested for pairwise interactions within the sets using a traditional logistic regression model after excluding genes that were previously identified to associate with the trait. Using previously published GWAS data for type 2 diabetes (T2D) and the biologically based pair-wise interaction modeling, we identify twelve genes not seen in the previous single locus analysis.
Conclusion: We present four approaches to detect interactions associated with complex diseases. The results show our approaches outperform the traditional single locus approaches in detecting genes that previously did not reach significance; the results also provide novel drug targets and biomarkers relevant to the underlying mechanisms of disease.
Affiliation(s)
- Yu Liu
- Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH, USA
|
483
|
Zhu C, Kushwaha A, Berman K, Jegga AG. A vertex similarity-based framework to discover and rank orphan disease-related genes. BMC Syst Biol 2012; 6 Suppl 3:S8. [PMID: 23281592 PMCID: PMC3524320 DOI: 10.1186/1752-0509-6-s3-s8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 12/28/2022]
Abstract
Background: A rare or orphan disease (OD) is any disease that affects a small percentage of the population. While opportunities now exist to accelerate progress toward understanding the basis for many more ODs, the prioritization of candidate genes is still a critical step for disease-gene identification. Several network-based frameworks have been developed to address this problem with varied results.
Results: We have developed a novel vertex similarity (VS) based parameter-free prioritizing framework to identify and rank orphan disease candidate genes. We validate our approach by using 1598 known orphan disease-causing genes (ODGs) representing 172 orphan diseases (ODs). We compare our approach with a state-of-the-art parameter-based approach (PageRank with Priors or PRP) and with another parameter-free method (Interconnectedness or ICN). Our results show that the VS-based approach outperforms ICN and is comparable to PRP. We further apply VS-based ranking to identify and rank potential novel candidate genes for several ODs.
Conclusion: We demonstrate that the VS-based parameter-free ranking approach can be successfully used for disease candidate gene prioritization and can complement other network-based methods for candidate disease gene ranking. Importantly, our VS-ranked top candidate genes for the ODs match the known literature, suggesting several novel causal relationships for further investigation.
Affiliation(s)
- Cheng Zhu
- Department of Computer Science, University of Cincinnati, Cincinnati, Ohio 45229, USA
|
484
|
Gonçalves JP, Francisco AP, Moreau Y, Madeira SC. Interactogeneous: disease gene prioritization using heterogeneous networks and full topology scores. PLoS One 2012. [PMID: 23185389 PMCID: PMC3501465 DOI: 10.1371/journal.pone.0049634] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022] Open
Abstract
Disease gene prioritization aims to suggest potential implications of genes in disease susceptibility. Often accomplished in a guilt-by-association scheme, promising candidates are sorted according to their relatedness to known disease genes. Network-based methods have been successfully exploiting this concept by capturing the interaction of genes or proteins into a score. Nonetheless, most current approaches yield at least some of the following limitations: (1) networks comprise only curated physical interactions leading to poor genome coverage and density, and bias toward a particular source; (2) scores focus on adjacencies (direct links) or the most direct paths (shortest paths) within a constrained neighborhood around the disease genes, ignoring potentially informative indirect paths; (3) global clustering is widely applied to partition the network in an unsupervised manner, attributing little importance to prior knowledge; (4) confidence weights and their contribution to edge differentiation and ranking reliability are often disregarded. We hypothesize that network-based prioritization related to local clustering on graphs and considering full topology of weighted gene association networks integrating heterogeneous sources should overcome the above challenges. We term such a strategy Interactogeneous. We conducted cross-validation tests to assess the impact of network sources, alternative path inclusion and confidence weights on the prioritization of putative genes for 29 diseases. Heat diffusion ranking proved the best prioritization method overall, increasing the gap to neighborhood and shortest paths scores mostly on single source networks. Heterogeneous associations consistently delivered superior performance over single source data across the majority of methods. Results on the contribution of confidence weights were inconclusive. 
Finally, the best Interactogeneous strategy, heat diffusion ranking and associations from the STRING database, was used to prioritize genes for Parkinson’s disease. This method effectively recovered known genes and uncovered interesting candidates which could be linked to pathogenic mechanisms of the disease.
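As a rough illustration of heat diffusion ranking on a weighted association network, the sketch below spreads "heat" from known disease genes over the graph and ranks the other nodes by the heat they receive. It assumes an unnormalized graph Laplacian and a simple explicit Euler integration; the abstract does not specify the formulation used in the paper, so the function name, the parameters `t` and `steps`, and the toy network are all hypothetical.

```python
def heat_diffusion_scores(edges, seeds, t=0.5, steps=50):
    """Approximate exp(-t * L) applied to a seed vector, with L = D - W.

    edges: list of (u, v, weight) tuples for an undirected weighted graph.
    seeds: set of seed nodes (e.g. known disease genes), each starting with heat 1.
    t, steps: diffusion time and number of Euler steps (hypothetical defaults).
    """
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    neighbours = {n: {} for n in nodes}
    for u, v, w in edges:
        neighbours[u][v] = w
        neighbours[v][u] = w
    heat = {n: (1.0 if n in seeds else 0.0) for n in nodes}
    dt = t / steps
    for _ in range(steps):
        # Explicit Euler step of dh/dt = -L h: heat flows along weighted edges.
        flow = {
            n: sum(w * (heat[m] - heat[n]) for m, w in neighbours[n].items())
            for n in nodes
        }
        heat = {n: heat[n] + dt * flow[n] for n in nodes}
    return heat


# Toy network: path a - b - c with unit confidence weights, seeded at "a".
toy_edges = [("a", "b", 1.0), ("b", "c", 1.0)]
heat = heat_diffusion_scores(toy_edges, seeds={"a"})
ranking = sorted(heat, key=heat.get, reverse=True)
print(ranking, heat)
```

Unlike neighborhood or shortest-path scores, every path between a seed and a candidate contributes to the final heat, which is the "full topology" property the abstract emphasizes. The Euler scheme is only stable when `t / steps` is small relative to the largest weighted degree.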
Affiliation(s)
- Joana P. Gonçalves
- Knowledge Discovery and Bioinformatics Group, INESC-ID, Lisbon, Portugal
- Computer Science and Engineering Department, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal
- * E-mail: (JPG); (SCM)
| | - Alexandre P. Francisco
- Knowledge Discovery and Bioinformatics Group, INESC-ID, Lisbon, Portugal
- Computer Science and Engineering Department, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal
| | - Yves Moreau
- Electrical Engineering Department, Katholieke Universiteit Leuven, Leuven, Belgium
| | - Sara C. Madeira
- Knowledge Discovery and Bioinformatics Group, INESC-ID, Lisbon, Portugal
- Computer Science and Engineering Department, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal
- * E-mail: (JPG); (SCM)
|
485
|
Ward LD, Kellis M. Interpreting noncoding genetic variation in complex traits and human disease. Nat Biotechnol 2012; 30:1095-106. [PMID: 23138309 PMCID: PMC3703467 DOI: 10.1038/nbt.2422] [Citation(s) in RCA: 351] [Impact Index Per Article: 27.0] [Received: 09/14/2012] [Accepted: 10/16/2012] [Indexed: 12/13/2022]
Abstract
Association studies provide genome-wide information about the genetic basis of complex disease, but medical research has primarily focused on protein-coding variants, due to the difficulty of interpreting non-coding mutations. This picture has changed with advances in the systematic annotation of functional non-coding elements. Evolutionary conservation, functional genomics, chromatin state, sequence motifs, and molecular quantitative trait loci all provide complementary information about non-coding function. These functional maps can help prioritize variants on risk haplotypes, filter mutations encountered in the clinic, and perform systems-level analyses to reveal processes underlying disease associations. Advances in predictive modeling can enable dataset integration to reveal pathways shared across loci and alleles, and richer regulatory models can guide the search for epistatic interactions. Lastly, new massively parallel reporter experiments can systematically validate regulatory predictions. Ultimately, advances in regulatory and systems genomics can help unleash the value of whole-genome sequencing for personalized genomic risk assessment, diagnosis, and treatment.
Affiliation(s)
- Lucas D Ward
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
|
486
|
Havugimana PC, Hart GT, Nepusz T, Yang H, Turinsky AL, Li Z, Wang PI, Boutz DR, Fong V, Phanse S, Babu M, Craig SA, Hu P, Wan C, Vlasblom J, Dar VUN, Bezginov A, Clark GW, Wu GC, Wodak SJ, Tillier ERM, Paccanaro A, Marcotte EM, Emili A. A census of human soluble protein complexes. Cell 2012; 150:1068-81. [PMID: 22939629 DOI: 10.1016/j.cell.2012.08.011] [Citation(s) in RCA: 671] [Impact Index Per Article: 51.6] [Received: 05/26/2012] [Revised: 07/30/2012] [Accepted: 08/10/2012] [Indexed: 12/19/2022]
Abstract
Cellular processes often depend on stable physical associations between proteins. Despite recent progress, knowledge of the composition of human protein complexes remains limited. To close this gap, we applied an integrative global proteomic profiling approach, based on chromatographic separation of cultured human cell extracts into more than one thousand biochemical fractions that were subsequently analyzed by quantitative tandem mass spectrometry, to systematically identify a network of 13,993 high-confidence physical interactions among 3,006 stably associated soluble human proteins. Most of the 622 putative protein complexes we report are linked to core biological processes and encompass both candidate disease genes and unannotated proteins to inform on mechanism. Strikingly, whereas larger multiprotein assemblies tend to be more extensively annotated and evolutionarily conserved, human protein complexes with five or fewer subunits are far more likely to be functionally unannotated or restricted to vertebrates, suggesting more recent functional innovations.
Affiliation(s)
- Pierre C Havugimana
- Banting and Best Department of Medical Research, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
|
487
|
Ma X, Gao L. Biological network analysis: insights into structure and functions. Brief Funct Genomics 2012; 11:434-442. [PMID: 23184677 DOI: 10.1093/bfgp/els045] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Indexed: 02/02/2023] Open
Abstract
In the past two decades, great efforts have been devoted to extracting the dependence and interplay between structure and function in biological networks because they have strong relevance to biological processes. In this article, we review recent developments in biological network analysis. In detail, we first review the interactome topological properties of biological networks and then the methods for structural and functional patterns.
Affiliation(s)
- Xiaoke Ma
- School of Computer Science and Technology, Xidian University, No. 2 South TaiBai Road, Xi'an, Shaanxi 710071, P.R. China
|
488
|
Park S, Yang JS, Kim J, Shin YE, Hwang J, Park J, Jang SK, Kim S. Evolutionary history of human disease genes reveals phenotypic connections and comorbidity among genetic diseases. Sci Rep 2012; 2:757. [PMID: 23091697 PMCID: PMC3477654 DOI: 10.1038/srep00757] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 07/05/2012] [Accepted: 10/03/2012] [Indexed: 01/02/2023] Open
Abstract
The extent to which evolutionary changes have impacted the phenotypic relationships among human diseases remains unclear. In this work, we report that phenotypically similar diseases are connected by the evolutionary constraints on human disease genes. Human disease groups can be classified into slowly or rapidly evolving classes, where the diseases in the slowly evolving class are enriched with morphological phenotypes and those in the rapidly evolving class are enriched with physiological phenotypes. Our findings establish a clear evolutionary connection between disease classes and disease phenotypes for the first time. Furthermore, the high comorbidity found between diseases connected by similar evolutionary constraints enables us to improve the predictability of the relative risk of human diseases. We find the evolutionary constraints on disease genes are a new layer of molecular connection in the network-based exploration of human diseases.
Affiliation(s)
- Solip Park
- School of Interdisciplinary Bioscience and Bioengineering, Biotechnology Research Center, Pohang University of Science and Technology, Pohang, Korea
|
489
|
Magger O, Waldman YY, Ruppin E, Sharan R. Enhancing the prioritization of disease-causing genes through tissue specific protein interaction networks. PLoS Comput Biol 2012; 8:e1002690. [PMID: 23028288 PMCID: PMC3459874 DOI: 10.1371/journal.pcbi.1002690] [Citation(s) in RCA: 120] [Impact Index Per Article: 9.2] [Received: 10/15/2011] [Accepted: 07/28/2012] [Indexed: 01/07/2023] Open
Abstract
The prioritization of candidate disease-causing genes is a fundamental challenge in the post-genomic era. Current state of the art methods exploit a protein-protein interaction (PPI) network for this task. They are based on the observation that genes causing phenotypically-similar diseases tend to lie close to one another in a PPI network. However, to date, these methods have used a static picture of human PPIs, while diseases impact specific tissues in which the PPI networks may be dramatically different. Here, for the first time, we perform a large-scale assessment of the contribution of tissue-specific information to gene prioritization. By integrating tissue-specific gene expression data with PPI information, we construct tissue-specific PPI networks for 60 tissues and investigate their prioritization power. We find that tissue-specific PPI networks considerably improve the prioritization results compared to those obtained using a generic PPI network. Furthermore, they allow predicting novel disease-tissue associations, pointing to sub-clinical tissue effects that may escape early detection.
Affiliation(s)
- Oded Magger
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel.
|
490
|
Hu X, Daly M. What have we learned from six years of GWAS in autoimmune diseases, and what is next? Curr Opin Immunol 2012; 24:571-5. [PMID: 23017373 DOI: 10.1016/j.coi.2012.09.001] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 08/13/2012] [Revised: 08/30/2012] [Accepted: 09/04/2012] [Indexed: 01/03/2023]
Abstract
Genome-wide association studies (GWAS) have discovered hundreds of common genetic variants that predispose humans to autoimmune diseases, opening up unprecedented potential for elucidating the pathways and processes of disease. To understand the role of these variants in susceptibility, we need to derive mechanistic insight by integration of genetic results with other biological data types and also with careful functional studies. In many cases, such studies have highlighted coherent biological processes at a high level and elucidated specific mechanisms that contribute to autoimmunity and inflammation. The understanding of the genetic component of autoimmune etiology will become more complete as fine-mapping and sequencing data become readily available. A comprehensive catalog of human immune phenotypes could provide a functional basis for assessing genetic influence on immune function and variation in response to therapeutic interventions, as well as for rationally designing new targeted therapeutics.
Affiliation(s)
- Xinli Hu
- Harvard Medical School, Harvard-MIT Division of Health Sciences and Technology, Boston, MA 02114, USA
|
491
|
Guney E, Oliva B. Exploiting protein-protein interaction networks for genome-wide disease-gene prioritization. PLoS One 2012; 7:e43557. [PMID: 23028459 PMCID: PMC3448640 DOI: 10.1371/journal.pone.0043557] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.5] [Received: 03/19/2012] [Accepted: 07/23/2012] [Indexed: 11/23/2022] Open
Abstract
Complex genetic disorders often involve products of multiple genes acting cooperatively. Hence, the pathophenotype is the outcome of the perturbations in the underlying pathways, where gene products cooperate through various mechanisms such as protein-protein interactions. Pinpointing the decisive elements of such disease pathways is still challenging. Over the last years, computational approaches exploiting interaction network topology have been successfully applied to prioritize individual genes involved in diseases. Although linkage intervals provide a list of disease-gene candidates, recent genome-wide studies demonstrate that genes not associated with any known linkage interval may also contribute to the disease phenotype. Network-based prioritization methods help highlight such associations. Still, there is a need for robust methods that capture the interplay among disease-associated genes mediated by the topology of the network. Here, we propose a genome-wide network-based prioritization framework named GUILD. This framework implements four network-based disease-gene prioritization algorithms. We analyze the performance of these algorithms in dozens of disease phenotypes. The algorithms in GUILD are compared to state-of-the-art network topology based algorithms for prioritization of genes. As a proof of principle, we investigate top-ranking genes in Alzheimer's disease (AD), diabetes and AIDS using disease-gene associations from various sources. We show that GUILD is able to significantly highlight disease-gene associations that are not used a priori. Our findings suggest that GUILD helps to identify genes implicated in the pathology of human disorders independent of the loci associated with the disorders.
Affiliation(s)
- Emre Guney
- Structural Bioinformatics Group (GRIB), Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Barcelona, Catalonia, Spain
- Baldo Oliva
- Structural Bioinformatics Group (GRIB), Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Barcelona, Catalonia, Spain
|
492
|
Abstract
Background: Co-expression based Cancer Modules (CMs) are sets of genes that act in concert to carry out specific functions in different cancer types, and are constructed by exploiting gene expression profiles related to specific clinical conditions or expression signatures associated to specific processes altered in cancer. Unfortunately, genes involved in cancer are not always detectable using only expression signatures or co-expressed sets of genes, and in principle other types of functional interactions should be exploited to obtain a comprehensive picture of the molecular mechanisms underlying the onset and progression of cancer. Results: We propose a novel semi-supervised method to rank genes with respect to CMs using networks constructed from different sources of functional information, not limited to gene expression data. It exploits on the one hand local learning strategies through score functions that extend the guilt-by-association approach, and on the other hand global learning strategies through graph kernels embedded in the score functions, able to take into account the overall topology of the network. The proposed kernelized score functions compare favorably with other state-of-the-art semi-supervised machine learning methods for gene ranking in biological networks and scale well with the number of genes, thus allowing fast processing of very large gene networks. Conclusions: The modular nature of kernelized score functions provides an algorithmic scheme from which different gene ranking algorithms can be derived, and the results show that using integrated functional networks we can successfully predict CMs defined mainly through expression signatures obtained from gene expression data profiling. A preliminary analysis of top ranked "false positive" genes shows that our approach could be in perspective applied to discover novel genes involved in the onset and progression of tumors related to specific CMs.
Affiliation(s)
- Matteo Re
- Dipartimento di Informatica, Università degli Studi di Milano, via Comelico 39/41, 20135 Milano MI, Italia
|
493
|
Emmert-Streib F, de Matos Simoes R, Tripathi S, Glazko GV, Dehmer M. A Bayesian analysis of the chromosome architecture of human disorders by integrating reductionist data. Sci Rep 2012; 2:513. [PMID: 22822426 PMCID: PMC3400933 DOI: 10.1038/srep00513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2012] [Accepted: 06/27/2012] [Indexed: 11/09/2022] Open
Abstract
In this paper, we present a Bayesian approach to estimate a chromosome and a disorder network from the Online Mendelian Inheritance in Man (OMIM) database. In contrast to other approaches, we obtain statistic rather than deterministic networks enabling a parametric control in the uncertainty of the underlying disorder-disease gene associations contained in the OMIM, on which the networks are based. From a structural investigation of the chromosome network, we identify three chromosome subgroups that reflect architectural differences in chromosome-disorder associations that are predictively exploitable for a functional analysis of diseases.
Affiliation(s)
- Frank Emmert-Streib
- Computational Biology and Machine Learning Lab, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen’s University Belfast, 97 Lisburn Road, Belfast, UK.
|
494
|
Integration of biological networks and pathways with genetic association studies. Hum Genet 2012; 131:1677-86. [PMID: 22777728 DOI: 10.1007/s00439-012-1198-7] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 02/09/2012] [Accepted: 06/27/2012] [Indexed: 12/13/2022]
Abstract
Millions of genetic variants have been assessed for their effects on the trait of interest in genome-wide association studies (GWAS). The complex traits are affected by a set of inter-related genes. However, the typical GWAS only examine the association of a single genetic variant at a time. The individual effects of a complex trait are usually small, and the simple sum of these individual effects may not reflect the holistic effect of the genetic system. High-throughput methods enable genomic studies to produce a large amount of data to expand the knowledge base of the biological systems. Biological networks and pathways are built to represent the functional or physical connectivity among genes. Integrated with GWAS data, the network- and pathway-based methods complement the approach of single genetic variant analysis, and may improve the power to identify trait-associated genes. Taking advantage of the biological knowledge, these approaches are valuable to interpret the functional role of the genetic variants, and to further understand the molecular mechanism influencing the traits. The network- and pathway-based methods have demonstrated their utilities, and will be increasingly important to address a number of challenges facing the mainstream GWAS.
|
495
|
Computational tools for prioritizing candidate genes: boosting disease gene discovery. Nat Rev Genet 2012; 13:523-36. [DOI: 10.1038/nrg3253] [Citation(s) in RCA: 332] [Impact Index Per Article: 25.5] [Indexed: 12/14/2022]
|
496
|
Magi A, Tattini L, Benelli M, Giusti B, Abbate R, Ruffo S. WNP: a novel algorithm for gene products annotation from weighted functional networks. PLoS One 2012; 7:e38767. [PMID: 22761703 PMCID: PMC3386258 DOI: 10.1371/journal.pone.0038767] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 02/13/2012] [Accepted: 05/13/2012] [Indexed: 02/07/2023] Open
Abstract
Predicting the biological function of all the genes of an organism is one of the fundamental goals of computational system biology. In the last decade, high-throughput experimental methods for studying the functional interactions between gene products (GPs) have been combined with computational approaches based on Bayesian networks for data integration. The result of these computational approaches is an interaction network with weighted links representing connectivity likelihood between two functionally related GPs. The weighted network generated by these computational approaches can be used to predict annotations for functionally uncharacterized GPs. Here we introduce Weighted Network Predictor (WNP), a novel algorithm for function prediction of biologically uncharacterized GPs. Tests conducted on simulated data show that WNP outperforms five other state-of-the-art methods in terms of both specificity and sensitivity and that it is able to better exploit and propagate the functional and topological information of the network. We apply our method to Saccharomyces cerevisiae yeast and Arabidopsis thaliana networks and we predict Gene Ontology function for about 500 and 10000 uncharacterized GPs respectively.
Affiliation(s)
- Alberto Magi
- Dipartimento di Area Critica Medico-Chirurgica, Università degli Studi di Firenze, Firenze, Italy.
|
497
|
Mtiraoui N, Turki A, Nemr R, Echtay A, Izzidi I, Al-Zaben GS, Irani-Hakime N, Keleshian SH, Mahjoub T, Almawi WY. Contribution of common variants of ENPP1, IGF2BP2, KCNJ11, MLXIPL, PPARγ, SLC30A8 and TCF7L2 to the risk of type 2 diabetes in Lebanese and Tunisian Arabs. DIABETES & METABOLISM 2012; 38:444-9. [PMID: 22749234 DOI: 10.1016/j.diabet.2012.05.002] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 01/30/2012] [Revised: 05/07/2012] [Accepted: 05/07/2012] [Indexed: 12/15/2022]
Abstract
BACKGROUND: While several type 2 diabetes mellitus (T2DM) susceptibility loci identified through genome-wide association studies (GWAS) have been replicated in many populations, their association in Arabs has not been reported. For this reason, the present study looked at the contribution of ENPP1 (rs1044498), IGF2BP2 (rs1470579), KCNJ11 (rs5219), MLXIPL (rs7800944), PPARγ (rs1801282), SLC30A8 (rs13266634) and TCF7L2 (rs7903146) SNPs to the risk of T2DM in Lebanese and Tunisian Arabs. METHODS: Study subjects (case/controls) were Lebanese (751/918) and Tunisians (1470/838). Genotyping was carried out by the allelic discrimination method. RESULTS: In Lebanese and Tunisians, neither ENPP1 nor MLXIPL was associated with T2DM, whereas TCF7L2 was significantly associated with an increased risk of T2DM in both the Lebanese [P < 0.001; OR (95% CI): 1.38 (1.20-1.59)] and Tunisians [P < 0.001; OR (95% CI): 1.36 (1.18-1.56)]. Differential associations of IGF2BP2, KCNJ11, PPARγ and SLC30A8 with T2DM were noted in the two populations. IGF2BP2 [P = 1.3 × 10⁻⁵; OR (95% CI): 1.66 (1.42-1.94)] and PPARγ [P = 0.005; OR (95% CI): 1.41 (1.10-1.80)] were associated with T2DM in the Lebanese, but not Tunisians, while KCNJ11 [P = 8.0 × 10⁻⁴; OR (95% CI): 1.27 (1.09-1.47)] and SLC30A8 [P = 1.6 × 10⁻⁵; OR (95% CI): 1.37 (1.15-1.62)] were associated with T2DM in the Tunisians, but not Lebanese, after adjusting for gender and body mass index. CONCLUSION: T2DM susceptibility loci SNPs identified through GWAS showed differential associations with T2DM in two Arab populations, thus further confirming the ethnic contributions of these variants to T2DM susceptibility.
Affiliation(s)
- N Mtiraoui
- Research Unit of Biology and Genetics of Hematological and Autoimmune diseases, Faculty of Pharmacy, University of Monastir, Monastir, Tunisia
|
498
|
Improving disease gene prioritization by comparing the semantic similarity of phenotypes in mice with those of human diseases. PLoS One 2012; 7:e38937. [PMID: 22719993 PMCID: PMC3375301 DOI: 10.1371/journal.pone.0038937] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 12/08/2011] [Accepted: 05/16/2012] [Indexed: 12/14/2022] Open
Abstract
Despite considerable progress in understanding the molecular origins of hereditary human diseases, the molecular basis of several thousand genetic diseases still remains unknown. High-throughput phenotype studies are underway to systematically assess the phenotype outcome of targeted mutations in model organisms. Thus, comparing the similarity between experimentally identified phenotypes and the phenotypes associated with human diseases can be used to suggest causal genes underlying a disease. In this manuscript, we present a method for disease gene prioritization based on comparing phenotypes of mouse models with those of human diseases. For this purpose, either human disease phenotypes are “translated” into a mouse-based representation (using the Mammalian Phenotype Ontology), or mouse phenotypes are “translated” into a human-based representation (using the Human Phenotype Ontology). We apply a measure of semantic similarity and rank experimentally identified phenotypes in mice with respect to their phenotypic similarity to human diseases. Our method is evaluated on manually curated and experimentally verified gene–disease associations for human and for mouse. We evaluate our approach using a Receiver Operating Characteristic (ROC) analysis and obtain an area under the ROC curve of up to . Furthermore, we are able to confirm previous results that the Vax1 gene is involved in Septo-Optic Dysplasia and suggest Gdf6 and Marcks as further potential candidates. Our method significantly outperforms previous phenotype-based approaches of prioritizing gene–disease associations. To enable the adaption of our method to the analysis of other phenotype data, our software and prioritization results are freely available under a BSD licence at http://code.google.com/p/phenomeblast/wiki/CAMP. Furthermore, our method has been integrated in PhenomeNET and the results can be explored using the PhenomeBrowser at http://phenomebrowser.net.
|
499
|
Doncheva NT, Kacprowski T, Albrecht M. Recent approaches to the prioritization of candidate disease genes. WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE 2012; 4:429-42. [PMID: 22689539 DOI: 10.1002/wsbm.1177] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Indexed: 11/10/2022]
Abstract
Many efforts are still devoted to the discovery of genes involved with specific phenotypes, in particular, diseases. High-throughput techniques are thus applied frequently to detect dozens or even hundreds of candidate genes. However, the experimental validation of many candidates is often an expensive and time-consuming task. Therefore, a great variety of computational approaches has been developed to support the identification of the most promising candidates for follow-up studies. The biomedical knowledge already available about the disease of interest and related genes is commonly exploited to find new gene-disease associations and to prioritize candidates. In this review, we highlight recent methodological advances in this research field of candidate gene prioritization. We focus on approaches that use network information and integrate heterogeneous data sources. Furthermore, we discuss current benchmarking procedures for evaluating and comparing different prioritization methods.
|
500
|
Proteomic and protein interaction network analysis of human T lymphocytes during cell-cycle entry. Mol Syst Biol 2012; 8:573. [PMID: 22415777 PMCID: PMC3321526 DOI: 10.1038/msb.2012.5] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 07/25/2011] [Accepted: 01/30/2012] [Indexed: 12/23/2022] Open
Abstract
Proteomic analysis of T cells emerging from quiescence identifies dynamic network-level changes in key cellular processes. Disruption of two such processes, ribosome biogenesis and RNA splicing, reveals that the programs controlling cell growth and cell-cycle entry are separable.
The authors conduct a proteomic and protein interaction network analysis of human T lymphocytes during entry into the first cell cycle. Inhibiting the induction of eIF6 (60S ribosome biogenesis) causes T cells to enter the cell cycle without growing in size. Inhibiting the induction of SF3B2/SF3B4 (U2/U12-dependent RNA splicing) allows an increase in cell size without entering the cell cycle. These results provide proof of principle that blastogenesis and proliferation programs are separable in primary human T cells.
Regulating the transition of cells such as T lymphocytes from quiescence (G0) into an activated, proliferating state involves initiation of cellular programs resulting in entry into the cell cycle (proliferation), the growth cycle (blastogenesis, cell size) and effector (functional) activation. We show the first proteomic analysis of protein interaction networks activated during entry into the first cell cycle from G0. We also provide proof of principle that blastogenesis and proliferation programs are separable in primary human T cells. We employed a proteomic profiling method to identify large-scale changes in chromatin/nuclear matrix-bound and unbound proteins in human T lymphocytes during the transition from G0 into the first cell cycle and mapped them to form functionally annotated, dynamic protein interaction networks. Inhibiting the induction of two proteins involved in two of the most significantly upregulated cellular processes, ribosome biogenesis (eIF6) and hnRNA splicing (SF3B2/SF3B4), showed, respectively, that human T cells can enter the cell cycle without growing in size, or increase in size without entering the cell cycle.
|