51
Hirbo J, Eidem H, Rokas A, Abbot P. Integrating Diverse Types of Genomic Data to Identify Genes that Underlie Adverse Pregnancy Phenotypes. PLoS One 2015; 10:e0144155. [PMID: 26641094 PMCID: PMC4671692 DOI: 10.1371/journal.pone.0144155] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/24/2015] [Accepted: 11/14/2015] [Indexed: 11/18/2022]
Abstract
Progress in understanding complex genetic diseases has been bolstered by synthetic approaches that overlay diverse data types and analyses to identify functionally important genes. Preterm birth (PTB), a major complication of pregnancy, is a leading cause of infant mortality worldwide. A major obstacle in addressing PTB is that the mechanisms controlling parturition and birth timing remain poorly understood. Integrative approaches that overlay datasets derived from comparative genomics with functionally derived ones have the potential to advance our understanding of the genetics of birth timing, and thus to provide insights into the genes that may contribute to PTB. We intersected data on fast-evolving coding and non-coding gene regions in the human and primate lineages with data on genes expressed in the placenta, genes whose expression is enriched only in the placenta, and genes that are differentially expressed in four distinct PTB clinical subtypes. A large fraction (23–34%) of genes that are expressed in the placenta and differentially expressed in PTB clinical subtypes are fast evolving, and they are associated with functions that include adhesion, neurodevelopmental, and immune processes. The functional categories of genes showing fast evolution in coding regions differ from those linked to fast evolution in non-coding regions. Finally, there is surprisingly little overlap among the fast-evolving genes that are differentially expressed in the four PTB clinical subtypes. Integrative approaches, especially those that incorporate evolutionary perspectives, can successfully identify potential genetic contributions to complex genetic diseases such as PTB.
Affiliation(s)
- Jibril Hirbo
- Department of Biological Sciences, Vanderbilt University, Box 35164 Station B, Nashville, TN, 37235–1634, United States of America
- Haley Eidem
- Department of Biological Sciences, Vanderbilt University, Box 35164 Station B, Nashville, TN, 37235–1634, United States of America
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Box 35164 Station B, Nashville, TN, 37235–1634, United States of America
- Patrick Abbot
- Department of Biological Sciences, Vanderbilt University, Box 35164 Station B, Nashville, TN, 37235–1634, United States of America
52
MacNeil SM, Johnson WE, Li DY, Piccolo SR, Bild AH. Inferring pathway dysregulation in cancers from multiple types of omic data. Genome Med 2015; 7:61. [PMID: 26170901 PMCID: PMC4499940 DOI: 10.1186/s13073-015-0189-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 10/03/2014] [Accepted: 06/16/2015] [Indexed: 11/10/2022]
Abstract
Although in some cases individual genomic aberrations may drive disease development in isolation, a complex interplay among multiple aberrations is common. Accordingly, we developed Gene Set Omic Analysis (GSOA), a bioinformatics tool that can evaluate multiple types and combinations of omic data at the pathway level. GSOA uses machine learning to identify dysregulated pathways and improves upon other methods because of its ability to decipher complex, multigene patterns. We compare GSOA to alternative methods and demonstrate its ability to identify pathways known to play a role in various cancer phenotypes. Software implementing the GSOA method is freely available from https://bitbucket.org/srp33/gsoa.
Affiliation(s)
- Shelley M MacNeil
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT USA
- Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, UT USA
- William E Johnson
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT USA
- Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA USA
- Dean Y Li
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT USA
- Department of Medicine, University of Utah, Salt Lake City, UT USA
- Department of Human Genetics, University of Utah, Salt Lake City, UT USA
- Stephen R Piccolo
- Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, UT USA
- Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA USA
- Department of Biology, Brigham Young University, Provo, UT USA
- Andrea H Bild
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT USA
- Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, UT USA
53
Hass J, Walton E, Wright C, Beyer A, Scholz M, Turner J, Liu J, Smolka MN, Roessner V, Sponheim SR, Gollub RL, Calhoun VD, Ehrlich S. Associations between DNA methylation and schizophrenia-related intermediate phenotypes - a gene set enrichment analysis. Prog Neuropsychopharmacol Biol Psychiatry 2015; 59:31-39. [PMID: 25598502 PMCID: PMC4346504 DOI: 10.1016/j.pnpbp.2015.01.006] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 09/23/2014] [Revised: 01/06/2015] [Accepted: 01/13/2015] [Indexed: 12/18/2022]
Abstract
Multiple genetic approaches have identified microRNAs as key effectors in psychiatric disorders because they post-transcriptionally regulate the expression of thousands of target genes. However, their role in specific psychiatric diseases remains poorly understood. In addition, epigenetic mechanisms such as DNA methylation, which affect the expression of both microRNAs and coding genes, are critical for understanding the molecular mechanisms of schizophrenia. Using clinical, imaging, genetic, and epigenetic data from 103 patients with schizophrenia and 111 healthy controls in the Mind Clinical Imaging Consortium (MCIC) study of schizophrenia, we conducted a gene set enrichment analysis to identify markers for schizophrenia-associated intermediate phenotypes. Genes were ranked by the correlation between their DNA methylation patterns and each phenotype, and the ranked lists were then tested for enrichment in 221 predicted microRNA target gene sets. The predicted hsa-miR-219a-5p target gene set was significantly enriched for genes (EPHA4, PKNOX1, and ESR1, among others) whose methylation status is correlated with hippocampal volume independent of disease status. These results were strengthened by significant associations between hsa-miR-219a-5p target gene methylation patterns and hippocampus-related neuropsychological variables. Ingenuity Pathway Analysis (IPA) of the predicted hsa-miR-219a-5p target genes revealed associated network functions in behavior and developmental disorders. Altered methylation patterns of predicted hsa-miR-219a-5p target genes are associated with a structural brain aberration that has been proposed as a possible biomarker for schizophrenia. The (dys)regulation of microRNA target genes by epigenetic mechanisms may confer additional risk for developing psychiatric symptoms. Further study is needed to understand possible interactions between microRNAs and epigenetic changes and their impact on the risk for brain-based disorders such as schizophrenia.
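The rank-then-test procedure described above can be sketched with an unweighted Kolmogorov-Smirnov-style running-sum score. This is a minimal illustration, not the authors' pipeline: it assumes Pearson correlation as the ranking metric and the classic unweighted running-sum statistic, because the abstract specifies neither the correlation measure nor the enrichment statistic. The gene names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_genes(methylation, phenotype):
    """Rank genes from most positive to most negative methylation-phenotype correlation."""
    scores = {gene: pearson(values, phenotype) for gene, values in methylation.items()}
    return sorted(scores, key=scores.get, reverse=True)

def enrichment_score(ranked, gene_set):
    """Unweighted running-sum enrichment score (signed maximum deviation from zero).

    Walking down the ranked list, the sum steps up by 1/n_hit at each gene in the
    set and down by 1/n_miss at each gene outside it.
    """
    hits = [gene in gene_set for gene in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical toy data: methylation of four genes across five subjects, plus a phenotype.
phenotype = [1.0, 2.0, 3.0, 4.0, 5.0]
methylation = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 6.0],
    "GENE_B": [1.0, 2.0, 3.0, 5.0, 4.0],
    "GENE_C": [5.0, 1.0, 4.0, 2.0, 3.0],
    "GENE_D": [5.0, 4.0, 3.0, 2.0, 1.0],
}
ranked = rank_genes(methylation, phenotype)
print(ranked)                                          # ['GENE_A', 'GENE_B', 'GENE_C', 'GENE_D']
print(enrichment_score(ranked, {"GENE_A", "GENE_B"}))  # 1.0
```

In practice the observed score would be compared against a null distribution, for example one obtained by permuting the phenotype; that step is omitted here for brevity.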
Affiliation(s)
- Johanna Hass
- Translational Developmental Neuroscience Section, Department of Child and Adolescent Psychiatry, Faculty of Medicine of the TU Dresden, Dresden, Germany
- Esther Walton
- Translational Developmental Neuroscience Section, Department of Child and Adolescent Psychiatry, Faculty of Medicine of the TU Dresden, Dresden, Germany
- Carrie Wright
- Department of Neurosciences, Health Sciences Center, University of New Mexico, Albuquerque, NM, USA; The Mind Research Network, Albuquerque, NM USA
- Andreas Beyer
- Cellular Networks and Systems Biology, Biotechnology Center, TU Dresden, Dresden, Germany; University of Cologne, CECAD, Cologne, Germany
- Markus Scholz
- Institute for Medical Informatics, Statistics and Epidemiology (IMISE), University of Leipzig, Leipzig, Germany; LIFE (Leipzig Interdisciplinary Research Cluster of Genetic Factors, Phenotypes and Environment), University of Leipzig, Leipzig, Germany
- Jessica Turner
- The Mind Research Network, Albuquerque, NM USA; Psychology Department, University of New Mexico, Albuquerque, NM, USA
- Jingyu Liu
- The Mind Research Network, Albuquerque, NM USA; Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM USA
- Michael N. Smolka
- Department of Psychiatry, Faculty of Medicine of the TU Dresden, Dresden, Germany
- Veit Roessner
- Translational Developmental Neuroscience Section, Department of Child and Adolescent Psychiatry, Faculty of Medicine of the TU Dresden, Dresden, Germany
- Scott R. Sponheim
- Department of Psychiatry and the Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN USA
- Randy L. Gollub
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA USA; MGH/MIT/HMS Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA USA
- Vince D. Calhoun
- The Mind Research Network, Albuquerque, NM USA; Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM USA
- Stefan Ehrlich
- Translational Developmental Neuroscience Section, Department of Child and Adolescent Psychiatry, Faculty of Medicine, TU Dresden, Dresden, Germany; Department of Psychiatry, Massachusetts General Hospital, Boston, MA USA; MGH/MIT/HMS Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA USA
54
Priyadarshi S, Ray CS, Biswal NC, Nayak SR, Panda KC, Desai A, Ramchander PV. Genetic association and altered gene expression of osteoprotegerin in otosclerosis patients. Ann Hum Genet 2015; 79:225-37. [PMID: 25998045 DOI: 10.1111/ahg.12118] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/22/2015] [Accepted: 04/08/2015] [Indexed: 12/14/2022]
Abstract
Otosclerosis (OTSC) is a late-onset hearing disorder characterized by increased bone turnover in the otic capsule. Disturbed osteoprotegerin (OPG) expression has been found in otosclerotic foci and may play an important role in the pathogenesis of OTSC. To identify genetic risk factors, we sequenced the coding region and exon-intron boundaries of the OPG gene in 254 OTSC patients and 262 controls. Sequence analysis identified five known polymorphisms: c.9C>G, c.30+15C>T, c.400+4C>T, c.768A>G, and c.817+8A>C. Association testing of these SNPs revealed sex-specific associations, with c.9C>G in males and c.30+15C>T in females, after correction for multiple testing. Furthermore, a meta-analysis provided evidence of association of the c.9C>G polymorphism with OTSC. In a secondary analysis, we investigated the mRNA expression of OPG and the associated genes RANK and RANKL in otosclerotic tissues compared with controls. Expression analysis revealed absent or significantly reduced OPG expression only in otosclerotic tissues. However, the signal sequence polymorphism c.9C>G showed no effect on OPG mRNA expression. In conclusion, our results suggest that the risk of OTSC is influenced by variations in the OPG gene, along with other factors that might regulate its altered expression in otosclerotic tissues. Further research is warranted to elucidate the mechanisms underlying these observations.
Affiliation(s)
- Saurabh Priyadarshi
- Institute of Life Sciences, Nalco Square, Chandrasekharpur, Bhubaneswar, India
- Chinmay Sundar Ray
- Department of Ear, Nose, and Throat (ENT), Shrirama Chandra Bhanj (SCB) Medical College, Cuttack, India
- Narayan Chandra Biswal
- Department of Ear, Nose, and Throat (ENT), Shrirama Chandra Bhanj (SCB) Medical College, Cuttack, India
- Soumya Ranjan Nayak
- Department of Forensic Medicine & Toxicology (FMT), Shrirama Chandra Bhanj (SCB) Medical College, Cuttack, India
- Ashim Desai
- Dr. ABR Desai Ear, Nose and Throat (ENT) Clinic and Research Centre, Mumbai, India
55
Huang YT, Liang L, Moffatt MF, Cookson WOCM, Lin X. iGWAS: Integrative Genome-Wide Association Studies of Genetic and Genomic Data for Disease Susceptibility Using Mediation Analysis. Genet Epidemiol 2015; 39:347-56. [PMID: 25997986 DOI: 10.1002/gepi.21905] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 02/03/2015] [Revised: 03/23/2015] [Accepted: 04/07/2015] [Indexed: 12/20/2022]
Abstract
Genome-wide association studies (GWAS) have been standard practice for identifying single nucleotide polymorphisms (SNPs) associated with disease susceptibility. We propose a new approach, termed integrative GWAS (iGWAS), that exploits gene expression information to investigate the mechanisms by which SNPs are associated with a disease phenotype, and that incorporates the family-based design for genetic association studies. Specifically, the relations among SNPs, gene expression, and disease are modeled within the mediation analysis framework, which allows us to decompose the genetic effect on a disease phenotype into two parts: an effect mediated through gene expression (mediation effect, ME) and an effect through other biological or environment-mediated mechanisms (alternative effect, AE). We develop omnibus tests for the ME and AE that are robust to the underlying true disease model. Numerical studies show that the iGWAS approach facilitates the discovery of genetic association mechanisms and outperforms the SNP-only method for testing genetic associations. We conduct a family-based iGWAS of childhood asthma that integrates genetic and genomic data. Using the omnibus test at a false discovery rate below 1%, the iGWAS approach identifies six novel susceptibility genes (MANEA, MRPL53, LYCAT, ST8SIA4, NDFIP1, and PTCH1), whereas no gene survives the same cutoff in SNP-only analyses. The iGWAS analyses further show that the genetic effects of these genes are mostly mediated through their expression. In summary, the iGWAS approach provides a new analytic framework for investigating the mechanisms of genetic etiology and identifies biologically meaningful novel susceptibility genes for childhood asthma.
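The split of a SNP's effect into a mediated part (ME) and an alternative part (AE) can be illustrated with the classic linear product-of-coefficients decomposition. This is a minimal sketch assuming a continuous outcome, one SNP coded 0/1/2, a single expression mediator, unrelated individuals, and ordinary least squares; it is not the iGWAS omnibus test, which the abstract describes as robust to the true disease model and designed for family-based data. All data are simulated with hypothetical effect sizes.

```python
import random

def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def slope(y, x):
    """OLS slope of y on x (intercept handled by centering)."""
    xc, yc = _center(x), _center(y)
    return _dot(xc, yc) / _dot(xc, xc)

def slopes2(y, x1, x2):
    """OLS slopes of y on (x1, x2) with intercept, from the 2x2 normal equations."""
    y, x1, x2 = _center(y), _center(x1), _center(x2)
    s11, s22, s12 = _dot(x1, x1), _dot(x2, x2), _dot(x1, x2)
    s1y, s2y = _dot(x1, y), _dot(x2, y)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def decompose_effect(snp, expr, outcome):
    """Product-of-coefficients decomposition of a SNP's effect on the outcome."""
    a = slope(expr, snp)                # SNP -> expression
    b, c = slopes2(outcome, expr, snp)  # expression -> outcome, and SNP -> outcome given expression
    return {"ME": a * b, "AE": c, "total": slope(outcome, snp)}

# Simulated data (hypothetical effect sizes): expression is partly driven by the SNP.
rng = random.Random(42)
snp = [rng.choice([0, 1, 2]) for _ in range(2000)]
expr = [0.8 * s + rng.gauss(0.0, 1.0) for s in snp]
outcome = [0.5 * e + 0.3 * s + rng.gauss(0.0, 1.0) for e, s in zip(expr, snp)]

effects = decompose_effect(snp, expr, outcome)
print(effects)  # ME should be near 0.8 * 0.5 = 0.4 and AE near 0.3
```

In this linear OLS setting, ME + AE equals the total-effect slope exactly (up to rounding), which makes the decomposition easy to check; the harder problem the abstract addresses is testing these components without assuming the true disease model.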
Affiliation(s)
- Yen-Tsung Huang
- Departments of Epidemiology and Biostatistics, Brown University, Providence, Rhode Island, United States of America
- Liming Liang
- Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Miriam F Moffatt
- National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Xihong Lin
- Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
56
Kang M, Kim DC, Liu C, Gao J. Multiblock discriminant analysis for integrative genomic study. Biomed Res Int 2015; 2015:783592. [PMID: 26075260 PMCID: PMC4450020 DOI: 10.1155/2015/783592] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/09/2014] [Accepted: 04/21/2015] [Indexed: 12/27/2022]
Abstract
Human diseases are abnormal medical conditions in which multiple biological components interact in complex ways. Nevertheless, most research has relied on a single type of genetic data, such as single nucleotide polymorphisms (SNPs) or copy number variations (CNVs). To fully understand complex human diseases, epigenetic modifications and transcriptional regulation must be considered alongside genomic variants. We call a collection of multiple heterogeneous data types "multiblock data." In this paper, we propose a novel Multiblock Discriminant Analysis (MultiDA) method that provides a new integrative genomic model for multiblock analysis and an efficient algorithm for discriminant analysis. The integrative genomic model is built from representative genomic data types, including SNP, CNV, DNA methylation, and gene expression data. The discriminant analysis algorithm identifies discriminative factors in the multiblock data, a step that is essential for biomarker discovery in computational biology. The performance of MultiDA was assessed in extensive simulation experiments, in which it outperformed related methods. As a target application, we applied MultiDA to human brain data from psychiatric disorders, and we discuss the findings and the gene regulatory network derived from this experiment.
Affiliation(s)
- Mingon Kang
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA
- Dong-Chul Kim
- Department of Computer Science, University of Texas-Pan American, Edinburg, TX 78539, USA
- Chunyu Liu
- Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612, USA
- Jean Gao
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA
57
Yang XH, Li M, Wang B, Zhu W, Desgardin A, Onel K, de Jong J, Chen J, Chen L, Cunningham JM. Systematic computation with functional gene-sets among leukemic and hematopoietic stem cells reveals a favorable prognostic signature for acute myeloid leukemia. BMC Bioinformatics 2015; 16:97. [PMID: 25887548 PMCID: PMC4376348 DOI: 10.1186/s12859-015-0510-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 08/01/2014] [Accepted: 02/24/2015] [Indexed: 12/16/2022]
Abstract
Background: Genes that regulate stem cell function are suspected to adversely affect prognosis in malignancy. However, diverse cancer stem cell signatures are difficult for physicians to interpret and apply clinically. To connect the transcriptome with stem cell biology, with potential clinical applications, we propose a novel computational “gene-to-function, snapshot-to-dynamics, and biology-to-clinic” framework to uncover core functional gene-set signatures. This framework incorporates three function-centric gene-set analysis strategies: a meta-analysis of both microarray and RNA-seq data, novel dynamic network mechanism (DNM) identification, and a personalized prognostic indicator analysis. This work uses the complex disease acute myeloid leukemia (AML) as a research platform.
Results: We introduced an adjustable “soft threshold” into a functional gene-set algorithm and found that two different analysis methods identified distinct gene-set signatures from the same samples. In parallel, we identified a 30-gene cluster that characterizes leukemic stem cell (LSC)-depleted cells and a 25-gene cluster that characterizes LSC-enriched cells; both mark favorable prognosis in AML. Genes within each signature significantly share common biological processes and/or molecular functions (empirical p = 6e-5 and 0.03, respectively). The 25-gene signature reflects the abnormal development of stem cells in AML, such as AURKA over-expression. We subsequently determined that the clinical relevance of both signatures is independent of known clinical risk classifications in 214 patients with cytogenetically normal AML. We validated the prognostic value of both signatures in two independent cohorts of 91 and 242 patients, respectively (log-rank p < 0.0015 and 0.05; empirical p < 0.015 and 0.08).
Conclusion: The proposed algorithms and computational framework can advance systems biology research because they efficiently translate gene-sets (rather than single genes) into biological discoveries about AML and other complex diseases.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0510-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xinan Holly Yang
- Department of Pediatrics, and Comer Children's Hospital, Section of Hematology/Oncology, The University of Chicago, 900 East 57th Street, KCBD Room 5121, Chicago, Illinois, 60637, USA.
- Meiyi Li
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.
- Bin Wang
- Department of Pediatrics, and Comer Children's Hospital, Section of Hematology/Oncology, The University of Chicago, 900 East 57th Street, KCBD Room 5121, Chicago, Illinois, 60637, USA.
- Wanqi Zhu
- Laboratory Schools, The University of Chicago, Chicago, USA.
- Aurelie Desgardin
- Department of Pediatrics, and Comer Children's Hospital, Section of Hematology/Oncology, The University of Chicago, 900 East 57th Street, KCBD Room 5121, Chicago, Illinois, 60637, USA.
- Kenan Onel
- Department of Pediatrics, and Comer Children's Hospital, Section of Hematology/Oncology, The University of Chicago, 900 East 57th Street, KCBD Room 5121, Chicago, Illinois, 60637, USA.
- Jill de Jong
- Department of Pediatrics, and Comer Children's Hospital, Section of Hematology/Oncology, The University of Chicago, 900 East 57th Street, KCBD Room 5121, Chicago, Illinois, 60637, USA.
- Jianjun Chen
- Department of Medicine, The University of Chicago, Chicago, USA.
- Luonan Chen
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.
- John M Cunningham
- Department of Pediatrics, and Comer Children's Hospital, Section of Hematology/Oncology, The University of Chicago, 900 East 57th Street, KCBD Room 5121, Chicago, Illinois, 60637, USA.
58
Taskesen E, Babaei S, Reinders MMJ, de Ridder J. Integration of gene expression and DNA-methylation profiles improves molecular subtype classification in acute myeloid leukemia. BMC Bioinformatics 2015; 16 Suppl 4:S5. [PMID: 25734246 PMCID: PMC4347619 DOI: 10.1186/1471-2105-16-s4-s5] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 12/02/2022]
Abstract
Background: Acute myeloid leukemia (AML) is characterized by various cytogenetic and molecular abnormalities. Detecting these abnormalities is important for the risk classification of patients but requires laborious experimentation. Various studies have shown that gene expression profiles (GEP), and the gene signatures derived from them, can be used to predict AML subtypes. Similarly, successful prediction has been achieved using DNA-methylation profiles (DMP). However, no studies have compared classification accuracy and performance between GEP and DMP, nor have any integrated both data types to determine whether predictive power can be improved.
Approach: We used 344 well-characterized AML samples for which both gene expression and DNA-methylation profiles are available. We created three classification strategies (early integration, late integration, and no integration of these datasets) and used them to predict AML subtypes with a logistic regression model with Lasso regularization.
Results: Both gene expression and DNA-methylation profiles contain distinct patterns that help discriminate AML subtypes, and an integration strategy can exploit these patterns to achieve synergy between the two data types. Concatenating the features of both datasets, i.e. early integration, improves predictive power compared with classifiers trained on GEP or DMP alone. A more sophisticated approach, the late integration strategy, employs a two-layer classifier that outperforms the early integration strategy.
Conclusion: Prediction of known cytogenetic and molecular abnormalities in AML can be further improved by integrating GEP and DMP.
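The difference between early and late integration can be sketched as follows: early integration concatenates each sample's GEP and DMP features and fits one classifier, whereas late integration fits one classifier per data type and feeds their predicted probabilities to a second-layer classifier. This is a minimal pure-Python sketch under stated assumptions: a binary subtype label, Lasso-penalized logistic regression fitted by proximal gradient descent (ISTA), fixed hyperparameters, and toy data. The authors' feature sets, regularization tuning, and exact second-layer design are not given in the abstract.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(x, t):
    """Proximal step for the L1 (Lasso) penalty."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def fit_l1_logistic(X, y, lam=0.01, lr=0.5, n_iter=2000):
    """Lasso-penalized logistic regression by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * ||w||_1; the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [soft_threshold(wj - lr * gj / n, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

def predict_proba(model, X):
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def early_integration(gep, dmp, y):
    """Concatenate GEP and DMP features per sample and fit a single classifier."""
    X = [g + d for g, d in zip(gep, dmp)]
    return predict_proba(fit_l1_logistic(X, y), X)

def late_integration(gep, dmp, y):
    """Fit one classifier per data type, then a second-layer classifier on their outputs.

    For brevity the first-layer outputs are in-sample; a real two-layer setup
    should feed cross-validated predictions to the second layer to avoid leakage.
    """
    p_gep = predict_proba(fit_l1_logistic(gep, y), gep)
    p_dmp = predict_proba(fit_l1_logistic(dmp, y), dmp)
    meta = [[a, c] for a, c in zip(p_gep, p_dmp)]
    return predict_proba(fit_l1_logistic(meta, y), meta)

# Hypothetical toy data: 8 samples, 2 GEP features and 2 DMP features, binary subtype label.
gep = [[2.0, 0.1], [1.5, -0.2], [1.8, 0.0], [2.2, 0.3],
       [-2.0, 0.2], [-1.6, -0.1], [-1.9, 0.1], [-2.1, -0.3]]
dmp = [[1.0, -0.5], [0.8, 0.2], [1.2, 0.0], [0.9, -0.1],
       [-1.1, 0.3], [-0.7, 0.0], [-1.0, -0.2], [-0.9, 0.1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

early = early_integration(gep, dmp, y)
late = late_integration(gep, dmp, y)
print([round(p, 2) for p in early])
print([round(p, 2) for p in late])
```

The design choice worth noting is that late integration lets each data type keep its own sparsity pattern before the second layer weighs the two sources, while early integration forces a single Lasso path over the combined feature space.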
59
Lin D, Zhang J, Li J, He H, Deng HW, Wang YP. Integrative analysis of multiple diverse omics datasets by sparse group multitask regression. Front Cell Dev Biol 2014; 2:62. [PMID: 25364766 PMCID: PMC4209817 DOI: 10.3389/fcell.2014.00062] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 06/21/2014] [Accepted: 10/01/2014] [Indexed: 01/10/2023]
Abstract
A variety of high-throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have had a remarkable impact on identifying susceptibility biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies across different genetic levels and platforms promises to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, sparse group multitask regression, for integrating diverse omics datasets, platforms, and populations to identify risk genes or factors of complex diseases. This method combines multitask learning with sparse group regularization, which (1) treats biomarker identification in each single study as a task and combines the tasks by multitask learning; (2) groups variables from all studies to identify significant genes; and (3) enforces a sparsity constraint on groups of variables to overcome the "small sample, but large variables" problem. We introduce two sparse group penalties, sparse group lasso and sparse group ridge, into our multitask model and provide an effective algorithm for each model. In addition, we propose a significance test for identifying potential risk genes. Two simulation studies compare the performance of our integrative method with that of a conventional meta-analysis method, and the results show that our sparse group multitask method significantly outperforms meta-analysis. In an application to our osteoporosis studies, our method identified 7 significant genes, which were found to have significant effects in three other independent studies used for validation. The most significant gene, SOD2, was identified in our previous osteoporosis study involving the same expression dataset. Several other genes, such as TREML2, HTR1E, and GLO1, are shown to be novel susceptibility genes for osteoporosis, as confirmed by other studies.
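A sparse group lasso penalty of the kind described above is usually handled in proximal-gradient solvers by a two-stage shrinkage: an elementwise soft-threshold (the Lasso part) followed by a groupwise scaling that can zero out an entire group (the group part). The sketch below shows that proximal step for a penalty of the assumed standard form lam1 * ||w||_1 + lam2 * sum_g sqrt(|g|) * ||w_g||_2. It does not reproduce the authors' penalty weights, their sparse group ridge variant, or their multitask solver; grouping one gene's coefficients across studies is an illustrative assumption, and the numbers are hypothetical.

```python
import math

def soft_threshold(x, t):
    """Elementwise Lasso shrinkage."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def prox_sparse_group_lasso(w, groups, step, lam1, lam2):
    """Proximal operator of step * (lam1 * ||w||_1 + lam2 * sum_g sqrt(|g|) * ||w_g||_2).

    w      : coefficient vector (list of floats)
    groups : list of index lists that partition w; here each group is assumed to
             hold one gene's coefficients across all studies (illustrative choice)
    """
    out = [0.0] * len(w)
    for idx in groups:
        # Stage 1: Lasso shrinkage of each coefficient in the group.
        u = [soft_threshold(w[i], step * lam1) for i in idx]
        norm = math.sqrt(sum(v * v for v in u))
        # Stage 2: group shrinkage; zeroes the whole group when its norm is below the threshold.
        thresh = step * lam2 * math.sqrt(len(idx))
        scale = max(0.0, 1.0 - thresh / norm) if norm > 0.0 else 0.0
        for i, v in zip(idx, u):
            out[i] = scale * v
    return out

# Hypothetical example: two genes, each with one coefficient from each of two studies.
w = [3.5, -4.5, 0.9, -0.8]
groups = [[0, 1], [2, 3]]
print(prox_sparse_group_lasso(w, groups, step=1.0, lam1=0.5, lam2=1.0))
# The first gene is shrunk but kept; the second gene is dropped from both studies at once.
```

A proximal-gradient solver would alternate a gradient step on the multitask loss with this operator; applying the soft-threshold first and the group scaling second is the standard closed form for the sparse group lasso proximal step.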
Affiliation(s)
- Dongdong Lin
- Biomedical Engineering Department, Tulane University, New Orleans, LA, USA; Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- Jigang Zhang
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Jingyao Li
- Biomedical Engineering Department, Tulane University, New Orleans, LA, USA; Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- Hao He
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Hong-Wen Deng
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Yu-Ping Wang
- Biomedical Engineering Department, Tulane University, New Orleans, LA, USA; Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
60
Huang YT. Integrative modeling of multi-platform genomic data under the framework of mediation analysis. Stat Med 2014; 34:162-78. [PMID: 25316269 DOI: 10.1002/sim.6326] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 02/05/2014] [Revised: 07/02/2014] [Accepted: 09/22/2014] [Indexed: 12/24/2022]
Abstract
Given the growing availability of genomic data, there has been increasing interest in integrating multi-platform data. Here, we propose to model genetic (single nucleotide polymorphism, SNP), epigenetic (DNA methylation), and gene expression data as a biological process that delineates phenotypic traits under the framework of causal mediation modeling. We propose a regression model for the joint effect of SNPs, methylation, gene expression, and their nonlinear interactions on the outcome, and we develop a variance component score test for any arbitrary set of regression coefficients. Under the null hypothesis, the test statistic follows a mixture of chi-square distributions, which can be approximated using a characteristic function inversion method or a perturbation procedure. We construct tests for candidate models determined by different combinations of SNPs, DNA methylation, gene expression, and interactions, and we further propose an omnibus test to accommodate different models. We then study three path-specific effects: the direct effect of SNPs on the outcome, the effect mediated through expression, and the effect mediated through methylation. We characterize the correspondence between these three path-specific effects and the coefficients in the regression model, which is influenced by the causal relations among SNPs, DNA methylation, and gene expression. We illustrate the utility of our method in two genomic studies and in numerical simulations.
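The null distribution mentioned above, a weighted sum Q = sum_j w_j * Z_j^2 of independent one-degree-of-freedom chi-square variables (commonly called a mixture of chi-squares in this setting), generally has no simple closed form. As a swapped-in illustration only, its tail probability can be estimated by plain Monte Carlo simulation. This is neither the characteristic function inversion approach (of which Davies' algorithm is a well-known example) nor the perturbation procedure referred to in the abstract, and the weights in the example are hypothetical stand-ins for the eigenvalues that such score tests typically produce.

```python
import math
import random

def mixture_chisq_pvalue(q, weights, n_sim=50_000, seed=0):
    """Monte Carlo estimate of P(Q >= q) for Q = sum_j weights[j] * Z_j**2, Z_j iid N(0, 1).

    Each Z_j**2 is a chi-square variable with 1 degree of freedom, so Q is a
    weighted sum of independent chi-square(1) variables.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        stat = sum(wt * rng.gauss(0.0, 1.0) ** 2 for wt in weights)
        if stat >= q:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_sim + 1)

# Sanity check: with weights [1, 1], Q is chi-square(2), whose exact tail is exp(-q / 2).
print(mixture_chisq_pvalue(2.0, [1.0, 1.0]), math.exp(-1.0))
# Hypothetical weights, standing in for eigenvalues from a variance component score test.
print(mixture_chisq_pvalue(5.0, [2.0, 0.5, 0.25]))
```

Monte Carlo is simple but loses precision for very small p-values, which is one reason exact inversion methods are preferred when many tests must be corrected for multiplicity.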
Affiliation(s)
- Yen-Tsung Huang
- Department of Epidemiology, Brown University, 121 S. Main St., Box G-S121-2, Providence, RI, 02912, U.S.A
61
GSAASeqSP: a toolset for gene set association analysis of RNA-Seq data. Sci Rep 2014; 4:6347. [PMID: 25213199 PMCID: PMC4161965 DOI: 10.1038/srep06347] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Received: 05/06/2014] [Accepted: 08/19/2014] [Indexed: 12/11/2022]
Abstract
RNA-Seq is quickly becoming the preferred method for comprehensively characterizing whole-transcriptome activity, and the analysis of RNA-Seq count data requires new computational tools. We developed GSAASeqSP, a novel toolset for genome-wide gene set association analysis of sequence count data. This toolset offers a variety of statistical procedures via combinations of multiple gene-level and gene set-level statistics, each with its own strengths under different sample and experimental conditions. These methods can be employed independently, or the results of several or all methods can be integrated to obtain more robust profiles of significantly altered biological pathways. Using simulations, we demonstrate the ability of these methods to identify association signals and to measure the strength of association. We show that GSAASeqSP analyses of RNA-Seq data from diverse tissue samples provide meaningful insights into the biological mechanisms that differentiate these samples. GSAASeqSP is a powerful platform for investigating the molecular underpinnings of complex traits and diseases arising from differential activity within biological pathways. GSAASeqSP is available at http://gsaa.unc.edu.
62
Pharmacogenomic characterization of gemcitabine response--a framework for data integration to enable personalized medicine. Pharmacogenet Genomics 2014; 24:81-93. [PMID: 24401833 PMCID: PMC3888473 DOI: 10.1097/fpc.0000000000000015] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
Objectives: Response to the oncology drug gemcitabine may be variable in part due to genetic differences in the enzymes and transporters responsible for its metabolism and disposition. The aim of our in-silico study was to identify gene variants significantly associated with gemcitabine response that may help to personalize treatment in the clinic. Methods: We analyzed two independent data sets: (a) genotype data from NCI-60 cell lines using the Affymetrix DMET 1.0 platform combined with gemcitabine cytotoxicity data in those cell lines, and (b) genome-wide association studies (GWAS) data from 351 pancreatic cancer patients treated on an NCI-sponsored phase III clinical trial. We also performed a subset analysis on the GWAS data set for 135 patients who were given gemcitabine+placebo. Statistical and systems biology analyses were performed on each individual data set to identify biomarkers significantly associated with gemcitabine response. Results: Genetic variants in the ABC transporters (ABCC1, ABCC4), the CYP4 family members CYP4F8 and CYP4F12, CHST3, and PPARD were found to be significant in both the NCI-60 and GWAS data sets. We report a significant association between drug response and variants within members of the chondroitin sulfotransferase family (CHST), whose role in gemcitabine response is yet to be delineated. Conclusion: Biomarkers identified in this integrative analysis may contribute insights into gemcitabine response variability. As genotype data become more readily available, similar studies can be conducted to gain insights into drug response mechanisms and to facilitate clinical trial design and regulatory reviews.
63
Lee S, Abecasis G, Boehnke M, Lin X. Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet 2014; 95:5-23. [PMID: 24995866 DOI: 10.1016/j.ajhg.2014.06.009] [Citation(s) in RCA: 721] [Impact Index Per Article: 65.5] [Received: 01/02/2014] [Indexed: 12/30/2022]
Abstract
Despite the extensive discovery of trait- and disease-associated common variants, much of the genetic contribution to complex traits remains unexplained. Rare variants can explain additional disease risk or trait variability. An increasing number of studies are underway to identify trait- and disease-associated rare variants. In this review, we provide an overview of statistical issues in rare-variant association studies with a focus on study designs and statistical tests. We present the design and analysis pipeline of rare-variant studies and review cost-effective sequencing designs and genotyping platforms. We compare various gene- or region-based association tests, including burden tests, variance-component tests, and combined omnibus tests, in terms of their assumptions and performance. Also discussed are the related topics of meta-analysis, population-stratification adjustment, genotype imputation, follow-up studies, and heritability due to rare variants. We provide guidelines for analysis and discuss some of the challenges inherent in these studies and future research directions.
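The contrast between burden and variance-component tests reviewed here can be made concrete with their score statistics. In a minimal sketch assuming a quantitative trait, no covariates, and per-variant scores S_j = sum_i G_ij * (y_i - mean(y)), a burden statistic squares the weighted sum of the scores, while a SKAT-type variance-component statistic sums the weighted squared scores. Only the statistics are computed; p-values would additionally require their null distributions (a scaled chi-square for the burden statistic, a mixture of chi-squares for the variance-component statistic). The function names are illustrative.

```python
def score_vector(genotypes, phenotype):
    """Per-variant scores S_j = sum_i G_ij * (y_i - mean(y)), no covariates.

    genotypes: one row per individual, one column per variant (0/1/2 counts).
    """
    y_bar = sum(phenotype) / len(phenotype)
    resid = [y - y_bar for y in phenotype]
    n_variants = len(genotypes[0])
    return [sum(row[j] * r for row, r in zip(genotypes, resid))
            for j in range(n_variants)]


def burden_stat(scores, weights):
    """Burden-type statistic: square of the weighted sum of variant scores."""
    return sum(w * s for w, s in zip(weights, scores)) ** 2


def variance_component_stat(scores, weights):
    """SKAT-type statistic: sum of squared weighted variant scores."""
    return sum((w * s) ** 2 for w, s in zip(weights, scores))


# Toy data: two rare variants whose effects point in opposite directions.
G = [[1, 0], [0, 1], [0, 0], [0, 0]]
y = [1, 0, 1, 0]
S = score_vector(G, y)
```

In the toy data the two variant scores cancel in the burden statistic (0) but not in the variance-component statistic (0.5); opposite-direction effects are the situation in which variance-component tests are generally expected to be more powerful than burden tests.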
64
Structure and function of BCRP, a broad specificity transporter of xenobiotics and endobiotics. Arch Toxicol 2014; 88:1205-48. [PMID: 24777822 DOI: 10.1007/s00204-014-1224-8] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 02/27/2014] [Accepted: 03/06/2014] [Indexed: 12/20/2022]
65
Wang YXR, Huang H. Review on statistical methods for gene network reconstruction using expression data. J Theor Biol 2014; 362:53-61. [PMID: 24726980 DOI: 10.1016/j.jtbi.2014.03.040] [Citation(s) in RCA: 112] [Impact Index Per Article: 10.2] [Received: 01/26/2014] [Revised: 03/29/2014] [Accepted: 03/31/2014] [Indexed: 12/16/2022]
Abstract
Network modeling has proven to be a fundamental tool in analyzing the inner workings of a cell. It has revolutionized our understanding of biological processes and made significant contributions to the discovery of disease biomarkers. Much effort has been devoted to reconstructing various types of biochemical networks using functional genomic datasets generated by high-throughput technologies. This paper discusses statistical methods used to reconstruct gene regulatory networks from gene expression data. In particular, we highlight the progress made and the challenges yet to be met in estimating gene interactions, inferring causality and modeling temporal changes in regulatory behavior. As rapid advances in technologies have made diverse, large-scale genomic data available, we also survey methods for incorporating these additional data to achieve more accurate inference of gene networks.
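As a point of reference for the reconstruction methods the review discusses, the simplest approach is a co-expression (relevance) network that links two genes when the absolute Pearson correlation of their expression profiles meets a threshold. The sketch below implements only that baseline, not a method taken from the review; the gene names, data and the 0.8 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def relevance_network(expr, threshold):
    """Undirected edges between genes with |correlation| >= threshold.

    expr maps a gene name to its expression values across the same samples.
    """
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            edges.append((g1, g2))
    return edges


# Hypothetical expression profiles over four samples.
expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, -1, 1, -1]}
```

Marginal correlation cannot separate direct from indirect associations (two genes driven by a common regulator will correlate), which is why partial-correlation approaches such as Gaussian graphical models are a common refinement.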
Affiliation(s)
- Y X Rachel Wang
- Department of Statistics, University of California, Berkeley, CA 94720, USA.
- Haiyan Huang
- Department of Statistics, University of California, Berkeley, CA 94720, USA.
66
Huang YT. Integrative modeling of multiple genomic data from different types of genetic association studies. Biostatistics 2014; 15:587-602. [PMID: 24705142 DOI: 10.1093/biostatistics/kxu014] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 01/04/2023]
Abstract
Genome-wide association studies (GWASs) and expression-/methylation-quantitative trait loci (eQTL/mQTL) studies constitute popular approaches for investigating the association of single nucleotide polymorphisms (SNPs) with disease and expression/methylation, respectively. Here, we propose to integrate QTL studies to more powerfully test the SNP effect on disease in GWASs when they are conducted among different subjects. We propose a model for the joint effect of SNPs, methylation, and gene expression on disease risk and obtain the marginal model for SNPs by integrating out methylation and expression. We characterize all possible causal relations among SNPs, methylation, and expression and study the corresponding null hypotheses of no SNP effect in terms of the regression coefficients in the joint model. We develop a score test for variance components of regression coefficients to evaluate the genetic effect. We further propose an omnibus test to accommodate different models. We illustrate the utility of the proposed method in an asthma GWAS study, a brain tumor study, and numerical simulations.
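The step of integrating out methylation and expression can be illustrated in a deliberately simplified setting. Assume, for this illustration only, a single causal ordering S (SNP) → M (methylation) → G (expression) → Y (outcome), linear models without interactions, and a continuous outcome; the paper itself characterizes several causal relations and models disease risk, so the coefficient names below are hypothetical.

```latex
\begin{aligned}
M &= \alpha_0 + \alpha_S S + \varepsilon_M,\\
G &= \gamma_0 + \gamma_S S + \gamma_M M + \varepsilon_G,\\
Y &= \beta_0 + \beta_S S + \beta_M M + \beta_G G + \varepsilon_Y.\\[4pt]
\text{Substituting: }
Y &= c_0 + \bigl[\beta_S + \beta_M \alpha_S + \beta_G(\gamma_S + \gamma_M \alpha_S)\bigr] S + \varepsilon,\\
c_0 &= \beta_0 + \beta_M \alpha_0 + \beta_G(\gamma_0 + \gamma_M \alpha_0),\\
\varepsilon &= (\beta_M + \beta_G \gamma_M)\,\varepsilon_M + \beta_G\,\varepsilon_G + \varepsilon_Y.
\end{aligned}
```

Under these assumptions the SNP coefficient estimated by a marginal, GWAS-style model is a combination of joint-model coefficients, so the null hypothesis of no SNP effect corresponds to beta_S + beta_M alpha_S + beta_G(gamma_S + gamma_M alpha_S) = 0; the QTL-type coefficients (alpha_S, gamma_S, gamma_M) are the pieces that separate QTL studies can inform.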
Affiliation(s)
- Yen-Tsung Huang
- Department of Epidemiology, Brown University, Providence, RI 02912, USA
67
Jia P, Zhao Z. Network-assisted analysis to prioritize GWAS results: principles, methods and perspectives. Hum Genet 2014; 133:125-38. [PMID: 24122152 PMCID: PMC3943795 DOI: 10.1007/s00439-013-1377-1] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Received: 12/08/2012] [Accepted: 10/03/2013] [Indexed: 01/24/2023]
Abstract
Genome-wide association studies (GWAS) have rapidly become a powerful tool in genetic studies of complex diseases and traits. Traditionally, single marker-based tests have been widely used in GWAS and have uncovered tens of thousands of disease-associated SNPs. Network-assisted analysis (NAA) of GWAS data is an emerging area in which network-related approaches are developed and utilized to perform advanced analyses of GWAS data in order to study various human diseases or traits. Progress has been made in both methodology development and applications of NAA in GWAS data, and it has already been demonstrated that NAA results may enhance our interpretation and prioritization of candidate genes and markers. Inspired by the strong interest in and high demand for advanced GWAS data analysis, in this review article, we discuss the methodologies and strategies that have been reported for the NAA of GWAS data. Many NAA approaches search for subnetworks and assess the combined effects of multiple genes participating in the resultant subnetworks through a gene set analysis. With no restriction to pre-defined canonical pathways, NAA has the advantage of defining subnetworks with the guidance of the GWAS data under investigation. In addition, some NAA methods prioritize genes from GWAS data based on their interconnections in the reference network. Here, we summarize NAA applications to various diseases and discuss the available options and potential caveats related to their practical usage. Additionally, we provide perspectives regarding this rapidly growing research area.
68
Hu J, Tzeng JY. Integrative gene set analysis of multi-platform data with sample heterogeneity. Bioinformatics 2014; 30:1501-7. [PMID: 24489370 DOI: 10.1093/bioinformatics/btu060] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/22/2022]
Abstract
MOTIVATION: Gene set analysis is a popular method for large-scale genomic studies. Because genes that have common biological features are analyzed jointly, gene set analysis often achieves better power and generates more biologically informative results. With the advancement of technologies, genomic studies with multi-platform data have become increasingly common. Several strategies have been proposed that integrate genomic data from multiple platforms to perform gene set analysis. To evaluate the performance of existing integrative gene set methods under various scenarios, we conduct a comparative simulation analysis based on The Cancer Genome Atlas breast cancer dataset. RESULTS: We find that existing methods for gene set analysis are less effective when sample heterogeneity exists. To address this issue, we develop three methods for multi-platform genomic data with heterogeneity: two non-parametric methods, multi-platform Mann-Whitney statistics and multi-platform outlier robust T-statistics, and a parametric method, multi-platform likelihood ratio statistics. Using simulations, we show that the proposed multi-platform Mann-Whitney statistics method has higher power for heterogeneous samples and comparable performance for homogeneous samples when compared with the existing methods. Our real data applications to two datasets of The Cancer Genome Atlas also suggest that the proposed methods are able to identify novel pathways that are missed by other strategies. AVAILABILITY AND IMPLEMENTATION: http://www4.stat.ncsu.edu/~jytzeng/Software/Multiplatform_gene_set_analysis/
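The rank-based idea behind a multi-platform Mann-Whitney statistic can be sketched as follows. The Mann-Whitney U statistic counts case-control pairs in which the case value is larger (ties count one half), so a few extreme samples cannot dominate it the way they can dominate a mean. How the paper combines U across genes and platforms is not stated in the abstract; the averaging of normalized U values below is a hypothetical aggregation chosen for illustration, as are the function names and toy data.

```python
def mann_whitney_u(x, y):
    """U statistic: pairs with x_i > y_j, counting ties as one half."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def multiplatform_set_score(platform_data, labels):
    """Hypothetical set score: mean normalized U over genes and platforms.

    platform_data: {platform: {gene: [values per sample]}} on shared samples.
    labels: 1 for cases, 0 for controls, in the same sample order.
    """
    scores = []
    for genes in platform_data.values():
        for values in genes.values():
            cases = [v for v, lab in zip(values, labels) if lab == 1]
            controls = [v for v, lab in zip(values, labels) if lab == 0]
            # U / (n1 * n2) estimates P(case value > control value).
            scores.append(mann_whitney_u(cases, controls)
                          / (len(cases) * len(controls)))
    return sum(scores) / len(scores)


data = {"expr": {"g1": [5, 6, 1, 2], "g2": [3, 4, 1, 2]},
        "cnv": {"g1": [2, 3, 1, 2]}}
```

One caveat of naive averaging: effects in opposite directions on different platforms (for example, methylation inversely related to expression) would cancel, so a real multi-platform method has to account for the expected direction on each platform.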
Affiliation(s)
- Jun Hu
- Bioinformatics Research Center, North Carolina State University, Ricks Hall, 1 Lampe Dr., Raleigh, NC 27607, USA
- Division of Bioinformatics, Omicsoft Inc., 200 Cascade Pointe Lane, Suite 101, Cary, NC 27513, USA
- Department of Statistics, North Carolina State University, Ricks Hall, 1 Lampe Dr., Raleigh, NC 27607, USA
- Department of Statistics, National Cheng-Kung University, No.1, University Road, Tainan 701, Taiwan
- Jung-Ying Tzeng
- Bioinformatics Research Center, North Carolina State University, Ricks Hall, 1 Lampe Dr., Raleigh, NC 27607, USA
- Division of Bioinformatics, Omicsoft Inc., 200 Cascade Pointe Lane, Suite 101, Cary, NC 27513, USA
- Department of Statistics, North Carolina State University, Ricks Hall, 1 Lampe Dr., Raleigh, NC 27607, USA
- Department of Statistics, National Cheng-Kung University, No.1, University Road, Tainan 701, Taiwan
69
Ferguson AA, Roy S, Kormanik KN, Kim Y, Dumas KJ, Ritov VB, Matern D, Hu PJ, Fisher AL. TATN-1 mutations reveal a novel role for tyrosine as a metabolic signal that influences developmental decisions and longevity in Caenorhabditis elegans. PLoS Genet 2013; 9:e1004020. [PMID: 24385923 PMCID: PMC3868569 DOI: 10.1371/journal.pgen.1004020] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 01/15/2013] [Accepted: 10/28/2013] [Indexed: 11/18/2022]
Abstract
Recent work has identified changes in the metabolism of the aromatic amino acid tyrosine as a risk factor for diabetes and a contributor to the development of liver cancer. While these findings could suggest a role for tyrosine as a direct regulator of the behavior of cells and tissues, evidence for this model is currently lacking. Through the use of RNAi and genetic mutants, we identify tatn-1, which is the worm ortholog of tyrosine aminotransferase and catalyzes the first step of the conserved tyrosine degradation pathway, as a novel regulator of the dauer decision and modulator of the daf-2 insulin/IGF-1-like (IGFR) signaling pathway in Caenorhabditis elegans. Mutations affecting tatn-1 elevate tyrosine levels in the animal, and enhance the effects of mutations in genes that lie within the daf-2/insulin signaling pathway or are otherwise upstream of daf-16/FOXO on both dauer formation and worm longevity. These effects are mediated by elevated tyrosine levels as supplemental dietary tyrosine mimics the phenotypes produced by a tatn-1 mutation, and the effects still occur when the enzymes needed to convert tyrosine into catecholamine neurotransmitters are missing. The effects on dauer formation and lifespan require the aak-2/AMPK gene, and tatn-1 mutations increase phospho-AAK-2 levels. In contrast, the daf-16/FOXO transcription factor is only partially required for the effects on dauer formation and not required for increased longevity. We also find that the controlled metabolism of tyrosine by tatn-1 may function normally in dauer formation because the expression of the TATN-1 protein is regulated both by daf-2/IGFR signaling and also by the same dietary and environmental cues which influence dauer formation. Our findings point to a novel role for tyrosine as a developmental regulator and modulator of longevity, and support a model where elevated tyrosine levels play a causal role in the development of diabetes and cancer in people.
Affiliation(s)
- Annabel A. Ferguson
- Division of Geriatric Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Sudipa Roy
- Department of Medicine, University of Texas Health Science Center at San Antonio, San Antonio, Texas, United States of America
- Center for Healthy Aging, University of Texas Health Science Center at San Antonio, San Antonio, Texas, United States of America
- Kaitlyn N. Kormanik
- Division of Geriatric Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Yongsoon Kim
- Life Sciences Institute, University of Michigan, Ann Arbor, Michigan, United States of America
- Kathleen J. Dumas
- Life Sciences Institute, University of Michigan, Ann Arbor, Michigan, United States of America
- Vladimir B. Ritov
- Department of Environmental and Occupational Health, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Dietrich Matern
- Biochemical Genetics Laboratory, Department of Laboratory Medicine and Pathology, Mayo Clinic College of Medicine, Rochester, Minnesota, United States of America
- Patrick J. Hu
- Life Sciences Institute, University of Michigan, Ann Arbor, Michigan, United States of America
- Departments of Internal Medicine and Cell and Developmental Biology, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
- Alfred L. Fisher
- Department of Medicine, University of Texas Health Science Center at San Antonio, San Antonio, Texas, United States of America
- Center for Healthy Aging, University of Texas Health Science Center at San Antonio, San Antonio, Texas, United States of America
- GRECC, South Texas VA Health Care System, San Antonio, Texas, United States of America
70
Abstract
The recent availability of genomic data has spurred many genome-wide studies of human adaptation in different populations worldwide. Such studies have provided insights into novel candidate genes and pathways that are putatively involved in adaptation to different environments, diets and disease prevalence. However, much work is needed to translate these results into candidate adaptive variants that are biologically interpretable. In this Review, we discuss methods that may help to identify true biological signals of selection and studies that incorporate complementary phenotypic and functional data. We conclude with recommendations for future studies that focus on opportunities to use integrative genomics methodologies in human adaptation studies.
71
Kang M, Zhang B, Wu X, Liu C, Gao J. Sparse generalized canonical correlation analysis for biological model integration: a genetic study of psychiatric disorders. Annu Int Conf IEEE Eng Med Biol Soc 2013; 2013:1490-3. [PMID: 24109981 DOI: 10.1109/embc.2013.6609794] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
In the post-genomic era, uncovering the causal factors in the complex mechanisms underlying many diseases has been highlighted as one of the key goals. Much recent research suggests that integrative approaches combining genome-wide association studies (GWAS) and gene expression profiling provide greater insight into these mechanisms than either approach alone. In this paper, we propose a novel method, sparse generalized canonical correlation analysis (SGCCA), to integrate multiple types of biological data such as genetic markers, gene expression, and disease phenotypes. The proposed method provides a powerful approach to comprehensively analyze complex biological mechanisms while using multiple data sets simultaneously. The new method is also designed to identify the few elements significantly involved in the system among the large number of elements within each variable set. A further advantage of the method is that its solutions are easy to interpret. To verify the performance of SGCCA, we performed experiments with simulation data and human brain data of psychiatric diseases, assessing its capability to detect significant elements of the sets and the relations within the complex system.
72
Couto Alves A, Bruhn S, Ramasamy A, Wang H, Holloway JW, Hartikainen AL, Jarvelin MR, Benson M, Balding DJ, Coin LJM. Dysregulation of complement system and CD4+ T cell activation pathways implicated in allergic response. PLoS One 2013; 8:e74821. [PMID: 24116013 PMCID: PMC3792967 DOI: 10.1371/journal.pone.0074821] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 04/30/2013] [Accepted: 08/06/2013] [Indexed: 11/18/2022]
Abstract
Allergy is a complex disease that is likely to involve dysregulated CD4+ T cell activation. Here we propose a novel methodology to gain insight into how coordinated behaviour emerges between disease-dysregulated pathways in response to pathophysiological stimuli. Using peripheral blood mononuclear cells of allergic rhinitis patients and controls cultured with and without pollen allergens, we integrate CD4+ T cell gene expression from microarray data and genetic markers of allergic sensitisation from GWAS data at the pathway level using enrichment analysis, implicating the complement system in both the cellular and systemic response to pollen allergens. We delineate a novel disease network linking T cell activation to the complement system that is significantly enriched for genes exhibiting correlated gene expression and protein-protein interactions, suggesting a tight biological coordination that is dysregulated in the disease state in response to pollen allergen but not to diluent. This novel disease network has high predictive power for the gene and protein expression of the Th2 cytokine profile (IL-4, IL-5, IL-10, IL-13) and of the Th2 master regulator (GATA3), suggesting its involvement in the early stages of CD4+ T cell differentiation. Dissection of complement system gene expression identifies 7 genes specifically associated with atopic response to pollen, including C1QR1, CFD, CFP, ITGB2 and ITGAX, and confirms the role of C3AR1 and C5AR1. Two of these genes (ITGB2 and C3AR1) are also implicated in the network linking the complement system to T cell activation, which comprises 6 differentially expressed genes. C3AR1 is also significantly associated with allergic sensitisation in GWAS data.
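Pathway-level enrichment of the kind described here is often assessed with a one-sided hypergeometric (Fisher's exact) test. The abstract does not state which test was used, so the sketch below is a generic illustration rather than the authors' procedure, and the function name is hypothetical. It returns the probability of observing at least `n_overlap` pathway genes in a hit list of `n_selected` genes drawn from a background of `n_total` genes, `n_pathway` of which belong to the pathway.

```python
from math import comb


def hypergeom_enrichment_p(n_total, n_pathway, n_selected, n_overlap):
    """One-sided P(X >= n_overlap) for X ~ Hypergeometric.

    n_total: genes in the background; n_pathway: background genes in the pathway;
    n_selected: genes in the hit list (e.g. differentially expressed);
    n_overlap: hit-list genes that fall in the pathway.
    """
    upper = min(n_pathway, n_selected)
    denom = comb(n_total, n_selected)
    # Exact integer arithmetic for the tail; one division at the end.
    tail = sum(
        comb(n_pathway, k) * comb(n_total - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom
```

For example, if all 5 genes in a 5-gene hit list fall in a 5-gene pathway within a 10-gene background, the p-value is 1/252, about 0.004. In real analyses a multiple-testing correction across pathways would follow.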
MESH Headings
- Allergens/pharmacology
- CD4-Positive T-Lymphocytes/drug effects
- CD4-Positive T-Lymphocytes/immunology
- CD4-Positive T-Lymphocytes/metabolism
- Cell Differentiation/drug effects
- Cell Differentiation/genetics
- Cytokines/genetics
- Cytokines/metabolism
- GATA3 Transcription Factor/genetics
- GATA3 Transcription Factor/metabolism
- Gene Expression Profiling
- Humans
- Leukocytes, Mononuclear/drug effects
- Leukocytes, Mononuclear/immunology
- Leukocytes, Mononuclear/metabolism
- Lymphocyte Activation/drug effects
- Lymphocyte Activation/genetics
- Lymphocyte Activation/immunology
- Pollen
- Receptors, Complement/genetics
- Receptors, Complement/metabolism
- Rhinitis, Allergic, Seasonal/genetics
- Rhinitis, Allergic, Seasonal/immunology
- Rhinitis, Allergic, Seasonal/metabolism
Affiliation(s)
- Alexessander Couto Alves
- Department of Epidemiology and Biostatistics, Imperial College London, MRC-HPA Centre for Environment and Health, Imperial College London, London, United Kingdom
- Sören Bruhn
- Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden
- Adaikalavan Ramasamy
- Department of Epidemiology and Biostatistics, Imperial College London, MRC-HPA Centre for Environment and Health, Imperial College London, London, United Kingdom
- Department of Medical and Molecular Genetics, King's College London, London, United Kingdom
- Hui Wang
- Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden
- Dept of Paediatrics, Gothenburg University, Gothenburg, Sweden
- John W. Holloway
- Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
- Anna-Liisa Hartikainen
- Department of Clinical Sciences, Obstetrics and Gynecology, Institute of Clinical Medicine, University of Oulu, Oulu, Finland
- Marjo-Riitta Jarvelin
- Department of Epidemiology and Biostatistics, Imperial College London, MRC-HPA Centre for Environment and Health, Imperial College London, London, United Kingdom
- Institute of Health Sciences, University of Oulu, and Unit of General Practice, University Hospital of Oulu, Oulu, Finland
- Biocenter Oulu, University of Oulu, Oulu, Finland
- National Institute of Health and Welfare, Oulu, Finland
- Mikael Benson
- Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden
- David J. Balding
- Department of Epidemiology and Biostatistics, Imperial College London, MRC-HPA Centre for Environment and Health, Imperial College London, London, United Kingdom
- Genetics Institute, University College London, United Kingdom
- Lachlan J. M. Coin
- Department of Genomics of Common Diseases, School of Public Health, Imperial College London, London, United Kingdom
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
73
Abstract
High-throughput technologies have been applied to investigate the underlying mechanisms of complex diseases, identify disease associations and help improve treatment. However, it is challenging to derive biological insight from conventional single-gene-based analysis of "omics" data from high-throughput experiments because of sample and patient heterogeneity. To address these challenges, many novel pathway- and network-based approaches have been developed to integrate various "omics" data, such as gene expression, copy number alteration, genome-wide association studies, and interaction data. This review covers recent methodological developments in pathway analysis for the detection of dysregulated interactions and disease-associated subnetworks, the prioritization of candidate disease genes, and disease classification. For each application, we also discuss the associated challenges and potential future directions.
74
Guzzi PH, Cannataro M. Micro-Analyzer: automatic preprocessing of Affymetrix microarray data. Comput Methods Programs Biomed 2013; 111:402-409. [PMID: 23731720 DOI: 10.1016/j.cmpb.2013.04.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/04/2013] [Revised: 03/14/2013] [Accepted: 04/11/2013] [Indexed: 06/02/2023]
Abstract
A current trend in genomics is the investigation of cell mechanisms using different technologies, in order to explain the relationships among genes, molecular processes and diseases. For instance, the combined use of gene-expression arrays and genomic arrays has been demonstrated to be an effective instrument in clinical practice. Consequently, different kinds of microarrays may be used in a single experiment, producing different types of binary data (images and textual raw data). The analysis of microarray data requires an initial preprocessing phase that makes raw data suitable for use on existing analysis platforms, such as the TIGR M4 (TM4) Suite. An additional challenge for emerging data analysis platforms is the ability to treat these different microarray formats in combination with clinical data. In fact, the resulting integrated data may include both numerical and symbolic data (e.g. gene expression and SNPs regarding molecular data), as well as temporal data (e.g. the response to a drug, time to progression and survival rate) regarding clinical data. Raw data preprocessing is a crucial step in analysis but is often performed manually and in an error-prone way using different software tools. Thus novel, platform-independent, and possibly open-source tools enabling the semi-automatic preprocessing and annotation of different microarray data are needed. The paper presents Micro-Analyzer (Microarray Analyzer), a cross-platform tool for the automatic normalization, summarization and annotation of Affymetrix gene expression and SNP binary data. It represents the evolution of the μ-CS tool, extending preprocessing to SNP arrays, which μ-CS did not support. Micro-Analyzer is provided as a Java standalone tool and enables users to read, preprocess and analyse binary microarray data (gene expression and SNPs) by invoking the TM4 platform. It avoids: (i) the manual invocation of external tools (e.g. 
the Affymetrix Power Tools), (ii) the manual loading of preprocessing libraries, and (iii) the management of intermediate files, such as results and metadata. Micro-Analyzer users can directly manage Affymetrix binary data without worrying about locating and invoking the proper preprocessing tools and chip-specific libraries. Moreover, users of the Micro-Analyzer tool can load the preprocessed data directly into the well-known TM4 platform, thereby also extending the TM4 capabilities. Consequently, Micro-Analyzer offers the following advantages: (i) it reduces possible errors in the preprocessing and further analysis phases, e.g. due to an incorrect choice of parameters or the use of old libraries, (ii) it enables the combined and centralized preprocessing of different arrays, (iii) it may enhance the quality of further analysis by storing the workflow, i.e. information about the preprocessing steps, and (iv) Micro-Analyzer is freely available as a standalone application at the project web site http://sourceforge.net/projects/microanalyzer/.
Affiliation(s)
- Pietro Hiram Guzzi
- Bioinformatics Laboratory, Department of Surgical and Medical Sciences, Magna Graecia University, Catanzaro, Italy.
75
Halldórsson BV, Sharan R. Network-based interpretation of genomic variation data. J Mol Biol 2013; 425:3964-9. [PMID: 23886866 DOI: 10.1016/j.jmb.2013.07.026] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 05/01/2013] [Revised: 07/02/2013] [Accepted: 07/16/2013] [Indexed: 02/02/2023]
Abstract
Advances in sequencing technologies are allowing genome-wide association studies at an ever-growing scale. The interpretation of these studies requires dealing with statistical and combinatorial challenges, owing to the multi-factorial nature of human diseases and the huge space of genomic markers that are being monitored. Recently, it was proposed that using protein-protein interaction network information could help in tackling these challenges by restricting attention to markers or combinations of markers that map to close proteins in the network. In this review, we survey techniques for integrating genomic variation data with network information to improve our understanding of complex diseases and reveal meaningful associations.
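One simple way to "restrict attention to markers that map to close proteins in the network" is to keep only genes within a fixed number of interaction hops of a set of seed genes. The sketch below does this with a multi-source breadth-first search over an undirected protein-protein interaction graph. It assumes markers have already been mapped to genes; the gene names, edge list and hop limit are hypothetical, and the reviewed methods are generally more elaborate than this neighborhood filter.

```python
from collections import deque


def network_neighborhood(edges, seeds, max_dist):
    """Genes within max_dist hops of any seed gene in an undirected network."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    # Multi-source BFS: dist holds each gene's hop count to the nearest seed.
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue  # do not expand beyond the hop limit
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


# Hypothetical interaction network with two connected components.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
```

Restricting the tests to such a neighborhood reduces the number of marker combinations that must be evaluated, which is the statistical and combinatorial motivation the abstract describes.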
Affiliation(s)
- Bjarni V Halldórsson
- School of Science and Engineering, Reykjavík University, 101 Reykjavík, Iceland.
76
Runcie DE, Mukherjee S. Dissecting high-dimensional phenotypes with bayesian sparse factor analysis of genetic covariance matrices. Genetics 2013; 194:753-67. [PMID: 23636737 PMCID: PMC3697978 DOI: 10.1534/genetics.113.151217] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Received: 03/12/2013] [Accepted: 04/17/2013] [Indexed: 01/29/2023]
Abstract
Quantitative genetic studies that model complex, multivariate phenotypes are important for both evolutionary prediction and artificial selection. For example, changes in gene expression can provide insight into developmental and physiological mechanisms that link genotype and phenotype. However, classical analytical techniques are poorly suited to quantitative genetic studies of gene expression, where the number of traits assayed per individual can reach many thousands. Here, we derive a Bayesian genetic sparse factor model for estimating the genetic covariance matrix (G-matrix) of high-dimensional traits, such as gene expression, in a mixed-effects model. The key idea of our model is that we need to consider only G-matrices that are biologically plausible. An organism's entire phenotype is the result of processes that are modular and have limited complexity. This implies that the G-matrix will be highly structured. In particular, we assume that a limited number of intermediate traits (or factors, e.g., variations in development or physiology) control the variation in the high-dimensional phenotype, and that each of these intermediate traits is sparse, affecting only a few observed traits. The advantages of this approach are twofold. First, sparse factors are interpretable and provide biological insight into mechanisms underlying the genetic architecture. Second, enforcing sparsity helps prevent sampling errors from swamping out the true signal in high-dimensional data. We demonstrate the advantages of our model on simulated data and in an analysis of a published Drosophila melanogaster gene expression data set.
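The structural assumption described above, a few sparse intermediate factors driving many observed traits, can be made concrete by building the covariance matrix such a model implies. The sketch below assumes the generic factor-model form G = ΛΛᵀ + Ψ, where Λ is a p × k loading matrix with many zero entries and Ψ is a diagonal matrix of trait-specific residual variances; it only constructs G from given loadings and does not implement the Bayesian estimation described in the paper. The function name and the numbers are hypothetical.

```python
def factor_covariance(loadings, residual_var):
    """Build G = Lambda Lambda^T + diag(psi) from a p x k loading matrix (list of rows)."""
    p = len(loadings)
    k = len(loadings[0])
    G = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            # Traits i and j covary only through factors that load on both of them.
            G[i][j] = sum(loadings[i][f] * loadings[j][f] for f in range(k))
        G[i][i] += residual_var[i]  # add trait-specific (non-factor) variance
    return G

# Hypothetical example: 4 traits, 2 sparse factors.
# Factor 1 affects traits 0 and 1 only; factor 2 affects traits 2 and 3 only.
loadings = [
    [1.0, 0.0],
    [0.5, 0.0],
    [0.0, 2.0],
    [0.0, 1.0],
]
residual_var = [0.1, 0.1, 0.1, 0.1]
G = factor_covariance(loadings, residual_var)
for row in G:
    print([round(x, 2) for x in row])
```

The resulting G is block-structured: traits driven by different factors have zero genetic covariance. This is the kind of structure that the sparsity assumption aims to recover from noisy, high-dimensional estimates.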
Affiliation(s)
- Daniel E Runcie
- Department of Biology, Duke University, Durham, North Carolina 27708, USA.
77
Li L, Kabesch M, Bouzigon E, Demenais F, Farrall M, Moffatt MF, Lin X, Liang L. Using eQTL weights to improve power for genome-wide association studies: a genetic study of childhood asthma. Front Genet 2013; 4:103. [PMID: 23755072 PMCID: PMC3668139 DOI: 10.3389/fgene.2013.00103] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 08/16/2012] [Accepted: 05/21/2013] [Indexed: 12/20/2022]
Abstract
Increasing evidence suggests that single nucleotide polymorphisms (SNPs) associated with complex traits are more likely to be expression quantitative trait loci (eQTLs). Incorporating eQTL information therefore has the potential to increase the power of genome-wide association studies (GWAS). In this paper, we propose using eQTL weights as prior information in SNP-based association tests to improve test power while maintaining control of the family-wise error rate (FWER) or the false discovery rate (FDR). We apply the proposed methods to the analysis of a GWAS for childhood asthma consisting of 1296 unrelated individuals of German ancestry. The results confirm that eQTLs are enriched for previously reported asthma SNPs. We also find that some SNPs are not significant under procedures without eQTL weighting but become significant under eQTL-weighted Bonferroni or Benjamini-Hochberg procedures, while the same FWER or FDR level is controlled. Some of these SNPs have been reported by independent studies in recent literature. The results suggest that the eQTL-weighted procedures provide a promising approach for improving the power of GWAS. We also report the results of our methods applied to data from the large-scale European GABRIEL consortium.
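The weighted procedures named in the abstract can be sketched with the generic p-value weighting rules: with weights that average 1, weighted Bonferroni rejects H_i when p_i ≤ αw_i/m, and weighted Benjamini-Hochberg runs the usual step-up procedure on p_i/w_i. This is a minimal sketch under those assumptions; the abstract does not say how eQTL evidence is converted into weights, so the weights and p-values below are hypothetical, and the function names are illustrative.

```python
def normalize_weights(weights):
    """Rescale strictly positive weights so that they average to 1."""
    m = len(weights)
    total = sum(weights)
    return [w * m / total for w in weights]

def weighted_bonferroni(pvalues, weights, alpha=0.05):
    """Reject H_i when p_i <= alpha * w_i / m (controls the FWER at alpha)."""
    w = normalize_weights(weights)
    m = len(pvalues)
    return [p <= alpha * wi / m for p, wi in zip(pvalues, w)]

def weighted_bh(pvalues, weights, alpha=0.05):
    """Benjamini-Hochberg step-up on weighted p-values p_i / w_i (FDR control for independent tests)."""
    w = normalize_weights(weights)
    m = len(pvalues)
    q = [p / wi for p, wi in zip(pvalues, w)]
    order = sorted(range(m), key=lambda i: q[i])
    k = 0  # largest rank whose weighted p-value is below its BH threshold
    for rank, i in enumerate(order, start=1):
        if q[i] <= alpha * rank / m:
            k = rank
    rejected = set(order[:k])
    return [i in rejected for i in range(m)]

# Hypothetical p-values for 4 SNPs; the first SNP is assumed to be a strong eQTL.
pvalues = [0.02, 0.01, 0.3, 0.6]
eqtl_weights = [2.0, 1.0, 0.5, 0.5]
print(weighted_bonferroni(pvalues, [1, 1, 1, 1]))
print(weighted_bonferroni(pvalues, eqtl_weights))
print(weighted_bh(pvalues, eqtl_weights))
```

With uniform weights only the second SNP passes the Bonferroni threshold. Up-weighting the first SNP lets it pass while the total error budget stays at alpha, because the down-weighted SNPs give up part of theirs. This mirrors the kind of gain the abstract reports, although the study's actual weights come from eQTL data not shown here.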
Affiliation(s)
- Lin Li
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- Michael Kabesch
- Department of Pediatric Pneumology and Allergy, KUNO University Children's Hospital Regensburg, Regensburg, Germany
- Emmanuelle Bouzigon
- INSERM, Genetic Variation and Human Diseases Unit, U946, Paris, France
- Sorbonne Paris Cité, Institut Universitaire d'Hématologie, Université Paris Diderot, Paris, France
- Florence Demenais
- INSERM, Genetic Variation and Human Diseases Unit, U946, Paris, France
- Sorbonne Paris Cité, Institut Universitaire d'Hématologie, Université Paris Diderot, Paris, France
- Miriam F. Moffatt
- Molecular Genetics and Genomics Section, National Heart and Lung Institute, Imperial College London, London, UK
- Xihong Lin
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- Liming Liang
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA
78
Neurobiology meets genomic science: the promise of human-induced pluripotent stem cells. Dev Psychopathol 2013; 24:1443-51. [PMID: 23062309 DOI: 10.1017/s095457941200082x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/18/2022]
Abstract
The recent introduction of induced pluripotent stem cell technology has made possible the derivation of neuronal cells from somatic cells obtained from human individuals. This in turn has opened new areas of investigation that can potentially bridge the gap between neuroscience and psychopathology. For the first time, we can study the cell biology and genetics of neurons derived from any individual. Furthermore, by recapitulating in vitro the developmental steps whereby stem cells give rise to neuronal cells, we can now hope to understand factors that control typical and atypical development. We can begin to explore how human genes and their variants are transcribed into messenger RNAs within developing neurons and how these gene transcripts control the biology of developing cells. Thus, human-induced pluripotent stem cells have the potential to uncover not only which aspects of development are uniquely human but also variations in the series of events necessary for normal human brain development that predispose to psychopathology.
79
A predictive framework for integrating disparate genomic data types using sample-specific gene set enrichment analysis and multi-task learning. PLoS One 2012; 7:e44635. [PMID: 23028573 PMCID: PMC3441565 DOI: 10.1371/journal.pone.0044635] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/27/2012] [Accepted: 08/06/2012] [Indexed: 11/19/2022]
Abstract
Understanding the root molecular and genetic causes driving complex traits is a fundamental challenge in genomics and genetics. Numerous studies have used variation in gene expression to understand complex traits, but the underlying genomic variation that contributes to these expression changes is not well understood. In this study, we developed a framework to integrate gene expression and genotype data to identify biological differences between samples from opposing complex trait classes that are driven by expression changes and genotypic variation. This framework utilizes pathway analysis and multi-task learning to build a predictive model and discover pathways relevant to the complex trait of interest. We simulated expression and genotype data to test the predictive ability of our framework and to measure how well it uncovered pathways with genes both differentially expressed and genetically associated with a complex trait. We found that the predictive performance of the multi-task model was comparable to other similar methods. Also, methods like multi-task learning that considered enrichment analysis scores from both data sets found pathways with both genetic and expression differences related to the phenotype. We used our framework to analyze differences between estrogen receptor (ER) positive and negative breast cancer samples. An analysis of the top 15 gene sets from the multi-task model showed they were all related to estrogen, steroids, cell signaling, or the cell cycle. Although our study suggests that multi-task learning does not enhance predictive accuracy, the models generated by our framework do provide valuable biological pathway knowledge for complex traits.
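The first stage of the framework described above turns each sample's expression profile into pathway-level scores that a predictive model can use as features. The sketch below is a simplified stand-in for sample-specific gene set enrichment: it scores a gene set in a sample as the mean z-score of its member genes, whereas GSEA-style methods use rank-based running-sum statistics. The gene names, gene sets, and function names are hypothetical, and the multi-task learning stage is not shown.

```python
from statistics import mean, pstdev

def zscore_genes(expr):
    """Standardize each gene across samples; expr maps gene -> list of per-sample values."""
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # A gene with no variation carries no sample-specific signal, so score it 0.
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z

def sample_set_scores(expr, gene_sets):
    """Score each gene set in each sample as the mean z-score of its member genes."""
    z = zscore_genes(expr)
    n = len(next(iter(z.values())))  # number of samples
    scores = {}
    for name, genes in gene_sets.items():
        members = [g for g in genes if g in z]
        if members:
            scores[name] = [mean(z[g][s] for g in members) for s in range(n)]
        else:
            scores[name] = [0.0] * n
    return scores

# Hypothetical data: 3 genes measured in 2 samples, and 2 gene sets.
expr = {"G1": [1.0, 3.0], "G2": [2.0, 4.0], "G3": [5.0, 5.0]}
gene_sets = {"S1": ["G1", "G2"], "S2": ["G3"]}
print(sample_set_scores(expr, gene_sets))
```

Each sample then becomes a vector of pathway scores; in the framework, scores derived from expression and from genotype data would both feed the downstream predictive model.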
80
Affiliation(s)
- Gordon B Mills
- The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA