1. Iino H, Kizaki H, Imai S, Hori S. Identifying the Relative Importance of Factors Influencing Medication Compliance in General Patients Using Regularized Logistic Regression and LightGBM: Web-Based Survey Analysis. JMIR Form Res 2024; 8:e65882. [PMID: 39715551 PMCID: PMC11704655 DOI: 10.2196/65882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2024] [Revised: 11/11/2024] [Accepted: 12/04/2024] [Indexed: 12/25/2024] Open
Abstract
BACKGROUND Medication compliance, which refers to the extent to which patients correctly adhere to prescribed regimens, is influenced by various psychological, behavioral, and demographic factors. When analyzing these factors, challenges such as multicollinearity and variable selection often arise, complicating the interpretation of results. To address the issue of multicollinearity and better analyze the importance of each factor, machine learning methods are considered to be useful. OBJECTIVE This study aimed to identify key factors influencing medication compliance by applying regularized logistic regression and LightGBM. METHODS A questionnaire survey was conducted among 638 adult patients in Japan who had been continuously taking medications for at least 3 months. The survey collected data on demographics, medication habits, psychological adherence factors, and compliance. Logistic regression with regularization was used to handle multicollinearity, while LightGBM was used to calculate feature importance. RESULTS The regularized logistic regression model identified significant predictors, including "using the drug at approximately the same time each day" (coefficient 0.479; P=.02), "taking meals at approximately the same time each day" (coefficient 0.407; P=.02), and "I would like to have my medication reduced" (coefficient -0.410; P=.01). The top 5 variables with the highest feature importance scores in the LightGBM results were "Age" (feature importance 179.1), "Using the drug at approximately the same time each day" (feature importance 148.4), "Taking meals at approximately the same time each day" (feature importance 109.0), "I would like to have my medication reduced" (feature importance 77.48), and "I think I want to take my medicine" (feature importance 70.85). 
Additionally, the feature importance scores for the groups of medication adherence-related factors were 77.92 for lifestyle-related items, 52.04 for awareness of medication, 20.30 for relationships with health care professionals, and 5.05 for others. CONCLUSIONS The most significant factors for medication compliance were the consistency of medication and meal timing (mean of feature importance), followed by the number of medications and patient attitudes toward their treatment. This study is the first to use a machine learning model to calculate and compare the relative importance of factors affecting medication adherence. Our findings demonstrate that, in terms of relative importance, lifestyle habits are the most significant contributors to medication compliance among the general patient population. The findings suggest that regularization and machine learning methods, such as LightGBM, are useful for better understanding the numerous adherence factors affected by multicollinearity.
Affiliation(s)
- Haru Iino
  - Division of Drug Informatics, Faculty of Pharmacy and Graduate School of Pharmaceutical Sciences, Keio University, Tokyo, Japan
- Hayato Kizaki
  - Division of Drug Informatics, Faculty of Pharmacy and Graduate School of Pharmaceutical Sciences, Keio University, Tokyo, Japan
- Shungo Imai
  - Division of Drug Informatics, Faculty of Pharmacy and Graduate School of Pharmaceutical Sciences, Keio University, Tokyo, Japan
- Satoko Hori
  - Division of Drug Informatics, Faculty of Pharmacy and Graduate School of Pharmaceutical Sciences, Keio University, Tokyo, Japan
2. Hajiaghabozorgi M, Fischbach M, Albrecht M, Wang W, Myers CL. BridGE: a pathway-based analysis tool for detecting genetic interactions from GWAS. Nat Protoc 2024; 19:1400-1435. [PMID: 38514837 PMCID: PMC11311251 DOI: 10.1038/s41596-024-00954-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Accepted: 11/22/2023] [Indexed: 03/23/2024]
Abstract
Genetic interactions have the potential to modulate phenotypes, including human disease. In principle, genome-wide association studies (GWAS) provide a platform for detecting genetic interactions; however, traditional methods for identifying them, which tend to focus on testing individual variant pairs, lack statistical power. In this protocol, we describe a novel computational approach, called Bridging Gene sets with Epistasis (BridGE), for discovering genetic interactions between biological pathways from GWAS data. We present a Python-based implementation of BridGE along with instructions for its application to a typical human GWAS cohort. The major stages include initial data processing and quality control, construction of a variant-level genetic interaction network, measurement of pathway-level genetic interactions, evaluation of statistical significance using sample permutations and generation of results in a standardized output format. The BridGE software pipeline includes options for running the analysis on multiple cores and multiple nodes for users who have access to computing clusters or a cloud computing environment. In a cluster computing environment with 10 nodes and 100 GB of memory per node, the method can be run in less than 24 h for typical human GWAS cohorts. Using BridGE requires knowledge of running Python programs and basic shell script programming experience.
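The abstract above mentions evaluating statistical significance with sample permutations. The following is only a minimal sketch of that general idea, not the BridGE implementation: the statistic (`mean_difference`), the function names, and the toy data are all hypothetical and chosen for illustration. Case/control labels are shuffled many times, and the empirical p-value is the fraction of shuffles whose statistic is at least as extreme as the observed one.

```python
import random


def mean_difference(scores, labels):
    """Absolute difference in mean score between cases (1) and controls (0)."""
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(scores, labels, n_permutations=2000, seed=0):
    """Empirical p-value obtained by shuffling case/control labels.

    Hypothetical helper for illustration; it is not part of any BridGE API.
    """
    rng = random.Random(seed)
    observed = mean_difference(scores, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real score/label association
        if mean_difference(scores, shuffled) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    toy_scores = [float(v) for v in range(16)]  # toy data, perfectly separated
    toy_labels = [0] * 8 + [1] * 8
    print(permutation_p_value(toy_scores, toy_labels))
```

Shuffling labels keeps the number of cases and controls fixed, so every permuted data set is comparable to the observed one. The smallest achievable p-value is 1 / (n_permutations + 1), which is why stringent thresholds require many permutations.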
Affiliation(s)
- Mehrad Hajiaghabozorgi
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Mathew Fischbach
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
  - Graduate Program in Bioinformatics and Computational Biology (BICB), University of Minnesota, Minneapolis, MN, USA
- Michael Albrecht
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Wen Wang
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Chad L Myers
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
  - Graduate Program in Bioinformatics and Computational Biology (BICB), University of Minnesota, Minneapolis, MN, USA
3. Wei PL, Huang CY, Chang TC, Lin JC, Lee CC, Prince GMSH, Makondi PT, Chui AWY, Chang YJ. PCTAIRE Protein Kinase 1 (PCTK1) Suppresses Proliferation, Stemness, and Chemoresistance in Colorectal Cancer through the BMPR1B-Smad1/5/8 Signaling Pathway. Int J Mol Sci 2023; 24:10008. [PMID: 37373155 DOI: 10.3390/ijms241210008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/12/2023] [Revised: 06/02/2023] [Accepted: 06/07/2023] [Indexed: 06/29/2023] Open
Abstract
Colorectal cancer (CRC) is the third most common cancer and a leading cause of cancer-related mortality worldwide. Even with advances in therapy, CRC mortality remains high. Therefore, there is an urgent need to develop effective therapeutics for CRC. PCTAIRE protein kinase 1 (PCTK1) is an atypical member of the cyclin-dependent kinase (CDK) family, and the function of PCTK1 in CRC is poorly understood. In this study, we found that patients with elevated PCTK1 levels had a better overall survival rate in CRC based on the TCGA dataset. Functional analysis also showed that PCTK1 suppressed cancer stemness and cell proliferation by using PCTK1 knockdown (PCTK1-KD) or knockout (PCTK1-KO) and PCTK1 overexpression (PCTK1-over) CRC cell lines. Furthermore, overexpression of PCTK1 decreased xenograft tumor growth and knockout of PCTK1 significantly increased in vivo tumor growth. Moreover, knockout of PCTK1 was observed to increase the resistance of CRC cells to both irinotecan (CPT-11) alone and in combination with 5-fluorouracil (5-FU). Additionally, the fold change of the anti-apoptotic molecules (Bcl-2 and Bcl-xL) and the proapoptotic molecules (Bax, c-PARP, p53, and c-caspase3) was reflected in the chemoresistance of PCTK1-KO CRC cells. PCTK1 signaling in the regulation of cancer progression and chemoresponse was analyzed using RNA sequencing and gene set enrichment analysis (GSEA). Furthermore, PCTK1 and Bone Morphogenetic Protein Receptor Type 1B (BMPR1B) in CRC tumors were negatively correlated in CRC patients from the Timer2.0 and cBioPortal database. We also found that BMPR1B was negatively correlated with PCTK1 in CRC cells, and BMPR1B expression was upregulated in PCTK1-KO cells and xenograft tumor tissues. Finally, BMPR1B-KD partially reversed cell proliferation, cancer stemness, and chemoresistance in PCTK1-KO cells. Moreover, the nuclear translocation of Smad1/5/8, a downstream molecule of BMPR1B, was increased in PCTK1-KO cells. 
Pharmacological inhibition of Smad1/5/8 also suppressed the malignant progression of CRC. Taken together, our results indicated that PCTK1 suppresses proliferation and cancer stemness and increases the chemoresponse of CRC through the BMPR1B-Smad1/5/8 signaling pathway.
Affiliation(s)
- Po-Li Wei
  - Division of Colorectal Surgery, Department of Surgery, Taipei Medical University Hospital, Taipei Medical University, Taipei 11031, Taiwan
  - Department of Surgery, College of Medicine, School of Medicine, Taipei Medical University, Taipei 11031, Taiwan
  - Cancer Research Center and Translational Laboratory, Department of Medical Research, Taipei Medical University Hospital, Taipei Medical University, Taipei 11031, Taiwan
  - Graduate Institute of Cancer Biology and Drug Discovery, Taipei Medical University, Taipei 11031, Taiwan
- Chien-Yu Huang
  - School of Medicine, National Tsing Hua University, Hsinchu 30013, Taiwan
  - Institute of Molecular and Cellular Biology, National Tsing Hua University, Hsinchu 30013, Taiwan
  - Department of Pathology, Wan Fang Hospital, Taipei Medical University, Taipei 11696, Taiwan
- Tung-Cheng Chang
  - Department of Surgery, College of Medicine, School of Medicine, Taipei Medical University, Taipei 11031, Taiwan
  - Division of Colon and Rectal, Department of Surgery, Shuang Ho Hospital, Taipei Medical University, Taipei 11031, Taiwan
- Jang-Chun Lin
  - Department of Radiotherapy and Oncology, Shuang Ho Hospital, Taipei Medical University, Taipei 11031, Taiwan
- Cheng-Chin Lee
  - Graduate Institute of Medical Sciences, College of Medicine, Taipei Medical University, Taipei 11031, Taiwan
- G M Shazzad Hossain Prince
  - Department of Surgery, College of Medicine, School of Medicine, Taipei Medical University, Taipei 11031, Taiwan
- Yu-Jia Chang
  - Cancer Research Center and Translational Laboratory, Department of Medical Research, Taipei Medical University Hospital, Taipei Medical University, Taipei 11031, Taiwan
  - Department of Pathology, Wan Fang Hospital, Taipei Medical University, Taipei 11696, Taiwan
  - Graduate Institute of Clinical Medicines, College of Medicine, Taipei Medical University, Taipei 11031, Taiwan
  - Cell Physiology and Molecular Image Research Center, Wan Fang Hospital, Taipei Medical University, Taipei 11031, Taiwan
4. Bhandari P, Kim J, Lee TG. Genetic architecture of fresh-market tomato yield. BMC Plant Biology 2023; 23:18. [PMID: 36624387 PMCID: PMC9827693 DOI: 10.1186/s12870-022-04018-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/26/2022] [Accepted: 12/22/2022] [Indexed: 06/17/2023]
Abstract
BACKGROUND The fresh-market tomato (Solanum lycopersicum) is bred for direct consumption and is selected for a high yield of large fruits. To understand the genetic variations (distinct types of DNA sequence polymorphism) that influence the yield, we collected the phenotypic variations in the yields of total fruit, extra-large-sized fruit, small-sized fruit, or red-colored fruit from 68 core inbred contemporary U.S. fresh-market tomatoes for three consecutive years and the genomic information in 8,289,741 single nucleotide polymorphism (SNP) positions from the whole-genome resequencing of these tomatoes. RESULTS Genome-wide association (GWA) mapping using the SNP data with or without SNP filtering steps using the regularization methods, validated with quantitative trait loci (QTL) linkage mapping, identified 18 significant association signals for the traits evaluated. Of these, 10 were not located within genomic regions previously identified as being associated with fruit size/shape. When mapping-driven association signals [558 SNPs associated with 28 yield (component) traits] were used to calculate genomic estimated breeding values (GEBVs) of evaluated traits, the prediction accuracies of the extra-large-sized fruit and small-sized fruit yields were higher than those of the total and red-colored fruit yields, as we tested the generated breeding values in inbred tomatoes and F2 populations. Improved accuracy in GEBV calculation of evaluated traits was achieved by using 364 SNPs identified using the regularization methods. CONCLUSIONS Together, these results provide an understanding of the genetic variations underlying the heritable phenotypic variability in yield in contemporary tomato breeding and the information necessary for improving such an economically important and complex quantitative trait through breeding.
Affiliation(s)
- Prashant Bhandari
  - Horticultural Sciences Department, University of Florida, Gainesville, FL, 32611, USA
- Juhee Kim
  - Gulf Coast Research and Education Center, University of Florida, Wimauma, FL, 33598, USA
- Tong Geon Lee
  - Horticultural Sciences Department, University of Florida, Gainesville, FL, 32611, USA
  - Gulf Coast Research and Education Center, University of Florida, Wimauma, FL, 33598, USA
  - Plant Breeders Working Group, University of Florida, Gainesville, FL, 32611, USA
  - Plant Molecular and Cellular Biology Graduate Program, University of Florida, Gainesville, FL, 32611, USA
  - Bayer, Chesterfield, MO, 63017, USA
5. Deng Y, He Y, Xu G, Pan W. Speeding up Monte Carlo simulations for the adaptive sum of powered score test with importance sampling. Biometrics 2022; 78:261-273. [PMID: 33215683 PMCID: PMC8134502 DOI: 10.1111/biom.13407] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/30/2019] [Revised: 08/30/2020] [Accepted: 10/29/2020] [Indexed: 12/21/2022]
Abstract
A central but challenging problem in genetic studies is to test for (usually weak) associations between a complex trait (e.g., a disease status) and sets of multiple genetic variants. Because no uniformly most powerful test exists, data-adaptive tests, such as the adaptive sum of powered score (aSPU) test, are advantageous in maintaining high power against a wide range of alternatives. However, there is often no closed-form expression for accurately calculating the p-values of many adaptive tests like aSPU, so Monte Carlo (MC) simulations are often used; these can be time consuming when a stringent significance level (e.g., 5e-8), as used in genome-wide association studies (GWAS), must be reached. Estimating such a small p-value requires a huge number of MC simulations (e.g., 1e+10). As an alternative, we propose using importance sampling to speed up such calculations. We develop some theory to motivate a proposed algorithm for the aSPU test, and show that the proposed method is computationally more efficient than the standard MC simulations. Using both simulated and real data, we demonstrate the superior performance of the new method over the standard MC simulations.
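To illustrate why importance sampling helps with very small p-values, here is a minimal sketch on a much simpler problem than the aSPU test: estimating the tail probability P(Z > t) for a standard normal Z. This is not the algorithm proposed in the paper; the function names and the choice of a shifted normal proposal N(t, 1) are assumptions made for illustration. Draws are taken from the proposal, where exceedances are common, and each is reweighted by the likelihood ratio between the target and proposal densities.

```python
import math
import random


def tail_prob_naive(threshold, n_draws, rng):
    """Plain Monte Carlo estimate of P(Z > threshold) for Z ~ N(0, 1)."""
    hits = sum(1 for _ in range(n_draws) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n_draws


def tail_prob_importance(threshold, n_draws, rng):
    """Importance-sampling estimate using the shifted proposal N(threshold, 1)."""
    total = 0.0
    for _ in range(n_draws):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Likelihood ratio phi(x) / phi(x - threshold) of target to proposal.
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_draws


def tail_prob_exact(threshold):
    """Closed-form normal tail, available here only because the toy is simple."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


if __name__ == "__main__":
    rng = random.Random(12345)
    t = 5.45  # roughly the z-score for a two-sided p-value of 5e-8
    print("naive     :", tail_prob_naive(t, 20000, rng))
    print("importance:", tail_prob_importance(t, 20000, rng))
    print("exact     :", tail_prob_exact(t))
```

With a few thousand draws the naive estimator almost never observes an exceedance at this threshold, whereas roughly half of the proposal draws exceed it, so the reweighted average is informative at a tiny fraction of the simulation cost.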
Affiliation(s)
- Yangqing Deng
  - Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
  - Department of Mathematics, University of North Texas, Denton, TX 76203, USA
- Yinqiu He
  - Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
- Gongjun Xu
  - Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
- Wei Pan (corresponding author)
  - Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
6. Liu W, Li M, Zhang W, Zhou G, Wu X, Wang J, Lu Q, Zhao H. Leveraging functional annotation to identify genes associated with complex diseases. PLoS Comput Biol 2020; 16:e1008315. [PMID: 33137096 PMCID: PMC7660930 DOI: 10.1371/journal.pcbi.1008315] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/25/2020] [Revised: 11/12/2020] [Accepted: 09/05/2020] [Indexed: 02/06/2023] Open
Abstract
To increase statistical power to identify genes associated with complex traits, a number of transcriptome-wide association study (TWAS) methods have been proposed using gene expression as a mediating trait linking genetic variations and diseases. These methods first predict expression levels based on inferred expression quantitative trait loci (eQTLs) and then identify expression-mediated genetic effects on diseases by associating phenotypes with predicted expression levels. The success of these methods critically depends on the identification of eQTLs, which may not be functional in the corresponding tissue, due to linkage disequilibrium (LD) and the correlation of gene expression between tissues. Here, we introduce a new method called T-GEN (Transcriptome-mediated identification of disease-associated Genes with Epigenetic aNnotation) to identify disease-associated genes leveraging epigenetic information. Through prioritizing SNPs with tissue-specific epigenetic annotation, T-GEN can better identify SNPs that are both statistically predictive and biologically functional. We found that a significantly higher percentage (an increase of 18.7% to 47.2%) of eQTLs identified by T-GEN are inferred to be functional by ChromHMM and more are deleterious based on their Combined Annotation Dependent Depletion (CADD) scores. Applying T-GEN to 207 complex traits, we were able to identify more trait-associated genes (ranging from 7.7% to 102%) than those from existing methods. Among the identified genes associated with these traits, T-GEN can better identify genes with high (>0.99) pLI scores compared to other methods. When T-GEN was applied to late-onset Alzheimer's disease, we identified 96 genes located at 15 loci, including two novel loci not implicated in previous GWAS. We further replicated 50 genes in an independent GWAS, including one of the two novel loci.
Affiliation(s)
- Wei Liu
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
- Mo Li
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States of America
- Wenfeng Zhang
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States of America
- Geyu Zhou
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
- Xing Wu
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, United States of America
- Jiawei Wang
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
- Qiongshi Lu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, WI, United States of America
  - Department of Statistics, University of Wisconsin-Madison, WI, United States of America
  - Center for Demography of Health and Aging, University of Wisconsin-Madison, WI, United States of America
- Hongyu Zhao
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States of America
  - Department of Genetics, Yale School of Medicine, New Haven, CT, United States of America
7. Zhang L, Papachristou C, Choudhary PK, Biswas S. A Bayesian Hierarchical Framework for Pathway Analysis in Genome-Wide Association Studies. Hum Hered 2020; 84:240-255. [PMID: 32966977 DOI: 10.1159/000508664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2019] [Accepted: 05/14/2020] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Pathway analysis allows joint consideration of multiple SNPs belonging to multiple genes, which in turn belong to a biologically defined pathway. This type of analysis is usually more powerful than single-SNP analyses for detecting joint effects of variants in a pathway. METHODS We develop a Bayesian hierarchical model by fully modeling the 3-level hierarchy, namely, SNP-gene-pathway that is naturally inherent in the structure of the pathways, unlike the currently used ad hoc ways of combining such information. We model the effects at each level conditional on the effects of the levels preceding them within the generalized linear model framework. To deal with the high dimensionality, we regularize the regression coefficients through an appropriate choice of priors. The model is fit using a combination of iteratively weighted least squares and expectation-maximization algorithms to estimate the posterior modes and their standard errors. A normal approximation is used for inference. RESULTS We conduct simulations to study the proposed method and find that our method has higher power than some standard approaches in several settings for identifying pathways with multiple modest-sized variants. We illustrate the method by analyzing data from two genome-wide association studies on breast and renal cancers. CONCLUSION Our method can be helpful in detecting pathway association.
Affiliation(s)
- Lei Zhang
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Pankaj K Choudhary
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Swati Biswas
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
8. Yang Y, Basu S, Zhang L. A Bayesian hierarchical variable selection prior for pathway-based GWAS using summary statistics. Stat Med 2019; 39:724-739. [PMID: 31777110 DOI: 10.1002/sim.8442] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/19/2019] [Revised: 10/27/2019] [Accepted: 11/10/2019] [Indexed: 12/23/2022]
Abstract
While genome-wide association studies (GWASs) have been widely used to uncover associations between diseases and genetic variants, standard SNP-level GWASs often lack the power to identify SNPs that individually have a moderate effect size but jointly contribute to the disease. To overcome this problem, pathway-based GWAS methods have been developed as an alternative strategy that complements SNP-level approaches. We propose a Bayesian method that uses the generalized fused hierarchical structured variable selection prior to identify pathways associated with the disease using SNP-level summary statistics. Our prior has the flexibility to take in pathway structural information so that it can model the gene-level correlation based on prior biological knowledge, an important feature that makes it appealing compared to existing pathway-based methods. Using simulations, we show that our method outperforms competing methods in various scenarios, particularly when we have pathway structural information that involves complex gene-gene interactions. We apply our method to the Wellcome Trust Case Control Consortium Crohn's disease GWAS data, demonstrating its practical application to real data.
Affiliation(s)
- Yi Yang
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
- Saonli Basu
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
- Lin Zhang
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
9. Mora A. Gene set analysis methods for the functional interpretation of non-mRNA data—Genomic range and ncRNA data. Brief Bioinform 2019; 21:1495-1508. [DOI: 10.1093/bib/bbz090] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/20/2019] [Revised: 05/30/2019] [Accepted: 06/28/2019] [Indexed: 12/31/2022] Open
Abstract
Gene set analysis (GSA) is one of the methods of choice for analyzing the results of current omics studies; however, it has been mainly developed to analyze mRNA (microarray, RNA-Seq) data. The following review includes an update regarding general methods and resources for GSA and then emphasizes GSA methods and tools for non-mRNA omics datasets, specifically genomic range data (ChIP-Seq, SNP and methylation) and ncRNA data (miRNAs, lncRNAs and others). In the end, the state of the GSA field for non-mRNA datasets is discussed, and some current challenges and trends are highlighted, especially the use of network approaches to face complexity issues.
Affiliation(s)
- Antonio Mora
  - Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health - Chinese Academy of Sciences
10. Discovering genetic interactions bridging pathways in genome-wide association studies. Nat Commun 2019; 10:4274. [PMID: 31537791 PMCID: PMC6753138 DOI: 10.1038/s41467-019-12131-7] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 03/05/2019] [Accepted: 08/20/2019] [Indexed: 12/20/2022] Open
Abstract
Genetic interactions have been reported to underlie phenotypes in a variety of systems, but the extent to which they contribute to complex disease in humans remains unclear. In principle, genome-wide association studies (GWAS) provide a platform for detecting genetic interactions, but existing methods for identifying them from GWAS data tend to focus on testing individual locus pairs, which undermines statistical power. Importantly, a global genetic network mapped for a model eukaryotic organism revealed that genetic interactions often connect genes between compensatory functional modules in a highly coherent manner. Taking advantage of this expected structure, we developed a computational approach called BridGE that identifies pathways connected by genetic interactions from GWAS data. Applying BridGE broadly, we discover significant interactions in Parkinson's disease, schizophrenia, hypertension, prostate cancer, breast cancer, and type 2 diabetes. Our novel approach provides a general framework for mapping complex genetic networks underlying human disease from genome-wide genotype data.
11. Li X, Yang H, Wen K, Zhong X, Xia X, Liu L, Qin D. A Method for Analyzing Two-locus Epistasis of Complex Diseases based on Decision Tree and Mutual Entropy. Curr Proteomics 2019. [DOI: 10.2174/1570164616666190123150236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
Epistasis makes complex diseases difficult to understand, especially when heterogeneity also exists. Heterogeneity of complex diseases makes the distribution of the case population harder to characterize. However, traditional methods for detecting epistasis often ignore heterogeneity, resulting in low power in association studies.
Methods:
In this study, we first use rank information in the Classification Decision Tree and Mutual Entropy (CTME) method to construct two different evaluation scores, that is, multiple objectives. In addition, we improve the calculation of joint entropy between SNPs and the disease label, which raises the efficiency of CTME. Then, the ant colony algorithm is applied to search the space of two-locus epistatic combinations. To handle potential heterogeneity, all candidate two-locus SNPs are merged to recognize multiple different epistatic combinations. Finally, all these solutions are tested with the χ² test.
Results and Conclusion:
Experiments show that CTME improves the power of association studies. More importantly, CTME also detects multiple epistatic SNPs contributing to heterogeneity. The experimental results show that CTME has advantages in power and efficiency.
Affiliation(s)
- Xiong Li
  - Key Laboratory of Advanced Control & Optimization of Jiangxi Province, East China Jiaotong University, Nanchang, 330013, China
- Hui Yang
  - Key Laboratory of Advanced Control & Optimization of Jiangxi Province, East China Jiaotong University, Nanchang, 330013, China
- Kaifu Wen
  - Postdoctoral Research Station, Jiang Xi Holitech Technology Co., Ltd., Jian, 343700, China
- Xiaoming Zhong
  - Postdoctoral Research Station, Jiang Xi Holitech Technology Co., Ltd., Jian, 343700, China
- Xuewen Xia
  - School of Software, East China Jiaotong University, Nanchang, 330013, China
- Liyue Liu
  - School of Software, East China Jiaotong University, Nanchang, 330013, China
- Dehao Qin
  - School of Software, East China Jiaotong University, Nanchang, 330013, China
12. High-Dimensional LASSO-Based Computational Regression Models: Regularization, Shrinkage, and Selection. Machine Learning and Knowledge Extraction 2019. [DOI: 10.3390/make1010021] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 11/16/2022]
Abstract
Regression models are a form of supervised learning that is important for machine learning, statistics, and general data science. Although classical ordinary least squares (OLS) regression models have been known for a long time, there have been many new developments in recent years that extend this model significantly. Above all, the least absolute shrinkage and selection operator (LASSO) model has gained considerable interest. In this paper, we review general regression models with a focus on the LASSO and extensions thereof, including the adaptive LASSO, elastic net, and group LASSO. We discuss the regularization terms responsible for inducing coefficient shrinkage and variable selection, leading to improved performance metrics of these regression models. This makes these modern, computational regression models valuable tools for analyzing high-dimensional problems.
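The shrinkage and selection behaviour described in this abstract can be seen in a small worked example. The sketch below assumes the common LASSO objective (1/(2n))·||y − Xβ||² + λ·||β||₁ and solves it by cyclic coordinate descent with soft-thresholding; this is one standard solver, not necessarily the formulation used in the review, and the function names and toy data are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + penalty * ||b||_1 coordinate-wise."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            scale = sum(v * v for v in col) / n
            if scale == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, penalty) / scale
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_beta
    return beta


if __name__ == "__main__":
    # Toy orthogonal design: y = 2 * x0 + 0.1 * x1 (no noise).
    X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    y = [2.0, 0.1, -2.0, -0.1]
    print(lasso_coordinate_descent(X, y, penalty=0.0))  # unpenalised fit
    print(lasso_coordinate_descent(X, y, penalty=0.1))  # shrunk and sparse fit
```

In this toy design the large coefficient is shrunk but kept, while the small one is set exactly to zero, which is the variable-selection effect that distinguishes the L1 penalty from ridge-style (L2) shrinkage.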
13. Wu C, Pan W. Integrating eQTL data with GWAS summary statistics in pathway-based analysis with application to schizophrenia. Genet Epidemiol 2018; 42:303-316. [PMID: 29411426 PMCID: PMC5851843 DOI: 10.1002/gepi.22110] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 09/19/2017] [Revised: 01/04/2018] [Accepted: 01/04/2018] [Indexed: 12/11/2022]
Abstract
Many genetic variants affect complex traits through gene expression, which can be exploited to boost statistical power and enhance interpretation in genome-wide association studies (GWASs) as demonstrated by the transcriptome-wide association study (TWAS) approach. Furthermore, due to polygenic inheritance, a complex trait is often affected by multiple genes with similar functions as annotated in gene pathways. Here, we extend TWAS from gene-based analysis to pathway-based analysis: we integrate public pathway collections, expression quantitative trait locus (eQTL) data and GWAS summary association statistics (or GWAS individual-level data) to identify gene pathways associated with complex traits. The basic idea is to weight the SNPs of the genes in a pathway based on their estimated cis-effects on gene expression, then adaptively test for association of the pathway with a GWAS trait by effectively aggregating possibly weak association signals across the genes in the pathway. The P values can be calculated analytically and thus fast. We applied our proposed test with the KEGG and GO pathways to two schizophrenia (SCZ) GWAS summary association data sets, denoted by SCZ1 and SCZ2 with about 20,000 and 150,000 subjects, respectively. Most of the significant pathways identified by analyzing the SCZ1 data were reproduced by the SCZ2 data. Importantly, we identified 15 novel pathways associated with SCZ, such as GABA receptor complex (GO:1902710), which could not be uncovered by the standard single SNP-based analysis or gene-based TWAS. The newly identified pathways may help us gain insights into the biological mechanism underlying SCZ. Our results showcase the power of incorporating gene expression information and gene functional annotations into pathway-based association testing for GWAS.
Affiliation(s)
- Chong Wu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
|
14
|
Wei J, Li M, Gao F, Zeng R, Liu G, Li K. Multiple analyses of large-scale genome-wide association study highlight new risk pathways in lumbar spine bone mineral density. Oncotarget 2017; 7:31429-39. [PMID: 27119226 PMCID: PMC5058768 DOI: 10.18632/oncotarget.8948] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/12/2015] [Accepted: 03/29/2016] [Indexed: 11/25/2022] Open
Abstract
Osteoporosis is a common human complex disease. It is mainly characterized by low bone mineral density (BMD) and low-trauma osteoporotic fractures (OF). Until now, a large proportion of heritability has yet to be explained. The existing large-scale genome-wide association studies (GWAS) provide strong support for the investigation of osteoporosis mechanisms using pathway analysis. Recent findings showed that different risk pathways may be involved in BMD in different tissues. Here, we conducted multiple pathway analyses of a large-scale lumbar spine BMD GWAS dataset (2,468,080 SNPs and 31,800 samples) using two published gene-based analysis software packages, ProxyGeneLD and PLINK. Using BMD genes from ProxyGeneLD, we identified 51 significant KEGG pathways with adjusted P<0.01. Using BMD genes from PLINK, we identified 38 significant KEGG pathways with adjusted P<0.01. Interestingly, 33 pathways are shared by both methods. In summary, we not only identified known risk pathways such as Wnt signaling, in which the top GWAS variants are significantly enriched, but also highlighted some new risk pathways. Interestingly, further evidence supports the involvement of these pathways in BMD.
Affiliation(s)
- Jinsong Wei
- Department of Orthopedic Surgery, Affiliated Hospital of Guangdong Medical University, Zhanjiang, China
| | - Ming Li
- Department of Endocrinology and Metabolism, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, China
| | - Feng Gao
- Department of Trauma and Emergency Surgeon, The Second Affiliated Hospital, Harbin Medical University, Harbin, China
| | - Rong Zeng
- Department of Orthopedic Surgery, Affiliated Hospital of Guangdong Medical University, Zhanjiang, China
| | - Guiyou Liu
- Genome Analysis Laboratory, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, China
| | - Keshen Li
- Institute of Neurology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, China; Stroke Center, Neurology & Neurosurgery Division, The Clinical Medicine Research Institute & The First Affiliated Hospital, Jinan University, Guangzhou, China
|
15
|
Glucocorticoid therapy regulates podocyte motility by inhibition of Rac1. Sci Rep 2017; 7:6725. [PMID: 28751734 PMCID: PMC5532274 DOI: 10.1038/s41598-017-06810-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 01/06/2017] [Accepted: 06/19/2017] [Indexed: 02/03/2023] Open
Abstract
Nephrotic syndrome (NS) occurs when the glomerular filtration barrier becomes excessively permeable, leading to massive proteinuria. In childhood NS, immune system dysregulation has been implicated and increasing evidence points to the central role of podocytes in the pathogenesis. Children with NS are typically treated with an empiric course of glucocorticoid (Gc) therapy, a class of steroids that are activating ligands for the glucocorticoid receptor (GR) transcription factor. Although Gc-therapy has been the cornerstone of NS management for decades, the mechanism of action and target cell remain poorly understood. We tested the hypothesis that Gc acts directly on the podocyte to produce clinically useful effects without involvement of the immune system. In human podocytes, we demonstrated that the basic GR-signalling mechanism is intact and that Gc induced an increase in podocyte barrier function. Defining the GR-cistrome identified Gc regulation of motility genes. These findings were functionally validated with live-cell imaging. We demonstrated that treatment with Gc reduced the activity of the pro-migratory small GTPase regulator Rac1. Furthermore, Rac1 inhibition had a direct, protective effect on podocyte barrier function. Our studies reveal a new mechanism for Gc action directly on the podocyte, with translational relevance to designing new selective synthetic Gc molecules.
|
16
|
Gui H, Kwan JS, Sham PC, Cherny SS, Li M. Sharing of Genes and Pathways Across Complex Phenotypes: A Multilevel Genome-Wide Analysis. Genetics 2017; 206:1601-1609. [PMID: 28495956 PMCID: PMC5500153 DOI: 10.1534/genetics.116.198150] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 11/10/2016] [Accepted: 04/20/2017] [Indexed: 12/15/2022] Open
Abstract
Evidence from genome-wide association studies (GWAS) suggests that pleiotropic effects on human complex phenotypes are very common. Recently, an atlas of genetic correlations among complex phenotypes has broadened our understanding of human diseases and traits. Here, we examine genetic overlap, from a gene-centric perspective, among the same 24 phenotypes previously investigated for genetic correlations. After adopting the multilevel pipeline (freely available at http://grass.cgs.hku.hk/limx/kgg/), which includes intragenic single nucleotide polymorphisms (SNPs), genes, and gene-sets, to estimate genetic similarities across phenotypes, a large amount of sharing among several biologically related phenotypes was confirmed. In addition, significant genetic overlaps were also found among phenotype pairs that were previously unidentified by SNP-level approaches. All these pairs with new genetic links are supported by earlier epidemiological evidence, although only a few of them have pleiotropic genes in the GWAS Catalog. Hence, our gene and gene-set analyses are able to provide new insights into cross-phenotype connections. The investigation of genetic sharing at three different levels presents a complementary picture of how common DNA sequence variations contribute to disease comorbidities and trait manifestations.
Affiliation(s)
- Hongsheng Gui
- Center for Genomic Sciences, University of Hong Kong, Hong Kong SAR, China
- Center for Health Policy and Health Services Research, Henry Ford Health System, Detroit, Michigan 48202
| | - Johnny S Kwan
- Department of Psychiatry, University of Hong Kong, Hong Kong SAR, China
| | - Pak C Sham
- Center for Genomic Sciences, University of Hong Kong, Hong Kong SAR, China
- Department of Psychiatry, University of Hong Kong, Hong Kong SAR, China
- The State Key Laboratory of Brain and Cognitive Sciences, University of Hong Kong, Hong Kong SAR, China
| | - Stacey S Cherny
- Center for Genomic Sciences, University of Hong Kong, Hong Kong SAR, China
- Department of Psychiatry, University of Hong Kong, Hong Kong SAR, China
- The State Key Laboratory of Brain and Cognitive Sciences, University of Hong Kong, Hong Kong SAR, China
| | - Miaoxin Li
- Center for Genomic Sciences, University of Hong Kong, Hong Kong SAR, China
- Department of Psychiatry, University of Hong Kong, Hong Kong SAR, China
- Department of Medical Genetics, Center for Genome Research, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510275 China
|
17
|
Zeng Y, Navarro P, Fernandez-Pujals AM, Hall LS, Clarke TK, Thomson PA, Smith BH, Hocking LJ, Padmanabhan S, Hayward C, MacIntyre DJ, Wray NR, Deary IJ, Porteous DJ, Haley CS, McIntosh AM. A Combined Pathway and Regional Heritability Analysis Indicates NETRIN1 Pathway Is Associated With Major Depressive Disorder. Biol Psychiatry 2017; 81:336-346. [PMID: 27422368 PMCID: PMC5262437 DOI: 10.1016/j.biopsych.2016.04.017] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 12/05/2015] [Revised: 04/20/2016] [Accepted: 04/21/2016] [Indexed: 01/14/2023]
Abstract
BACKGROUND Genome-wide association studies (GWASs) of major depressive disorder (MDD) have identified few significant associations. Testing the aggregation of genetic variants, in particular biological pathways, may be more powerful. Regional heritability analysis can be used to detect genomic regions that contribute to disease risk. METHODS We integrated pathway analysis and multilevel regional heritability analyses in a pipeline designed to identify MDD-associated pathways. The pipeline was applied to two independent GWAS samples [Generation Scotland: The Scottish Family Health Study (GS:SFHS, N = 6455) and Psychiatric Genomics Consortium (PGC:MDD) (N = 18,759)]. A polygenic risk score (PRS) composed of single nucleotide polymorphisms from the pathway most consistently associated with MDD was created, and its accuracy to predict MDD, using area under the curve, logistic regression, and linear mixed model analyses, was tested. RESULTS In GS:SFHS, four pathways were significantly associated with MDD, and two of these explained a significant amount of pathway-level regional heritability. In PGC:MDD, one pathway was significantly associated with MDD. Pathway-level regional heritability was significant in this pathway in one subset of PGC:MDD. For both samples the regional heritabilities were further localized to the gene and subregion levels. The NETRIN1 signaling pathway showed the most consistent association with MDD across the two samples. PRSs from this pathway showed competitive predictive accuracy compared with the whole-genome PRSs when using area under the curve statistics, logistic regression, and linear mixed model. CONCLUSIONS These post-GWAS analyses highlight the value of combining multiple methods on multiple GWAS data for the identification of risk pathways for MDD. The NETRIN1 signaling pathway is identified as a candidate pathway for MDD and should be explored in further large population studies.
Affiliation(s)
- Yanni Zeng
- Division of Psychiatry, University of Edinburgh, Edinburgh, United Kingdom.
| | - Pau Navarro
- MRC Human Genetics Unit, University of Edinburgh, Edinburgh, United Kingdom
| | | | - Lynsey S Hall
- Division of Psychiatry, University of Edinburgh, Edinburgh, United Kingdom
| | - Toni-Kim Clarke
- Division of Psychiatry, University of Edinburgh, Edinburgh, United Kingdom
| | - Pippa A Thomson
- Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, United Kingdom; Medical Genetics Section, University of Edinburgh, Edinburgh, United Kingdom
| | - Blair H Smith
- Centre for Genomic and Experimental Medicine, University of Edinburgh, Edinburgh, United Kingdom; Division of Population Health Sciences, University of Dundee, Dundee, United Kingdom
| | - Lynne J Hocking
- Centre for Genomic and Experimental Medicine, University of Edinburgh, Edinburgh, United Kingdom; Division of Applied Health Sciences, University of Aberdeen, Aberdeen, United Kingdom
| | - Sandosh Padmanabhan
- Centre for Genomic and Experimental Medicine, University of Edinburgh, Edinburgh, United Kingdom; Institute of Cardiovascular and Medical Sciences, University of Glasgow, Glasgow, United Kingdom
| | - Caroline Hayward
- MRC Human Genetics Unit, University of Edinburgh, Edinburgh, United Kingdom; Centre for Genomic and Experimental Medicine, University of Edinburgh, Edinburgh, United Kingdom
| | - Donald J MacIntyre
- Division of Psychiatry, University of Edinburgh, Edinburgh, United Kingdom
| | - Naomi R Wray
- Queensland Brain Institute, University of Queensland, St Lucia, Queensland, Australia
| | - Ian J Deary
- Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, United Kingdom; Generation Scotland, University of Edinburgh, Edinburgh, United Kingdom; Institute of Genetics and Molecular Medicine, Department of Psychology, University of Edinburgh, Edinburgh, United Kingdom
| | - David J Porteous
- Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, United Kingdom; Medical Genetics Section, University of Edinburgh, Edinburgh, United Kingdom; Generation Scotland, University of Edinburgh, Edinburgh, United Kingdom
| | - Chris S Haley
- MRC Human Genetics Unit, University of Edinburgh, Edinburgh, United Kingdom; The Roslin Institute and Royal (Dick) School of Veterinary Sciences, University of Edinburgh, Edinburgh, United Kingdom
| | - Andrew M McIntosh
- Division of Psychiatry, University of Edinburgh, Edinburgh, United Kingdom; Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, United Kingdom; Generation Scotland, University of Edinburgh, Edinburgh, United Kingdom
|
18
|
Wang C, Ruggeri F, Hsiao CK, Argiento R. Bayesian nonparametric clustering and association studies for candidate SNP observations. Int J Approx Reason 2017. [DOI: 10.1016/j.ijar.2016.07.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
19
|
Pathway Analysis Incorporating Protein-Protein Interaction Networks Identified Candidate Pathways for the Seven Common Diseases. PLoS One 2016; 11:e0162910. [PMID: 27622767 PMCID: PMC5021324 DOI: 10.1371/journal.pone.0162910] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/02/2016] [Accepted: 08/30/2016] [Indexed: 01/08/2023] Open
Abstract
Pathway analysis has become popular as a secondary analysis strategy for genome-wide association studies (GWAS). Most of the current pathway analysis methods aggregate signals from the main effects of single nucleotide polymorphisms (SNPs) in genes within a pathway without considering the effects of gene-gene interactions. However, gene-gene interactions can also have critical effects on complex diseases. Protein-protein interaction (PPI) networks have been used to define gene pairs for the gene-gene interaction tests. Incorporating the PPI information to define gene pairs for interaction tests within pathways can increase the power for pathway-based association tests. We propose a pathway association test, which aggregates the interaction signals in PPI networks within a pathway, for GWAS with case-control samples. Gene size is properly considered in the test so that genes do not contribute more to the test statistic simply due to their size. Simulation studies were performed to verify that the method is a valid test and can have more power than other pathway association tests in the presence of gene-gene interactions within a pathway under different scenarios. We applied the test to the Wellcome Trust Case Control Consortium GWAS datasets for seven common diseases. The most significant pathway is the "chaperones modulate interferon signaling" pathway for Crohn’s disease (p-value = 0.0003). The pathway modulates interferon gamma, which induces the JAK/STAT pathway that is involved in Crohn’s disease. Several other pathways that have functional implications for the seven diseases were also identified. The proposed test based on gene-gene interaction signals in PPI networks can be used as a complementary tool to existing pathway analysis methods that focus on main effects of genes. Efficient software implementing the method is freely available at http://puppi.sourceforge.net.
|
20
|
Mooney MA, McWeeney SK, Faraone SV, Hinney A, Hebebrand J, Nigg JT, Wilmot B. Pathway analysis in attention deficit hyperactivity disorder: An ensemble approach. Am J Med Genet B Neuropsychiatr Genet 2016; 171:815-26. [PMID: 27004716 PMCID: PMC4983253 DOI: 10.1002/ajmg.b.32446] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 02/29/2016] [Accepted: 03/07/2016] [Indexed: 12/21/2022]
Abstract
Despite a wealth of evidence for the role of genetics in attention deficit hyperactivity disorder (ADHD), specific and definitive genetic mechanisms have not been identified. Pathway analyses, a subset of gene-set analyses, extend the knowledge gained from genome-wide association studies (GWAS) by providing functional context for genetic associations. However, there are numerous methods for association testing of gene sets and no real consensus regarding the best approach. The present study applied six pathway analysis methods to identify pathways associated with ADHD in two GWAS datasets from the Psychiatric Genomics Consortium. Methods that utilize genotypes to model pathway-level effects identified more replicable pathway associations than methods using summary statistics. In addition, pathways implicated by more than one method were significantly more likely to replicate. A number of brain-relevant pathways, such as RhoA signaling, glycosaminoglycan biosynthesis, fibroblast growth factor receptor activity, and pathways containing potassium channel genes, were nominally significant by multiple methods in both datasets. These results support previous hypotheses about the role of regulation of neurotransmitter release, neurite outgrowth and axon guidance in contributing to the ADHD phenotype and suggest the value of cross-method convergence in evaluating pathway analysis results. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Michael A. Mooney
- Division of Bioinformatics & Computational Biology, Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon; OHSU Knight Cancer Institute, Portland, Oregon
| | - Shannon K. McWeeney
- Division of Bioinformatics & Computational Biology, Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon; OHSU Knight Cancer Institute, Portland, Oregon; Oregon Clinical and Translational Research Institute, Portland, Oregon
| | - Stephen V. Faraone
- Departments of Psychiatry and Neuroscience & Physiology, State University of New York, Syracuse, New York; K.G. Jebsen Centre for Neuropsychiatric Disorders, Department of Biomedicine, University of Bergen, Bergen, Norway
| | - Anke Hinney
- Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
| | - Johannes Hebebrand
- Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
| | | | | | - Joel T. Nigg
- Division of Psychology, Department of Psychiatry, Oregon Health & Science University, Portland, Oregon; Department of Behavioral Neuroscience, Oregon Health & Science University, Portland, Oregon
| | - Beth Wilmot
- Division of Bioinformatics & Computational Biology, Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon; OHSU Knight Cancer Institute, Portland, Oregon; Oregon Clinical and Translational Research Institute, Portland, Oregon; Correspondence to: Beth Wilmot, Ph.D., Division of Bioinformatics & Computational Biology, Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, 3181 SW Sam Jackson Park Rd., Mail code: CR145, Portland, OR 97239.
|
21
|
Brodie A, Azaria JR, Ofran Y. How far from the SNP may the causative genes be? Nucleic Acids Res 2016; 44:6046-54. [PMID: 27269582 PMCID: PMC5291268 DOI: 10.1093/nar/gkw500] [Citation(s) in RCA: 114] [Impact Index Per Article: 12.7] [Received: 09/03/2015] [Revised: 05/20/2016] [Accepted: 05/22/2016] [Indexed: 02/03/2023] Open
Abstract
While GWAS identify many disease-associated SNPs, using them to decipher disease mechanisms is hindered by the difficulty in mapping SNPs to genes. Most SNPs are in non-coding regions and it is often hard to identify the genes they implicate. To explore how far the SNP may be from the affected genes we used a pathway-based approach. We found that affected genes are often up to 2 Mbps away from the associated SNP, and are not necessarily the closest genes to the SNP. Existing approaches for mapping SNPs to genes leave many SNPs unmapped to genes and reveal only 86 significant phenotype-pathway associations for all known GWAS hits combined. Using the pathway-based approach we propose here allows mapping of virtually all SNPs to genes and reveals 435 statistically significant phenotype-pathway associations. In search for mechanisms that may explain the relationships between SNPs and distant genes, we found that SNPs that are mapped to distant genes have significantly more large insertions/deletions around them than other SNPs, suggesting that these SNPs may sometimes be markers for large insertions/deletions that may affect large genomic regions.
Affiliation(s)
- Aharon Brodie
- The Goodman faculty of life sciences, Nanotechnology building, Bar Ilan University, Ramat Gan 52900, Israel
| | - Johnathan Roy Azaria
- The Goodman faculty of life sciences, Nanotechnology building, Bar Ilan University, Ramat Gan 52900, Israel
| | - Yanay Ofran
- The Goodman faculty of life sciences, Nanotechnology building, Bar Ilan University, Ramat Gan 52900, Israel
|
22
|
Wu Y, Zhu X, Li L, Fan W, Jin R, Zhang X. Mining Dual Networks. ACM Transactions on Knowledge Discovery from Data 2016; 10:1-37. [DOI: 10.1145/2785970] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/01/2014] [Accepted: 05/01/2015] [Indexed: 01/05/2025]
Abstract
Finding the densest subgraph in a single graph is a fundamental problem that has been extensively studied. In many emerging applications, there exist dual networks. For example, in genetics, it is important to use protein interactions to interpret genetic interactions. In this application, one network represents physical interactions among nodes, for example, protein-protein interactions, and another network represents conceptual interactions, for example, genetic interactions. Edges in the conceptual network are usually derived based on a certain correlation measure or statistical test measuring the strength of the interaction. Two nodes with strong conceptual interaction may not have direct physical interaction.
In this article, we propose the novel dual-network model and investigate the problem of finding the densest connected subgraph (DCS), which has the largest density in the conceptual network and is also connected in the physical network. Density in the conceptual network represents the average strength of the measured interacting signals among the set of nodes. Connectivity in the physical network shows how they interact physically. Such a pattern cannot be identified using the existing algorithms for a single network. We show that even though finding the densest subgraph in a single network is polynomial-time solvable, the DCS problem is NP-hard. We develop a two-step approach to solve the DCS problem. In the first step, we effectively prune the dual networks while guaranteeing that the optimal solution is contained in the remaining networks. For the second step, we develop two efficient greedy methods based on different search strategies to find the DCS. Different variations of the DCS problem are also studied. We perform extensive experiments on a variety of real and synthetic dual networks to evaluate the effectiveness and efficiency of the developed methods.
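To make the notion of subgraph density concrete, here is a minimal sketch for a single unweighted graph only. It is not the authors' DCS method and ignores the physical-network connectivity constraint entirely. It uses the well-known greedy peeling heuristic, which returns a subgraph whose density |E|/|V| is at least half the optimum; the function name and the toy edge list are assumptions made for this example.

```python
def greedy_densest_subgraph(edges):
    """Greedy peeling: repeatedly remove a minimum-degree node and remember
    the intermediate node set with the highest density |E| / |V|.

    Assumes an undirected simple graph given as (u, v) pairs without self-loops.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2

    best_density, best_nodes = 0.0, set(adj)
    while adj:
        density = n_edges / len(adj)
        if density > best_density:
            best_density, best_nodes = density, set(adj)
        # Peel off one node of minimum degree and update its neighbours.
        u = min(adj, key=lambda node: len(adj[node]))
        for v in adj[u]:
            adj[v].discard(u)
        n_edges -= len(adj[u])
        del adj[u]
    return best_nodes, best_density


# Hypothetical toy graph: a 4-clique {a, b, c, d} with a pendant path d-e-f.
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
             ("c", "d"), ("d", "e"), ("e", "f")]
nodes, density = greedy_densest_subgraph(toy_edges)
```

On the toy graph the peeling removes the two path nodes first and keeps the clique, whose density is 6 edges over 4 nodes.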
Affiliation(s)
- Yubao Wu
- Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH
| | - Xiaofeng Zhu
- Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH
| | - Li Li
- Family Medicine and Community Health, Case Western Reserve University, Cleveland, OH
| | - Wei Fan
- Baidu Research Big Data Lab, Sunnyvale, CA
| | - Ruoming Jin
- Computer Science, Kent State University, Kent, OH
| | - Xiang Zhang
- Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH
|
23
|
Chen M, Rothman N, Ye Y, Gu J, Scheet PA, Huang M, Chang DW, Dinney CP, Silverman DT, Figueroa JD, Chanock SJ, Wu X. Pathway analysis of bladder cancer genome-wide association study identifies novel pathways involved in bladder cancer development. Genes Cancer 2016; 7:229-239. [PMID: 27738493 PMCID: PMC5059113 DOI: 10.18632/genesandcancer.113] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 05/20/2016] [Accepted: 07/28/2016] [Indexed: 11/25/2022] Open
Abstract
Genome-wide association studies (GWAS) are designed to identify individual regions associated with cancer risk, but only explain a small fraction of the inherited variability. An alternative approach that analyzes genetic variants within biological pathways has been proposed to discover networks of susceptibility genes with additional effects. Gene set enrichment analysis (GSEA) may complement and expand traditional GWAS analysis to identify novel genes and pathways associated with bladder cancer risk. We selected three GSEA methods (Gen-Gen, Aligator, and the SNP Ratio Test) to evaluate cellular signaling pathways involved in bladder cancer susceptibility in a Texas GWAS population. The candidate genetic polymorphisms from the significant pathways selected by GSEA were validated in an independent NCI GWAS. We identified 18 novel pathways (P < 0.05) significantly associated with bladder cancer risk. Five of the most promising pathways (P ≤ 0.001 in any of the three GSEA methods) among the 18 pathways included two cell cycle pathways and neural cell adhesion molecule (NCAM), platelet-derived growth factor (PDGF), and unfolded protein response pathways. We validated the candidate polymorphisms in the NCI GWAS and found that variants of RAPGEF1, SKP1, HERPUD1, CACNB2, CACNA1C, CACNA1S, COL4A2, SRC, and CACNA1C were associated with bladder cancer risk. Two CCNE1 variants, rs8102137 and rs997669, from cell cycle pathways showed the strongest associations; the CCNE1 signal at 19q12 has already been reported in previous GWAS. These findings offer additional etiologic insights highlighting the specific genes and pathways associated with bladder cancer development. GSEA may be a complementary tool to GWAS to identify additional loci of cancer susceptibility.
Affiliation(s)
- Meng Chen
- Department of Epidemiology, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA
| | - Nathaniel Rothman
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
| | - Yuanqing Ye
- Department of Epidemiology, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA
| | - Jian Gu
- Department of Epidemiology, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA
| | - Paul A Scheet
- Department of Epidemiology, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA
| | - Maosheng Huang
- Department of Epidemiology, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA
| | - David W Chang
- Department of Epidemiology, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA
| | - Colin P Dinney
- Department of Urology, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA
| | - Debra T Silverman
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
| | - Jonine D Figueroa
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
| | - Stephen J Chanock
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
| | - Xifeng Wu
- Department of Epidemiology, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA
|
24
|
Zhang H, Wheeler W, Hyland PL, Yang Y, Shi J, Chatterjee N, Yu K. A Powerful Procedure for Pathway-Based Meta-analysis Using Summary Statistics Identifies 43 Pathways Associated with Type II Diabetes in European Populations. PLoS Genet 2016; 12:e1006122. [PMID: 27362418 PMCID: PMC4928884 DOI: 10.1371/journal.pgen.1006122] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 02/22/2016] [Accepted: 05/20/2016] [Indexed: 12/17/2022] Open
Abstract
Meta-analysis of multiple genome-wide association studies (GWAS) has become an effective approach for detecting single nucleotide polymorphism (SNP) associations with complex traits. However, it is difficult to integrate the readily accessible SNP-level summary statistics from a meta-analysis into more powerful multi-marker testing procedures, which generally require individual-level genetic data. We developed a general procedure called Summary based Adaptive Rank Truncated Product (sARTP) for conducting gene and pathway meta-analysis that uses only SNP-level summary statistics in combination with genotype correlation estimated from a panel of individual-level genetic data. We demonstrated the validity and power advantage of sARTP through empirical and simulated data. We conducted a comprehensive pathway-based meta-analysis with sARTP on type 2 diabetes (T2D) by integrating SNP-level summary statistics from two large studies consisting of 19,809 T2D cases and 111,181 controls with European ancestry. Among 4,713 candidate pathways from which genes in neighborhoods of 170 GWAS-established T2D loci were excluded, we detected 43 globally significant T2D pathways (with Bonferroni-corrected p-values < 0.05), which included the insulin signaling pathway and T2D pathway defined by KEGG, as well as the pathways defined according to specific gene expression patterns on pancreatic adenocarcinoma, hepatocellular carcinoma, and bladder carcinoma. Using summary data from 8 eastern Asian T2D GWAS with 6,952 cases and 11,865 controls, we showed that 7 of the 43 pathways identified in European populations remained significant in eastern Asians at a false discovery rate of 0.1. We created an R package and a web-based tool for sARTP with the capability to analyze pathways with thousands of genes and tens of thousands of SNPs.
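The Bonferroni criterion mentioned above can be worked through with a short sketch. This is only an arithmetic illustration, not part of sARTP: it assumes the number of tests equals the 4,713 candidate pathways stated in the abstract, and the function names and sample p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Bonferroni-corrected p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Assumed number of tests: the 4,713 candidate pathways from the abstract.
per_pathway_cutoff = bonferroni_threshold(0.05, 4713)  # roughly 1.06e-5

# Hypothetical raw p-values for three tests, for illustration only.
adjusted = bonferroni_adjust([0.01, 0.5, 1e-6])
```

Under that assumption, a pathway would need a raw p-value below roughly 1.06e-5 for its corrected p-value to fall under 0.05.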
Affiliation(s)
- Han Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - William Wheeler
- Information Management Services Inc., Calverton, Maryland, United States of America
| | - Paula L. Hyland
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Yifan Yang
- Department of Statistics, University of Kentucky, Lexington, Kentucky, United States of America
| | - Jianxin Shi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Nilanjan Chatterjee
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America
- * E-mail: (NC); (KY)
| | - Kai Yu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- * E-mail: (NC); (KY)
| |
|
25
|
A comparison of DMET Plus microarray and genome-wide technologies by assessing population substructure. Pharmacogenet Genomics 2016; 26:147-153. [PMID: 26731477 DOI: 10.1097/fpc.0000000000000200] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 01/23/2023]
Abstract
OBJECTIVE The capacity of the Affymetrix drug metabolism enzymes and transporters (DMET) Plus pharmacogenomics genotyping chip to estimate population substructure and cryptic relatedness was evaluated. The results were compared with estimates using genome-wide HapMap data for the same individuals. METHODS For 301 unrelated individuals, spanning three continental populations and one admixed population, genotypic data were collected using the Affymetrix DMET Plus microarray. Genome-wide data on these individuals were obtained from HapMap release 3. Population substructure was assessed using Eigenstrat and ADMIXTURE software for both platforms. Cryptic relatedness was explored by inbreeding coefficient estimation. Nonparametric tests were used to determine correlations of the analytical results of the two genotyping platforms. RESULTS Principal components analysis identified population substructure for both datasets, with 15.8 and 16.6% of the total variance explained in the first two principal components for DMET Plus and HapMap data, respectively. ADMIXTURE results correctly identified four subpopulations within each dataset. Nonparametric rank correlations indicated significant associations between analyses with an average ρ=0.7272 (P<10) across the three continental populations and ρ=0.4888 for the admixed population. Concordance correlation coefficients (average ρc=0.9693 across all four subpopulations) strongly indicate concordance between ADMIXTURE results. Inbreeding coefficients were slightly inflated (16 individuals>0.15) using DMET Plus data and no cryptic relatedness was indicated using HapMap data. The inflated inbreeding estimation could be because of the limited number of markers provided by DMET as a random sample of 1832 markers from HapMap also yielded inflated estimates of cryptic relatedness (39 individuals>0.15). 
Furthermore, single nucleotide polymorphisms located in genes involved in metabolism and transport may have different allele frequencies in subpopulations than single nucleotide polymorphisms sampled from the whole genome. CONCLUSION The DMET Plus pharmacogenomics genotyping chip is effective in quantifying population substructure across the three continental populations and inferring the presence of an admixed population. On the basis of our results, these microarrays offer sufficient depth for covariate adjustment of population substructure in genomic association studies.
|
26
|
Huang J, Wang K, Wei P, Liu X, Liu X, Tan K, Boerwinkle E, Potash JB, Han S. FLAGS: A Flexible and Adaptive Association Test for Gene Sets Using Summary Statistics. Genetics 2016; 202:919-29. [PMID: 26773050 PMCID: PMC4788129 DOI: 10.1534/genetics.115.185009] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 11/18/2015] [Accepted: 01/13/2016] [Indexed: 01/06/2023] Open
Abstract
Genome-wide association studies (GWAS) have been widely used for identifying common variants associated with complex diseases. Despite remarkable success in uncovering many risk variants and providing novel insights into disease biology, genetic variants identified to date fail to explain the vast majority of the heritability for most complex diseases. One explanation is that there are still a large number of common variants that remain to be discovered, but their effect sizes are generally too small to be detected individually. Accordingly, gene set analysis of GWAS, which examines a group of functionally related genes, has been proposed as a complementary approach to single-marker analysis. Here, we propose a FLexible and Adaptive test for Gene Sets (FLAGS), using summary statistics. Extensive simulations showed that this method has an appropriate type I error rate and outperforms existing methods with increased power. As a proof of principle, through real data analyses of Crohn's disease GWAS data and bipolar disorder GWAS meta-analysis results, we demonstrated the superior performance of FLAGS over several state-of-the-art association tests for gene sets. Our method allows for the more powerful application of gene set analysis to complex diseases, which will have broad use given that GWAS summary results are increasingly publicly available.
Affiliation(s)
- Jianfei Huang
- Department of Psychiatry, University of Iowa, Iowa City, Iowa 52242
| | - Kai Wang
- Department of Biostatistics, University of Iowa, Iowa City, Iowa 52242
| | - Peng Wei
- Department of Biostatistics, University of Texas School of Public Health, Houston, Texas 77225
| | - Xiangtao Liu
- Department of Psychiatry, University of Iowa, Iowa City, Iowa 52242
| | - Xiaoming Liu
- Human Genetics Center, University of Texas Health Science Center, Houston, Texas 77030
| | - Kai Tan
- Department of Internal Medicine, University of Iowa, Iowa City, Iowa 52242; Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, Iowa 52242
| | - Eric Boerwinkle
- Human Genetics Center, University of Texas Health Science Center, Houston, Texas 77030; Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030
| | - James B Potash
- Department of Psychiatry, University of Iowa, Iowa City, Iowa 52242
| | - Shizhong Han
- Department of Psychiatry, University of Iowa, Iowa City, Iowa 52242; Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, Iowa 52242
| |
|
27
|
Su YC, Gauderman WJ, Berhane K, Lewinger JP. Adaptive Set-Based Methods for Association Testing. Genet Epidemiol 2015; 40:113-22. [PMID: 26707371 DOI: 10.1002/gepi.21950] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/15/2014] [Revised: 11/02/2015] [Accepted: 11/17/2015] [Indexed: 12/31/2022]
Abstract
With a typical sample size of a few thousand subjects, a single genome-wide association study (GWAS) using traditional one single nucleotide polymorphism (SNP)-at-a-time methods can only detect genetic variants conferring a sizable effect on disease risk. Set-based methods, which analyze sets of SNPs jointly, can detect variants with smaller effects acting within a gene, a pathway, or other biologically relevant sets. Although self-contained set-based methods (those that test sets of variants without regard to variants not in the set) are generally more powerful than competitive set-based approaches (those that rely on comparison of variants in the set of interest with variants not in the set), there is no consensus as to which self-contained methods are best. In particular, several self-contained set tests have been proposed to directly or indirectly "adapt" to the a priori unknown proportion and distribution of effects of the truly associated SNPs in the set, which is a major determinant of their power. A popular adaptive set-based test is the adaptive rank truncated product (ARTP), which seeks the set of SNPs that yields the best-combined evidence of association. We compared the standard ARTP, several ARTP variations we introduced, and other adaptive methods in a comprehensive simulation study to evaluate their performance. We used permutations to assess significance for all the methods and thus provide a level playing field for comparison. We found the standard ARTP test to have the highest power across our simulations followed closely by the global model of random effects (GMRE) and a least absolute shrinkage and selection operator (LASSO)-based test.
Affiliation(s)
- Yu-Chen Su
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
| | - William James Gauderman
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
| | - Kiros Berhane
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
| | - Juan Pablo Lewinger
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
| |
|
28
|
Dumancas GG, Ramasahayam S, Bello G, Hughes J, Kramer R. Chemometric regression techniques as emerging, powerful tools in genetic association studies. Trends Analyt Chem 2015. [DOI: 10.1016/j.trac.2015.05.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/23/2022]
|
29
|
Mooney MA, Wilmot B. Gene set analysis: A step-by-step guide. Am J Med Genet B Neuropsychiatr Genet 2015; 168:517-27. [PMID: 26059482 PMCID: PMC4638147 DOI: 10.1002/ajmg.b.32328] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Received: 03/16/2015] [Accepted: 05/20/2015] [Indexed: 12/21/2022]
Abstract
To maximize the potential of genome-wide association studies, many researchers are performing secondary analyses to identify sets of genes jointly associated with the trait of interest. Although methods for gene-set analyses (GSA), also called pathway analyses, have been around for more than a decade, the field is still evolving. There are numerous algorithms available for testing the cumulative effect of multiple SNPs, yet no real consensus in the field about the best way to perform a GSA. This paper provides an overview of the factors that can affect the results of a GSA, the lessons learned from past studies, and suggestions for how to make analysis choices that are most appropriate for different types of data.
Affiliation(s)
- Michael A. Mooney
- Department of Medical Informatics & Clinical Epidemiology, Division of Bioinformatics & Computational Biology, Oregon Health & Science University, Portland, Oregon; OHSU Knight Cancer Institute, Portland, Oregon
| | - Beth Wilmot
- Department of Medical Informatics & Clinical Epidemiology, Division of Bioinformatics & Computational Biology, Oregon Health & Science University, Portland, Oregon; OHSU Knight Cancer Institute, Portland, Oregon; Oregon Clinical and Translational Research Institute, Portland, Oregon. Correspondence to: Beth Wilmot, Department of Medical Informatics & Clinical Epidemiology, Division of Bioinformatics & Computational Biology, Oregon Health & Science University, Portland, OR 97239.
| |
|
30
|
Bao X, Liu G, Jiang Y, Jiang Q, Liao M, Feng R, Zhang L, Ma G, Zhang S, Chen Z, Zhao B, Wang R, Li K, Liu G. Cell adhesion molecule pathway genes are regulated by cis-regulatory SNPs and show significantly altered expression in Alzheimer's disease brains. Neurobiol Aging 2015; 36:2904.e1-7. [PMID: 26149918 DOI: 10.1016/j.neurobiolaging.2015.06.006] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 05/02/2014] [Revised: 04/27/2015] [Accepted: 06/04/2015] [Indexed: 01/21/2023]
|
31
|
Derrick T, Roberts CH, Last AR, Burr SE, Holland MJ. Trachoma and Ocular Chlamydial Infection in the Era of Genomics. Mediators Inflamm 2015; 2015:791847. [PMID: 26424969 PMCID: PMC4573990 DOI: 10.1155/2015/791847] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 04/17/2015] [Accepted: 08/05/2015] [Indexed: 12/19/2022] Open
Abstract
Trachoma is a blinding disease usually caused by infection with Chlamydia trachomatis (Ct) serovars A, B, and C in the upper tarsal conjunctiva. Individuals in endemic regions are repeatedly infected with Ct throughout childhood. A proportion of individuals experience prolonged or severe inflammatory episodes that are known to be significant risk factors for ocular scarring in later life. Continued scarring often leads to trichiasis and in-turning of the eyelashes, which causes pain and can eventually cause blindness. The mechanisms driving the chronic immunopathology in the conjunctiva, which largely progresses in the absence of detectable Ct infection in adults, are likely to be multifactorial. Socioeconomic status, education, and behavior have been identified as contributing to the risk of scarring and inflammation. We focus on the contribution of host and pathogen genetic variation, bacterial ecology of the conjunctiva, and host epigenetic imprinting including small RNA regulation by both host and pathogen in the development of ocular pathology. Each of these factors or processes contributes to pathogenic outcomes in other inflammatory diseases and we outline their potential role in trachoma.
Affiliation(s)
- Tamsyn Derrick
- Department of Clinical Research, Faculty of Infectious Tropical Diseases, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK
| | - Chrissy h. Roberts
- Department of Clinical Research, Faculty of Infectious Tropical Diseases, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK
| | - Anna R. Last
- Department of Clinical Research, Faculty of Infectious Tropical Diseases, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK
| | - Sarah E. Burr
- Department of Clinical Research, Faculty of Infectious Tropical Diseases, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK
| | - Martin J. Holland
- Department of Clinical Research, Faculty of Infectious Tropical Diseases, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK
| |
|
32
|
Austin E, Shen X, Pan W. A Novel Statistic for Global Association Testing Based on Penalized Regression. Genet Epidemiol 2015; 39:415-26. [PMID: 26282998 DOI: 10.1002/gepi.21915] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/05/2014] [Revised: 05/22/2015] [Accepted: 06/11/2015] [Indexed: 11/09/2022]
Abstract
Natural genetic structures like genes may contain multiple variants that work as a group to determine a biologic outcome. The effect of rare variants, mutations occurring in less than 5% of samples, is hypothesized to be explained best as groups collectively associated with a biologic function. Therefore, it is important to develop powerful association tests to identify a true association between an outcome of interest and a group of variants, in particular a group with many rare variants. In this article we first delineate a novel penalized regression-based global test for the association between sets of variants and a disease phenotype. Next, we use Genetic Analysis Workshop 18 (GAW18) data to assess the power of the new global association test to capture a relationship between an aggregated group of variants and a simulated hypertension status. Rare variant only, common variant only, and combined variant groups are studied. The power values are compared to those obtained from eight well-regarded global tests (Score, Sum, SSU, SSUw, UminP, aSPU, aSPUw, and sequence kernel association test (SKAT)) that do not use penalized regression and a set of tests using either the SSU or score statistics and least absolute shrinkage and selection operator penalty (LASSO) logistic regression. Association testing of rare variants with our method was the top performer when there was low linkage disequilibrium (LD) between and within causal variants. This was similarly true when simultaneously testing rare and common variants in low LD scenarios. Finally, our method was able to provide meaningful variant-specific association information.
Affiliation(s)
- Erin Austin
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
| |
|
33
|
Pathway analysis of body mass index genome-wide association study highlights risk pathways in cardiovascular disease. Sci Rep 2015; 5:13025. [PMID: 26264282 PMCID: PMC4533004 DOI: 10.1038/srep13025] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 04/14/2015] [Accepted: 07/15/2015] [Indexed: 01/02/2023] Open
Abstract
Cardiovascular disease (CVD) is a class of diseases that involve the heart or blood vessels. Body mass index (BMI) is reported to be a risk factor for CVD. Genome-wide association studies (GWAS) have recently provided rapid insights into the genetics of CVD and its risk factors. However, the specific mechanisms by which BMI influences CVD risk are largely unknown. We hypothesized that BMI may influence CVD risk through shared genetic pathways. To test this hypothesis, we conducted a pathway analysis of a BMI GWAS, which examined approximately 329,091 single nucleotide polymorphisms from 4763 samples. We identified 31 significant KEGG pathways. There is literature evidence supporting the involvement of GnRH signaling, vascular smooth muscle contraction, dilated cardiomyopathy, gap junction, Wnt signaling, calcium signaling, and chemokine signaling in CVD. Collectively, our study supports the potential role of the CVD risk pathways in BMI: BMI may influence CVD risk through the shared genetic pathways. We believe that our results may advance our understanding of BMI mechanisms in CVD.
|
34
|
Pan W, Kwak IY, Wei P. A Powerful Pathway-Based Adaptive Test for Genetic Association with Common or Rare Variants. Am J Hum Genet 2015; 97:86-98. [PMID: 26119817 DOI: 10.1016/j.ajhg.2015.05.018] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 02/14/2015] [Accepted: 05/21/2015] [Indexed: 12/11/2022] Open
Abstract
In spite of the success of genome-wide association studies (GWASs), only a small proportion of heritability for each complex trait has been explained by identified genetic variants, mainly SNPs. Likely reasons include genetic heterogeneity (i.e., multiple causal genetic variants) and small effect sizes of causal variants, for which pathway analysis has been proposed as a promising alternative to the standard single-SNP-based analysis. A pathway contains a set of functionally related genes, each of which includes multiple SNPs. Here we propose a pathway-based test that is adaptive at both the gene and SNP levels, thus maintaining high power across a wide range of situations with varying numbers of the genes and SNPs associated with a trait. The proposed method is applicable to both common variants and rare variants and can incorporate biological knowledge on SNPs and genes to boost statistical power. We use extensively simulated data and a WTCCC GWAS dataset to compare our proposal with several existing pathway-based and SNP-set-based tests, demonstrating its promising performance and its potential use in practice.
|
35
|
Ray D, Li X, Pan W, Pankow JS, Basu S. A Bayesian Partitioning Model for the Detection of Multilocus Effects in Case-Control Studies. Hum Hered 2015; 79:69-79. [PMID: 26044550 DOI: 10.1159/000369858] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/28/2014] [Accepted: 11/12/2014] [Indexed: 01/21/2023] Open
Abstract
BACKGROUND Genome-wide association studies (GWASs) have identified hundreds of genetic variants associated with complex diseases, but these variants appear to explain very little of the disease heritability. The typical single-locus association analysis in a GWAS fails to detect variants with small effect sizes and to capture higher-order interaction among these variants. Multilocus association analysis provides a powerful alternative by jointly modeling the variants within a gene or a pathway and by reducing the burden of multiple hypothesis testing in a GWAS. METHODS Here, we propose a powerful and flexible dimension reduction approach to model multilocus association. We use a Bayesian partitioning model which clusters SNPs according to their direction of association, models higher-order interactions using a flexible scoring scheme and uses posterior marginal probabilities to detect association between the SNP set and the disease. RESULTS We illustrate our method using extensive simulation studies and applying it to detect multilocus interaction in Atherosclerosis Risk in Communities (ARIC) GWAS with type 2 diabetes. CONCLUSION We demonstrate that our approach has better power to detect multilocus interactions than several existing approaches. When applied to the ARIC study dataset with 9,328 individuals to study gene-based associations for type 2 diabetes, our method identified some novel variants not detected by conventional single-locus association analyses.
Affiliation(s)
- Debashree Ray
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minn., USA
| | | | | | | | | |
|
36
|
Wang YT, Sung PY, Lin PL, Yu YW, Chung RH. A multi-SNP association test for complex diseases incorporating an optimal P-value threshold algorithm in nuclear families. BMC Genomics 2015; 16:381. [PMID: 25975968 PMCID: PMC4433014 DOI: 10.1186/s12864-015-1620-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 11/13/2014] [Accepted: 05/05/2015] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Genome-wide association studies (GWAS) have become a common approach to identifying single nucleotide polymorphisms (SNPs) associated with complex diseases. Because complex diseases are caused by the joint effects of multiple genes, while the effect of an individual gene or SNP is modest, a method that considers the joint effects of multiple SNPs can be more powerful than testing individual SNPs. Multi-SNP analysis tests association based on a SNP set, usually defined from biological knowledge such as a gene or pathway, in which only a portion of the SNPs may affect the disease. A challenge for multi-SNP analysis is therefore how to effectively select a subset of SNPs with promising association signals from the SNP set. RESULTS We developed the Optimal P-value Threshold Pedigree Disequilibrium Test (OPTPDT). The OPTPDT uses general nuclear families. A variable p-value threshold algorithm is used to determine an optimal p-value threshold for selecting a subset of SNPs. A permutation procedure is used to assess the significance of the test. We used simulations to verify that the OPTPDT has correct type I error rates. Our power studies showed that the OPTPDT can be more powerful than the set-based test in PLINK, the multi-SNP FBAT test, and the p-value based test GATES. We applied the OPTPDT to a family-based autism GWAS dataset for gene-based association analysis and identified MACROD2-AS1 with genome-wide significance (p-value = 2.5 × 10⁻⁶). CONCLUSIONS Our simulation results suggested that the OPTPDT is a valid and powerful test. The OPTPDT will be helpful for gene-based or pathway association analysis. The method is ideal for the secondary analysis of existing GWAS datasets, which may identify a set of SNPs with joint effects on the disease.
Affiliation(s)
- Yi-Ting Wang
- Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan.
| | - Pei-Yuan Sung
- Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan.
| | - Peng-Lin Lin
- Department of Medical Science, National Tsing Hua University, Hsin-Chu, Taiwan.
| | - Ya-Wen Yu
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan.
| | - Ren-Hua Chung
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan.
| |
|
37
|
JAG: A Computational Tool to Evaluate the Role of Gene-Sets in Complex Traits. Genes (Basel) 2015; 6:238-51. [PMID: 26110313 PMCID: PMC4488663 DOI: 10.3390/genes6020238] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 11/20/2014] [Accepted: 04/27/2015] [Indexed: 12/25/2022] Open
Abstract
Gene-set analysis has been proposed as a powerful tool to deal with the highly polygenic architecture of complex traits, as well as with the small effect sizes typically found in GWAS studies for complex traits. We developed a tool, Joint Association of Genetic variants (JAG), which can be applied to Genome Wide Association (GWA) data and tests for the joint effect of all single nucleotide polymorphisms (SNPs) located in a user-specified set of genes or biological pathway. JAG assigns SNPs to genes and incorporates self-contained and/or competitive tests for gene-set analysis. JAG uses permutation to evaluate gene-set significance, which implicitly controls for linkage disequilibrium, sample size, gene size, the number of SNPs per gene and the number of genes in the gene-set. We conducted a power analysis using the Wellcome Trust Case Control Consortium (WTCCC) Crohn’s disease data set and show that JAG correctly identifies validated gene-sets for Crohn’s disease and has more power than currently available tools for gene-set analysis. JAG is a powerful, novel tool for gene-set analysis, and can be freely downloaded from the CTG Lab website.
|
38
|
Wang X, Qin L, Zhang H, Zhang Y, Hsu L, Wang P. A regularized multivariate regression approach for eQTL analysis. Statistics in Biosciences 2015; 7:129-146. [PMID: 26085849 DOI: 10.1007/s12561-013-9106-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/22/2023]
Abstract
Expression quantitative trait loci (eQTLs) are genomic loci that regulate expression levels of mRNAs or proteins. Understanding this regulation provides important clues to biological pathways that underlie diseases. In this paper, we propose a new statistical method, GroupRemMap, for identifying eQTLs. We model the relationship between gene expression and single nucleotide variants (SNVs) through multivariate linear regression models, in which gene expression levels are responses and SNV genotypes are predictors. To handle the high dimensionality as well as to incorporate the intrinsic group structure of SNVs, we introduce a new regularization scheme to (1) control the overall sparsity of the model; (2) encourage the group selection of SNVs from the same gene; and (3) facilitate the detection of trans-hub-eQTLs. We apply the proposed method to the colorectal and breast cancer data sets from The Cancer Genome Atlas (TCGA), and identify several biologically interesting eQTLs. These findings may provide insight into biological processes associated with cancers and generate hypotheses for future studies.
Affiliation(s)
- Xianlong Wang
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N, Seattle, WA, USA
| | - Li Qin
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N, Seattle, WA, USA
| | - Hexin Zhang
- Institute of Mathematics Sciences, Peking University, Beijing, China
| | - Yuzheng Zhang
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N, Seattle, WA, USA
| | - Li Hsu
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N, Seattle, WA, USA
| | - Pei Wang
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N, Seattle, WA, USA
| |
|
39
|
Wojcik GL, Kao WHL, Duggal P. Relative performance of gene- and pathway-level methods as secondary analyses for genome-wide association studies. BMC Genet 2015; 16:34. [PMID: 25887572 PMCID: PMC4391470 DOI: 10.1186/s12863-015-0191-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 10/28/2014] [Accepted: 03/19/2015] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Despite the success of genome-wide association studies (GWAS), there still remains "missing heritability" for many traits. One contributing factor may be the result of examining one marker at a time as opposed to a group of markers that are biologically meaningful in aggregate. To address this problem, a variety of gene- and pathway-level methods have been developed to identify putative biologically relevant associations. A simulation was conducted to systematically assess the performance of these methods. Using genetic data from 4,500 individuals in the Wellcome Trust Case Control Consortium (WTCCC), case-control status was simulated based on an additive polygenic model. We evaluated gene-level methods based on their sensitivity, specificity, and proportion of false positives. Pathway-level methods were evaluated on the relationship between proportion of causal genes within the pathway and the strength of association. RESULTS The gene-level methods had low sensitivity (20-63%), high specificity (89-100%), and low proportion of false positives (0.1-6%). The gene-level program VEGAS using only the top 10% of associated single nucleotide polymorphisms (SNPs) within the gene had the highest sensitivity (28.6%) with less than 1% false positives. The performance of the pathway-level methods depended on their reliance upon asymptotic distributions or if significance was estimated in a competitive manner. The pathway-level programs GenGen, GSA-SNP and MAGENTA had the best performance while accounting for potential confounders. CONCLUSIONS Novel genes and pathways can be identified using the gene and pathway-level methods. These methods may provide valuable insight into the "missing heritability" of traits and provide biological interpretations to GWAS findings.
Affiliation(s)
- Genevieve L Wojcik
- Department of Epidemiology, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA.
- W H Linda Kao
- Department of Epidemiology, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, USA.
- Priya Duggal
- Department of Epidemiology, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, USA.
|
40
|
Stingo FC, Swartz MD, Vannucci M. A Bayesian approach to identify genes and gene-level SNP aggregates in a genetic analysis of cancer data. Statistics and Its Interface 2015; 8:137-151. [PMID: 28989562 PMCID: PMC5630184 DOI: 10.4310/sii.2015.v8.n2.a2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/07/2023]
Abstract
Complex diseases, such as cancer, arise from complex etiologies consisting of multiple single-nucleotide polymorphisms (SNPs), each contributing a small amount to the overall risk of disease. Thus, many researchers have gone beyond single-SNP analysis methods, focusing instead on groups of SNPs, for example by analysing haplotypes. More recently, pathway-based methods have been proposed that use prior biological knowledge on gene function to achieve a more powerful analysis of genome-wide association studies (GWAS) data. In this paper we propose a novel Bayesian modeling framework to identify molecular biomarkers for disease prediction. Our method combines pathway-based approaches with multiple SNP analyses of a specified region of interest. The model's development is motivated by SNP data from a lung cancer study. In our approach we define gene-level scores based on SNP allele frequencies and use a linear modeling setting to study the scores' association with the observed phenotype. The basic idea behind the definition of gene-level scores is to weigh the SNPs within the gene according to their rarity, based on genotype frequencies expected under the Hardy-Weinberg equilibrium law. This results in scores giving more importance to the unusually low frequencies, i.e. to SNPs that might indicate peculiar genetic differences between subjects belonging to different groups. An additional feature of our approach is that we incorporate information on SNP-to-SNP associations into the model. In particular, we use network priors that model the linkage disequilibrium between SNPs. For posterior inference, we design a stochastic search method that identifies significant biomarkers (genes and SNPs) for disease prediction. We assess performance on simulated data and compare results to existing approaches. We then show the ability of the proposed methodology to detect relevant genes and associated SNPs in a lung cancer dataset.
Affiliation(s)
- Francesco C Stingo
- Department of Biostatistics, MD Anderson Cancer Center, 1400 Pressler St. Houston, TX 77030, USA
- Michael D Swartz
- Department of Biostatistics, UT School of Public Health, 1200 Pressler St. Houston, TX 77030, USA
- Marina Vannucci
- Department of Statistics, MS 138, Rice University, 6100 Main St. Houston, TX 77251-1892 USA
|
41
|
Quan B, Qi X, Yu Z, Jiang Y, Liao M, Wang G, Feng R, Zhang L, Chen Z, Jiang Q, Liu G. Pathway analysis of genome-wide association study and transcriptome data highlights new biological pathways in colorectal cancer. Mol Genet Genomics 2014; 290:603-10. [PMID: 25362561 DOI: 10.1007/s00438-014-0945-y] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 08/30/2014] [Accepted: 10/17/2014] [Indexed: 12/11/2022]
Abstract
Colorectal cancer (CRC) is a common malignancy that meets the definition of a complex disease. Genome-wide association studies (GWAS) have identified several loci of weak predictive value in CRC; however, these do not fully explain the risk of occurrence. Recently, gene set analysis has allowed enhanced interpretation of GWAS data in CRC, identifying a number of metabolic pathways as important for disease pathogenesis. Whether there are other important pathways involved in CRC, however, remains unclear. We present a systems analysis of KEGG pathways in CRC using (1) a human CRC GWAS dataset and (2) a human whole transcriptome CRC case-control expression dataset. Analysis of the GWAS dataset revealed significantly enriched KEGG pathways related to metabolism, immune system and diseases, cellular processes, environmental information processing, genetic information processing, and neurodegenerative diseases. Altered gene expression was confirmed in these pathways using the transcriptome dataset. Taken together, these findings not only confirm previous work in this area, but also highlight new biological pathways whose deregulation is critical for CRC. These results contribute to our understanding of disease-causing mechanisms and will prove useful for future genetic and functional studies in CRC.
Affiliation(s)
- Baoku Quan
- Department of General Surgery, The First Hospital of Harbin, Harbin, China
|
42
|
Verma M. Molecular profiling and companion diagnostics: where is personalized medicine in cancer heading? Per Med 2014; 11:761-771. [PMID: 29764045 DOI: 10.2217/pme.14.41] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
The goal of personalized medicine is to use the right drug at the right dose - with minimal or no toxicity - for the right patient at the right time. Recent advances in understanding cell biology and pathways, and in using molecular 'omics' technologies to diagnose cancer, offer a strategic bridge to personalized medicine in cancer. Modern personalized medicine takes into account an individual's genetic makeup and disease history before developing a treatment regimen. The future of clinical oncology will be based on the use of predictive and prognostic biomarkers in patient management. Once implemented widely, personalized medicine will benefit patients and the healthcare system greatly.
|
43
|
Lin D, Zhang J, Li J, He H, Deng HW, Wang YP. Integrative analysis of multiple diverse omics datasets by sparse group multitask regression. Front Cell Dev Biol 2014; 2:62. [PMID: 25364766 PMCID: PMC4209817 DOI: 10.3389/fcell.2014.00062] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 06/21/2014] [Accepted: 10/01/2014] [Indexed: 01/10/2023] Open
Abstract
A variety of high-throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have had a remarkable impact on identifying susceptibility biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies of different genetic levels/platforms has the promise to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, namely sparse group multitask regression, for integrating diverse omics datasets, platforms, and populations to identify risk genes/factors of complex diseases. This method combines multitask learning with sparse group regularization, which will: (1) treat the biomarker identification in each single study as a task and then combine them by multitask learning; (2) group variables from all studies for identifying significant genes; (3) enforce a sparse constraint on groups of variables to overcome the "small sample, but large variables" problem. We introduce two sparse group penalties, sparse group lasso and sparse group ridge, in our multitask model, and provide an effective algorithm for each model. In addition, we propose a significance test for the identification of potential risk genes. Two simulation studies are performed to evaluate the performance of our integrative method by comparing it with a conventional meta-analysis method. The results show that our sparse group multitask method significantly outperforms the meta-analysis method. In an application to our osteoporosis studies, 7 genes are identified as significant by our method and are found to have significant effects in three other independent studies used for validation. The most significant gene, SOD2, was identified in our previous osteoporosis study involving the same expression dataset. Several other genes such as TREML2, HTR1E, and GLO1 are shown to be novel susceptibility genes for osteoporosis, as confirmed by other studies.
Affiliation(s)
- Dongdong Lin
- Biomedical Engineering Department, Tulane University, New Orleans, LA, USA; Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- Jigang Zhang
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Jingyao Li
- Biomedical Engineering Department, Tulane University, New Orleans, LA, USA; Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- Hao He
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Hong-Wen Deng
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Yu-Ping Wang
- Biomedical Engineering Department, Tulane University, New Orleans, LA, USA; Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
|
44
|
Mooney MA, Nigg JT, McWeeney SK, Wilmot B. Functional and genomic context in pathway analysis of GWAS data. Trends Genet 2014; 30:390-400. [PMID: 25154796 DOI: 10.1016/j.tig.2014.07.004] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Received: 06/12/2014] [Revised: 07/18/2014] [Accepted: 07/18/2014] [Indexed: 02/07/2023]
Abstract
Gene set analysis (GSA) is a promising tool for uncovering the polygenic effects associated with complex diseases. However, the available techniques reflect a wide variety of hypotheses about how genetic effects interact to contribute to disease susceptibility. The lack of consensus about the best way to perform GSA has led to confusion in the field and has made it difficult to compare results across methods. A clear understanding of the various choices made during GSA - such as how gene sets are defined, how single-nucleotide polymorphisms (SNPs) are assigned to genes, and how individual SNP-level effects are aggregated to produce gene- or pathway-level effects - will improve the interpretability and comparability of results across methods and studies. In this review we provide an overview of the various data sources used to construct gene sets and the statistical methods used to test for gene set association, as well as provide guidelines for ensuring the comparability of results.
Affiliation(s)
- Michael A Mooney
- Division of Bioinformatics and Computational Biology, Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA; OHSU Knight Cancer Institute, Portland, OR, USA
- Joel T Nigg
- Division of Psychology, Department of Psychiatry, Oregon Health & Science University, Portland, OR, USA; Department of Behavioral Neuroscience, Oregon Health & Science University, Portland, OR, USA
- Shannon K McWeeney
- Division of Bioinformatics and Computational Biology, Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA; Oregon Clinical and Translational Research Institute, Portland, OR, USA; OHSU Knight Cancer Institute, Portland, OR, USA
- Beth Wilmot
- Division of Bioinformatics and Computational Biology, Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA; Oregon Clinical and Translational Research Institute, Portland, OR, USA; OHSU Knight Cancer Institute, Portland, OR, USA
|
45
|
Balliu B, Uh HW, Tsonaka R, Boehringer S, Helmer Q, Houwing-Duistermaat JJ. Combining information from linkage and association mapping for next-generation sequencing longitudinal family data. BMC Proc 2014; 8:S34. [PMID: 25519382 PMCID: PMC4143620 DOI: 10.1186/1753-6561-8-s1-s34] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/24/2022] Open
Abstract
In this analysis, we investigate the contributions that linkage-based methods, such as identical-by-descent mapping, can make to association mapping to identify rare variants in next-generation sequencing data. First, we identify regions in which cases share more segments identical-by-descent around a putative causal variant than do controls. Second, we use a two-stage mixed-effect model approach to summarize the single-nucleotide polymorphism data within each region and include them as covariates in the model for the phenotype. We assess the impact of linkage disequilibrium in determining identical-by-descent states between individuals by using markers with and without linkage disequilibrium for the first part and the impact of imputation in testing for association by using imputed genome-wide association studies or raw sequence markers for the second part. We apply the method to next-generation sequencing longitudinal family data from Genetic Association Workshop 18 and identify a significant region at chromosome 3: 40249244-41025167 (p-value = 2.3 × 10⁻³).
Affiliation(s)
- Brunilda Balliu
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, The Netherlands
- Hae-Won Uh
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, The Netherlands; Netherlands Consortium for Healthy Ageing, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Roula Tsonaka
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, The Netherlands
- Stefan Boehringer
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, The Netherlands
- Quinta Helmer
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, The Netherlands; Netherlands Consortium for Healthy Ageing, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Jeanine J Houwing-Duistermaat
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, The Netherlands
|
46
|
Juraeva D, Haenisch B, Zapatka M, Frank J, Witt SH, Mühleisen TW, Treutlein J, Strohmaier J, Meier S, Degenhardt F, Giegling I, Ripke S, Leber M, Lange C, Schulze TG, Mössner R, Nenadic I, Sauer H, Rujescu D, Maier W, Børglum A, Ophoff R, Cichon S, Nöthen MM, Rietschel M, Mattheisen M, Brors B. Integrated pathway-based approach identifies association between genomic regions at CTCF and CACNB2 and schizophrenia. PLoS Genet 2014; 10:e1004345. [PMID: 24901509 PMCID: PMC4046913 DOI: 10.1371/journal.pgen.1004345] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 08/05/2013] [Accepted: 03/20/2014] [Indexed: 11/19/2022] Open
Abstract
In the present study, an integrated hierarchical approach was applied to: (1) identify pathways associated with susceptibility to schizophrenia; (2) detect genes that may be potentially affected in these pathways since they contain an associated polymorphism; and (3) annotate the functional consequences of such single-nucleotide polymorphisms (SNPs) in the affected genes or their regulatory regions. The Global Test was applied to detect schizophrenia-associated pathways using discovery and replication datasets comprising 5,040 and 5,082 individuals of European ancestry, respectively. Information concerning functional gene-sets was retrieved from the Kyoto Encyclopedia of Genes and Genomes, Gene Ontology, and the Molecular Signatures Database. Fourteen of the gene-sets or pathways identified in the discovery dataset were confirmed in the replication dataset. These include functional processes involved in transcriptional regulation and gene expression, synapse organization, cell adhesion, and apoptosis. For two genes, i.e. CTCF and CACNB2, evidence for association with schizophrenia was available (at the gene-level) in both the discovery study and published data from the Psychiatric Genomics Consortium schizophrenia study. Furthermore, these genes mapped to four of the 14 presently identified pathways. Several of the SNPs assigned to CTCF and CACNB2 have potential functional consequences, and a gene in close proximity to CACNB2, i.e. ARL5B, was identified as a potential gene of interest. Application of the present hierarchical approach thus allowed: (1) identification of novel biological gene-sets or pathways with potential involvement in the etiology of schizophrenia, as well as replication of these findings in an independent cohort; (2) detection of genes of interest for future follow-up studies; and (3) the highlighting of novel genes in previously reported candidate regions for schizophrenia.
Affiliation(s)
- Dilafruz Juraeva
- Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Britta Haenisch
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
- Institute of Human Genetics, University of Bonn, Bonn, Germany
- Federal Institute for Drugs and Medical Devices (BfArM), Bonn, Germany
- Department of Psychiatry, University of Bonn, Bonn, Germany
- Marc Zapatka
- Division of Molecular Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Josef Frank
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim/Heidelberg University, Mannheim, Germany
- Stephanie H. Witt
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim/Heidelberg University, Mannheim, Germany
- Thomas W. Mühleisen
- Institute of Human Genetics, University of Bonn, Bonn, Germany
- Department of Genomics, Life and Brain Center, University of Bonn, Bonn, Germany
- Institute for Neuroscience and Medicine (INM-1), Research Centre Juelich, Juelich, Germany
- Jens Treutlein
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim/Heidelberg University, Mannheim, Germany
- Jana Strohmaier
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim/Heidelberg University, Mannheim, Germany
- Sandra Meier
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim/Heidelberg University, Mannheim, Germany
- National Centre for Integrated Register-based Research (NCRR), Department of Economics and Business, Aarhus University, Aarhus, Denmark
- Franziska Degenhardt
- Institute of Human Genetics, University of Bonn, Bonn, Germany
- Department of Genomics, Life and Brain Center, University of Bonn, Bonn, Germany
- Ina Giegling
- Division of Molecular and Clinical Neurobiology, Department of Psychiatry, Ludwig-Maximilians-University, Munich, Germany
- Department of Psychiatry, University of Halle-Wittenberg, Halle, Germany
- Stephan Ripke
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Markus Leber
- Institute for Medical Biometry, Informatics, and Epidemiology, University of Bonn, Bonn, Germany
- Christoph Lange
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
- Department of Genomic Mathematics, University of Bonn, Bonn, Germany
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Thomas G. Schulze
- Department of Psychiatry and Psychotherapy, University Medical Center Georg-August-Universität, Göttingen, Germany
- Igor Nenadic
- Department of Psychiatry and Psychotherapy, Jena University Hospital, Jena, Germany
- Heinrich Sauer
- Department of Psychiatry and Psychotherapy, Jena University Hospital, Jena, Germany
- Dan Rujescu
- Division of Molecular and Clinical Neurobiology, Department of Psychiatry, Ludwig-Maximilians-University, Munich, Germany
- Department of Psychiatry, University of Halle-Wittenberg, Halle, Germany
- Wolfgang Maier
- Department of Psychiatry, University of Bonn, Bonn, Germany
- Anders Børglum
- Department of Biomedicine, Aarhus University, Aarhus C, Denmark and Center for Integrated Sequencing, iSEQ, Aarhus, Denmark
- Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus and Copenhagen, Denmark
- Centre for Psychiatric Research, Aarhus University Hospital, Risskov, Denmark
- Roel Ophoff
- UCLA Center for Neurobehavioral Genetics, Los Angeles, California, United States of America
- Department of Psychiatry, Rudolf Magnus Institute of Neuroscience, University Medical Center Utrecht, Utrecht, The Netherlands
- Sven Cichon
- Institute of Human Genetics, University of Bonn, Bonn, Germany
- Department of Genomics, Life and Brain Center, University of Bonn, Bonn, Germany
- Institute for Neuroscience and Medicine (INM-1), Research Centre Juelich, Juelich, Germany
- Department of Medical Genetics, University Hospital Basel, Basel, Switzerland
- Markus M. Nöthen
- Institute of Human Genetics, University of Bonn, Bonn, Germany
- Department of Genomics, Life and Brain Center, University of Bonn, Bonn, Germany
- Marcella Rietschel
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim/Heidelberg University, Mannheim, Germany
- Manuel Mattheisen
- Department of Genomics, Life and Brain Center, University of Bonn, Bonn, Germany
- Department of Genomic Mathematics, University of Bonn, Bonn, Germany
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Department of Biomedicine, Aarhus University, Aarhus C, Denmark and Center for Integrated Sequencing, iSEQ, Aarhus, Denmark
- Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus and Copenhagen, Denmark
- Benedikt Brors
- Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
|
47
|
Rodriguez-Fontenla C, Calaza M, Gonzalez A. Genetic distance as an alternative to physical distance for definition of gene units in association studies. BMC Genomics 2014; 15:408. [PMID: 24884992 PMCID: PMC4048458 DOI: 10.1186/1471-2164-15-408] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 10/22/2013] [Accepted: 05/20/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Some association studies, such as those implemented in VEGAS, ALIGATOR, i-GSEA4GWAS, GSA-SNP and other software tools, use genes as the unit of analysis. These genes include the coding sequence plus flanking sequences. Polymorphisms in the flanking sequences are of interest because they involve cis-regulatory elements or they inform on untyped genetic variants through linkage disequilibrium. Gene extensions have customarily been defined as ±50 Kb. This approach is not fully satisfactory because genetic relationships between neighbouring sequences are a function of genetic distances, which are only poorly replaced by physical distances. RESULTS Standardized recombination rates (SRR) from the deCODE recombination map were used as units of genetic distance. We searched for an SRR producing flanking sequences near the ±50 Kb offset that has been common in previous studies. An SRR≥2 was selected because it led to gene extensions with a median length of 45.3 Kb and offered the simplicity of an integer value. As expected, boundaries of the genes defined with the ±50 Kb and with the SRR≥2 rules were rarely concordant. The impact of these differences was illustrated with the interpretation of top association signals from two large studies that included many hits and detailed analyses based on different criteria. The definition based on genetic distance was more concordant with the results of these studies than the one based on physical distance. In the analysis of 18 top disease-associated loci from the first study, the SRR≥2 genes led to a fully concordant interpretation in 17 loci; the ±50 Kb genes only in 6. Interpretation of the 43 putative functional genes of the second study based on the SRR≥2 definition missed only 4 of the genes, whereas the one based on the ±50 Kb definition missed 10 genes. CONCLUSIONS A gene definition based on genetic distance led to results more concordant with expert detailed analyses than the commonly used one based on physical distance. The genome coordinates for each gene are provided to maintain a simple use of the new definitions.
Affiliation(s)
- Antonio Gonzalez
- Laboratorio de Investigacion 10 and Rheumatology Unit, Instituto de Investigacion Sanitaria - Hospital Clinico Universitario de Santiago, Santiago de Compostela, Spain.
|
48
|
Yan Q, Tiwari HK, Yi N, Lin WY, Gao G, Lou XY, Cui X, Liu N. Kernel-machine testing coupled with a rank-truncation method for genetic pathway analysis. Genet Epidemiol 2014; 38:447-56. [PMID: 24849109 DOI: 10.1002/gepi.21813] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 10/31/2013] [Revised: 04/09/2014] [Accepted: 04/10/2014] [Indexed: 01/09/2023]
Abstract
Traditional genome-wide association studies (GWASs) usually focus on single-marker analysis, which only assesses marginal effects. Pathway analysis, on the other hand, considers the hierarchical structure of biological pathways, genes, and markers and therefore provides additional insights into the genetic architecture underlying complex diseases. Recently, a number of methods for pathway analysis have been proposed to assess the significance of a biological pathway from a collection of single-nucleotide polymorphisms. In this study, we propose a novel approach for pathway analysis that assesses the effects of genes using the sequence kernel association test and the effects of pathways using an extended adaptive rank truncated product statistic. It has been increasingly recognized that complex diseases are caused by both common and rare variants. We propose a new weighting scheme that allows genetic variants across the whole allelic frequency spectrum to be analyzed together without any form of frequency cutoff for defining rare variants. The proposed approach is flexible. It is applicable to both binary and continuous traits, and incorporating covariates is easy. Furthermore, it can be readily applied to GWAS data, exome-sequencing data, and deep resequencing data. We evaluate the new approach on data simulated under comprehensive scenarios and show that it has the highest power in most of the scenarios while maintaining the correct type I error rate. We also apply our proposed methodology to data from a study of the association between bipolar disorder and candidate pathways from the Wellcome Trust Case Control Consortium (WTCCC) to show its utility.
Affiliation(s)
- Qi Yan
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
|
49
|
Huang A, Martin ER, Vance JM, Cai X. Detecting genetic interactions in pathway-based genome-wide association studies. Genet Epidemiol 2014; 38:300-9. [PMID: 24719383 DOI: 10.1002/gepi.21803] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 10/16/2013] [Revised: 01/06/2014] [Accepted: 02/28/2014] [Indexed: 12/13/2022]
Abstract
Pathway-based genome-wide association studies (GWAS) can exploit collective effects of causal variants in a pathway to increase power of detection. However, current methods for pathway-based GWAS do not consider epistatic effects of genetic variants, although interactions between genetic variants may play an important role in influencing complex traits. In this paper, we employed a Bayesian Lasso logistic regression model for pathway-based GWAS to include all possible main effects and a large number of pairwise interactions of single nucleotide polymorphisms (SNPs) in a pathway, and then inferred the model with an efficient group empirical Bayesian Lasso (EBLasso) method. Using the inferred model, the statistical significance of a pathway was tested with the Wald statistic. Reliable effects in a significant pathway were selected using the stability selection technique. Extensive computer simulations demonstrated that our group EBLasso method significantly outperformed two competing methods in most simulation setups and offered similar performance in the others. When applied to a GWAS dataset for Parkinson disease, EBLasso identified three significant pathways: the primary bile acid biosynthesis pathway, the neuroactive ligand-receptor interaction pathway, and the MAPK signaling pathway. All effects identified in the primary bile acid biosynthesis pathway and many of the effects in the other two pathways were epistatic effects. The group EBLasso method provides a valuable tool for pathway-based GWAS to identify main and epistatic effects of genetic variants.
Affiliation(s)
- Anhui Huang
- Department of Electrical and Computer Engineering, University of Miami, Coral Gables, Florida, United States of America
|
50
|
Pirastu N, Kooyman M, Traglia M, Robino A, Willems SM, Pistis G, d’Adamo P, Amin N, d’Eustacchio A, Navarini L, Sala C, Karssen LC, van Duijn C, Toniolo D, Gasparini P. Association analysis of bitter receptor genes in five isolated populations identifies a significant correlation between TAS2R43 variants and coffee liking. PLoS One 2014; 9:e92065. [PMID: 24647340 PMCID: PMC3960174 DOI: 10.1371/journal.pone.0092065] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 10/03/2013] [Accepted: 02/19/2014] [Indexed: 01/20/2023] Open
Abstract
Coffee, one of the most popular beverages in the world, contains many different physiologically active compounds with a potential impact on people’s health. Despite the recent attention given to the genetic basis of its consumption, very little has been done to understand the genes influencing coffee preference among different individuals. Given its markedly bitter taste, we decided to verify whether variants in bitter receptor genes (TAS2Rs) affect coffee liking. To this end, 4066 people from different parts of Europe and Central Asia filled in a field questionnaire on coffee liking. They were subsequently recruited and included in the study. Eighty-eight SNPs covering the 25 TAS2R genes were selected from the available imputed ones and used to run association analysis for coffee liking. A significant association was detected with three SNPs: one synonymous and two functional variants (W35S and H212R) on the TAS2R43 gene. Both functional variants have been shown to greatly reduce in vitro protein activity. Surprisingly, the wild type allele, which corresponds to the functional form of the protein, is associated with higher liking of coffee. Since the hTAS2R43 receptor is sensitive to caffeine, we verified whether the detected variants produced differences in caffeine bitter perception in a subsample of people from the FVG cohort. We found a significant association between differences in caffeine perception and the H212R variant but not the W35S variant, which suggests that the effect of the TAS2R43 gene on coffee liking is mediated by caffeine and in particular by the H212R variant. No other significant association was found with other TAS2R genes. In conclusion, the present study opens new perspectives in the understanding of coffee liking. Further studies are needed to clarify the role of the TAS2R43 gene in coffee hedonics and to identify which other genes and pathways are involved in its genetics.
Affiliation(s)
- Nicola Pirastu
  - Institute for Maternal and Child Health, Istituto Di Ricovero e Cura a Carattere Scientifico “Burlo Garofolo,” Trieste, Italy
  - Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy
  - * E-mail:
- Maarten Kooyman
  - Genetic Epidemiology Unit, Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands
- Michela Traglia
  - Division of Genetics and Cell Biology, San Raffaele Scientific Institute, Milano, Italy
- Antonietta Robino
  - Institute for Maternal and Child Health, Istituto Di Ricovero e Cura a Carattere Scientifico “Burlo Garofolo,” Trieste, Italy
  - Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy
- Sara M. Willems
  - Genetic Epidemiology Unit, Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands
- Giorgio Pistis
  - Division of Genetics and Cell Biology, San Raffaele Scientific Institute, Milano, Italy
- Pio d’Adamo
  - Institute for Maternal and Child Health, Istituto Di Ricovero e Cura a Carattere Scientifico “Burlo Garofolo,” Trieste, Italy
  - Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy
- Najaf Amin
  - Genetic Epidemiology Unit, Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands
- Angela d’Eustacchio
  - Institute for Maternal and Child Health, Istituto Di Ricovero e Cura a Carattere Scientifico “Burlo Garofolo,” Trieste, Italy
- Cinzia Sala
  - Division of Genetics and Cell Biology, San Raffaele Scientific Institute, Milano, Italy
- Lennart C. Karssen
  - Genetic Epidemiology Unit, Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands
- Cornelia van Duijn
  - Genetic Epidemiology Unit, Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands
  - Centre for Medical Systems Biology, Leiden University Medical Center, Leiden, The Netherlands
- Daniela Toniolo
  - Division of Genetics and Cell Biology, San Raffaele Scientific Institute, Milano, Italy
- Paolo Gasparini
  - Institute for Maternal and Child Health, Istituto Di Ricovero e Cura a Carattere Scientifico “Burlo Garofolo,” Trieste, Italy
  - Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy
|