1
Cui L, Yang B, Xiao S, Gao J, Baud A, Graham D, McBride M, Dominiczak A, Schafer S, Aumatell RL, Mont C, Teruel AF, Hübner N, Flint J, Mott R, Huang L. Dominance is common in mammals and is associated with trans-acting gene expression and alternative splicing. Genome Biol 2023; 24:215. [PMID: 37773188; PMCID: PMC10540365; DOI: 10.1186/s13059-023-03060-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 09/18/2023] [Indexed: 10/01/2023]
Abstract
BACKGROUND Dominance and other non-additive genetic effects arise from the interaction between alleles, and historically these phenomena have played a major role in quantitative genetics. However, most genome-wide association studies (GWAS) assume that alleles act additively. RESULTS We systematically investigate both dominance (here representing any non-additive within-locus interaction) and additivity across 574 physiological and gene expression traits in three mammalian stocks: F2 intercross pigs, heterogeneous stock rats, and heterogeneous stock mice. Dominance accounts for about one quarter of heritable variance across all physiological traits in all species. Hematological and immunological traits exhibit the highest dominance variance, possibly reflecting balancing selection in response to pathogens. Although most quantitative trait loci (QTLs) are detectable as additive QTLs, we identify 154, 64, and 62 novel dominance QTLs in pigs, rats, and mice, respectively, that are undetectable as additive QTLs. Similarly, even though most cis-acting expression QTLs (eQTLs) are additive, gene expression exhibits a large fraction of dominance variance, and trans-acting eQTLs are enriched for dominance. Genes causal for dominance physiological QTLs are less likely to be physically linked to their QTLs and instead act via trans-acting dominance eQTLs. In addition, thousands of eQTLs are associated with alternatively spliced isoforms with complex additive and dominant architectures in heterogeneous stock rats, suggesting a possible mechanism for dominance. CONCLUSIONS Although heritability is predominantly additive, many mammalian genetic effects are dominant and likely arise through distinct mechanisms. It is therefore advantageous to consider both additive and dominance effects in GWAS to improve power and uncover causality.
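To make the additive/dominance distinction concrete, here is a minimal sketch for a single biallelic locus. It is not the authors' variance-component or mixed-model pipeline. It assumes genotypes coded as allele counts 0, 1, 2 and the common coding x_a = g - 1 (additive) and x_d = 1 for heterozygotes, 0 otherwise (dominance). Under that coding the model y = mu + a*x_a + d*x_d fits the three genotype-class means exactly, so d is the heterozygote's departure from the midpoint of the two homozygotes. The function name `additive_dominance_effects` is hypothetical.

```python
from statistics import mean


def additive_dominance_effects(genotypes, phenotypes):
    """Estimate (mu, a, d) at one biallelic locus from genotype-class means.

    Assumed coding: x_a = g - 1 and x_d = 1 for heterozygotes (g == 1), else 0.
    The model y = mu + a*x_a + d*x_d then reproduces the class means m0, m1, m2:
      mu = (m0 + m2) / 2,  a = (m2 - m0) / 2,  d = m1 - mu
    """
    groups = {0: [], 1: [], 2: []}
    for g, y in zip(genotypes, phenotypes):
        if g not in groups:
            raise ValueError(f"genotype must be 0, 1 or 2, got {g!r}")
        groups[g].append(y)
    if any(len(values) == 0 for values in groups.values()):
        raise ValueError("each genotype class needs at least one observation")
    m0, m1, m2 = (mean(groups[k]) for k in (0, 1, 2))
    mu = (m0 + m2) / 2
    return mu, (m2 - m0) / 2, m1 - mu


# Overdominant toy locus: the heterozygote exceeds both homozygotes.
print(additive_dominance_effects([0, 1, 2], [1, 3, 1]))  # (1.0, 0.0, 2.0)
```

In the toy example the additive effect is zero while the dominance effect is 2, so a purely additive test sees nothing at this locus; this is one simple way a dominance QTL can be undetectable as an additive QTL.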
Affiliation(s)
- Leilei Cui
- National Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045, People's Republic of China
- UCL Genetics Institute, University College London, London, WC1E 6BT, UK
- Human Aging Research Institute and School of Life Science, Nanchang University, and Jiangxi Key Laboratory of Human Aging, Jiangxi, China
- School of Life Sciences, Nanchang University, Nanchang, China
- Bin Yang
- National Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045, People's Republic of China
- Shijun Xiao
- National Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045, People's Republic of China
- Jun Gao
- National Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045, People's Republic of China
- Amelie Baud
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Delyth Graham
- BHF Glasgow Cardiovascular Research Centre, University of Glasgow, Glasgow, G12 8TA, UK
- Martin McBride
- BHF Glasgow Cardiovascular Research Centre, University of Glasgow, Glasgow, G12 8TA, UK
- Anna Dominiczak
- BHF Glasgow Cardiovascular Research Centre, University of Glasgow, Glasgow, G12 8TA, UK
- Sebastian Schafer
- Cardiovascular and Metabolic Disorders Program, Duke-National University of Singapore Medical School, Singapore, Singapore
- Regina Lopez Aumatell
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Carme Mont
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Albert Fernandez Teruel
- Departamento de Psiquiatría y Medicina Legal, Universitat Autonoma de Barcelona, Barcelona, Spain
- Norbert Hübner
- Genetics and Genomics of Cardiovascular Diseases Research Group, Max Delbrück Center (MDC) for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- DZHK (German Center for Cardiovascular Research) Partner Site Berlin, Berlin, Germany
- Charité Universitätsmedizin Berlin, Berlin, Germany
- Jonathan Flint
- Department of Psychiatry and Behavioral Sciences, Brain Research Institute, University of California, Los Angeles, CA, USA
- Richard Mott
- UCL Genetics Institute, University College London, London, WC1E 6BT, UK.
- Lusheng Huang
- National Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045, People's Republic of China.
2
Bi W, Fritsche LG, Mukherjee B, Kim S, Lee S. A Fast and Accurate Method for Genome-Wide Time-to-Event Data Analysis and Its Application to UK Biobank. Am J Hum Genet 2020; 107:222-233. [PMID: 32589924; DOI: 10.1016/j.ajhg.2020.06.003] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 02/07/2020] [Accepted: 06/03/2020] [Indexed: 12/09/2022]
Abstract
With increasing biobanking efforts connecting electronic health records and national registries to germline genetics, time-to-event data analysis has attracted increasing attention in genetic studies of human diseases. In time-to-event data analysis, the Cox proportional hazards (PH) regression model is one of the most widely used approaches. However, existing methods and tools do not scale to a large biobank with hundreds of thousands of samples and endpoints, and they are inaccurate when testing low-frequency and rare variants. Here, we propose a scalable and accurate method, SPACox (a saddlepoint approximation implementation based on the Cox PH regression model), that is applicable to genome-wide scale time-to-event data analysis. SPACox requires fitting a Cox PH regression model only once across the genome-wide analysis and then uses a saddlepoint approximation (SPA) to calibrate the test statistics. Simulation studies show that SPACox is 76-252 times faster than other existing alternatives, such as gwasurvivr, 185-511 times faster than the standard Wald test, and more than 6,000 times faster than the Firth correction, and that it can control type I error rates at the genome-wide significance level regardless of minor allele frequency. Through the analysis of UK Biobank inpatient data of 282,871 white British European-ancestry samples, we show that SPACox can efficiently analyze large sample sizes and accurately control type I error rates. We identified 611 loci associated with time-to-event phenotypes of 12 common diseases, of which 38 loci would be missed within a logistic regression framework with a binary phenotype defined as event occurrence status during the follow-up period.
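The idea of fitting the Cox PH null model only once can be illustrated with a minimal sketch. Assumptions: the null model here has no covariates (the published method adjusts for covariates), so its martingale residuals reduce to R_i = delta_i - Lambda(t_i), where Lambda is the Nelson-Aalen cumulative hazard. The per-variant score statistic is then S_j = sum_i G_ij * R_i, a dot product that reuses the same residuals for every variant; algebraically this sum equals the Cox partial-likelihood score at beta = 0 with Breslow handling of ties. The SPA calibration of S_j is not shown, and the function names are hypothetical.

```python
def martingale_residuals(times, events):
    """Martingale residuals R_i = delta_i - Lambda(t_i) from a covariate-free Cox null model.

    Lambda is the Nelson-Aalen cumulative hazard (ties handled Breslow-style).
    times: follow-up times; events: 1 if the event was observed, 0 if censored.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    cumhaz_at = {}
    cumhaz = 0.0
    at_risk = n  # number of subjects with follow-up time >= current time
    k = 0
    while k < n:
        t = times[order[k]]
        j, deaths = k, 0
        while j < n and times[order[j]] == t:  # gather all subjects tied at t
            deaths += events[order[j]]
            j += 1
        cumhaz += deaths / at_risk  # Nelson-Aalen increment d(t) / n(t)
        cumhaz_at[t] = cumhaz
        at_risk -= j - k  # subjects with time t leave the risk set afterwards
        k = j
    return [events[i] - cumhaz_at[times[i]] for i in range(n)]


def score_statistics(genotype_matrix, residuals):
    """S_j = sum_i G_ij * R_i for each variant j (one row of allele counts per variant).

    The residuals come from a single null-model fit and are reused for every variant.
    """
    return [sum(g * r for g, r in zip(row, residuals)) for row in genotype_matrix]


times = [5.0, 3.0, 3.0, 8.0, 1.0, 6.0]
events = [1, 1, 0, 1, 0, 1]
residuals = martingale_residuals(times, events)  # fitted once
scores = score_statistics([[0, 1, 2, 1, 0, 2], [1, 1, 0, 0, 2, 1]], residuals)
```

The cost per additional variant is a single pass over the samples, which is where a fit-once design saves time relative to refitting a Cox model for every variant.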
3
Lau A, So HC. Turning genome-wide association study findings into opportunities for drug repositioning. Comput Struct Biotechnol J 2020; 18:1639-1650. [PMID: 32670504; PMCID: PMC7334463; DOI: 10.1016/j.csbj.2020.06.015] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/10/2019] [Revised: 06/05/2020] [Accepted: 06/05/2020] [Indexed: 02/02/2023]
Abstract
Drug development is a very costly and lengthy process, whereas repositioned or repurposed drugs could be brought into clinical practice within a shorter time frame and at a much reduced cost. Numerous computational approaches to drug repositioning have been developed, but methods utilizing genome-wide association study (GWAS) data are less explored. The past decade has seen massive growth in the amount of GWAS data, and the rich information it contains has great potential to guide drug repositioning or discovery. While multiple tools are available for finding the most relevant genes from GWAS hits, searching for top susceptibility genes is only one way to guide repositioning, and it has its own limitations. Here we provide a comprehensive review of different computational approaches that employ GWAS data to guide drug repositioning. These methods include selecting top candidate genes from GWAS as drug targets, deducing drug candidates based on drug-drug and disease-disease similarities, searching for reversed expression profiles between drugs and diseases, pathway-based methods, and approaches based on analysis of biological networks. Each method is illustrated with examples, and their respective strengths and limitations are discussed. We also discuss several areas for future research.
Affiliation(s)
- Alexandria Lau
- School of Biomedical Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
- Hon-Cheong So
- School of Biomedical Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
- KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research of Common Diseases, Kunming Institute of Zoology and The Chinese University of Hong Kong, Hong Kong SAR, China
- Department of Psychiatry, The Chinese University of Hong Kong, Hong Kong SAR, China
- Margaret K.L. Cheung Research Centre for Management of Parkinsonism, The Chinese University of Hong Kong, Hong Kong SAR, China
- Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, China
- Brain and Mind Institute, The Chinese University of Hong Kong, Hong Kong SAR, China
- Hong Kong Branch of the Chinese Academy of Sciences Center for Excellence in Animal Evolution and Genetics, The Chinese University of Hong Kong, Hong Kong SAR, China
- Corresponding author at: School of Biomedical Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China.
4
Bi W, Zhao Z, Dey R, Fritsche LG, Mukherjee B, Lee S. A Fast and Accurate Method for Genome-wide Scale Phenome-wide G × E Analysis and Its Application to UK Biobank. Am J Hum Genet 2019; 105:1182-1192. [PMID: 31735295; PMCID: PMC6904814; DOI: 10.1016/j.ajhg.2019.10.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/18/2019] [Accepted: 10/14/2019] [Indexed: 02/06/2023]
Abstract
The etiology of most complex diseases involves genetic variants, environmental factors, and gene-environment interaction (G × E) effects. Compared with marginal genetic association studies, G × E analysis requires larger samples and detailed measurement of environmental exposures, which limits possible discoveries. Large-scale population-based biobanks with detailed phenotypic and environmental information, such as UK Biobank, can be ideal resources for identifying G × E effects. However, because of the large computational cost and the presence of case-control imbalance, existing methods often fail. Here we propose a scalable and accurate method, SPAGE (SaddlePoint Approximation implementation of G × E analysis), that is applicable to genome-wide scale phenome-wide G × E studies. SPAGE fits a genotype-independent logistic model only once across the genome-wide analysis in order to reduce computation cost, and it uses a saddlepoint approximation (SPA) to calibrate the test statistics for analysis of phenotypes with unbalanced case-control ratios. Simulation studies show that SPAGE is 33-79 times faster than the Wald test and 72-439 times faster than the Firth test, and that SPAGE can control type I error rates at the genome-wide significance level even when case-control ratios are extremely unbalanced. Through the analysis of UK Biobank data of 344,341 white British European-ancestry samples, we show that SPAGE can efficiently analyze large samples while controlling for unbalanced case-control ratios.
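The saddlepoint calibration step can be sketched for a generic score statistic S = sum_i c_i * (y_i - mu_i), with independent Bernoulli outcomes y_i, case probabilities mu_i taken from a null logistic model fitted once, and fixed weights c_i. Assumptions: how SPAGE constructs c_i from genotype and environment is not described in the abstract, so the weights are left generic here; every mu_i must lie strictly between 0 and 1. The tail probability uses the Lugannani-Rice formula with the exact cumulant generating function of S, plus a normal-approximation fallback near the mean (a common practice; the cutoff of 2 standard deviations is an arbitrary default). All names are hypothetical.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _softplus(z):
    # log(1 + exp(z)) without overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def _norm_sf(x):
    # Upper tail of the standard normal, 1 - Phi(x).
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def _cgf(t, c, mu):
    """K(t), K'(t), K''(t) for S = sum_i c_i * (y_i - mu_i), y_i ~ Bernoulli(mu_i)."""
    k0 = k1 = k2 = 0.0
    for ci, mi in zip(c, mu):
        z = t * ci + math.log(mi / (1.0 - mi))
        p = _sigmoid(z)  # success probability under the exponentially tilted law
        k0 += math.log1p(-mi) + _softplus(z) - t * ci * mi
        k1 += ci * (p - mi)
        k2 += ci * ci * p * (1.0 - p)
    return k0, k1, k2


def _solve_saddlepoint(s, c, mu, tol=1e-10, max_steps=60):
    """Find t with K'(t) = s by bracketing and bisection (K' is increasing)."""
    lo, hi = -1.0, 1.0
    for _ in range(max_steps):
        if _cgf(lo, c, mu)[1] <= s:
            break
        lo *= 2.0
    else:
        raise ValueError("could not bracket the saddlepoint from below")
    for _ in range(max_steps):
        if _cgf(hi, c, mu)[1] >= s:
            break
        hi *= 2.0
    else:
        raise ValueError("could not bracket the saddlepoint from above")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if _cgf(mid, c, mu)[1] < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def spa_tail(s, c, mu, upper=True, cutoff=2.0):
    """Approximate P(S >= s) (upper=True) or P(S <= s) via Lugannani-Rice."""
    s_min = sum(min(ci * (1.0 - mi), -ci * mi) for ci, mi in zip(c, mu))
    s_max = sum(max(ci * (1.0 - mi), -ci * mi) for ci, mi in zip(c, mu))
    if not s_min < s < s_max:
        raise ValueError("s must lie strictly inside the support of S")
    sd = math.sqrt(_cgf(0.0, c, mu)[2])  # K''(0) = Var(S)
    if abs(s) < cutoff * sd:  # near the mean: normal approximation
        return _norm_sf(s / sd) if upper else _norm_sf(-s / sd)
    t = _solve_saddlepoint(s, c, mu)
    k0, _, k2 = _cgf(t, c, mu)
    w = math.copysign(math.sqrt(max(2.0 * (t * s - k0), 0.0)), t)
    v = t * math.sqrt(k2)
    phi_w = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    if upper:
        return _norm_sf(w) + phi_w * (1.0 / v - 1.0 / w)
    return _norm_sf(-w) + phi_w * (1.0 / w - 1.0 / v)
```

As a sanity check with 1,000 samples at a 1% case rate and unit weights, 20 observed cases (s = 10) have an exact binomial upper tail of roughly 3e-3; the normal approximation understates it by more than a factor of four, while the saddlepoint value lands much closer. That gap is the kind of case-control-imbalance miscalibration the abstract describes.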
Affiliation(s)
- Wenjian Bi
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Zhangchen Zhao
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Rounak Dey
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Lars G Fritsche
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Bhramar Mukherjee
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Seunggeun Lee
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA.