1
Nikolitsa EK, Kontou PI, Bagos PG. metacp: a versatile software package for combining dependent or independent p-values. BMC Bioinformatics 2025; 26:109. [PMID: 40253343 PMCID: PMC12008841 DOI: 10.1186/s12859-025-06126-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2025] [Accepted: 04/01/2025] [Indexed: 04/21/2025] Open
Abstract
BACKGROUND We present metacp, an open-source software package that implements a wide range of statistical methods for combining both independent p-values, with methods such as Fisher's, Stouffer's and Edgington's, and dependent p-values, with methods such as Brown's method and the Cauchy Combination Test. RESULTS The tool is available in Python and STATA; it is very fast and easy to use, requiring only minimal input. It offers a useful resource for combining both independent and dependent p-values, responding to diverse analytical needs for practitioners performing meta-analyses and bioinformaticians developing tools for a variety of applications. Depending on the input data, it can be used for gene-based testing, for analysis of multiple traits in GWAS, or for combining diverse multi-omics data such as those of a TWAS, a colocalization or an RNA-seq study. CONCLUSIONS Compared to other similar packages (like poolr or metap), metacp implements the largest collection of statistical methods for this problem, offering users the flexibility to choose from a wide variety of approaches. Being available both as a standalone Python tool and as a STATA command, metacp is accessible to a broad and diverse audience, including practitioners conducting meta-analyses across various fields and bioinformaticians developing new tools where p-value combination is a crucial component.
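Two of the independent-p-value methods named in this abstract, Fisher's and Stouffer's, can be illustrated with a minimal sketch. This is not metacp's interface: the function names `fisher_combine` and `stouffer_combine` are hypothetical, the inputs are assumed to be independent p-values strictly between 0 and 1, and only the Python standard library is used.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) follows chi-square with 2k df under H0."""
    k = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    # Closed-form chi-square survival function for even degrees of freedom 2k:
    # sf(x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def stouffer_combine(pvalues):
    """Stouffer's method: sum of z-scores / sqrt(k) is N(0, 1) under H0."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - nd.cdf(z)
```

With a single input both functions return that p-value unchanged, which is a convenient sanity check. For the illustrative inputs `[0.01, 0.02, 0.03]`, `fisher_combine` returns a combined p-value smaller than any single input. Neither function is appropriate for dependent p-values, for which the abstract points to methods such as Brown's method and the Cauchy Combination Test.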
Affiliation(s)
- Evgenia K Nikolitsa
- Department of Computer Science and Biomedical Informatics, University of Thessaly, 35100, Lamia, Greece
- Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, 35100, Lamia, Greece.
2
Song S, Zhang J. In search of the genetic variants of human sex ratio at birth: was Fisher wrong about sex ratio evolution? Proc Biol Sci 2024; 291:20241876. [PMID: 39406345 PMCID: PMC11479764 DOI: 10.1098/rspb.2024.1876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Revised: 09/06/2024] [Accepted: 09/09/2024] [Indexed: 10/20/2024] Open
Abstract
The human sex ratio (fraction of males) at birth is close to 0.5 at the population level, an observation commonly explained by Fisher's principle. However, past human studies yielded conflicting results regarding the existence of sex ratio-influencing mutations (a prerequisite to Fisher's principle), raising the question of whether the nearly even population sex ratio is instead dictated by the random X/Y chromosome segregation in male meiosis. Here we show that, because a person's offspring sex ratio (OSR) has an enormous measurement error, a gigantic sample is required to detect OSR-influencing genetic variants. Conducting a UK Biobank-based genome-wide association study that is more powerful than previous studies, we detect an OSR-associated genetic variant, which awaits verification in independent samples. Given the abysmal precision in measuring OSR, it is unsurprising that the estimated heritability of OSR is effectively zero. We further show that OSR's estimated heritability would remain virtually zero even if OSR were as genetically variable as the highly heritable human standing height. These analyses, along with simulations of human sex ratio evolution under selection, demonstrate the compatibility of the observed genetic architecture of human OSR with Fisher's principle and render it plausible that multiple OSR-influencing genetic variants segregate among humans.
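The measurement-error argument in this abstract can be made concrete under a simple assumption that is not taken from the paper: if each child's sex is an independent Bernoulli draw with male probability p, the observed OSR of a parent with n children has standard error sqrt(p(1 - p) / n). The sketch below, with the hypothetical helper `osr_standard_error` and illustrative family sizes, shows how large that error is for realistic n.

```python
import math


def osr_standard_error(p_male, n_children):
    """Binomial standard error of an observed offspring sex ratio."""
    return math.sqrt(p_male * (1.0 - p_male) / n_children)


# Illustrative family sizes; p_male = 0.5 is assumed for simplicity.
for n in (1, 2, 4, 10):
    print(n, round(osr_standard_error(0.5, n), 3))
```

Under this assumption a parent with four children has an OSR standard error of 0.25, which is large relative to any plausible genetic shift in the true male probability and is consistent with the abstract's point that very large samples are needed.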
Affiliation(s)
- Siliang Song
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
3
Svishcheva GR, Belonogova NM, Kirichenko AV, Tsepilov YA, Axenovich TI. A New Method for Conditional Gene-Based Analysis Effectively Accounts for the Regional Polygenic Background. Genes (Basel) 2024; 15:1174. [PMID: 39336765 PMCID: PMC11431718 DOI: 10.3390/genes15091174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2024] [Revised: 08/27/2024] [Accepted: 09/05/2024] [Indexed: 09/30/2024] Open
Abstract
Gene-based association analysis is a powerful tool for identifying genes that explain trait variability. An essential step of this analysis is a conditional analysis, which aims to eliminate the influence of SNPs outside the gene that are in linkage disequilibrium with intragenic SNPs. The popular conditional analysis method, GCTA-COJO, accounts for the influence of several top independently associated SNPs outside the gene, correcting the z statistics for intragenic SNPs. We suggest a new TauCOR method for conditional gene-based analysis using summary statistics. This method accounts for the influence of the full regional polygenic background, correcting the genotype correlations between intragenic SNPs. As a result, the distribution of z statistics for intragenic SNPs becomes conditionally independent of the distribution for extragenic SNPs. TauCOR is compatible with any gene-based association test. TauCOR was tested on summary statistics simulated under different scenarios and on real summary statistics for a 'gold standard' gene list from the Open Targets Genetics project. TauCOR proved to be effective in all modelling scenarios and on real data. TauCOR's strategy showed comparable sensitivity and higher specificity and accuracy than GCTA-COJO on both simulated and real data. The method can be successfully used to improve the effectiveness of gene-based association analyses.
Affiliation(s)
- Gulnara R Svishcheva
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Ave. Lavrentiev, 10, 630090 Novosibirsk, Russia
- Institute of General Genetics, Russian Academy of Sciences, Gubkin St. 3, 119311 Moscow, Russia
- Nadezhda M Belonogova
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Ave. Lavrentiev, 10, 630090 Novosibirsk, Russia
- Anatoly V Kirichenko
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Ave. Lavrentiev, 10, 630090 Novosibirsk, Russia
- Yakov A Tsepilov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Ave. Lavrentiev, 10, 630090 Novosibirsk, Russia
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1RQ, UK
- Tatiana I Axenovich
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Ave. Lavrentiev, 10, 630090 Novosibirsk, Russia
4
Zhu L, Zhang S, Sha Q. Meta-analysis of set-based multiple phenotype association test based on GWAS summary statistics from different cohorts. Front Genet 2024; 15:1359591. [PMID: 39301532 PMCID: PMC11410627 DOI: 10.3389/fgene.2024.1359591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 08/23/2024] [Indexed: 09/22/2024] Open
Abstract
Genome-wide association studies (GWAS) have emerged as popular tools for identifying genetic variants that are associated with complex diseases. Standard analysis of a GWAS involves assessing the association between each variant and a disease. However, this approach suffers from limited reproducibility and difficulties in detecting multi-variant and pleiotropic effects. Although joint analysis of multiple phenotypes for GWAS can identify and interpret pleiotropic loci which are essential to understand pleiotropy in diseases and complex traits, most of the multiple phenotype association tests are designed for a single variant, resulting in much lower power, especially when their effect sizes are small and only their cumulative effect is associated with multiple phenotypes. To overcome these limitations, set-based multiple phenotype association tests have been developed to enhance statistical power and facilitate the identification and interpretation of pleiotropic regions. In this research, we propose a new method, named Meta-TOW-S, which conducts joint association tests between multiple phenotypes and a set of variants (such as variants in a gene) utilizing GWAS summary statistics from different cohorts. Our approach applies the set-based method that Tests for the effect of an Optimal Weighted combination of variants in a gene (TOW) and accounts for sample size differences across GWAS cohorts by employing the Cauchy combination method. Meta-TOW-S combines the advantages of set-based tests and multi-phenotype association tests, exhibiting computational efficiency and enabling analysis across multiple phenotypes while accommodating overlapping samples from different GWAS cohorts. To assess the performance of Meta-TOW-S, we develop a phenotype simulator package that encompasses a comprehensive simulation scheme capable of modeling multiple phenotypes and multiple variants, including noise structures and diverse correlation patterns among phenotypes. 
Simulation studies validate that Meta-TOW-S maintains a desirable Type I error rate. Further simulation under different scenarios shows that Meta-TOW-S can improve power compared with other existing meta-analysis methods. When applied to four psychiatric disorders summary data, Meta-TOW-S detects a greater number of significant genes.
Affiliation(s)
- Lirong Zhu
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
5
Cañadas-Garre M, Maqueda JJ, Baños-Jaime B, Hill C, Skelly R, Cappa R, Brennan E, Doyle R, Godson C, Maxwell AP, McKnight AJ. Mitochondrial related variants associated with cardiovascular traits. Front Physiol 2024; 15:1395371. [PMID: 39258111 PMCID: PMC11385366 DOI: 10.3389/fphys.2024.1395371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Accepted: 08/05/2024] [Indexed: 09/12/2024] Open
Abstract
Introduction Cardiovascular disease (CVD) is responsible for over 30% of mortality worldwide. CVD arises from the complex influence of molecular, clinical, social, and environmental factors. Despite the growing number of autosomal genetic variants contributing to CVD, the cause of most CVDs is still unclear. Mitochondria are crucial in the pathophysiology, development and progression of CVDs; the impact of mitochondrial DNA (mtDNA) variants and mitochondrial haplogroups in the context of CVD has recently been highlighted. Aims We investigated the role of genetic variants in both mtDNA and nuclear-encoded mitochondrial genes (NEMG) in CVD, including coronary artery disease (CAD), hypertension, and serum lipids in the UK Biobank, with sub-group analysis for diabetes. Methods We investigated 371,542 variants in 2,527 NEMG, along with 192 variants in 32 mitochondrial genes in 381,994 participants of the UK Biobank, stratifying by presence of diabetes. Results Mitochondrial variants showed associations with CVD, hypertension, and serum lipids. Mitochondrial haplogroup J was associated with CAD and serum lipids, whereas mitochondrial haplogroups T and U were associated with CVD. Among NEMG, variants within Nitric Oxide Synthase 3 (NOS3) showed associations with CVD, CAD, hypertension, as well as diastolic and systolic blood pressure. We also identified Translocase Of Outer Mitochondrial Membrane 40 (TOMM40) variants associated with CAD; Solute carrier family 22 member 2 (SLC22A2) variants associated with CAD and CVD; and HLA-DQA1 variants associated with hypertension. Variants within these three genes were also associated with serum lipids. Conclusion Our study demonstrates the relevance of mitochondrial related variants in the context of CVD. We have linked mitochondrial haplogroup U to CVD, confirmed association of mitochondrial haplogroups J and T with CVD and proposed new markers of hypertension and serum lipids in the context of diabetes. 
We have also evidenced connections between the etiological pathways underlying CVDs, blood pressure and serum lipids, placing NOS3, SLC22A2, TOMM40 and HLA-DQA1 genes as common nexuses.
Affiliation(s)
- Marisa Cañadas-Garre
- Molecular Epidemiology and Public Health Research Group, Centre for Public Health, Queen's University Belfast, Institute for Clinical Sciences A, Royal Victoria Hospital, Belfast, United Kingdom
- MRC Integrative Epidemiology Unit, Bristol Medical School (Population Health Sciences), University of Bristol Oakfield House, Belfast, United Kingdom
- Joaquín J Maqueda
- Molecular Epidemiology and Public Health Research Group, Centre for Public Health, Queen's University Belfast, Institute for Clinical Sciences A, Royal Victoria Hospital, Belfast, United Kingdom
- Laboratory of Experimental Oncology, IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy
- Department of Experimental, Diagnostic and Specialty Medicine (DIMES), University of Bologna, Bologna, Italy
- Blanca Baños-Jaime
- Molecular Epidemiology and Public Health Research Group, Centre for Public Health, Queen's University Belfast, Institute for Clinical Sciences A, Royal Victoria Hospital, Belfast, United Kingdom
- Instituto de Investigaciones Químicas (IIQ), Centro de Investigaciones Científicas Isla de la Cartuja (cicCartuja), Universidad de Sevilla, Consejo Superior de Investigaciones Científicas (CSIC), Sevilla, Spain
- Claire Hill
- Molecular Epidemiology and Public Health Research Group, Centre for Public Health, Queen's University Belfast, Institute for Clinical Sciences A, Royal Victoria Hospital, Belfast, United Kingdom
- Ryan Skelly
- Molecular Epidemiology and Public Health Research Group, Centre for Public Health, Queen's University Belfast, Institute for Clinical Sciences A, Royal Victoria Hospital, Belfast, United Kingdom
- Ruaidhri Cappa
- Molecular Epidemiology and Public Health Research Group, Centre for Public Health, Queen's University Belfast, Institute for Clinical Sciences A, Royal Victoria Hospital, Belfast, United Kingdom
- Eoin Brennan
- UCD Diabetes Complications Research Centre, Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- School of Medicine, University College Dublin, Dublin, Ireland
- Ross Doyle
- UCD Diabetes Complications Research Centre, Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- School of Medicine, University College Dublin, Dublin, Ireland
- Mater Misericordiae University Hospital, Dublin, Ireland
- Catherine Godson
- UCD Diabetes Complications Research Centre, Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- School of Medicine, University College Dublin, Dublin, Ireland
- Alexander P Maxwell
- Molecular Epidemiology and Public Health Research Group, Centre for Public Health, Queen's University Belfast, Institute for Clinical Sciences A, Royal Victoria Hospital, Belfast, United Kingdom
- Regional Nephrology Unit, Belfast City Hospital Belfast, Belfast, United Kingdom
- Amy Jayne McKnight
- Molecular Epidemiology and Public Health Research Group, Centre for Public Health, Queen's University Belfast, Institute for Clinical Sciences A, Royal Victoria Hospital, Belfast, United Kingdom
6
Guo S, Yang J. Bayesian genome-wide TWAS with reference transcriptomic data of brain and blood tissues identified 141 risk genes for Alzheimer's disease dementia. Alzheimers Res Ther 2024; 16:120. [PMID: 38824563 PMCID: PMC11144322 DOI: 10.1186/s13195-024-01488-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 05/27/2024] [Indexed: 06/03/2024]
Abstract
BACKGROUND Transcriptome-wide association study (TWAS) is an influential tool for identifying genes associated with complex diseases whose genetic effects are likely mediated through the transcriptome. TWAS utilizes reference genetic and transcriptomic data to estimate effect sizes of genetic variants on gene expression (i.e., effect sizes of a broad sense of expression quantitative trait loci, eQTL). These estimated effect sizes are employed as variant weights in gene-based association tests, facilitating the mapping of risk genes with genome-wide association study (GWAS) data. However, most existing TWAS of Alzheimer's disease (AD) dementia are limited to studying only cis-eQTL proximal to the test gene. To overcome this limitation, we applied the Bayesian Genome-wide TWAS (BGW-TWAS) method to leverage both cis- and trans-eQTL of brain and blood tissues, in order to enhance mapping risk genes for AD dementia. METHODS We first applied BGW-TWAS to the Genotype-Tissue Expression (GTEx) V8 dataset to estimate cis- and trans-eQTL effect sizes of the prefrontal cortex, cortex, and whole blood tissues. Estimated eQTL effect sizes were integrated with the summary data of the most recent GWAS of AD dementia to obtain BGW-TWAS (i.e., gene-based association test) p-values of AD dementia per gene per tissue type. Then we used the aggregated Cauchy association test to combine TWAS p-values across three tissues to obtain omnibus TWAS p-values per gene. RESULTS We identified 85 genes in prefrontal cortex, 82 in cortex, and 76 in whole blood that were significantly associated with AD dementia. By combining BGW-TWAS p-values across these three tissues, we obtained 141 significant risk genes including 34 genes primarily due to trans-eQTL and 35 mapped risk genes in GWAS Catalog. 
With these 141 significant risk genes, we detected functional clusters comprised of both known mapped GWAS risk genes of AD in GWAS Catalog and our identified TWAS risk genes by protein-protein interaction network analysis, as well as several enriched phenotypes related to AD. CONCLUSION We applied BGW-TWAS and aggregated Cauchy test methods to integrate both cis- and trans- eQTL data of brain and blood tissues with GWAS summary data, identifying 141 TWAS risk genes of AD dementia. These identified risk genes provide novel insights into the underlying biological mechanisms of AD dementia and potential gene targets for therapeutics development.
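The step of combining per-tissue p-values into one omnibus p-value per gene can be sketched with the Cauchy combination statistic. This is an illustration only, not the authors' code: the function name `cauchy_combine` is hypothetical, equal weights are assumed unless others are supplied, and the three p-values in the usage lines are made-up placeholders rather than results from the study.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Cauchy combination: T = sum_i w_i * tan((0.5 - p_i) * pi), with sum w_i = 1.

    Under the null, T is approximately standard Cauchy, so the combined
    p-value is 0.5 - atan(T) / pi.
    """
    k = len(pvalues)
    if weights is None:
        weights = [1.0] * k  # equal weights, an assumption of this sketch
    total_w = sum(weights)
    stat = sum(
        (w / total_w) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, pvalues)
    )
    return 0.5 - math.atan(stat) / math.pi


# Hypothetical per-tissue TWAS p-values for one gene (illustrative values only).
tissue_pvalues = {"prefrontal_cortex": 0.004, "cortex": 0.03, "whole_blood": 0.2}
omnibus_p = cauchy_combine(list(tissue_pvalues.values()))
print(omnibus_p)
```

The statistic is dominated by the smallest p-values, which is why a gene that is significant in only one tissue can still yield a small omnibus p-value. Production implementations typically add a numerically stable approximation for extremely small p-values, which this sketch omits.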
Affiliation(s)
- Shuyi Guo
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA.
7
Sadeghi-Alavijeh O, Chan MMY, Moochhala SH, Howles S, Gale DP, Böckenhauer D. Rare variants in the sodium-dependent phosphate transporter gene SLC34A3 explain missing heritability of urinary stone disease. Kidney Int 2023; 104:975-984. [PMID: 37414395 DOI: 10.1016/j.kint.2023.06.019] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/04/2022] [Revised: 05/10/2023] [Accepted: 06/15/2023] [Indexed: 07/08/2023]
Abstract
Urinary stone disease (USD) is a major health burden affecting over 10% of the United Kingdom population. While stone disease is associated with lifestyle, genetic factors also strongly contribute. Common genetic variants at multiple loci from genome-wide association studies account for 5% of the estimated 45% heritability of the disorder. Here, we investigated the extent to which rare genetic variation contributes to the unexplained heritability of USD. Among participants of the United Kingdom 100,000-genome project, 374 unrelated individuals were identified and assigned diagnostic codes indicative of USD. Whole genome gene-based rare variant testing and polygenic risk scoring were performed against 24,930 ancestry-matched controls. We observed (and replicated in an independent dataset) exome-wide significant enrichment of monoallelic rare, predicted damaging variants in SLC34A3, the gene for a sodium-dependent phosphate transporter; these variants were present in 5% of cases compared with 1.6% of controls. This gene was previously associated with autosomal recessive disease. The effect on USD risk of having a qualifying SLC34A3 variant was greater than that of a standard deviation increase in polygenic risk derived from GWAS. Addition of the rare qualifying variants in SLC34A3 to a linear model including polygenic score increased the liability-adjusted heritability from 5.1% to 14.2% in the discovery cohort. We conclude that rare variants in SLC34A3 represent an important genetic risk factor for USD, with effect size intermediate between the fully penetrant rare variants linked with Mendelian disorders and common variants associated with USD. Thus, our findings explain some of the heritability unexplained by prior common variant genome-wide association studies.
Affiliation(s)
- Melanie M Y Chan
- Department of Renal Medicine, University College London, London, UK
- Sarah Howles
- Nuffield Department of Surgical Sciences, University of Oxford, Oxford, UK
| | - Daniel P Gale
- Department of Renal Medicine, University College London, London, UK.
8
Zorkoltseva IV, Elgaeva EE, Belonogova NM, Kirichenko AV, Svishcheva GR, Freidin MB, Williams FMK, Suri P, Tsepilov YA, Axenovich TI. Multi-Trait Exome-Wide Association Study of Back Pain-Related Phenotypes. Genes (Basel) 2023; 14:1962. [PMID: 37895311 PMCID: PMC10606006 DOI: 10.3390/genes14101962] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/21/2023] [Revised: 10/16/2023] [Accepted: 10/18/2023] [Indexed: 10/29/2023] Open
Abstract
Back pain (BP) is a major contributor to disability worldwide, with heritability estimated at 40-60%. However, less than half of the heritability is explained by common genetic variants identified by genome-wide association studies. More powerful methods and rare and ultra-rare variant analysis may offer additional insight. This study utilized exome sequencing data from the UK Biobank to perform a multi-trait gene-based association analysis of three BP-related phenotypes: chronic back pain, dorsalgia, and intervertebral disc disorder. We identified the SLC13A1 gene as a contributor to chronic back pain via loss-of-function (LoF) and missense variants. This gene has been previously detected in two studies. A multi-trait approach uncovered the novel FSCN3 gene and its impact on back pain through LoF variants. This gene deserves attention because it is only the second gene shown to have an effect on back pain due to LoF variants and represents a promising drug target for back pain therapy.
Affiliation(s)
- Irina V. Zorkoltseva
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Elizaveta E. Elgaeva
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk State University, 630090 Novosibirsk, Russia
- Nadezhda M. Belonogova
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Anatoliy V. Kirichenko
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Gulnara R. Svishcheva
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119333 Moscow, Russia
- Maxim B. Freidin
- Department of Biology, School of Biological and Behavioural Sciences, Queen Mary University of London, London EC1M 6BQ, UK
- Frances M. K. Williams
- Department of Twin Research and Genetic Epidemiology, King’s College London, London SE1 7EH, UK
- Pradeep Suri
- Seattle Epidemiologic Research and Information Center, VA Puget Sound Health Care System, Seattle, WA 98108, USA
- Division of Rehabilitation Care Services, Seattle, WA 98208, USA
- Clinical Learning, Evidence, and Research Center, University of Washington, Seattle, WA 98195, USA
- Department of Rehabilitation Medicine, University of Washington, Seattle, WA 98195, USA
- Yakov A. Tsepilov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
- Tatiana I. Axenovich
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
9
Guo S, Yang J. Bayesian genome-wide TWAS with reference transcriptomic data of brain and blood tissues identified 93 risk genes for Alzheimer's disease dementia. medRxiv 2023:2023.07.06.23292336. [PMID: 37503151 PMCID: PMC10370241 DOI: 10.1101/2023.07.06.23292336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Background Transcriptome-wide association study (TWAS) is an influential tool for identifying novel genes associated with complex diseases, where their genetic effects may be mediated through transcriptome. TWAS utilizes reference genetic and transcriptomic data to estimate genetic effect sizes on expression quantitative traits of target genes (i.e., effect sizes of a broad sense of expression quantitative trait loci, eQTL). These estimated effect sizes are then employed as variant weights in burden gene-based association test statistics, facilitating the mapping of risk genes for complex diseases with genome-wide association study (GWAS) data. However, most existing TWAS of Alzheimer's disease (AD) dementia have primarily focused on cis-eQTL, disregarding potential trans-eQTL. To overcome this limitation, we applied the Bayesian Genome-wide TWAS (BGW-TWAS) method which incorporated both cis- and trans-eQTL of brain and blood tissues to enhance mapping risk genes for AD dementia. Methods We first applied BGW-TWAS to the Genotype-Tissue Expression (GTEx) V8 dataset to estimate cis- and trans-eQTL effect sizes of the prefrontal cortex, cortex, and whole blood tissues. Subsequently, estimated eQTL effect sizes were integrated with the summary data of the most recent GWAS of AD dementia to obtain BGW-TWAS (i.e., gene-based association test) p-values of AD dementia per tissue type. Finally, we used the aggregated Cauchy association test to combine TWAS p-values across three tissues to obtain omnibus TWAS p-values per gene. Results We identified 37 genes in prefrontal cortex, 55 in cortex, and 51 in whole blood that were significantly associated with AD dementia. By combining BGW-TWAS p-values across these three tissues, we obtained 93 significant risk genes including 29 genes primarily due to trans-eQTL and 50 novel genes. Utilizing protein-protein interaction network and phenotype enrichment analyses with these 93 significant risk genes, we detected 5 functional clusters comprised of both known and novel AD risk genes and 7 enriched phenotypes. Conclusion We applied BGW-TWAS and aggregated Cauchy test methods to integrate both cis- and trans-eQTL data of brain and blood tissues with GWAS summary data to identify risk genes of AD dementia. The risk genes we identified provide novel insights into the underlying biological pathways implicated in AD dementia.
10
Belonogova NM, Kirichenko AV, Freidin MB, Williams FMK, Suri P, Aulchenko YS, Axenovich TI, Tsepilov YA. Noncoding rare variants in PANX3 are associated with chronic back pain. Pain 2023; 164:864-869. [PMID: 36448979 PMCID: PMC10014492 DOI: 10.1097/j.pain.0000000000002781] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/22/2022] [Accepted: 08/31/2022] [Indexed: 12/05/2022]
Abstract
Back pain is the leading cause of years lived with disability worldwide, yet surprisingly, little is known regarding the biology underlying this condition. The impact of genetics is known for chronic back pain: its heritability is estimated to be at least 40%. Large genome-wide association studies have shown that common variation may account for up to 35% of chronic back pain heritability; rare variants may explain a portion of the heritability not explained by common variants. In this study, we performed the first gene-based association analysis of chronic back pain using UK Biobank imputed data including rare variants with moderate imputation quality. We discovered 2 genes, SOX5 and PANX3, influencing chronic back pain. The SOX5 gene is a well-known back pain gene. The PANX3 gene has not previously been described as having a role in chronic back pain. We showed that the association of PANX3 with chronic back pain is driven by rare noncoding intronic polymorphisms. This result was replicated in an independent sample from UK Biobank and validated using a similar phenotype, dorsalgia, from FinnGen Biobank. We also found that the PANX3 gene is associated with intervertebral disk disorders. We can speculate that a possible mechanism of action of PANX3 on back pain is due to its effect on the intervertebral disks.
Affiliation(s)
- Nadezhda M. Belonogova
- Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, 10 Lavrentiev Avenue, Novosibirsk, 630090, Russia
- Anatoly V. Kirichenko
- Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, 10 Lavrentiev Avenue, Novosibirsk, 630090, Russia
- Kurchatov genomics center of the Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, 630090, Russia
| | - Maxim B. Freidin
- Department of Twin Research and Genetic Epidemiology, King’s College London, London SE1 7EH, UK
| | - Frances M. K. Williams
- Department of Twin Research and Genetic Epidemiology, King’s College London, London SE1 7EH, UK
| | - Pradeep Suri
- Seattle Epidemiologic Research and Information Center, VA Puget Sound Health Care System, 1660 S. Columbian Way, Seattle, WA 98108, USA
- Division of Rehabilitation Care Services, 1660 S. Columbian Way, Seattle, WA 98108, USA
- Clinical Learning, Evidence, and Research Center, University of Washington, 325 Ninth Avenue, Box 359612 Seattle, WA 98104, USA
| | - Yurii S. Aulchenko
- Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, 10 Lavrentiev Avenue, Novosibirsk, 630090, Russia
- PolyOmica, Het Vlaggeschip 61, 5237 PA ‘s-Hertogenbosch, the Netherlands
| | - Tatiana I. Axenovich
- Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, 10 Lavrentiev Avenue, Novosibirsk, 630090, Russia
| | - Yakov A. Tsepilov
- Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, 10 Lavrentiev Avenue, Novosibirsk, 630090, Russia
| |
Collapse
|
11
|
Zigarelli AM, Venera HM, Receveur BA, Wolf JM, Westra J, Tintle NL. Multimarker omnibus tests by leveraging individual marker summary statistics from large biobanks. Ann Hum Genet 2023; 87:125-136. [PMID: 36683423 DOI: 10.1111/ahg.12495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Revised: 12/24/2022] [Accepted: 01/04/2023] [Indexed: 01/24/2023]
Abstract
As biobanks become increasingly popular, access to genotypic and phenotypic data continues to increase in the form of precomputed summary statistics (PCSS). Widespread accessibility of PCSS alleviates many issues related to biobank data, including that of data privacy and confidentiality, as well as high computational costs. However, questions remain about how to maximally leverage PCSS for downstream statistical analyses. Here we present a novel method for testing the association of an arbitrary number of single nucleotide variants (SNVs) on a linear combination of phenotypes after adjusting for covariates for common multimarker tests (e.g., SKAT, SKAT-O) without access to individual patient-level data (IPD). We validate exact formulas for each method, and demonstrate their accuracy through simulation studies and an application to fatty acid phenotypic data from the Framingham Heart Study.
Affiliation(s)
- Angela M Zigarelli
  - Department of Mathematics and Statistics, University of Massachusetts Amherst, Massachusetts, USA
- Hanna M Venera
  - Division of Biostatistics, University of Michigan, Michigan, USA
- Brody A Receveur
  - Department of Statistics, George Mason University, Virginia, USA
- Jack M Wolf
  - Division of Biostatistics, University of Minnesota, Minnesota, USA
- Jason Westra
  - Department of Math, Computer Science, and Statistics, Dordt University, Iowa, USA
- Nathan L Tintle
  - Department of Population Health Nursing Sciences, University of Illinois Chicago, Chicago, Illinois, USA
|
12
|
Berrandou TE, Balding D, Speed D. LDAK-GBAT: Fast and powerful gene-based association testing using summary statistics. Am J Hum Genet 2023; 110:23-29. [PMID: 36480927 PMCID: PMC9892699 DOI: 10.1016/j.ajhg.2022.11.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/28/2022] [Accepted: 11/17/2022] [Indexed: 12/12/2022] Open
Abstract
We present LDAK-GBAT, a tool for gene-based association testing using summary statistics from genome-wide association studies that is computationally efficient, produces well-calibrated p values, and is significantly more powerful than existing tools. LDAK-GBAT takes approximately 30 min to analyze imputed data (2.9M common, genic SNPs), requiring less than 10 Gb memory. It shows good control of type 1 error given an appropriate reference panel. Across 109 phenotypes (82 from the UK Biobank, 18 from the Million Veteran Program, and nine from the Psychiatric Genetics Consortium), LDAK-GBAT finds on average 19% (SE: 1%) more significant genes than the existing tool sumFREGAT-ACAT, with even greater gains in comparison with MAGMA, GCTA-fastBAT, sumFREGAT-SKAT-O, and sumFREGAT-PCA.
Affiliation(s)
- Takiy-Eddine Berrandou (corresponding author)
  - Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- David Balding
  - Melbourne Integrative Genomics, Melbourne University, Melbourne, VIC, Australia
- Doug Speed (corresponding author)
  - Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
|
13
|
Zhang J, Liang X, Gonzales S, Liu J, Gao XR, Wang X. A gene based combination test using GWAS summary data. BMC Bioinformatics 2023; 24:2. [PMID: 36597047 PMCID: PMC9811798 DOI: 10.1186/s12859-022-05114-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Accepted: 12/13/2022] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Gene-based association tests provide a useful alternative and complement to the usual single marker association tests, especially in genome-wide association studies (GWAS). How the variants in a gene are weighted plays an important role in the power of a gene-based association test. Appropriate weights can boost statistical power, especially when detecting genetic variants with weak effects on a trait. One major limitation of existing gene-based association tests lies in using weights that are predetermined biologically or empirically. This limitation often attenuates the power of a test. On the other hand, effect sizes or directions of causal genetic variants in real data are usually unknown, driving a need for a flexible yet robust methodology for gene-based association tests. Furthermore, access to individual-level data is often limited, while thousands of GWAS summary datasets are publicly and freely available. RESULTS To resolve these limitations, we propose a combination test named OWC, which is based on summary statistics from GWAS data. Several traditional methods, including the burden test, the weighted sum of squared score test [SSU], the weighted sum statistic [WSS], the SNP-set Kernel Association Test [SKAT], and the score test, are special cases of OWC. To evaluate the performance of OWC, we perform extensive simulation studies. Results of the simulation studies demonstrate that OWC outperforms several existing popular methods. We further show that OWC outperforms comparison methods in real-world data analyses using schizophrenia GWAS summary data and fasting glucose GWAS meta-analysis data. The proposed method is implemented in an R package available at https://github.com/Xuexia-Wang/OWC-R-package. CONCLUSIONS We propose a novel gene-based association test that incorporates four different weighting schemes (two constant weights and two weights proportional to the normal statistic Z) and includes several popular methods as its special cases. Results of the simulation studies and real data analyses illustrate that the proposed test, OWC, outperforms comparable methods in most scenarios. These results demonstrate that OWC is a useful tool that adapts to the underlying biological model for a disease by appropriately weighting genetic variants and combining well-known gene-based tests.
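As a rough illustration of one of the special cases named in the abstract, a burden test can be computed from summary statistics alone. The following is a minimal sketch and not the OWC implementation: it assumes per-variant Z-scores, a weight vector, and an LD correlation matrix taken from a reference panel, and it relies on the standard result that the weighted sum of Z-scores has null variance w'Rw. The function name `burden_test` and its inputs are hypothetical.

```python
import math


def burden_test(z, weights, ld):
    """Weighted burden test from GWAS summary statistics (illustrative sketch).

    z       : per-variant Z-scores
    weights : per-variant weights (assumed to be chosen in advance)
    ld      : LD correlation matrix as a list of rows (assumed known)
    Returns (standardised statistic, two-sided p-value).
    """
    m = len(z)
    if len(weights) != m or len(ld) != m or any(len(row) != m for row in ld):
        raise ValueError("z, weights and ld must have matching dimensions")
    # Numerator: weighted sum of the Z-scores.
    num = sum(w * zj for w, zj in zip(weights, z))
    # Null variance of the weighted sum: w' R w.
    var = sum(weights[i] * ld[i][j] * weights[j]
              for i in range(m) for j in range(m))
    if var <= 0:
        raise ValueError("w' R w must be positive")
    t = num / math.sqrt(var)
    # Two-sided p-value of a standard normal statistic.
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p


# Two variants with equal weights and no LD between them.
t, p = burden_test([2.0, 2.0], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
```

With perfectly correlated variants the same Z-scores carry less independent evidence, so the null variance grows and the p-value becomes larger.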
Affiliation(s)
- Jianjun Zhang
  - Department of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201 USA (GRID grid.266869.5; ISNI 0000 0001 1008 957X)
- Xiaoyu Liang
  - Department of Epidemiology and Biostatistics, Michigan State University, 909 Wilson Rd Room B601, East Lansing, MI 48824 USA (GRID grid.17088.36; ISNI 0000 0001 2150 1785)
- Samantha Gonzales
  - Department of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201 USA (GRID grid.266869.5; ISNI 0000 0001 1008 957X)
- Jianguo Liu
  - Department of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201 USA (GRID grid.266869.5; ISNI 0000 0001 1008 957X)
- Xiaoyi Raymond Gao
  - Department of Ophthalmology and Visual Science, Department of Biomedical informatics, Division of Human Genetics, Ohio State University, 915 Olentangy River Road, Columbus, OH 43212 USA (GRID grid.261331.4; ISNI 0000 0001 2285 7943)
- Xuexia Wang
  - Department of Biostatistics, Robert Stempel College of Public Health and Social Work, Florida International University, 11200 SW 8th street, Miami, FL 33174 USA (GRID grid.65456.34; ISNI 0000 0001 2110 1845)
|
14
|
Woodward AA, Urbanowicz RJ, Naj AC, Moore JH. Genetic heterogeneity: Challenges, impacts, and methods through an associative lens. Genet Epidemiol 2022; 46:555-571. [PMID: 35924480 PMCID: PMC9669229 DOI: 10.1002/gepi.22497] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/18/2022] [Revised: 07/06/2022] [Accepted: 07/19/2022] [Indexed: 01/07/2023]
Abstract
Genetic heterogeneity describes the occurrence of the same or similar phenotypes through different genetic mechanisms in different individuals. Robustly characterizing and accounting for genetic heterogeneity is crucial to pursuing the goals of precision medicine, for discovering novel disease biomarkers, and for identifying targets for treatments. Failure to account for genetic heterogeneity may lead to missed associations and incorrect inferences. Thus, it is critical to review the impact of genetic heterogeneity on the design and analysis of population level genetic studies, aspects that are often overlooked in the literature. In this review, we first contextualize our approach to genetic heterogeneity by proposing a high-level categorization of heterogeneity into "feature," "outcome," and "associative" heterogeneity, drawing on perspectives from epidemiology and machine learning to illustrate distinctions between them. We highlight the unique nature of genetic heterogeneity as a heterogeneous pattern of association that warrants specific methodological considerations. We then focus on the challenges that preclude effective detection and characterization of genetic heterogeneity across a variety of epidemiological contexts. Finally, we discuss systems heterogeneity as an integrated approach to using genetic and other high-dimensional multi-omic data in complex disease research.
Affiliation(s)
- Alexa A. Woodward
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Ryan J. Urbanowicz
  - Department of Computational Biomedicine, Cedars‐Sinai Medical Center, Los Angeles, California, USA
- Adam C. Naj
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jason H. Moore
  - Department of Computational Biomedicine, Cedars‐Sinai Medical Center, Los Angeles, California, USA
|
15
|
Shao Z, Wang T, Qiao J, Zhang Y, Huang S, Zeng P. A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies. BMC Bioinformatics 2022; 23:359. [PMID: 36042399 PMCID: PMC9429742 DOI: 10.1186/s12859-022-04897-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/27/2022] [Accepted: 08/22/2022] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Multilocus analysis on a set of single nucleotide polymorphisms (SNPs) pre-assigned within a gene constitutes a valuable complement to single-marker analysis by aggregating data on complex traits in a biologically meaningful way. However, despite the existence of a wide variety of SNP-set methods, few comprehensive comparison studies have been previously performed to evaluate the effectiveness of these methods. RESULTS We herein sought to fill this knowledge gap by conducting a comprehensive empirical comparison of 22 commonly used summary-statistics-based SNP-set methods. We showed that only seven methods could effectively control the type I error, and that these well-calibrated approaches had varying power performance under the simulation scenarios. Overall, we confirmed that the burden test was generally underpowered and that score-based variance component tests (e.g., the sequence kernel association test) were much more powerful under the polygenic genetic architecture in both common and rare variant association analyses. We further revealed that two linkage-disequilibrium-free P value combination methods (i.e., the harmonic mean P value method and the aggregated Cauchy association test) behaved very well under the sparse genetic architecture in simulations and real-data applications to common and rare variant association analyses as well as in expression quantitative trait loci weighted integrative analysis. We also assessed the scalability of these approaches by recording computational time and found that all these methods can scale to biobank-scale data, although some might be relatively slow. CONCLUSION In conclusion, we hope that our findings can offer important guidance on how to choose appropriate multilocus association analysis methods in the post-GWAS era. All the SNP-set methods are implemented in the R package called MCA, which is freely available at https://github.com/biostatpzeng/.
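The two linkage-disequilibrium-free combination rules named in the abstract can be sketched in a few lines. This is an illustrative sketch, not the MCA package code: it assumes every p-value lies strictly between 0 and 1 and uses equal weights by default, and the function names `acat` and `harmonic_mean_p` are hypothetical. The Cauchy combination converts each p-value to a standard Cauchy variate, averages them, and converts back. The harmonic mean is returned here as the raw statistic only, which is not itself a calibrated p-value.

```python
import math


def _check(pvalues, weights):
    """Validate inputs and fill in equal weights when none are given."""
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)
    if len(weights) != len(pvalues) or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative, one per p-value")
    if sum(weights) <= 0:
        raise ValueError("weights must not all be zero")
    return weights


def acat(pvalues, weights=None):
    """Aggregated Cauchy association test (Cauchy combination) p-value."""
    weights = _check(pvalues, weights)
    total = sum(weights)
    # Each p-value maps to a standard Cauchy variate tan((0.5 - p) * pi).
    t = sum(w * math.tan((0.5 - p) * math.pi)
            for w, p in zip(weights, pvalues)) / total
    # Upper tail probability of the standard Cauchy distribution at t.
    return 0.5 - math.atan(t) / math.pi


def harmonic_mean_p(pvalues, weights=None):
    """Weighted harmonic mean of p-values (raw statistic, uncalibrated)."""
    weights = _check(pvalues, weights)
    return sum(weights) / sum(w / p for w, p in zip(weights, pvalues))


combined = acat([0.01, 0.5])
hmp = harmonic_mean_p([0.1, 0.5])
```

Both rules are dominated by the smallest p-values, which is consistent with the abstract's observation that they do well under a sparse genetic architecture.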
Affiliation(s)
- Zhonghe Shao
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ting Wang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jiahao Qiao
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Yuchen Zhang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuiping Huang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
|
16
|
Belonogova NM, Svishcheva GR, Kirichenko AV, Zorkoltseva IV, Tsepilov YA, Axenovich TI. sumSTAAR: A flexible framework for gene-based association studies using GWAS summary statistics. PLoS Comput Biol 2022; 18:e1010172. [PMID: 35653402 PMCID: PMC9197066 DOI: 10.1371/journal.pcbi.1010172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2021] [Revised: 06/14/2022] [Accepted: 05/05/2022] [Indexed: 11/19/2022] Open
Abstract
Gene-based association analysis is an effective gene-mapping tool. Many gene-based methods have been proposed recently. However, their power depends on the underlying genetic architecture, which is rarely known for complex traits, so a combination of such methods could serve as a universal approach. Several frameworks combining different gene-based methods have been developed. However, they all imply a fixed set of methods, weights and functional annotations. Moreover, most of them use individual phenotypes and genotypes as input data. Here, we introduce sumSTAAR, a framework for gene-based association analysis using summary statistics obtained from genome-wide association studies (GWAS). It is an extended and modified version of the STAAR framework proposed by Li and colleagues in 2020. The sumSTAAR framework offers a wider range of gene-based methods to combine. It allows the user to arbitrarily define a set of these methods, weighting functions and probabilities of genetic variants being causal. The methods used in the framework were adapted to analyse genes with a large number of SNPs to decrease the running time. The framework includes the polygene pruning procedure to guard against the influence of strong GWAS signals outside the gene. We also present new, improved matrices of correlations between the genotypes of variants within genes. These matrices, estimated on a sample of 265,000 individuals, are a state-of-the-art replacement for the widely used matrices based on the 1000 Genomes Project data.
Affiliation(s)
- Nadezhda M. Belonogova
  - Laboratory of Segregation and Recombination Analyses, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Gulnara R. Svishcheva
  - Laboratory of Segregation and Recombination Analyses, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Laboratory of Animal Genetics, Vavilov Institute of General Genetics, the Russian Academy of Sciences, Moscow, Russia
- Anatoly V. Kirichenko
  - Laboratory of Segregation and Recombination Analyses, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Irina V. Zorkoltseva
  - Laboratory of Segregation and Recombination Analyses, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Yakov A. Tsepilov
  - Laboratory of Segregation and Recombination Analyses, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
- Tatiana I. Axenovich
  - Laboratory of Segregation and Recombination Analyses, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
|
17
|
Legault MA, Perreault LPL, Tardif JC, Dubé MP. ExPheWas: a platform for cis-Mendelian randomization and gene-based association scans. Nucleic Acids Res 2022; 50:W305-W311. [PMID: 35474380 PMCID: PMC9252780 DOI: 10.1093/nar/gkac289] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/22/2022] [Revised: 03/31/2022] [Accepted: 04/13/2022] [Indexed: 11/17/2022] Open
Abstract
Establishing the relationship between protein-coding genes and phenotypes has the potential to inform on the molecular etiology of diseases. Here, we describe ExPheWas (exphewas.ca), a gene-based phenome-wide association study browser and platform that enables the conduct of gene-based Mendelian randomization. The ExPheWas data repository includes sex-stratified and sex-combined gene-based association results from 26 616 genes with 1746 phenotypes measured in up to 413 133 individuals from the UK Biobank. Interactive visualizations are provided through a browser to facilitate data exploration supported by false discovery rate control, and it includes tools for enrichment analysis. The interactive Mendelian randomization module in ExPheWas allows the estimation of causal effects of a genetically predicted exposure on an outcome by using genetic variation in a single gene as the instrumental variable.
Affiliation(s)
- Marc-André Legault
  - Montreal Heart Institute, Montreal, QC H1T 1C8, Canada
  - Université de Montréal Beaulieu-Saucier Pharmacogenomics Centre, Montreal, QC H1T 1C8, Canada
  - Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montreal, QC H3T 1J4, Canada
- Louis-Philippe Lemieux Perreault
  - Montreal Heart Institute, Montreal, QC H1T 1C8, Canada
  - Université de Montréal Beaulieu-Saucier Pharmacogenomics Centre, Montreal, QC H1T 1C8, Canada
- Jean-Claude Tardif
  - Montreal Heart Institute, Montreal, QC H1T 1C8, Canada
  - Department of Medicine, Faculty of Medicine, Université de Montréal, Montreal, QC H3T 1J4, Canada
- Marie-Pierre Dubé
  - Montreal Heart Institute, Montreal, QC H1T 1C8, Canada
  - Université de Montréal Beaulieu-Saucier Pharmacogenomics Centre, Montreal, QC H1T 1C8, Canada
  - Department of Medicine, Faculty of Medicine, Université de Montréal, Montreal, QC H3T 1J4, Canada
|
18
|
Cao X, Wang X, Zhang S, Sha Q. Gene-based association tests using GWAS summary statistics and incorporating eQTL. Sci Rep 2022; 12:3553. [PMID: 35241742 PMCID: PMC8894384 DOI: 10.1038/s41598-022-07465-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/10/2021] [Accepted: 02/11/2022] [Indexed: 01/29/2023] Open
Abstract
Although genome-wide association studies (GWAS) have been successfully applied to a variety of complex diseases and have identified many underlying genetic variants via single marker tests, a considerable share of the heritability of complex diseases still cannot be explained by GWAS. One alternative approach to overcome the missing heritability caused by genetic heterogeneity is gene-based analysis, which considers the aggregate effects of multiple genetic variants in a single test. Another alternative approach is the transcriptome-wide association study (TWAS). TWAS aggregates genomic information into functionally relevant units that map to genes and their expression. TWAS is not only powerful, but can also increase the interpretability of the biological mechanisms of identified trait-associated genes. In this study, we propose a powerful and computationally efficient gene-based association test, called Overall. Using the extended Simes procedure, Overall aggregates information from three types of traditional gene-based association tests and also incorporates expression quantitative trait locus (eQTL) information into a gene-based association test using GWAS summary statistics. We show that after a small number of replications to estimate the correlation among the integrated gene-based tests, the p values of Overall can be calculated analytically. Simulation studies show that Overall controls type I error rates very well and has higher power than the tests we compared it with. We also apply Overall to two schizophrenia GWAS summary datasets and two lipids GWAS summary datasets. The results show that this newly developed method can identify more significant genes than the other methods we compared it with.
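For orientation, the classical Simes combination that the extended procedure builds on can be written as follows. This is a minimal sketch of the standard Simes rule only, not the Overall method: as far as the extended variant is usually described, it replaces the number of tests and the ranks with effective numbers of tests estimated from the correlation among the p-values, and that step is not implemented here. The function name `simes` is hypothetical.

```python
def simes(pvalues):
    """Classical Simes combined p-value: min over i of m * p_(i) / i.

    p_(1) <= ... <= p_(m) are the ordered p-values and m is their count.
    """
    m = len(pvalues)
    if m == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 <= p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie between 0 and 1")
    ordered = sorted(pvalues)
    # The rank-m term equals the largest p-value, so the result never exceeds 1.
    return min(m * p / rank for rank, p in enumerate(ordered, start=1))


# Three gene-based tests for the same gene (illustrative values).
combined = simes([0.01, 0.04, 0.03])
```

In this example the candidate values are 0.03, 0.045 and 0.04, so the combined p-value is driven by the smallest input.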
Affiliation(s)
- Xuewei Cao
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA
- Xuexia Wang
  - Department of Mathematics, University of North Texas, Denton, TX, USA
- Shuanglin Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA
|
19
|
An adaptive combination method for Cauchy variable based on optimal threshold. J Genet 2022. [DOI: 10.1007/s12041-021-01351-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
20
|
Zorkoltseva I, Shadrina A, Belonogova N, Kirichenko A, Tsepilov Y, Axenovich T. In silico genome-wide gene-based association analysis reveals new genes predisposing to coronary artery disease. Clin Genet 2021; 101:78-86. [PMID: 34687547 DOI: 10.1111/cge.14073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2021] [Revised: 09/29/2021] [Accepted: 10/13/2021] [Indexed: 11/30/2022]
Abstract
Genome-wide association studies (GWAS) have identified more than 300 single nucleotide polymorphisms at 163 independent loci associated with coronary artery disease (CAD). However, the causal genes for CAD and their mechanisms of action are not fully understood. We aimed to perform a post-GWAS analysis to identify genes whose polymorphisms may influence the risk of CAD. Using the UK Biobank GWAS summary statistics, we performed a gene-based association analysis. We found 63 genes significantly associated with CAD due to their within-gene polymorphisms. Many of these genes are well known. Some known CAD genes, such as FURIN and SORT1, did not show a gene-based association because their variants had low GWAS signals or because the gene-based association was inflated by a strong GWAS signal outside the gene. For several known CAD genes, we demonstrated that their effects could be explained partly or entirely by variants within neighboring genes controlling their expression rather than by their own variants. Using several bioinformatics techniques, we suggested potential mechanisms underlying gene-CAD associations. Three genes, CDK19, NCALD, and ARHGEF12, were not previously associated with CAD. The role of these genes should be clarified in further studies.
Affiliation(s)
- Irina Zorkoltseva
  - Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, Novosibirsk, Russia
- Alexandra Shadrina
  - Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, Novosibirsk, Russia
- Nadezhda Belonogova
  - Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, Novosibirsk, Russia
- Anatoly Kirichenko
  - Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, Novosibirsk, Russia
- Yakov Tsepilov
  - Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, Novosibirsk, Russia
  - Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
- Tatiana Axenovich
  - Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, Novosibirsk, Russia
  - Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
|
21
|
Lyra DH, Griffiths CA, Watson A, Joynson R, Molero G, Igna AA, Hassani-Pak K, Reynolds MP, Hall A, Paul MJ. Gene-based mapping of trehalose biosynthetic pathway genes reveals association with source- and sink-related yield traits in a spring wheat panel. Food Energy Secur 2021; 10:e292. [PMID: 34594548 PMCID: PMC8459250 DOI: 10.1002/fes3.292] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Revised: 04/12/2021] [Accepted: 04/12/2021] [Indexed: 12/11/2022] Open
Abstract
Trehalose 6‐phosphate (T6P) signalling regulates carbon use and allocation and is a target to improve crop yields. However, the specific contributions of trehalose phosphate synthase (TPS) and trehalose phosphate phosphatase (TPP) genes to source‐ and sink‐related traits remain largely unknown. We used enrichment capture sequencing on TPS and TPP genes to estimate and partition the genetic variation of yield‐related traits in a spring wheat (Triticum aestivum) breeding panel specifically built to capture the diversity across the 75,000 CIMMYT wheat cultivar collection. Twelve phenotypes were correlated to variation in TPS and TPP genes, including plant height and biomass (source), and spikelets per spike, spike growth and grain filling traits (sink), which showed indications of both positive and negative gene selection. Individual genes explained proportions of heritability for biomass and grain‐related traits. Three TPS1 homologues were particularly significant for trait variation. Epistatic interactions were found within and between the TPS and TPP gene families for both plant height and grain‐related traits. Gene‐based prediction improved predictive ability for grain weight when gene effects were combined with the whole‐genome markers. Our study has generated a wealth of information on natural variation of TPS and TPP genes related to yield potential. It confirms a role for T6P in resource allocation and in affecting traits such as grain number and size, in agreement with other studies, and opens up the possibility of harnessing natural genetic variation more widely to better understand the contribution of native genes to yield traits for incorporation into breeding programmes.
Affiliation(s)
- Danilo H Lyra
  - Computational & Analytical Sciences, Rothamsted Research, Harpenden, UK
- Amy Watson
  - Plant Sciences, Rothamsted Research, Harpenden, UK
- Gemma Molero
  - Global Wheat Program, International Maize and Wheat Improvement Centre (CIMMYT), Texcoco, Mexico
- Matthew P Reynolds
  - Global Wheat Program, International Maize and Wheat Improvement Centre (CIMMYT), Texcoco, Mexico
|
22
|
Mapping gene and gene pathways associated with coronary artery disease: a CARDIoGRAM exome and multi-ancestry UK biobank analysis. Sci Rep 2021; 11:16461. [PMID: 34385509 PMCID: PMC8361107 DOI: 10.1038/s41598-021-95637-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/26/2020] [Accepted: 07/28/2021] [Indexed: 02/07/2023] Open
Abstract
Coronary artery disease (CAD) genome-wide association studies typically focus on single nucleotide variants (SNVs), and many potentially associated SNVs fail to reach the GWAS significance threshold. We performed gene and pathway-based association (GBA) tests on publicly available Coronary ARtery DIsease Genome wide Replication and Meta-analysis consortium Exome (n = 120,575) and multi ancestry pan UK Biobank study (n = 442,574) summary data using versatile gene-based association study (VEGAS2) and Multi-marker analysis of genomic annotation (MAGMA) to identify novel genes and pathways associated with CAD. We included only exonic SNVs and excluded regulatory regions. VEGAS2 and MAGMA ranked genes and pathways based on aggregated SNV test statistics. We used Bonferroni-corrected gene and pathway significance thresholds of 3.0 × 10⁻⁶ and 1.0 × 10⁻⁵, respectively. We also report the top one percent of ranked genes and pathways. We identified 17 top enriched genes, with four genes (PCSK9, FAM177, LPL, ARGEF26) reaching statistical significance (p ≤ 3.0 × 10⁻⁶) using both GBA tests in two GWAS studies. In addition, our analyses identified ten genes (DUSP13, KCNJ11, CD300LF/RAB37, SLCO1B1, LRRFIP1, QSER1, UBR2, MOB3C, MST1R, and ABCC8) with previously unreported associations with CAD, although none of the single SNV associations within the genes were genome-wide significant. Among the top 1% non-lipid pathways, we detected pathways regulating coagulation, inflammation, neuronal aging, and wound healing.
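The Bonferroni thresholds quoted in the abstract follow the usual rule of dividing the family-wise error rate by the number of tests. The sketch below is illustrative only: the abstract does not state the error rate or the number of genes and pathways tested, so the value 0.05 and the implied counts are assumptions, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_number_of_tests(alpha, threshold):
    """Number of tests that would produce the given Bonferroni threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return alpha / threshold


ALPHA = 0.05  # assumed family-wise error rate; not stated in the abstract

# Roughly how many genes and pathways the quoted thresholds would imply.
genes = round(implied_number_of_tests(ALPHA, 3.0e-6))
pathways = round(implied_number_of_tests(ALPHA, 1.0e-5))
```

Under that assumption the thresholds would correspond to roughly 16,667 gene tests and 5,000 pathway tests; the actual counts used by the authors may differ.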
|
23
|
Axenovich TI, Belonogova NM, Zorkoltseva IV, Tsepilov YA. Number of Genes Associated with Neuroticism due to Their Polymorphisms. RUSS J GENET+ 2021. [DOI: 10.1134/s1022795421070024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
24
|
Tang Y, Zhou Y, Chen L, Bao Y, Zhang R. A Powerful Adaptive Cauchy-Variable Combination Method for Rare-Variant Association Analysis. RUSS J GENET+ 2021. [DOI: 10.1134/s1022795421020125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
25
|
Belonogova NM, Zorkoltseva IV, Tsepilov YA, Axenovich TI. Gene-based association analysis identifies 190 genes affecting neuroticism. Sci Rep 2021; 11:2484. [PMID: 33510330 PMCID: PMC7844228 DOI: 10.1038/s41598-021-82123-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/22/2020] [Accepted: 01/15/2021] [Indexed: 11/25/2022]
Abstract
Neuroticism is a personality trait that is an important risk factor for psychiatric disorders. Recent genome-wide studies reported about 600 genes potentially influencing neuroticism. Little is known about the mechanisms of their action. Here, we aimed to conduct a more detailed analysis of genes that can regulate the level of neuroticism. Using UK Biobank-based GWAS summary statistics, we performed a gene-based association analysis using four sets of within-gene variants, each set possessing specific protein-coding properties. To guard against the influence of strong GWAS signals outside the gene, we used a specially designed procedure called “polygene pruning”. As a result, we identified 190 genes associated with neuroticism due to the effect of within-gene variants rather than strong GWAS signals outside the gene. Thirty-eight of these genes are new. Within all genes identified, we distinguished two slightly overlapping groups obtained from using protein-coding and non-coding variants. Many genes in the former group included potentially pathogenic variants. For some genes in the latter group, we found evidence of pleiotropy with gene expression. Using a bioinformatics analysis, we prioritized the neuroticism genes and showed that the genes that contribute to neuroticism through their within-gene variants are the most appropriate candidate genes.
Affiliation(s)
- Nadezhda M Belonogova
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Irina V Zorkoltseva
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Yakov A Tsepilov
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
| | - Tatiana I Axenovich
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
| |
|
26
|
Lai YP, Ioerger TR. Exploiting Homoplasy in Genome-Wide Association Studies to Enhance Identification of Antibiotic-Resistance Mutations in Bacterial Genomes. Evol Bioinform Online 2020; 16:1176934320944932. [PMID: 32782426 PMCID: PMC7385850 DOI: 10.1177/1176934320944932] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/03/2020] [Accepted: 06/30/2020] [Indexed: 12/23/2022]
Abstract
Many antibacterial drugs have multiple mechanisms of resistance, which are often represented simultaneously by a mixture of resistance mutations (some more frequent than others) in a clinical population. This presents a challenge for Genome-Wide Association Studies (GWAS) methods, making it difficult to detect less prevalent resistance mechanisms purely through (weak) statistical associations. Homoplasy, or the occurrence of multiple independent mutations at the same site, is often observed with drug resistance mutations and can be a strong indicator of positive selection. However, traditional GWAS methods, such as those based on allele counting or linear regression, are not designed to take homoplasy into account. In this article, we present a new method, called ECAT (for Evolutionary Cluster-based Association Test), that extends traditional regression-based GWAS methods with the ability to take advantage of homoplasy. This is achieved through a preprocessing step which identifies hypervariable regions in the genome exhibiting statistically significant clusters of distinct evolutionary changes, to which association testing by a linear mixed model (LMM) is applied using GEMMA (a well-established LMM-based GWAS tool). Thus, the approach can be viewed as extending GEMMA from the usual site- or gene-level analysis to focusing on clustered regions of mutations. This approach was evaluated on a large collection of more than 600 clinical isolates of multidrug-resistant (MDR) Mycobacterium tuberculosis from Lima, Peru. We show that ECAT does a better job of detecting known resistance mutations for several antitubercular drugs (including less prevalent mutations with weaker associations), compared with (site- or gene-based) GEMMA, as representative of existing GWAS methods. The power of the multiphase approach in ECAT comes from focusing association testing on the hypervariable regions of the genome, which reduces complexity in the model and increases statistical power.
Affiliation(s)
- Yi-Pin Lai
- Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA
| | - Thomas R Ioerger
- Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA
| |
|
27
|
Shadrina AS, Shashkova TI, Torgasheva AA, Sharapov SZ, Klarić L, Pakhomov ED, Alexeev DG, Wilson JF, Tsepilov YA, Joshi PK, Aulchenko YS. Prioritization of causal genes for coronary artery disease based on cumulative evidence from experimental and in silico studies. Sci Rep 2020; 10:10486. [PMID: 32591598 PMCID: PMC7320185 DOI: 10.1038/s41598-020-67001-w] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 01/15/2020] [Accepted: 06/01/2020] [Indexed: 12/15/2022]
Abstract
Genome-wide association studies have led to significant progress in the identification of genomic loci affecting coronary artery disease (CAD) risk. However, revealing the causal genes responsible for the observed associations is challenging. In the present study, we aimed to prioritize CAD-relevant genes based on cumulative evidence from published studies and our own study of colocalization between eQTLs and loci associated with CAD using the SMR/HEIDI approach. Prior knowledge of candidate genes was extracted from both experimental and in silico studies, employing different prioritization algorithms. Our review systematized information for a total of 51 CAD-associated loci. We pinpointed 37 genes in 36 loci. For 27 genes we infer they are causal for CAD, and for 10 further genes we judge them most likely causal. Colocalization analysis showed that for 18 out of these loci, association with CAD can be explained by changes in gene expression in one or more CAD-relevant tissues. Furthermore, for 8 out of 36 loci, existing evidence suggested additional CAD-associated genes. For the remaining 15 loci, we concluded that evidence for gene prioritization remains inconsistent, insufficient, or absent. Our results provide deeper insights into the genetic etiology of CAD and demonstrate knowledge gaps where further research is warranted.
Affiliation(s)
- Alexandra S Shadrina
- Laboratory of Theoretical and Applied Functional Genomics, Novosibirsk State University, Novosibirsk, 630090, Russia; Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, Novosibirsk, 630090, Russia
| | - Tatiana I Shashkova
- Laboratory of Theoretical and Applied Functional Genomics, Novosibirsk State University, Novosibirsk, 630090, Russia; Department of Biological and Medical Physics, Moscow Institute of Physics and Technology, Moscow, 117303, Russia; Research and Training Center on Bioinformatics, A.A. Kharkevich Institute for Information Transmission Problems, Moscow, 127051, Russia
| | - Anna A Torgasheva
- Laboratory of Theoretical and Applied Functional Genomics, Novosibirsk State University, Novosibirsk, 630090, Russia; Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, Novosibirsk, 630090, Russia
| | - Sodbo Z Sharapov
- Laboratory of Theoretical and Applied Functional Genomics, Novosibirsk State University, Novosibirsk, 630090, Russia; Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, Novosibirsk, 630090, Russia
| | - Lucija Klarić
- Genos Glycoscience Research Laboratory, Zagreb, Croatia; MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, EH4 2XU, Scotland, UK
| | - Eugene D Pakhomov
- Laboratory of Theoretical and Applied Functional Genomics, Novosibirsk State University, Novosibirsk, 630090, Russia
| | - Dmitry G Alexeev
- Laboratory of Theoretical and Applied Functional Genomics, Novosibirsk State University, Novosibirsk, 630090, Russia
| | - James F Wilson
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, EH4 2XU, Scotland, UK; Usher Institute, University of Edinburgh, Edinburgh, EH8 9AG, Scotland, UK
| | - Yakov A Tsepilov
- Laboratory of Theoretical and Applied Functional Genomics, Novosibirsk State University, Novosibirsk, 630090, Russia; Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, Novosibirsk, 630090, Russia
| | - Peter K Joshi
- Usher Institute, University of Edinburgh, Edinburgh, EH8 9AG, Scotland, UK
| | - Yurii S Aulchenko
- Laboratory of Recombination and Segregation Analysis, Institute of Cytology and Genetics, Novosibirsk, 630090, Russia; PolyOmica, 's-Hertogenbosch, 5237 PA, The Netherlands
| |
|
28
|
Zorkoltseva IV, Belonogova NM, Svishcheva GR, Kirichenko AV, Axenovich TI. In silico mapping of coronary artery disease genes. Vavilovskii Zhurnal Genet Selektsii 2020. [DOI: 10.18699/vj19.585] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/15/2022]
Abstract
To date, more than 100 loci associated with coronary artery disease (CAD) have been detected in large-scale genome-wide studies. For some of the several hundred genes located in these loci, roles in the pathogenesis of the disease have been shown. However, the genetic mechanisms and specific genes controlling this disease are still not fully understood. This study is aimed at an in silico search for new CAD genes. We performed a gene-based association analysis, in which all polymorphic variants within a gene are analyzed simultaneously. The analysis was based on the results of the genome-wide association studies (GWAS) available from the open databases MICAD (120,575 people, 85,112 markers) and UK Biobank (337,199 people, 10,894,597 markers). We used the sumFREGAT package, which implements a wide range of new methods for gene-based association analysis using summary statistics. We found 88 genes demonstrating significant gene-based associations. Forty-four of the identified genes were already known as CAD genes. Furthermore, we identified 28 additional genes in the known CAD loci. They can be considered as new candidate genes. Finally, we identified sixteen new genes (AGPAT4, ARHGEF12, BDP1, DHX58, EHBP1, FBF1, HSPB9, NPBWR2, PDLIM5, PLCB3, PLEKHM2, POU2F3, PRKD2, TMEM136, TTC29 and UTP20) outside the known loci. Information about the functional role of these genes allows us to consider many of them as candidates for CAD. Forty-one of the identified genes did not have significant GWAS signals and were identified only due to simultaneous consideration of all variants within the gene in the framework of gene-based analysis. These results demonstrate that gene-based association analysis is a powerful tool for gene mapping. The method can utilize the huge amounts of GWAS results accumulated worldwide to map different traits and diseases. This type of study is widely accessible, as it does not require additional material costs.
Affiliation(s)
- G. R. Svishcheva
- Institute of Cytology and Genetics, SB RAS; Vavilov Institute of General Genetics, RAS
| | - T. I. Axenovich
- Institute of Cytology and Genetics, SB RAS; Novosibirsk State University
| |
|
29
|
Beck T, Shorter T, Brookes AJ. GWAS Central: a comprehensive resource for the discovery and comparison of genotype and phenotype data from genome-wide association studies. Nucleic Acids Res 2020; 48:D933-D940. [PMID: 31612961 PMCID: PMC7145571 DOI: 10.1093/nar/gkz895] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 08/22/2019] [Revised: 09/30/2019] [Accepted: 10/02/2019] [Indexed: 12/31/2022]
Abstract
The GWAS Central resource provides a toolkit for integrative access and visualization of a uniquely extensive collection of genome-wide association study data, while ensuring safe open access to prevent research participant identification. GWAS Central is the world's most comprehensive openly accessible repository of summary-level GWAS association information, providing over 70 million P-values for over 3800 studies investigating over 1400 unique phenotypes. The database content comprises direct submissions received from GWAS authors and consortia, in addition to actively gathered data sets from various public sources. GWAS data are discoverable from the perspective of genetic markers, genes, genome regions or phenotypes, via graphical visualizations and detailed downloadable data reports. Tested genetic markers and relevant genomic features can be visually interrogated across up to sixteen association data sets in a single view using the integrated genome browser. The semantic standardization of phenotype descriptions with Medical Subject Headings and the Human Phenotype Ontology allows the precise identification of genetic variants associated with diseases, phenotypes and traits of interest. Harmonization of the phenotype descriptions used across several GWAS-related resources has extended the phenotype search capabilities to enable cross-database study discovery using a range of ontologies. GWAS Central is updated regularly and available at https://www.gwascentral.org.
Affiliation(s)
- Tim Beck
- Department of Genetics and Genome Biology, University of Leicester, Leicester LE1 7RH, UK
- Health Data Research UK, University of Leicester, Leicester LE1 7RH, UK
| | - Tom Shorter
- Department of Genetics and Genome Biology, University of Leicester, Leicester LE1 7RH, UK
- Health Data Research UK, University of Leicester, Leicester LE1 7RH, UK
| | - Anthony J Brookes
- Department of Genetics and Genome Biology, University of Leicester, Leicester LE1 7RH, UK
- Health Data Research UK, University of Leicester, Leicester LE1 7RH, UK
| |
|
30
|
Shen L, Thompson PM. Brain Imaging Genomics: Integrated Analysis and Machine Learning. Proc IEEE Inst Electr Electron Eng 2020; 108:125-162. [PMID: 31902950 PMCID: PMC6941751 DOI: 10.1109/jproc.2019.2947272] [Citation(s) in RCA: 93] [Impact Index Per Article: 18.6] [Indexed: 06/01/2023]
Abstract
Brain imaging genomics is an emerging data science field, where integrated analysis of brain imaging and genomics data, often combined with other biomarker, clinical and environmental data, is performed to gain new insights into the phenotypic, genetic and molecular characteristics of the brain as well as their impact on normal and disordered brain function and behavior. It has enormous potential to contribute significantly to biomedical discoveries in brain science. Given the increasingly important role of statistical and machine learning in biomedicine and rapidly growing literature in brain imaging genomics, we provide an up-to-date and comprehensive review of statistical and machine learning methods for brain imaging genomics, as well as a practical discussion on method selection for various biomedical applications.
Affiliation(s)
- Li Shen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
| | - Paul M Thompson
- Imaging Genetics Center, Mark & Mary Stevens Institute for Neuroimaging & Informatics, Keck School of Medicine, University of Southern California, Los Angeles, CA 90232, USA
| |
|