1. Kim A, Zhang Z, Legros C, Lu Z, de Smith A, Moore JE, Mancuso N, Gazal S. Inferring causal cell types of human diseases and risk variants from candidate regulatory elements. medRxiv 2024:2024.05.17.24307556. [PMID: 38798383 PMCID: PMC11118635 DOI: 10.1101/2024.05.17.24307556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
The heritability of human diseases is extremely enriched in candidate regulatory elements (cRE) from disease-relevant cell types. Critical next steps are to infer which and how many cell types are truly causal for a disease (after accounting for co-regulation across cell types), and to understand how individual variants impact disease risk through single or multiple causal cell types. Here, we propose CT-FM and CT-FM-SNP, two methods that leverage cell-type-specific cREs to fine-map causal cell types for a trait and for its candidate causal variants, respectively. We applied CT-FM to 63 GWAS summary statistics (average N = 417K) using nearly one thousand cRE annotations, primarily coming from ENCODE4. CT-FM inferred 81 causal cell types with corresponding SNP-annotations explaining a high fraction of trait SNP-heritability (~2/3 of the SNP-heritability explained by existing cREs), identified 16 traits with multiple causal cell types, highlighted cell-disease relationships consistent with known biology, and uncovered previously unexplored cellular mechanisms in psychiatric and immune-related diseases. Finally, we applied CT-FM-SNP to 39 UK Biobank traits and predicted high confidence causal cell types for 2,798 candidate causal non-coding SNPs. Our results suggest that most SNPs impact a phenotype through a single cell type, and that pleiotropic SNPs target different cell types depending on the phenotype context. Altogether, CT-FM and CT-FM-SNP shed light on how genetic variants act collectively and individually at the cellular level to impact disease risk.
Affiliation(s)
- Artem Kim
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Zixuan Zhang
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Come Legros
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Zeyun Lu
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Adam de Smith
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Jill E Moore
  - Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Nicholas Mancuso
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Steven Gazal
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
2. Liu L, Yan R, Guo P, Ji J, Gong W, Xue F, Yuan Z, Zhou X. Conditional transcriptome-wide association study for fine-mapping candidate causal genes. Nat Genet 2024; 56:348-356. [PMID: 38279040 DOI: 10.1038/s41588-023-01645-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Accepted: 12/08/2023] [Indexed: 01/28/2024]
Abstract
Transcriptome-wide association studies (TWASs) aim to integrate genome-wide association studies with expression-mapping studies to identify genes with genetically predicted expression (GReX) associated with a complex trait. In the present report, we develop a method, GIFT (gene-based integrative fine-mapping through conditional TWAS), that performs conditional TWAS analysis by explicitly controlling for GReX of all other genes residing in a local region to fine-map putatively causal genes. GIFT is frequentist in nature, explicitly models both expression correlation and cis-single nucleotide polymorphism linkage disequilibrium across multiple genes and uses a likelihood framework to account for expression prediction uncertainty. As a result, GIFT produces calibrated P values and is effective for fine-mapping. We apply GIFT to analyze six traits in the UK Biobank, where GIFT narrows down the set size of putatively causal genes by 32.16-91.32% compared with existing TWAS fine-mapping approaches. The genes identified by GIFT highlight the importance of vessel regulation in determining blood pressures and lipid metabolism for regulating lipid levels.
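The idea of conditioning on the GReX of neighboring genes can be illustrated with a small simulation. The sketch below is not GIFT itself (which uses a likelihood framework and models prediction uncertainty); it only uses ordinary least squares on synthetic data. All names, effect sizes, and the sample size are hypothetical choices made for illustration: two genes have correlated GReX, only gene A affects the trait, and gene B looks associated in a marginal test but not once gene A is included in the model.

```python
import random


def center(values):
    """Subtract the mean so regressions can omit an intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ols_one(x, y):
    """Slope from regressing y on a single centered predictor x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def ols_two(x1, x2, y):
    """Slopes from regressing y jointly on two centered predictors."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Synthetic data (assumed values): gene A is causal, gene B is only correlated.
rng = random.Random(0)
grex_a, grex_b, trait = [], [], []
for _ in range(20000):
    a = rng.gauss(0.0, 1.0)
    b = 0.8 * a + 0.6 * rng.gauss(0.0, 1.0)  # correlation with gene A is about 0.8
    grex_a.append(a)
    grex_b.append(b)
    trait.append(0.5 * a + rng.gauss(0.0, 1.0))  # only gene A has an effect

grex_a, grex_b, trait = center(grex_a), center(grex_b), center(trait)

marginal_b = ols_one(grex_b, trait)               # marginal test of gene B
joint_a, joint_b = ols_two(grex_a, grex_b, trait)  # gene B conditional on gene A

print(f"marginal effect of gene B:    {marginal_b:.3f}")
print(f"conditional effect of gene B: {joint_b:.3f}")
print(f"conditional effect of gene A: {joint_a:.3f}")
```

Under these assumed settings the marginal slope for gene B should land near 0.4 while its conditional slope should be close to zero, which is the behavior that motivates conditional analysis for fine-mapping.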
Affiliation(s)
- Lu Liu
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
- Ran Yan
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
- Ping Guo
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
- Jiadong Ji
  - Institute for Financial Studies, Shandong University, Jinan, China
- Weiming Gong
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
- Fuzhong Xue
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
- Zhongshang Yuan
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
3. Jung J, Lu Z, de Smith A, Mancuso N. Novel insight into the etiology of ischemic stroke gained by integrative multiome-wide association study. Hum Mol Genet 2024; 33:170-181. [PMID: 37824084 PMCID: PMC10772041 DOI: 10.1093/hmg/ddad174] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/13/2023] [Revised: 09/14/2023] [Accepted: 10/09/2023] [Indexed: 10/13/2023]
Abstract
Stroke, characterized by sudden neurological deficits, is the second leading cause of death worldwide. Although genome-wide association studies (GWAS) have successfully identified many genomic regions associated with ischemic stroke (IS), the genes underlying risk and their regulatory mechanisms remain elusive. Here, we integrate a large-scale GWAS (N = 1 296 908) for IS together with molecular QTLs data, including mRNA, splicing, enhancer RNA (eRNA), and protein expression data from up to 50 tissues (total N = 11 588). We identify 136 genes/eRNA/proteins associated with IS risk across 60 independent genomic regions and find IS risk is most enriched for eQTLs in arterial and brain-related tissues. Focusing on IS-relevant tissues, we prioritize 9 genes/proteins using probabilistic fine-mapping TWAS analyses. In addition, we discover that blood cell traits, particularly reticulocyte cells, have shared genetic contributions with IS using TWAS-based pheWAS and genetic correlation analysis. Lastly, we integrate our findings with a large-scale pharmacological database and identify a secondary bile acid, deoxycholic acid, as a potential therapeutic component. Our work highlights IS risk genes/splicing-sites/enhancer activity/proteins with their phenotypic consequences using relevant tissues as well as identify potential therapeutic candidates for IS.
Affiliation(s)
- Junghyun Jung
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, 1450 Biggy Street, Los Angeles, CA 90033, United States
- Zeyun Lu
  - Biostatistics Division, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, 2001 North Soto Street, Los Angeles, CA 90033, United States
- Adam de Smith
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, 1450 Biggy Street, Los Angeles, CA 90033, United States
- Nicholas Mancuso
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, 1450 Biggy Street, Los Angeles, CA 90033, United States
  - Biostatistics Division, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, 2001 North Soto Street, Los Angeles, CA 90033, United States
  - Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, United States
4. Jung J, Lu Z, de Smith A, Mancuso N. Novel insight into the etiology of ischemic stroke gained by integrative transcriptome-wide association study. medRxiv 2023:2023.03.30.23287918. [PMID: 37034585 PMCID: PMC10081428 DOI: 10.1101/2023.03.30.23287918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Stroke, characterized by sudden neurological deficits, is the second leading cause of death worldwide. Although genome-wide association studies (GWAS) have successfully identified many genomic regions associated with ischemic stroke (IS), the genes underlying risk and their regulatory mechanisms remain elusive. Here, we integrate a large-scale GWAS (N=1,296,908) for IS together with mRNA, splicing, enhancer RNA (eRNA) and protein expression data (N=11,588) from 50 tissues. We identify 136 genes/eRNA/proteins associated with IS risk across 54 independent genomic regions and find IS risk is most enriched for eQTLs in arterial and brain-related tissues. Focusing on IS-relevant tissues, we prioritize 9 genes/proteins using probabilistic fine-mapping TWAS analyses. In addition, we discover that blood cell traits, particularly reticulocyte cells, have shared genetic contributions with IS using TWAS-based pheWAS and genetic correlation analysis. Lastly, we integrate our findings with a large-scale pharmacological database and identify a secondary bile acid, deoxycholic acid, as a potential therapeutic component. Our work highlights IS risk genes/splicing-sites/enhancer activity/proteins with their phenotypic consequences using relevant tissues as well as identify potential therapeutic candidates for IS.
Affiliation(s)
- Junghyun Jung
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Zeyun Lu
  - Biostatistics Division, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Adam de Smith
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Nicholas Mancuso
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Biostatistics Division, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
5. Alamin M, Sultana MH, Lou X, Jin W, Xu H. Dissecting Complex Traits Using Omics Data: A Review on the Linear Mixed Models and Their Application in GWAS. Plants (Basel) 2022; 11:3277. [PMID: 36501317 PMCID: PMC9739826 DOI: 10.3390/plants11233277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 11/23/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
Genome-wide association study (GWAS) is the most popular approach to dissecting complex traits in plants, humans, and animals. Numerous methods and tools have been proposed to discover the causal variants for GWAS data analysis. Among them, linear mixed models (LMMs) are widely used statistical methods for regulating confounding factors, including population structure, resulting in increased computational proficiency and statistical power in GWAS studies. Recently more attention has been paid to pleiotropy, multi-trait, gene-gene interaction, gene-environment interaction, and multi-locus methods with the growing availability of large-scale GWAS data and relevant phenotype samples. In this review, we have demonstrated all possible LMMs-based methods available in the literature for GWAS. We briefly discuss the different LMM methods, software packages, and available open-source applications in GWAS. Then, we include the advantages and weaknesses of the LMMs in GWAS. Finally, we discuss the future perspective and conclusion. The present review paper would be helpful to the researchers for selecting appropriate LMM models and methods quickly for GWAS data analysis and would benefit the scientific society.
Affiliation(s)
- Md. Alamin
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Xiangyang Lou
  - Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Wenfei Jin
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Haiming Xu
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
6. Hao X, Wang K, Dai C, Ding Z, Yang W, Wang C, Cheng S. Integrative analysis of scRNA-seq and GWAS data pinpoints periportal hepatocytes as the relevant liver cell types for blood lipids. Hum Mol Genet 2021; 29:3145-3153. [PMID: 32821946 DOI: 10.1093/hmg/ddaa188] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/18/2020] [Revised: 08/10/2020] [Accepted: 08/18/2020] [Indexed: 12/22/2022]
Abstract
Liver, a heterogeneous tissue consisting of various cell types, is known to be relevant for blood lipid traits. By integrating summary statistics from genome-wide association studies (GWAS) of lipid traits and single-cell transcriptome data of the liver, we sought to identify specific cell types in the liver that were most relevant for blood lipid levels. We conducted differential expression analyses for 40 cell types from human and mouse livers in order to construct the cell-type specifically expressed gene sets, which we refer to as construction of the liver cell-type specifically expressed gene sets (CT-SEGS). Under the assumption that CT-SEGS represented specific functions of each cell type, we applied stratified linkage disequilibrium score regression to determine cell types that were most relevant for complex traits and diseases. We first confirmed the validity of this method (of delineating functionally relevant cell types) by identifying the immune cell types as relevant for autoimmune diseases. We further showed that lipid GWAS signals were enriched in the human and mouse periportal hepatocytes. Our results provide important information to facilitate future cellular studies of the metabolic mechanism affecting blood lipid levels.
Affiliation(s)
- Xingjie Hao
  - Department of Epidemiology and Biostatistics, Key Laboratory for Environment and Health, School of Public Health
- Kai Wang
  - Department of Epidemiology and Biostatistics, Key Laboratory for Environment and Health, School of Public Health
- Chengguqiu Dai
  - Department of Epidemiology and Biostatistics, Key Laboratory for Environment and Health, School of Public Health
- Wei Yang
  - Department of Nutrition and Food Hygiene, School of Public Health
- Chaolong Wang
  - Department of Epidemiology and Biostatistics, Key Laboratory for Environment and Health, School of Public Health
  - Department of Orthopedic Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Shanshan Cheng
  - Department of Epidemiology and Biostatistics, Key Laboratory for Environment and Health, School of Public Health
7. Gleason KJ, Yang F, Chen LS. A robust two-sample transcriptome-wide Mendelian randomization method integrating GWAS with multi-tissue eQTL summary statistics. Genet Epidemiol 2021; 45:353-371. [PMID: 33834509 DOI: 10.1002/gepi.22380] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/08/2020] [Revised: 01/25/2021] [Accepted: 02/08/2021] [Indexed: 02/06/2023]
Abstract
By treating genetic variants as instrumental variables (IVs), two-sample Mendelian randomization (MR) methods detect genetically regulated risk exposures for complex diseases using only summary statistics. When considering gene expression as exposure in transcriptome-wide MR (TWMR) analyses, the eQTLs (expression-quantitative-trait-loci) may have pleiotropic effects or be correlated with variants that have effects on disease not via expression, and the presence of those invalid IVs would lead to biased inference. Moreover, the number of eQTLs as IVs for a gene is generally limited, making the detection of invalid IVs challenging. We propose a method, "MR-MtRobin," for accurate TWMR inference in the presence of invalid IVs. By leveraging multi-tissue eQTL data in a mixed model, the proposed method makes identifiable the IV-specific random effects due to pleiotropy from estimation errors of eQTL summary statistics, and can provide accurate inference on the dependence (fixed effects) between eQTL and GWAS (genome-wide association study) effects in the presence of invalid IVs. Moreover, our method can improve power and precision in inference by selecting cross-tissue eQTLs as IVs that have improved consistency of effects across eQTL and GWAS data. We applied MR-MtRobin to detect genes associated with schizophrenia risk by integrating summary-level data from the Psychiatric Genomics Consortium and the Genotype-Tissue Expression project (V8).
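For readers unfamiliar with two-sample MR from summary statistics, the following sketch shows the standard inverse-variance weighted (IVW) estimator, which is a simpler and widely used baseline and is not the MR-MtRobin mixed model described above. The function name and the example effect sizes are hypothetical, chosen only for illustration. IVW assumes every variant is a valid instrument, which is exactly the assumption that invalid IVs violate.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted MR estimate from per-variant summary statistics.

    beta_exposure: variant effects on the exposure (e.g. eQTL effects on expression)
    beta_outcome:  variant effects on the outcome (GWAS effects)
    se_outcome:    standard errors of the outcome effects
    Returns (causal effect estimate, standard error), assuming all IVs are valid
    and mutually independent.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Made-up summary statistics: the outcome effect is exactly twice the exposure effect.
eqtl_effects = [0.2, 0.4, -0.3]
gwas_effects = [0.4, 0.8, -0.6]
gwas_se = [0.05, 0.05, 0.1]

estimate, std_error = ivw_estimate(eqtl_effects, gwas_effects, gwas_se)
print(f"IVW estimate: {estimate:.3f} (SE {std_error:.3f})")
```

With only a handful of eQTLs per gene, a single pleiotropic variant can shift this weighted ratio substantially, which is the difficulty the abstract points to when it notes that the number of IVs per gene is limited.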
Affiliation(s)
- Kevin J Gleason
  - Department of Public Health Sciences, University of Chicago, Chicago, Illinois, USA
- Fan Yang
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Lin S Chen
  - Department of Public Health Sciences, University of Chicago, Chicago, Illinois, USA
8. Using Collaborative Mixed Models to Account for Imputation Uncertainty in Transcriptome-Wide Association Studies. Methods Mol Biol 2021. [PMID: 33733352 DOI: 10.1007/978-1-0716-0947-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/18/2024]
Abstract
Transcriptome-wide association studies (TWASs) integrate expression quantitative trait loci (eQTLs) studies with genome-wide association studies (GWASs) to prioritize candidate target genes for complex traits. TWASs have become increasingly popular. They have been used to analyze many complex traits with expression profiles from different tissues, successfully enhancing the discovery of genetic risk loci for complex traits. Though conceptually straightforward, some steps are required to perform the TWAS properly. Here we provide a step-by-step guide to integrate eQTL data with both GWAS individual-level data and GWAS summary statistics from complex traits.
9. Zhu H, Shang L, Zhou X. A Review of Statistical Methods for Identifying Trait-Relevant Tissues and Cell Types. Front Genet 2021; 11:587887. [PMID: 33584792 PMCID: PMC7874162 DOI: 10.3389/fgene.2020.587887] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/27/2020] [Accepted: 12/30/2020] [Indexed: 11/17/2022]
Abstract
Genome-wide association studies (GWASs) have identified and replicated many genetic variants that are associated with diseases and disease-related complex traits. However, the biological mechanisms underlying these identified associations remain largely elusive. Exploring the biological mechanisms underlying these associations requires identifying trait-relevant tissues and cell types, as genetic variants likely influence complex traits in a tissue- and cell type-specific manner. Recently, several statistical methods have been developed to integrate genomic data with GWASs for identifying trait-relevant tissues and cell types. These methods often rely on different genomic information and use different statistical models for trait-tissue relevance inference. Here, we present a comprehensive technical review to summarize ten existing methods for trait-tissue relevance inference. These methods make use of different genomic information that include functional annotation information, expression quantitative trait loci information, genetically regulated gene expression information, as well as gene co-expression network information. These methods also use different statistical models that range from linear mixed models to covariance network models. We hope that this review can serve as a useful reference both for methodologists who develop methods and for applied analysts who apply these methods for identifying trait relevant tissues and cell types.
Affiliation(s)
- Huanhuan Zhu
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
- Lulu Shang
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, United States
10. Chen H, Wang T, Huang S, Zeng P. New novel non-MHC genes were identified for cervical cancer with an integrative analysis approach of transcriptome-wide association study. J Cancer 2021; 12:840-848. [PMID: 33403041 PMCID: PMC7778537 DOI: 10.7150/jca.47918] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/08/2020] [Accepted: 10/18/2020] [Indexed: 12/28/2022]
Abstract
Although genome-wide association studies (GWAS) have successfully identified multiple genetic variants associated with cervical cancer, the functional role of those variants is not well understood. To bridge such gap, we integrated the largest cervical cancer GWAS (N = 9,347) with gene expression measured in six human tissues to perform a multi-tissue transcriptome-wide association study (TWAS). We identified a total of 20 associated genes in the European population, especially four novel non-MHC genes (i.e. WDR19, RP11-384K6.2, RP11-384K6.6 and ITSN1). Further, we attempted to validate our results in another independent cervical cancer GWAS from the East Asian population (N = 3,314) and re-discovered four genes including WDR19, HLA-DOB, MICB and OR2B8P. In our subsequent co-expression analysis, we discovered SLAMF7 and LTA were co-expressed in TCGA tumor samples and showed both WDR19 and ITSN1 were enriched in "plasma membrane". Using the protein-protein interaction analysis we observed strong interactions between the proteins produced by genes that are associated with cervical cancer. Overall, our study identified multiple candidate genes, especially four non-MHC genes, which may be causally associated with the risk of cervical cancer. However, further investigations with larger sample size are warranted to validate our findings in diverse populations.
Affiliation(s)
- Haimiao Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ting Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Shuiping Huang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ping Zeng
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
11. Li B, Dong J, Yu J, Fan Y, Shang L, Zhou X, Bai Y. Pinpointing miRNA and genes enrichment over trait-relevant tissue network in Genome-Wide Association Studies. BMC Med Genomics 2020; 13:191. [PMID: 33371893 PMCID: PMC7771066 DOI: 10.1186/s12920-020-00830-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/25/2020] [Accepted: 11/17/2020] [Indexed: 12/14/2022]
Abstract
BACKGROUND Understanding gene regulation is important but difficult. Elucidating tissue-specific gene regulation mechanism is even more challenging and requires gene co-expression network assembled from protein-protein interaction, transcription factor and gene binding, and post-transcriptional regulation (e.g., miRNA targeting) information. The miRNA binding affinity could therefore be changed by SNP(s) located at the 3' untranslated regions (3'UTR) of the target messenger RNA (mRNA) which miRNA(s) interacts with. Genome-wide association study (GWAS) has reported significant numbers of loci hosting SNPs associated with many traits. The goal of this study is to pinpoint GWAS functional variants located in 3'UTRs and elucidate if the genes harboring these variants along with their targeting miRNAs are associated with genetic traits relevant to certain tissues. METHODS By applying MIGWAS, CoCoNet, ANNOVAR, and DAVID bioinformatics software and utilizing the gene expression database (e.g. GTEx data) to study GWAS summary statistics for 43 traits from 28 GWAS studies, we have identified a list of miRNAs and targeted genes harboring 3'UTR variants, which could contribute to trait-relevant tissue over miRNA-target gene network. RESULTS Our result demonstrated that strong association between traits and tissues exists, and in particular, the Primary Biliary Cirrhosis (PBC) trait has the most significant p-value for all 180 tissues among all 43 traits used for this study. We reported SNPs located in 3'UTR regions of genes (SFMBT2, ZC3HAV1, and UGT3A1) targeted by miRNAs for PBC trait and its tissue association network. After employing Gene Ontology (GO) analysis for PBC trait, we have also identified a very important miRNA targeted gene over miRNA-target gene network, PFKL, which encodes the liver subunit of an enzyme. CONCLUSIONS The non-coding variants identified from GWAS studies are casually assumed to be not critical to translated protein product. 
However, 3' untranslated regions (3'UTRs) of genes harbor variants that can often change the binding affinity of targeting miRNAs, which play important roles in regulating protein translation. Our study has shown that GWAS variants could play important roles in miRNA-target gene networks by contributing to the association between traits and tissues. Our analysis expands our knowledge of the trait-relevant tissue network and paves the way for future human disease studies.
Affiliation(s)
- Binze Li
  - Bellaire High School, 5100 Maple St, Bellaire, TX, 77401, USA
- Julian Dong
  - Northville High School, 45700 Six Mile Road, Northville, MI, 48168, USA
- Jiaqi Yu
  - College Preparatory School, 6100 Broadway, Oakland, CA, 94618, USA
- Yuqi Fan
  - The Master's Academy, 1500 Lukas Ln, Oviedo, FL, 32765, USA
- Lulu Shang
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
- Yongsheng Bai
  - Department of Biology, Eastern Michigan University, Ypsilanti, MI, 48197, USA
  - Next-Gen Intelligent Science Training, Ann Arbor, MI, 48105, USA
12. Xiao L, Yuan Z, Jin S, Wang T, Huang S, Zeng P. Multiple-Tissue Integrative Transcriptome-Wide Association Studies Discovered New Genes Associated With Amyotrophic Lateral Sclerosis. Front Genet 2020; 11:587243. [PMID: 33329728 PMCID: PMC7714931 DOI: 10.3389/fgene.2020.587243] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/25/2020] [Accepted: 10/26/2020] [Indexed: 12/12/2022]
Abstract
Genome-wide association studies (GWAS) have identified multiple causal genes associated with amyotrophic lateral sclerosis (ALS); however, the genetic architecture of ALS remains completely unknown and a large number of causal genes have yet to be discovered. To fill this gap in part, we implemented an integrative analysis of transcriptome-wide association study (TWAS) for ALS to prioritize causal genes with summary statistics from 80,610 European individuals and employed 13 GTEx brain tissues as reference transcriptome panels. The summary-level TWAS analysis with single brain tissue was first undertaken and then a flexible p-value combination strategy, called summary data-based Cauchy Aggregation TWAS (SCAT), was proposed to pool association signals from single-tissue TWAS analysis while protecting against highly positive correlation among tests. Extensive simulations demonstrated SCAT can produce well-calibrated p-values for the control of type I error and was often much more powerful to identify association signals across various scenarios compared with single-tissue TWAS analysis. Using SCAT, we replicated three ALS-associated genes (i.e., ATXN3, SCFD1, and C9orf72) identified in previous GWASs and discovered five additional genes (i.e., SLC9A8, FAM66D, TRIP11, JUP, and RP11-529H20.6) which were not reported before. Furthermore, we discovered the five associations were largely driven by genes themselves and thus might be new genes which were likely related to the risk of ALS. However, further investigations are warranted to verify these results and untangle the pathophysiological function of the genes in developing ALS.
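The Cauchy aggregation step can be sketched in a few lines. The code below implements the generic Cauchy combination statistic: each p-value is mapped to a Cauchy-distributed score, the scores are averaged, and the average is mapped back to a p-value. Whether SCAT uses exactly these equal weights is an assumption here; the function name and the example p-values are hypothetical.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine p-values with the Cauchy combination statistic.

    Each p-value p is transformed to tan((0.5 - p) * pi), which follows a
    standard Cauchy distribution under the null. A weighted average of these
    scores is again approximately standard Cauchy, even when the tests are
    positively correlated, so it can be converted back to a p-value directly.
    Equal weights are an assumption of this sketch.
    """
    if weights is None:
        weights = [1.0] * len(pvalues)
    total = sum(weights)
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues)
    ) / total
    return 0.5 - math.atan(statistic) / math.pi


# Hypothetical single-tissue TWAS p-values for one gene across three tissues.
tissue_pvalues = [1e-4, 0.5, 0.9]
combined = cauchy_combine(tissue_pvalues)
print(f"combined p-value: {combined:.2e}")
```

Because the transformation has heavy tails, one very small p-value dominates the average, which is why this kind of aggregation keeps power when only a few tissues carry signal.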
Affiliation(s)
- Lishun Xiao
- Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, China
- Zhongshang Yuan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Siyi Jin
- Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, China
- Ting Wang
- Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, China
- Shuiping Huang
- Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, China; Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Ping Zeng
- Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, China; Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
13
Yang T, Tang H, Risch HA, Olson SH, Petersen G, Bracci PM, Gallinger S, Hung R, Neale RE, Scelo G, Duell EJ, Kurtz RC, Khaw KT, Severi G, Sund M, Wareham N, Amos CI, Li D, Wei P. Incorporating multiple sets of eQTL weights into gene-by-environment interaction analysis identifies novel susceptibility loci for pancreatic cancer. Genet Epidemiol 2020; 44:880-892. [PMID: 32779232 PMCID: PMC7657998 DOI: 10.1002/gepi.22348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2020] [Revised: 07/14/2020] [Accepted: 07/30/2020] [Indexed: 11/11/2022]
Abstract
It is of great scientific interest to identify interactions between genetic variants and environmental exposures that may modify the risk of complex diseases. However, larger sample sizes are usually required to detect gene-by-environment interaction (G × E) than required to detect genetic main association effects. To boost the statistical power and improve the understanding of the underlying molecular mechanisms, we incorporate functional genomics information, specifically, expression quantitative trait loci (eQTLs), into a data-adaptive G × E test, called aGEw. This test adaptively chooses the best eQTL weights from multiple tissues and provides an extra layer of weighting at the genetic variant level. Extensive simulations show that the aGEw test can control the Type 1 error rate, and the power is resilient to the inclusion of neutral variants and noninformative external weights. We applied the proposed aGEw test to the Pancreatic Cancer Case-Control Consortium (discovery cohort of 3,585 cases and 3,482 controls) and the PanScan II genome-wide association study data (replication cohort of 2,021 cases and 2,105 controls) with smoking as the exposure of interest. Two novel putative smoking-related pancreatic cancer susceptibility genes, TRIP10 and KDM3A, were identified. The aGEw test is implemented in an R package aGE.
Affiliation(s)
- Tianzhong Yang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
- Hongwei Tang
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Sara H. Olson
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, US
- Gloria Petersen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Paige M. Bracci
- Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, CA, USA
- Steven Gallinger
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, University of Toronto, Toronto, Canada
- Rayjean Hung
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, University of Toronto, Toronto, Canada
- Rachel E. Neale
- Cancer Aetiology and Prevention Group, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
- Eric J. Duell
- Unit of Nutrition and Cancer, Cancer Epidemiology Research Program Catalan Institute of Oncology - Bellvitge Biomedical Research Institute (ICO-IDIBELL) Avda. Gran Via 199-203 08908 L’Hospitalet de Llobregat, Barcelona, Spain
- Robert C. Kurtz
- Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Kay-Tee Khaw
- Department of Public Health and Primary Care, University of Cambridge, UK
- Gianluca Severi
- Gustave Roussy, F-94805, Villejuif, France
- CESP, Fac. de médecine - Univ. Paris-Sud, Fac. de médecine - UVSQ, INSERM, Université Paris-Saclay, 94805, Villejuif, France
- Malin Sund
- Department of Surgical and Perioperative Sciences, Umeå University, Sweden
- Nick Wareham
- MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Cambridge, UK
- Christopher I Amos
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
- Donghui Li
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
14
Timshel PN, Thompson JJ, Pers TH. Genetic mapping of etiologic brain cell types for obesity. eLife 2020; 9:55851. [PMID: 32955435 PMCID: PMC7505664 DOI: 10.7554/elife.55851] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 02/07/2020] [Accepted: 09/04/2020] [Indexed: 12/11/2022]
Abstract
The underlying cell types mediating predisposition to obesity remain largely obscure. Here, we integrated recently published single-cell RNA-sequencing (scRNA-seq) data from 727 peripheral and nervous system cell types spanning 17 mouse organs with body mass index (BMI) genome-wide association study (GWAS) data from >457,000 individuals. Developing a novel strategy for integrating scRNA-seq data with GWAS data, we identified 26, exclusively neuronal, cell types from the hypothalamus, subthalamus, midbrain, hippocampus, thalamus, cortex, pons, medulla, pallidum that were significantly enriched for BMI heritability (p < 1.6×10⁻⁴). Using genes harboring coding mutations associated with obesity, we replicated midbrain cell types from the anterior pretectal nucleus and periaqueductal gray (p < 1.2×10⁻⁴). Together, our results suggest that brain nuclei regulating integration of sensory stimuli, learning and memory are likely to play a key role in obesity and provide testable hypotheses for mechanistic follow-up studies.
Affiliation(s)
- Pascal N Timshel
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Jonatan J Thompson
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Tune H Pers
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
15
Li X, Li Z, Zhou H, Gaynor SM, Liu Y, Chen H, Sun R, Dey R, Arnett DK, Aslibekyan S, Ballantyne CM, Bielak LF, Blangero J, Boerwinkle E, Bowden DW, Broome JG, Conomos MP, Correa A, Cupples LA, Curran JE, Freedman BI, Guo X, Hindy G, Irvin MR, Kardia SLR, Kathiresan S, Khan AT, Kooperberg CL, Laurie CC, Liu XS, Mahaney MC, Manichaikul AW, Martin LW, Mathias RA, McGarvey ST, Mitchell BD, Montasser ME, Moore JE, Morrison AC, O'Connell JR, Palmer ND, Pampana A, Peralta JM, Peyser PA, Psaty BM, Redline S, Rice KM, Rich SS, Smith JA, Tiwari HK, Tsai MY, Vasan RS, Wang FF, Weeks DE, Weng Z, Wilson JG, Yanek LR, Neale BM, Sunyaev SR, Abecasis GR, Rotter JI, Willer CJ, Peloso GM, Natarajan P, Lin X. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nat Genet 2020; 52:969-983. [PMID: 32839606 PMCID: PMC7483769 DOI: 10.1038/s41588-020-0676-4] [Citation(s) in RCA: 121] [Impact Index Per Article: 30.3] [Received: 07/17/2019] [Accepted: 07/02/2020] [Indexed: 12/13/2022]
Abstract
Large-scale whole-genome sequencing studies have enabled the analysis of rare variants (RVs) associated with complex phenotypes. Commonly used RV association tests have limited scope to leverage variant functions. We propose STAAR (variant-set test for association using annotation information), a scalable and powerful RV association test method that effectively incorporates both variant categories and multiple complementary annotations using a dynamic weighting scheme. For the latter, we introduce 'annotation principal components', multidimensional summaries of in silico variant annotations. STAAR accounts for population structure and relatedness and is scalable for analyzing very large cohort and biobank whole-genome sequencing studies of continuous and dichotomous traits. We applied STAAR to identify RVs associated with four lipid traits in 12,316 discovery and 17,822 replication samples from the Trans-Omics for Precision Medicine Program. We discovered and replicated new RV associations, including disruptive missense RVs of NPC1L1 and an intergenic region near APOC1P1 associated with low-density lipoprotein cholesterol.
Affiliation(s)
- Xihao Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Zilin Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Hufeng Zhou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Sheila M Gaynor
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Yaowu Liu
- School of Statistics, Southwestern University of Finance and Economics, Chengdu, China
- Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Center for Precision Health, School of Public Health and School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Ryan Sun
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Rounak Dey
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Donna K Arnett
- College of Public Health, University of Kentucky, Lexington, KY, USA
- Stella Aslibekyan
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
- Lawrence F Bielak
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Donald W Bowden
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Jai G Broome
- Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Matthew P Conomos
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Adolfo Correa
- Jackson Heart Study, Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- L Adrienne Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, MA, USA
- Joanne E Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Barry I Freedman
- Department of Internal Medicine, Nephrology, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- George Hindy
- Department of Population Medicine, Qatar University College of Medicine, QU Health, Doha, Qatar
- Marguerite R Irvin
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
- Sharon L R Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Sekar Kathiresan
- Verve Therapeutics, Cambridge, MA, USA
- Cardiology Division, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Alyna T Khan
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Charles L Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Cathy C Laurie
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- X Shirley Liu
- Department of Data Sciences, Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Statistics, Harvard University, Cambridge, MA, USA
- Michael C Mahaney
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Ani W Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Lisa W Martin
- Division of Cardiology, George Washington School of Medicine and Health Sciences, Washington, DC, USA
- Rasika A Mathias
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Stephen T McGarvey
- Department of Epidemiology, International Health Institute, Department of Anthropology, Brown University, Providence, RI, USA
- Braxton D Mitchell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Geriatrics Research and Education Clinical Center, Baltimore VA Medical Center, Baltimore, MD, USA
- May E Montasser
- Division of Endocrinology, Diabetes, and Nutrition, Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Jill E Moore
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
- Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Jeffrey R O'Connell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Nicholette D Palmer
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Akhil Pampana
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Juan M Peralta
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Patricia A Peyser
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Bruce M Psaty
- Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Services, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Division of Pulmonary, Critical Care, and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Kenneth M Rice
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Jennifer A Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
- Hemant K Tiwari
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
- Michael Y Tsai
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA
- Ramachandran S Vasan
- Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, MA, USA
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Fei Fei Wang
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Daniel E Weeks
- Department of Human Genetics and Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
- James G Wilson
- Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS, USA
- Division of Cardiology, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Lisa R Yanek
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Shamil R Sunyaev
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Division of Genetics, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Gonçalo R Abecasis
- Regeneron Pharmaceuticals, Tarrytown, NY, USA
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Cristen J Willer
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Gina M Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Pradeep Natarajan
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Department of Statistics, Harvard University, Cambridge, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
16
Liu S, Yu Y, Zhang S, Cole JB, Tenesa A, Wang T, McDaneld TG, Ma L, Liu GE, Fang L. Epigenomics and genotype-phenotype association analyses reveal conserved genetic architecture of complex traits in cattle and human. BMC Biol 2020; 18:80. [PMID: 32620158 PMCID: PMC7334855 DOI: 10.1186/s12915-020-00792-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/16/2019] [Accepted: 05/12/2020] [Indexed: 02/01/2023]
Abstract
Background Lack of comprehensive functional annotations across a wide range of tissues and cell types severely hinders the biological interpretations of phenotypic variation, adaptive evolution, and domestication in livestock. Here we used a combination of comparative epigenomics, genome-wide association study (GWAS), and selection signature analysis, to shed light on potential adaptive evolution in cattle. Results We cross-mapped 8 histone marks of 1300 samples from human to cattle, covering 178 unique tissues/cell types. By uniformly analyzing 723 RNA-seq and 40 whole genome bisulfite sequencing (WGBS) datasets in cattle, we validated that cross-mapped histone marks captured tissue-specific expression and methylation, reflecting tissue-relevant biology. Through integrating cross-mapped tissue-specific histone marks with large-scale GWAS and selection signature results, we for the first time detected relevant tissues and cell types for 45 economically important traits and artificial selection in cattle. For instance, immune tissues are significantly associated with health and reproduction traits, multiple tissues for milk production and body conformation traits (reflecting their highly polygenic architecture), and thyroid for the different selection between beef and dairy cattle. Similarly, we detected relevant tissues for 58 complex traits and diseases in humans and observed that immune and fertility traits in humans significantly correlated with those in cattle in terms of relevant tissues, which facilitated the identification of causal genes for such traits. For instance, PIK3CG, a gene highly specifically expressed in mononuclear cells, was significantly associated with both age-at-menopause in human and daughter-still-birth in cattle. ICAM, a T cell-specific gene, was significantly associated with both allergic diseases in human and metritis in cattle. 
Conclusion Collectively, our results highlighted that comparative epigenomics in conjunction with GWAS and selection signature analyses could provide biological insights into the phenotypic variation and adaptive evolution. Cattle may serve as a model for human complex traits, by providing additional information beyond laboratory model organisms, particularly when more novel phenotypes become available in the near future.
Affiliation(s)
- Shuli Liu
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, BARC-East, Beltsville, MD, 20705, USA; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Ying Yu
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Shengli Zhang
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- John B Cole
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, BARC-East, Beltsville, MD, 20705, USA
- Albert Tenesa
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK; The Roslin Institute, University of Edinburgh, Edinburgh, EH25 9RG, UK
- Ting Wang
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, 63110, USA
- Tara G McDaneld
- US Meat Animal Research Center, Agricultural Research Service, USDA, Clay Center, NE, 68933, USA
- Li Ma
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD, 20742, USA.
- George E Liu
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, BARC-East, Beltsville, MD, 20705, USA.
- Lingzhao Fang
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, BARC-East, Beltsville, MD, 20705, USA; MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK; Department of Animal and Avian Sciences, University of Maryland, College Park, MD, 20742, USA.
17
Zhu H, Zhou X. Statistical methods for SNP heritability estimation and partition: A review. Comput Struct Biotechnol J 2020; 18:1557-1568. [PMID: 32637052 PMCID: PMC7330487 DOI: 10.1016/j.csbj.2020.06.011] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 02/28/2020] [Revised: 06/03/2020] [Accepted: 06/07/2020] [Indexed: 02/06/2023]
Abstract
In GWAS studies, SNP heritability measures the proportion of phenotypic variance explained by all measured SNPs. Accurate estimation of SNP heritability can help us better understand the degree to which measured genetic variants influence phenotypes. Over the last decade, a variety of statistical methods and software tools have been developed for SNP heritability estimation with different data types including genotype array data, imputed genotype data, whole-genome sequencing data, RNA sequencing data, and bisulfite sequencing data. However, a thorough technical review of these methods, especially from a statistical and computational viewpoint, is currently missing. To fill this knowledge gap, we present a comprehensive review on a broad category of recently developed and commonly used SNP heritability estimation methods. We focus on their modeling assumptions; their interconnected relationships; their applicability to quantitative, binary and count phenotypes; their use of individual level data versus summary statistics, as well as their utility for SNP heritability partitioning. We hope that this review will serve as a useful reference for both methodologists who develop heritability estimation methods and practitioners who perform heritability analysis.
Affiliation(s)
- Huanhuan Zhu
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
18
Yang C, Wan X, Lin X, Chen M, Zhou X, Liu J. CoMM: a collaborative mixed model to dissecting genetic contributions to complex traits by leveraging regulatory information. Bioinformatics 2020; 35:1644-1652. [PMID: 30295737 DOI: 10.1093/bioinformatics/bty865] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 06/24/2018] [Revised: 09/15/2018] [Accepted: 10/05/2018] [Indexed: 12/12/2022]
Abstract
MOTIVATION Genome-wide association studies (GWASs) have been successful in identifying many genetic variants associated with complex traits. However, the mechanistic links between these variants and complex traits remain elusive. A scientific hypothesis is that genetic variants influence complex traits at the organismal level via affecting cellular traits, such as regulating gene expression and altering protein abundance. Although earlier works have already presented some scientific insights about this hypothesis and their findings are very promising, statistical methods that effectively harness multilayered data (e.g. genetic variants, cellular traits and organismal traits) on a large scale for functional and mechanistic exploration are in high demand. RESULTS In this study, we propose a collaborative mixed model (CoMM) to investigate the mechanistic role of associated variants in complex traits. The key idea is built upon the emerging scientific evidence that genetic effects at the cellular level are much stronger than those at the organismal level. Briefly, CoMM combines two models: the first model relating gene expression with genotype and the second model relating phenotype with predicted gene expression using the first model. The two models are fitted jointly in CoMM, such that the uncertainty in predicting gene expression is fully accounted for. To demonstrate the advantages of CoMM over existing methods, we conducted extensive simulation studies, and also applied CoMM to analyze 25 traits in NFBC1966 and Genetic Epidemiology Research on Aging (GERA) studies by integrating transcriptome information from the Genetic European in Health and Disease (GEUVADIS) Project. The results indicate that by leveraging regulatory information, CoMM can effectively improve the power of prioritizing risk variants. Regarding the computational efficiency, CoMM can complete the analysis of NFBC1966 dataset and GERA datasets in 2 and 18 min, respectively.
AVAILABILITY AND IMPLEMENTATION The developed R package is available at https://github.com/gordonliu810822/CoMM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
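The two-model structure described in this abstract can be written schematically as below. The notation is assumed here purely for illustration and is not taken from the paper: $\mathbf{e}$ is expression of one gene in the reference panel, $\mathbf{y}$ is the trait in the GWAS sample, $\mathbf{G}_1$ and $\mathbf{G}_2$ are the corresponding cis-genotype matrices, $\boldsymbol{\gamma}$ holds the SNP effects on expression, and $\alpha$ is the effect of genetically predicted expression on the trait. The Gaussian error terms are likewise an assumption of this sketch.

```latex
\begin{align}
\mathbf{e} &= \mathbf{G}_1 \boldsymbol{\gamma} + \boldsymbol{\varepsilon}_1,
  & \boldsymbol{\varepsilon}_1 &\sim \mathcal{N}(\mathbf{0}, \sigma_1^2 \mathbf{I}), \\
\mathbf{y} &= \alpha \, \mathbf{G}_2 \boldsymbol{\gamma} + \boldsymbol{\varepsilon}_2,
  & \boldsymbol{\varepsilon}_2 &\sim \mathcal{N}(\mathbf{0}, \sigma_2^2 \mathbf{I}).
\end{align}
```

Under this reading, the same $\boldsymbol{\gamma}$ appears in both equations, so fitting them jointly propagates the uncertainty in $\boldsymbol{\gamma}$ into the test of $H_0\colon \alpha = 0$ rather than treating predicted expression $\mathbf{G}_2 \hat{\boldsymbol{\gamma}}$ as if it were observed.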
Affiliation(s)
- Can Yang
- Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong, China
- Xiang Wan
- Shenzhen Research Institute of Big Data, Shenzhen, China
- Xinyi Lin
- Centre for Quantitative Medicine, Program in Health Services and Systems Research, Duke-NUS Medical School, Singapore
- Mengjie Chen
- Department of Medicine, University of Chicago, Chicago, IL, USA
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Jin Liu
- Centre for Quantitative Medicine, Program in Health Services and Systems Research, Duke-NUS Medical School, Singapore
19
Shang L, Smith JA, Zhao W, Kho M, Turner ST, Mosley TH, Kardia SLR, Zhou X. Genetic Architecture of Gene Expression in European and African Americans: An eQTL Mapping Study in GENOA. Am J Hum Genet 2020; 106:496-512. [PMID: 32220292 PMCID: PMC7118581 DOI: 10.1016/j.ajhg.2020.03.002] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 10/06/2019] [Accepted: 03/06/2020] [Indexed: 12/20/2022]
Abstract
Most existing expression quantitative trait locus (eQTL) mapping studies have been focused on individuals of European ancestry and are underrepresented in other populations including populations with African ancestry. Lack of large-scale well-powered eQTL mapping studies in populations with African ancestry can both impede the dissemination of eQTL mapping results that would otherwise benefit individuals with African ancestry and hinder the comparable analysis for understanding how gene regulation is shaped through evolution. We fill this critical knowledge gap by performing a large-scale in-depth eQTL mapping study on 1,032 African Americans (AA) and 801 European Americans (EA) in the GENOA cohort. We identified a total of 354,931 eSNPs in AA and 371,309 eSNPs in EA, with 112,316 eSNPs overlapped between the two. We found that eQTL harboring genes (eGenes) are enriched in metabolic pathways and tend to have higher SNP heritability compared to non-eGenes. We found that eGenes that are common in the two populations tend to be less conserved than eGenes that are unique to one population, which are less conserved than non-eGenes. Through conditional analysis, we found that eGenes in AA tend to harbor more independent eQTLs than eGenes in EA, suggesting potentially diverse genetic architecture underlying expression variation in the two populations. Finally, the large sample sizes in GENOA allow us to construct accurate expression prediction models in both AA and EA, facilitating powerful transcriptome-wide association studies. Overall, our results represent an important step toward revealing the genetic architecture underlying expression variation in African Americans.
Affiliation(s)
- Lulu Shang
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Jennifer A Smith
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Wei Zhao
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Minjung Kho
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Stephen T Turner
  - Division of Nephrology and Hypertension, Mayo Clinic, Rochester, MN 55905, USA
- Thomas H Mosley
  - Memory Impairment and Neurodegenerative Dementia (MIND) Center, University of Mississippi Medical Center, Jackson, MS 39126, USA
- Sharon L R Kardia
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Xiang Zhou
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
|
20
|
Shang L, Smith JA, Zhou X. Leveraging gene co-expression patterns to infer trait-relevant tissues in genome-wide association studies. PLoS Genet 2020; 16:e1008734. [PMID: 32310941 PMCID: PMC7192514 DOI: 10.1371/journal.pgen.1008734] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 07/17/2019] [Revised: 04/30/2020] [Accepted: 03/24/2020] [Indexed: 12/11/2022]
Abstract
Genome-wide association studies (GWASs) have identified many SNPs associated with various common diseases. Understanding the biological functions of these identified SNP associations requires identifying disease/trait relevant tissues or cell types. Here, we develop a network method, CoCoNet, to facilitate the identification of trait-relevant tissues or cell types. Different from existing approaches, CoCoNet incorporates tissue-specific gene co-expression networks constructed from either bulk or single cell RNA sequencing (RNAseq) studies with GWAS data for trait-tissue inference. In particular, CoCoNet relies on a covariance regression network model to express gene-level effect measurements for the given GWAS trait as a function of the tissue-specific co-expression adjacency matrix. With a composite likelihood-based inference algorithm, CoCoNet is scalable to tens of thousands of genes. We validate the performance of CoCoNet through extensive simulations. We apply CoCoNet for an in-depth analysis of four neurological disorders and four autoimmune diseases, where we integrate the corresponding GWASs with bulk RNAseq data from 38 tissues and single cell RNAseq data from 10 cell types. In the real data applications, we show how CoCoNet can help identify specific glial cell types relevant for neurological disorders and identify disease-targeted colon tissues as relevant for autoimmune diseases.
Affiliation(s)
- Lulu Shang
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States of America
- Jennifer A. Smith
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI, United States of America
  - Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, United States of America
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States of America
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, United States of America
|
21
|
Duggal P, Ladd-Acosta C, Ray D, Beaty TH. The Evolving Field of Genetic Epidemiology: From Familial Aggregation to Genomic Sequencing. Am J Epidemiol 2019; 188:2069-2077. [PMID: 31509181 PMCID: PMC7036654 DOI: 10.1093/aje/kwz193] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/28/2019] [Revised: 08/15/2019] [Accepted: 08/19/2019] [Indexed: 12/21/2022]
Abstract
The field of genetic epidemiology is relatively young and brings together genetics, epidemiology, and biostatistics to identify and implement the best study designs and statistical analyses for identifying genes controlling risk for complex and heterogeneous diseases (i.e., those where genes and environmental risk factors both contribute to etiology). The field has moved quickly over the past 40 years partly because the technology of genotyping and sequencing has forced it to adapt while adhering to the fundamental principles of genetics. In the last two decades, the available tools for genetic epidemiology have expanded from a genetic focus (considering 1 gene at a time) to a genomic focus (considering the entire genome), and now they must further expand to integrate information from other “-omics” (e.g., epigenomics, transcriptomics as measured by RNA expression) at both the individual and the population levels. Additionally, we can now also evaluate gene and environment interactions across populations to better understand exposure and the heterogeneity in disease risk. The future challenges facing genetic epidemiology are considerable both in scale and techniques, but the importance of the field will not diminish because by design it ties scientific goals with public health applications.
Affiliation(s)
- Priya Duggal
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
  - Department of International Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
- Christine Ladd-Acosta
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
- Debashree Ray
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
- Terri H Beaty
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
|
22
|
Jiang L, Xue C, Dai S, Chen S, Chen P, Sham PC, Wang H, Li M. DESE: estimating driver tissues by selective expression of genes associated with complex diseases or traits. Genome Biol 2019; 20:233. [PMID: 31694669 PMCID: PMC6836538 DOI: 10.1186/s13059-019-1801-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 07/01/2019] [Accepted: 08/25/2019] [Indexed: 02/08/2023]
Abstract
The driver tissues or cell types in which susceptibility genes initiate diseases remain elusive. We develop a unified framework to detect the causal tissues of complex diseases or traits according to selective expression of disease-associated genes in genome-wide association studies (GWASs). This framework consists of three components which run iteratively to produce a converged prioritization list of driver tissues. Additionally, this framework also outputs a list of prioritized genes as a byproduct. We apply the framework to six representative complex diseases or traits with GWAS summary statistics, which leads to the estimation of the lung as an associated tissue of rheumatoid arthritis.
Affiliation(s)
- Lin Jiang
  - Zhongshan School of Medicine, Center for Precision Medicine, Sun Yat-sen University, Guangzhou, 510080, China
  - Department of Pituitary Tumour Center, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510080, China
- Chao Xue
  - Zhongshan School of Medicine, Center for Precision Medicine, Sun Yat-sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (SYSU), Ministry of Education, Guangzhou, 510080, China
- Sheng Dai
  - Zhongshan School of Medicine, Center for Precision Medicine, Sun Yat-sen University, Guangzhou, 510080, China
- Shangzhen Chen
  - Zhongshan School of Medicine, Center for Precision Medicine, Sun Yat-sen University, Guangzhou, 510080, China
- Peikai Chen
  - Department of Psychiatry, The Centre for Genomic Sciences, State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Hong Kong SAR, China
- Pak Chung Sham
  - Department of Psychiatry, The Centre for Genomic Sciences, State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Hong Kong SAR, China
- Haijun Wang
  - Department of Pituitary Tumour Center, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510080, China
- Miaoxin Li
  - Zhongshan School of Medicine, Center for Precision Medicine, Sun Yat-sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (SYSU), Ministry of Education, Guangzhou, 510080, China
  - Department of Psychiatry, The Centre for Genomic Sciences, State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Hong Kong SAR, China
|
23
|
Rahmani E, Schweiger R, Rhead B, Criswell LA, Barcellos LF, Eskin E, Rosset S, Sankararaman S, Halperin E. Cell-type-specific resolution epigenetics without the need for cell sorting or single-cell biology. Nat Commun 2019; 10:3417. [PMID: 31366909 PMCID: PMC6668473 DOI: 10.1038/s41467-019-11052-9] [Citation(s) in RCA: 82] [Impact Index Per Article: 16.4] [Received: 10/24/2018] [Accepted: 06/17/2019] [Indexed: 02/07/2023]
Abstract
High costs and technical limitations of cell sorting and single-cell techniques currently restrict the collection of large-scale, cell-type-specific DNA methylation data. This, in turn, impedes our ability to tackle key biological questions that pertain to variation within a population, such as identification of disease-associated genes at a cell-type-specific resolution. Here, we show mathematically and empirically that cell-type-specific methylation levels of an individual can be learned from its tissue-level bulk data, conceptually emulating the case where the individual has been profiled with a single-cell resolution and then signals were aggregated in each cell population separately. Provided with this unprecedented way to perform powerful large-scale epigenetic studies with cell-type-specific resolution, we revisit previous studies with tissue-level bulk methylation and reveal novel associations with leukocyte composition in blood and with rheumatoid arthritis. For the latter, we further show consistency with validation data collected from sorted leukocyte sub-types.
Affiliation(s)
- Elior Rahmani
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Regev Schweiger
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 6997801, Israel
  - MyHeritage Ltd., Or Yehuda, 6037606, Israel
- Brooke Rhead
  - Computational Biology Graduate Group, University of California, Berkeley, Berkeley, CA, 94720, USA
- Lindsey A Criswell
  - Russell/Engleman Rheumatology Research Center, Department of Medicine, University of California, San Francisco, San Francisco, CA, 94143, USA
- Lisa F Barcellos
  - School of Public Health, University of California, Berkeley, Berkeley, CA, 94720, USA
- Eleazar Eskin
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Saharon Rosset
  - Department of Statistics, Tel Aviv University, Tel Aviv, 6997801, Israel
- Sriram Sankararaman
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Eran Halperin
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Anesthesiology and Perioperative Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
|
24
|
Fang L, Zhou Y, Liu S, Jiang J, Bickhart DM, Null DJ, Li B, Schroeder SG, Rosen BD, Cole JB, Van Tassell CP, Ma L, Liu GE. Comparative analyses of sperm DNA methylomes among human, mouse and cattle provide insights into epigenomic evolution and complex traits. Epigenetics 2019; 14:260-276. [PMID: 30810461 PMCID: PMC6557555 DOI: 10.1080/15592294.2019.1582217] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/26/2022]
Abstract
Sperm DNA methylation is crucial for fertility and viability of offspring but epigenome evolution in mammals is largely understudied. By comparing sperm DNA methylomes and large-scale genome-wide association study (GWAS) signals between human and cattle, we aimed to examine the DNA methylome evolution and its associations with complex phenotypes in mammals. Our analysis revealed that genes with conserved non-methylated promoters (e.g., ANKS1A and WNT7A) among human and cattle were involved in common system and embryo development, and enriched for GWAS signals of body conformation traits in both species, while genes with conserved hypermethylated promoters (e.g., TCAP and CD80) were engaged in immune responses and highlighted by immune-related traits. On the other hand, genes with human-specific hypomethylated promoters (e.g., FOXP2 and HYDIN) were engaged in neuron system development and enriched for GWAS signals of brain-related traits, while genes with cattle-specific hypomethylated promoters (e.g., LDHB and DGAT2) mainly participated in lipid storage and metabolism. We validated our findings using sperm-retained nucleosome, preimplantation transcriptome, and adult tissue transcriptome data, as well as sequence evolutionary features, including motif binding sites, mutation rates, recombination rates and evolution signatures. In conclusion, our results demonstrate important roles of epigenome evolution in shaping the genetic architecture underlying complex phenotypes, hence enhance signal prioritization in GWAS and provide valuable information for human neurological disorders and livestock genetic improvement.
Affiliation(s)
- Lingzhao Fang
  - Animal Genomics and Improvement Laboratory, BARC, Agricultural Research Service, USDA, Beltsville, MD, USA
  - Department of Animal and Avian Sciences, University of Maryland, College Park, MD, USA
- Yang Zhou
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Education Ministry of China, Huazhong Agricultural University, Wuhan, Hubei, China
- Shuli Liu
  - Animal Genomics and Improvement Laboratory, BARC, Agricultural Research Service, USDA, Beltsville, MD, USA
  - Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture & National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Jicai Jiang
  - Department of Animal and Avian Sciences, University of Maryland, College Park, MD, USA
- Derek M Bickhart
  - Dairy Forage Research Center, Agricultural Research Service, USDA, Madison, WI, USA
- Daniel J Null
  - Animal Genomics and Improvement Laboratory, BARC, Agricultural Research Service, USDA, Beltsville, MD, USA
- Bingjie Li
  - Animal Genomics and Improvement Laboratory, BARC, Agricultural Research Service, USDA, Beltsville, MD, USA
- Steven G Schroeder
  - Animal Genomics and Improvement Laboratory, BARC, Agricultural Research Service, USDA, Beltsville, MD, USA
- Benjamin D Rosen
  - Animal Genomics and Improvement Laboratory, BARC, Agricultural Research Service, USDA, Beltsville, MD, USA
- John B Cole
  - Animal Genomics and Improvement Laboratory, BARC, Agricultural Research Service, USDA, Beltsville, MD, USA
- Curtis P Van Tassell
  - Animal Genomics and Improvement Laboratory, BARC, Agricultural Research Service, USDA, Beltsville, MD, USA
- Li Ma
  - Department of Animal and Avian Sciences, University of Maryland, College Park, MD, USA
- George E Liu
  - Animal Genomics and Improvement Laboratory, BARC, Agricultural Research Service, USDA, Beltsville, MD, USA
|
25
|
Larson NB, Chen J, Schaid DJ. A review of kernel methods for genetic association studies. Genet Epidemiol 2019; 43:122-136. [PMID: 30604442 DOI: 10.1002/gepi.22180] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 08/27/2018] [Revised: 11/09/2018] [Accepted: 11/26/2018] [Indexed: 12/17/2022]
Abstract
Evaluating the association of multiple genetic variants with a trait of interest by use of kernel-based methods has made a significant impact on how genetic association analyses are conducted. An advantage of kernel methods is that they tend to be robust when the genetic variants have effects that are a mixture of positive and negative effects, as well as when there is a small fraction of causal variants. Another advantage is that kernel methods fit within the framework of mixed models, providing flexible ways to adjust for additional covariates that influence traits. Herein, we review the basic ideas behind the use of kernel methods for genetic association analysis as well as recent methodological advancements for different types of traits, multivariate traits, pedigree data, and longitudinal data. Finally, we discuss opportunities for future research.
Affiliation(s)
- Nicholas B Larson
  - Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Jun Chen
  - Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Daniel J Schaid
  - Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
|