1
|
Zhang J, Zhao H. eQTL studies: from bulk tissues to single cells. J Genet Genomics 2023; 50:925-933. [PMID: 37207929 PMCID: PMC10656365 DOI: 10.1016/j.jgg.2023.05.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/21/2023] [Revised: 05/02/2023] [Accepted: 05/04/2023] [Indexed: 05/21/2023]
Abstract
An expression quantitative trait locus (eQTL) is a chromosomal region where genetic variants are associated with the expression levels of specific genes, which can be either nearby or distant. The identification of eQTLs for different tissues, cell types, and contexts has led to a better understanding of the dynamic regulation of gene expression and of the implications of functional genes and variants for complex traits and diseases. Although most eQTL studies have been performed on data collected from bulk tissues, recent studies have demonstrated the importance of cell-type-specific and context-dependent gene regulation in biological processes and disease mechanisms. In this review, we discuss statistical methods that have been developed to enable the detection of cell-type-specific and context-dependent eQTLs from bulk tissues, purified cell types, and single cells. We also discuss the limitations of the current methods and future research opportunities.
Affiliation(s)
- Jingfei Zhang
- Information Systems and Operations Management, Emory University, Atlanta, GA 30322, USA
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT 208034, USA.
| |
|
2
|
Lee DJ, Kim Y, Dinh PTN, Chung Y, Lee D, Kim Y, Lee SH, Choi I, Lee SH. Identification of Missense Variants Affecting Carcass Traits for Hanwoo Precision Breeding. Genes (Basel) 2023; 14:1839. [PMID: 37895191 PMCID: PMC10606632 DOI: 10.3390/genes14101839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 09/20/2023] [Accepted: 09/21/2023] [Indexed: 10/29/2023]
Abstract
This study aimed to identify causal variants associated with important carcass traits such as weight and meat quality in Hanwoo cattle. We analyzed missense mutations extracted from imputed sequence data (ARS-UCD1.2) and performed an exon-specific association test on the carcass traits of 16,970 commercial Hanwoo. We found 33, 2, 1, and 3 significant SNPs associated with carcass weight (CW), backfat thickness (BFT), eye muscle area (EMA), and marbling score (MS), respectively. For CW and EMA, the most significant missense SNP was located at 19,524,263 on BTA14, in the PRKDC gene. A missense SNP in the ZFAND2B gene, located at 107,160,304 on BTA2, was associated with BFT. For MS, a missense SNP in the ACVR2B gene, located at 11,849,704 on BTA22, was identified as the most significant marker. The contribution of the most significant missense SNPs to genetic variance was confirmed to be 8.47%, 2.08%, 1.73%, and 1.19% in CW, BFT, EMA, and MS, respectively. We generated favorable and unfavorable haplotype combinations based on the significant SNPs for CW. Significant differences in GEBV (Genomic Estimated Breeding Values) were observed between the groups carrying favorable and unfavorable haplotype combinations. In particular, the missense SNPs in PRKDC, MRPL9, and ANKFN1 appear to significantly affect protein function and structure, making them strong candidates as causal mutations. These missense SNPs have the potential to serve as valuable markers for improving carcass traits in Hanwoo commercial farms.
Affiliation(s)
- Dong Jae Lee
- Division of Animal & Dairy Science, Chungnam National University, Daejeon 34134, Republic of Korea; (D.J.L.); (Y.C.); (D.L.); (S.H.L.)
| | - Yoonsik Kim
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Republic of Korea; (Y.K.); (P.T.N.D.)
| | - Phuong Thanh N. Dinh
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Republic of Korea; (Y.K.); (P.T.N.D.)
| | - Yoonji Chung
- Division of Animal & Dairy Science, Chungnam National University, Daejeon 34134, Republic of Korea; (D.J.L.); (Y.C.); (D.L.); (S.H.L.)
| | - Dooho Lee
- Division of Animal & Dairy Science, Chungnam National University, Daejeon 34134, Republic of Korea; (D.J.L.); (Y.C.); (D.L.); (S.H.L.)
| | - Yeongkuk Kim
- Quantomic Research & Solution, Daejeon 34134, Republic of Korea;
| | - Soo Hyun Lee
- Division of Animal & Dairy Science, Chungnam National University, Daejeon 34134, Republic of Korea; (D.J.L.); (Y.C.); (D.L.); (S.H.L.)
| | - Inchul Choi
- Division of Animal & Dairy Science, Chungnam National University, Daejeon 34134, Republic of Korea; (D.J.L.); (Y.C.); (D.L.); (S.H.L.)
| | - Seung Hwan Lee
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Republic of Korea; (Y.K.); (P.T.N.D.)
| |
|
3
|
Zhang J, Zhao H. eQTL Studies: from Bulk Tissues to Single Cells. ARXIV 2023:arXiv:2302.11662v1. [PMID: 36866231 PMCID: PMC9980190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
An expression quantitative trait locus (eQTL) is a chromosomal region where genetic variants are associated with the expression levels of certain genes, which can be either nearby or distant. The identification of eQTLs for different tissues, cell types, and contexts has led to a better understanding of the dynamic regulation of gene expression and of the implications of functional genes and variants for complex traits and diseases. Although most eQTL studies to date have been performed on data collected from bulk tissues, recent studies have demonstrated the importance of cell-type-specific and context-dependent gene regulation in biological processes and disease mechanisms. In this review, we discuss statistical methods that have been developed to enable the detection of cell-type-specific and context-dependent eQTLs from bulk tissues, purified cell types, and single cells. We also discuss the limitations of the current methods and future research opportunities.
Affiliation(s)
- Jingfei Zhang
- Information Systems and Operations Management, Emory University
| | - Hongyu Zhao
- Department of Biostatistics, Yale University
| |
|
4
|
Young RS, Talmane L, Marion de Procé S, Taylor MS. The contribution of evolutionarily volatile promoters to molecular phenotypes and human trait variation. Genome Biol 2022; 23:89. [PMID: 35379293 PMCID: PMC8978360 DOI: 10.1186/s13059-022-02634-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/07/2021] [Accepted: 02/16/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Promoters are sites of transcription initiation that harbour a high concentration of phenotype-associated genetic variation. The evolutionary gain and loss of promoters between species (collectively termed turnover) is pervasive across mammalian genomes and may play a prominent role in driving human phenotypic diversity. RESULTS: We classified human promoters by their evolutionary history during the divergence of mouse and human lineages from a common ancestor. This defined conserved, human-inserted and mouse-deleted promoters, and a class of functional-turnover promoters that align between species but are only active in humans. We show that promoters of all evolutionary categories are hotspots for substitution and, often, insertion mutations. Loci with a history of insertion and deletion continue that mode of evolution within contemporary humans. The presence of an evolutionarily volatile promoter within a gene is associated with increased expression variance between individuals, but only in the case of human-inserted and mouse-deleted promoters does that correspond to an enrichment of promoter-proximal genetic effects. Despite the enrichment of these molecular quantitative trait loci (QTL) at evolutionarily volatile promoters, this does not translate into a corresponding enrichment of phenotypic traits mapping to these loci. CONCLUSIONS: Promoter turnover is pervasive in the human genome, and these promoters are rich in molecularly quantifiable but phenotypically inconsequential variation in gene expression. However, since evolutionarily volatile promoters show evidence of selection, coupled with high mutation rates and enrichment of QTLs, this implicates them as a source of evolutionary innovation and phenotypic variation, albeit with a high background of selectively neutral expression variation.
Affiliation(s)
- Robert S Young
- Usher Institute, University of Edinburgh, Teviot Place, Edinburgh, EH8 9AG, UK; Zhejiang University - University of Edinburgh Institute, Zhejiang University, 718 East Haizhou Road, 314400, Haining, China; MRC Human Genetics Unit, Institute for Genetics and Cancer, University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, UK
| | - Lana Talmane
- MRC Human Genetics Unit, Institute for Genetics and Cancer, University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, UK
| | - Sophie Marion de Procé
- Usher Institute, University of Edinburgh, Teviot Place, Edinburgh, EH8 9AG, UK; MRC Human Genetics Unit, Institute for Genetics and Cancer, University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, UK
| | - Martin S Taylor
- MRC Human Genetics Unit, Institute for Genetics and Cancer, University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, UK.
| |
|
5
|
Abstract
Since the initial success of genome-wide association studies (GWAS) in 2005, tens of thousands of genetic variants have been identified for hundreds of human diseases and traits. In a GWAS, genotype information at up to millions of genetic markers is collected from up to hundreds of thousands of individuals, together with their phenotype information. Several scientific goals can be accomplished through the analysis of GWAS data, including the identification of variants, genes, and pathways associated with diseases and traits of interest; the inference of the genetic architecture of these traits; and the development of genetic risk prediction models. In this review, we provide an overview of the statistical challenges in achieving these goals and recent progress in statistical methodology to address these challenges.
Affiliation(s)
- Ning Sun
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520, USA
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520, USA
| |
|
6
|
Xu L, Gao N, Wang Z, Xu L, Liu Y, Chen Y, Xu L, Gao X, Zhang L, Gao H, Zhu B, Li J. Incorporating Genome Annotation Into Genomic Prediction for Carcass Traits in Chinese Simmental Beef Cattle. Front Genet 2020; 11:481. [PMID: 32499816 PMCID: PMC7243208 DOI: 10.3389/fgene.2020.00481] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 12/12/2019] [Accepted: 04/17/2020] [Indexed: 01/08/2023]
Abstract
Various methods have been proposed for genomic prediction (GP) in livestock. These methods have mainly focused on statistical considerations and did not include genome annotation information. In this study, to improve the predictive performance of carcass traits in Chinese Simmental beef cattle, we incorporated the genome annotation information into GP. Single nucleotide polymorphisms (SNPs) were annotated to five genomic classes: intergenic, gene, exon, protein coding sequences, and 3'/5' untranslated region. Haploblocks were constructed for all markers and these five genomic classes by defining a biologically functional unit, and haplotype effects were modeled in both numerical dosage and categorical coding strategies. The first-order epistatic effects among SNPs and haplotypes were modeled using a categorical epistasis model. For all markers, the extension from the SNP-based model to a haplotype-based model improved the accuracy by 5.4-9.8% for carcass weight (CW), live weight (LW), and striploin (SI). For the five genomic classes using the haplotype-based prediction model, the incorporation of gene class information into the model improved the accuracies by an average of 1.4, 2.1, and 1.3% for CW, LW, and SI, respectively, compared with their corresponding results for all markers. Including the first-order epistatic effects into the prediction models improved the accuracies in some traits and genomic classes. Therefore, for traits with moderate-to-high heritability, incorporating genome annotation information of gene class into haplotype-based prediction models could be considered as a promising tool for GP in Chinese Simmental beef cattle, and modeling epistasis in prediction can further increase the accuracy to some degree.
Affiliation(s)
- Ling Xu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Ning Gao
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Zezhao Wang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Lei Xu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Ying Liu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Yan Chen
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Lingyang Xu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Xue Gao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Lupei Zhang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Huijiang Gao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- National Centre of Beef Cattle Genetic Evaluation, Beijing, China
| | - Bo Zhu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- National Centre of Beef Cattle Genetic Evaluation, Beijing, China
| | - Junya Li
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- National Centre of Beef Cattle Genetic Evaluation, Beijing, China
| |
|
7
|
Srikanth K, Lee SH, Chung KY, Park JE, Jang GW, Park MR, Kim NY, Kim TH, Chai HH, Park WC, Lim D. A Gene-Set Enrichment and Protein-Protein Interaction Network-Based GWAS with Regulatory SNPs Identifies Candidate Genes and Pathways Associated with Carcass Traits in Hanwoo Cattle. Genes (Basel) 2020; 11:E316. [PMID: 32188084 PMCID: PMC7140899 DOI: 10.3390/genes11030316] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 02/12/2020] [Revised: 03/06/2020] [Accepted: 03/12/2020] [Indexed: 02/06/2023]
Abstract
Non-synonymous SNPs and protein coding SNPs within the promoter region of genes (regulatory SNPs) might have a significant effect on carcass traits. Imputed sequence level data of 10,215 Hanwoo bulls, annotated and filtered to include only regulatory SNPs (450,062 SNPs), were used in a genome-wide association study (GWAS) to identify loci associated with backfat thickness (BFT), carcass weight (CWT), eye muscle area (EMA), and marbling score (MS). A total of 15, 176, and 1 SNPs were found to be significantly associated (p < 1.11 × 10⁻⁷) with BFT, CWT, and EMA, respectively. The significant loci were BTA4 (CWT), BTA6 (CWT), BTA14 (CWT and EMA), and BTA19 (BFT). BayesR estimated that 1.1%~1.9% of the SNPs contributed to more than 0.01% of the phenotypic variance. So, the GWAS was complemented by a gene-set enrichment (GSEA) and protein-protein interaction network (PPIN) analysis in identifying the pathways affecting carcass traits. At p < 0.005 (~2,261 SNPs), 25 GO and 18 KEGG categories, including calcium signaling, cell proliferation, and folate biosynthesis, were found to be enriched through GSEA. The PPIN analysis showed enrichment for 81 candidate genes involved in various pathways, including the PI3K-AKT, calcium, and FoxO signaling pathways. Our finding provides insight into the effects of regulatory SNPs on carcass traits.
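The significance threshold quoted in this abstract looks consistent with a Bonferroni correction over the 450,062 regulatory SNPs tested. The abstract does not name the correction, so the family-wise level of 0.05 and the Bonferroni choice in this short sketch are assumptions; the function name is hypothetical.

```python
# Assumption: Bonferroni correction at a family-wise alpha of 0.05.
# The abstract reports only the resulting threshold, not how it was derived.
ALPHA = 0.05
N_SNPS = 450_062  # number of regulatory SNPs tested, as stated in the abstract


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test p-value threshold under a Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_SNPS)
print(f"{threshold:.2e}")  # prints 1.11e-07
```

Under these assumptions the computed value matches the reported cutoff to the precision given in the abstract.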
Affiliation(s)
- Krishnamoorthy Srikanth
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea (J.-E.P.); (G.-W.J.); (M.-R.P.); (N.Y.K.); (T.-H.K.); (H.-H.C.); (W.C.P.)
| | - Seung-Hwan Lee
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea;
| | - Ki-Yong Chung
- Department of Beef Science, Korea National College of Agriculture and Fisheries, Jeonju 54874, Korea;
| | - Jong-Eun Park
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea (J.-E.P.); (G.-W.J.); (M.-R.P.); (N.Y.K.); (T.-H.K.); (H.-H.C.); (W.C.P.)
| | - Gul-Won Jang
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea (J.-E.P.); (G.-W.J.); (M.-R.P.); (N.Y.K.); (T.-H.K.); (H.-H.C.); (W.C.P.)
| | - Mi-Rim Park
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea (J.-E.P.); (G.-W.J.); (M.-R.P.); (N.Y.K.); (T.-H.K.); (H.-H.C.); (W.C.P.)
| | - Na Yeon Kim
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea (J.-E.P.); (G.-W.J.); (M.-R.P.); (N.Y.K.); (T.-H.K.); (H.-H.C.); (W.C.P.)
| | - Tae-Hun Kim
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea (J.-E.P.); (G.-W.J.); (M.-R.P.); (N.Y.K.); (T.-H.K.); (H.-H.C.); (W.C.P.)
| | - Han-Ha Chai
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea (J.-E.P.); (G.-W.J.); (M.-R.P.); (N.Y.K.); (T.-H.K.); (H.-H.C.); (W.C.P.)
| | - Won Cheoul Park
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea (J.-E.P.); (G.-W.J.); (M.-R.P.); (N.Y.K.); (T.-H.K.); (H.-H.C.); (W.C.P.)
| | - Dajeong Lim
- Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea (J.-E.P.); (G.-W.J.); (M.-R.P.); (N.Y.K.); (T.-H.K.); (H.-H.C.); (W.C.P.)
| |
|
8
|
Do DN, Schenkel F, Miglior F, Zhao X, Ibeagha-Awemu EM. Targeted genotyping to identify potential functional variants associated with cholesterol content in bovine milk. Anim Genet 2020; 51:200-209. [PMID: 31913546 DOI: 10.1111/age.12901] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/22/2019] [Revised: 12/03/2019] [Accepted: 12/10/2019] [Indexed: 01/04/2023]
Abstract
High blood cholesterol concentration, mainly caused by high dietary cholesterol, is a potential risk factor for human health. Dairy products are important sources of human dietary cholesterol intake. Therefore, monitoring bovine milk cholesterol concentration is important for human health benefit. Genetic selection for improvement of cow milk cholesterol content requires understanding of the genetics of milk cholesterol. For this purpose, we performed analyses of additive and dominance effects of 126 potentially functional SNPs within 43 candidate genes with milk cholesterol content [expressed as mg of cholesterol in 100 g of fat (CHL_fat) or in 100 mg of milk (CHL_milk)]. The additive and dominance effects of SNPs rs380643365 in AGPAT1 (P = 0.04) and rs134357240 in SOAT1 (P = 0.035) genes associated significantly with CHL_fat. Moreover, five (rs109326954 and rs523413537 in DGAT1, rs109376747 in LDLR, rs42781651 in FAM198B and rs109967779 in ACAT2) and four (rs137347384 in RBM19, rs109376747 in LDLR, rs42016945 in PPARG and rs110862179 in SCAP) SNPs were significantly associated with CHL_milk (P < 0.05) based on additive and dominance effect analyses respectively. Rs109326954 and rs523413537 in DGAT1 explained a considerable portion of the phenotypic variance of CHL_milk (7.54 and 6.84% respectively), and might be useful in selection programs for reduced milk cholesterol content. Several significantly associated SNPs were in genes (such as ACAT2 and LDLR) involved in cholesterol metabolism in the liver or cholesterol transport, suggesting multiple mechanisms regulating milk cholesterol content. Nine and seven SNPs identified by additive or dominance effect analyses associated significantly with milk yield and fat yield respectively. Further analyses are required to better understand the consequences of these variants and their potential use in genomic selection of the studied traits.
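As a rough illustration of what additive and dominance effects of a single SNP mean, the sketch below uses the classical genotype-class-mean definitions: the additive effect is half the difference between the two homozygote means, and the dominance deviation is the heterozygote mean minus the homozygote midpoint. This is not the model used in the study, which the abstract does not specify; the function name and the toy data are hypothetical.

```python
from statistics import mean


def additive_dominance_effects(genotypes, phenotypes):
    """Estimate additive (a) and dominance (d) effects from genotype-class means.

    genotypes: count of the alternate allele per animal (0, 1 or 2).
    phenotypes: trait value per animal, in the same order.
    """
    by_class = {0: [], 1: [], 2: []}
    for g, y in zip(genotypes, phenotypes):
        by_class[g].append(y)
    m0, m1, m2 = (mean(by_class[g]) for g in (0, 1, 2))
    a = (m2 - m0) / 2        # half the difference between the homozygote means
    d = m1 - (m0 + m2) / 2   # heterozygote deviation from the homozygote midpoint
    return a, d


# Hypothetical toy data, not taken from the study.
genotypes = [0, 0, 1, 1, 2, 2]
phenotypes = [10.0, 12.0, 15.0, 17.0, 14.0, 16.0]
a, d = additive_dominance_effects(genotypes, phenotypes)
print(a, d)  # prints 2.0 3.0
```

A real analysis would usually fit both terms in a regression or mixed model with covariates and relatedness, and would need every genotype class to be observed; the class-mean version is only meant to make the two quantities concrete.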
Affiliation(s)
- D N Do
- Agriculture and Agri-Food Canada, Sherbrooke Research and Development Centre, Sherbrooke, QC, J1M 0C8, Canada; Department of Animal Science and Aquaculture, Dalhousie University, 58 River Road, Truro, NS, B2N 5E3, Canada
| | - F Schenkel
- Department of Animal Biosciences, Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, N1G 2W1, Canada
| | - F Miglior
- Department of Animal Biosciences, Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, N1G 2W1, Canada
| | - X Zhao
- Department of Animal Science, McGill University, Ste-Anne-de-Bellevue, Montreal, QC, H9X 3V9, Canada
| | - E M Ibeagha-Awemu
- Agriculture and Agri-Food Canada, Sherbrooke Research and Development Centre, Sherbrooke, QC, J1M 0C8, Canada
| |
|
9
|
Do DN, Bissonnette N, Lacasse P, Miglior F, Zhao X, Ibeagha-Awemu EM. A targeted genotyping approach to enhance the identification of variants for lactation persistency in dairy cows. J Anim Sci 2019; 97:4066-4075. [PMID: 31581300 DOI: 10.1093/jas/skz279] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/30/2018] [Accepted: 09/13/2019] [Indexed: 12/19/2022]
Abstract
Lactation persistency (LP), defined as the ability of a cow to maintain milk production at a high level after milk peak, is an important phenotype for the dairy industry. In this study, we used a targeted genotyping approach to scan for potentially functional single nucleotide polymorphisms (SNPs) within 57 potential candidate genes derived from our previous genome wide association study on LP and from the literature. A total of 175,490 SNPs were annotated within 10-kb flanking regions of the selected candidate genes. After applying several filtering steps, a total of 105 SNPs were retained for genotyping using target genotyping arrays. SNP association analyses were performed in 1,231 Holstein cows with 69 polymorphic SNPs using a univariate linear mixed model with polygenic effects in the DMU package. Six SNPs including rs43770847, rs208794152, and rs208332214 in ADRM1; rs209443540 in C5orf34; rs378943586 in DDX11; and rs385640152 in GHR were suggestively significantly associated with LP based on additive effects and associations with 4 of them (rs43770847, rs208794152, rs208332214, and rs209443540) were based on dominance effects at P < 0.05. However, none of the associations remained significant at false discovery rate adjusted P (FDR) < 0.05. The additive variances explained by each suggestively significantly associated SNP ranged from 0.15% (rs43770847 in ADRM1) to 5.69% (rs209443540 in C5orf34), suggesting that these SNPs might be used in genetic selection for enhanced LP. The percentage of phenotypic variance explained by dominance effect ranged from 0.24% to 1.35% which suggests that genetic selection for enhanced LP might be more efficient by inclusion of dominance effects. Overall, this study identified several potentially functional variants that might be useful for selection programs for higher LP. Finally, a combination of identification of potentially functional variants followed by targeted genotyping and association analysis is a cost-effective approach for increasing the power of genetic association studies.
Affiliation(s)
- Duy Ngoc Do
- Agriculture and Agri-Food Canada, Sherbrooke Research and Development Centre, Sherbrooke, QC, Canada; Department of Animal Science and Aquaculture, Dalhousie University, Truro, Canada
| | - Nathalie Bissonnette
- Agriculture and Agri-Food Canada, Sherbrooke Research and Development Centre, Sherbrooke, QC, Canada
| | - Pierre Lacasse
- Agriculture and Agri-Food Canada, Sherbrooke Research and Development Centre, Sherbrooke, QC, Canada
| | - Filippo Miglior
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, Canada
| | - Xin Zhao
- Department of Animal Science, McGill University, Ste-Anne-de-Bellevue, QC, Canada
| | - Eveline M Ibeagha-Awemu
- Agriculture and Agri-Food Canada, Sherbrooke Research and Development Centre, Sherbrooke, QC, Canada
| |
|
10
|
Halachev M, Meynert A, Taylor MS, Vitart V, Kerr SM, Klaric L, Aitman TJ, Haley CS, Prendergast JG, Pugh C, Hume DA, Harris SE, Liewald DC, Deary IJ, Semple CA, Wilson JF. Increased ultra-rare variant load in an isolated Scottish population impacts exonic and regulatory regions. PLoS Genet 2019; 15:e1008480. [PMID: 31765389 PMCID: PMC6901239 DOI: 10.1371/journal.pgen.1008480] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 06/12/2019] [Revised: 12/09/2019] [Accepted: 10/15/2019] [Indexed: 01/03/2023]
Abstract
Human population isolates provide a snapshot of the impact of historical demographic processes on population genetics. Such data facilitate studies of the functional impact of rare sequence variants on biomedical phenotypes, as strong genetic drift can result in higher frequencies of variants that are otherwise rare. We present the first whole genome sequencing (WGS) study of the VIKING cohort, a representative collection of samples from the isolated Shetland population in northern Scotland, and explore how its genetic characteristics compare to a mainland Scottish population. Our analyses reveal the strong contributions played by the founder effect and genetic drift in shaping genomic variation in the VIKING cohort. About one tenth of all high-quality variants discovered are unique to the VIKING cohort or are seen at frequencies at least ten fold higher than in more cosmopolitan control populations. Multiple lines of evidence also suggest relaxation of purifying selection during the evolutionary history of the Shetland isolate. We demonstrate enrichment of ultra-rare VIKING variants in exonic regions and for the first time we also show that ultra-rare variants are enriched within regulatory regions, particularly promoters, suggesting that gene expression patterns may diverge relatively rapidly in human isolates.
Affiliation(s)
- Mihail Halachev
- MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Crewe Road, Edinburgh, United Kingdom
| | - Alison Meynert
- MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Crewe Road, Edinburgh, United Kingdom
| | - Martin S. Taylor
- MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Crewe Road, Edinburgh, United Kingdom
| | - Veronique Vitart
- MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Crewe Road, Edinburgh, United Kingdom
| | - Shona M. Kerr
- MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Crewe Road, Edinburgh, United Kingdom
| | - Lucija Klaric
- MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Crewe Road, Edinburgh, United Kingdom
| | - Timothy J. Aitman
- Centre for Genomic and Experimental Medicine, MRC IGMM, University of Edinburgh, Crewe Road, Edinburgh, United Kingdom
| | - Chris S. Haley
- MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Crewe Road, Edinburgh, United Kingdom
- The Roslin Institute, University of Edinburgh, Easter Bush, Midlothian, United Kingdom
| | - James G. Prendergast
- The Roslin Institute, University of Edinburgh, Easter Bush, Midlothian, United Kingdom
| | - Carys Pugh
- Centre for Clinical Brain Sciences, Division of Psychiatry, University of Edinburgh, Royal Edinburgh Hospital, Edinburgh, United Kingdom
| | - David A. Hume
- Mater Research Institute, University of Queensland, Woolloongabba, Australia
| | - Sarah E. Harris
- Centre for Cognitive Ageing and Cognitive Epidemiology, Department of Psychology, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, George Square, Edinburgh, United Kingdom
| | - David C. Liewald
- Centre for Cognitive Ageing and Cognitive Epidemiology, Department of Psychology, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, George Square, Edinburgh, United Kingdom
| | - Ian J. Deary
- Centre for Cognitive Ageing and Cognitive Epidemiology, Department of Psychology, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, George Square, Edinburgh, United Kingdom
| | - Colin A. Semple
- MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Crewe Road, Edinburgh, United Kingdom
| | - James F. Wilson
- MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Crewe Road, Edinburgh, United Kingdom
- Centre for Global Health Research, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Teviot Place, Edinburgh, United Kingdom
| |
|
11
|
Dozmorov MG. Disease classification: from phenotypic similarity to integrative genomics and beyond. Brief Bioinform 2019; 20:1769-1780. [DOI: 10.1093/bib/bby049] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 03/21/2018] [Revised: 05/01/2018] [Indexed: 02/06/2023]
Abstract
A fundamental challenge of modern biomedical research is understanding how diseases that are similar on the phenotypic level are similar on the molecular level. Integration of various genomic data sets with the traditionally used phenotypic disease similarity revealed novel genetic and molecular mechanisms and blurred the distinction between monogenic (Mendelian) and complex diseases. Network-based medicine has emerged as a complementary approach for identifying disease-causing genes, genetic mediators, disruptions in the underlying cellular functions and for drug repositioning. The recent development of machine and deep learning methods allows for leveraging real-life information about diseases to refine genetic and phenotypic disease relationships. This review describes the historical development and recent methodological advancements for studying disease classification (nosology).
Affiliation(s)
- Mikhail G Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, 830 East Main Street, Richmond, VA, USA
|
12
|
Kanduri C, Bock C, Gundersen S, Hovig E, Sandve GK. Colocalization analyses of genomic elements: approaches, recommendations and challenges. Bioinformatics 2019; 35:1615-1624. [PMID: 30307532 PMCID: PMC6499241 DOI: 10.1093/bioinformatics/bty835] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 05/22/2018] [Revised: 09/03/2018] [Accepted: 10/10/2018] [Indexed: 12/23/2022]
Abstract
MOTIVATION Many high-throughput methods produce sets of genomic regions as one of their main outputs. Scientists often use genomic colocalization analysis to interpret such region sets, for example to identify interesting enrichments and to understand the interplay between the underlying biological processes. Although widely used, there is little standardization in how these analyses are performed. Different practices can substantially affect the conclusions of colocalization analyses. RESULTS Here, we describe the different approaches and provide recommendations for performing genomic colocalization analysis, while also discussing common methodological challenges that may influence the conclusions. As illustrated by concrete example cases, careful attention to analysis details is needed in order to meet these challenges and to obtain a robust and biologically meaningful interpretation of genomic region set data. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chakravarthi Kanduri
- Department of Informatics, University of Oslo, Oslo, Norway
- K. G. Jebsen Coeliac Disease Research Centre, Oslo, Norway
| | - Christoph Bock
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Laboratory Medicine, Medical University of Vienna, Vienna, Austria
- Max Planck Institute for Informatics, Saarbrücken, Germany
| | - Sveinung Gundersen
- Department of Informatics, University of Oslo, Oslo, Norway
- Elixir Norway, Oslo Node, University of Oslo, Oslo, Norway
| | - Eivind Hovig
- Department of Informatics, University of Oslo, Oslo, Norway
- Elixir Norway, Oslo Node, University of Oslo, Oslo, Norway
- Department of Tumor Biology, Institute for Cancer Research, Oslo, Norway
- Institute for Cancer Genetics and Informatics, The Norwegian Radium Hospital, Oslo, Norway
| | - Geir Kjetil Sandve
- Department of Informatics, University of Oslo, Oslo, Norway
- K. G. Jebsen Coeliac Disease Research Centre, Oslo, Norway
|
13
|
O'Mara TA, Glubb DM, Kho PF, Thompson DJ, Spurdle AB. Genome-Wide Association Studies of Endometrial Cancer: Latest Developments and Future Directions. Cancer Epidemiol Biomarkers Prev 2019; 28:1095-1102. [DOI: 10.1158/1055-9965.epi-18-1031] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 09/18/2018] [Revised: 11/29/2018] [Accepted: 04/19/2019] [Indexed: 11/16/2022]
|
14
|
Liu L, Sanderford MD, Patel R, Chandrashekar P, Gibson G, Kumar S. Biological relevance of computationally predicted pathogenicity of noncoding variants. Nat Commun 2019; 10:330. [PMID: 30659175 PMCID: PMC6338804 DOI: 10.1038/s41467-018-08270-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 04/19/2018] [Accepted: 12/19/2018] [Indexed: 11/15/2022]
Abstract
Computational prediction of the phenotypic propensities of noncoding single nucleotide variants typically combines annotation of genomic, functional and evolutionary attributes into a single score. Here, we evaluate if the claimed excellent accuracies of these predictions translate into high rates of success in addressing questions important in biological research, such as fine mapping causal variants, distinguishing pathogenic allele(s) at a given position, and prioritizing variants for genetic risk assessment. A significant disconnect is found to exist between the statistical modelling and biological performance of predictive approaches. We discuss fundamental reasons underlying these deficiencies and suggest that future improvements of computational predictions need to address confounding of allelic, positional and regional effects as well as imbalance of the proportion of true positive variants in candidate lists. Researchers can make use of a variety of computational tools to prioritize genetic variants and predict their pathogenicity. Here, the authors evaluate the performance of six of these tools in three typical biological tasks and find generally low concordance of predictions and experimental confirmation.
Affiliation(s)
- Li Liu
- College of Health Solutions, Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Maxwell D Sanderford
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
| | - Ravi Patel
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA.,Department of Biology, Temple University, Philadelphia, PA, USA
| | - Pramod Chandrashekar
- College of Health Solutions, Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Greg Gibson
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA.
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA.,Department of Biology, Temple University, Philadelphia, PA, USA
|
15
|
Abstract
OBJECTIVES Glucocorticoids such as dexamethasone have pleiotropic effects, including desired antileukemic, anti-inflammatory, or immunosuppressive effects, and undesired metabolic or toxic effects. The most serious adverse effects of dexamethasone among patients with acute lymphoblastic leukemia are osteonecrosis and thrombosis. To identify inherited genomic variation involved in these severe adverse effects, we carried out genome-wide association studies (GWAS) by analyzing 14 pleiotropic glucocorticoid phenotypes in 391 patients with acute lymphoblastic leukemia. PATIENTS AND METHODS We used the Projection Onto the Most Interesting Statistical Evidence integrative analysis technique to identify genetic variants associated with pleiotropic dexamethasone phenotypes, stratifying for age, sex, race, and treatment, and compared the results with conventional single-phenotype GWAS. The phenotypes were osteonecrosis, central nervous system toxicity, hyperglycemia, hypokalemia, thrombosis, dexamethasone exposure, BMI, growth trajectory, and levels of cortisol, albumin, and asparaginase antibodies, and changes in cholesterol, triglycerides, and low-density lipoproteins after dexamethasone. RESULTS The integrative analysis identified more pleiotropic single nucleotide polymorphism variants (P = 1.46 × 10^-215), and these variants were more likely to be in gene-regulatory regions (P = 1.22 × 10^-6), than traditional single-phenotype GWAS. The integrative analysis yielded genomic variants (rs2243057 and rs6453253) in F2RL1, a receptor that functions in hemostasis, thrombosis, and inflammation, which were associated with pleiotropic effects, including osteonecrosis and thrombosis, and were in regulatory gene regions. CONCLUSION The integrative pleiotropic analysis identified risk variants for osteonecrosis and thrombosis not identified by single-phenotype analysis that may have importance for patients with underlying sensitivity to multiple dexamethasone adverse effects.
|
16
|
Bhuiyan MSA, Lim D, Park M, Lee S, Kim Y, Gondro C, Park B, Lee S. Functional Partitioning of Genomic Variance and Genome-Wide Association Study for Carcass Traits in Korean Hanwoo Cattle Using Imputed Sequence Level SNP Data. Front Genet 2018; 9:217. [PMID: 29988410 PMCID: PMC6024024 DOI: 10.3389/fgene.2018.00217] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 01/30/2018] [Accepted: 05/28/2018] [Indexed: 11/25/2022]
Abstract
Quantitative traits are usually controlled by numerous genomic variants with small individual effects, and variances associated with those traits are explained in a continuous manner. However, the relative contributions of genomic regions to observed genetic variations have not been well explored using sequence level single nucleotide polymorphism (SNP) information. Here, imputed sequence level SNP data (11,278,153 SNPs) of 2109 Hanwoo steers (Korean native cattle) were partitioned according to functional annotation, chromosome, and minor allele frequency (MAF). Genomic relationship matrices (GRMs) were constructed for each classified region and fitted in the model both separately and together for carcass weight (CWT), eye muscle area (EMA), backfat thickness (BFT), and marbling score (MS) traits. A genome-wide association study (GWAS) was performed to identify significantly associated variants in genic and exon regions using a linear mixed model, and the genetic contribution of each exonic SNP was determined using a Bayesian mixture model. Considering all SNPs together, the heritability estimates for CWT, EMA, BFT, and MS were 0.57 ± 0.05, 0.46 ± 0.05, 0.45 ± 0.05, and 0.49 ± 0.05, respectively, which reflected substantial genomic contributions. Joint analysis revealed that the variance explained by each chromosome was proportional to its physical length with weak linear relationships for all traits. Moreover, genomic variances explained by functional category and MAF class differed greatly among the traits studied in joint analysis. For example, exon regions had larger contributions for BFT (0.13 ± 0.08) and MS (0.22 ± 0.08), whereas intron and intergenic regions explained most of the total genomic variances for CWT and EMA (0.22 ± 0.09–0.32 ± 0.11). Considering different functional classes of exon regions and the per SNP contribution revealed the largest proportion of genetic variance was attributable to synonymous variants. 
GWAS detected 206 and 27 SNPs in genic and exon regions, respectively, on BTA4, BTA6, and BTA14 that were significantly associated with CWT and EMA. These SNPs were harbored by 31 candidate genes, among which TOX, FAM184B, PPARGC1A, PRKDC, LCORL, and COL1A2 were noteworthy. BayesR analysis found that most SNPs (>93%) had very small effects and the 4.02–6.92% that had larger effects (10^-4 × σ_A^2, 10^-3 × σ_A^2, and 10^-2 × σ_A^2) explained most of the total genetic variance, confirming polygenic components of the traits studied.
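The partitioning step described in this abstract (one genomic relationship matrix per SNP class) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes the widely used scaling G = ZZ'/(2 Σ p(1 − p)) with genotypes centered on observed allele frequencies, which the abstract does not specify, and the function names `grm` and `partitioned_grms`, the class labels, and the toy genotypes are all hypothetical.

```python
def grm(genotypes):
    """Genomic relationship matrix from 0/1/2 allele counts.

    genotypes: list of individuals, each a list of allele counts per SNP.
    Assumed scaling: G = Z Z' / (2 * sum_j p_j * (1 - p_j)).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Observed allele frequency of each SNP in this sample.
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    # Center each genotype on twice the allele frequency.
    z = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    return [
        [sum(z[i][k] * z[j][k] for k in range(m)) / scale for j in range(n)]
        for i in range(n)
    ]


def partitioned_grms(genotypes, snp_class):
    """Build one GRM per annotation class (e.g. exon, intron)."""
    out = {}
    for c in sorted(set(snp_class)):
        cols = [j for j, s in enumerate(snp_class) if s == c]
        out[c] = grm([[ind[j] for j in cols] for ind in genotypes])
    return out


# Toy data: 4 individuals x 4 SNPs (hypothetical values).
geno = [[0, 1, 2, 1], [1, 1, 0, 2], [2, 0, 1, 1], [1, 2, 1, 0]]
classes = ["exon", "exon", "intron", "intron"]
grms = partitioned_grms(geno, classes)
```

Each matrix in `grms` would then enter the mixed model as a separate random effect, either alone or jointly with the others; the variance-component estimation itself is not shown here.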
Affiliation(s)
- Mohammad S A Bhuiyan
- Department of Animal Science and Biotechnology, Chungnam National University, Daejeon, South Korea.,Department of Animal Breeding and Genetics, Bangladesh Agricultural University, Mymensingh, Bangladesh
| | - Dajeong Lim
- Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju, South Korea
| | - Mina Park
- Animal Genetic Improvement Division, National Institute of Animal Science, Rural Development Administration, Seonghwan, South Korea
| | - Soohyun Lee
- Department of Animal Science and Biotechnology, Chungnam National University, Daejeon, South Korea
| | - Yeongkuk Kim
- Department of Animal Science and Biotechnology, Chungnam National University, Daejeon, South Korea
| | - Cedric Gondro
- College of Agriculture and Natural Resources, Michigan State University, East Lansing, MI, United States
| | - Byoungho Park
- Animal Genetic Improvement Division, National Institute of Animal Science, Rural Development Administration, Seonghwan, South Korea
| | - Seunghwan Lee
- Department of Animal Science and Biotechnology, Chungnam National University, Daejeon, South Korea
|
17
|
Koufariotis LT, Chen YPP, Stothard P, Hayes BJ. Variance explained by whole genome sequence variants in coding and regulatory genome annotations for six dairy traits. BMC Genomics 2018; 19:237. [PMID: 29618315 PMCID: PMC5885354 DOI: 10.1186/s12864-018-4617-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 06/07/2017] [Accepted: 03/22/2018] [Indexed: 02/03/2023]
Abstract
Background There are an exceedingly large number of sequence variants discovered through whole genome sequencing in most populations, including cattle. Deciphering which of these affect complex traits is a major challenge. In this study we hypothesize that variants in some functional classes, such as splice site regions, coding regions, DNA methylated regions and long noncoding RNA, will explain more variance in complex traits than others. Two variance component approaches were used to test this hypothesis: the first determines whether variants in a functional class capture a greater proportion of the variance than expected by chance; the second uses the proportion of variance explained when variants in all annotations are fitted simultaneously. Results Our data set consisted of 28.3 million imputed whole genome sequence variants in 16,581 dairy cattle with records for 6 complex trait phenotypes, including production and fertility. We found that sequence variants in splice site regions and synonymous classes captured the greatest proportion of the variance, explaining up to 50% of the variance across all traits. We also found that sequence variants in target sites for DNA methylation (genomic regions found to be highly methylated in bovine placentas) captured a significant proportion of the variance. Per sequence variant, splice site variants explain the highest proportion of variance in this study. The proportion of variance captured by the missense predicted deleterious (from SIFT) and missense tolerated classes was relatively small. Conclusion The results demonstrate that using functional annotations to filter whole genome sequence variants into more informative subsets could be useful for prioritization of the variants that are more likely to be associated with complex traits.
In addition to variants found in splice sites and protein coding genes, regulatory variants and those found in DNA methylated regions explained considerable variation in milk production and fertility traits. In our analysis synonymous variants captured a significant proportion of the variance, which raises the possible explanation that synonymous mutations might have some effects, or more likely that these variants are misannotated, or alternatively that the results reflect imperfect imputation of the actual causative variants. Electronic supplementary material The online version of this article (10.1186/s12864-018-4617-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Lambros T Koufariotis
- Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, The University of Queensland, Building 80, 306 Carmody Road, Brisbane, St Lucia, QLD, 4072, Australia.,College of Science, Health and Engineering, La Trobe University, Melbourne, VIC, 3086, Australia.,Department of Economic Development, Jobs, Transport and Resources, AgriBio Building, 5 Ring Road, Bundoora, VIC, 3086, Australia.,Dairy Bio, 5 Ring Road, Bundoora, VIC, 3086, Australia
| | - Yi-Ping Phoebe Chen
- College of Science, Health and Engineering, La Trobe University, Melbourne, VIC, 3086, Australia
| | - Paul Stothard
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2C8, Canada
| | - Ben J Hayes
- Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, The University of Queensland, Building 80, 306 Carmody Road, Brisbane, St Lucia, QLD, 4072, Australia.,Department of Economic Development, Jobs, Transport and Resources, AgriBio Building, 5 Ring Road, Bundoora, VIC, 3086, Australia.,Dairy Bio, 5 Ring Road, Bundoora, VIC, 3086, Australia
|
18
|
Savova V, Vinogradova S, Pruss D, Gimelbrant AA, Weiss LA. Risk alleles of genes with monoallelic expression are enriched in gain-of-function variants and depleted in loss-of-function variants for neurodevelopmental disorders. Mol Psychiatry 2017; 22:1785-1794. [PMID: 28265118 PMCID: PMC5589474 DOI: 10.1038/mp.2017.13] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 04/08/2016] [Revised: 12/01/2016] [Accepted: 01/09/2017] [Indexed: 02/06/2023]
Abstract
Over 3000 human genes can be expressed from a single allele in one cell, and from the other allele, or both, in neighboring cells. Little is known about the consequences of this epigenetic phenomenon, monoallelic expression (MAE). We hypothesized that MAE increases expression variability, with a potential impact on human disease. Here, we use a chromatin signature to infer MAE for genes in lymphoblastoid cell lines and human fetal brain tissue. We confirm that across clones MAE status correlates with expression level, and that in human tissue data sets, MAE genes show increased expression variability. We then compare mono- and biallelic genes at three distinct scales. In the human population, we observe that genes with polymorphisms influencing expression variance are more likely to be MAE (P < 1.1 × 10^-6). At the trans-species level, we find gene expression differences and directional selection between humans and chimpanzees more common among MAE genes (P < 0.05). Extending to human disease, we show that MAE genes are under-represented in neurodevelopmental copy number variants (CNVs) (P < 2.2 × 10^-10), suggesting that pathogenic variants acting via expression level are less likely to involve MAE genes. Using neuropsychiatric single-nucleotide polymorphism (SNP) and single-nucleotide variant (SNV) data, we see that genes with pathogenic expression-altering or loss-of-function variants are less likely MAE (P < 7.5 × 10^-11) and genes with only missense or gain-of-function variants are more likely MAE (P < 1.4 × 10^-6). Together, our results suggest that MAE genes tolerate a greater range of expression level than biallelic expression (BAE) genes, and this information may be useful in prediction of pathogenicity.
Affiliation(s)
- V Savova
- Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - S Vinogradova
- Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - D Pruss
- Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - A A Gimelbrant
- Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - L A Weiss
- Department of Psychiatry and Institute for Human Genetics, University of California San Francisco, Langley Porter Psychiatric Institute, Nina Ireland Lab, San Francisco, CA, USA
|
19
|
Leveraging genome characteristics to improve gene discovery for putamen subcortical brain structure. Sci Rep 2017; 7:15736. [PMID: 29147026 PMCID: PMC5691156 DOI: 10.1038/s41598-017-15705-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 07/24/2017] [Accepted: 10/31/2017] [Indexed: 12/21/2022]
Abstract
Discovering genetic variants associated with human brain structures is an on-going effort. The ENIGMA consortium conducted genome-wide association studies (GWAS) with standard multi-study analytical methodology and identified several significant single nucleotide polymorphisms (SNPs). Here we employ a novel analytical approach that incorporates functional genome annotations (e.g., exon or 5′UTR), total linkage disequilibrium (LD) scores and heterozygosity to construct enrichment scores for improved identification of relevant SNPs. The method provides increased power to detect associated SNPs by estimating stratum-specific false discovery rate (FDR), where strata are classified according to enrichment scores. Applying this approach to the GWAS summary statistics of putamen volume in the ENIGMA cohort, a total of 15 independent significant SNPs were identified (conditional FDR < 0.05). In contrast, 4 SNPs were found based on standard GWAS analysis (P < 5 × 10^-8). The 11 novel loci include GATAD2B, ASCC3, DSCAML1, and HELZ, which have previously been implicated in various neural-related phenotypes. The current findings demonstrate the boost in power with the annotation-informed FDR method, and provide insight into the genetic architecture of the putamen.
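The idea of stratum-specific FDR control mentioned in this abstract can be illustrated with a small sketch. Note the substitution: the code below simply applies the standard Benjamini-Hochberg adjustment separately within each stratum, which is not the enrichment-score-based estimator the abstract describes; the function names, stratum labels, and p-values are hypothetical.

```python
from collections import defaultdict


def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def stratified_fdr(pvals, strata):
    """Apply BH separately within each stratum (e.g. enrichment-score bins)."""
    groups = defaultdict(list)
    for idx, s in enumerate(strata):
        groups[s].append(idx)
    q = [None] * len(pvals)
    for idxs in groups.values():
        for i, qv in zip(idxs, bh_qvalues([pvals[i] for i in idxs])):
            q[i] = qv
    return q


# Hypothetical SNP p-values, binned into "high" and "low" enrichment strata.
pvals = [0.001, 0.004, 0.03, 0.2, 0.5, 0.9]
strata = ["high", "high", "high", "low", "low", "low"]
pooled = bh_qvalues(pvals)
stratified = stratified_fdr(pvals, strata)
```

In this toy input the third SNP has a pooled adjusted value of about 0.06 but a within-stratum value of about 0.03, which shows how concentrating signal in an enriched stratum can move a SNP below a 0.05 threshold.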
|
20
|
Markunas CA, Johnson EO, Hancock DB. Comprehensive evaluation of disease- and trait-specific enrichment for eight functional elements among GWAS-identified variants. Hum Genet 2017; 136:911-919. [PMID: 28567521 DOI: 10.1007/s00439-017-1815-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/18/2017] [Accepted: 05/22/2017] [Indexed: 01/17/2023]
Abstract
Genome-wide association study (GWAS)-identified variants are enriched for functional elements. However, we have limited knowledge of how functional enrichment may differ by disease/trait and tissue type. We tested a broad set of eight functional elements for enrichment among GWAS-identified SNPs (p < 5 × 10^-8) from the NHGRI-EBI Catalog across seven disease/trait categories: cancer, cardiovascular disease, diabetes, autoimmune disease, psychiatric disease, neurological disease, and anthropometric traits. SNPs were annotated using HaploReg for the eight functional elements across any tissue: DNase sites, expression quantitative trait loci (eQTL), sequence conservation, enhancers, promoters, missense variants, sequence motifs, and protein binding sites. In addition, tissue-specific annotations were considered for brain vs. blood. Disease/trait SNPs were compared to a control set of 4809 SNPs matched to the GWAS SNPs (N = 1639) on allele frequency, gene density, distance to nearest gene, and linkage disequilibrium at ~3:1 ratio. Enrichment analyses were conducted using logistic regression, with Bonferroni correction. Overall, a significant enrichment was observed for all functional elements, except sequence motifs. Missense SNPs showed the strongest magnitude of enrichment. eQTLs were the only functional element significantly enriched across all diseases/traits. Magnitudes of enrichment were generally similar across diseases/traits, where enrichment was statistically significant. Blood vs. brain tissue effects on enrichment were dependent on disease/trait and functional element (e.g., cardiovascular disease: eQTLs P_TissueDifference = 1.28 × 10^-6 vs. enhancers P_TissueDifference = 0.94). Identifying disease/trait-relevant functional elements and tissue types could provide new insight into the underlying biology, by guiding a priori GWAS analyses (e.g., brain enhancer elements for psychiatric disease) or facilitating post hoc interpretation.
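A single enrichment test of the kind described here can be sketched as below. For one binary annotation and no covariates, the logistic-regression odds ratio equals the 2×2 table odds ratio, with a Wald standard error of sqrt(1/a + 1/b + 1/c + 1/d), so no fitting library is needed. The group sizes (1639 GWAS SNPs, 4809 matched controls) and the eight tested elements come from the abstract; the annotated counts (400 and 800) and the function name are hypothetical, and any covariate adjustment the study may have used is not reproduced.

```python
import math


def enrichment_test(case_with, case_total, ctrl_with, ctrl_total):
    """Odds ratio and two-sided Wald p-value for one binary annotation.

    Equivalent to an unadjusted logistic regression of GWAS-vs-control
    status on annotation presence. Assumes no cell of the 2x2 table is zero.
    """
    a, b = case_with, case_total - case_with
    c, d = ctrl_with, ctrl_total - ctrl_with
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    # Two-sided normal tail probability.
    p = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(log_or), p


n_elements = 8                 # functional elements tested (from the abstract)
alpha = 0.05 / n_elements      # Bonferroni-corrected threshold
odds_ratio, p_value = enrichment_test(400, 1639, 800, 4809)
significant = p_value < alpha
```

With these made-up counts the odds ratio is roughly 1.6 and the test passes the Bonferroni threshold; real analyses would repeat the call once per functional element and disease/trait category.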
Affiliation(s)
- Christina A Markunas
- Behavioral Health and Criminal Justice Division, RTI International, Research Triangle Park, NC, USA.
| | - Eric O Johnson
- Behavioral Health and Criminal Justice Division, RTI International, Research Triangle Park, NC, USA.,Fellow Program, RTI International, Research Triangle Park, NC, USA
| | - Dana B Hancock
- Behavioral Health and Criminal Justice Division, RTI International, Research Triangle Park, NC, USA
|
21
|
Bastami M, Nariman-Saleh-Fam Z, Saadatian Z, Nariman-Saleh-Fam L, Omrani MD, Ghaderian SMH, Masotti A. The miRNA targetome of coronary artery disease is perturbed by functional polymorphisms identified and prioritized by in-depth bioinformatics analyses exploiting genome-wide association studies. Gene 2016; 594:74-81. [DOI: 10.1016/j.gene.2016.08.054] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/17/2016] [Revised: 08/27/2016] [Accepted: 08/31/2016] [Indexed: 12/22/2022]
|
22
|
Do DN, Janss LLG, Jensen J, Kadarmideen HN. SNP annotation-based whole genomic prediction and selection: an application to feed efficiency and its component traits in pigs. J Anim Sci 2016; 93:2056-63. [PMID: 26020301 DOI: 10.2527/jas.2014-8640] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Indexed: 12/23/2022]
Abstract
The study investigated genetic architecture and predictive ability using genomic annotation of residual feed intake (RFI) and its component traits (daily feed intake [DFI], ADG, and back fat [BF]). A total of 1,272 Duroc pigs had both genotypic and phenotypic records, and the records were split into a training (968 pigs) and a validation dataset (304 pigs) by assigning records as before and after January 1, 2012, respectively. SNP were annotated by 14 different classes using Ensembl variant effect prediction. Predictive accuracy and prediction bias were calculated using Bayesian Power LASSO, Bayesian A, B, and Cπ, and genomic BLUP (GBLUP) methods. Predictive accuracy ranged from 0.508 to 0.531, 0.506 to 0.532, 0.276 to 0.357, and 0.308 to 0.362 for DFI, RFI, ADG, and BF, respectively. BayesCπ100.1 increased accuracy slightly compared to the GBLUP model and other methods. The contribution per SNP to total genomic variance was similar among annotated classes across different traits. Predictive performance of SNP classes did not significantly differ from randomized SNP groups. Genomic prediction has accuracy comparable to observed phenotype, and use of genomic prediction can be cost effective by replacing feed intake measurement. Genomic annotation had less impact on predictive accuracy for the traits considered here, but this may be different for other traits. It is the first study to provide useful insights into biological classes of SNP driving the whole genomic prediction for complex traits in pigs.
|
23
|
Abdollahi-Arpanahi R, Morota G, Valente BD, Kranis A, Rosa GJM, Gianola D. Differential contribution of genomic regions to marked genetic variation and prediction of quantitative traits in broiler chickens. Genet Sel Evol 2016; 48:10. [PMID: 26842494 PMCID: PMC4739338 DOI: 10.1186/s12711-016-0187-z] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 03/16/2015] [Accepted: 01/15/2016] [Indexed: 11/15/2022]
Abstract
Background Genome-wide association studies in humans have found enrichment of trait-associated single nucleotide polymorphisms (SNPs) in coding regions of the genome and depletion of these in intergenic regions. However, a recent release of the ENCyclopedia of DNA elements showed that ~80 % of the human genome has a biochemical function. Similar studies on the chicken genome are lacking, thus assessing the relative contribution of its genic and non-genic regions to variation is relevant for biological studies and genetic improvement of chicken populations. Methods A dataset including 1351 birds that were genotyped with the 600K Affymetrix platform was used. We partitioned SNPs according to genome annotation data into six classes to characterize the relative contribution of genic and non-genic regions to genetic variation as well as their predictive power using all available quality-filtered SNPs. Target traits were body weight, ultrasound measurement of breast muscle and hen house egg production in broiler chickens. Six genomic regions were considered: intergenic regions, introns, missense, synonymous, 5′ and 3′ untranslated regions, and regions that are located 5 kb upstream and downstream of coding genes. Genomic relationship matrices were constructed for each genomic region and fitted in the models, separately or simultaneously. Kernel-based ridge regression was used to estimate variance components and assess predictive ability. Contribution of each class of genomic regions to dominance variance was also considered. Results Variance component estimates indicated that all genomic regions contributed to marked additive genetic variation and that the class of synonymous regions tended to have the greatest contribution. The marked dominance genetic variation explained by each class of genomic regions was similar and negligible (~0.05). In terms of prediction mean-square error, the whole-genome approach showed the best predictive ability. 
Conclusions All genic and non-genic regions contributed to phenotypic variation for the three traits studied. Overall, the contribution of additive genetic variance to the total genetic variance was much greater than that of dominance variance. Our results show that all genomic regions are important for the prediction of the targeted traits, and the whole-genome approach was reaffirmed as the best tool for genome-enabled prediction of quantitative traits.
Affiliation(s)
- Rostam Abdollahi-Arpanahi
- Department of Animal Sciences, University of Wisconsin, Madison, WI, USA.,Department of Animal and Poultry Science, College of Aburaihan, University of Tehran, Pakdasht, Iran
| | - Gota Morota
- Department of Animal Science, University of Nebraska, Lincoln, NE, USA
| | - Bruno D Valente
- Department of Animal Sciences, University of Wisconsin, Madison, WI, USA.,Department of Dairy Science, University of Wisconsin, Madison, WI, USA
| | - Andreas Kranis
- Aviagen Ltd, Midlothian, UK.,The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, UK
| | - Guilherme J M Rosa
- Department of Animal Sciences, University of Wisconsin, Madison, WI, USA.,Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
| | - Daniel Gianola
- Department of Animal Sciences, University of Wisconsin, Madison, WI, USA.,Department of Dairy Science, University of Wisconsin, Madison, WI, USA.,Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
|
24
|
Wang Q, Yang C, Gelernter J, Zhao H. Pervasive pleiotropy between psychiatric disorders and immune disorders revealed by integrative analysis of multiple GWAS. Hum Genet 2015; 134:1195-209. [PMID: 26340901 DOI: 10.1007/s00439-015-1596-8] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 05/31/2015] [Accepted: 08/23/2015] [Indexed: 02/01/2023]
Abstract
Although some existing epidemiological observations and molecular experiments suggested that brain disorders in the realm of psychiatry may be influenced by immune dysregulation, the degree of genetic overlap between psychiatric disorders and immune disorders has not been well established. We investigated this issue by integrative analysis of genome-wide association studies of 18 complex human traits/diseases (five psychiatric disorders, seven immune disorders, and others) and multiple genome-wide annotation resources (central nervous system genes, immune-related expression-quantitative trait loci (eQTL) and DNase I hypersensitive sites from 98 cell lines). We detected pleiotropy in 24 of the 35 psychiatric-immune disorder pairs. The strongest pleiotropy was observed for schizophrenia-rheumatoid arthritis with MHC region included in the analysis (p = 3.9 × 10^-285), and schizophrenia-Crohn's disease with MHC region excluded (p = 1.1 × 10^-36). Significant enrichment (> 1.4 fold) of immune-related eQTL was observed in four psychiatric disorders. Genomic regions responsible for pleiotropy between psychiatric disorders and immune disorders were detected. The MHC region on chromosome 6 appears to be the most important, with other regions, such as cytoband 1p13.2, also playing significant roles in pleiotropy. We also found that most alleles shared between schizophrenia and Crohn's disease have the same effect direction, with a similar trend found for other disorder pairs, such as bipolar-Crohn's disease. Our results offer a novel bird's-eye view of the genetic relationship and demonstrate strong evidence for pervasive pleiotropy between psychiatric disorders and immune disorders. Our findings might open new routes for prevention and treatment strategies for these disorders based on a new appreciation of the importance of immunological mechanisms in mediating risk of many psychiatric diseases.
Collapse
Affiliation(s)
- Qian Wang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA; VA CT Healthcare Center, West Haven, CT, USA
| | - Can Yang
- VA CT Healthcare Center, West Haven, CT, USA; Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA; Department of Mathematics, Hong Kong Baptist University, Hong Kong, Hong Kong SAR
| | - Joel Gelernter
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA; VA CT Healthcare Center, West Haven, CT, USA; Department of Neurobiology, Yale School of Medicine, New Haven, CT, USA; Department of Genetics, Yale School of Medicine, West Haven, CT, USA
| | - Hongyu Zhao
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA; Department of Genetics, Yale School of Medicine, West Haven, CT, USA; VA Cooperative Studies Program Coordinating Center, West Haven, CT, USA.
|
25
|
Young RS, Hayashizaki Y, Andersson R, Sandelin A, Kawaji H, Itoh M, Lassmann T, Carninci P, Bickmore WA, Forrest AR, Taylor MS. The frequent evolutionary birth and death of functional promoters in mouse and human. Genome Res 2015; 25:1546-57. [PMID: 26228054 PMCID: PMC4579340 DOI: 10.1101/gr.190546.115] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2015] [Accepted: 07/28/2015] [Indexed: 12/04/2022]
Abstract
Promoters are central to the regulation of gene expression. Changes in gene regulation are thought to underlie much of the adaptive diversification between species and phenotypic variation within populations. In contrast to earlier work emphasizing the importance of enhancer evolution and subtle sequence changes at promoters, we show that dramatic changes such as the complete gain and loss (collectively, turnover) of functional promoters are common. Using quantitative measures of transcription initiation in both humans and mice across 52 matched tissues, we discriminate promoter sequence gains from losses and resolve the lineage of changes. We also identify expression divergence and functional turnover between orthologous promoters, finding only the latter is associated with local sequence changes. Promoter turnover has occurred at the majority (>56%) of protein-coding genes since humans and mice diverged. Tissue-restricted promoters are the most evolutionarily volatile where retrotransposition is an important, but not the sole, source of innovation. There is considerable heterogeneity of turnover rates between promoters in different tissues, but the consistency of these in both lineages suggests that the same biological systems are similarly inclined to transcriptional rewiring. The genes affected by promoter turnover show evidence of adaptive evolution. In mice, promoters are primarily lost through deletion of the promoter containing sequence, whereas in humans, many promoters appear to be gradually decaying with weak transcriptional output and relaxed selective constraint. Our results suggest that promoter gain and loss is an important process in the evolutionary rewiring of gene regulation and may be a significant source of phenotypic diversification.
Affiliation(s)
- Robert S Young
- MRC Human Genetics Unit, MRC Institute for Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
| | - Yoshihide Hayashizaki
- RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama, 351-0198, Japan
| | - Robin Andersson
- Department of Biology and Biotech Research and Innovation Centre, Copenhagen University, 2200 Copenhagen N, Denmark
| | - Albin Sandelin
- Department of Biology and Biotech Research and Innovation Centre, Copenhagen University, 2200 Copenhagen N, Denmark
| | - Hideya Kawaji
- RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama, 351-0198, Japan; RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Tsurumi-ku, Yokohama, 230-0045, Japan
| | - Masayoshi Itoh
- RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama, 351-0198, Japan; RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Tsurumi-ku, Yokohama, 230-0045, Japan
| | - Timo Lassmann
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Tsurumi-ku, Yokohama, 230-0045, Japan
| | - Piero Carninci
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Tsurumi-ku, Yokohama, 230-0045, Japan
| | - Wendy A Bickmore
- MRC Human Genetics Unit, MRC Institute for Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
| | - Alistair R Forrest
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Tsurumi-ku, Yokohama, 230-0045, Japan; Systems Biology and Genomics, Harry Perkins Institute of Medical Research, QEII Medical Centre, Nedlands, Western Australia 6009, Australia
| | - Martin S Taylor
- MRC Human Genetics Unit, MRC Institute for Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
|
26
|
Kirsten H, Al-Hasani H, Holdt L, Gross A, Beutner F, Krohn K, Horn K, Ahnert P, Burkhardt R, Reiche K, Hackermüller J, Löffler M, Teupser D, Thiery J, Scholz M. Dissecting the genetics of the human transcriptome identifies novel trait-related trans-eQTLs and corroborates the regulatory relevance of non-protein coding loci. Hum Mol Genet 2015; 24:4746-63. [PMID: 26019233 PMCID: PMC4512630 DOI: 10.1093/hmg/ddv194] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2014] [Accepted: 05/21/2015] [Indexed: 12/24/2022] Open
Abstract
Genetics of gene expression (eQTLs or expression QTLs) has proved an indispensable tool for understanding biological pathways and pathomechanisms of trait-associated SNPs. However, power of most genome-wide eQTL studies is still limited. We performed a large eQTL study in peripheral blood mononuclear cells of 2112 individuals increasing the power to detect trans-effects genome-wide. Going beyond univariate SNP-transcript associations, we analyse relations of eQTLs to biological pathways, polygenetic effects of expression regulation, trans-clusters and enrichment of co-localized functional elements. We found eQTLs for about 85% of analysed genes, and 18% of genes were trans-regulated. Local eSNPs were enriched up to a distance of 5 Mb to the transcript challenging typically implemented ranges of cis-regulations. Pathway enrichment within regulated genes of GWAS-related eSNPs supported functional relevance of identified eQTLs. We demonstrate that nearest genes of GWAS-SNPs might frequently be misleading functional candidates. We identified novel trans-clusters of potential functional relevance for GWAS-SNPs of several phenotypes including obesity-related traits, HDL-cholesterol levels and haematological phenotypes. We used chromatin immunoprecipitation data for demonstrating biological effects. Yet, we show for strongly heritable transcripts that still little trans-chromosomal heritability is explained by all identified trans-eSNPs; however, our data suggest that most cis-heritability of these transcripts seems explained. Dissection of co-localized functional elements indicated a prominent role of SNPs in loci of pseudogenes and non-coding RNAs for the regulation of coding genes. In summary, our study substantially increases the catalogue of human eQTLs and improves our understanding of the complex genetic regulation of gene expression, pathways and disease-related processes.
Affiliation(s)
- Holger Kirsten
- Institute for Medical Informatics, Statistics and Epidemiology, LIFE - Leipzig Research Center for Civilization Diseases, Cognitive Genetics, Department of Cell Therapy
| | - Hoor Al-Hasani
- Department for Computer Science, Analysis Strategies Group, Department of Diagnostics, Young Investigators Group Bioinformatics and Transcriptomics, Department Proteomics, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany and
| | - Lesca Holdt
- Institute of Laboratory Medicine, Ludwig-Maximilians-University, Munich, Germany
| | - Arnd Gross
- Institute for Medical Informatics, Statistics and Epidemiology, LIFE - Leipzig Research Center for Civilization Diseases
| | - Frank Beutner
- LIFE - Leipzig Research Center for Civilization Diseases, Department of Internal Medicine/Cardiology, Heart Center
| | - Knut Krohn
- Interdisciplinary Center for Clinical Research, Faculty of Medicine and
| | - Katrin Horn
- Institute for Medical Informatics, Statistics and Epidemiology, LIFE - Leipzig Research Center for Civilization Diseases
| | - Peter Ahnert
- Institute for Medical Informatics, Statistics and Epidemiology, LIFE - Leipzig Research Center for Civilization Diseases
| | - Ralph Burkhardt
- LIFE - Leipzig Research Center for Civilization Diseases, Institute of Laboratory Medicine, University of Leipzig, Leipzig, Germany
| | - Kristin Reiche
- Department for Computer Science, RNomics Group, Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology- IZI, Leipzig, Germany, Young Investigators Group Bioinformatics and Transcriptomics, Department Proteomics, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany and
| | - Jörg Hackermüller
- Department for Computer Science, RNomics Group, Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology- IZI, Leipzig, Germany, Young Investigators Group Bioinformatics and Transcriptomics, Department Proteomics, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany and
| | - Markus Löffler
- Institute for Medical Informatics, Statistics and Epidemiology, LIFE - Leipzig Research Center for Civilization Diseases
| | - Daniel Teupser
- Institute of Laboratory Medicine, Ludwig-Maximilians-University, Munich, Germany
| | - Joachim Thiery
- LIFE - Leipzig Research Center for Civilization Diseases, Institute of Laboratory Medicine, University of Leipzig, Leipzig, Germany
| | - Markus Scholz
- Institute for Medical Informatics, Statistics and Epidemiology, LIFE - Leipzig Research Center for Civilization Diseases,
|
27
|
Gagliano SA, Paterson AD, Weale ME, Knight J. Assessing models for genetic prediction of complex traits: a comparison of visualization and quantitative methods. BMC Genomics 2015; 16:405. [PMID: 25997848 PMCID: PMC4440290 DOI: 10.1186/s12864-015-1616-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2014] [Accepted: 05/05/2015] [Indexed: 11/13/2022] Open
Abstract
Background: In silico models have recently been created in order to predict which genetic variants are more likely to contribute to the risk of a complex trait given their functional characteristics. However, there has been no comprehensive review as to which type of predictive accuracy measures and data visualization techniques are most useful for assessing these models. Methods: We assessed the performance of the models for predicting risk using various methodologies, some of which include: receiver operating characteristic (ROC) curves, histograms of classification probability, and the novel use of the quantile-quantile plot. These measures have variable interpretability depending on factors such as whether the dataset is balanced in terms of numbers of genetic variants classified as risk variants versus those that are not. Results: We conclude that the area under the curve (AUC) is a suitable starting place, and for models with similar AUCs, violin plots are particularly useful for examining the distribution of the risk scores.
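The AUC mentioned in this abstract can be illustrated with a minimal sketch. It assumes each variant has a numeric risk score and a known label (risk variant or not); the function name `auc` and the example scores are hypothetical and are not taken from the article. The AUC is computed here in its rank form: the probability that a randomly chosen risk variant scores higher than a randomly chosen non-risk variant, with ties counted as one half.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    scores_pos: model scores for variants labelled as risk variants
    scores_neg: model scores for variants labelled as non-risk
    (both lists are illustrative inputs, not data from the article).
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0      # risk variant ranked above non-risk variant
            elif p == n:
                wins += 0.5      # ties contribute one half
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical scores: 8 of the 9 pairs are correctly ordered.
    print(auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))
```

Because the pairwise form normalises by the number of positive-negative pairs, its value does not depend on how many variants fall in each class, which is one reason it is a convenient first summary for unbalanced datasets.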
Affiliation(s)
- Sarah A Gagliano
- Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, Ontario, Canada; Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada; Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada.
| | - Andrew D Paterson
- Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada; Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada; Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada; Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada; Epidemiology Division, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada.
| | - Michael E Weale
- Department of Medical & Molecular Genetics, King's College London, Guy's Hospital, London, UK.
| | - Jo Knight
- Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, Ontario, Canada; Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada; Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada; Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada.
|
28
|
Strike LT, Couvy-Duchesne B, Hansell NK, Cuellar-Partida G, Medland SE, Wright MJ. Genetics and Brain Morphology. Neuropsychol Rev 2015; 25:63-96. [DOI: 10.1007/s11065-015-9281-1] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2014] [Accepted: 02/08/2015] [Indexed: 12/17/2022]
|
29
|
Foroughi Asl H, Talukdar HA, Kindt ASD, Jain RK, Ermel R, Ruusalepp A, Nguyen KDH, Dobrin R, Reilly DF, Schunkert H, Samani NJ, Braenne I, Erdmann J, Melander O, Qi J, Ivert T, Skogsberg J, Schadt EE, Michoel T, Björkegren JLM. Expression quantitative trait loci acting across multiple tissues are enriched in inherited risk for coronary artery disease. Circ Cardiovasc Genet 2015; 8:305-15. [PMID: 25578447 DOI: 10.1161/circgenetics.114.000640] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2014] [Accepted: 12/16/2014] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Despite recent discoveries of new genetic risk factors, the majority of risk for coronary artery disease (CAD) remains elusive. As the most proximal sensor of DNA variation, RNA abundance can help identify subpopulations of genetic variants active in and across tissues mediating CAD risk through gene expression. METHODS AND RESULTS: By generating new genomic data on DNA and RNA samples from the Stockholm Atherosclerosis Gene Expression (STAGE) study, 8156 cis-acting expression quantitative trait loci (eQTLs) for 6450 genes across 7 CAD-relevant tissues were detected. The inherited risk enrichments of tissue-defined sets of these eQTLs were assessed using 2 independent genome-wide association data sets. eQTLs acting across increasing numbers of tissues were found increasingly enriched for CAD risk and resided at regulatory hot spots. The risk enrichment of 42 eQTLs acting across 5 to 6 tissues was particularly high (≤7.3-fold) and confirmed in the combined genome-wide association data from Coronary Artery Disease Genome Wide Replication And Meta-Analysis Consortium. Sixteen of the 42 eQTLs associated with 19 master regulatory genes and 29 downstream gene sets (n>30) were further risk enriched comparable to that of the 153 genome-wide association risk single-nucleotide polymorphisms established for CAD (8.4-fold versus 10-fold). Three gene sets, governed by the master regulators FLYWCH1, PSORSIC3, and G3BP1, segregated the STAGE patients according to extent of CAD, and small interfering RNA targeting of these master regulators affected cholesterol-ester accumulation in foam cells of the THP1 monocytic cell line. CONCLUSIONS: eQTLs acting across multiple tissues are significant carriers of inherited risk for CAD. FLYWCH1, PSORSIC3, and G3BP1 are novel master regulatory genes in CAD that may be suitable targets.
|
30
|
Stainton JJ, Haley CS, Charlesworth B, Kranis A, Watson K, Wiener P. Detecting signatures of selection in nine distinct lines of broiler chickens. Anim Genet 2014; 46:37-49. [PMID: 25515710 DOI: 10.1111/age.12252] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/03/2014] [Indexed: 01/26/2023]
Abstract
Modern commercial chickens have been bred for one of two specific purposes: meat production (broilers) or egg production (layers). This has led to large phenotypic changes, so that the genomic signatures of selection may be detectable using statistical techniques. Genetic differentiation between nine distinct broiler lines was calculated using Weir and Cockerham's pairwise FST estimator for 11 003 genome-wide markers to identify regions showing evidence of differential selection across lines. Differentiation measures were averaged into overlapping sliding windows for each line, and a permutation approach was used to determine the significance of each window. A total of 51 regions were found to show significant differentiation between the lines. Several lines were consistently found to share significant regions, suggesting that the pattern of line divergence is related to selection for broiler traits. The majority of the 51 regions contain QTL relating to broiler traits, but only five of them were found to be significantly enriched for broiler QTL, including a region on chromosome 27 containing 39 broiler QTL and 114 genes. Additionally, a number of these regions have been identified by other selection mapping studies. This study has identified a large number of potential selection signatures, and further tests with higher-density marker data may narrow these regions down to individual genes.
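The windowing and permutation steps described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: per-marker differentiation values are taken as given (the Weir and Cockerham estimator itself is not implemented), the function names, window size, and step are hypothetical, and the null distribution is built from the maximum window mean after shuffling marker positions, which is one common choice for controlling the genome-wide error rate.

```python
import random


def window_means(values, size, step):
    """Average per-marker values in overlapping sliding windows."""
    return [
        sum(values[i:i + size]) / size
        for i in range(0, len(values) - size + 1, step)
    ]


def permutation_pvalues(values, size, step, n_perm=1000, seed=0):
    """Empirical p-value for each window mean.

    Assumption: marker values are shuffled along the genome, and each
    observed window is compared with the largest window mean seen in
    each shuffled data set (a max-statistic null).
    """
    rng = random.Random(seed)
    observed = window_means(values, size, step)
    exceed = [0] * len(observed)
    shuffled = list(values)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_max = max(window_means(shuffled, size, step))
        for k, obs in enumerate(observed):
            if null_max >= obs:
                exceed[k] += 1
    # Add-one correction keeps p-values strictly above zero.
    return [(c + 1) / (n_perm + 1) for c in exceed]


if __name__ == "__main__":
    # Hypothetical data: a block of five highly differentiated markers.
    fst = [0.01] * 20 + [0.5] * 5 + [0.01] * 20
    pvals = permutation_pvalues(fst, size=5, step=1, n_perm=200)
    print(min(pvals), pvals.index(min(pvals)))
```

Shuffling markers destroys the clustering of high values, so a window whose mean stays extreme relative to the shuffled maxima reflects a localised run of differentiated markers and not merely a few isolated ones.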
Affiliation(s)
- John J Stainton
- The Roslin Institute and R(D)SVS, University of Edinburgh, Midlothian, EH25 9RG, UK
|
31
|
Sadee W, Hartmann K, Seweryn M, Pietrzak M, Handelman SK, Rempala GA. Missing heritability of common diseases and treatments outside the protein-coding exome. Hum Genet 2014; 133:1199-1215. [PMID: 25107510 PMCID: PMC4169001 DOI: 10.1007/s00439-014-1476-7] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2014] [Accepted: 07/23/2014] [Indexed: 02/07/2023]
Abstract
Genetic factors strongly influence risk of common human diseases and treatment outcomes but the causative variants remain largely unknown; this gap has been called the 'missing heritability'. We propose several hypotheses that in combination have the potential to narrow the gap. First, given a multi-stage path from wellness to disease, we propose that common variants under positive evolutionary selection represent normal variation and gate the transition between wellness and an 'off-well' state, revealing adaptations to changing environmental conditions. In contrast, genome-wide association studies (GWAS) focus on deleterious variants conveying disease risk, accelerating the path from off-well to illness and finally specific diseases, while common 'normal' variants remain hidden in the noise. Second, epistasis (dynamic gene-gene interactions) likely assumes a central role in adaptations and evolution; yet, GWAS analyses currently are poorly designed to reveal epistasis. As gene regulation is germane to adaptation, we propose that epistasis among common normal regulatory variants, or between common variants and less frequent deleterious variants, can have strong protective or deleterious phenotypic effects. These gene-gene interactions can be highly sensitive to environmental stimuli and could account for large differences in drug response between individuals. Residing largely outside the protein-coding exome, common regulatory variants affect either transcription of coding and non-coding RNAs (regulatory SNPs, or rSNPs) or RNA functions and processing (structural RNA SNPs, or srSNPs). Third, with the vast majority of causative variants yet to be discovered, GWAS rely on surrogate markers, a confounding factor aggravated by the presence of more than one causative variant per gene and by epistasis. We propose that the confluence of these factors may be responsible to large extent for the observed heritability gap.
Affiliation(s)
- Wolfgang Sadee
- Department of Pharmacology, Center for Pharmacogenomics, College of Medicine, The Ohio State University Wexner Medical Center, 5184A Graves Hall, 333 West 10th Avenue, Columbus, OH, 43210, USA
|
32
|
Koufariotis L, Chen YPP, Bolormaa S, Hayes BJ. Regulatory and coding genome regions are enriched for trait associated variants in dairy and beef cattle. BMC Genomics 2014; 15:436. [PMID: 24903263 PMCID: PMC4070550 DOI: 10.1186/1471-2164-15-436] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2013] [Accepted: 05/22/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: In livestock, as in humans, the number of genetic variants that can be tested for association with complex quantitative traits, or used in genomic predictions, is increasing exponentially as whole genome sequencing becomes more common. The power to identify variants associated with traits, particularly those of small effects, could be increased if certain regions of the genome were known a priori to be enriched for associations. Here, we investigate whether twelve genomic annotation classes were enriched or depleted for significant associations in genome wide association studies for complex traits in beef and dairy cattle. We also describe a variance component approach to determine the proportion of genetic variance captured by each annotation class. RESULTS: P-values from large GWAS using 700K SNP in both dairy and beef cattle were available for 11 and 10 traits respectively. We found significant enrichment for trait associated variants (SNP significant in the GWAS) in the missense class along with regions 5 kilobases upstream and downstream of coding genes. We found that the non-coding conserved regions (across mammals) were not enriched for trait associated variants. The results from the enrichment or depletion analysis were not in complete agreement with the results from variance component analysis, where the missense and synonymous classes gave the greatest increase in variance explained, while the upstream and downstream classes showed a more modest increase in the variance explained. CONCLUSION: Our results indicate that functional annotations could assist in prioritization of variants to a subset more likely to be associated with complex traits, including missense variants, and upstream and downstream regions. The differences between the two sets of results (GWAS enrichment or depletion versus variance component approaches) might be explained by the fact that the variance component approach has greater power to capture the cumulative effect of mutations of small effect, while the enrichment or depletion approach only captures the variants that are significant in GWAS, which is restricted to a limited number of common variants of moderate effects.
Affiliation(s)
- Lambros Koufariotis
- Faculty of Science, Technology and Engineering, La Trobe University, Melbourne, Victoria 3086, Australia.
|
33
|
Gagliano SA, Barnes MR, Weale ME, Knight J. A Bayesian method to incorporate hundreds of functional characteristics with association evidence to improve variant prioritization. PLoS One 2014; 9:e98122. [PMID: 24844982 PMCID: PMC4028284 DOI: 10.1371/journal.pone.0098122] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2014] [Accepted: 04/28/2014] [Indexed: 01/09/2023] Open
Abstract
The increasing quantity and quality of functional genomic information motivate the assessment and integration of these data with association data, including data originating from genome-wide association studies (GWAS). We used previously described GWAS signals ("hits") to train a regularized logistic model in order to predict SNP causality on the basis of a large multivariate functional dataset. We show how this model can be used to derive Bayes factors for integrating functional and association data into a combined Bayesian analysis. Functional characteristics were obtained from the Encyclopedia of DNA Elements (ENCODE), from published expression quantitative trait loci (eQTL), and from other sources of genome-wide characteristics. We trained the model using all GWAS signals combined, and also using phenotype specific signals for autoimmune, brain-related, cancer, and cardiovascular disorders. The non-phenotype specific and the autoimmune GWAS signals gave the most reliable results. We found SNPs with higher probabilities of causality from functional characteristics showed an enrichment of more significant p-values compared to all GWAS SNPs in three large GWAS studies of complex traits. We investigated the ability of our Bayesian method to improve the identification of true causal signals in a psoriasis GWAS dataset and found that combining functional data with association data improves the ability to prioritise novel hits. We used the predictions from the penalized logistic regression model to calculate Bayes factors relating to functional characteristics and supply these online alongside resources to integrate these data with association data.
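One way a functional prediction and an association signal could be combined on the odds scale is sketched below. This is an assumption-laden illustration and not the authors' formula: the functional Bayes factor is taken here to be the ratio of the model-predicted odds of causality to a baseline odds, and the function names and all numeric inputs are hypothetical.

```python
def odds(p):
    """Convert a probability strictly between 0 and 1 to odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return p / (1.0 - p)


def functional_bayes_factor(p_pred, p_base):
    """Assumed form: predicted odds of causality relative to baseline odds.

    p_pred: probability from a (hypothetical) penalized logistic model
    p_base: baseline probability for an average SNP
    """
    return odds(p_pred) / odds(p_base)


def posterior_probability(prior, bf_functional, bf_association):
    """Posterior odds = prior odds x functional BF x association BF,
    assuming the two sources of evidence are independent."""
    post_odds = odds(prior) * bf_functional * bf_association
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    bf_func = functional_bayes_factor(p_pred=0.2, p_base=0.1)
    print(bf_func)
    print(posterior_probability(prior=0.01, bf_functional=bf_func,
                                bf_association=10.0))
```

Working on the odds scale keeps the two evidence sources multiplicative, so a SNP with modest association evidence can still be promoted when its functional characteristics are favourable.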
Affiliation(s)
- Sarah A. Gagliano
- Centre for Addiction and Mental Health, Toronto, Ontario, Canada
- Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada
| | - Michael R. Barnes
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
| | - Michael E. Weale
- Department of Medical & Molecular Genetics, King’s College London, Guy’s Hospital, London, United Kingdom
| | - Jo Knight
- Centre for Addiction and Mental Health, Toronto, Ontario, Canada
- Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada
- Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
|
34
|
Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am J Hum Genet 2014; 94:559-73. [PMID: 24702953 DOI: 10.1016/j.ajhg.2014.03.004] [Citation(s) in RCA: 389] [Impact Index Per Article: 35.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2013] [Accepted: 03/11/2014] [Indexed: 01/23/2023] Open
Abstract
Annotations of gene structures and regulatory elements can inform genome-wide association studies (GWASs). However, choosing the relevant annotations for interpreting an association study of a given trait remains challenging. I describe a statistical model that uses association statistics computed across the genome to identify classes of genomic elements that are enriched with or depleted of loci influencing a trait. The model naturally incorporates multiple types of annotations. I applied the model to GWASs of 18 human traits, including red blood cell traits, platelet traits, glucose levels, lipid levels, height, body mass index, and Crohn disease. For each trait, I used the model to evaluate the relevance of 450 different genomic annotations, including protein-coding genes, enhancers, and DNase-I hypersensitive sites in over 100 tissues and cell lines. The fraction of phenotype-associated SNPs influencing protein sequence ranged from around 2% (for platelet volume) up to around 20% (for low-density lipoprotein cholesterol), repressed chromatin was significantly depleted for SNPs associated with several traits, and cell-type-specific DNase-I hypersensitive sites were enriched with SNPs associated with several traits (for example, the spleen in platelet volume). Finally, reweighting each GWAS by using information from functional genomics increased the number of loci with high-confidence associations by around 5%.
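The idea of an annotation being enriched with or depleted of trait-associated SNPs can be illustrated with a simple count-based calculation. This is deliberately much cruder than the statistical model the abstract describes, which works from genome-wide association statistics; the sketch only compares proportions and adds a hypergeometric tail probability. Function names and counts are hypothetical.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Proportion of associated SNPs in the annotation divided by the
    proportion of all SNPs in the annotation.

    k: associated SNPs inside the annotation
    n: associated SNPs in total
    K: all SNPs inside the annotation
    N: all SNPs in total
    """
    return (k / n) / (K / N)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n SNPs are drawn without replacement from N,
    of which K carry the annotation."""
    total = comb(N, n)
    hits = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(n, K) + 1)
    )
    return hits / total


if __name__ == "__main__":
    # Hypothetical counts: 10 of 100 associated SNPs fall in an
    # annotation that covers 500 of 10000 SNPs overall.
    print(fold_enrichment(10, 100, 500, 10000))
    print(hypergeom_upper_tail(10, 100, 500, 10000))
```

A value above 1 indicates enrichment and a value below 1 indicates depletion; unlike a model fitted to all association statistics, this count-based version depends on the significance threshold used to call a SNP associated.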
|
35
|
Morota G, Abdollahi-Arpanahi R, Kranis A, Gianola D. Genome-enabled prediction of quantitative traits in chickens using genomic annotation. BMC Genomics 2014; 15:109. [PMID: 24502227 PMCID: PMC3922252 DOI: 10.1186/1471-2164-15-109] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2013] [Accepted: 02/04/2014] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND: Genome-wide association studies have been deemed successful for identifying statistically associated genetic variants of large effects on complex traits. Past studies have found enrichment of trait-associated SNPs in functionally annotated regions, while depletion was reported for intergenic regions (IGR). However, no systematic examination of connections between genomic regions and predictive ability of complex phenotypes has been carried out. RESULTS: In this study, we partitioned SNPs based on their annotation to characterize genomic regions that deliver low and high predictive power for three broiler traits in chickens using a whole-genome approach. Additive genomic relationship kernels were constructed for each of the genic regions considered, and a kernel-based Bayesian ridge regression was employed as prediction machine. We found that the predictive performance for ultrasound area of breast meat from using genic regions marked by SNPs was consistently better than that from SNPs in IGR, while IGR tagged by SNPs were better than the genic regions for body weight and hen house egg production. We also noted that predictive ability delivered by the whole battery of markers was close to the best prediction achieved by one of the genomic regions. CONCLUSIONS: Whole-genome regression methods fit all available quality-filtered SNPs in a model, in contrast to accommodating only validated SNPs from exonic or coding regions. Our results suggest that, while differences among genomic regions in terms of predictive ability were observed, the whole-genome approach remains a promising tool if the interest is in prediction of complex traits.
Affiliation(s)
- Gota Morota
- Department of Animal Sciences, University of Wisconsin-Madison, Wisconsin, USA.
|
36
|
Hou L, Zhao H. A review of post-GWAS prioritization approaches. Front Genet 2013; 4:280. [PMID: 24367376 PMCID: PMC3856625 DOI: 10.3389/fgene.2013.00280] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2013] [Accepted: 11/23/2013] [Indexed: 12/13/2022] Open
Abstract
In the past decade, high-throughput genotyping and next-generation sequencing platforms have enabled genome-wide association studies (GWAS) of many complex human diseases. These studies have discovered many disease susceptibility loci and unveiled unexpected disease mechanisms. Despite these successes, the identified variants explain only a small proportion of the genetic contributions to these diseases, and many more remain to be found. This is largely due to the small effect sizes of most disease-associated variants and limited sample sizes. As a result, it is critical to leverage other information to prioritize GWAS signals more effectively, both to increase replication rates and to better understand disease mechanisms. In this review, we introduce the biological/genomic features that have been found to be informative for post-GWAS prioritization and discuss available tools that use these features for prioritization.
Affiliation(s)
- Lin Hou
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
|
37
|
Underestimation of heritability using a mixed model with a polygenic covariance structure in a genome-wide association study for complex traits. Eur J Hum Genet 2013; 22:851-4. [PMID: 24149545 DOI: 10.1038/ejhg.2013.236] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 04/26/2013] [Revised: 09/05/2013] [Accepted: 09/10/2013] [Indexed: 11/09/2022]
Abstract
Recently, mixed model methodology has been considered effective in genome-wide association studies (GWAS) for controlling population stratification and explaining the polygenic effects of complex traits. However, estimates of polygenic variance components and heritability were biased when the mixed model was used. This bias results from a diluted genetic relationship covariance structure, particularly when the number of underlying causal variants is limited. We simulated disease and quantitative phenotypes with a variety of heritabilities (0.1, 0.2, 0.3, 0.4, and 0.5), prevalence rates (0.1, 0.2, 0.3, and 0.5), and causal variant numbers (10, 30, 50, and 100). Heritabilities estimated from the simulated data using restricted maximum likelihood were underestimated in many populations (P<0.05). The underestimation increased with larger heritability, smaller prevalence, and fewer causal variants, and it was larger for disease traits than for quantitative traits. This study suggests that heritability is underestimated in GWAS when the mixed model methodology is used with an excessively large number of variants relative to the number of causal variants.
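The simulation design in this abstract (a target heritability, a prevalence, and a fixed number of causal variants) could be set up along the following lines. This is only a sketch under stated assumptions: the abstract does not describe the generating model, so the code assumes independent SNPs, normally distributed causal effects, and a liability-threshold rule for disease status. All names and default values are hypothetical, and the restricted maximum likelihood estimation step is not shown.

```python
import numpy as np


def simulate_phenotypes(n_ind=2000, n_snps=500, n_causal=10,
                        h2=0.3, prevalence=0.1, seed=0):
    """Simulate a quantitative liability and a binary disease trait.

    Assumptions (not taken from the abstract): independent SNPs,
    normal causal effects, liability-threshold model for disease.
    """
    rng = np.random.default_rng(seed)

    # Genotypes coded 0/1/2, drawn per SNP from a binomial with its own frequency.
    freqs = rng.uniform(0.05, 0.5, size=n_snps)
    geno = rng.binomial(2, freqs, size=(n_ind, n_snps)).astype(float)

    # Only a small subset of SNPs is causal; the remainder carry no effect.
    causal = rng.choice(n_snps, size=n_causal, replace=False)
    effects = rng.normal(size=n_causal)

    # Rescale the genetic value so that its sample variance equals h2.
    genetic = geno[:, causal] @ effects
    genetic = (genetic - genetic.mean()) / genetic.std() * np.sqrt(h2)

    # Residual noise has variance 1 - h2, so the liability variance is near 1.
    noise = rng.normal(scale=np.sqrt(1.0 - h2), size=n_ind)
    liability = genetic + noise

    # Individuals in the upper `prevalence` tail of liability are cases.
    threshold = np.quantile(liability, 1.0 - prevalence)
    disease = (liability > threshold).astype(int)

    return {"geno": geno, "causal": causal, "genetic": genetic,
            "liability": liability, "disease": disease}


sim = simulate_phenotypes()
```

Varying `h2`, `prevalence`, and `n_causal` over the grids listed in the abstract would reproduce the shape of the design; fitting a mixed model with a relationship matrix built from all `n_snps` markers is the step at which the reported underestimation would be examined.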
|