1
Zhao Y, Gao J, Feng H, Jiang L. GRAMMAR-Lambda Delivers Efficient Understanding of the Genetic Basis for Head Size in Catfish. Biology 2025; 14:63. [PMID: 39857294 PMCID: PMC11760490 DOI: 10.3390/biology14010063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2024] [Revised: 12/30/2024] [Accepted: 01/07/2025] [Indexed: 01/27/2025]
Abstract
The shape of the skull plays a crucial role in the evolution and adaptation of species to their environments. In aquaculture fish, head size is also an important economic trait, as it is linked to fillet yield and ornamental value. This study applies our GRAMMAR-Lambda method to a genome-wide association study (GWAS) of loci related to head size in catfish. Compared with traditional GWAS methods, the GRAMMAR-Lambda method offers higher computational efficiency, statistical power, and stability, especially in complex population structures. This research identifies many candidate genes closely related to cranial morphology in terms of head length, width, and depth in catfish, including bmpr1bb, fgfrl1b, nipbl, foxp2, and pax5. Based on the results of gene-gene interaction analysis, we speculate that there may be frequent genetic interactions between chromosome 19 and chromosome 29 in bone development. Additionally, many candidate genes, gene families, and mechanisms (such as SOCE mechanisms) affecting skeletal development and morphology have been identified. These findings contribute to our understanding of the genetic architecture of head size and will support marker-assisted breeding in aquaculture; they also reflect the potential application of the GRAMMAR-Lambda method in genetic studies of complex traits.
Affiliation(s)
- Yunfeng Zhao
  - Hainan Fisheries Innovation Research Institute, Chinese Academy of Fishery Sciences, Sanya 572024, China
  - Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, Beijing 100141, China
- Jin Gao
  - Hainan Academy of Ocean and Fisheries Sciences, Haikou 571126, China
- Hong Feng
  - Department of Paediatrics and Adolescent Medicine, The University of Hong Kong, Hong Kong SAR, China
- Li Jiang
  - Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, Beijing 100141, China
2
Kazemzadeh S, Farrokhi N, Ahmadikhah A, Tabar Heydar K, Gilani A, Askari H, Ingvarsson PK. Genome-wide association study and genotypic variation for the major tocopherol content in rice grain. Frontiers in Plant Science 2024; 15:1426321. [PMID: 39439508 PMCID: PMC11493719 DOI: 10.3389/fpls.2024.1426321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Accepted: 09/03/2024] [Indexed: 10/25/2024]
Abstract
Rice tocopherols, vitamin E compounds with antioxidant activity, play essential roles in human health. Even though the key genes involved in vitamin E biosynthetic pathways have been identified in plants, the genetic architecture of vitamin E content in rice grain remains unclear. A genome-wide association study (GWAS) on 179 genotypically diverse rice accessions with 34,323 SNP markers was conducted to detect QTLs that define total and α-tocopherol contents in rice grains. Total and α-tocopherol contents had a strong positive correlation and varied greatly across the accessions, ranging from 0.230 to 31.76 μg/g and from 0.011 to 30.83 μg/g, respectively. A total of 13 QTLs were identified, spread across five of the rice chromosomes. Among the 13 QTLs, 11 were considered major, with phenotypic variation explained (PVE) greater than 10%. Twelve transcription factor (TF) genes, one microprotein (miP), and a transposon were found to be associated with the QTLs, with putative roles in controlling tocopherol contents. Moreover, intracellular transport proteins (ABC transporters, nonaspanins, and SNARE) were identified as associated genes on chromosomes 1 and 8. In the vicinity of seven QTLs, protein kinases were identified as key signaling factors. Haplotype analysis revealed the QTLs qAlph1.1, qTot1.1, qAlph2.1, qAlph6.1, qTot6.1, and qTot8.3 to have significant haplogroups. Quantitative RT-PCR validated the expression direction and magnitude of WRKY39 (Os02g0265200), PIP5Ks (Os08g0450800), and MADS59 (Os06g0347700) in defining the major tocopherol contents. This study provides insights for ongoing biofortification efforts to breed and/or engineer vitamin E and antioxidant levels in rice and other cereals.
Affiliation(s)
- Sara Kazemzadeh
  - Department of Cell and Molecular Biology, Faculty of Life Sciences & Biotechnology, Shahid Beheshti University, Tehran, Iran
- Naser Farrokhi
  - Department of Cell and Molecular Biology, Faculty of Life Sciences & Biotechnology, Shahid Beheshti University, Tehran, Iran
- Asadollah Ahmadikhah
  - Department of Cell and Molecular Biology, Faculty of Life Sciences & Biotechnology, Shahid Beheshti University, Tehran, Iran
- Abdolali Gilani
  - Agricultural and Natural Resources Research Institute of Khuzestan, Ahwaz, Iran
- Hossein Askari
  - Department of Cell and Molecular Biology, Faculty of Life Sciences & Biotechnology, Shahid Beheshti University, Tehran, Iran
- Pär K. Ingvarsson
  - Department of Plant Biology, Swedish University of Agricultural Sciences, Uppsala, Sweden
3
Han L, Shen B, Wu X, Zhang J, Wen YJ. Compressed variance component mixed model reveals epistasis associated with flowering in Arabidopsis. Frontiers in Plant Science 2024; 14:1283642. [PMID: 38259933 PMCID: PMC10800901 DOI: 10.3389/fpls.2023.1283642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Accepted: 12/15/2023] [Indexed: 01/24/2024]
Abstract
Introduction: Epistasis is currently a topic of great interest in molecular and quantitative genetics. Arabidopsis thaliana, as a model organism, plays a crucial role in studying the fundamental biology of diverse plant species. However, there have been limited reports on the identification of epistasis related to flowering in genome-wide association studies (GWAS). Therefore, it is important to conduct epistasis analysis in Arabidopsis. Method: In this study, we employed Levene's test and a compressed variance component mixed model in GWAS to detect quantitative trait nucleotides (QTNs) and QTN-by-QTN interactions (QQIs) for 11 flowering-related traits of 199 Arabidopsis accessions with 216,130 markers. Results: Our analysis detected 89 QTNs and 130 pairs of QQIs. Around these loci, 34 known genes previously reported in Arabidopsis were confirmed to be associated with flowering-related traits, such as SPA4, which is involved in regulating photoperiodic flowering and interacts with PAP1 and PAP2, affecting growth of Arabidopsis under light conditions. Among previously unreported genes, 35 showed significant differential expression in response to variations in temperature, photoperiod, and vernalization treatments. Functional enrichment analysis revealed that 26 of these genes were associated with various biological processes. Finally, haplotype and phenotypic difference analysis revealed 20 candidate genes exhibiting significant phenotypic variation across gene haplotypes, of which the candidate genes AT1G12990 and AT1G09950 around QQIs might have an interaction effect on flowering time regulation in Arabidopsis. Discussion: These findings may offer valuable insights for the identification and exploration of genes and gene-by-gene interactions associated with flowering-related traits in Arabidopsis, and may also provide a valuable reference for research on epistasis in other species.
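Editor's illustration: the abstract names Levene's test but does not say how it is applied. A common reading, assumed here, is that phenotype spread is compared across genotype classes at a marker, since unequal variances can hint at interaction effects. The minimal sketch below computes the standard Levene W statistic with the Python standard library only; the function name `levene_statistic` and the phenotype values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def levene_statistic(groups, center=mean):
    """Return Levene's W for a list of numeric samples, one list per group.

    Pass center=statistics.median for the Brown-Forsythe variant.
    Assumes at least two groups and non-constant deviations (otherwise
    the within-group sum of squares is zero and the division fails).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group centre.
    z = [[abs(y - center(g)) for y in g] for g in groups]
    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zi) * (m - z_grand_mean) ** 2
                  for zi, m in zip(z, z_group_means))
    within = sum((v - m) ** 2
                 for zi, m in zip(z, z_group_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical phenotype values grouped by genotype class at one marker.
by_genotype = {"AA": [1.0, 2.0, 3.0], "aa": [0.0, 2.0, 4.0]}
w = levene_statistic(list(by_genotype.values()))
print(round(w, 3))
```

Under the null hypothesis of equal variances, W is compared against an F distribution with (k - 1, N - k) degrees of freedom; the p-value step is omitted here because it needs a statistics library beyond the standard one.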
Affiliation(s)
- Le Han
  - College of Science, Nanjing Agricultural University, Nanjing, China
- Bolin Shen
  - College of Science, Nanjing Agricultural University, Nanjing, China
- Xinyi Wu
  - College of Science, Nanjing Agricultural University, Nanjing, China
- Jin Zhang
  - College of Science, Nanjing Agricultural University, Nanjing, China
  - State Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, China
- Yang-Jun Wen
  - College of Science, Nanjing Agricultural University, Nanjing, China
  - State Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, China
4
Gselman S, Fabjan TH, Bizjak A, Potočnik U, Gorenjak M. Cholecalciferol Supplementation Induced Up-Regulation of SARAF Gene and Down-Regulated miR-155-5p Expression in Slovenian Patients with Multiple Sclerosis. Genes (Basel) 2023; 14:1237. [PMID: 37372417 DOI: 10.3390/genes14061237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 06/05/2023] [Accepted: 06/06/2023] [Indexed: 06/29/2023]
Abstract
Multiple sclerosis is a common immune-mediated inflammatory and demyelinating disease. Lower cholecalciferol levels are an established environmental risk factor in multiple sclerosis. Although cholecalciferol supplementation in multiple sclerosis is widely accepted, optimal serum levels are still debated. Moreover, how cholecalciferol affects pathogenic disease mechanisms is still unclear. In the present study, we enrolled 65 relapsing-remitting multiple sclerosis patients who were double-blindly divided into two groups with low and high cholecalciferol supplementation, respectively. In addition to clinical and environmental parameters, we obtained peripheral blood mononuclear cells to analyze DNA, RNA, and miRNA molecules. Importantly, we investigated miRNA-155-5p, a previously published pro-inflammatory miRNA in multiple sclerosis known to be correlated to cholecalciferol levels. Our results show a decrease in miR-155-5p expression after cholecalciferol supplementation in both dosage groups, consistent with previous observations. Subsequent genotyping, gene expression, and eQTL analyses reveal correlations between miR-155-5p and the SARAF gene, which plays a role in the regulation of calcium release-activated channels. As such, the present study is the first to explore and suggest that the SARAF miR-155-5p axis hypothesis might be another mechanism by which cholecalciferol supplementation might decrease miR-155 expression. This association highlights the importance of cholecalciferol supplementation in multiple sclerosis and encourages further investigation and functional cell studies.
Affiliation(s)
- Saša Gselman
  - Clinic of Neurology, University Clinical Centre Maribor, 2000 Maribor, Slovenia
- Tanja Hojs Fabjan
  - Clinic of Neurology, University Clinical Centre Maribor, 2000 Maribor, Slovenia
  - Department of Neurology, Faculty of Medicine, University of Maribor, 2000 Maribor, Slovenia
- Anja Bizjak
  - Center for Human Molecular Genetics and Pharmacogenomics, Faculty of Medicine, University of Maribor, 2000 Maribor, Slovenia
- Uroš Potočnik
  - Center for Human Molecular Genetics and Pharmacogenomics, Faculty of Medicine, University of Maribor, 2000 Maribor, Slovenia
  - Laboratory of Biochemistry, Molecular Biology and Genomics, Faculty of Chemistry and Chemical Engineering, University of Maribor, 2000 Maribor, Slovenia
  - Department for Science and Research, University Clinical Centre Maribor, 2000 Maribor, Slovenia
- Mario Gorenjak
  - Center for Human Molecular Genetics and Pharmacogenomics, Faculty of Medicine, University of Maribor, 2000 Maribor, Slovenia
5
Wang J, Zhou F, Li C, Yin N, Liu H, Zhuang B, Huang Q, Wen Y. Gene Association Analysis of Quantitative Trait Based on Functional Linear Regression Model with Local Sparse Estimator. Genes (Basel) 2023; 14:834. [PMID: 37107592 PMCID: PMC10137544 DOI: 10.3390/genes14040834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 03/27/2023] [Accepted: 03/28/2023] [Indexed: 04/03/2023]
Abstract
Functional linear regression models have been widely used in the gene association analysis of complex traits. These models retain all the genetic information in the data and take full advantage of spatial information in genetic variation data, which leads to high detection power. However, not all of the significant association signals identified by these high-power methods are real causal SNPs, because noise is easily mistaken for a significant association signal, leading to false associations. In this paper, a sparse functional data association test (SFDAT) for gene region association analysis is developed based on a functional linear regression model with local sparse estimation. The evaluation indicators CSR and DL are defined to evaluate, together with other indicators, the feasibility and performance of the proposed method. Simulation studies show that: (1) SFDAT performs well under both linkage equilibrium and linkage disequilibrium simulation; (2) SFDAT performs successfully for gene regions (including common variants, low-frequency variants, rare variants, and mixed variants); (3) with power and type I error rates comparable to OLS and Smooth, SFDAT has a better ability to handle the zero regions. The Oryza sativa data set is analyzed by SFDAT. It is shown that SFDAT can better perform gene association analysis and eliminate false positives in gene localization. This study showed that SFDAT can lower the interference caused by noise while maintaining high power. SFDAT provides a new method for the association analysis between gene regions and phenotypic quantitative traits.
Affiliation(s)
- Jingyu Wang
  - College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Fujie Zhou
  - College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Cheng Li
  - College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Ning Yin
  - College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Huiming Liu
  - College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Binxian Zhuang
  - College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Qingyu Huang
  - College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Yongxian Wen
  - College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
6
Chen D, Cremona MA, Qi Z, Mitra RD, Chiaromonte F, Makova KD. Human L1 Transposition Dynamics Unraveled with Functional Data Analysis. Mol Biol Evol 2021; 37:3576-3600. [PMID: 32722770 DOI: 10.1093/molbev/msaa194] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
Long INterspersed Elements-1 (L1s) constitute >17% of the human genome and still actively transpose in it. Characterizing L1 transposition across the genome is critical for understanding genome evolution and somatic mutations. However, to date, L1 insertion and fixation patterns have not been studied comprehensively. To fill this gap, we investigated three genome-wide data sets of L1s that integrated at different evolutionary times: 17,037 de novo L1s (from an L1 insertion cell-line experiment conducted in-house), and 1,212 polymorphic and 1,205 human-specific L1s (from public databases). We characterized 49 genomic features (proxying chromatin accessibility, transcriptional activity, replication, recombination, etc.) in the ±50 kb flanks of these elements. These features were contrasted between the three L1 data sets and L1-free regions using state-of-the-art Functional Data Analysis statistical methods, which treat high-resolution data as mathematical functions. Our results indicate that de novo, polymorphic, and human-specific L1s are surrounded by different genomic features acting at specific locations and scales. This led to an integrative model of L1 transposition, according to which L1s preferentially integrate into open-chromatin regions enriched in non-B DNA motifs, whereas they are fixed in regions largely free of purifying selection, that is, depleted of genes and noncoding most conserved elements. Intriguingly, our results suggest that L1 insertions modify local genomic landscape by extending CpG methylation and increasing mononucleotide microsatellite density. Altogether, our findings substantially facilitate understanding of L1 integration and fixation preferences, pave the way for uncovering their role in aging and cancer, and inform their use as mutagenesis tools in genetic studies.
Affiliation(s)
- Di Chen
  - Intercollege Graduate Degree Program in Genetics, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA
- Marzia A Cremona
  - Department of Statistics, The Pennsylvania State University, University Park, PA
  - Department of Operations and Decision Systems, Université Laval, Québec, Canada
- Zongtai Qi
  - Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO
- Robi D Mitra
  - Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO
- Francesca Chiaromonte
  - Department of Statistics, The Pennsylvania State University, University Park, PA
  - EMbeDS, Sant'Anna School of Advanced Studies, Pisa, Italy
  - The Huck Institutes of the Life Sciences, Center for Medical Genomics, The Pennsylvania State University, University Park, PA
- Kateryna D Makova
  - The Huck Institutes of the Life Sciences, Center for Medical Genomics, The Pennsylvania State University, University Park, PA
  - Department of Biology, The Pennsylvania State University, University Park, PA
7
Jiang Y, Chiu CY, Yan Q, Chen W, Gorin MB, Conley YP, Lakhal-Chaieb ML, Cook RJ, Amos CI, Wilson AF, Bailey-Wilson JE, McMahon FJ, Vazquez AI, Yuan A, Zhong X, Xiong M, Weeks DE, Fan R. Gene-Based Association Testing of Dichotomous Traits With Generalized Functional Linear Mixed Models Using Extended Pedigrees: Applications to Age-Related Macular Degeneration. J Am Stat Assoc 2020; 116:531-545. [PMID: 34321704 PMCID: PMC8315575 DOI: 10.1080/01621459.2020.1799809] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/03/2017] [Revised: 07/09/2020] [Accepted: 07/17/2020] [Indexed: 10/23/2022]
Abstract
Genetics plays a role in age-related macular degeneration (AMD), a common cause of blindness in the elderly. There is a need for powerful methods for carrying out region-based association tests between a dichotomous trait like AMD and genetic variants on family data. Here, we apply our new generalized functional linear mixed models (GFLMM) developed to test for gene-based association in a set of AMD families. Using common and rare variants, we observe significant association with two known AMD genes: CFH and ARMS2. Using rare variants, we find suggestive signals in four genes: ASAH1, CLEC6A, TMEM63C, and SGSM1. Intriguingly, ASAH1 is down-regulated in AMD aqueous humor, and ASAH1 deficiency leads to retinal inflammation and increased vulnerability to oxidative stress. These findings were made possible by our GFLMM which model the effect of a major gene as a fixed mean, the polygenic contributions as a random variation, and the correlation of pedigree members by kinship coefficients. Simulations indicate that the GFLMM likelihood ratio tests (LRTs) accurately control the Type I error rates. The LRTs have similar or higher power than existing retrospective kernel and burden statistics. Our GFLMM-based statistics provide a new tool for conducting family-based genetic studies of complex diseases. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
Affiliation(s)
- Yingda Jiang
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Chi-Yang Chiu
  - Division of Biostatistics, Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, TN
  - Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
- Qi Yan
  - Division of Pulmonary Medicine, Allergy and Immunology, Children’s Hospital of Pittsburgh at The University of Pittsburgh, Pittsburgh, PA
- Wei Chen
  - Division of Pulmonary Medicine, Allergy and Immunology, Children’s Hospital of Pittsburgh at The University of Pittsburgh, Pittsburgh, PA
- Michael B. Gorin
  - Department of Ophthalmology, David Geffen School of Medicine, UCLA Stein Eye Institute, Los Angeles, CA
- Yvette P. Conley
  - Department of Health Promotion and Development, University of Pittsburgh, Pittsburgh, PA
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Richard J. Cook
  - Department of Statistics and Actuarial Science, Waterloo, ON, Canada
- Alexander F. Wilson
  - Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
- Joan E. Bailey-Wilson
  - Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
- Francis J. McMahon
  - Human Genetics Branch and Genetic Basis of Mood and Anxiety Disorders Section, National Institute of Mental Health, NIH, Bethesda, MD
- Ana I. Vazquez
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI
- Ao Yuan
  - Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC
- Xiaogang Zhong
  - Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC
- Momiao Xiong
  - Human Genetics Center, University of Texas, Houston, TX
- Daniel E. Weeks
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Ruzong Fan
  - Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
  - Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC
8
Genetic control of non-genetic inheritance in mammals: state-of-the-art and perspectives. Mamm Genome 2020; 31:146-156. [PMID: 32529318 PMCID: PMC7369129 DOI: 10.1007/s00335-020-09841-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/27/2020] [Accepted: 06/03/2020] [Indexed: 12/12/2022]
Abstract
Once thought to depend directly and uniquely on genotype, the ontogeny of individual phenotypes is much more complicated. Individual genetics, environmental exposures, and their interaction are the three main determinants of an individual's phenotype. This picture was further complicated a decade ago when the Lamarckian theory of acquired inheritance was rekindled by the discovery of epigenetic inheritance, according to which acquired phenotypes can be transmitted through fertilization and affect phenotypes across generations. The results of Genome-Wide Association Studies have also highlighted a large degree of missing heritability in genetics and have provided hints that not only acquired phenotypes but also individuals' genotypes affect phenotypes intergenerationally through indirect genetic effects. Here, we review available examples of indirect genetic effects in mammals, what is known of the underlying molecular mechanisms, and their potential impact on our understanding of missing heritability, phenotypic variation, and individual disease risk.
9
Van Steen K, Moore JH. How to increase our belief in discovered statistical interactions via large-scale association studies? Hum Genet 2019; 138:293-305. [PMID: 30840129 PMCID: PMC6483943 DOI: 10.1007/s00439-019-01987-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/26/2018] [Accepted: 02/20/2019] [Indexed: 12/31/2022]
Abstract
The understanding that differences in biological epistasis may impact disease risk, diagnosis, or disease management stands in wide contrast to the unavailability of widely accepted large-scale epistasis analysis protocols. Several choices in the analysis workflow will impact false-positive and false-negative rates. One of these choices relates to the exploitation of particular modelling or testing strategies. The strengths and limitations of these need to be well understood, as well as the contexts in which these hold. This will contribute to determining the potentially complementary value of epistasis detection workflows and is expected to increase replication success with biological relevance. In this contribution, we take a recently introduced regression-based epistasis detection tool as a leading example to review the key elements that need to be considered to fully appreciate the value of analytical epistasis detection performance assessments. We point out unresolved hurdles and give our perspectives towards overcoming these.
Affiliation(s)
- K Van Steen
  - WELBIO, GIGA-R Medical Genomics-BIO3, University of Liège, Liege, Belgium
  - Department of Human Genetics, University of Leuven, Leuven, Belgium
- J H Moore
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, USA
10
Costanzo M, Kuzmin E, van Leeuwen J, Mair B, Moffat J, Boone C, Andrews B. Global Genetic Networks and the Genotype-to-Phenotype Relationship. Cell 2019; 177:85-100. [PMID: 30901552 PMCID: PMC6817365 DOI: 10.1016/j.cell.2019.01.033] [Citation(s) in RCA: 126] [Impact Index Per Article: 21.0] [Received: 12/07/2018] [Revised: 01/09/2019] [Accepted: 01/21/2019] [Indexed: 01/25/2023]
Abstract
Genetic interactions identify combinations of genetic variants that impinge on phenotype. With whole-genome sequence information available for thousands of individuals within a species, a major outstanding issue concerns the interpretation of allelic combinations of genes underlying inherited traits. In this Review, we discuss how large-scale analyses in model systems have illuminated the general principles and phenotypic impact of genetic interactions. We focus on studies in budding yeast, including the mapping of a global genetic network. We emphasize how information gained from work in yeast translates to other systems, and how a global genetic network not only annotates gene function but also provides new insights into the genotype-to-phenotype relationship.
Affiliation(s)
- Michael Costanzo
  - The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada
- Elena Kuzmin
  - Goodman Cancer Research Centre, McGill University, Montreal QC, Canada
- Barbara Mair
  - The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada
- Jason Moffat
  - The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada
  - Department of Molecular Genetics, University of Toronto, 1 Kings College Circle, Toronto ON, Canada
- Charles Boone
  - The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada
  - Department of Molecular Genetics, University of Toronto, 1 Kings College Circle, Toronto ON, Canada
- Brenda Andrews
  - The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada
  - Department of Molecular Genetics, University of Toronto, 1 Kings College Circle, Toronto ON, Canada
11
Burillo-Sanz S, Montes-Cano MA, García-Lozano JR, Olivas-Martínez I, Ortego-Centeno N, García-Hernández FJ, Espinosa G, Graña-Gil G, Sánchez-Bursón J, Juliá MR, Solans R, Blanco R, Barnosi-Marín AC, Gómez de la Torre R, Fanlo P, Rodríguez-Carballeira M, Rodríguez-Rodríguez L, Camps T, Castañeda S, Alegre-Sancho JJ, Martín J, González-Escribano MF. Behçet's disease and genetic interactions between HLA-B*51 and variants in genes of autoinflammatory syndromes. Sci Rep 2019; 9:2777. [PMID: 30808881 PMCID: PMC6391494 DOI: 10.1038/s41598-019-39113-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/23/2018] [Accepted: 01/14/2019] [Indexed: 12/16/2022]
Abstract
Behçet’s disease (BD) is an immune-mediated systemic disorder with a well-established genetic base. In a previous study, using a next generation sequencing approach, we found many rare variants and some functional polymorphisms in genes related to autoinflammatory syndromes (AID): CECR1, MEFV, MVK, NLRP3, NOD2, PSTPIP1 and TNFRSF1A in our BD cohort. Our strategy did not allow us to establish the number of patients with variants, the proportion of individuals accumulating them, or their relationship with other genetic factors. To answer these questions, the individual samples were sequenced. Additionally, three functional polymorphisms (NLRP3 p.Gln703Lys, NOD2 p.Arg702Trp and p.Val955Ile) were genotyped using TaqMan assays. A total of 98 patients (27.6%) carried at least one rare variant and 13 of them (3.7%) accumulated two or three. Functional regression model analysis suggests epistatic interaction between B51 and MEFV (P = 0.003). A suggestive protective association of the minor allele of NOD2 p.Arg702Trp (P = 0.01) was found in both B51-positive and B51-negative individuals. Therefore, a high percentage of patients with BD have rare variants in AID genes. Our results suggest that the association of MEFV with BD could be modulated by the HLA molecules, whereas the protective effect of NOD2 p.Arg702Trp would be independent of HLA.
Affiliation(s)
- Sergio Burillo-Sanz
  - Department of Immunology, Hospital Universitario Virgen del Rocío (IBiS, CSIC, US), Sevilla, 41013, Spain
- Marco-Antonio Montes-Cano
  - Department of Immunology, Hospital Universitario Virgen del Rocío (IBiS, CSIC, US), Sevilla, 41013, Spain
- José-Raúl García-Lozano
  - Department of Immunology, Hospital Universitario Virgen del Rocío (IBiS, CSIC, US), Sevilla, 41013, Spain
- Israel Olivas-Martínez
  - Department of Immunology, Hospital Universitario Virgen del Rocío (IBiS, CSIC, US), Sevilla, 41013, Spain
- Gerard Espinosa
  - Department Autoimmune Diseases, Hospital Clinic Universitari, Barcelona, 08036, Spain
- Genaro Graña-Gil
  - Department of Rheumatology, Complejo Hospitalario Universitario, A Coruña, 15006, Spain
- Juan Sánchez-Bursón
  - Department of Rheumatology, Hospital Universitario de Valme, Sevilla, 41014, Spain
- María Rosa Juliá
  - Department of Immunology, Hospital Universitari Son Espases, Palma de Mallorca, Illes Balears, 07120, Spain
- Roser Solans
  - Department of Internal Medicine, Autoimmune Systemic Diseases Unit, Hospital Vall d'Hebron, Universidad Autonoma de Barcelona, Barcelona, 08035, Spain
- Ricardo Blanco
  - Department of Rheumatology, Hospital Universitario Marqués de Valdecilla, Santander, 39008, Spain
- Patricia Fanlo
  - Department of Internal Medicine, Hospital Virgen del Camino, Pamplona, 31008, Spain
- Teresa Camps
  - Department of Internal Medicine, Hospital Regional Universitario, Málaga, 29010, Spain
- Santos Castañeda
  - Department of Rheumatology, Hospital de la Princesa, IIS-Princesa, Madrid, 28006, Spain
- Javier Martín
  - Instituto de Parasitología y Biomedicina "López-Neyra", CSIC, PTS, Granada, 18016, Spain
12
Ning C, Wang D, Kang H, Mrode R, Zhou L, Xu S, Liu JF. A rapid epistatic mixed-model association analysis by linear retransformations of genomic estimated values. Bioinformatics 2018; 34:1817-1825. [PMID: 29342229 PMCID: PMC5972602 DOI: 10.1093/bioinformatics/bty017] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 07/11/2017] [Revised: 01/07/2018] [Accepted: 01/10/2018] [Indexed: 12/16/2022]
Abstract
Motivation: Epistasis provides a feasible way to probe the potential genetic mechanisms of complex traits. However, time-consuming computation hinders the detection of interactions in practice, especially when a linear mixed model (LMM) is used to control type I error in the presence of population structure and cryptic relatedness.
Results: A rapid epistatic mixed-model association analysis (REMMA) method was developed to overcome this computational limitation. The method first estimates individuals' epistatic effects with an extended genomic best linear unbiased prediction (EG-BLUP) model that includes additive and epistatic kinship matrices; pairwise interaction effects are then obtained by linear retransformations of the individuals' epistatic effects. Simulation studies showed that REMMA could control type I error and increase statistical power in detecting epistatic QTNs compared with the existing LMM-based FaST-LMM. We applied REMMA to two real datasets, a mouse dataset and the Wellcome Trust Case Control Consortium (WTCCC) data. Application to the mouse data further confirmed the performance of REMMA in controlling type I error. For the WTCCC data, we found that most epistatic QTNs for type 1 diabetes (T1D) were located in a major histocompatibility complex (MHC) region, from which a large interacting network with 12 hub genes (each interacting with ten or more genes) was established.
Availability and implementation: Our REMMA method can be freely accessed at https://github.com/chaoning/REMMA.
Contact: liujf@cau.edu.cn.
Supplementary information: Supplementary data are available at Bioinformatics online.
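The first step described in this abstract relies on an additive and an epistatic kinship matrix. The abstract does not say how these are built, so the following is only a minimal sketch under a stated assumption: the additive kinship is computed from mean-centered allele counts, and the epistatic (additive-by-additive) kinship is approximated by the element-wise square of the additive one, a construction commonly used in EG-BLUP-style models. The function names and the toy genotypes are hypothetical and are not taken from the REMMA software.

```python
def additive_kinship(genotypes):
    """Additive kinship K = Z Z^T / m from 0/1/2 allele counts.

    genotypes: list of individuals, each a list of allele counts per marker.
    Assumption: markers are only mean-centered (no variance scaling).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Mean allele count of each marker across individuals
    means = [sum(ind[j] for ind in genotypes) / n for j in range(m)]
    # Centered genotype matrix Z
    z = [[ind[j] - means[j] for j in range(m)] for ind in genotypes]
    return [
        [sum(z[a][j] * z[b][j] for j in range(m)) / m for b in range(n)]
        for a in range(n)
    ]


def epistatic_kinship(k_add):
    """Assumed additive-by-additive kinship: Hadamard product K o K."""
    n = len(k_add)
    return [[k_add[a][b] * k_add[a][b] for b in range(n)] for a in range(n)]


# Hypothetical toy data: 3 individuals, 3 markers
toy_genotypes = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
k_a = additive_kinship(toy_genotypes)
k_aa = epistatic_kinship(k_a)
```

Both matrices are symmetric and have one row per individual; in a mixed-model fit they would serve as the covariance structures of the additive and epistatic random effects.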
Affiliation(s)
- Chao Ning
  - National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Dan Wang
  - National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Huimin Kang
  - National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Raphael Mrode
  - Animal Biosciences, International Livestock Institute, Nairobi, Kenya
- Lei Zhou
  - National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Shizhong Xu
  - Department of Botany and Plant Science, University of California, Riverside, CA, USA
- Jian-Feng Liu
  - National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, China
13
Tai HH, De Koeyer D, Sønderkær M, Hedegaard S, Lagüe M, Goyer C, Nolan L, Davidson C, Gardner K, Neilson J, Paudel JR, Murphy A, Bizimungu B, Wang HY, Xiong X, Halterman D, Nielsen KL. Verticillium dahliae Disease Resistance and the Regulatory Pathway for Maturity and Tuberization in Potato. THE PLANT GENOME 2018; 11. [PMID: 29505631 DOI: 10.3835/plantgenome2017.05.0040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/14/2023]
Abstract
Verticillium dahliae Kleb. is a pathogenic fungus causing wilting, chlorosis, and early dying in potato (Solanum tuberosum L.). Genetic mapping of resistance to V. dahliae was done using a diploid population of potato. The major quantitative trait locus (QTL) for resistance was found on chromosome 5. The […] gene, controlling earliness of maturity and tuberization, was mapped within the interval. Another QTL on chromosome 9 co-localized with the […] wilt resistance gene marker. Epistasis analysis indicated that the loci on chromosomes 5 and 9 had a highly significant interaction, and that […] functioned downstream of […]. The […] alleles were sequenced and found to encode StCDF1.1 and StCDF1.3. Interaction between the […] resistance allele and the […] was demonstrated, but not for […]. Genome-wide expression QTL (eQTL) analysis was performed, and genes with eQTL at the […] and […] loci were both found to have similar functions involving the chloroplast, including photosynthesis, which declines in both maturity and […] wilt. Among the gene ontology (GO) terms that were specific to genes with eQTL at the […], but not the […] locus, were those associated with fungal defense. These results suggest that […] controls fungal defense and reduces early dying in […] wilt by affecting the genetic pathway controlling tuberization timing.
14
Xu K, Jin L, Xiong M. Functional regression method for whole genome eQTL epistasis analysis with sequencing data. BMC Genomics 2017; 18:385. [PMID: 28521784 PMCID: PMC5436462 DOI: 10.1186/s12864-017-3777-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/21/2016] [Accepted: 05/09/2017] [Indexed: 12/02/2022]
Abstract
Background: Epistasis plays an essential role in understanding regulatory mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored owing to great computational challenges and limited data availability. Because of variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements show large expression variability, which collectively creates the observed position-level read count curves. A single number measuring gene expression, as widely used in microarray-based expression analysis, is highly unlikely to account sufficiently for the large variation in expression across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges.
Methods: We develop a nonlinear functional regression model (FRGM) for epistasis analysis with RNA-seq data. The model has functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position. Instead of testing the interaction of all possible pairs of SNPs, the FRGM takes a gene as the basic unit for epistasis analysis: it tests for interaction between all possible pairs of genes and uses all the accessible information to collectively test interaction between all possible pairs of SNPs within two genomic regions.
Results: Through large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis achieves the correct type 1 error and has higher power to detect interactions between genes than existing methods. The proposed methods are applied to RNA-seq and WGS data from the 1000 Genomes Project. The numbers of pairs of significantly interacting genes after Bonferroni correction identified using FRGM, RPKM and DESeq were 16,2361, 260 and 51, respectively, from the 350 European samples.
Conclusions: The proposed FRGM for epistasis analysis of RNA-seq data can capture isoform and position-level information and will have broad application. Both simulations and real data analysis highlight the potential of the FRGM as a good choice for epistatic analysis with sequencing data.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-3777-4) contains supplementary material, which is available to authorized users.
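The idea of treating a genotype profile as a function of genomic position, and a gene as the unit of interaction testing, can be illustrated with a small sketch. It covers only the predictor side and makes several assumptions that the abstract does not confirm: a Fourier basis, a simple averaged inner product to obtain basis scores, and gene-by-gene interaction features formed as products of scores. All function names are hypothetical; this is not the authors' FRGM implementation.

```python
import math


def fourier_basis(t, n_basis):
    """Evaluate the first n_basis Fourier basis functions at t in [0, 1]."""
    values = [1.0]
    k = 1
    while len(values) < n_basis:
        values.append(math.sqrt(2) * math.sin(2 * math.pi * k * t))
        if len(values) < n_basis:
            values.append(math.sqrt(2) * math.cos(2 * math.pi * k * t))
        k += 1
    return values


def gene_scores(genotype, positions, n_basis=3):
    """Project one individual's genotype profile in a gene onto the basis.

    genotype: allele counts (0/1/2) at each variant in the gene.
    positions: base-pair coordinates, rescaled here to [0, 1].
    """
    lo, hi = min(positions), max(positions)
    span = (hi - lo) or 1  # guard against a single-variant gene
    scores = [0.0] * n_basis
    for g, pos in zip(genotype, positions):
        phi = fourier_basis((pos - lo) / span, n_basis)
        for k in range(n_basis):
            scores[k] += g * phi[k] / len(positions)
    return scores


def interaction_features(scores_a, scores_b):
    """All pairwise products of basis scores from two genes."""
    return [a * b for a in scores_a for b in scores_b]
```

With K basis functions per gene, each gene pair contributes K × K interaction features regardless of how many SNPs the genes contain, which is what makes a gene-level test cheaper than testing every SNP pair.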
Affiliation(s)
- Kelin Xu
  - State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, 200438, China
  - School of Data Science and Institute for Big Data, Fudan University, Shanghai, 200433, China
- Li Jin
  - State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, 200438, China
- Momiao Xiong
  - State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, 200438, China
  - Department of Biostatistics, Human Genetics Center, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
  - Human Genetics Center, The University of Texas Health Science Center at Houston, P.O. Box 20186, Houston, TX, 77225, USA
15
Meta-analysis of quantitative pleiotropic traits for next-generation sequencing with multivariate functional linear models. Eur J Hum Genet 2016; 25:350-359. [PMID: 28000696 DOI: 10.1038/ejhg.2016.170] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/11/2016] [Revised: 07/26/2016] [Accepted: 09/27/2016] [Indexed: 11/09/2022]
Abstract
To analyze next-generation sequencing data, multivariate functional linear models are developed for a meta-analysis of multiple studies to connect genetic variant data to multiple quantitative traits while adjusting for covariates. The goal is to take advantage of both meta-analysis and pleiotropic analysis in order to improve power and to carry out a unified association analysis of multiple studies and multiple traits of complex disorders. Three types of approximate F-distributions, based on the Pillai-Bartlett trace, the Hotelling-Lawley trace, and Wilks's Lambda, are introduced to test for association between multiple quantitative traits and multiple genetic variants. Simulation analysis is performed to evaluate the false-positive rates and power of the proposed tests. The proposed methods are applied to analyze lipid traits in eight European cohorts. It is shown that multivariate analysis is generally more advantageous than univariate analysis, and that meta-analysis of multiple studies is more advantageous than analyzing the individual studies separately. The proposed models require individual observations. The value of the current paper lies in at least two points: (a) the proposed methods can be applied to studies that have individual genotype data; (b) the proposed methods can be used as a benchmark for future work that uses summary statistics to build test statistics for meta-analysis.
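For reference, the three multivariate statistics named in this abstract have standard textbook definitions, given here in general MANOVA notation rather than in the paper's own notation (which the abstract does not show). The symbols H, E, and the eigenvalues are introduced only for this illustration.

```latex
% H: hypothesis (regression) sums-of-squares-and-cross-products matrix
% E: residual (error) sums-of-squares-and-cross-products matrix
% \lambda_1, \dots, \lambda_s: nonzero eigenvalues of E^{-1} H
\begin{align*}
\Lambda_{\text{Wilks}} &= \frac{\det(E)}{\det(H+E)} = \prod_{i=1}^{s} \frac{1}{1+\lambda_i}, \\
V_{\text{Pillai-Bartlett}} &= \operatorname{tr}\!\left[ H (H+E)^{-1} \right] = \sum_{i=1}^{s} \frac{\lambda_i}{1+\lambda_i}, \\
U_{\text{Hotelling-Lawley}} &= \operatorname{tr}\!\left( H E^{-1} \right) = \sum_{i=1}^{s} \lambda_i .
\end{align*}
```

Each statistic is converted to an approximate F statistic for testing; small values of Wilks's Lambda and large values of the two traces indicate association.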
16
Fan R, Chiu CY, Jung J, Weeks DE, Wilson AF, Bailey-Wilson JE, Amos CI, Chen Z, Mills JL, Xiong M. A Comparison Study of Fixed and Mixed Effect Models for Gene Level Association Studies of Complex Traits. Genet Epidemiol 2016; 40:702-721. [PMID: 27374056 DOI: 10.1002/gepi.21984] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/01/2015] [Revised: 03/08/2016] [Accepted: 04/26/2016] [Indexed: 12/22/2022]
Abstract
In association studies of complex traits, fixed-effect regression models are usually used to test for association between traits and major gene loci. In recent years, variance-component tests based on mixed models were developed for region-based genetic variant association tests. In the mixed models, the association is tested by a null hypothesis of zero variance via a sequence kernel association test (SKAT), its optimal unified test (SKAT-O), and a combined sum test of rare and common variant effect (SKAT-C). Although there are some comparison studies to evaluate the performance of mixed and fixed models, there is no systematic analysis to determine when the mixed models perform better and when the fixed models perform better. Here we evaluated, based on extensive simulations, the performance of the fixed and mixed model statistics, using genetic variants located in 3, 6, 9, 12, and 15 kb simulated regions. We compared the performance of three models: (i) mixed models that lead to SKAT, SKAT-O, and SKAT-C, (ii) traditional fixed-effect additive models, and (iii) fixed-effect functional regression models. To evaluate the type I error rates of the tests of fixed models, we generated genotype data by two methods: (i) using all variants, (ii) using only rare variants. We found that the fixed-effect tests accurately control or have low false positive rates. We performed simulation analyses to compare power for two scenarios: (i) all causal variants are rare, (ii) some causal variants are rare and some are common. Either one or both of the fixed-effect models performed better than or similar to the mixed models except when (1) the region sizes are 12 and 15 kb and (2) effect sizes are small. Therefore, the assumption of mixed models could be satisfied and SKAT/SKAT-O/SKAT-C could perform better if the number of causal variants is large and each causal variant contributes a small amount to the traits (i.e., polygenes). 
In major gene association studies, we argue that the fixed-effect models perform better than or similarly to the mixed models in most cases because some variants should have relatively large effects on the traits. In practice, it makes sense to perform the analysis with both the fixed- and mixed-effect models and compare the results; this can be readily done using our R code and the SKAT packages.
Affiliation(s)
- Ruzong Fan
  - Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
- Chi-Yang Chiu
  - Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
- Jeesun Jung
  - Laboratory of Epidemiology and Biometry, National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, Maryland, United States of America
- Daniel E Weeks
  - Departments of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Alexander F Wilson
  - Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Joan E Bailey-Wilson
  - Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Christopher I Amos
  - Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, United States of America
- Zhen Chen
  - Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
- James L Mills
  - Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
- Momiao Xiong
  - Human Genetics Center, University of Texas-Houston, Houston, Texas, United States of America
17
Campos-Sánchez R, Cremona MA, Pini A, Chiaromonte F, Makova KD. Integration and Fixation Preferences of Human and Mouse Endogenous Retroviruses Uncovered with Functional Data Analysis. PLoS Comput Biol 2016; 12:e1004956. [PMID: 27309962 PMCID: PMC4911145 DOI: 10.1371/journal.pcbi.1004956] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 01/15/2016] [Accepted: 04/29/2016] [Indexed: 01/24/2023]
Abstract
Endogenous retroviruses (ERVs), the remnants of retroviral infections in the germ line, occupy ~8% and ~10% of the human and mouse genomes, respectively, and affect their structure, evolution, and function. Yet we still have a limited understanding of how the genomic landscape influences integration and fixation of ERVs. Here we conducted a genome-wide study of the most recently active ERVs in the human and mouse genome. We investigated 826 fixed and 1,065 in vitro HERV-Ks in human, and 1,624 fixed and 242 polymorphic ETns, as well as 3,964 fixed and 1,986 polymorphic IAPs, in mouse. We quantitated >40 human and mouse genomic features (e.g., non-B DNA structure, recombination rates, and histone modifications) in ±32 kb of these ERVs' integration sites and in control regions, and analyzed them using Functional Data Analysis (FDA) methodology. In one of the first applications of FDA in genomics, we identified genomic scales and locations at which these features display their influence, and how they work in concert, to provide signals essential for integration and fixation of ERVs. The investigation of ERVs of different evolutionary ages (young in vitro and polymorphic ERVs, older fixed ERVs) allowed us to disentangle integration vs. fixation preferences. As a result of these analyses, we built a comprehensive model explaining the uneven distribution of ERVs along the genome. We found that ERVs integrate in late-replicating AT-rich regions with abundant microsatellites, mirror repeats, and repressive histone marks. Regions favoring fixation are depleted of genes and evolutionarily conserved elements, and have low recombination rates, reflecting the effects of purifying selection and ectopic recombination removing ERVs from the genome. In addition to providing these biological insights, our study demonstrates the power of exploiting multiple scales and localization with FDA. These powerful techniques are expected to be applicable to many other genomic investigations.
Affiliation(s)
- Rebeca Campos-Sánchez
  - Genetics Graduate Program, The Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania, United States of America
- Marzia A. Cremona
  - MOX - Modeling and Scientific Computing, Department of Mathematics, Politecnico di Milano, Milano, Italy
  - Department of Statistics, Penn State University, University Park, Pennsylvania, United States of America
- Alessia Pini
  - MOX - Modeling and Scientific Computing, Department of Mathematics, Politecnico di Milano, Milano, Italy
- Francesca Chiaromonte
  - Department of Statistics, Penn State University, University Park, Pennsylvania, United States of America
  - Center for Medical Genomics, The Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania, United States of America
- Kateryna D. Makova
  - Center for Medical Genomics, The Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania, United States of America
  - Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
18
Zhang F, Xie D, Liang M, Xiong M. Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits. PLoS Genet 2016; 12:e1005965. [PMID: 27104857 PMCID: PMC4841563 DOI: 10.1371/journal.pgen.1005965] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 05/28/2015] [Accepted: 03/08/2016] [Indexed: 12/02/2022]
Abstract
To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of complex diseases. Despite their importance in uncovering the genetic structure of complex traits, statistical methods for identifying epistasis in multiple phenotypes remain fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and with single-trait interaction analysis by a single-variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI’s Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction toward fully uncovering the genetic structure of multiple phenotypes.

The widely used statistical methods test interaction for a single phenotype. However, we often observe pleiotropic genetic interaction effects. The simultaneous gene-gene (GxG) interaction analysis of multiple complementary traits will increase statistical power to detect GxG interactions. Although GxG interactions play an important role in uncovering the genetic structure of complex traits, statistical methods for detecting GxG interactions in multiple phenotypes remain less developed owing to their potential complexity. Therefore, we extend the functional regression model from the single-variate to the multivariate case for simultaneous GxG interaction analysis of multiple correlated phenotypes. Large-scale simulations are conducted to evaluate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare power with traditional multivariate pairwise interaction analysis and with single-trait interaction analysis by a single-variate functional regression model. To further evaluate performance, the MFRG for interaction analysis is applied to five phenotypes of exome sequence data from the NHLBI’s Exome Sequencing Project (ESP) to detect pleiotropic GxG interactions. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of interactions influencing five traits.
Affiliation(s)
- Futao Zhang
  - Department of Computer Science, College of Internet of Things, Hohai University, Changzhou, China
- Dan Xie
  - College of Information Engineering, Hubei University of Chinese Medicine, Hubei, China
- Meimei Liang
  - Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, China
- Momiao Xiong
  - Human Genetics Center, Division of Biostatistics, The University of Texas School of Public Health, Houston, Texas, United States of America
19
Fan R, Wang Y, Yan Q, Ding Y, Weeks DE, Lu Z, Ren H, Cook RJ, Xiong M, Swaroop A, Chew EY, Chen W. Gene-Based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions. Genet Epidemiol 2016; 40:133-43. [PMID: 26782979 DOI: 10.1002/gepi.21947] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 07/13/2015] [Revised: 10/13/2015] [Accepted: 11/05/2015] [Indexed: 11/07/2022]
Abstract
Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression have rarely been developed. Motivated by our ongoing real studies, we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed-effect models in which the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and with the sequence kernel association test (SKAT), which is based on mixed-effect Cox models. The Cox FR LRT statistics have higher power than, or similar power to, the Cox SKAT LRT except when 50%/50% of the causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than the Cox BT LRT. The models and related test statistics can be useful in whole-genome and whole-exome association studies. An age-related macular degeneration dataset was analyzed as an example.
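The general shape of such a test can be written down as a sketch. The abstract does not give the model equations, so the form below is an assumption based on standard functional Cox regression: the genetic effect function is expanded in K basis functions, and the likelihood ratio compares the fitted model against the null of no genetic effect. The symbols Z_i, X_i(s), φ_k, and b_k are introduced only for illustration.

```latex
% Assumed notation: Z_i covariates, X_i(s) genotype function of subject i
% at genomic position s, \phi_k basis functions, \ell the Cox partial
% log-likelihood. This is an illustrative form, not the paper's equations.
\begin{align*}
h_i(t) &= h_0(t)\,\exp\!\Big( Z_i^{\top}\alpha + \int X_i(s)\,\beta(s)\,ds \Big),
\qquad \beta(s) = \sum_{k=1}^{K} b_k\,\phi_k(s), \\
\mathrm{LRT} &= 2\big[\, \ell(\hat{\alpha}, \hat{b}) - \ell(\tilde{\alpha}, b = 0) \,\big]
\;\sim\; \chi^2_{K} \quad \text{asymptotically under } H_0 : b_1 = \dots = b_K = 0 .
\end{align*}
```

Because the degrees of freedom equal the number of basis coefficients rather than the number of variants, the test remains usable for regions containing many rare variants.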
Affiliation(s)
- Ruzong Fan
  - Division of Intramural Population Health Research, Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Yifan Wang
  - Division of Intramural Population Health Research, Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Qi Yan
  - Division of Pulmonary Medicine, Allergy and Immunology, Children's Hospital of Pittsburgh at The University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Ying Ding
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Daniel E Weeks
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Zhaohui Lu
  - Division of Intramural Population Health Research, Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Haobo Ren
  - Regeneron Pharmaceuticals, Inc, Basking Ridge, New Jersey, United States of America
- Richard J Cook
  - Department of Statistics and Actuarial Science, Waterloo, ON, Canada
- Momiao Xiong
  - Human Genetics Center, University of Texas, Houston, Texas, United States of America
- Anand Swaroop
  - Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, NIH, Bethesda, Maryland, United States of America
- Emily Y Chew
  - Division of Epidemiology and Clinical Applications, National Eye Institute, NIH, Bethesda, Maryland, United States of America
- Wei Chen
  - Division of Pulmonary Medicine, Allergy and Immunology, Children's Hospital of Pittsburgh at The University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
20
Meta-analysis of Complex Diseases at Gene Level with Generalized Functional Linear Models. Genetics 2015; 202:457-70. [PMID: 26715663 DOI: 10.1534/genetics.115.180869] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 07/18/2015] [Accepted: 12/09/2015] [Indexed: 11/18/2022]
Abstract
We developed generalized functional linear models (GFLMs) to perform a meta-analysis of multiple case-control studies to evaluate the relationship of genetic data to dichotomous traits adjusting for covariates. Unlike the previously developed meta-analysis for sequence kernel association tests (MetaSKATs), which are based on mixed-effect models to make the contributions of major gene loci random, GFLMs are fixed models; i.e., genetic effects of multiple genetic variants are fixed. Based on GFLMs, we developed chi-squared-distributed Rao's efficient score test and likelihood-ratio test (LRT) statistics to test for an association between a complex dichotomous trait and multiple genetic variants. We then performed extensive simulations to evaluate the empirical type I error rates and power performance of the proposed tests. The Rao's efficient score test statistics of GFLMs are very conservative and have higher power than MetaSKATs when some causal variants are rare and some are common. When the causal variants are all rare [i.e., minor allele frequencies (MAF) < 0.03], the Rao's efficient score test statistics have similar or slightly lower power than MetaSKATs. The LRT statistics generate accurate type I error rates for homogeneous genetic-effect models and may inflate type I error rates for heterogeneous genetic-effect models owing to the large numbers of degrees of freedom and have similar or slightly higher power than the Rao's efficient score test statistics. GFLMs were applied to analyze genetic data of 22 gene regions of type 2 diabetes data from a meta-analysis of eight European studies and detected significant association for 18 genes (P < 3.10 × 10^-6), tentative association for 2 genes (HHEX and HMGA2; P ≈ 10^-5), and no association for 2 genes, while MetaSKATs detected none. In addition, the traditional additive-effect model detects association at gene HHEX. GFLMs and related tests can analyze rare or common variants or a combination of the two and can be useful in whole-genome and whole-exome association studies.
|
21
|
Upton A, Trelles O, Cornejo-García JA, Perkins JR. Review: High-performance computing to detect epistasis in genome scale data sets. Brief Bioinform 2015; 17:368-79. [PMID: 26272945 DOI: 10.1093/bib/bbv058] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 03/31/2015] [Indexed: 11/14/2022]
Abstract
It is becoming clear that most human diseases have a complex etiology that cannot be explained by single nucleotide polymorphisms (SNPs) or simple additive combinations; the general consensus is that they are caused by combinations of multiple genetic variations. The limited success of some genome-wide association studies is partly a result of this focus on single genetic markers. A more promising approach is to take into account epistasis, by considering the association of multiple SNP interactions with disease. However, as genomic data continues to grow in resolution, and genome and exome sequencing become more established, the number of combinations of variants to consider increases rapidly. Two potential solutions should be considered: the use of high-performance computing, which allows us to consider a larger number of variables, and heuristics to make the solution more tractable, essential in the case of genome sequencing. In this review, we look at different computational methods to analyse epistatic interactions within disease-related genetic data sets created by microarray technology. We also review efforts to use epistatic analysis results to produce biomarkers for diagnostic tests and give our views on future directions in this field in light of advances in sequencing technology and variants in non-coding regions.
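To make concrete how quickly the number of variant combinations grows, the following minimal sketch counts the pairwise and three-way combinations for a few panel sizes. The panel sizes and the helper name `interaction_tests` are illustrative assumptions, not figures or code from the review.

```python
from math import comb


def interaction_tests(n_variants: int, order: int) -> int:
    """Number of distinct variant combinations of the given interaction order."""
    return comb(n_variants, order)


# Hypothetical panel sizes, chosen only to illustrate the growth rate.
for n in (10_000, 500_000, 5_000_000):
    pairs = interaction_tests(n, 2)
    triples = interaction_tests(n, 3)
    print(f"{n:>9,} variants: {pairs:.3e} pairs, {triples:.3e} triples")
```

An exhaustive pairwise scan grows roughly with the square of the number of variants and a three-way scan with the cube, which is why the review points to high-performance computing and heuristics as the two ways forward.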
|
22
|
Zhang FT, Zhu ZH, Tong XR, Zhu ZX, Qi T, Zhu J. Mixed Linear Model Approaches of Association Mapping for Complex Traits Based on Omics Variants. Sci Rep 2015. [PMID: 26223539 PMCID: PMC5155518 DOI: 10.1038/srep10298] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 11/29/2022]
Abstract
Precise prediction of the genetic architecture of complex traits is impeded by the limited understanding of their genetic effects, especially gene-by-gene (GxG) and gene-by-environment (GxE) interactions. In the past decades, an explosion of high-throughput technologies has enabled omics studies at multiple levels (such as genomics, transcriptomics, proteomics, and metabolomics). The analyses of large omics data, especially two-locus interaction analysis, are very time-intensive. Integrating diverse omics data and environmental effects in the analyses also remains a challenge. We proposed mixed linear model approaches using GPU (graphics processing unit) computation to simultaneously dissect various genetic effects. Analyses can be performed for estimating genetic main effects, GxG epistasis effects, and GxE interaction effects on large-scale omics data for complex traits, and for estimating heritability of specific genetic effects. Both mouse data analyses and Monte Carlo simulations demonstrated that genetic effects and environment interaction effects could be estimated without bias and with high statistical power using the proposed approaches.
Affiliation(s)
- Fu-Tao Zhang
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Zhi-Hong Zhu
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Xiao-Ran Tong
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Zhi-Xiang Zhu
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Ting Qi
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Jun Zhu
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
|
23
|
Wang Y, Li D, Wei P. Powerful Tukey's One Degree-of-Freedom Test for Detecting Gene-Gene and Gene-Environment Interactions. Cancer Inform 2015; 14:209-18. [PMID: 26064040 PMCID: PMC4459566 DOI: 10.4137/cin.s17305] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/04/2014] [Revised: 04/20/2015] [Accepted: 04/28/2015] [Indexed: 12/17/2022]
Abstract
Genome-wide association studies (GWASs) have identified thousands of single nucleotide polymorphisms (SNPs) robustly associated with hundreds of complex human diseases including cancers. However, the large number of GWAS-identified genetic loci only explains a small proportion of the disease heritability. This “missing heritability” problem has been partly attributed to the yet-to-be-identified gene–gene (G × G) and gene–environment (G × E) interactions. In spite of the important roles of G × G and G × E interactions in understanding disease mechanisms and filling in the missing heritability, straightforward GWAS scanning for such interactions has very limited statistical power, leading to few successes. Here we propose a two-step statistical approach to test G × G/G × E interactions: the first step is to perform principal component analysis (PCA) on the multiple SNPs within a gene region, and the second step is to perform Tukey’s one degree-of-freedom (1-df) test on the leading PCs. We derive a score test that is computationally fast and numerically stable for the proposed Tukey’s 1-df interaction test. Using extensive simulations we show that the proposed approach, which combines the two parsimonious models, namely, the PCA and Tukey’s 1-df form of interaction, outperforms other state-of-the-art methods. We also demonstrate the utility and efficiency gains of the proposed method with applications to testing G × G interactions for Crohn’s disease using the Wellcome Trust Case Control Consortium (WTCCC) GWAS data and testing G × E interaction using data from a case–control study of pancreatic cancer.
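The two-step idea in this abstract (PCA within each gene region, then a single-parameter interaction term on the leading PCs) can be sketched as below. This is only an illustration under stated assumptions: it handles a quantitative trait with ordinary least squares, uses the classical two-stage form of Tukey's 1-df test (fit main effects, then test the product of the fitted main-effect components), and approximates the p-value with a normal reference distribution. It is not the score test the authors derive, and the function names `leading_pcs` and `tukey_1df_interaction` are hypothetical.

```python
import math

import numpy as np


def leading_pcs(genotypes: np.ndarray, k: int) -> np.ndarray:
    """Scores on the top-k principal components of an (n_samples, n_snps) matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the centered matrix; columns of u * s are the PC scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]


def tukey_1df_interaction(y: np.ndarray, pcs_a: np.ndarray, pcs_b: np.ndarray):
    """Two-stage Tukey 1-df test; returns (theta_hat, z, approximate two-sided p)."""
    n = len(y)
    ka = pcs_a.shape[1]
    # Stage 1: main-effects-only least-squares fit.
    x_main = np.hstack([np.ones((n, 1)), pcs_a, pcs_b])
    beta, *_ = np.linalg.lstsq(x_main, y, rcond=None)
    main_a = pcs_a @ beta[1:1 + ka]   # fitted main effect of region A
    main_b = pcs_b @ beta[1 + ka:]    # fitted main effect of region B
    # Stage 2: add the single product term and test its coefficient theta.
    x_full = np.hstack([x_main, (main_a * main_b)[:, None]])
    coef, *_ = np.linalg.lstsq(x_full, y, rcond=None)
    resid = y - x_full @ coef
    sigma2 = resid @ resid / (n - x_full.shape[1])
    cov = sigma2 * np.linalg.pinv(x_full.T @ x_full)
    z = coef[-1] / math.sqrt(cov[-1, -1])
    # Normal approximation to the 1-df test (an assumption; adequate for large n).
    p = math.erfc(abs(z) / math.sqrt(2))
    return coef[-1], z, p
```

Whatever the number of PCs retained per region, the interaction is carried by one parameter, which is the parsimony the abstract credits for the power gain over testing every PC-by-PC product separately.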
Affiliation(s)
- Yaping Wang
- Department of Biostatistics, School of Public Health, University of Texas Health Science Center
- Donghui Li
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center
- Peng Wei
- Department of Biostatistics, School of Public Health, University of Texas Health Science Center; Human Genetics Center, School of Public Health, University of Texas Health Science Center, Houston, TX, USA
|
24
|
Gusareva ES, Van Steen K. Practical aspects of genome-wide association interaction analysis. Hum Genet 2014; 133:1343-58. [DOI: 10.1007/s00439-014-1480-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 05/21/2014] [Accepted: 08/18/2014] [Indexed: 12/31/2022]
|