1
Johri P, Charlesworth B. A gene-based model of fitness and its implications for genetic variation: linkage disequilibrium. bioRxiv: the preprint server for biology 2025:2024.09.12.612686. [PMID: 40027714 PMCID: PMC11870398 DOI: 10.1101/2024.09.12.612686] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/05/2025]
Abstract
A widely used model of the effects of mutations on fitness (the "sites" model) assumes that heterozygous recessive or partially recessive deleterious mutations at different sites in a gene complement each other, similarly to mutations in different genes. However, the general lack of complementation between major-effect allelic mutations suggests an alternative possibility, which we term the "gene" model. This assumes that a pair of heterozygous deleterious mutations in trans behave effectively as homozygotes, so that the fitnesses of trans heterozygotes are lower than those of cis heterozygotes. We examine the properties of the two models, using both analytical and simulation methods. We show that the gene model predicts positive linkage disequilibrium (LD) between deleterious variants within the coding sequence, under conditions in which the sites model predicts zero or slightly negative LD. We also show that focussing on rare variants when examining patterns of LD, especially with Lewontin's D′ measure, is likely to produce misleading inferences about the causes of the sign of LD. Synergistic epistasis between pairs of mutations was also modeled; it is less likely to produce negative LD under the gene model than under the sites model. The theoretical results are discussed in relation to patterns of LD in natural populations of several species.
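A toy calculation shows why D′ is treacherous for rare variants. It is not taken from the paper, and the haplotype counts are invented for illustration. D′ divides D by the largest value D could take given the two allele frequencies, so whenever one of the four two-site haplotypes is absent, |D′| = 1. For two singletons that is almost unavoidable, and the sign of D′ then only records whether they happen to sit on the same haplotype.

```python
from fractions import Fraction


def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D') from counts of the four two-locus haplotypes.

    A and B denote the derived (e.g. putatively deleterious) alleles at the two sites.
    Exact fractions are used so that small frequencies are not blurred by rounding.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = Fraction(n_AB, n)
    p_A = Fraction(n_AB + n_Ab, n)
    p_B = Fraction(n_AB + n_aB, n)
    d = p_AB - p_A * p_B
    # Lewontin's normalisation: divide by the largest |D| possible given p_A and p_B.
    if d > 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max != 0 else Fraction(0)
    return d, d_prime


# Two singletons in a sample of 100 haplotypes (on different haplotypes, then on the
# same one), followed by a pair of common variants for comparison.
for counts in [(0, 1, 1, 98), (1, 0, 0, 99), (30, 20, 20, 30)]:
    d, d_prime = ld_stats(*counts)
    print(counts, "D =", float(d), "D' =", float(d_prime))
# The singleton pairs give D' = -1 and D' = +1; the common pair gives D' = 0.2.
```

Both singleton pairs reach the maximum possible |D′| even though D itself is tiny. This is the sense in which summaries of D′ restricted to rare variants say little about what drives the sign of LD.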

2
Linder RA, Zabanavar B, Majumder A, Hoang HCS, Delgado VG, Tran R, La VT, Leemans SW, Long AD. Adaptation in Outbred Sexual Yeast is Repeatable, Polygenic and Favors Rare Haplotypes. Mol Biol Evol 2022; 39:msac248. [PMID: 36366952 PMCID: PMC9728589 DOI: 10.1093/molbev/msac248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
We carried out a 200-generation Evolve and Resequence (E&R) experiment initiated from an outbred, diploid, recombined 18-way synthetic base population. Replicate populations were evolved at large effective population sizes (>10⁵ individuals), exposed to several different chemical challenges over 12 weeks of evolution, and whole-genome resequenced. Weekly forced outcrossing resulted in an average per-cell-division recombination rate between adjacent genes of ∼0.0008. Despite attempts to force weekly sex, roughly half of our populations evolved cheaters and appear to be evolving asexually. Focusing on seven chemical stressors and the 55 evolved populations that remained sexual, we observed large fitness gains and highly repeatable patterns of genome-wide haplotype change within chemical challenges, with limited repeatability across chemical treatments. Adaptation appears highly polygenic: almost the entire genome shows significant and consistent patterns of haplotype change, with little evidence for long-range linkage disequilibrium in a subset of populations for which we sequenced haploid clones. That is, almost the entire genome is under selection or drafting with selected sites. At any given locus, adaptation was almost always dominated by one of the 18 founders' alleles, with that allele varying spatially and between treatments, suggesting that selection acts primarily on rare variants private to a founder or on haplotype blocks harboring multiple mutations.
Affiliation(s)
- Robert A Linder
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California, Irvine
- Behzad Zabanavar
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California, Irvine
- Arundhati Majumder
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California, Irvine
- Hannah Chiao-Shyan Hoang
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California, Irvine
- Vanessa Genesaret Delgado
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California, Irvine
- Ryan Tran
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California, Irvine
- Vy Thoai La
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California, Irvine
- Simon William Leemans
- Department of Biomedical Engineering, School of Engineering, University of California, Irvine
- Anthony D Long
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California, Irvine

3
Long PN, Cook VJ, Majumder A, Barbour AG, Long AD. The utility of a closed breeding colony of Peromyscus leucopus for dissecting complex traits. Genetics 2022; 221:iyac026. [PMID: 35143664 PMCID: PMC9071557 DOI: 10.1093/genetics/iyac026] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/14/2021] [Accepted: 02/01/2022] [Indexed: 11/13/2022]
Abstract
Deermice of the genus Peromyscus are well suited for addressing several questions of biological interest, including the genetic bases of longevity, behavior, physiology, adaptation, and their ability to serve as disease vectors. Here, we explore a diversity outbred approach for dissecting complex traits in Peromyscus leucopus, a nontraditional genetic model system. We take advantage of a closed colony of deermice founded from 38 individuals and subsequently maintained for ∼40-60 generations. From 405 low-pass short-read sequenced deermice, we accurately impute genotypes at 16 million single nucleotide polymorphisms. Conditional on the observed genotypes, simulations were conducted in which quantitative trait loci of three different sizes contribute to a complex trait under three different genetic models. Using a stringent significance threshold, power was modest and largely a function of the percent variation attributable to the simulated quantitative trait loci, with the underlying genetic model having only a subtle impact. We additionally simulated 2,000 pseudo-individuals whose genotypes were consistent with those observed in the genotyped cohort and carried out additional power simulations. In experiments employing more than 1,000 mice, power is high to detect quantitative trait loci contributing more than 2.5% of the variation in a complex trait, with a localization ability of ∼100 kb. We finally carried out a genome-wide association study on two demonstration traits, bleeding time and body weight, and uncovered one significant region. Our work suggests that complex traits can be dissected in founders-unknown P. leucopus colony mice and similar colonies in other systems using easily obtained genotypes from low-pass sequencing.
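A textbook normal approximation for a single-marker regression test gives a rough sense of how power depends on the percent variation explained and on sample size. This is an idealised calculation under stated assumptions: one biallelic additive QTL genotyped without error, unrelated individuals, and a hypothetical per-test threshold of α = 5 × 10⁻⁸. It ignores the imputation, relatedness, and mapping model of the actual study and should not be expected to reproduce its numbers.

```python
from statistics import NormalDist


def qtl_power(n, pve, alpha=5e-8):
    """Approximate power of a single-marker test for an additive QTL.

    n     -- number of phenotyped and genotyped individuals
    pve   -- proportion of phenotypic variance explained by the QTL
    alpha -- per-test significance threshold (hypothetical stringent value)
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Under the alternative, the regression z-statistic is shifted by about sqrt(n * pve / (1 - pve)).
    shift = (n * pve / (1 - pve)) ** 0.5
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


for n in (405, 1000, 2000):
    row = ", ".join(f"{pve:.1%}: {qtl_power(n, pve):.2f}" for pve in (0.025, 0.05))
    print(f"n = {n:>4} -> {row}")
```

Because the shift grows with the square root of n × PVE, power rises steeply once a cohort passes a size that depends on the effect. This is consistent with the qualitative pattern described above.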
Affiliation(s)
- Phillip N Long
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California Irvine, Irvine, CA 92697-2525, USA
- Vanessa J Cook
- Departments of Microbiology & Molecular Genetics and Medicine, School of Medical Sciences, University of California Irvine, Irvine, CA 92687-2525, USA
- Arundhati Majumder
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California Irvine, Irvine, CA 92697-2525, USA
- Alan G Barbour
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California Irvine, Irvine, CA 92697-2525, USA
- Departments of Microbiology & Molecular Genetics and Medicine, School of Medical Sciences, University of California Irvine, Irvine, CA 92687-2525, USA
- Anthony D Long
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California Irvine, Irvine, CA 92697-2525, USA

4
Macdonald SJ, Cloud-Richardson KM, Sims-West DJ, Long AD. Powerful, efficient QTL mapping in Drosophila melanogaster using bulked phenotyping and pooled sequencing. Genetics 2022; 220:iyab238. [PMID: 35100395 PMCID: PMC8893256 DOI: 10.1093/genetics/iyab238] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/11/2021] [Accepted: 12/19/2021] [Indexed: 01/22/2024]
Abstract
Despite the value of recombinant inbred lines for the dissection of complex traits, large panels can be difficult to maintain, distribute, and phenotype. An attractive alternative to recombinant inbred lines for many traits involves selecting phenotypically extreme individuals from a segregating population and subjecting pools of selected and control individuals to sequencing. Under a bulked or extreme segregant analysis paradigm, genomic regions contributing to trait variation are revealed as allele frequency differences between pools. Here, we describe such an extreme quantitative trait locus mapping strategy that builds on an existing multiparental population, the Drosophila Synthetic Population Resource, and involves phenotyping and genotyping a population derived by mixing hundreds of Drosophila Synthetic Population Resource recombinant inbred lines. Simulations demonstrate that challenging yet experimentally tractable extreme quantitative trait loci designs (≥4 replicates, ≥5,000 individuals/replicate, and selecting the 5-10% most extreme animals) yield at least the same power as traditional recombinant inbred line-based quantitative trait loci mapping and can localize variants with sub-centimorgan resolution. We empirically demonstrate the effectiveness of the approach using a 4-fold replicated extreme quantitative trait loci experiment that identifies 7 quantitative trait loci for caffeine resistance. Two of the mapped extreme quantitative trait loci replicate loci previously identified in recombinant inbred lines, 6/7 are associated with excellent candidate genes, and RNAi knock-downs support the involvement of 4 genes in the genetic control of trait variation. For many traits of interest to drosophilists, a bulked phenotyping/genotyping extreme quantitative trait loci design has considerable advantages.
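An extreme-QTL design looks for an allele frequency difference between a selected pool and a control pool. For a single additive QTL under truncation selection, the expected difference can be computed directly. The sketch below assumes Hardy-Weinberg genotype frequencies, normally distributed residual variation, and infinitely large pools. The function name `selected_pool_freq` and all parameter values are hypothetical rather than taken from the paper.

```python
from statistics import NormalDist


def selected_pool_freq(p, a, frac, resid_sd=1.0):
    """Expected frequency of the trait-increasing allele among the top `frac`
    of a population with one additive biallelic QTL.

    p        -- allele frequency in the unselected (control) pool
    a        -- additive effect of each copy of the allele
    frac     -- fraction of the most extreme individuals selected (e.g. 0.05)
    resid_sd -- SD of all other (genetic + environmental) variation, assumed normal
    """
    # Hardy-Weinberg genotype classes: (copies of the allele, frequency, phenotype distribution)
    classes = [(g, w, NormalDist(g * a, resid_sd))
               for g, w in ((0, (1 - p) ** 2), (1, 2 * p * (1 - p)), (2, p ** 2))]

    def upper_tail(t):
        return sum(w * (1 - dist.cdf(t)) for _, w, dist in classes)

    # Bisection for the truncation point t that selects a fraction `frac` of individuals.
    lo = min(0.0, 2 * a) - 10 * resid_sd
    hi = max(0.0, 2 * a) + 10 * resid_sd
    for _ in range(200):
        mid = (lo + hi) / 2
        if upper_tail(mid) > frac:
            lo = mid
        else:
            hi = mid
    t = (lo + hi) / 2
    selected = upper_tail(t)
    return sum(w * (1 - dist.cdf(t)) * g / 2 for g, w, dist in classes) / selected


p, a = 0.3, 0.2  # hypothetical QTL: 0.2 residual SDs per allele
for frac in (0.05, 0.10):
    print(f"top {frac:.0%}: selected-pool freq = {selected_pool_freq(p, a, frac):.3f} vs control {p}")
```

Selecting a more extreme tail enlarges the expected frequency shift but leaves fewer individuals in each pool. This sketch ignores that sampling noise, and it also ignores sequencing noise.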
Affiliation(s)
- Stuart J Macdonald
- Department of Molecular Biosciences, University of Kansas, Lawrence, KS 66045, USA
- Center for Computational Biology, University of Kansas, Lawrence, KS 66047, USA
- Dylan J Sims-West
- Department of Molecular Biosciences, University of Kansas, Lawrence, KS 66045, USA
- Anthony D Long
- Department of Ecology and Evolutionary Biology, University of California at Irvine, Irvine, CA 92697, USA

5
Khanzadeh H, Ghavi Hossein-Zadeh N, Ghovvati S. Statistical power and heritability in whole-genome association studies for quantitative traits. Meta Gene 2021. [DOI: 10.1016/j.mgene.2021.100869] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]

6
Baison J, Zhou L, Forsberg N, Mörling T, Grahn T, Olsson L, Karlsson B, Wu HX, Mellerowicz EJ, Lundqvist SO, García-Gil MR. Genetic control of tracheid properties in Norway spruce wood. Sci Rep 2020; 10:18089. [PMID: 33093525 PMCID: PMC7581746 DOI: 10.1038/s41598-020-72586-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/22/2019] [Accepted: 09/03/2020] [Indexed: 01/20/2023]
Abstract
Through the use of genome-wide association study (GWAS) mapping, it is possible to establish the genetic basis of phenotypic trait variation. Our study presents the first such effort in Norway spruce (Picea abies (L.) Karst.) for traits related to wood tracheid characteristics. The study employed an exome capture genotyping approach that generated 178 101 single nucleotide polymorphisms (SNPs) from 40 018 probes within a population of 517 Norway spruce mother trees. We applied a least absolute shrinkage and selection operator (LASSO)-based association mapping method using a functional multi-locus mapping approach, with a stability selection probability method as the hypothesis-testing approach to determine significant quantitative trait loci (QTLs). The analysis identified 30 significant associations, the majority of which show specific expression in wood-forming tissues or high ubiquitous expression, potentially controlling tracheid dimensions, cell wall thickness, and microfibril angle. Among the most promising candidates, based on our results and prior information for other species, are Picea abies BIG GRAIN 2 (PabBG2), with a predicted function in auxin transport and sensitivity, and MA_373300g0010, encoding a protein similar to wall-associated receptor kinases; both were associated with cell wall thickness. The results demonstrate the feasibility of GWAS for identifying novel candidate genes controlling industrially relevant tracheid traits in Norway spruce.
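In its generic form, stability selection refits a sparse model on many random half-samples and keeps the markers selected in a large fraction of them. The sketch below pairs that idea with a plain coordinate-descent LASSO on 0/1/2 genotype codes. It is not the functional multi-locus method used in the study. The penalty `lam = 0.2`, the 0.6 selection cutoff, and the simulated data are all hypothetical.

```python
import random


def lasso_cd(X, y, lam, n_iter=50):
    """LASSO by cyclic coordinate descent, minimising
    (1 / 2n) * ||y - X b||^2 + lam * ||b||_1.
    Assumes y and every column of X are centred, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta = 0 initially
    col_ss = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0:
                continue
            # Correlation of column j with the partial residual that excludes column j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = max(abs(rho) - lam, 0.0) * (1.0 if rho > 0 else -1.0) / col_ss[j]  # soft-threshold
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def centre(X, y):
    p = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(p)]
    y_bar = sum(y) / len(y)
    return [[row[j] - means[j] for j in range(p)] for row in X], [v - y_bar for v in y]


def stability_selection(X, y, lam, n_subsamples=50, seed=0):
    """Fraction of random half-samples in which each SNP gets a non-zero LASSO effect."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    hits = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        Xs, ys = centre([X[i] for i in idx], [y[i] for i in idx])
        for j, b in enumerate(lasso_cd(Xs, ys, lam)):
            if b != 0.0:
                hits[j] += 1
    return [h / n_subsamples for h in hits]


# Hypothetical data: 200 trees, 5 SNPs coded 0/1/2; only SNP 0 affects the trait.
rng = random.Random(42)
X = [[rng.choice((0, 1, 1, 2)) for _ in range(5)] for _ in range(200)]
y = [1.0 * row[0] + rng.gauss(0.0, 0.5) for row in X]
probs = stability_selection(X, y, lam=0.2)
print([f"{pr:.2f}" for pr in probs])
print("selected at a (hypothetical) 0.6 cutoff:", [j for j, pr in enumerate(probs) if pr >= 0.6])
```

A marker is kept only if it keeps reappearing across subsamples. This guards against a single LASSO fit picking up a spurious SNP by chance.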
Affiliation(s)
- J Baison
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Umeå, Sweden
- Linghua Zhou
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Umeå, Sweden
- Nils Forsberg
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Umeå, Sweden
- Tommy Mörling
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Umeå, Sweden
- Thomas Grahn
- RISE Bioeconomy, Box 5604, 114 86, Stockholm, Sweden
- Lars Olsson
- RISE Bioeconomy, Box 5604, 114 86, Stockholm, Sweden
- Bo Karlsson
- Skogforsk, Ekebo 2250, 268 90, Svalov, Sweden
- Harry X Wu
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Umeå, Sweden
- Ewa J Mellerowicz
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Umeå, Sweden
- Sven-Olof Lundqvist
- RISE Bioeconomy, Box 5604, 114 86, Stockholm, Sweden
- IIC, Rosenlundsgatan 48B, 11863, Stockholm, Sweden
- María Rosario García-Gil
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Umeå, Sweden

7
Linder RA, Majumder A, Chakraborty M, Long A. Two Synthetic 18-Way Outcrossed Populations of Diploid Budding Yeast with Utility for Complex Trait Dissection. Genetics 2020; 215:323-342. [PMID: 32241804 PMCID: PMC7268983 DOI: 10.1534/genetics.120.303202] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/06/2020] [Accepted: 03/31/2020] [Indexed: 02/07/2023]
Abstract
Advanced-generation multiparent populations (MPPs) are a valuable tool for dissecting complex traits, having more power than genome-wide association studies to detect rare variants and higher resolution than F2 linkage mapping. To extend the advantages of MPPs to budding yeast, we describe the creation and characterization of two outbred MPPs derived from 18 genetically diverse founding strains. We carried out de novo assemblies of the genomes of the 18 founder strains, such that virtually all variation segregating between these strains is known, and represented those assemblies as Santa Cruz Genome Browser tracks. We discovered complex patterns of structural variation segregating among the founders, including a large deletion within the vacuolar ATPase VMA1, several different deletions within the osmosensor MSB2, a series of deletions and insertions at PRM7 and the adjacent BSC1, as well as copy number variation at the dehydrogenase ALD2. Resequenced haploid recombinant clones from the two MPPs have a median unrecombined block size of 66 kb, demonstrating that the populations are highly recombined. We pool-sequenced the two MPPs to 3270× and 2226× coverage and demonstrated that we can accurately estimate local haplotype frequencies using pooled data. We further downsampled the pool-sequenced data to ∼20-40× and showed that local haplotype frequency estimates remained accurate, with median error rates of 0.8% and 0.6% at 20× and 40×, respectively. Haplotype frequencies are estimated much more accurately than SNP frequencies obtained directly from the same data. Deep sequencing of the two populations revealed that 10 or more founders are present at a detectable frequency for >98% of the genome, validating the utility of this resource for exploring the role of standing variation in the architecture of complex traits.
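Estimating haplotype frequencies from pooled data can be posed as a constrained regression. Within a short window, the pool's alternate-allele frequency at each SNP should equal the summed frequencies of the founders carrying that allele. The sketch below solves this by projected gradient descent onto the probability simplex. It assumes haploid (or fully homozygous) founder genotypes known without error and no recombination inside the window. This is one standard way to fit such a model, not necessarily the estimator the authors used, and the example data are invented.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {h : h_k >= 0, sum(h) = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - 1.0) / j
        if u_j - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def estimate_haplotype_freqs(G, f, n_iter=2000):
    """Least-squares founder haplotype frequencies from pooled allele frequencies.

    G -- G[i][k] = 1 if founder k carries the alternate allele at SNP i, else 0
    f -- observed alternate-allele frequency of the pool at each SNP
    """
    n_snps, k = len(G), len(G[0])
    step = 1.0 / sum(g * g for row in G for g in row)  # 1 / ||G||_F^2 <= 1 / Lipschitz constant
    h = [1.0 / k] * k
    for _ in range(n_iter):
        resid = [sum(G[i][m] * h[m] for m in range(k)) - f[i] for i in range(n_snps)]
        grad = [sum(G[i][m] * resid[i] for i in range(n_snps)) for m in range(k)]
        h = project_to_simplex([h[m] - step * grad[m] for m in range(k)])
    return h


# Hypothetical window: 4 founders, 14 SNPs covering every non-trivial presence/absence pattern.
G = [[(pattern >> k) & 1 for k in range(4)] for pattern in range(1, 15)]
true_h = [0.1, 0.2, 0.3, 0.4]
f = [sum(g * h for g, h in zip(row, true_h)) for row in G]
print([round(x, 3) for x in estimate_haplotype_freqs(G, f)])  # -> [0.1, 0.2, 0.3, 0.4]
```

Each haplotype frequency is informed by many SNPs at once. This is one intuition for why window-based haplotype estimates can be less noisy than the per-SNP frequencies they are built from.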
Affiliation(s)
- Robert A Linder
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California, Irvine, California 92697-2525
- Arundhati Majumder
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California, Irvine, California 92697-2525
- Mahul Chakraborty
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California, Irvine, California 92697-2525
- Anthony Long
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California, Irvine, California 92697-2525

8
Thornton KR. Polygenic Adaptation to an Environmental Shift: Temporal Dynamics of Variation Under Gaussian Stabilizing Selection and Additive Effects on a Single Trait. Genetics 2019; 213:1513-1530. [PMID: 31653678 PMCID: PMC6893385 DOI: 10.1534/genetics.119.302662] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 08/26/2019] [Accepted: 10/21/2019] [Indexed: 11/26/2022]
Abstract
Predictions about the effect of natural selection on patterns of linked neutral variation are largely based on models involving the rapid fixation of unconditionally beneficial mutations. However, when phenotypes adapt to a new optimum trait value, the strength of selection on individual mutations decreases as the population adapts. Here, I use explicit forward simulations of a single trait with additive-effect mutations adapting to an "optimum shift." Detectable "hitchhiking" patterns are only apparent if (i) the optimum shifts are large with respect to equilibrium variation for the trait, (ii) mutation rates to large-effect mutations are low, and (iii) large-effect mutations rapidly increase in frequency and eventually reach fixation, which typically occurs after the population reaches the new optimum. For the parameters simulated here, partial sweeps do not appreciably affect patterns of linked variation, even when the mutations are strongly selected. The contribution of new mutations vs. standing variation to fixation depends on the mutation rate affecting trait values. Given the fixation of a strongly selected variant, patterns of hitchhiking are similar on average for the two classes of sweeps because sweeps from standing variation involving large-effect mutations are rare when the optimum shifts. The distribution of effect sizes of new mutations has little effect on the time to reach the new optimum, but reducing the mutational variance increases the magnitude of hitchhiking patterns. In general, populations reach the new optimum prior to the completion of any sweeps, and the times to fixation are longer for this model than for standard models of directional selection. The long fixation times are due to a combination of declining selection pressures during adaptation and the possibility of interference among weakly selected sites for traits with high mutation rates.
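The claim that selection on individual mutations weakens as the population adapts follows from the shape of Gaussian stabilizing selection. Assume fitness w(z) = exp(-(z - θ)²/(2 V_S)) and normally distributed phenotypes with variance V_P around each genotype's mean. Then the selection coefficient of a carrier whose trait is shifted by a depends on how far the population mean z̄ still lies from the optimum θ. This is a textbook calculation, not the simulation code used in the paper, and the parameter values are hypothetical.

```python
import math


def carrier_selection_coefficient(a, z_bar, theta, v_s, v_p):
    """Selection coefficient of a carrier whose trait value is shifted by `a`.

    Assumes Gaussian fitness w(z) = exp(-(z - theta)^2 / (2 v_s)) and normally
    distributed phenotypes with variance v_p around the mean of each genotype, so
    mean fitness of a group with mean m is proportional to
    exp(-(m - theta)^2 / (2 (v_s + v_p))).
    """
    before = (z_bar - theta) ** 2
    after = (z_bar + a - theta) ** 2
    return math.exp((before - after) / (2 * (v_s + v_p))) - 1


# Hypothetical values: the optimum jumps from 0 to 1; v_s = 1, v_p = 0.1; a = +0.1.
for z_bar in (0.0, 0.25, 0.5, 0.75, 1.0):
    s = carrier_selection_coefficient(0.1, z_bar, theta=1.0, v_s=1.0, v_p=0.1)
    print(f"mean trait {z_bar:.2f}: s = {s:+.4f}")
```

The exponent equals 2a(θ - z̄) - a² divided by 2(V_S + V_P). The beneficial part therefore shrinks linearly as z̄ approaches θ. Once the new optimum is reached, the same mutation is weakly deleterious.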
Affiliation(s)
- Kevin R Thornton
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697

9
Kono TJY, Liu C, Vonderharr EE, Koenig D, Fay JC, Smith KP, Morrell PL. The Fate of Deleterious Variants in a Barley Genomic Prediction Population. Genetics 2019; 213:1531-1544. [PMID: 31653677 PMCID: PMC6893365 DOI: 10.1534/genetics.119.302733] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/07/2019] [Accepted: 10/11/2019] [Indexed: 02/07/2023]
Abstract
Targeted identification and purging of deleterious genetic variants has been proposed as a novel approach to animal and plant breeding. This strategy is motivated, in part, by the observation that demographic events and strong selection associated with cultivated species pose a "cost of domestication," which includes an increase in the proportion of genetic variants that are likely to reduce fitness. Recent advances in DNA resequencing and sequence constraint-based approaches to predict the functional impact of a mutation permit the identification of putatively deleterious SNPs (dSNPs) on a genome-wide scale. Using exome capture resequencing of 21 barley lines, we identified 3855 dSNPs among 497,754 total SNPs. We generated whole-genome resequencing data of Hordeum murinum ssp. glaucum as a phylogenetic outgroup to polarize SNPs as ancestral vs. derived. We also observed a higher proportion of dSNPs per synonymous SNP (sSNP) in low-recombination regions of the genome. Using 5215 progeny from a genomic prediction experiment, we examined the fate of dSNPs over three breeding cycles. Adjusting for initial frequency, derived alleles at dSNPs decreased in frequency or were lost more often than other classes of SNPs. The highest-yielding lines in the experiment, as chosen by standard genomic prediction approaches, carried fewer homozygous dSNPs than randomly sampled lines from the same progeny cycle. In the final cycle of the experiment, progeny selected by genomic prediction had a mean of 5.6% fewer homozygous dSNPs than randomly chosen progeny from the same cycle.
Affiliation(s)
- Thomas J Y Kono
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
- Chaochih Liu
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
- Emily E Vonderharr
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
- Daniel Koenig
- Department of Botany and Plant Sciences, University of California, Riverside, California 92521
- Justin C Fay
- Department of Biology, University of Rochester, New York 14627
- Kevin P Smith
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
- Peter L Morrell
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108

10
Structural variants exhibit widespread allelic heterogeneity and shape variation in complex traits. Nat Commun 2019; 10:4872. [PMID: 31653862 PMCID: PMC6814777 DOI: 10.1038/s41467-019-12884-1] [Citation(s) in RCA: 104] [Impact Index Per Article: 17.3] [Received: 12/27/2018] [Accepted: 09/25/2019] [Indexed: 12/11/2022]
Abstract
It has been hypothesized that individually rare hidden structural variants (SVs) could account for a significant fraction of variation in complex traits. Here we identified more than 20,000 euchromatic SVs from 14 Drosophila melanogaster genome assemblies, of which ~40% are invisible to high-specificity short-read genotyping approaches. SVs are common, with 31.5% of diploid individuals harboring an SV in genes larger than 5 kb, and 24% harboring multiple SVs in genes larger than 10 kb. SVs have lower minor allele frequencies than amino acid polymorphisms, suggesting that SVs are more deleterious. We show that a number of functionally important genes harbor previously hidden structural variants likely to affect complex phenotypes. Furthermore, SVs are overrepresented in candidate genes associated with quantitative trait loci mapped using the Drosophila Synthetic Population Resource. We conclude that SVs are ubiquitous, frequently constitute a heterogeneous allelic series, and can act as rare alleles of large effect.

11
O'Connor LJ, Schoech AP, Hormozdiari F, Gazal S, Patterson N, Price AL. Extreme Polygenicity of Complex Traits Is Explained by Negative Selection. Am J Hum Genet 2019; 105:456-476. [PMID: 31402091 PMCID: PMC6732528 DOI: 10.1016/j.ajhg.2019.07.003] [Citation(s) in RCA: 147] [Impact Index Per Article: 24.5] [Received: 12/19/2018] [Accepted: 07/03/2019] [Indexed: 12/16/2022]
Abstract
Complex traits and common diseases are extremely polygenic, with their heritability spread across thousands of loci. One possible explanation is that thousands of genes and loci have similarly important biological effects when mutated. However, we hypothesize that for most complex traits, relatively few genes and loci are critical, and negative selection, by purging large-effect mutations in these regions, leaves behind common-variant associations in thousands of less critical regions instead. We refer to this phenomenon as flattening. To quantify its effects, we introduce a mathematical definition of polygenicity, the effective number of independently associated SNPs (Me), which describes how evenly the heritability of a trait is spread across the genome. We developed a method, stratified LD fourth moments regression (S-LD4M), to estimate Me and validated in simulations that it produces robust estimates. Analyzing 33 complex traits (average N = 361k), we determined that heritability is spread ∼4× more evenly among common SNPs than among low-frequency SNPs. This difference, together with evolutionary modeling of new mutations, suggests that complex traits would be orders of magnitude less polygenic if not for the influence of negative selection. We also determined that heritability is spread more evenly within functionally important regions in proportion to their heritability enrichment; because of selective constraint, functionally important regions do not harbor common SNPs with greatly increased causal effect sizes. Our results suggest that for most complex traits, the genes and loci with the most critical biological effects often differ from those with the strongest common-variant associations.
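How "evenly" heritability is spread can be made concrete with a participation-ratio count, (Σ h_j)² / Σ h_j². It equals k when k SNPs contribute equally and falls towards 1 as a few SNPs dominate. This captures the intuition behind Me, but treating it as Me is an assumption here. It is not the S-LD4M estimator, and the paper's exact definition of Me may differ in normalisation. The per-SNP heritabilities below are invented.

```python
def effective_number(h2_per_snp):
    """Participation-ratio measure of how evenly heritability is spread:
    (sum h_j)^2 / sum h_j^2, which equals k when k SNPs contribute equally."""
    total = sum(h2_per_snp)
    return total ** 2 / sum(h * h for h in h2_per_snp)


even = [0.5 / 1000] * 1000            # h2 = 0.5 spread evenly over 1,000 SNPs
skewed = [0.25] + [0.25 / 999] * 999  # same h2, but one SNP carries half of it
print(round(effective_number(even)), round(effective_number(skewed), 1))  # -> 1000 4.0
```

Total heritability is identical in the two cases. The evenness count nevertheless differs by a factor of about 250, which is the kind of distinction a polygenicity measure is meant to capture.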
Affiliation(s)
- Luke J O'Connor
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Bioinformatics and Integrative Genomics, Harvard Graduate School of Arts and Sciences, Boston, MA 02115, USA.
- Armin P Schoech
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Farhad Hormozdiari
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Nick Patterson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.

12
Oliynyk RT. Evaluating the Potential of Younger Cases and Older Controls Cohorts to Improve Discovery Power in Genome-Wide Association Studies of Late-Onset Diseases. J Pers Med 2019; 9:jpm9030038. [PMID: 31336617 PMCID: PMC6789773 DOI: 10.3390/jpm9030038] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/14/2019] [Revised: 07/15/2019] [Accepted: 07/16/2019] [Indexed: 11/25/2022]
Abstract
For more than a decade, genome-wide association studies have been making steady progress in discovering the causal gene variants that contribute to late-onset human diseases. Polygenic late-onset diseases in an aging population display a decrease in risk allele frequency at older ages. This decrease arises because individuals with higher polygenic risk scores become ill proportionately earlier, which changes the distribution of risk alleles between new cases and the as-yet-unaffected population. This phenomenon is most prominent for diseases characterized by high cumulative incidence and high heritability, examples of which include Alzheimer’s disease, coronary artery disease, cerebral stroke, and type 2 diabetes, while for late-onset diseases with relatively lower prevalence and heritability, exemplified by cancers, the effect is considerably weaker. Here, computer simulations demonstrate that genome-wide association studies of late-onset polygenic diseases showing high cumulative incidence together with high initial heritability will benefit from using the youngest possible age-matched cohorts. Moreover, rather than using age-matched cohorts, study cohorts combining the youngest possible cases with the oldest possible controls may significantly improve the discovery power of genome-wide association studies.
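A small proportional-hazards calculation reproduces the mechanism. The sketch assumes a standard-normal PRS, a Gompertz baseline incidence (exponentially increasing with age), a hazard multiplied by exp(c · PRS), and no competing mortality. Every parameter value is hypothetical, and this is not the author's simulation framework. Because high-PRS individuals are diagnosed earlier, the mean PRS falls with age both among new cases and among people still unaffected.

```python
import math

GRID = [i / 100 for i in range(-600, 601)]      # PRS values in SD units
DENSITY = [math.exp(-x * x / 2) for x in GRID]  # unnormalised N(0, 1) weights


def cum_hazard(age, x, lam0=1e-4, b=0.08, c=1.0):
    """Cumulative hazard by `age` for PRS x: Gompertz baseline times exp(c * x).
    lam0, b and c are hypothetical values, not estimates for any real disease."""
    return lam0 / b * (math.exp(b * age) - 1.0) * math.exp(c * x)


def mean_prs_unaffected(age, lam0=1e-4, b=0.08, c=1.0):
    """Mean PRS among people still unaffected at `age` (potential controls)."""
    weights = [w * math.exp(-cum_hazard(age, x, lam0, b, c)) for x, w in zip(GRID, DENSITY)]
    return sum(x * w for x, w in zip(GRID, weights)) / sum(weights)


def mean_prs_new_cases(age, lam0=1e-4, b=0.08, c=1.0):
    """Mean PRS among people diagnosed at `age`: unaffected survival times the hazard."""
    weights = [w * lam0 * math.exp(b * age + c * x) * math.exp(-cum_hazard(age, x, lam0, b, c))
               for x, w in zip(GRID, DENSITY)]
    return sum(x * w for x, w in zip(GRID, weights)) / sum(weights)


young, old = 50, 85
print("age-matched (old):", round(mean_prs_new_cases(old) - mean_prs_unaffected(old), 3))
print("young cases vs old controls:", round(mean_prs_new_cases(young) - mean_prs_unaffected(old), 3))
```

Under these assumptions, contrasting young cases with old controls gives a larger case-control PRS difference than an age-matched comparison at an older age. This is the direction of effect described above.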
Affiliation(s)
- Roman Teo Oliynyk
- Centre for Computational Evolution, University of Auckland, Auckland 1010, New Zealand.
- Department of Computer Science, University of Auckland, Auckland 1010, New Zealand.

13
López-Cortegano E, Caballero A. Inferring the Nature of Missing Heritability in Human Traits Using Data from the GWAS Catalog. Genetics 2019; 212:891-904. [PMID: 31123044 PMCID: PMC6614893 DOI: 10.1534/genetics.119.302077] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 03/05/2019] [Accepted: 05/11/2019] [Indexed: 02/07/2023]
Abstract
Thousands of genes responsible for many diseases and other common traits in humans have been detected by genome-wide association studies (GWAS) in the last decade. However, candidate causal variants found so far usually explain only a small fraction of the heritability estimated from family data. The most common explanation for this observation is that the missing heritability corresponds to variants, either rare or common, with very small effects, which pass undetected due to a lack of statistical power. We carried out a meta-analysis using data from the NHGRI-EBI GWAS Catalog in order to explore the observed distribution of locus effects for a set of 42 complex traits and to quantify their contribution to narrow-sense heritability. With the data at hand, we were able to predict the expected distribution of locus effects for 16 traits and diseases, their expected contribution to heritability, and the number of loci yet to be discovered to fully explain the familial heritability estimates. Our results indicate that, for 6 of the 16 traits, the additive contribution of a great number of loci is unable to explain the familial (broad-sense) heritability, suggesting that the gap between GWAS and familial estimates of heritability may never be closed for these traits. In contrast, for the other 10 traits, the additive contribution of hundreds or thousands of loci yet to be found could potentially explain the familial heritability estimates. Computer simulations are used to illustrate the possible contribution of nonadditive genetic effects to the gap between GWAS and familial estimates of heritability.
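The quantity at stake in such comparisons is the additive variance each locus contributes. Consider a biallelic locus in Hardy-Weinberg equilibrium with allele frequency p and per-allele effect β. Its share of phenotypic variance is 2p(1 - p)β²/V_P, and summing over detected loci gives the GWAS-explained heritability to set against the familial estimate. The catalogue entries and familial value below are hypothetical. The paper's actual procedure, fitting a distribution of locus effects, is more involved than this sum.

```python
def locus_h2(p, beta, var_p=1.0):
    """Share of phenotypic variance from one additive biallelic locus:
    2 p (1 - p) beta^2 / Vp, where beta is the per-allele effect."""
    return 2 * p * (1 - p) * beta ** 2 / var_p


# Hypothetical catalogue hits: (risk-allele frequency, per-allele effect in phenotypic SDs).
hits = [(0.40, 0.05), (0.25, 0.04), (0.10, 0.06), (0.02, 0.10)]
h2_gwas = sum(locus_h2(p, b) for p, b in hits)
h2_family = 0.5  # hypothetical familial estimate
print(f"explained: {h2_gwas:.4f}, missing: {h2_family - h2_gwas:.4f}")
```

The factor 2p(1 - p) matters here: a rare variant with a sizeable effect can still contribute little variance. This is one reason rare and small-effect loci are both candidate explanations for the gap.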
Affiliation(s)
- Armando Caballero
- Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310, Spain
14
Oliynyk RT. Age-related late-onset disease heritability patterns and implications for genome-wide association studies. PeerJ 2019; 7:e7168. [PMID: 31231601 PMCID: PMC6573810 DOI: 10.7717/peerj.7168] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 09/04/2018] [Accepted: 05/22/2019] [Indexed: 01/06/2023] Open
Abstract
Genome-wide association studies (GWASs) and other computational biology techniques are gradually discovering the causal gene variants that contribute to late-onset human diseases. After more than a decade of genome-wide association study efforts, these can account for only a fraction of the heritability implied by familial studies, the so-called "missing heritability" problem. Computer simulations of polygenic late-onset diseases (LODs) in an aging population have quantified the risk allele frequency decrease at older ages caused by individuals with higher polygenic risk scores (PRSs) becoming ill proportionately earlier. This effect is most prominent for diseases characterized by high cumulative incidence and high heritability, examples of which include Alzheimer's disease, coronary artery disease, cerebral stroke, and type 2 diabetes. The incidence rate for LODs grows exponentially for decades after early onset ages, guaranteeing that the cohorts used for GWASs overrepresent older individuals with lower PRSs, whose disease cases are disproportionately due to environmental causes such as old age itself. This mechanism explains the decline in clinical predictive power with age and the lower discovery power of familial studies of heritability and GWASs. It also explains the relatively constant-with-age heritability found for LODs of lower prevalence, exemplified by cancers.
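A deterministic toy calculation can make the depletion mechanism concrete. The sketch makes two assumptions: a standard-normal PRS, and a Gompertz baseline hazard lam0·exp(gamma·age) scaled by exp(beta·PRS), i.e. proportional hazards. These choices and the parameter values lam0, gamma and beta are hypothetical and are not the author's simulation framework. The calculation shows that the mean PRS falls with age, both among newly diagnosed cases and among those still unaffected, because high-PRS individuals are removed from the at-risk pool early.

```python
"""Toy model: why older cases and controls carry lower polygenic risk.

Assumptions (illustrative only): PRS ~ N(0, 1); the hazard at a given age is
lam0 * exp(gamma * age) * exp(beta * prs), a Gompertz baseline with
proportional hazards. Integrals over PRS are approximated on a fixed grid.
"""
import math


def _prs_grid(lo=-5.0, hi=5.0, step=0.01):
    """Grid of PRS values with unnormalized standard-normal weights."""
    n = int(round((hi - lo) / step))
    xs = [lo + i * step for i in range(n + 1)]
    ws = [math.exp(-0.5 * x * x) for x in xs]
    return xs, ws


def _cumulative_hazard(age, prs, lam0, gamma, beta):
    """Integral of the hazard from birth to `age` for a given PRS."""
    return lam0 * math.exp(beta * prs) * math.expm1(gamma * age) / gamma


def mean_prs_unaffected(age, lam0=1e-5, gamma=0.1, beta=1.0):
    """Mean PRS among individuals still disease-free at `age`."""
    xs, ws = _prs_grid()
    num = den = 0.0
    for x, w in zip(xs, ws):
        survival = math.exp(-_cumulative_hazard(age, x, lam0, gamma, beta))
        num += x * w * survival
        den += w * survival
    return num / den


def mean_prs_new_cases(age, lam0=1e-5, gamma=0.1, beta=1.0):
    """Mean PRS among individuals diagnosed at exactly `age`."""
    xs, ws = _prs_grid()
    num = den = 0.0
    for x, w in zip(xs, ws):
        hazard = lam0 * math.exp(gamma * age) * math.exp(beta * x)
        survival = math.exp(-_cumulative_hazard(age, x, lam0, gamma, beta))
        density = w * hazard * survival  # incidence density at this age
        num += x * density
        den += density
    return num / den


if __name__ == "__main__":
    for age in (40, 60, 80, 100):
        print(age, round(mean_prs_new_cases(age), 3), round(mean_prs_unaffected(age), 3))
```

Under these assumptions, a case-control sample drawn at older ages contains cases with lower PRS than cases drawn at younger ages. This is consistent with the reduced discovery power described in the abstract.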
Affiliation(s)
- Roman Teo Oliynyk
- Centre for Computational Evolution, University of Auckland, Auckland, New Zealand
- Department of Computer Science, University of Auckland, Auckland, New Zealand
15
Sokolowski M, Wasserman J, Wasserman D. Gene-level associations in suicide attempter families show overrepresentation of synaptic genes and genes differentially expressed in brain development. Am J Med Genet B Neuropsychiatr Genet 2018; 177:774-784. [PMID: 30381879 DOI: 10.1002/ajmg.b.32694] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 06/03/2017] [Revised: 09/10/2018] [Accepted: 09/21/2018] [Indexed: 01/23/2023]
Abstract
Suicidal behavior (SB) has a complex etiology involving different polygenic and environmental components. Here we used an excess of significant markers (ESM) test to study gene-level associations in SNP data from a previous genome-wide association study (GWAS) of a family-based sample, with medically severe suicide attempt (SA) in the offspring as the main outcome. In SA without major psychiatric disorders (N = 498), a screening of 5,316 genes across the genome suggested association of 17 genes (at fdr < 0.05). Genes RETREG1 (a.k.a. FAM134B), GSN, GNAS, and CACNA1D were particularly robust to different methodological variations. Comparison with the more widely used Multi-marker Analysis of GenoMic Annotation (MAGMA) method mainly supported RETREG1, GSN, RNASEH2B, UBE2H, and CACNA1D under the "mean" model, and MAGMA ranked 13 of the same genes as ESM among its top 17. Complementing the ESM screen by using MAGMA to analyze 17,899 genes, we observed an excess of genes with p < .05 under the "top" model, and the "mean" model suggested additional genes with genome-wide fdr < 0.25. Overrepresentation analysis of 10 selected gene sets using all genes with p < .05 showed significant results for synaptic genes, for genes differentially expressed in brain development, and for ~12% of the SA polygenic association genes identified previously in this sample. Exploratory analysis linked some of the ESM top-17 genes to psychotropic drugs, and we examined allelic heterogeneity in the previous SA candidate gene GRIN2B. This study complements previous GWAS of SB outcomes, implicating both previous candidate genes (e.g., GRIN2B and GNAS) and novel genes in SA outcomes, as well as synaptic function and brain development.
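The abstract reports gene-level results at fdr < 0.05 but does not say which false discovery rate procedure produced them. One common choice is the Benjamini-Hochberg step-up procedure, sketched below as adjusted p-values. Treating it as the study's method would be an assumption. The function name and example p-values are hypothetical.

```python
"""Benjamini-Hochberg adjusted p-values (one common FDR procedure).

Illustrative sketch; the function name and example p-values are hypothetical.
"""
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values in the original order.

    A test is declared significant at FDR level alpha when its adjusted
    value is <= alpha.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.2]
    q = bh_adjust(p)
    print([round(v, 4) for v in q])  # [0.04, 0.0533, 0.0533, 0.2]
    print([v < 0.05 for v in q])     # [True, False, False, False]
```

The same function applies to a more permissive threshold by comparing the adjusted values with 0.20 instead of 0.05.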
Affiliation(s)
- Marcus Sokolowski
- National Centre for Suicide Research and Prevention of Mental Ill-Health (NASP), Karolinska Institute (KI), Stockholm, Sweden
- Jerzy Wasserman
- National Centre for Suicide Research and Prevention of Mental Ill-Health (NASP), Karolinska Institute (KI), Stockholm, Sweden
- Danuta Wasserman
- National Centre for Suicide Research and Prevention of Mental Ill-Health (NASP), Karolinska Institute (KI), Stockholm, Sweden
- WHO Collaborating Centre for Research, Methods, Development and Training in Suicide Prevention, Stockholm, Sweden
16
Kono TJY, Lei L, Shih CH, Hoffman PJ, Morrell PL, Fay JC. Comparative Genomics Approaches Accurately Predict Deleterious Variants in Plants. G3 (BETHESDA, MD.) 2018; 8:3321-3329. [PMID: 30139765 PMCID: PMC6169392 DOI: 10.1534/g3.118.200563] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 07/08/2018] [Accepted: 08/10/2018] [Indexed: 12/11/2022]
Abstract
Recent advances in genome resequencing have led to increased interest in prediction of the functional consequences of genetic variants. Variants at phylogenetically conserved sites are of particular interest, because they are more likely than variants at phylogenetically variable sites to have deleterious effects on fitness and contribute to phenotypic variation. Numerous comparative genomic approaches have been developed to predict deleterious variants, but the approaches are nearly always assessed based on their ability to identify known disease-causing mutations in humans. Determining the accuracy of deleterious variant predictions in nonhuman species is important to understanding evolution, domestication, and potentially to improving crop quality and yield. To examine our ability to predict deleterious variants in plants we generated a curated database of 2,910 Arabidopsis thaliana mutants with known phenotypes. We evaluated seven approaches and found that while all performed well, their relative ranking differed from prior benchmarks in humans. We conclude that deleterious mutations can be reliably predicted in A. thaliana and likely other plant species, but that the relative performance of various approaches does not necessarily translate from one species to another.
Affiliation(s)
- Thomas J Y Kono
- Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN 55108
- Li Lei
- Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN 55108
- Ching-Hua Shih
- Department of Genetics, Washington University, St. Louis, MO 63110
- Paul J Hoffman
- Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN 55108
- Peter L Morrell
- Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN 55108
- Justin C Fay
- Department of Genetics, Washington University, St. Louis, MO 63110
17
From Inflammation to Current and Alternative Therapies Involved in Wound Healing. Int J Inflam 2017; 2017:3406215. [PMID: 28811953 PMCID: PMC5547704 DOI: 10.1155/2017/3406215] [Citation(s) in RCA: 93] [Impact Index Per Article: 11.6] [Received: 03/28/2017] [Revised: 06/01/2017] [Accepted: 06/06/2017] [Indexed: 02/08/2023] Open
Abstract
Wound healing is a complex event that develops in three overlapping phases: inflammatory, proliferative, and remodeling. These phases are distinct in function and histological characteristics. However, they depend on the interaction of cytokines, growth factors, chemokines, and chemical mediators from cells to perform regulatory events. In this article, we review the pathways of the skin healing cascade, relating the major chemical, cellular, and molecular inflammatory mediators, and describe the local and systemic factors that interfere with healing and the disorders associated with deficient tissue repair. Finally, we discuss current therapeutic interventions in wound treatment and alternative therapies that have shown promising results in the development of new products with healing potential.
18
The Beavis Effect in Next-Generation Mapping Panels in Drosophila melanogaster. G3-GENES GENOMES GENETICS 2017; 7:1643-1652. [PMID: 28592647 PMCID: PMC5473746 DOI: 10.1534/g3.117.041426] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Indexed: 12/18/2022]
Abstract
A major goal in the analysis of complex traits is to partition the observed genetic variation in a trait into components due to individual loci and perhaps variants within those loci. However, in both QTL mapping and genetic association studies, the estimated percent variation attributable to a QTL is upwardly biased conditional on it being discovered. This bias was first described in two-way QTL mapping experiments by William Beavis, and has been referred to extensively as “the Beavis effect.” The Beavis effect is likely to occur in multiparent population (MPP) panels as well as collections of sequenced lines used for genome-wide association studies (GWAS). However, the strength of the Beavis effect is unknown, and often implicitly assumed to be negligible, when “hits” are obtained from an association panel consisting of hundreds of inbred lines tested across millions of SNPs, or in multiparent mapping populations where mapping involves fitting a complex statistical model with several d.f. at thousands of genetic intervals. To estimate the size of the effect in more complex panels, we performed simulations of both biallelic and multiallelic QTL in two major Drosophila melanogaster mapping panels: the GWAS-based Drosophila Genetic Reference Panel (DGRP), and the MPP-based Drosophila Synthetic Population Resource (DSPR). Our results show that overestimation is determined most strongly by sample size and is only minimally impacted by the mapping design. When < 100, 200, 500, and 1000 lines are employed, the variance attributable to hits is inflated by factors of 6, 3, 1.5, and 1.1, respectively, for a QTL that truly contributes 5% to the variation in the trait. This overestimation indicates that QTL could be difficult to validate in follow-up replication experiments where additional individuals are examined. Further, QTL could be difficult to cross-validate between the two Drosophila resources.
We provide guidelines for: (1) the sample sizes necessary to accurately estimate the percent variance attributable to an identified QTL, (2) the conditions under which one is likely to replicate a mapped QTL in a second study using the same mapping population, and (3) the conditions under which a QTL mapped in one mapping panel is likely to replicate in the other (DGRP and DSPR).
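A small Monte Carlo sketch can reproduce the core of the Beavis effect: variance explained is overestimated conditional on discovery. The sketch rests on several assumptions. A single QTL explains 5% of trait variance. A standardized continuous genotype score replaces real DGRP or DSPR genotypes. A stringent |t| > 5 threshold stands in for a genome-wide significance level. The function names and settings are hypothetical, and the sketch is not expected to reproduce the paper's inflation factors exactly.

```python
"""Monte Carlo sketch of the Beavis effect (winner's curse for QTL effects).

Assumptions (illustrative, not the study's design): one QTL explaining
`true_r2` of trait variance, a standardized continuous genotype score, and
a discovery rule |t| > t_threshold for the correlation test.
"""
import math
import random


def _pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def beavis_inflation(n, true_r2=0.05, t_threshold=5.0, reps=2000, seed=1):
    """Return (power, mean estimated r^2 among hits divided by true_r2)."""
    rng = random.Random(seed)
    beta = math.sqrt(true_r2)           # genotype effect with var(x) = 1
    resid_sd = math.sqrt(1.0 - true_r2)  # keeps var(y) = 1
    hit_r2 = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [beta * xi + rng.gauss(0.0, resid_sd) for xi in x]
        r = _pearson(x, y)
        t = r * math.sqrt((n - 2) / (1.0 - r * r))
        if abs(t) > t_threshold:  # the QTL is "discovered" in this replicate
            hit_r2.append(r * r)
    power = len(hit_r2) / reps
    if not hit_r2:
        return power, float("nan")
    return power, (sum(hit_r2) / len(hit_r2)) / true_r2


if __name__ == "__main__":
    for n in (100, 200, 500, 1000):
        power, inflation = beavis_inflation(n, reps=300)
        print(n, round(power, 3), round(inflation, 2))
```

Every hit must clear the threshold, so at small n the mean r² among hits cannot fall below the critical r². With |t| > 5 at n = 100, that critical value is 25/(25 + 98), about 0.20. This lower bound is why inflation is severe in small panels and shrinks as sample size grows.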
19
Loci Contributing to Boric Acid Toxicity in Two Reference Populations of Drosophila melanogaster. G3-GENES GENOMES GENETICS 2017; 7:1631-1641. [PMID: 28592646 PMCID: PMC5473745 DOI: 10.1534/g3.117.041418] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 12/12/2022]
Abstract
Populations maintain considerable segregating variation in the response to toxic, xenobiotic compounds. To identify variants associated with resistance to boric acid, a commonly-used household insecticide with a poorly understood mechanism of action, we assayed thousands of individuals from hundreds of strains. Using the Drosophila Synthetic Population Resource (DSPR), a multi-parental population (MPP) of inbred genotypes, we mapped six QTL to short genomic regions containing few protein-coding genes (3–188), allowing us to identify plausible candidate genes underlying resistance to boric acid toxicity. One interval contains multiple genes from the cytochrome P450 family, and we show that ubiquitous RNAi of one of these genes, Cyp9b2, markedly reduces resistance to the toxin. Resistance to boric acid is positively correlated with caffeine resistance. The two phenotypes additionally share a pair of QTL, potentially suggesting a degree of pleiotropy in the genetic control of resistance to these two distinct xenobiotics. Finally, we screened the Drosophila Genetic Reference Panel (DGRP) in an attempt to identify sequence variants within mapped QTL that are associated with boric acid resistance. The approach was largely unsuccessful, with only one QTL showing any associations at QTL-specific 20% False Discovery Rate (FDR) thresholds. Nonetheless, these associations point to a potential candidate gene that can be targeted in future validation efforts. Although the mapping data resulting from the two reference populations do not clearly overlap, our work provides a starting point for further genetic dissection of the processes underlying boric acid toxicity in insects.
20
Josephs EB, Stinchcombe JR, Wright SI. What can genome-wide association studies tell us about the evolutionary forces maintaining genetic variation for quantitative traits? THE NEW PHYTOLOGIST 2017; 214:21-33. [PMID: 28211582 DOI: 10.1111/nph.14410] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 09/30/2016] [Accepted: 11/14/2016] [Indexed: 05/27/2023]
Abstract
SUMMARY: Understanding the evolutionary forces that shape genetic variation within species has long been a goal of evolutionary biology. Integrating data for the genetic architecture of traits from genome-wide association mapping studies (GWAS) along with the development of new population genetic methods for identifying selection in sequence data may allow us to evaluate the roles of mutation-selection balance and balancing selection in shaping genetic variation at various scales. Here, we review the theoretical predictions for genetic architecture and additional signals of selection on genomic sequence for the loci that affect traits. Next, we review how plant GWAS have tested for the signatures of various selective scenarios. Limited evidence to date suggests that within-population variation is maintained primarily by mutation-selection balance while variation across the landscape is the result of local adaptation. However, there are a number of inherent biases in these interpretations. We highlight these challenges and suggest ways forward to further understanding of the maintenance of variation.
Affiliation(s)
- Emily B Josephs
- Department of Evolution and Ecology, University of California, Davis, One Shields Avenue, Davis, CA, 95616, USA
- John R Stinchcombe
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcocks St., Toronto, ON, M5S 3B2, Canada
- Stephen I Wright
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcocks St., Toronto, ON, M5S 3B2, Canada
21
Excess of genomic defects in a woolly mammoth on Wrangel island. PLoS Genet 2017; 13:e1006601. [PMID: 28253255 PMCID: PMC5333797 DOI: 10.1371/journal.pgen.1006601] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Received: 07/20/2016] [Accepted: 01/24/2017] [Indexed: 01/31/2023] Open
Abstract
Woolly mammoths (Mammuthus primigenius) populated Siberia, Beringia, and North America during the Pleistocene and early Holocene. Recent breakthroughs in ancient DNA sequencing have allowed complete genome sequencing of two woolly mammoth specimens (Palkopoulou et al. 2015). One specimen is from a mainland population 45,000 years ago, when mammoths were plentiful. The second, a 4,300-year-old specimen, derives from an isolated population on Wrangel Island, where mammoths subsisted with an effective population size more than 43-fold lower than that of earlier populations. These extreme differences in effective population size offer a rare opportunity to test nearly neutral models of genome architecture evolution within a single species. Using these previously published mammoth sequences, we identify deletions, retrogenes, and non-functionalizing point mutations. In the Wrangel Island mammoth, we identify a greater number of deletions, a larger proportion of deletions affecting gene sequences, a greater number of candidate retrogenes, and an increased number of premature stop codons. This accumulation of detrimental mutations is consistent with genomic meltdown in response to low effective population sizes in the dwindling mammoth population on Wrangel Island. In addition, we observe high rates of loss of olfactory receptors and urinary proteins, either because these loci are non-essential or because their loss was favored by divergent selective pressures in island environments. Finally, at the FOXQ1 locus we observe two independent loss-of-function mutations, which would confer a satin coat phenotype in this island woolly mammoth.
We observe an excess of detrimental mutations, consistent with genomic meltdown in woolly mammoths on Wrangel Island just prior to extinction. Specifically, we observe an excess of deletions, an increased proportion of deletions affecting gene sequences, and an excess of premature stop codons in response to evolution under low effective population sizes. Large numbers of olfactory receptor genes appear to carry loss-of-function mutations in the island mammoth. These results offer genetic support within a single species for nearly neutral theories of genome evolution. We also observe two independent loss-of-function mutations at the FOXQ1 locus, likely conferring a satin coat in this unusual woolly mammoth.
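Nearly neutral reasoning holds that selection against mildly deleterious mutations weakens as effective population size falls. Kimura's diffusion approximation for the fixation probability of a new mutation illustrates this. The sketch assumes genic (additive) selection with fitnesses 1, 1 + s, 1 + 2s and an effective size equal to the census size N. The population sizes and selection coefficient in the usage lines are hypothetical round numbers chosen to differ about 40-fold. They are not estimates for the mammoth populations.

```python
"""Kimura fixation probability of a new mutation (diffusion approximation).

Assumes genic selection (fitnesses 1, 1 + s, 1 + 2s) and Ne = N, so a new
mutation starts at frequency p = 1 / (2N). Negative s means deleterious.
"""
import math


def fixation_probability(N: int, s: float) -> float:
    """u = (1 - exp(-2s)) / (1 - exp(-4Ns)); equals 1/(2N) when s == 0."""
    if s == 0.0:
        return 1.0 / (2 * N)
    x = -4.0 * N * s
    numerator = -math.expm1(-2.0 * s)  # 1 - exp(-2s)
    if x > 700.0:
        # Strongly deleterious: 1 - exp(x) is about -exp(x); avoid overflow.
        return -numerator * math.exp(-x)
    return numerator / (-math.expm1(x))  # divide by 1 - exp(-4Ns)


def relative_substitution_rate(N: int, s: float) -> float:
    """Substitution rate relative to a neutral mutation (u divided by 1/(2N))."""
    return 2 * N * fixation_probability(N, s)


if __name__ == "__main__":
    s = -1e-3  # hypothetical mildly deleterious effect
    for N in (500, 20_000):  # hypothetical sizes about 40-fold apart
        print(N, relative_substitution_rate(N, s))
```

With these illustrative numbers, the mutation fixes at roughly a third of the neutral rate in the small population, where 4N|s| = 2. In the large population it is effectively never fixed. Greater fixation of mildly deleterious mutations at small N is the kind of pattern the genomic meltdown interpretation relies on.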
22
A Model of Compound Heterozygous, Loss-of-Function Alleles Is Broadly Consistent with Observations from Complex-Disease GWAS Datasets. PLoS Genet 2017; 13:e1006573. [PMID: 28103232 PMCID: PMC5289629 DOI: 10.1371/journal.pgen.1006573] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 04/18/2016] [Revised: 02/02/2017] [Accepted: 01/05/2017] [Indexed: 12/17/2022] Open
Abstract
The genetic component of complex disease risk in humans remains largely unexplained. A corollary is that the allelic spectrum of genetic variants contributing to complex disease risk is unknown. Theoretical models that relate population genetic processes to the maintenance of genetic variation for quantitative traits may suggest profitable avenues for future experimental design. Here we use forward simulation to model a genomic region evolving under a balance between recurrent deleterious mutation and Gaussian stabilizing selection. We consider multiple genetic and demographic models, and several different methods for identifying genomic regions harboring variants associated with complex disease risk. We demonstrate that the model of gene action, relating genotype to phenotype, has a qualitative effect on several relevant aspects of the population genetic architecture of a complex trait. In particular, the genetic model impacts genetic variance component partitioning across the allele frequency spectrum and the power of statistical tests. Models with partial recessivity closely match the minor allele frequency distribution of significant hits from empirical genome-wide association studies without requiring homozygous effect sizes to be small. We highlight a particular gene-based model of incomplete recessivity that is appealing from first principles. Under that model, deleterious mutations in a genomic region partially fail to complement one another. This model of gene-based recessivity predicts the empirically observed inconsistency between twin-based and SNP-based estimates of dominance heritability. Furthermore, this model predicts considerable levels of unexplained variance associated with intralocus epistasis. Our results suggest a need for improved statistical tools for region-based genetic association and heritability estimation.
Gene action determines how mutations affect phenotype. When placed in an evolutionary context, the details of the genotype-to-phenotype model can impact the maintenance of genetic variation for complex traits. Likewise, non-equilibrium demographic history may affect patterns of genetic variation. Here, we explore the impact of genetic model and population growth on the distribution of genetic variance across the allele frequency spectrum underlying risk for a complex disease. Using forward-in-time population genetic simulations, we show that the genetic model has important impacts on the composition of variation for complex disease risk in a population. We explicitly simulate genome-wide association studies (GWAS) and perform heritability estimation on population samples. A particular model of gene-based partial recessivity, based on allelic non-complementation, aligns well with empirical results. This model is congruent with the dominance variance estimates from both SNPs and twins, and with the minor allele frequency distribution of GWAS hits.
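The sketch below contrasts two ways of treating two mutations in the same gene. In a "sites" model, each site contributes its own dominance effect multiplicatively. In a simplified "gene" model, any deleterious mutation inactivates the haplotype carrying it, so two mutations in trans act like a homozygote. This is a fitness-scale simplification, with s as the homozygous selection coefficient and h as the dominance coefficient; both names and values are hypothetical. The paper itself models effects on a trait under Gaussian stabilizing selection, and its gene-based model involves partial rather than complete non-complementation.

```python
"""Cis vs trans double heterozygotes under 'sites' and 'gene' recessivity.

Simplified fitness-scale illustration; s is the homozygous selection
coefficient and h the dominance coefficient (hypothetical values).
Each haplotype is a tuple of 0/1 flags, one per site within a single gene.
"""
from typing import Sequence


def sites_fitness(hap1: Sequence[int], hap2: Sequence[int], s: float, h: float) -> float:
    """Multiplicative fitness with independent dominance at every site."""
    w = 1.0
    for a, b in zip(hap1, hap2):
        copies = a + b
        if copies == 1:
            w *= 1.0 - h * s
        elif copies == 2:
            w *= 1.0 - s
    return w


def gene_fitness(hap1: Sequence[int], hap2: Sequence[int], s: float, h: float) -> float:
    """Any mutation knocks out its haplotype; dominance acts at the gene level."""
    broken = int(any(hap1)) + int(any(hap2))
    if broken == 2:
        return 1.0 - s      # no functional copy left: behaves as a homozygote
    if broken == 1:
        return 1.0 - h * s  # one functional copy: an ordinary heterozygote
    return 1.0


if __name__ == "__main__":
    cis = ((1, 1), (0, 0))    # both mutations on the same haplotype
    trans = ((1, 0), (0, 1))  # one mutation on each haplotype
    for name, (h1, h2) in (("cis", cis), ("trans", trans)):
        print(name, sites_fitness(h1, h2, 0.1, 0.25), gene_fitness(h1, h2, 0.1, 0.25))
```

The sites model assigns the same fitness to the cis and trans arrangements. The gene model makes the trans double heterozygote less fit (1 - s rather than 1 - hs), which is the failure of complementation the abstract refers to.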
23
Progress from genome-wide association studies and copy number variant studies in epilepsy. Curr Opin Neurol 2016; 29:158-67. [PMID: 26886358 DOI: 10.1097/wco.0000000000000296] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 01/10/2023]
Abstract
PURPOSE OF REVIEW The pace of gene discovery in epilepsy remains frenetic. Although most recent discoveries have come from next-generation sequencing studies, there has also been important progress using more established methodologies, such as genome-wide association studies (GWASs) and copy number variants (CNVs) identified through array-based techniques. Progress in these areas over the last year is reviewed. RECENT FINDINGS The first meta-analysis of GWASs was a landmark development for the epilepsy community, though more sizeable studies are sorely needed. Other GWASs point to potentially interesting discoveries, and are in need of replication and follow-up. Copy number variation is emerging as an important genetic contribution to causation across a wide range of epilepsies, with a number of discoveries in epilepsies from the common, such as genetic generalized epilepsies, to the individually comparatively rare, such as particular epileptic encephalopathies. The first studies of CNV analysis from next-generation sequencing data, and of the combination of sequencing and array-based data, have also emerged, allowing more comprehensive genetic evaluation of specific phenotypes. SUMMARY GWASs based on single nucleotide polymorphisms, and CNV analyses based on a variety of data sources, retain a place in the discovery of causation and susceptibility in the epilepsies, and will probably become more powerful in the near future through the use of large-scale next-generation sequencing studies. There are still discoveries to come through these routes.
24
Cogni R, Cao C, Day JP, Bridson C, Jiggins FM. The genetic architecture of resistance to virus infection in Drosophila. Mol Ecol 2016; 25:5228-5241. [PMID: 27460507 PMCID: PMC5082504 DOI: 10.1111/mec.13769] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 06/19/2015] [Revised: 07/03/2016] [Accepted: 07/05/2016] [Indexed: 12/18/2022]
Abstract
Variation in susceptibility to infection has a substantial genetic component in natural populations, and it has been argued that selection by pathogens may result in it having a simpler genetic architecture than many other quantitative traits. This is important as models of host-pathogen co-evolution typically assume resistance is controlled by a small number of genes. Using the Drosophila melanogaster multiparent advanced intercross, we investigated the genetic architecture of resistance to two naturally occurring viruses, the sigma virus and DCV (Drosophila C virus). We found extensive genetic variation in resistance to both viruses. For DCV resistance, this variation is largely caused by two major-effect loci. Sigma virus resistance involves more genes - we mapped five loci, and together these explained less than half the genetic variance. Nonetheless, several of these had a large effect on resistance. Models of co-evolution typically assume strong epistatic interactions between polymorphisms controlling resistance, but we were only able to detect one locus that altered the effect of the main effect loci we had mapped. Most of the loci we mapped were probably at an intermediate frequency in natural populations. Overall, our results are consistent with major-effect genes commonly affecting susceptibility to infectious diseases, with DCV resistance being a near-Mendelian trait.
Affiliation(s)
- Rodrigo Cogni
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK.
- Department of Ecology, University of São Paulo, São Paulo, 05508-900, Brazil.
- Chuan Cao
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Jonathan P Day
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Calum Bridson
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Francis M Jiggins
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
25
Kono TJY, Fu F, Mohammadi M, Hoffman PJ, Liu C, Stupar RM, Smith KP, Tiffin P, Fay JC, Morrell PL. The Role of Deleterious Substitutions in Crop Genomes. Mol Biol Evol 2016; 33:2307-17. [PMID: 27301592 PMCID: PMC4989107 DOI: 10.1093/molbev/msw102] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Indexed: 12/12/2022] Open
Abstract
Populations continually incur new mutations with fitness effects ranging from lethal to adaptive. While the distribution of fitness effects of new mutations is not directly observable, many mutations likely either have no effect on organismal fitness or are deleterious. Historically, it has been hypothesized that a population may carry many mildly deleterious variants as segregating variation, which reduces the mean absolute fitness of the population. Recent advances in sequencing technology and sequence conservation-based metrics for inferring the functional effect of a variant permit examination of the persistence of deleterious variants in populations. The issue of segregating deleterious variation is particularly important for crop improvement, because the demographic history of domestication and breeding allows deleterious variants to persist and reach moderate frequency, potentially reducing crop productivity. In this study, we use exome resequencing of 15 barley accessions and genome resequencing of 8 soybean accessions to investigate the prevalence of deleterious single nucleotide polymorphisms (SNPs) in the protein-coding regions of the genomes of two crops. We conclude that individual cultivars carry hundreds of deleterious SNPs on average, and that nonsense variants make up a minority of deleterious SNPs. Our approach identifies known phenotype-altering variants as deleterious more frequently than the genome-wide average, suggesting that putatively deleterious variants are likely to affect phenotypic variation. We also report the implementation of a SNP annotation tool BAD_Mutations that makes use of a likelihood ratio test based on alignment of all currently publicly available Angiosperm genomes.
Affiliation(s)
- Thomas J Y Kono
- Department of Agronomy and Plant Genetics, University of Minnesota
- Fengli Fu
- Department of Agronomy and Plant Genetics, University of Minnesota
- Mohsen Mohammadi
- Department of Agronomy and Plant Genetics, University of Minnesota
- Department of Agronomy, Purdue University
- Paul J Hoffman
- Department of Agronomy and Plant Genetics, University of Minnesota
- Chaochih Liu
- Department of Agronomy and Plant Genetics, University of Minnesota
- Robert M Stupar
- Department of Agronomy and Plant Genetics, University of Minnesota
- Kevin P Smith
- Department of Agronomy and Plant Genetics, University of Minnesota
- Peter Tiffin
- Department of Plant Biology, University of Minnesota
- Peter L Morrell
- Department of Agronomy and Plant Genetics, University of Minnesota
26
Highfill CA, Reeves GA, Macdonald SJ. Genetic analysis of variation in lifespan using a multiparental advanced intercross Drosophila mapping population. BMC Genet 2016; 17:113. [PMID: 27485207 PMCID: PMC4970266 DOI: 10.1186/s12863-016-0419-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 05/20/2016] [Accepted: 07/21/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Considerable natural variation for lifespan exists within human and animal populations. Genetically dissecting this variation can elucidate the pathways and genes involved in aging, and help uncover the genetic mechanisms underlying risk for age-related diseases. Studying aging in model systems is attractive due to their relatively short lifespan, and the ability to carry out programmed crosses under environmentally-controlled conditions. Here we investigate the genetic architecture of lifespan using the Drosophila Synthetic Population Resource (DSPR), a multiparental advanced intercross mapping population. RESULTS We measured lifespan in females from 805 DSPR lines, mapping five QTL (Quantitative Trait Loci) that each contribute 4-5 % to among-line lifespan variation in the DSPR. Each of these QTL co-localizes with the position of at least one QTL mapped in 13 previous studies of lifespan variation in flies. However, given that these studies implicate >90 % of the genome in the control of lifespan, this level of overlap is unsurprising. DSPR QTL intervals harbor 11-155 protein-coding genes, and we used RNAseq on samples of young and old flies to help resolve pathways affecting lifespan, and identify potentially causative loci present within mapped QTL intervals. Broad age-related patterns of expression revealed by these data recapitulate results from previous work. For example, we see an increase in antimicrobial defense gene expression with age, and a decrease in expression of genes involved in the electron transport chain. Several genes within QTL intervals are highlighted by our RNAseq data, such as Relish, a critical immune response gene, that shows increased expression with age, and UQCR-14, a gene involved in mitochondrial electron transport, that has reduced expression in older flies. 
CONCLUSIONS The five QTL we isolate collectively explain a considerable fraction of the genetic variation for female lifespan in the DSPR, and implicate modest numbers of genes. In several cases the candidate loci we highlight reside in biological pathways already implicated in the control of lifespan variation. Thus, our results provide further evidence that functional genetics tests targeting these genes will be fruitful, lead to the identification of natural sequence variants contributing to lifespan variation, and help uncover the mechanisms of aging.
Affiliation(s)
- Chad A Highfill
- Department of Molecular Biosciences, University of Kansas, 1200 Sunnyside Avenue, Lawrence, KS, 66045, USA
- G Adam Reeves
- Department of Molecular Biosciences, University of Kansas, 1200 Sunnyside Avenue, Lawrence, KS, 66045, USA
- Stuart J Macdonald
- Department of Molecular Biosciences, University of Kansas, 1200 Sunnyside Avenue, Lawrence, KS, 66045, USA; Center for Computational Biology, University of Kansas, 2030 Becker Drive, Lawrence, KS, 66047, USA.
27
Schrodi SJ. Reflections on the Field of Human Genetics: A Call for Increased Disease Genetics Theory. Front Genet 2016; 7:106. [PMID: 27375680 PMCID: PMC4896932 DOI: 10.3389/fgene.2016.00106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/26/2016] [Accepted: 05/25/2016] [Indexed: 12/29/2022]
Abstract
Development of human genetics theoretical models and the integration of those models with experiment and statistical evaluation are critical for scientific progress. This perspective argues that increased effort in disease genetics theory, complementing experimental and statistical efforts, will accelerate the unraveling of molecular etiologies of complex diseases. In particular, the development of new, realistic disease genetics models will help elucidate complex disease pathogenesis, and the patterns these models predict in genetic data will enable concurrent, more comprehensive statistical testing of multiple aspects of disease genetics predictions, thereby better identifying disease loci. By theoretical human genetics, I intend to encompass all investigations devoted to modeling the heritable architecture underlying disease traits and studies of the resulting principles and dynamics of such models. Hence, the scope of theoretical disease genetics work includes construction and analysis of models describing how disease-predisposing alleles (1) arise, (2) are transmitted across families and populations, and (3) interact with other risk and protective alleles across the genome, and with environmental factors, to produce disease states. Theoretical work improves insight into viable genetic models of diseases consistent with empirical results from linkage, transmission, and association studies as well as population genetics. Furthermore, understanding the patterns of genetic data expected under realistic disease models will enable more powerful approaches to discover disease-predisposing alleles and additional heritable factors important in common diseases. In spite of the pivotal role of disease genetics theory, such investigation is not particularly vibrant.
Affiliation(s)
- Steven J Schrodi
- Marshfield Clinic Research Foundation, Center for Human Genetics, Marshfield, WI, USA; Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI, USA
28
Uricchio LH, Zaitlen NA, Ye CJ, Witte JS, Hernandez RD. Selection and explosive growth alter genetic architecture and hamper the detection of causal rare variants. Genome Res 2016; 26:863-73. [PMID: 27197206 PMCID: PMC4937562 DOI: 10.1101/gr.202440.115] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 11/25/2015] [Accepted: 05/16/2016] [Indexed: 12/20/2022]
Abstract
The role of rare alleles in complex phenotypes has been hotly debated, but most rare variant association tests (RVATs) do not account for the evolutionary forces that affect genetic architecture. Here, we use simulation and numerical algorithms to show that explosive population growth, as experienced by human populations, can dramatically increase the impact of very rare alleles on trait variance. We then assess the ability of RVATs to detect causal loci using simulations and human RNA-seq data. Surprisingly, we find that statistical performance is worst for phenotypes in which genetic variance is due mainly to rare alleles, and explosive population growth decreases power. Although many studies have attempted to identify causal rare variants, few have reported novel associations. This has sometimes been interpreted to mean that rare variants make negligible contributions to complex trait heritability. Our work shows that RVATs are not robust to realistic human evolutionary forces, so general conclusions about the impact of rare variants on complex traits may be premature.
Affiliation(s)
- Lawrence H Uricchio
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94143, USA; Graduate Program in Bioinformatics, University of California, San Francisco, San Francisco, California 94143, USA
- Noah A Zaitlen
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94143, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, California 94143, USA; Institute for Quantitative Biosciences (QB3), University of California, San Francisco, San Francisco, California 94143, USA
- Chun Jimmie Ye
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94143, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, California 94143, USA; Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California 94143, USA
- John S Witte
- Institute for Human Genetics, University of California, San Francisco, San Francisco, California 94143, USA; Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California 94143, USA
- Ryan D Hernandez
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94143, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, California 94143, USA; Institute for Quantitative Biosciences (QB3), University of California, San Francisco, San Francisco, California 94143, USA
29
Abstract
Genome-wide association studies (GWAS) have associated many single variants with complex disease, yet the better part of heritable complex disease risk remains unexplained. Analytical tools designed to work under specific population genetic models are needed. Rare variants are increasingly shown to be important in human complex disease, but most existing GWAS data do not cover rare variants. Explicit population genetic models predict that genes contributing to complex traits and experiencing recurrent, unconditionally deleterious mutation will harbor multiple rare, causative mutations of subtle effect. It is difficult to identify genes harboring rare variants of large effect that contribute to complex disease risk via the single-marker association tests typically used in GWAS. Gene/region-based association tests may have the power to detect associations by combining information from multiple markers, but have yielded limited success in practice. This is partially because many methods have not been widely applied. Here, we empirically demonstrate the utility of a procedure based on the rank truncated product (RTP) method, filtered to reduce the effects of linkage disequilibrium. We apply the procedure to the Wellcome Trust Case Control Consortium (WTCCC) data set, and uncover previously unidentified associations, some of which have been replicated in much larger studies. We show that, in the absence of significant rare variant coverage, RTP-based methods still have the power to detect associated genes. We recommend that RTP-based methods be applied to all existing GWAS data to maximize the usefulness of those data. For this, we provide efficient software implementing our procedure.
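The rank truncated product (RTP) statistic is the product of the K smallest of the L marker p-values in a gene or region. A minimal Python sketch follows. It assumes the p-values are independent under the null, which the LD filtering described above is meant to approximate. It uses a Monte Carlo null rather than the analytic distribution. The function names, the choice of K and the example p-values are hypothetical, and this is not the authors' released software.

```python
import math
import random


def rtp_log_statistic(pvalues, k):
    """Log of the rank truncated product: sum of logs of the k smallest p-values.
    More negative values indicate stronger combined evidence."""
    smallest = sorted(pvalues)[:k]
    return sum(math.log(p) for p in smallest)


def rtp_monte_carlo_pvalue(pvalues, k, n_draws=10_000, seed=1):
    """Monte Carlo p-value for the RTP statistic (hypothetical helper).

    Assumes the p-values are independent and Uniform(0, 1) under the null, so
    null statistics come from fresh uniform draws. Correlated (LD-linked)
    markers would instead need permutation of case/control labels.
    """
    rng = random.Random(seed)
    observed = rtp_log_statistic(pvalues, k)
    hits = 0
    for _ in range(n_draws):
        # 1.0 - random() lies in (0, 1], so log() never receives zero.
        null_p = [1.0 - rng.random() for _ in pvalues]
        if rtp_log_statistic(null_p, k) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_draws + 1)


gene_pvalues = [0.001, 0.004, 0.2, 0.5, 0.7, 0.9, 0.35, 0.6]
print(rtp_monte_carlo_pvalue(gene_pvalues, k=2))
```

Truncating at K lets a few strongly associated markers dominate the statistic without being diluted by many null markers in the same gene.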
30
The Effects of Both Recent and Long-Term Selection and Genetic Drift Are Readily Evident in North American Barley Breeding Populations. G3-GENES GENOMES GENETICS 2015; 6:609-22. [PMID: 26715093 PMCID: PMC4777124 DOI: 10.1534/g3.115.024349] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 01/01/2023]
Abstract
Barley was introduced to North America ∼400 yr ago but adaptation to modern production environments is more recent. Comparisons of allele frequencies among growth habits and spike (inflorescence) types in North America indicate that significant genetic differentiation has accumulated in a relatively short evolutionary time span. Allele frequency differentiation is greatest among barley with two-row vs. six-row spikes, followed by spring vs. winter growth habit. Large changes in allele frequency among breeding programs suggest a major contribution of genetic drift and linked selection on genetic variation. Despite this, comparisons of 3613 modern North American cultivated barley breeding lines that differ for spike-type and growth habit permit the discovery of 142 single nucleotide polymorphism (SNP) outliers putatively linked to targets of selection. For example, SNPs within the Cbf4, Ppd-H1, and Vrn-H1 loci, which have previously been associated with agronomically adaptive phenotypes, are identified as outliers. Analysis of extended haplotype sharing identifies genomic regions shared within and among breeding populations, suggestive of a number of genomic regions subject to recent selection. Finally, we are able to identify recent bouts of gene flow between breeding populations that could point to the sharing of agronomically adaptive variation. These results are supported by pedigrees and breeders’ understanding of germplasm sharing.
31
Najarro MA, Hackett JL, Smith BR, Highfill CA, King EG, Long AD, Macdonald SJ. Identifying Loci Contributing to Natural Variation in Xenobiotic Resistance in Drosophila. PLoS Genet 2015; 11:e1005663. [PMID: 26619284 PMCID: PMC4664282 DOI: 10.1371/journal.pgen.1005663] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 07/28/2015] [Accepted: 10/21/2015] [Indexed: 12/12/2022]
Abstract
Natural populations exhibit a great deal of interindividual genetic variation in the response to toxins, exemplified by the variable clinical efficacy of pharmaceutical drugs in humans, and the evolution of pesticide-resistant insects. Such variation can result from several phenomena, including variable metabolic detoxification of the xenobiotic, and differential sensitivity of the molecular target of the toxin. Our goal is to genetically dissect variation in the response to xenobiotics, and characterize naturally-segregating polymorphisms that modulate toxicity. Here, we use the Drosophila Synthetic Population Resource (DSPR), a multiparent advanced intercross panel of recombinant inbred lines, to identify QTL (Quantitative Trait Loci) underlying xenobiotic resistance, and employ caffeine as a model toxic compound. Phenotyping over 1,700 genotypes led to the identification of ten QTL, each explaining 4.5-14.4% of the broad-sense heritability for caffeine resistance. Four QTL harbor members of the cytochrome P450 family of detoxification enzymes, which represent strong a priori candidate genes. The case is especially strong for Cyp12d1, with multiple lines of evidence indicating the gene causally impacts caffeine resistance. Cyp12d1 is implicated by QTL mapped in both panels of DSPR RILs, is significantly upregulated in the presence of caffeine, and RNAi knockdown robustly decreases caffeine tolerance. Furthermore, copy number variation at Cyp12d1 is strongly associated with phenotype in the DSPR, with a trend in the same direction observed in the DGRP (Drosophila Genetic Reference Panel). No additional plausible causative polymorphisms were observed in a full genomewide association study in the DGRP, or in analyses restricted to QTL regions mapped in the DSPR. Just as in human populations, replicating modest-effect, naturally-segregating causative variants in an association study framework in flies will likely require very large sample sizes.
Affiliation(s)
- Michael A. Najarro
- Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
- Jennifer L. Hackett
- Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
- Brittny R. Smith
- Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
- Chad A. Highfill
- Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
- Elizabeth G. King
- Division of Biological Sciences, University of Missouri, Columbia, Missouri, United States of America
- Anthony D. Long
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
- Stuart J. Macdonald
- Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
- Center for Computational Biology, University of Kansas, Lawrence, Kansas, United States of America
32
The Nature of Genetic Variation for Complex Traits Revealed by GWAS and Regional Heritability Mapping Analyses. Genetics 2015; 201:1601-13. [PMID: 26482794 DOI: 10.1534/genetics.115.177220] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 04/10/2015] [Accepted: 10/09/2015] [Indexed: 02/08/2023]
Abstract
We use computer simulations to investigate the amount of genetic variation for complex traits that can be revealed by single-SNP genome-wide association studies (GWAS) or regional heritability mapping (RHM) analyses based on full genome sequence data or SNP chips. We model a large population subject to mutation, recombination, selection, and drift, assuming a pleiotropic model of mutations sampled from a bivariate distribution of effects of mutations on a quantitative trait and fitness. The pleiotropic model investigated, in contrast to previous models, implies that common mutations of large effect are responsible for most of the genetic variation for quantitative traits, except when the trait is fitness itself. We show that GWAS applied to the full sequence increases the number of QTL detected by as much as 50% compared to the number found with SNP chips but only modestly increases the amount of additive genetic variance explained. Even with full sequence data, the total amount of additive variance explained is generally below 50%. Using RHM on the full sequence data, a slightly larger number of QTL are detected than by GWAS if the same probability threshold is assumed, but these QTL explain a slightly smaller amount of genetic variance. Our results also suggest that most of the missing heritability is due to the inability to detect variants of moderate effect (∼0.03-0.3 phenotypic SDs) segregating at substantial frequencies. Very rare variants, which are more difficult to detect by GWAS, are expected to contribute little genetic variation, so their eventual detection is less relevant for resolving the missing heritability problem.
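The claim that very rare variants contribute little genetic variation follows from the standard expression for the additive variance at a biallelic locus, V_A = 2p(1 − p)a². Here p is the allele frequency and a is the allelic substitution effect. A minimal Python sketch follows. It assumes Hardy-Weinberg proportions and purely additive effects, and the illustrative numbers are not taken from the study.

```python
def additive_variance(freq, effect):
    """Additive genetic variance 2p(1 - p)a^2 contributed by one biallelic locus.

    Assumes Hardy-Weinberg proportions and purely additive gene action, with
    `effect` the allelic substitution effect in phenotypic SD units, so the
    result is a fraction of phenotypic variance.
    """
    return 2.0 * freq * (1.0 - freq) * effect ** 2


rare_large = additive_variance(0.001, 1.0)     # 1 SD effect at frequency 0.1%
common_moderate = additive_variance(0.3, 0.1)  # 0.1 SD effect at frequency 30%
print(rare_large, common_moderate)
```

With these hypothetical values, a 0.1 SD variant at frequency 0.3 accounts for about 0.42% of phenotypic variance. That is roughly twice the 0.2% from a 1 SD variant at frequency 0.001, because the 2p(1 − p) term shrinks almost linearly with p for rare alleles.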
33
Abstract
The severity of the toxic side effects of chemotherapy shows a great deal of interindividual variability, and much of this variation is likely genetically based. Simple DNA tests predictive of toxic side effects could revolutionize the way chemotherapy is carried out. Due to the challenges in identifying polymorphisms that affect toxicity in humans, we use Drosophila fecundity following oral exposure to carboplatin, gemcitabine and mitomycin C as a model system to identify naturally occurring DNA variants predictive of toxicity. We use the Drosophila Synthetic Population Resource (DSPR), a panel of recombinant inbred lines derived from a multiparent advanced intercross, to map quantitative trait loci affecting chemotoxicity. We identify two QTL each for carboplatin and gemcitabine toxicity and none for mitomycin C. One QTL is associated with fly orthologs of a priori human carboplatin candidate genes ABCC2 and MSH2, and a second QTL is associated with fly orthologs of human gemcitabine candidate genes RRM2 and RRM2B. The third, a carboplatin QTL, is associated with a posteriori human orthologs from solute carrier family 7A, INPP4A and INPP4B, and NALCN. The fourth, a gemcitabine QTL that also affects methotrexate toxicity, is associated with human ortholog GPx4. Mapped QTL each explain a significant fraction of variation in toxicity, yet individual SNPs and transposable elements in the candidate gene regions fail to singly explain QTL peaks. Furthermore, estimates of founder haplotype effects are consistent with genes harboring several segregating functional alleles. We find little evidence for nonsynonymous SNPs explaining mapped QTL; thus it seems likely that standing variation in toxicity is due to regulatory alleles.
34
Siegert S, Wolf A, Cooper DN, Krawczak M, Nothnagel M. Mutations Causing Complex Disease May under Certain Circumstances Be Protective in an Epidemiological Sense. PLoS One 2015; 10:e0132150. [PMID: 26161957 PMCID: PMC4498598 DOI: 10.1371/journal.pone.0132150] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/02/2014] [Accepted: 06/10/2015] [Indexed: 01/19/2023]
Abstract
Guided by the practice of classical epidemiology, research into the genetic basis of complex disease has usually taken for granted the dictum that causative mutations are invariably over-represented among clinically affected as compared to unaffected individuals. However, we show that this supposition is not true and that a mutation contributing to the etiology of a complex disease can, under certain circumstances, be depleted among patients. Populations with defined disease prevalence were repeatedly simulated under a Wright-Fisher model, assuming various types of population history and genotype-phenotype relationship. For each simulation, the resulting mutation-specific population frequencies and odds ratios (ORs) were evaluated. In addition, the relationship between mutation frequency and OR was studied using real data from the NIH GWAS catalogue of reported phenotype associations of single-nucleotide polymorphisms (SNPs). While rare diseases (prevalence <1%) were found to be consistently caused by rare mutations with ORs>1, up to 20% of mutations causing a pandemic disease (prevalence 10-20%) had ORs<1, and their population frequency ranged from 0% to 100%. Moreover, simulation-based ORs exhibited a wide distribution, irrespective of mutation frequency. In conclusion, a substantial proportion of mutations causing common complex diseases may appear 'protective' in genetic epidemiological studies and hence would normally tend to be excluded, albeit erroneously, from further study. This apparently paradoxical result is explicable in terms of mutual confounding of the respective genotype-phenotype relationships due to a negative correlation between causal mutations induced by their common gene genealogy. As would be predicted by our findings, a significant negative correlation became apparent in published genome-wide association studies between the OR of genetic variants associated with a particular disease and the prevalence of that disease.
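The allelic odds ratio separates apparently "protective" variants (OR < 1) from risk variants (OR > 1). It can be computed from a 2×2 table of allele counts in cases and controls. A minimal Python sketch follows, with a Woolf (log-scale) 95% confidence interval. The counts are hypothetical and not from the study, and the function name is illustrative only.

```python
import math


def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Allelic odds ratio (cases vs controls) with a Woolf log-scale CI.

    Arguments are allele counts; all four must be positive.
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) under Woolf's approximation.
    se_log = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: the alternative allele is rarer among cases than controls.
print(allelic_odds_ratio(case_alt=80, case_ref=920, ctrl_alt=120, ctrl_ref=880))
```

An allele depleted among cases yields OR < 1 even if, as argued above, it contributes causally to disease.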
Affiliation(s)
- Sabine Siegert
- Cologne Center for Genomics, University of Cologne, Cologne, Germany
- Institute of Epidemiology, Christian-Albrechts University, Kiel, Germany
- Andreas Wolf
- Institute of Medical Informatics and Statistics, Christian-Albrechts University, Kiel, Germany
- David N. Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, United Kingdom
- Michael Krawczak
- Institute of Medical Informatics and Statistics, Christian-Albrechts University, Kiel, Germany
- Michael Nothnagel
- Cologne Center for Genomics, University of Cologne, Cologne, Germany
- Institute of Medical Informatics and Statistics, Christian-Albrechts University, Kiel, Germany
35
Flanagan SP, Jones AG. Identifying signatures of sexual selection using genomewide selection components analysis. Ecol Evol 2015; 5:2722-44. [PMID: 26257884 PMCID: PMC4523367 DOI: 10.1002/ece3.1546] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/03/2015] [Revised: 05/19/2015] [Accepted: 05/20/2015] [Indexed: 01/19/2023]
Abstract
Sexual selection must affect the genome for it to have an evolutionary impact, yet signatures of selection remain elusive. Here we use an individual-based model to investigate the utility of genome-wide selection components analysis, which compares allele frequencies of individuals at different life history stages within a single population to detect selection without requiring a priori knowledge of traits under selection. We modeled a diploid, sexually reproducing population and introduced strong mate choice on a quantitative trait to simulate sexual selection. Genome-wide allele frequencies in adults and offspring were compared using weighted FST values. The average number of outlier peaks (i.e., those with significantly large FST values) with a quantitative trait locus in close proximity (“real” peaks) represented correct diagnoses of loci under selection, whereas peaks above the FST significance threshold without a quantitative trait locus reflected spurious peaks. We found that, even with moderate sample sizes, signatures of strong sexual selection were detectable, but larger sample sizes improved detection rates. The model was better able to detect selection with more neutral markers, and when quantitative trait loci and neutral markers were distributed across multiple chromosomes. Although environmental variation decreased detection rates, the identification of real peaks nevertheless remained feasible. We also found that detection rates can be improved by sampling multiple populations experiencing similar selection regimes. In short, genome-wide selection components analysis is a challenging but feasible approach for the identification of regions of the genome under selection.
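The adult-versus-offspring comparison rests on a per-marker differentiation statistic. As a simplified illustration, the Python sketch below computes Nei's F_ST = (H_T − H_S)/H_T for a biallelic marker in two equally weighted samples. This unweighted estimator is a stand-in chosen for brevity, not the weighted F_ST used in the study. The allele frequencies are hypothetical.

```python
def nei_fst(p1, p2):
    """Nei's F_ST = (H_T - H_S) / H_T for one biallelic locus in two equally
    weighted samples, given the frequency of the same allele in each."""
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity of the pooled sample.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within the two samples.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # monomorphic locus: no differentiation to measure
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in adults and their offspring at three markers.
adults = [0.40, 0.10, 0.55]
offspring = [0.60, 0.12, 0.55]
print([round(nei_fst(a, o), 4) for a, o in zip(adults, offspring)])
```

In a genome scan, markers whose F_ST exceeds a significance threshold would be the candidate "outlier peaks" described above.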
Affiliation(s)
- Sarah P Flanagan
- Biology Department, Texas A&M University 3258 TAMU, College Station, Texas, 77843
- Adam G Jones
- Biology Department, Texas A&M University 3258 TAMU, College Station, Texas, 77843
36
Schrodi SJ, DeBarber A, He M, Ye Z, Peissig P, Van Wormer JJ, Haws R, Brilliant MH, Steiner RD. Prevalence estimation for monogenic autosomal recessive diseases using population-based genetic data. Hum Genet 2015; 134:659-69. [PMID: 25893794 DOI: 10.1007/s00439-015-1551-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 01/05/2015] [Accepted: 04/05/2015] [Indexed: 01/28/2023]
Abstract
Genetic methods can complement epidemiological surveys and clinical registries in determining prevalence of monogenic autosomal recessive diseases. Several large population-based genetic databases, such as the NHLBI GO Exome Sequencing Project, are now publicly available. By assuming Hardy-Weinberg equilibrium, the frequency of individuals homozygous in the general population for a particular pathogenic allele can be directly calculated from a sample of chromosomes where some harbor the pathogenic allele. Further assuming that the penetrance of the pathogenic allele(s) is known, the prevalence of recessive phenotypes can be determined. Such work can inform public health efforts for rare recessive diseases. A Bayesian estimation procedure has yet to be applied to the problem of estimating disease prevalence from large population-based genetic data. A Bayesian framework is developed to derive the posterior probability density of monogenic, autosomal recessive phenotypes. Explicit equations are presented for the credible intervals of these disease prevalence estimates. A primary impediment to performing accurate disease prevalence calculations is the determination of truly pathogenic alleles. This issue is discussed, but in many instances remains a significant barrier to investigations solely reliant on statistical interrogation; functional studies can provide important information for solidifying evidence of variant pathogenicity. We also discuss several challenges to these efforts, including the population structure in the sample of chromosomes, the treatment of allelic heterogeneity, and reduced penetrance of pathogenic variants. To illustrate the application of these methods, we utilized recently published genetic data collected on a large sample from the Schmiedeleut Hutterites. We estimate prevalence and calculate 95% credible intervals for 13 autosomal recessive diseases using these data.
In addition, the Bayesian estimation procedure is applied to data from a central European study of hereditary fructose intolerance. The methods described herein show a viable path to robustly estimating both the expected prevalence of autosomal recessive phenotypes and corresponding credible intervals using population-based genetic databases that have recently become available. As these genetic databases increase in number and size with the advent of cost-effective next-generation sequencing, we anticipate that these methods and approaches may be helpful in recessive disease prevalence calculations, potentially impacting public health management, health economic analyses, and treatment of rare diseases.
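The core calculation can be sketched as follows. This minimal Python sketch rests on several assumptions:
- Hardy-Weinberg equilibrium with no population structure.
- All pathogenic alleles treated as one class with combined frequency q and a common, known penetrance.
- A uniform Beta(1, 1) prior on q.

Prevalence is then penetrance × q². The credible interval here is approximated by Monte Carlo sampling from the Beta posterior of q, not by the explicit equations the paper derives. The function names and allele counts are hypothetical.

```python
import random


def recessive_prevalence(pathogenic_count, total_chromosomes, penetrance=1.0):
    """Point estimate of recessive prevalence under HWE: penetrance * q^2.

    q is the combined frequency of all pathogenic alleles, so compound
    heterozygotes are counted along with homozygotes.
    """
    q = pathogenic_count / total_chromosomes
    return penetrance * q * q


def prevalence_credible_interval(pathogenic_count, total_chromosomes,
                                 penetrance=1.0, level=0.95,
                                 n_draws=20_000, seed=1):
    """Monte Carlo equal-tailed credible interval for prevalence.

    With a uniform Beta(1, 1) prior and k pathogenic alleles among n sampled
    chromosomes, the posterior of q is Beta(1 + k, 1 + n - k).
    """
    rng = random.Random(seed)
    alpha = 1 + pathogenic_count
    beta = 1 + total_chromosomes - pathogenic_count
    draws = sorted(penetrance * rng.betavariate(alpha, beta) ** 2
                   for _ in range(n_draws))
    tail = (1.0 - level) / 2.0
    lower = draws[int(tail * n_draws)]
    upper = draws[int((1.0 - tail) * n_draws) - 1]
    return lower, upper


# Hypothetical sample: 12 pathogenic alleles among 2,000 chromosomes.
print(recessive_prevalence(12, 2000))
print(prevalence_credible_interval(12, 2000))
```

Because prevalence scales with q², uncertainty in a small allele count is amplified. This is why interval estimates matter for rare recessive diseases.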
Affiliation(s)
- Steven J Schrodi
- Center for Human Genetics, Marshfield Clinic Research Foundation, 1000 N Oak Ave-MLR, Marshfield, WI, 54449, USA
37
Power analysis of artificial selection experiments using efficient whole genome simulation of quantitative traits. Genetics 2015; 199:991-1005. [PMID: 25672748 PMCID: PMC4391575 DOI: 10.1534/genetics.115.175075] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 10/13/2014] [Accepted: 02/05/2015] [Indexed: 11/18/2022]
Abstract
Evolve and resequence studies combine artificial selection experiments with massively parallel sequencing technology to study the genetic basis for complex traits. In these experiments, individuals are selected for extreme values of a trait, causing alleles at quantitative trait loci (QTL) to increase or decrease in frequency in the experimental population. We present a new analysis of the power of artificial selection experiments to detect and localize quantitative trait loci. This analysis uses a simulation framework that explicitly models whole genomes of individuals, quantitative traits, and selection based on individual trait values. We find that explicitly modeling QTL provides qualitatively different insights than considering independent loci with constant selection coefficients. Specifically, we observe how interference between QTL under selection affects the trajectories and lengthens the fixation times of selected alleles. We also show that a substantial portion of the genetic variance of the trait (50–100%) can be explained by detected QTL in as few as 20 generations of selection, depending on the trait architecture and experimental design. Furthermore, we show that power depends crucially on the opportunity for recombination during the experiment. Finally, we show that an increase in power is obtained by leveraging founder haplotype information to obtain allele frequency estimates.
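The comparison drawn above is with the simpler picture of independent loci under constant selection coefficients. That baseline can be sketched as a Wright-Fisher model. Each generation, an allele's frequency is first shifted deterministically by genic selection, p' = p(1 + s)/(1 + ps). It is then binomially resampled among a fixed number of gene copies (2N for N diploids) to represent drift. This minimal Python sketch covers that baseline only, assuming constant s, no dominance and a single locus. It does not model whole genomes, trait values or interference between QTL, and the names and parameters are hypothetical.

```python
import random


def wright_fisher_trajectory(n_copies, p0, s, max_generations=10_000, seed=1):
    """Frequency trajectory of an allele with constant genic selection
    coefficient s in a Wright-Fisher population of n_copies gene copies.

    Stops when the allele is lost (0.0) or fixed (1.0), or after
    max_generations generations.
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(max_generations):
        if p in (0.0, 1.0):
            break
        # Deterministic change from selection (relative fitness 1 + s vs 1).
        p_sel = p * (1.0 + s) / (1.0 + p * s)
        # Binomial sampling of the next generation's gene copies (drift).
        count = sum(1 for _ in range(n_copies) if rng.random() < p_sel)
        p = count / n_copies
        trajectory.append(p)
    return trajectory


traj = wright_fisher_trajectory(n_copies=200, p0=0.05, s=0.1)
print(len(traj) - 1, traj[-1])  # generations simulated and final frequency
```

Under this baseline each allele's trajectory depends only on its own s and on drift. Interference between linked selected QTL, as reported above, is precisely what such a model leaves out.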
38
Cheeseman IH, McDew-White M, Phyo AP, Sriprawat K, Nosten F, Anderson TJC. Pooled sequencing and rare variant association tests for identifying the determinants of emerging drug resistance in malaria parasites. Mol Biol Evol 2014; 32:1080-90. [PMID: 25534029 PMCID: PMC4379400 DOI: 10.1093/molbev/msu397] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 12/17/2022]
Abstract
We explored the potential of pooled sequencing to swiftly and economically identify selective sweeps due to emerging artemisinin (ART) resistance in a South-East Asian malaria parasite population. ART resistance is defined by slow parasite clearance from the blood of ART-treated patients, and mutations in the kelch gene (chr. 13) have been strongly implicated. We constructed triplicate pools of 70 slow-clearing (resistant) and 70 fast-clearing (sensitive) infections collected from the Thai–Myanmar border and sequenced these to high (∼150-fold) read depth. Allele frequency estimates from pools showed almost perfect correlation (Lin’s concordance = 0.98) with allele frequencies at 93 single nucleotide polymorphisms measured directly from individual infections, giving us confidence in the accuracy of this approach. By mapping genome-wide divergence (FST) between pools of drug-resistant and drug-sensitive parasites, we identified two large (>150 kb) regions (on chrs. 13 and 14) and 17 smaller candidate genome regions. To identify individual genes within these genome regions, we resequenced an additional 38 parasite genomes (16 slow and 22 fast-clearing) and performed rare variant association tests. These confirmed kelch as a major molecular marker for ART resistance (P = 6.03 × 10^-6). This two-tier approach is powerful because pooled sequencing rapidly narrows down genome regions of interest, while targeted rare variant association testing within these regions can pinpoint the genetic basis of resistance. We show that our approach is robust to recurrent mutation and the generation of soft selective sweeps, which are predicted to be common in pathogen populations with large effective population sizes, and may confound more traditional gene mapping approaches.
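Lin's concordance coefficient was used above to compare pooled and individual-level allele frequencies. It rewards agreement with the identity line rather than mere linear association: ρ_c = 2s_xy / (s_x² + s_y² + (x̄ − ȳ)²). A minimal Python sketch follows, using 1/n moment estimates. The frequencies in the example are hypothetical.

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    var_y = sum((yi - mean_y) ** 2 for yi in y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    # The squared mean difference penalizes systematic bias between methods.
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical allele frequencies: individual genotyping vs pooled estimates.
individual = [0.05, 0.20, 0.35, 0.50, 0.80]
pooled = [0.06, 0.18, 0.37, 0.49, 0.78]
print(round(lins_ccc(individual, pooled), 3))
```

The squared mean difference appears in the denominator. A pool that overestimated every frequency by a constant amount would therefore score below 1, even though its Pearson correlation with the individual-level values is exactly 1.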
Affiliation(s)
- Aung Pyae Phyo
- Shoklo Malaria Research Unit, Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Mae Sot, Thailand
- Kanlaya Sriprawat
- Shoklo Malaria Research Unit, Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Mae Sot, Thailand
- François Nosten
- Shoklo Malaria Research Unit, Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Mae Sot, Thailand; Centre for Tropical Medicine, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
39
Uricchio LH, Torres R, Witte JS, Hernandez RD. Population genetic simulations of complex phenotypes with implications for rare variant association tests. Genet Epidemiol 2014; 39:35-44. [PMID: 25417809 DOI: 10.1002/gepi.21866] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 07/17/2014] [Revised: 09/09/2014] [Accepted: 09/26/2014] [Indexed: 12/12/2022]
Abstract
Demographic events and natural selection alter patterns of genetic variation within populations and may play a substantial role in shaping the genetic architecture of complex phenotypes and disease. However, the joint impact of these basic evolutionary forces is often ignored in the assessment of statistical tests of association. Here, we provide a simulation-based framework for generating DNA sequences that incorporates selection and demography with flexible models for simulating phenotypic variation (sfs_coder). This tool also allows the user to perform locus-specific simulations by automatically querying annotated genomic functional elements and genetic maps. We demonstrate the effects of evolutionary forces on patterns of genetic variation by simulating recently inferred models of human selection and demography. We use these simulations to show that the demographic model and locus-specific features, such as the proportion of sites under selection, may have practical implications for estimating the statistical power of sequencing-based rare variant association tests. In particular, for some phenotype models, there may be higher power to detect rare variant associations in African populations compared to non-Africans, but power is considerably reduced in regions of the genome with rampant negative selection. Furthermore, we show that existing methods for simulating large samples based on resampling from a small set of observed haplotypes fail to recapitulate the distribution of rare variants in the presence of rapid population growth (as has been observed in several human populations).
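Many rare variant association tests gain power by collapsing the rare sites in a region into a single carrier indicator per individual. As a hedged illustration only, here is a generic CAST-style collapsing test in Python; it is not part of sfs_coder, and the 0/1/2 minor-allele coding, the frequency threshold and the function name are assumptions.

```python
import math


def burden_test(genotypes, is_case, maf_threshold=0.01):
    """CAST-style collapsing test of rare-variant burden (illustrative sketch).

    genotypes: one list per individual of minor-allele counts (0, 1 or 2) per site.
    is_case: one boolean per individual, True for cases.
    Returns (chi2, p) for Pearson's 1-df chi-square on carrier status vs. case status.
    """
    n = len(genotypes)
    n_sites = len(genotypes[0])
    # Sample minor-allele frequency at each site (2n chromosomes).
    maf = [sum(g[j] for g in genotypes) / (2 * n) for j in range(n_sites)]
    rare_sites = [j for j in range(n_sites) if 0 < maf[j] <= maf_threshold]
    # Collapse: an individual is a carrier if it has any rare minor allele.
    carrier = [any(g[j] > 0 for j in rare_sites) for g in genotypes]
    a = b = c = d = 0
    for carries, case in zip(carrier, is_case):
        if carries and case:
            a += 1  # carrier case
        elif carries:
            b += 1  # carrier control
        elif case:
            c += 1  # non-carrier case
        else:
            d += 1  # non-carrier control
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0, 1.0  # an empty row or column: no association can be tested
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Collapsing trades resolution for power: a single test per region replaces many underpowered single-site tests, which is why the proportion of selected (and therefore rare) sites in a region matters for power.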
Affiliation(s)
- Lawrence H Uricchio
- Graduate Program in Bioinformatics, University of California, San Francisco, California, United States of America
40
Chen HS, Hutter CM, Mechanic LE, Amos CI, Bafna V, Hauser ER, Hernandez RD, Li C, Liberles DA, McAllister K, Moore JH, Paltoo DN, Papanicolaou GJ, Peng B, Ritchie MD, Rosenfeld G, Witte JS, Gillanders EM, Feuer EJ. Genetic simulation tools for post-genome wide association studies of complex diseases. Genet Epidemiol 2014; 39:11-19. [PMID: 25371374 DOI: 10.1002/gepi.21870] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 06/15/2014] [Revised: 09/02/2014] [Accepted: 09/26/2014] [Indexed: 01/12/2023]
Abstract
Genetic simulation programs are used to model data under specified assumptions to facilitate the understanding and study of complex genetic systems. Standardized data sets generated using genetic simulation are essential for the development and application of novel analytical tools in genetic epidemiology studies. With continuing advances in high-throughput genomic technologies and generation and analysis of larger, more complex data sets, there is a need for updating current approaches in genetic simulation modeling. To provide a forum to address current and emerging challenges in this area, the National Cancer Institute (NCI) sponsored a workshop, entitled "Genetic Simulation Tools for Post-Genome Wide Association Studies of Complex Diseases" at the National Institutes of Health (NIH) in Bethesda, Maryland on March 11-12, 2014. The goals of the workshop were to (1) identify opportunities, challenges, and resource needs for the development and application of genetic simulation models; (2) improve the integration of tools for modeling and analysis of simulated data; and (3) foster collaborations to facilitate development and applications of genetic simulation. During the course of the meeting, the group identified challenges and opportunities for the science of simulation, software and methods development, and collaboration. This paper summarizes key discussions at the meeting, and highlights important challenges and opportunities to advance the field of genetic simulation.
Affiliation(s)
- Huann-Sheng Chen
- Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, NIH, Bethesda, MD 20892
- Carolyn M Hutter
- Division of Genomic Medicine, National Human Genome Research Institute, NIH, Bethesda, MD 20892
- Leah E Mechanic
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, NIH, Bethesda, MD 20892
- Christopher I Amos
- Division of Community, Family Medicine, Dartmouth College, Lebanon, NH 03755
- Vineet Bafna
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093
- Ryan D Hernandez
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143
- Chun Li
- Department of Biostatistics, Vanderbilt University, Nashville, TN 37235
- David A Liberles
- Department of Molecular Biology, University of Wyoming, Laramie, WY 82071
- Kimberly McAllister
- Susceptibility and Population Health Branch, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, NC 27709
- Jason H Moore
- Department of Genetics, Dartmouth College, Lebanon, NH 03755
- Dina N Paltoo
- Office of Director, National Institutes of Health, Bethesda, MD 20892
- George J Papanicolaou
- Division of Cardiovascular Sciences, Prevention and Population Sciences Program, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892
- Bo Peng
- Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, Houston, TX 77030
- Marylyn D Ritchie
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802
- Gabriel Rosenfeld
- Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, NIH, Bethesda, MD 20892
- John S Witte
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA 94107
- Elizabeth M Gillanders
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, NIH, Bethesda, MD 20892
- Eric J Feuer
- Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, NIH, Bethesda, MD 20892
41
Satten GA, Biswas S, Papachristou C, Turkmen A, König IR. Population-based association and gene by environment interactions in Genetic Analysis Workshop 18. Genet Epidemiol 2014; 38 Suppl 1:S49-56. [PMID: 25112188 DOI: 10.1002/gepi.21825] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/09/2023]
Abstract
In the past decade, genome-wide association studies have been successful in identifying genetic loci that play a role in many complex diseases. Despite this, it has become clear that for many traits, investigation of single common variants does not give a complete picture of the genetic contribution to the phenotype. Therefore a number of new approaches are currently being investigated to further the search for susceptibility loci or regions. We summarize the contributions to Genetic Analysis Workshop 18 (GAW18) that concern this search using methods for population-based association analysis. Many of the members of our GAW18 working group made use of data types that have only recently become available through the use of next-generation sequencing technologies, with many focusing on the investigation of rare variants instead of or in combination with common variants. Some contributors used a haplotype-based approach, which to date has been used relatively infrequently but may become more important for analyzing rare variant association data. Others analyzed gene-gene or gene-environment interactions, where novel statistical approaches were needed to make the best use of the available information without requiring an excessive computational burden. GAW18 provided participants with the chance to make use of state-of-the-art data, statistical techniques, and technology. We report here some of the experiences and conclusions that were reached by workshop participants who analyzed the GAW18 data as a population-based association study.
Affiliation(s)
- Glen A Satten
- Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
42
Li M, Liu X, Bradbury P, Yu J, Zhang YM, Todhunter RJ, Buckler ES, Zhang Z. Enrichment of statistical power for genome-wide association studies. BMC Biol 2014; 12:73. [PMID: 25322753 PMCID: PMC4210555 DOI: 10.1186/s12915-014-0073-5] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Received: 06/30/2014] [Accepted: 09/09/2014] [Indexed: 01/15/2023]
Abstract
BACKGROUND: The inheritance of most human diseases and agriculturally important traits is controlled by many genes with small effects. Identifying these genes, while simultaneously controlling false positives, is challenging. Among available statistical methods, the mixed linear model (MLM) has been the most flexible and powerful for controlling population structure and individual unequal relatedness (kinship), the two common causes of spurious associations. The introduction of the compressed MLM (CMLM) method provided additional opportunities for optimization by adding two new model parameters: grouping algorithms and number of groups.
RESULTS: This study introduces another model parameter to develop an enriched CMLM (ECMLM). The parameter involves algorithms to define kinship between groups (that is, kinship algorithms). The ECMLM calculates kinship using several different algorithms and then chooses the best combination between kinship algorithms and grouping algorithms.
CONCLUSION: Simulations show that the ECMLM increases statistical power. In some cases, the magnitude of power gained by using ECMLM instead of CMLM is larger than the improvement found by using CMLM instead of MLM.
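The ECMLM treats the kinship algorithm as a model parameter to be optimized. To make that concrete, one widely used marker-based kinship estimator is VanRaden's (2008) genomic relationship matrix, K = ZZ'/(2 Σ p_j(1 − p_j)), where Z holds genotypes centered by 2p_j. The sketch below implements only that one estimator, as an example of what a kinship algorithm computes; it makes no claim about which algorithms the ECMLM actually compares, and the function name is hypothetical.

```python
def vanraden_kinship(genotypes):
    """VanRaden (2008) genomic relationship matrix (illustrative sketch).

    genotypes: one list per individual of allele counts (0, 1 or 2) per marker.
    Returns an n x n kinship matrix as a list of lists.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each marker.
    p = [sum(g[j] for g in genotypes) / (2 * n) for j in range(m)]
    # Center each genotype by its expectation 2p under Hardy-Weinberg proportions.
    z = [[g[j] - 2 * p[j] for j in range(m)] for g in genotypes]
    # Scaling makes K comparable to a pedigree-based relationship matrix.
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    return [[sum(zi[k] * zj[k] for k in range(m)) / scale for zj in z] for zi in z]
```

In a compressed MLM, a matrix like this would be computed between group summaries rather than between individuals, which is where the choice of kinship algorithm and grouping algorithm interact.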
Affiliation(s)
- Meng Li
- College of Horticulture, Nanjing Agricultural University, Nanjing, 210095 China
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853 USA
- Xiaolei Liu
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853 USA
- Peter Bradbury
- United States Department of Agriculture (USDA) – Agricultural Research Service (ARS), Ithaca, New York 14853 USA
- Jianming Yu
- Department of Agronomy, Kansas State University, Manhattan, Kansas 66506 USA
- Yuan-Ming Zhang
- State Key Laboratory of Crop Genetics and Germplasm Enhancement/National Center for Soybean Improvement, College of Agriculture, Nanjing Agricultural University, Nanjing, 210095 China
- Rory J Todhunter
- Department of Clinical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, New York 14853 USA
- Edward S Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853 USA
- United States Department of Agriculture (USDA) – Agricultural Research Service (ARS), Ithaca, New York 14853 USA
- Zhiwu Zhang
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853 USA
- College of Agronomy, Northeast Agricultural University, Harbin, Heilongjiang 150030 China
- Department of Crop and Soil Science, Washington State University, Pullman, WA 99164 USA
43
Sokolowski M, Wasserman J, Wasserman D. Genome-wide association studies of suicidal behaviors: a review. Eur Neuropsychopharmacol 2014; 24:1567-77. [PMID: 25219938 DOI: 10.1016/j.euroneuro.2014.08.006] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 03/21/2014] [Revised: 07/24/2014] [Accepted: 08/10/2014] [Indexed: 11/17/2022]
Abstract
Suicidal behaviors represent a fatal dimension of mental ill-health, involving both environmental and heritable (genetic) influences. Until recent years, the putative genetic components of suicidal behaviors were investigated mainly by hypothesis-driven research (of "candidate genes"). However, technological progress in genotyping has opened the way to (hypothesis-generating) genomic screens and created novel opportunities to explore polygenic perspectives, now spanning a wide array of possible analyses falling under the term Genome-Wide Association Study (GWAS). Here we broadly introduce and discuss some apparent limitations, as well as some emerging opportunities, of GWAS. We summarize the results of all eight GWAS of suicidality outcomes conducted to date: treatment-emergent suicidal ideation (3 studies), suicide attempts (4 studies) and completed suicides (1 study). Clearly, few (if any) genome-wide significant and reproducible findings have yet been demonstrated. We then discuss future considerations relating to sample sizes, the units of genetic association used, study designs and outcome definitions, psychiatric diagnoses or biological measures, as well as the use of genomic sequencing. We conclude that GWAS of suicidal outcomes likely have considerably more potential than has yet been realized.
Affiliation(s)
- Marcus Sokolowski
- National Centre for Suicide Research and Prevention of Mental Ill-Health (NASP), Karolinska Institute (KI), S-171 77 Stockholm, Sweden.
- Jerzy Wasserman
- National Centre for Suicide Research and Prevention of Mental Ill-Health (NASP), Karolinska Institute (KI), S-171 77 Stockholm, Sweden
- Danuta Wasserman
- National Centre for Suicide Research and Prevention of Mental Ill-Health (NASP), Karolinska Institute (KI), S-171 77 Stockholm, Sweden
44
Long AD, Macdonald SJ, King EG. Dissecting complex traits using the Drosophila Synthetic Population Resource. Trends Genet 2014; 30:488-95. [PMID: 25175100 DOI: 10.1016/j.tig.2014.07.009] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Received: 05/30/2014] [Revised: 07/28/2014] [Accepted: 07/28/2014] [Indexed: 11/25/2022]
Abstract
For most complex traits we have a poor understanding of the positions, phenotypic effects, and population frequencies of the underlying genetic variants contributing to their variation. Recently, several groups have developed multi-parent advanced intercross mapping panels in different model organisms in an attempt to improve our ability to characterize causative genetic variants. These panels are powerful and are particularly well suited to the dissection of phenotypic variation generated by rare alleles and loci segregating multiple functional alleles. We describe studies using one such panel, the Drosophila Synthetic Population Resource (DSPR), and the implications for our understanding of the genetic basis of complex traits. In particular, we note that many loci of large effect appear to be multiallelic. If multiallelism is a general rule, analytical approaches designed to identify multiallelic variants should be a priority for both genome-wide association studies (GWASs) and multi-parental panels.
Affiliation(s)
- Anthony D Long
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697, USA.
- Stuart J Macdonald
- Department of Molecular Biosciences, University of Kansas, Lawrence, KS 66045, USA
- Elizabeth G King
- Division of Biological Sciences, University of Missouri, Columbia, MO 65211, USA
45
A C++ template library for efficient forward-time population genetic simulation of large populations. Genetics 2014; 198:157-66. [PMID: 24950894 DOI: 10.1534/genetics.114.165019] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Indexed: 11/18/2022]
Abstract
fwdpp is a C++ library of routines intended to facilitate the development of forward-time simulations under arbitrary mutation and fitness models. The library design provides a combination of speed, low memory overhead, and modeling flexibility not currently available from other forward simulation tools. The library is particularly useful when the simulation of large populations is required, as programs implemented using the library are much more efficient than other available forward simulation programs.
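fwdpp itself is a C++ library, and the sketch below neither uses nor imitates its API. It is a minimal Python illustration of the per-generation loop that a forward-time simulator must run: compute fitnesses, sample parents in proportion to fitness, transmit gametes and add new mutations. Infinite-sites mutation, a single selection coefficient and dominance shared by all mutations, multiplicative fitness and the absence of recombination are simplifying assumptions, and all names are hypothetical.

```python
import math
import random


def _poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def wright_fisher(n_diploids, generations, mu_per_gamete, s, h=0.5, seed=1):
    """Minimal forward-time Wright-Fisher simulation of deleterious mutations.

    Illustrative assumptions: infinite-sites mutation, one shared selection
    coefficient s (0 <= s < 1) and dominance h, multiplicative fitness across
    sites, and no recombination. Each gamete is a frozenset of mutation ids;
    each individual is a pair of gametes. Returns the final generation.
    """
    rng = random.Random(seed)
    pop = [(frozenset(), frozenset()) for _ in range(n_diploids)]
    next_id = 0
    for _ in range(generations):
        # Fitness: (1 - s) per homozygous mutation, (1 - h*s) per heterozygous one.
        weights = [
            (1 - s) ** len(g1 & g2) * (1 - h * s) ** len(g1 ^ g2)
            for g1, g2 in pop
        ]
        offspring = []
        for _ in range(n_diploids):
            # Two parents are sampled with replacement, proportionally to fitness.
            parents = rng.choices(pop, weights=weights, k=2)
            gametes = []
            for parent in parents:
                # Without recombination, a parent transmits one of its gametes intact.
                gamete = set(rng.choice(parent))
                for _ in range(_poisson(rng, mu_per_gamete)):
                    gamete.add(next_id)  # every new mutation gets a unique id
                    next_id += 1
                gametes.append(frozenset(gamete))
            offspring.append(tuple(gametes))
        pop = offspring
    return pop
```

For example, `wright_fisher(500, 100, 0.1, 0.01)` returns the final generation, from which mutation frequencies can be tallied. This naive version rebuilds every gamete in every generation, which illustrates the speed and memory costs that the abstract says fwdpp is designed to reduce for large populations.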
46
Lohmueller KE. The impact of population demography and selection on the genetic architecture of complex traits. PLoS Genet 2014; 10:e1004379. [PMID: 24875776 PMCID: PMC4038606 DOI: 10.1371/journal.pgen.1004379] [Citation(s) in RCA: 105] [Impact Index Per Article: 9.5] [Received: 06/21/2013] [Accepted: 03/28/2014] [Indexed: 02/06/2023]
Abstract
Population genetic studies have found evidence for dramatic population growth in recent human history. It is unclear how this recent population growth, combined with the effects of negative natural selection, has affected patterns of deleterious variation, as well as the number, frequency, and effect sizes of mutations that contribute risk to complex traits. Because researchers are performing exome sequencing studies aimed at uncovering the role of low-frequency variants in the risk of complex traits, this topic is of critical importance. Here I use simulations under population genetic models where a proportion of the heritability of the trait is accounted for by mutations in a subset of the exome. I show that recent population growth increases the proportion of nonsynonymous variants segregating in the population, but does not affect the genetic load relative to a population that did not expand. Under a model where a mutation's effect on a trait is correlated with its effect on fitness, rare variants explain a greater portion of the additive genetic variance of the trait in a population that has recently expanded than in a population that did not recently expand. Further, when using a single-marker test, for a given false-positive rate and sample size, recent population growth decreases the expected number of significant associations with the trait relative to the number detected in a population that did not expand. However, in a model where there is no correlation between a mutation's effect on fitness and the effect on the trait, common variants account for much of the additive genetic variance, regardless of demography. Moreover, here demography does not affect the number of significant associations detected. These findings suggest recent population history may be an important factor influencing the power of association tests and in accounting for the missing heritability of certain complex traits.
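The contrast drawn above between rare and common variants follows from the standard contribution of additive biallelic variants to the additive genetic variance, V_A = Σ 2p_i(1 − p_i)a_i², assuming additivity and linkage equilibrium among variants. Because rare variants have small 2p(1 − p), they dominate V_A only if their effects a are correspondingly large, as when trait effects track fitness effects. The toy numbers used with this sketch are invented for illustration and are not results from the study.

```python
def additive_variance(freqs, effects):
    """Sum of 2 p (1 - p) a^2 over independent, additive biallelic variants."""
    return sum(2 * p * (1 - p) * a ** 2 for p, a in zip(freqs, effects))


def rare_fraction(freqs, effects, threshold=0.01):
    """Fraction of V_A contributed by variants with frequency below threshold."""
    total = additive_variance(freqs, effects)
    rare = [(p, a) for p, a in zip(freqs, effects) if p < threshold]
    rare_va = additive_variance([p for p, _ in rare], [a for _, a in rare])
    return rare_va / total
```

With made-up values of 100 variants at frequency 0.001 and 10 variants at frequency 0.2, the rare variants contribute about 6% of V_A when all effects are equal, but about 61% when their effects are five times larger than those of the common variants.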
Affiliation(s)
- Kirk E Lohmueller
- Department of Ecology and Evolutionary Biology, Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
47
King EG, Sanderson BJ, McNeil CL, Long AD, Macdonald SJ. Genetic dissection of the Drosophila melanogaster female head transcriptome reveals widespread allelic heterogeneity. PLoS Genet 2014; 10:e1004322. [PMID: 24810915 PMCID: PMC4014434 DOI: 10.1371/journal.pgen.1004322] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 10/25/2013] [Accepted: 03/10/2014] [Indexed: 12/01/2022]
Abstract
Modern genetic mapping is plagued by the “missing heritability” problem, which refers to the discordance between the estimated heritabilities of quantitative traits and the variance accounted for by mapped causative variants. One major potential explanation for the missing heritability is allelic heterogeneity, in which there are multiple causative variants at each causative gene with only a fraction having been identified. The majority of genome-wide association studies (GWAS) implicitly assume that a single SNP can explain all the variance for a causative locus. However, if allelic heterogeneity is prevalent, a substantial amount of genetic variance will remain unexplained. In this paper, we take a haplotype-based mapping approach and quantify the number of alleles segregating at each locus using a large set of 7922 eQTL contributing to regulatory variation in the Drosophila melanogaster female head. Not only does this study provide a comprehensive eQTL map for a major community genetic resource, the Drosophila Synthetic Population Resource, but it also provides a direct test of the allelic heterogeneity hypothesis. We find that 95% of cis-eQTLs and 78% of trans-eQTLs are due to multiple alleles, demonstrating that allelic heterogeneity is widespread in Drosophila eQTL. Allelic heterogeneity likely contributes significantly to the missing heritability problem common in GWAS.
For traits with complex genetic inheritance it has generally proven very difficult to identify the majority of the specific causative variants involved. A range of hypotheses have been put forward to explain this so-called “missing heritability”. One idea, allelic heterogeneity (where genes each harbor multiple different causative variants), has received little attention because it is difficult to detect with most genetic mapping designs. Here we make use of a panel of Drosophila melanogaster lines derived from multiple founders, allowing us to directly test for the presence of multiple alleles at a large set of genetic loci influencing gene expression. We find that the vast majority of loci harbor more than two functional alleles, demonstrating extensive allelic heterogeneity at the level of gene expression and suggesting that such heterogeneity is an important factor determining the genetic basis of complex trait variation in general.
Affiliation(s)
- Elizabeth G. King
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
- Brian J. Sanderson
- Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
- Casey L. McNeil
- Department of Biology, Newman University, Wichita, Kansas, United States of America
- Anthony D. Long
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
- Stuart J. Macdonald
- Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
48
The deleterious mutation load is insensitive to recent population history. Nat Genet 2014; 46:220-4. [PMID: 24509481 PMCID: PMC3953611 DOI: 10.1038/ng.2896] [Citation(s) in RCA: 205] [Impact Index Per Article: 18.6] [Received: 08/27/2013] [Accepted: 01/16/2014] [Indexed: 01/07/2023]
Abstract
Human populations have undergone dramatic changes in population size in the past 100,000 years, including recent rapid growth. How these demographic events have affected the burden of deleterious mutations in individuals and the frequencies of disease mutations in populations remains unclear. We use population genetic models to show that recent human demography has likely had little impact on the average burden of deleterious mutations. This prediction is supported by two exome sequence datasets showing that individuals of west African and European ancestry carry very similar burdens of damaging mutations. We further show that for many diseases, rare alleles are unlikely to contribute a large fraction of the heritable variation, and therefore the impact of recent growth is likely to be modest. However, for those diseases that have a direct impact on fitness, strongly deleterious rare mutations likely do play an important role, and recent growth will have increased their impact.
49
Baldwin-Brown JG, Long AD, Thornton KR. The power to detect quantitative trait loci using resequenced, experimentally evolved populations of diploid, sexual organisms. Mol Biol Evol 2014; 31:1040-55. [PMID: 24441104 PMCID: PMC3969567 DOI: 10.1093/molbev/msu048] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.5] [Indexed: 01/10/2023]
Abstract
A novel approach for dissecting complex traits is to experimentally evolve laboratory populations under a controlled environment shift, resequence the resulting populations, and identify single nucleotide polymorphisms (SNPs) and/or genomic regions highly diverged in allele frequency. To better understand the power and localization ability of such an evolve and resequence (E&R) approach, we carried out forward-in-time population genetics simulations of 1 Mb genomic regions under a large combination of experimental conditions, then attempted to detect significantly diverged SNPs. Our analysis indicates that the ability to detect differentiation between populations is primarily affected by selection coefficient, population size, number of replicate populations, and number of founding haplotypes. We estimate that E&R studies can detect and localize causative sites with 80% success or greater when the number of founder haplotypes is over 500, experimental populations are replicated at least 25-fold, population size is at least 1,000 diploid individuals, and the selection coefficient on the locus of interest is at least 0.1. More achievable experimental designs (less replicated, fewer founder haplotypes, smaller effective population size, and smaller selection coefficients) can have power of greater than 50% to identify a handful of SNPs of which one is likely causative. Similarly, in cases where s ≥ 0.2, less demanding experimental designs can yield high power.
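To see why selection coefficient, population size and replication drive detection power in E&R designs, the toy sketch below follows a single biallelic site in replicate populations under genic selection and Wright-Fisher drift, then averages the allele-frequency change across replicates. A single unlinked site and genic (per-copy) fitness 1 + s are simplifying assumptions; the study itself simulated 1 Mb regions, and the function names are hypothetical.

```python
import random
from statistics import fmean


def wf_trajectory(p0, s, n_diploids, generations, rng):
    """Final allele frequency at one site under genic selection plus Wright-Fisher drift."""
    p = p0
    two_n = 2 * n_diploids
    for _ in range(generations):
        # Selection: the focal allele has relative fitness 1 + s per copy.
        p_sel = p * (1 + s) / (1 + s * p)
        # Drift: binomial sampling of 2N gene copies for the next generation.
        p = sum(rng.random() < p_sel for _ in range(two_n)) / two_n
    return p


def mean_change(p0, s, n_diploids, generations, replicates, seed=0):
    """Average allele-frequency change across independent replicate populations."""
    rng = random.Random(seed)
    return fmean(
        wf_trajectory(p0, s, n_diploids, generations, rng) - p0
        for _ in range(replicates)
    )
```

Larger s increases the expected change, while larger populations and more replicates shrink the drift noise around it. Repeating the neutral case (`s = 0`) over many seeds gives a rough empirical null distribution against which a selected site's replicate-mean change can be judged.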
50
Mezmouk S, Ross-Ibarra J. The pattern and distribution of deleterious mutations in maize. G3 (BETHESDA, MD.) 2014; 4:163-71. [PMID: 24281428 PMCID: PMC3887532 DOI: 10.1534/g3.113.008870] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 10/07/2013] [Accepted: 11/19/2013] [Indexed: 12/19/2022]
Abstract
Most nonsynonymous mutations are thought to be deleterious because of their effect on protein sequence and are expected to be removed or kept at low frequency by the action of natural selection. Nonetheless, the effect of positive selection on linked sites or drift in small or inbred populations may also impact the evolution of deleterious alleles. Despite their potential to affect complex trait phenotypes, deleterious alleles are difficult to study precisely because they are often at low frequency. Here, we made use of genome-wide genotyping data to characterize deleterious variants in a large panel of maize inbred lines. We show that, despite small effective population sizes and inbreeding, most putatively deleterious SNPs are indeed at low frequencies within individual genetic groups. We find that genes associated with a number of complex traits are enriched for deleterious variants. Together, these data are consistent with the dominance model of heterosis, in which complementation of numerous low-frequency, weakly deleterious variants contributes to hybrid vigor.
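The dominance model of heterosis invoked above can be illustrated with a toy calculation: two inbred lines that are homozygous for recessive or partially recessive deleterious alleles at different loci produce an F1 that is heterozygous at all of those loci, so the deleterious effects are largely masked. The sketch assumes fully inbred parents, multiplicative fitness across loci and one shared s and h; these are illustrative choices, not estimates from the maize data, and the function names are hypothetical.

```python
def fitness(genotype, s, h):
    """Multiplicative fitness; genotype gives deleterious-allele counts (0, 1, 2) per locus."""
    w = 1.0
    for count in genotype:
        if count == 2:
            w *= 1 - s      # homozygous for the deleterious allele
        elif count == 1:
            w *= 1 - h * s  # heterozygous: recessive or partially recessive when h < 0.5
    return w


def f1_hybrid(line_a, line_b):
    """F1 genotype from two fully inbred (homozygous) parental lines."""
    # Each inbred parent transmits one copy of its allele at every locus.
    return [a // 2 + b // 2 for a, b in zip(line_a, line_b)]
```

For example, with s = 0.05 and h = 0.1, lines carrying two homozygous deleterious loci each have fitness 0.95² ≈ 0.90, whereas their F1, heterozygous at four loci, has fitness 0.995⁴ ≈ 0.98. Loci at which both parents carry the deleterious allele stay homozygous in the F1 and are not complemented.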
Affiliation(s)
- Sofiane Mezmouk
- Department of Plant Sciences, University of California–Davis, Davis, California 95616
- Jeffrey Ross-Ibarra
- Department of Plant Sciences, University of California–Davis, Davis, California 95616
- Center for Population Biology and Genome Center, University of California–Davis, Davis, California 95616