1
Campos-Martin R, Schmickler S, Goel M, Schneeberger K, Tresch A. Reliable genotyping of recombinant genomes using a robust hidden Markov model. Plant Physiol 2023; 192:821-836. [PMID: 36946207 PMCID: PMC10231367 DOI: 10.1093/plphys/kiad191] [Received: 04/19/2022] [Revised: 01/20/2023] [Accepted: 01/27/2023]
Abstract
Meiotic recombination is an essential mechanism during sexual reproduction and includes the exchange of chromosome segments between homologous chromosomes. New allelic combinations are transmitted to the next generation, introducing novel genetic variation into the offspring genomes. With improvements in high-throughput whole-genome sequencing technologies, large numbers of recombinant individuals can now be sequenced at low depth and low cost, necessitating computational methods for reconstructing their haplotypes. The main challenge is the uncertainty in haplotype calling that arises from the low information content of a single genomic position. Straightforward sliding window-based approaches are difficult to tune and fail to place recombination breakpoints precisely. Hidden Markov model (HMM)-based approaches, on the other hand, tend to over-segment the genome. Here, we present RTIGER, an HMM-based model that exploits, in a mathematically precise way, the fact that true chromosome segments typically have a certain minimum length. We further separate the task of identifying the correct haplotype sequence from the accurate placement of haplotype borders, thereby maximizing the accuracy of border positions. By comparing segmentations of simulated data with known underlying haplotypes, we highlight the reasons why RTIGER outperforms traditional segmentation approaches. We then analyze the meiotic recombination pattern of segregants of 2 Arabidopsis (Arabidopsis thaliana) accessions and a previously described hyper-recombining mutant. RTIGER is available as an R package with an efficient Julia implementation of the core algorithm.
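The minimum-segment-length idea can be illustrated with a generic duration-constrained HMM. The Python sketch below is not RTIGER's implementation (RTIGER is an R package with a Julia core); it assumes a simple symmetric miscall emission model, haplotype states coded 0..n_states-1, and hypothetical parameters `p_switch` and `eps`. Each haplotype state is expanded into a chain of `min_len` sub-states, so Viterbi decoding cannot return a segment shorter than `min_len` positions, which is one standard way to suppress over-segmentation.

```python
import math
from typing import List, Sequence


def viterbi_min_length(obs: Sequence[int], n_states: int, min_len: int,
                       p_switch: float = 0.01, eps: float = 0.1) -> List[int]:
    """Most likely haplotype path in which every segment covers >= min_len positions.

    Each haplotype h is expanded into a chain of min_len sub-states. Only the
    last sub-state of a chain may self-loop or jump to another haplotype, so a
    segment cannot end before min_len positions have been emitted.
    """
    if len(obs) < min_len:
        raise ValueError("sequence is shorter than min_len")

    def log_emit(h: int, o: int) -> float:
        # Symmetric miscall model: an assumption for illustration only.
        return math.log(1 - eps) if o == h else math.log(eps / (n_states - 1))

    n_sub = n_states * min_len
    hap = [s // min_len for s in range(n_sub)]   # haplotype of each sub-state
    pos = [s % min_len for s in range(n_sub)]    # index within its chain
    neg_inf = float("-inf")
    log_stay = math.log(1 - p_switch)
    log_jump = math.log(p_switch / (n_states - 1))

    # Paths may only start at the first sub-state of a chain.
    score = [-math.log(n_states) + log_emit(hap[s], obs[0]) if pos[s] == 0 else neg_inf
             for s in range(n_sub)]
    back: List[List[int]] = []
    for o in obs[1:]:
        new_score = [neg_inf] * n_sub
        ptr = [0] * n_sub
        for s in range(n_sub):
            if score[s] == neg_inf:
                continue
            if pos[s] < min_len - 1:
                moves = [(s + 1, 0.0)]                      # forced step along the chain
            else:
                moves = [(s, log_stay)]                     # extend the current segment
                moves += [(k * min_len, log_jump)           # open a segment of haplotype k
                          for k in range(n_states) if k != hap[s]]
            for t, log_t in moves:
                if score[s] + log_t > new_score[t]:
                    new_score[t], ptr[t] = score[s] + log_t, s
        score = [v + log_emit(hap[t], o) if v > neg_inf else neg_inf
                 for t, v in enumerate(new_score)]
        back.append(ptr)

    # Paths must end at a chain's last sub-state, so the final segment is long enough too.
    state = max((s for s in range(n_sub) if pos[s] == min_len - 1), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return [hap[s] for s in reversed(path)]
```

With `obs = [0] * 10 + [1] * 3 + [0] * 10` and `p_switch=0.1`, decoding with `min_len=1` calls the three-position run as its own segment, while `min_len=5` absorbs it into the surrounding segment. The chain expansion multiplies the state space by `min_len`, so run time grows linearly with the minimum length.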
Affiliation(s)
- Rafael Campos-Martin
- Faculty of Medicine, University Hospital Cologne, Cologne 50937, Germany
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne 50829, Germany
- Division of Neurogenetics and Molecular Psychiatry, Department of Psychiatry and Psychotherapy, University of Cologne, Medical Faculty, Cologne 50937, Germany
- Sophia Schmickler
- Faculty of Medicine, University Hospital Cologne, Cologne 50937, Germany
- Manish Goel
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne 50829, Germany
- Faculty for Biology, LMU Munich, Planegg-Martinsried 82152, Germany
- Korbinian Schneeberger
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne 50829, Germany
- Faculty for Biology, LMU Munich, Planegg-Martinsried 82152, Germany
- Cluster of Excellence on Plant Sciences, Heinrich-Heine University, Düsseldorf 40225, Germany
- Achim Tresch
- Faculty of Medicine, University Hospital Cologne, Cologne 50937, Germany
- CECAD, University of Cologne, Cologne 50931, Germany
- Center for Data and Simulation Science, University of Cologne, Cologne 50931, Germany
2
Manching H, Wisser RJ. SPEARS: Standard Performance Evaluation of Ancestral haplotype Reconstruction through Simulation. Bioinformatics 2021; 37:868-870. [PMID: 32840564 PMCID: PMC8097754 DOI: 10.1093/bioinformatics/btaa749] [Received: 03/25/2020] [Revised: 07/05/2020] [Accepted: 08/18/2020]
Abstract
MOTIVATION Ancestral haplotype maps provide useful information about genomic variation and insights into biological processes. Reconstructing the descendent haplotype structure of homologous chromosomes, particularly for large numbers of individuals, can help with characterizing the recombination landscape, elucidating genotype-to-phenotype relationships, improving genomic predictions and more. Inferring haplotype maps from sparse genotype data is an efficient approach to whole-genome haplotyping, but this is a non-trivial problem. A standardized approach is needed to validate whether haplotype reconstruction software, conceived population designs and existing data for a given population provide accurate haplotype information for further inference. RESULTS We introduce SPEARS, a pipeline for the simulation-based appraisal of genome-wide haplotype maps constructed from sparse genotype data. Using a specified pedigree, the pipeline generates virtual genotypes (known data) with genotyping errors and missing data structure. It then proceeds to mimic analysis in practice, capturing sources of error due to genotyping, imputation and haplotype inference. Standard metrics allow researchers to assess different population designs and to determine which features of haplotype structure or regions of the genome are accurate enough for analysis. Haplotype maps for 1000 outcross progeny from a multi-parent population of maize are used to demonstrate SPEARS. AVAILABILITY AND IMPLEMENTATION SPEARS, the protocol and suite of scripts, are publicly available under an MIT license at GitHub (https://github.com/maizeatlas/spears). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
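The simulate-then-compare logic can be sketched in a few lines. This is not SPEARS code; it is a hypothetical Python illustration that assumes genotypes coded as allele dosages 0/1/2, independent per-site miscall and missing-data rates, and a plain concordance metric. In a full appraisal the `inferred` genotypes would come from the imputation and haplotype-inference steps being evaluated, not from the simulated calls directly.

```python
import random
from typing import List, Optional, Sequence

GENOTYPES = (0, 1, 2)  # allele dosage coding (an assumption of this sketch)


def simulate_calls(true_genotypes: Sequence[int], error_rate: float,
                   missing_rate: float, rng: random.Random) -> List[Optional[int]]:
    """Turn known genotypes into observed calls with random miscalls and missing data."""
    calls: List[Optional[int]] = []
    for g in true_genotypes:
        if rng.random() < missing_rate:
            calls.append(None)                                        # no call at this site
        elif rng.random() < error_rate:
            calls.append(rng.choice([x for x in GENOTYPES if x != g]))  # miscall
        else:
            calls.append(g)
    return calls


def concordance(truth: Sequence[int], inferred: Sequence[Optional[int]]) -> float:
    """Fraction of non-missing inferred genotypes that match the known truth."""
    pairs = [(t, c) for t, c in zip(truth, inferred) if c is not None]
    if not pairs:
        return float("nan")
    return sum(t == c for t, c in pairs) / len(pairs)
```

For example, with `error_rate=0.05` and `missing_rate=0.2`, roughly 20% of calls are missing and the concordance of the remaining calls is close to 0.95, because miscalls are only drawn for non-missing sites.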
Affiliation(s)
- Heather Manching
- Department of Plant and Soil Sciences, University of Delaware, Newark, DE, 19716, USA
- Randall J Wisser
- Department of Plant and Soil Sciences, University of Delaware, Newark, DE, 19716, USA
3
Malosetti M, Zwep LB, Forrest K, van Eeuwijk FA, Dieters M. Lessons from a GWAS study of a wheat pre-breeding program: pyramiding resistance alleles to Fusarium crown rot. Theor Appl Genet 2021; 134:897-908. [PMID: 33367942 PMCID: PMC7925461 DOI: 10.1007/s00122-020-03740-8] [Received: 05/02/2020] [Accepted: 11/24/2020]
Abstract
Much has been published on QTL detection for complex traits using bi-parental and multi-parental crosses (linkage analysis) or diversity panels (GWAS). While successful for detection, transferring these results to real applications has proven more difficult. Here, we applied QTL detection to a pre-breeding population that underwent intensive phenotypic selection for the target trait across multiple plant generations, combined with rapid generation turnover (i.e. "speed breeding") to allow multiple plant generations to be cycled each year. The reasoning is that QTL mapping information would complement the selection process by identifying the genome regions under selection within the relevant germplasm. The questions addressed were the location of the genomic regions determining the response to selection and the origin of the favourable alleles within the pedigree. We used data from a pre-breeding program that aimed at pyramiding different resistance sources to Fusarium crown rot into elite (but susceptible) wheat backgrounds. The population resulted from a complex backcrossing scheme involving multiple resistance donors and multiple elite backgrounds, akin to a MAGIC population (985 genotypes in total, including founders and two major offspring layers within the pedigree). A significant increase in the resistance level was observed (i.e. a positive response to selection) after the selection process, and 17 regions significantly associated with that response were identified using a GWAS approach. Those regions included known QTL as well as potentially novel regions contributing resistance to Fusarium crown rot. In addition, we were able to trace back the sources of the favourable alleles for each QTL.
We demonstrate that QTL detection using breeding populations under selection for the target trait can identify QTL controlling the target trait and that the frequency of the favourable alleles was increased as a response to selection, thereby validating the QTL detected. This is a valuable opportunistic approach that can provide QTL information that is more easily transferred to breeding applications.
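A naive way to quantify the response to selection at a single marker is to compare favourable-allele frequencies between an earlier and a later generation. The sketch below (hypothetical function name, Python standard library only) uses a two-proportion z-test. It treats the sampled lines as independent, which a pedigree-derived population like this one violates, so it is at most a first look and not a substitute for the GWAS used in the study.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def allele_frequency_shift(fav_before: int, n_before: int,
                           fav_after: int, n_after: int) -> Tuple[float, float, float]:
    """Change in favourable-allele frequency with a two-proportion z-test.

    Returns (delta, z, two_sided_p). Ignores relatedness, structure and drift.
    """
    p_before = fav_before / n_before
    p_after = fav_after / n_after
    pooled = (fav_before + fav_after) / (n_before + n_after)
    se = sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    if se == 0:                       # allele fixed or absent in both samples
        return p_after - p_before, 0.0, 1.0
    z = (p_after - p_before) / se
    return p_after - p_before, z, 2 * (1 - NormalDist().cdf(abs(z)))
```

Here `fav_*` counts lines carrying the favourable allele and `n_*` the lines scored in each generation; both are illustrative inputs, not data from the study.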
Affiliation(s)
- Marcos Malosetti
- Mathematical and Statistical Methods (Biometris), Wageningen University and Research, Wageningen, The Netherlands
- Laura B Zwep
- Mathematical and Statistical Methods (Biometris), Wageningen University and Research, Wageningen, The Netherlands
- Mathematical Institute, Leiden University, Leiden, The Netherlands
- Kerrie Forrest
- Agriculture Victoria Research, Agribio, Bundoora, Melbourne, VIC, 3083, Australia
- Fred A van Eeuwijk
- Mathematical and Statistical Methods (Biometris), Wageningen University and Research, Wageningen, The Netherlands
- Mark Dieters
- School of Agriculture and Food Sciences, Faculty of Science, The University of Queensland, Brisbane, Australia
4
Finke K, Kourakos M, Brown G, Dang HT, Tan SJS, Simons YB, Ramdas S, Schäffer AA, Kember RL, Bućan M, Mathieson S. Ancestral haplotype reconstruction in endogamous populations using identity-by-descent. PLoS Comput Biol 2021; 17:e1008638. [PMID: 33635861 PMCID: PMC7946327 DOI: 10.1371/journal.pcbi.1008638] [Received: 05/01/2020] [Revised: 03/10/2021] [Accepted: 12/15/2020]
Abstract
In this work we develop a novel algorithm for reconstructing the genomes of ancestral individuals, given genotype or sequence data from contemporary individuals and an extended pedigree of family relationships. A pedigree with complete genomes for every individual enables the study of allele frequency dynamics and haplotype diversity across generations, including deviations from neutrality such as transmission distortion. When studying heritable diseases, ancestral haplotypes can be used to augment genome-wide association studies and track disease inheritance patterns. The building blocks of our reconstruction algorithm are segments of Identity-By-Descent (IBD) shared between two or more genotyped individuals. The method alternates between identifying a source for each IBD segment and assembling IBD segments placed within each ancestral individual. Unlike previous approaches, our method is able to accommodate complex pedigree structures with hundreds of individuals genotyped at millions of SNPs. We apply our method to an Old Order Amish pedigree from Lancaster, Pennsylvania, whose founders came to North America from Europe during the early 18th century. The pedigree includes 1338 individuals from the past 12 generations, 394 with genotype data. The motivation for reconstruction is to understand the genetic basis of diseases segregating in the family through tracking haplotype transmission over time. Using our algorithm, thread, we are able to reconstruct an average of 224 ancestral individuals per chromosome. For these ancestral individuals, on average we reconstruct 79% of their haplotypes. We also identify a region on chromosome 16 that is difficult to reconstruct; we find that this region harbors a short Amish-specific copy number variation and the gene HYDIN. thread was developed for endogamous populations, but can be applied to any extensive pedigree with the recent generations genotyped.
We anticipate that this type of practical ancestral reconstruction will become more common and necessary to understand rare and complex heritable diseases in extended families. When analyzing complex heritable traits, genomic data from many generations of an extended family increases the amount of information available for statistical inference. However, typically only genomic data from the recent generations of a pedigree are available, as ancestral individuals are deceased. In this work we present an algorithm, called thread, for reconstructing the genomes of ancestral individuals, given a complex pedigree and genomic data from the recent generations. Previous approaches have not been able to accommodate large datasets (both in terms of sites and individuals), made simplifying assumptions about pedigree structure, or did not tie reconstructed sequences back to specific individuals. We apply thread to a complex Old Order Amish pedigree of 1338 individuals, 394 with genotype data.
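The abstract does not say how the IBD segments are detected. As a rough illustration of what such a segment is, the hypothetical sketch below reports maximal runs in which two phased haplotypes carry identical alleles over at least `min_snps` consecutive sites. Identity-by-state runs like these are only a proxy: dedicated IBD callers also account for allele frequencies, genotyping error and genetic distance, and the threshold here is an arbitrary assumption.

```python
from typing import List, Optional, Sequence, Tuple


def shared_runs(hap_a: Sequence[int], hap_b: Sequence[int],
                min_snps: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) ranges where two haplotypes agree at >= min_snps consecutive sites."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same sites")
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a == b:
            if start is None:
                start = i                      # a new run of identical alleles begins
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i))        # close a run that is long enough
            start = None
    if start is not None and len(hap_a) - start >= min_snps:
        runs.append((start, len(hap_a)))       # run extends to the last site
    return runs
```

For instance, `[0, 1, 1, 0, 1, 0, 0, 1]` and `[0, 1, 1, 0, 0, 0, 0, 1]` differ only at index 4, so `min_snps=4` keeps `(0, 4)` alone, while `min_snps=3` also keeps `(5, 8)`.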
Affiliation(s)
- Kelly Finke
- Department of Computer Science, Swarthmore College, Swarthmore, Pennsylvania, United States of America
- Department of Biology, Swarthmore College, Swarthmore, Pennsylvania, United States of America
- Michael Kourakos
- Department of Computer Science, Swarthmore College, Swarthmore, Pennsylvania, United States of America
- Gabriela Brown
- Department of Computer Science, Swarthmore College, Swarthmore, Pennsylvania, United States of America
- Huyen Trang Dang
- Department of Computer Science, Bryn Mawr College, Bryn Mawr, Pennsylvania, United States of America
- Shi Jie Samuel Tan
- Department of Computer Science, Haverford College, Haverford, Pennsylvania, United States of America
- Yuval B. Simons
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Shweta Ramdas
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Alejandro A. Schäffer
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Rachel L. Kember
- Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Maja Bućan
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Sara Mathieson
- Department of Computer Science, Haverford College, Haverford, Pennsylvania, United States of America
5
Scott MF, Ladejobi O, Amer S, Bentley AR, Biernaskie J, Boden SA, Clark M, Dell'Acqua M, Dixon LE, Filippi CV, Fradgley N, Gardner KA, Mackay IJ, O'Sullivan D, Percival-Alwyn L, Roorkiwal M, Singh RK, Thudi M, Varshney RK, Venturini L, Whan A, Cockram J, Mott R. Multi-parent populations in crops: a toolbox integrating genomics and genetic mapping with breeding. Heredity (Edinb) 2020; 125:396-416. [PMID: 32616877 PMCID: PMC7784848 DOI: 10.1038/s41437-020-0336-6] [Received: 01/27/2020] [Revised: 06/16/2020] [Accepted: 06/16/2020]
Abstract
Crop populations derived from experimental crosses enable the genetic dissection of complex traits and support modern plant breeding. Among these, multi-parent populations now play a central role. By mixing and recombining the genomes of multiple founders, multi-parent populations combine many commonly sought beneficial properties of genetic mapping populations. For example, they have high power and resolution for mapping quantitative trait loci, high genetic diversity and minimal population structure. Many multi-parent populations have been constructed in crop species, and their inbred germplasm and associated phenotypic and genotypic data serve as enduring resources. Their utility has grown from being a tool for mapping quantitative trait loci to a means of providing germplasm for breeding programmes. Genomics approaches, including de novo genome assemblies and gene annotations for the population founders, have allowed the imputation of rich sequence information into the descendent population, expanding the breadth of research and breeding applications of multi-parent populations. Here, we report recent successes from multi-parent populations in crops. We also propose an ideal genotypic, phenotypic and germplasm 'package' that multi-parent populations should feature to optimise their use as powerful community resources for crop research, development and breeding.
Affiliation(s)
- Samer Amer
- University of Reading, Reading, RG6 6AH, UK
- Faculty of Agriculture, Alexandria University, Alexandria, 23714, Egypt
- Alison R Bentley
- The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- Jay Biernaskie
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Scott A Boden
- School of Agriculture, Food and Wine, University of Adelaide, Glen Osmond, SA, 5064, Australia
- Laura E Dixon
- Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK
- Carla V Filippi
- Instituto de Agrobiotecnología y Biología Molecular (IABIMO), INTA-CONICET, Nicolas Repetto y Los Reseros s/n, 1686, Hurlingham, Buenos Aires, Argentina
- Nick Fradgley
- The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- Keith A Gardner
- The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- Ian J Mackay
- SRUC, West Mains Road, Kings Buildings, Edinburgh, EH9 3JG, UK
- Manish Roorkiwal
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Rakesh Kumar Singh
- International Center for Biosaline Agriculture, Academic City, Dubai, United Arab Emirates
- Mahendar Thudi
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Rajeev Kumar Varshney
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Alex Whan
- CSIRO, GPO Box 1700, Canberra, ACT, 2601, Australia
- James Cockram
- The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- Richard Mott
- UCL Genetics Institute, Gower Street, London, WC1E 6BT, UK
6
Abstract
The Collaborative Cross (CC) is a mouse genetic reference population whose range of applications includes quantitative trait loci (QTL) mapping. The design of a CC QTL mapping study involves multiple decisions, including which and how many strains to use, and how many replicates per strain to phenotype, all viewed within the context of hypothesized QTL architecture. Until now, these decisions have been informed largely by early power analyses that were based on simulated, hypothetical CC genomes. Now that more than 50 CC strains are available and more than 70 CC genomes have been observed, it is possible to characterize power based on realized CC genomes. We report power analyses from extensive simulations and examine several key considerations: 1) the number of strains and biological replicates, 2) the QTL effect size, 3) the presence of population structure, and 4) the distribution of functionally distinct alleles among the founder strains at the QTL. We also provide general power estimates to aid in the design of future experiments. All analyses were conducted with our R package, SPARCC (Simulated Power Analysis in the Realized Collaborative Cross), developed for performing either large-scale power analyses or those tailored to particular CC experiments.
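The trade-off between strains and replicates can be explored with a small simulation. This is not SPARCC, which is an R package working with realized CC genomes; it is a hypothetical Python sketch that assumes a balanced biallelic QTL, a normally distributed strain background effect (`strain_sd`), normally distributed within-strain noise (`noise_sd`), and a permutation test on strain means.

```python
import random
from typing import List, Optional, Sequence


def _group_mean_diff(values: Sequence[float], labels: Sequence[int]) -> float:
    """Absolute difference between the mean of label-1 and label-0 values."""
    ones = [v for v, lab in zip(values, labels) if lab == 1]
    zeros = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))


def simulate_power(n_strains: int = 40, n_reps: int = 5, qtl_effect: float = 1.0,
                   strain_sd: float = 1.0, noise_sd: float = 1.0, alpha: float = 0.05,
                   n_sims: int = 100, n_perm: int = 99, seed: Optional[int] = None) -> float:
    """Fraction of simulated experiments in which the QTL is detected at level alpha."""
    rng = random.Random(seed)
    alleles = [i % 2 for i in range(n_strains)]     # balanced biallelic QTL (assumption)
    hits = 0
    for _ in range(n_sims):
        strain_means: List[float] = []
        for a in alleles:
            background = rng.gauss(0.0, strain_sd)  # polygenic strain effect
            reps = [qtl_effect * a + background + rng.gauss(0.0, noise_sd)
                    for _ in range(n_reps)]
            strain_means.append(sum(reps) / n_reps)
        observed = _group_mean_diff(strain_means, alleles)
        shuffled = alleles[:]
        exceed = 0
        for _ in range(n_perm):
            rng.shuffle(shuffled)
            if _group_mean_diff(strain_means, shuffled) >= observed:
                exceed += 1
        if (exceed + 1) / (n_perm + 1) <= alpha:     # permutation p-value
            hits += 1
    return hits / n_sims
```

In this model, averaging `n_reps` replicates shrinks only the within-strain noise (its variance falls to `noise_sd**2 / n_reps`), not the strain background variance, so once the noise is averaged down, adding strains raises power more than adding replicates.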
7
Recursive Algorithms for Modeling Genomic Ancestral Origins in a Fixed Pedigree. G3 (Bethesda) 2018; 8:3231-3245. [PMID: 30068523 PMCID: PMC6169389 DOI: 10.1534/g3.118.200340]
Abstract
The study of gene flow in pedigrees is of strong interest for the development of quantitative trait loci (QTL) mapping methods in multiparental populations. We developed a Markovian framework for modeling ancestral origins along two homologous chromosomes within individuals in fixed pedigrees. A highly beneficial property of our method is that the size of the state space depends linearly or quadratically on the number of pedigree founders, whereas it increases exponentially with pedigree size in alternative methods. To calculate the parameter values of the Markov process, we describe two novel recursive algorithms that differ in whether the pedigree founders are assumed to be exchangeable. Our algorithms apply equally to autosomes and sex chromosomes, another desirable feature of our approach. We tested the accuracy of the algorithms using a million simulations on a pedigree. We demonstrated two applications of the recursive algorithms in multiparental populations: designing a breeding scheme that maximizes the overall density of recombination breakpoints, and thus the QTL mapping resolution; and incorporating pedigree information into hidden Markov models for ancestral inference from genotypic data. The conditional probabilities and the recombination breakpoint data resulting from ancestral inference can facilitate follow-up QTL mapping. The results show that the generality of the recursive algorithms can greatly increase the application range of genetic analyses such as ancestral inference in multiparental populations.
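The paper's recursive algorithms derive the Markov parameters from the pedigree. As a much simpler illustration of why one chromosome needs only K states for K founders, the sketch below assumes that the founder origin switches at a fixed rate ρ (`rate`, per Morgan) and that the new origin is drawn uniformly from the other K - 1 founders. Solving that continuous-time chain gives P(same origin after distance d) = 1/K + (1 - 1/K)·exp(-λd) with λ = ρK/(K - 1), and (1 - exp(-λd))/K for each other founder; for K = 2 and ρ = 1 the off-diagonal entry reduces to Haldane's map function. The function name and the uniform-switching assumption are hypothetical. Tracking the joint origin on two homologous chromosomes would need K² states, the quadratic case.

```python
import math
from typing import List


def origin_transition_matrix(n_founders: int, rate: float,
                             distance: float) -> List[List[float]]:
    """Transition probabilities between founder origins at two loci `distance` Morgans apart.

    Assumes switches occur at `rate` per Morgan, each to a uniformly chosen other founder.
    """
    if n_founders < 2:
        raise ValueError("need at least two founders")
    lam = rate * n_founders / (n_founders - 1)   # decay rate of the non-stationary part
    decay = math.exp(-lam * distance)
    same = 1 / n_founders + (1 - 1 / n_founders) * decay
    other = (1 - decay) / n_founders
    return [[same if i == j else other for j in range(n_founders)]
            for i in range(n_founders)]
```

At distance 0 the matrix is the identity, and as the distance grows every row approaches the uniform distribution 1/K.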
8
Male Infertility Is Responsible for Nearly Half of the Extinction Observed in the Mouse Collaborative Cross. Genetics 2017; 206:557-572. [PMID: 28592496 DOI: 10.1534/genetics.116.199596] [Received: 01/05/2017] [Accepted: 03/09/2017]
Abstract
The goal of the Collaborative Cross (CC) project was to generate and distribute over 1000 independent mouse recombinant inbred strains derived from eight inbred founders. With inbreeding nearly complete, we estimated the extinction rate among CC lines at a remarkable 95%, which is substantially higher than in the derivation of other mouse recombinant inbred populations. Here, we report genome-wide allele frequencies in 347 extinct CC lines. Contrary to expectations, autosomes had equal allelic contributions from the eight founders, but chromosome X had significantly lower allelic contributions from the two inbred founders with underrepresented subspecific origins (PWK/PhJ and CAST/EiJ). By comparing extinct CC lines to living CC strains, we conclude that a complex genetic architecture is driving extinction, and selection pressures are different on the autosomes and chromosome X. Male infertility played a large role in extinction, as 47% of extinct lines had males that were infertile. Males from extinct lines had high variability in reproductive organ size, low sperm counts, low sperm motility, and a high rate of vacuolization of seminiferous tubules. We performed QTL mapping and identified nine genomic regions associated with male fertility and reproductive phenotypes. Many of the allelic effects in the QTL were driven by the two founders with underrepresented subspecific origins, including a QTL on chromosome X for infertility that was driven by the PWK/PhJ haplotype. We also performed the first cross-validation using complementary CC resources to verify the effect of sperm curvilinear velocity from the PWK/PhJ haplotype on chromosome 2 in an independent population across multiple generations. While selection typically constrains the examination of reproductive traits toward the more fertile alleles, the CC extinct lines provided a unique opportunity to study the genetic architecture of fertility in a widely genetically variable population.
We hypothesize that incompatibilities between alleles with different subspecific origins are a key driver of infertility. These results help clarify the factors that drove strain extinction in the CC, reveal the genetic regions associated with poor fertility in the CC, and serve as a resource to further study mammalian infertility.
9
Oreper D, Cai Y, Tarantino LM, de Villena FPM, Valdar W. Inbred Strain Variant Database (ISVdb): A Repository for Probabilistically Informed Sequence Differences Among the Collaborative Cross Strains and Their Founders. G3 (Bethesda) 2017; 7:1623-1630. [PMID: 28592645 PMCID: PMC5473744 DOI: 10.1534/g3.117.041491] [Received: 01/05/2017] [Accepted: 03/20/2017]
Abstract
The Collaborative Cross (CC) is a panel of recently established multiparental recombinant inbred mouse strains. For the CC, as for any multiparental population (MPP), effective experimental design and analysis benefit from detailed knowledge of the genetic differences between strains. Such differences can be directly determined by sequencing, but until now whole-genome sequencing was not publicly available for individual CC strains. An alternative and complementary approach is to infer genetic differences by combining two pieces of information: probabilistic estimates of the CC haplotype mosaic from a custom genotyping array, and probabilistic variant calls from sequencing of the CC founders. The computation for this inference, especially when performed genome-wide, can be intricate and time-consuming, requiring the researcher to generate nontrivial and potentially error-prone scripts. To provide standardized, easy-to-access CC sequence information, we have developed the Inbred Strain Variant Database (ISVdb). The ISVdb provides, for all the exonic variants from the Sanger Institute mouse sequencing dataset, direct sequence information for CC founders and, critically, the imputed sequence information for CC strains. Notably, the ISVdb also: (1) provides predicted variant consequence metadata; (2) allows rapid simulation of F1 populations; and (3) preserves imputation uncertainty, which will allow imputed data to be refined in the future as additional sequencing and genotyping data are collected. The ISVdb information is housed in an SQL database and is easily accessible through a custom online interface (http://isvdb.unc.edu), reducing the analytic burden on any researcher using the CC.
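The inference described above amounts to marginalizing over the founder of origin at each site: the probability that a CC strain carries the alternate allele is the sum, over founders f, of P(haplotype derives from f) × P(founder f carries the alternate allele). The hypothetical Python sketch below computes that sum for one site. It assumes the two probability sources are independent and that the inbred strain is homozygous at the site; it is not the ISVdb's implementation or query interface.

```python
from typing import Mapping


def imputed_alt_probability(haplotype_probs: Mapping[str, float],
                            founder_alt_probs: Mapping[str, float]) -> float:
    """P(strain carries the alternate allele) at one site.

    haplotype_probs: P(the strain's haplotype here derives from founder f), e.g. from array-based HMM.
    founder_alt_probs: P(founder f carries the alternate allele), from founder variant calls.
    """
    total = sum(haplotype_probs.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("haplotype probabilities must sum to 1")
    return sum(p * founder_alt_probs[f] for f, p in haplotype_probs.items())
```

For example, a site where the haplotype is 70% likely CAST/EiJ and 30% PWK/PhJ, with CAST/EiJ certainly carrying the alternate allele and PWK/PhJ certainly not, gives 0.7. Keeping this probability instead of a hard call is what preserves imputation uncertainty.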
Affiliation(s)
- Daniel Oreper
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, North Carolina 27599-7265
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27599-7265
- Yanwei Cai
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, North Carolina 27599-7265
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27599-7265
- Lisa M Tarantino
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27599-7265
- Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy University of North Carolina, Chapel Hill, North Carolina 27599-7265
- Fernando Pardo-Manuel de Villena
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27599-7265
- Lineberger Comprehensive Cancer Center
- William Valdar
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27599-7265
- Lineberger Comprehensive Cancer Center
10
X-Chromosome Control of Genome-Scale Recombination Rates in House Mice. Genetics 2017; 205:1649-1656. [PMID: 28159751 PMCID: PMC5378119 DOI: 10.1534/genetics.116.197533] [Received: 11/03/2016] [Accepted: 01/24/2017]
Abstract
Sex differences in recombination are widespread in mammals, but the causes of this pattern are poorly understood. Previously, males from two interfertile subspecies of house mice, Mus musculus musculus and M. m. castaneus, were shown to exhibit a ∼30% difference in their global crossover frequencies. Much of this crossover rate divergence is explained by six autosomal loci and a large-effect locus on the X chromosome. Intriguingly, the allelic effects at this X-linked locus are transgressive, with the allele conferring increased crossover rate being transmitted by the low crossover rate M. m. castaneus parent. Despite the pronounced divergence between males, females from these subspecies exhibit similar crossover rates, raising the question of how recombination is genetically controlled in this sex. Here, I analyze publicly available genotype data from early generations of the Collaborative Cross, an eight-way panel of recombinant inbred strains, to estimate crossover frequencies in female mice with sex-chromosome genotypes of diverse subspecific origins. Consistent with the transgressive influence of the X chromosome in males, I show that females inheriting an M. m. castaneus X possess higher average crossover rates than females lacking the M. m. castaneus X chromosome. The differential inheritance of the X chromosome in males and females provides a simple genetic explanation for sex-limited evolution of this trait. Further, the presence of X-linked and autosomal crossover rate modifiers with antagonistic effects hints at an underlying genetic conflict fueled by selection for distinct crossover rate optima in males and females.
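Estimating crossover frequencies from haplotype-resolved genotypes starts from counting switches in founder origin along a chromosome. The hypothetical sketch below assumes founder origins have already been inferred at each marker, with `None` for uncalled markers. It misses double crossovers between adjacent markers, counts spurious switches caused by genotyping errors, and does not attribute switches to a particular parent's meiosis, which requires pedigree information not modeled here.

```python
from statistics import mean
from typing import Optional, Sequence


def count_crossovers(origins: Sequence[Optional[str]]) -> int:
    """Number of founder-origin switches between consecutive called markers."""
    called = [o for o in origins if o is not None]          # skip uncalled markers
    return sum(prev != cur for prev, cur in zip(called, called[1:]))


def mean_crossovers(chromosomes: Sequence[Sequence[Optional[str]]]) -> float:
    """Average crossover count over a set of transmitted chromosomes."""
    return mean(count_crossovers(c) for c in chromosomes)
```

For example, the origin sequence `["A", "A", None, "B", "B", "A"]` contains two switches once the uncalled marker is skipped.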
11
Plethysmography Phenotype QTL in Mice Before and After Allergen Sensitization and Challenge. G3 (Bethesda) 2016; 6:2857-2865. [PMID: 27449512 PMCID: PMC5015943 DOI: 10.1534/g3.116.032912]
Abstract
Allergic asthma is a common airway disease that is characterized in part by enhanced airway constriction in response to nonspecific stimuli. Genome-wide association studies have identified multiple loci associated with asthma risk in humans, but these studies have not accounted for gene-environment interactions, which are thought to be important factors in asthma. To identify quantitative trait loci (QTL) that regulate responses to a common human allergen, we applied a house dust mite (HDM) mouse model of allergic airway disease (AAD) to 146 incipient lines of the Collaborative Cross (CC) and the CC founder strains. We employed a longitudinal study design in which mice were phenotyped for response to the bronchoconstrictor methacholine both before and after HDM sensitization and challenge using whole body plethysmography (WBP). There was significant variation in methacholine responsiveness due to both strain and HDM treatment, as reflected by changes in the WBP parameter enhanced pause. We also found that distinct QTL regulate baseline [chromosome (Chr) 18] and post-HDM (Chr 19) methacholine responsiveness and that post-HDM airway responsiveness was correlated with other features of AAD. Finally, using invasive measurements of airway mechanics, we tested whether the Chr 19 QTL affects lung resistance per se using C57BL/6J mice and a consomic strain, but found that the QTL haplotype did not affect lung resistance. We conclude that aspects of baseline and allergen-induced methacholine responsiveness are associated with genetic variation, and that robust detection of airway resistance QTL in genetically diverse mice will be facilitated by direct measurement of airway mechanics.
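Enhanced pause, commonly abbreviated Penh, is a dimensionless index derived from the whole body plethysmography signal and is usually defined as Penh = (Te/RT - 1) × (PEF/PIF), where Te is expiratory time, RT is relaxation time, and PEF and PIF are peak expiratory and peak inspiratory flow. The sketch below only evaluates that formula (function name hypothetical). Penh is an empirical index rather than a direct measurement of airway resistance, which is the distinction the invasive measurements described above address.

```python
def enhanced_pause(te: float, rt: float, pef: float, pif: float) -> float:
    """Penh = (Te / RT - 1) * (PEF / PIF); dimensionless.

    te: expiratory time, rt: relaxation time (same time unit as te);
    pef, pif: peak expiratory and peak inspiratory flow (same flow unit).
    """
    if rt <= 0 or pif <= 0:
        raise ValueError("RT and PIF must be positive")
    return (te / rt - 1.0) * (pef / pif)
```

In a longitudinal design such as this one, the per-mouse response to allergen could then be summarized as the post-challenge Penh minus the baseline Penh at the same methacholine dose.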
12
Probabilistic Multilocus Haplotype Reconstruction in Outcrossing Tetraploids. Genetics 2016; 203:119-31. [PMID: 26920758 DOI: 10.1534/genetics.115.185579] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 12/02/2015] [Accepted: 02/22/2016] [Indexed: 01/29/2023]
Abstract
For both plant (e.g., potato) and animal (e.g., salmon) species, unveiling the genetic architecture of complex traits is key to the genetic improvement of polyploids in agriculture. F1 progenies of a biparental cross are often used for quantitative trait loci (QTL) mapping in outcrossing polyploids, where haplotype reconstruction by identifying the parental origins of marker alleles is necessary. In this paper, we build a novel and integrated statistical framework for multilocus haplotype reconstruction in a full-sib tetraploid family from biallelic marker dosage data collected from single-nucleotide polymorphism (SNP) arrays or next-generation sequencing technology given a genetic linkage map. Compared to diploids, in tetraploids, additional complexity needs to be addressed, including double reduction and possible preferential pairing of chromosomes. We divide haplotype reconstruction into two stages: parental linkage phasing for reconstructing the most probable parental haplotypes and ancestral inference for probabilistically reconstructing the offspring haplotypes conditional on the reconstructed parental haplotypes. The simulation studies and the application to real data from potato show that the parental linkage phasing is robust to, and that the subsequent ancestral inference is accurate for, complex chromosome pairing behaviors during meiosis, various marker segregation types, erroneous genetic maps except for long-range disturbances of marker ordering, various amounts of offspring dosage errors (up to ∼20%), and various fractions of missing data in parents and offspring dosages.
13
Didion JP, Morgan AP, Yadgary L, Bell TA, McMullan RC, Ortiz de Solorzano L, Britton-Davidian J, Bult CJ, Campbell KJ, Castiglia R, Ching YH, Chunco AJ, Crowley JJ, Chesler EJ, Förster DW, French JE, Gabriel SI, Gatti DM, Garland T, Giagia-Athanasopoulou EB, Giménez MD, Grize SA, Gündüz İ, Holmes A, Hauffe HC, Herman JS, Holt JM, Hua K, Jolley WJ, Lindholm AK, López-Fuster MJ, Mitsainas G, da Luz Mathias M, McMillan L, Ramalhinho MDGM, Rehermann B, Rosshart SP, Searle JB, Shiao MS, Solano E, Svenson KL, Thomas-Laemont P, Threadgill DW, Ventura J, Weinstock GM, Pomp D, Churchill GA, Pardo-Manuel de Villena F. R2d2 Drives Selfish Sweeps in the House Mouse. Mol Biol Evol 2016; 33:1381-95. [PMID: 26882987 PMCID: PMC4868115 DOI: 10.1093/molbev/msw036] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Indexed: 01/07/2023]
Abstract
A selective sweep is the result of strong positive selection driving newly occurring or standing genetic variants to fixation, and can dramatically alter the pattern and distribution of allelic diversity in a population. Population-level sequencing data have enabled discoveries of selective sweeps associated with genes involved in recent adaptations in many species. In contrast, much debate but little evidence addresses whether “selfish” genes are capable of fixation—thereby leaving signatures identical to classical selective sweeps—despite being neutral or deleterious to organismal fitness. We previously described R2d2, a large copy-number variant that causes nonrandom segregation of mouse Chromosome 2 in females due to meiotic drive. Here we show population-genetic data consistent with a selfish sweep driven by alleles of R2d2 with high copy number (R2d2HC) in natural populations. We replicate this finding in multiple closed breeding populations from six outbred backgrounds segregating for R2d2 alleles. We find that R2d2HC rapidly increases in frequency, and in most cases becomes fixed in significantly fewer generations than can be explained by genetic drift. R2d2HC is also associated with significantly reduced litter sizes in heterozygous mothers, making it a true selfish allele. Our data provide direct evidence of populations actively undergoing selfish sweeps, and demonstrate that meiotic drive can rapidly alter the genomic landscape in favor of mutations with neutral or even negative effects on overall Darwinian fitness. Further study will reveal the incidence of selfish sweeps, and will elucidate the relative contributions of selfish genes, adaptation and genetic drift to evolution.
Affiliation(s)
- John P Didion
- Department of Genetics, The University of North Carolina at Chapel Hill; Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill; Carolina Center for Genome Science, The University of North Carolina at Chapel Hill
- Andrew P Morgan
- Department of Genetics, The University of North Carolina at Chapel Hill; Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill; Carolina Center for Genome Science, The University of North Carolina at Chapel Hill
- Liran Yadgary
- Department of Genetics, The University of North Carolina at Chapel Hill; Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill; Carolina Center for Genome Science, The University of North Carolina at Chapel Hill
- Timothy A Bell
- Department of Genetics, The University of North Carolina at Chapel Hill; Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill; Carolina Center for Genome Science, The University of North Carolina at Chapel Hill
- Rachel C McMullan
- Department of Genetics, The University of North Carolina at Chapel Hill; Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill; Carolina Center for Genome Science, The University of North Carolina at Chapel Hill
- Lydia Ortiz de Solorzano
- Department of Genetics, The University of North Carolina at Chapel Hill; Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill; Carolina Center for Genome Science, The University of North Carolina at Chapel Hill
- Janice Britton-Davidian
- Institut des Sciences de l'Evolution, Université De Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Karl J Campbell
- Island Conservation, Puerto Ayora, Galápagos Island, Ecuador; School of Geography, Planning & Environmental Management, The University of Queensland, St Lucia, QLD, Australia
- Riccardo Castiglia
- Department of Biology and Biotechnologies "Charles Darwin", University of Rome "La Sapienza", Rome, Italy
- Yung-Hao Ching
- Department of Molecular Biology and Human Genetics, Tzu Chi University, Hualien City, Taiwan
- James J Crowley
- Department of Genetics, The University of North Carolina at Chapel Hill
- Daniel W Förster
- Department of Evolutionary Genetics, Leibniz-Institute for Zoo and Wildlife Research, Berlin, Germany
- John E French
- National Toxicology Program, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, NC
- Sofia I Gabriel
- Department of Animal Biology & CESAM - Centre for Environmental and Marine Studies, Faculty of Sciences, University of Lisbon, Lisboa, Portugal
- Mabel D Giménez
- Instituto de Biología Subtropical, CONICET - Universidad Nacional de Misiones, Posadas, Misiones, Argentina
- Sofia A Grize
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- İslam Gündüz
- Department of Biology, Faculty of Arts and Sciences, University of Ondokuz Mayis, Samsun, Turkey
- Andrew Holmes
- Laboratory of Behavioral and Genomic Neuroscience, National Institute on Alcohol Abuse and Alcoholism, NIH, Bethesda, MD
- Heidi C Hauffe
- Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, San Michele All'adige, TN, Italy
- Jeremy S Herman
- Department of Natural Sciences, National Museums Scotland, Edinburgh, United Kingdom
- James M Holt
- Department of Computer Science, The University of North Carolina at Chapel Hill
- Kunjie Hua
- Department of Genetics, The University of North Carolina at Chapel Hill
- Anna K Lindholm
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- George Mitsainas
- Section of Animal Biology, Department of Biology, University of Patras, Patras, Greece
- Maria da Luz Mathias
- Department of Animal Biology & CESAM - Centre for Environmental and Marine Studies, Faculty of Sciences, University of Lisbon, Lisboa, Portugal
- Leonard McMillan
- Department of Computer Science, The University of North Carolina at Chapel Hill
- Maria da Graça Morgado Ramalhinho
- Department of Animal Biology & CESAM - Centre for Environmental and Marine Studies, Faculty of Sciences, University of Lisbon, Lisboa, Portugal
- Barbara Rehermann
- Immunology Section, Liver Diseases Branch, National Institute of Diabetes and Digestive and Kidney Diseases, NIH, Bethesda, MD
- Stephan P Rosshart
- Immunology Section, Liver Diseases Branch, National Institute of Diabetes and Digestive and Kidney Diseases, NIH, Bethesda, MD
- Jeremy B Searle
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY
- Meng-Shin Shiao
- Research Center, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Emanuela Solano
- Department of Biology and Biotechnologies "Charles Darwin", University of Rome "La Sapienza", Rome, Italy
- David W Threadgill
- Department of Veterinary Pathobiology, Texas A&M University, College Station; Department of Molecular and Cellular Medicine, Texas A&M University, College Station
- Jacint Ventura
- Departament de Biologia Animal, de Biologia Vegetal y de Ecologia, Facultat de Biociències, Universitat Autònoma de Barcelona, Barcelona, Spain
- Daniel Pomp
- Department of Genetics, The University of North Carolina at Chapel Hill; Carolina Center for Genome Science, The University of North Carolina at Chapel Hill
- Fernando Pardo-Manuel de Villena
- Department of Genetics, The University of North Carolina at Chapel Hill; Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill; Carolina Center for Genome Science, The University of North Carolina at Chapel Hill
14
Rutledge H, Baran-Gale J, de Villena FPM, Chesler EJ, Churchill GA, Sethupathy P, Kelada SNP. Identification of microRNAs associated with allergic airway disease using a genetically diverse mouse population. BMC Genomics 2015; 16:633. [PMID: 26303911 PMCID: PMC4548451 DOI: 10.1186/s12864-015-1732-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 02/14/2015] [Accepted: 06/29/2015] [Indexed: 12/17/2022]
Abstract
Background: Allergic airway diseases (AADs) such as asthma are characterized in part by granulocytic airway inflammation. The gene regulatory networks that govern granulocyte recruitment are poorly understood, but evidence is accruing that microRNAs (miRNAs) play an important role. To identify miRNAs that may underlie AADs, we used two complementary approaches that leveraged the genotypic and phenotypic diversity of the Collaborative Cross (CC) mouse population. In the first approach, we sought to identify miRNA expression quantitative trait loci (eQTL) that overlap QTL for AAD-related phenotypes. Specifically, CC founder strains and incipient lines of the CC were sensitized and challenged with house dust mite allergen followed by measurement of granulocyte recruitment to the lung. Total lung RNA was isolated and miRNA was measured using arrays for CC founders and qRT-PCR for incipient CC lines.
Results: Among CC founders, 92 miRNAs were differentially expressed. We measured the expression of 40 of the most highly expressed of these 92 miRNAs in the incipient lines of the CC and identified 18 eQTL corresponding to 14 different miRNAs. Surprisingly, half of these eQTL were distal to the corresponding miRNAs, and some were even on different chromosomes. One of the largest-effect local miRNA eQTL was for miR-342-3p, for which we identified putative causal variants by bioinformatic analysis of the effects of single nucleotide polymorphisms on RNA structure. None of the miRNA eQTL co-localized with QTL for eosinophil or neutrophil recruitment. In the second approach, we constructed putative miRNA/mRNA regulatory networks and identified three miRNAs (miR-497, miR-351 and miR-31) as candidate master regulators of genes associated with neutrophil recruitment. Analysis of a dataset from human keratinocytes transfected with a miR-31 inhibitor revealed two target genes in common with miR-31 targets correlated with neutrophils, namely Oxsr1 and Nsf.
Conclusions: miRNA expression in the allergically inflamed murine lung is regulated by genetic loci that are smaller in effect size than mRNA eQTL and often act in trans. Thus, our results indicate that the genetic architecture of miRNA expression differs from that of mRNA expression. We identified three miRNAs, miR-497, miR-351 and miR-31, that are candidate master regulators of genes associated with neutrophil recruitment. Because miR-31 is expressed in airway epithelia and is predicted to target genes with known links to neutrophilic inflammation, we suggest that miR-31 is a potentially novel regulator of airway inflammation.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1732-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Holly Rutledge
- Department of Genetics, University of North Carolina, 120 Mason Farm Road, Chapel Hill, NC, 27599, USA.
- Jeanette Baran-Gale
- Department of Genetics, University of North Carolina, 120 Mason Farm Road, Chapel Hill, NC, 27599, USA; Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC, USA.
- Fernando Pardo-Manuel de Villena
- Department of Genetics, University of North Carolina, 120 Mason Farm Road, Chapel Hill, NC, 27599, USA; Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC, USA; Curriculum in Genetics and Molecular Biology, University of North Carolina, Chapel Hill, NC, USA; Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA.
- Praveen Sethupathy
- Department of Genetics, University of North Carolina, 120 Mason Farm Road, Chapel Hill, NC, 27599, USA; Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC, USA; Curriculum in Genetics and Molecular Biology, University of North Carolina, Chapel Hill, NC, USA; Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA.
- Samir N P Kelada
- Department of Genetics, University of North Carolina, 120 Mason Farm Road, Chapel Hill, NC, 27599, USA; Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC, USA; Curriculum in Genetics and Molecular Biology, University of North Carolina, Chapel Hill, NC, USA; Marsico Lung Institute, University of North Carolina, Chapel Hill, NC, USA.
15
Abstract
The next generation of QTL (quantitative trait loci) mapping populations has been designed with multiple founders, where one or more generations of intercrossing are introduced before the inbreeding phase to increase accumulated recombinations and thus mapping resolution. Examples of such populations are the Collaborative Cross (CC) in mice and the Multiparent Advanced Generation Inter-Cross (MAGIC) lines in Arabidopsis. The genomes of the produced inbred lines are fine-grained random mosaics of the founder genomes. In this article, we present a novel framework for modeling ancestral origin processes along two homologous autosomal chromosomes from mapping populations, which is a major component in the reconstruction of the ancestral origins of each line for QTL mapping. We construct a general continuous-time Markov model for ancestral origin processes, where the rate matrix is deduced from the expected densities of various types of junctions (recombination breakpoints). The model can be applied to monoecious populations with or without self-fertilization and to dioecious populations with two separate sexes. The analytic expressions for map expansions and expected junction densities are obtained for mapping populations that have stage-wise constant mating schemes, such as CC and MAGIC. Our studies on the breeding design of MAGIC populations show that the intercross mating schemes do not matter much for large population size and that the overall expected junction density, and thus map resolution, are approximately proportional to the inverse of the number of founders.
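The abstract's central object is a continuous-time Markov model whose rate matrix is derived from expected junction densities. As a much simpler illustration than the paper's two-chromosome model, the sketch below assumes a single haploid chromosome, L equally likely founders, and one total junction density ρ per Morgan split evenly across the L - 1 possible new origins. These simplifications and the function names are our own assumptions, not the paper's. For that symmetric rate matrix the transition probability has the closed form P_same(d) = 1/L + (1 - 1/L)·exp(-ρLd/(L - 1)), where d is the marker distance in Morgans.

```python
import math
from typing import List


def same_origin_prob(distance_morgans: float, junction_density: float, n_founders: int) -> float:
    """Probability that two loci `distance_morgans` apart share the same founder origin.

    Hypothetical simplification: the origin leaves its current founder at rate
    `junction_density` per Morgan and jumps to each of the other n_founders - 1
    founders at equal rate. The nonzero eigenvalue of this rate matrix Q is
    -junction_density * n_founders / (n_founders - 1).
    """
    if n_founders < 2:
        raise ValueError("at least two founders are required")
    decay = junction_density * n_founders / (n_founders - 1)
    return 1.0 / n_founders + (1.0 - 1.0 / n_founders) * math.exp(-decay * distance_morgans)


def transition_matrix(distance_morgans: float, junction_density: float, n_founders: int) -> List[List[float]]:
    """Full L x L transition matrix exp(Q d) for the symmetric model above."""
    p_same = same_origin_prob(distance_morgans, junction_density, n_founders)
    p_other = (1.0 - p_same) / (n_founders - 1)
    return [[p_same if i == j else p_other for j in range(n_founders)]
            for i in range(n_founders)]


# Example: 8 founders (as in the CC) and an assumed junction density of 2 per Morgan.
for d in (0.0, 0.01, 0.1, 1.0):
    print(d, round(same_origin_prob(d, 2.0, 8), 4))
```

The probability is 1 at d = 0, decays toward 1/L as markers become unlinked, and has initial slope -ρ, which is the link between the junction density and the rate matrix in this toy setting.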
16
Abstract
A general Bayesian model, Diploffect, is described for estimating the effects of founder haplotypes at quantitative trait loci (QTL) detected in multiparental genetic populations; such populations include the Collaborative Cross (CC), Heterogeneous Stock (HS), and many others for which local genetic variation is well described by an underlying, usually probabilistically inferred, haplotype mosaic. Our aim is to provide a framework for coherent estimation of haplotype and diplotype (haplotype pair) effects that takes into account the following: uncertainty in haplotype composition for each individual; uncertainty arising from small sample sizes and infrequently observed haplotype combinations; possible effects of dominance (for noninbred subjects); and genetic background. The framework also provides a means to incorporate data that may be incomplete or have a hierarchical structure. Using the results of a probabilistic haplotype reconstruction as prior information, we obtain posterior distributions at the QTL for both haplotype effects and haplotype composition. Two alternative computational approaches are supplied: a Markov chain Monte Carlo sampler and a procedure based on importance sampling of integrated nested Laplace approximations. Using simulations of QTL in the incipient CC (pre-CC) and Northport HS populations, we compare the accuracy of Diploffect, approximations to it, and more commonly used approaches based on Haley–Knott regression, describing trade-offs between these methods. We also estimate effects for three QTL previously identified in those populations, obtaining posterior intervals that describe how the phenotype might be affected by diplotype substitutions at the modeled locus.
17
Morgan AP, Welsh CE. Informatics resources for the Collaborative Cross and related mouse populations. Mamm Genome 2015; 26:521-39. [PMID: 26135136 DOI: 10.1007/s00335-015-9581-z] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 03/16/2015] [Accepted: 06/23/2015] [Indexed: 02/05/2023]
Affiliation(s)
- Andrew P Morgan
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Catherine E Welsh
- Department of Mathematics & Computer Science, Rhodes College, Memphis, TN, USA.
18
Reconstruction of Genome Ancestry Blocks in Multiparental Populations. Genetics 2015; 200:1073-87. [PMID: 26048018 DOI: 10.1534/genetics.115.177873] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 05/04/2015] [Accepted: 05/31/2015] [Indexed: 11/18/2022]
Abstract
We present a general hidden Markov model framework called Reconstructing Ancestry Blocks Bit by Bit (RABBIT) for reconstructing genome ancestry blocks from single-nucleotide polymorphism (SNP) array data, a required step for quantitative trait locus (QTL) mapping. The framework can be applied to a wide range of mapping populations such as the Arabidopsis multiparent advanced generation intercross (MAGIC), the mouse Collaborative Cross (CC), and the diversity outcross (DO) for both autosomes and X chromosomes if they exist. The model underlying RABBIT accounts for the joint pattern of recombination breakpoints between two homologous chromosomes and for missing data and allelic typing errors in the genotype data of both sampled individuals and founders. Studies on simulated data of the MAGIC and the CC and real data of the MAGIC, the DO, and the CC demonstrate that RABBIT is more robust and accurate in reconstructing recombination bin maps than some commonly used methods.
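RABBIT itself models two homologous chromosomes together with founder genotyping errors and missing data. As a much-reduced illustration of how an HMM turns noisy SNP calls into ancestry blocks, the hypothetical function below runs Viterbi decoding for a single fully inbred line. The uniform per-interval switch probability and the symmetric allelic error rate are assumed values, and none of this is RABBIT's actual model or API.

```python
import math
from typing import List, Sequence


def viterbi_ancestry(obs: Sequence[int], founders: List[Sequence[int]],
                     switch_prob: float = 0.01, error_rate: float = 0.05) -> List[int]:
    """Most likely founder index at each marker for one inbred line (toy model).

    obs[m] is the observed allele (0/1) at marker m; founders[k][m] is founder k's
    allele. switch_prob and error_rate are assumed, not estimated.
    """
    n_states, n_markers = len(founders), len(obs)
    if n_markers == 0:
        return []
    stay = math.log(1 - switch_prob)
    move = math.log(switch_prob / (n_states - 1))  # requires at least 2 founders

    def emit(state: int, m: int) -> float:
        # Log-probability of the observed allele given the founder origin.
        match = founders[state][m] == obs[m]
        return math.log(1 - error_rate) if match else math.log(error_rate)

    def trans(prev: int, cur: int) -> float:
        return stay if prev == cur else move

    # Uniform prior over founders at the first marker.
    score = [math.log(1 / n_states) + emit(s, 0) for s in range(n_states)]
    backpointers = []
    for m in range(1, n_markers):
        # Best predecessor for each current state, then updated log-scores.
        ptr = [max(range(n_states), key=lambda p: score[p] + trans(p, s))
               for s in range(n_states)]
        score = [score[ptr[s]] + trans(ptr[s], s) + emit(s, m) for s in range(n_states)]
        backpointers.append(ptr)

    # Trace back from the best final state.
    path = [max(range(n_states), key=lambda s: score[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


founders = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
print(viterbi_ancestry([0, 0, 0, 1, 1, 1], founders))  # [0, 0, 0, 1, 1, 1]
print(viterbi_ancestry([0, 0, 1, 0, 0, 0], founders))  # [0, 0, 0, 0, 0, 0]
```

In the second call the isolated mismatch is explained as a typing error rather than as two breakpoints, because under these parameters two switches cost more than one error; this is the trade-off that keeps an HMM from over-segmenting noisy data.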
19
Abstract
The models for the mosaic structure of an individual’s genome from multiparental populations have been developed primarily for autosomes, whereas X chromosomes receive very little attention. In this paper, we extend our previous approach to model ancestral origin processes along two X chromosomes in a mapping population, which is necessary for developing hidden Markov models in the reconstruction of ancestry blocks for X-linked quantitative trait locus mapping. The model accounts for the joint recombination pattern, the asymmetry between maternally and paternally derived X chromosomes, and the finiteness of population size. The model can be applied to various mapping populations such as the advanced intercross lines (AIL), the Collaborative Cross (CC), the heterogeneous stock (HS), the Diversity Outcross (DO), and the Drosophila synthetic population resource (DSPR). We further derive the map expansion, i.e., the density (per Morgan) of recombination breakpoints, in advanced intercross populations with L inbred founders in the limit of an infinitely large population size. The analytic results show that for X chromosomes the genetic map expands linearly at a rate (per generation) of (2/3)(1 - 10/(9L)) for the AIL and (2/3)(1 - 1/L) for the DO and the HS, whereas for autosomes the map expands at a rate of 1 - 1/L for the AIL, the DO, and the HS.
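The limiting per-generation rates quoted in this abstract are simple enough to evaluate directly. The sketch below encodes them with exact fractions; the function names are hypothetical. The only step beyond the abstract is that, since the expansion is linear in generations, its increase over a number of generations is the rate times that number; no starting value of the map expansion is assumed.

```python
from fractions import Fraction


def autosome_rate(n_founders: int) -> Fraction:
    """Per-generation autosomal map expansion for AIL, DO and HS: 1 - 1/L."""
    return 1 - Fraction(1, n_founders)


def x_chromosome_rate(n_founders: int, design: str) -> Fraction:
    """Per-generation X-chromosome map expansion as stated in the abstract."""
    if design == "AIL":
        return Fraction(2, 3) * (1 - Fraction(10, 9 * n_founders))
    if design in ("DO", "HS"):
        return Fraction(2, 3) * (1 - Fraction(1, n_founders))
    raise ValueError(f"unsupported design: {design}")


def expansion_increment(rate: Fraction, n_generations: int) -> Fraction:
    """Increase in breakpoints per Morgan over n_generations under linear expansion."""
    return rate * n_generations


L = 8  # e.g. eight inbred founders, as in the DO
print("autosomes:", autosome_rate(L))
print("X, AIL:   ", x_chromosome_rate(L, "AIL"))
print("X, DO/HS: ", x_chromosome_rate(L, "DO"))
```

With L = 8 the rates are 7/8 for autosomes, 31/54 (about 0.574) for the X in the AIL and 7/12 (about 0.583) for the X in the DO and HS, so the X map grows roughly two-thirds as fast as the autosomal map.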
20
Didion JP, Morgan AP, Clayshulte AMF, Mcmullan RC, Yadgary L, Petkov PM, Bell TA, Gatti DM, Crowley JJ, Hua K, Aylor DL, Bai L, Calaway M, Chesler EJ, French JE, Geiger TR, Gooch TJ, Garland T, Harrill AH, Hunter K, McMillan L, Holt M, Miller DR, O'Brien DA, Paigen K, Pan W, Rowe LB, Shaw GD, Simecek P, Sullivan PF, Svenson KL, Weinstock GM, Threadgill DW, Pomp D, Churchill GA, Pardo-Manuel de Villena F. A multi-megabase copy number gain causes maternal transmission ratio distortion on mouse chromosome 2. PLoS Genet 2015; 11:e1004850. [PMID: 25679959 PMCID: PMC4334553 DOI: 10.1371/journal.pgen.1004850] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 05/02/2014] [Accepted: 10/24/2014] [Indexed: 12/29/2022]
Abstract
Significant departures from expected Mendelian inheritance ratios (transmission ratio distortion, TRD) are frequently observed in both experimental crosses and natural populations. TRD on mouse Chromosome (Chr) 2 has been reported in multiple experimental crosses, including the Collaborative Cross (CC). Among the eight CC founder inbred strains, we found that Chr 2 TRD was exclusive to females that were heterozygous for the WSB/EiJ allele within a 9.3 Mb region (Chr 2 76.9 - 86.2 Mb). A copy number gain of a 127 kb-long DNA segment (designated as responder to drive, R2d) emerged as the strongest candidate for the causative allele. We mapped R2d sequences to two loci within the candidate interval. R2d1 is located near the proximal boundary, and contains a single copy of R2d in all strains tested. R2d2 maps to a 900 kb interval, and the number of R2d copies varies from zero in classical strains (including the mouse reference genome) to more than 30 in wild-derived strains. Using real-time PCR assays for the copy number, we identified a mutation (R2d2WSBdel1) that eliminates the majority of the R2d2WSB copies without apparent alterations of the surrounding WSB/EiJ haplotype. In a three-generation pedigree segregating for R2d2WSBdel1, the mutation is transmitted to the progeny and Mendelian segregation is restored in females heterozygous for R2d2WSBdel1, thus providing direct evidence that the copy number gain is causal for maternal TRD. We found that transmission ratios in R2d2WSB heterozygous females vary between Mendelian segregation and complete distortion depending on the genetic background, and that TRD is under genetic control of unlinked distorter loci. Although the R2d2WSB transmission ratio was inversely correlated with average litter size, several independent lines of evidence support the contention that female meiotic drive is the cause of the distortion. We discuss the implications and potential applications of this novel meiotic drive system.
Affiliation(s)
- John P. Didion
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Andrew P. Morgan
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Amelia M.-F. Clayshulte
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Rachel C. Mcmullan
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Liran Yadgary
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Petko M. Petkov
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Timothy A. Bell
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Daniel M. Gatti
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- James J. Crowley
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Kunjie Hua
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- David L. Aylor
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, United States of America
- Ling Bai
- Laboratory of Cancer Biology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Mark Calaway
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- John E. French
- National Toxicology Program, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, North Carolina, United States of America
- Thomas R. Geiger
- Laboratory of Cancer Biology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Terry J. Gooch
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Theodore Garland
- Department of Biology, University of California Riverside, Riverside, California, United States of America
- Alison H. Harrill
- Department of Environmental and Occupational Health, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
- Kent Hunter
- Laboratory of Cancer Biology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Leonard McMillan
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Matt Holt
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Darla R. Miller
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Deborah A. O'Brien
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Kenneth Paigen
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Wenqi Pan
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lucy B. Rowe
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Ginger D. Shaw
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Petr Simecek
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Patrick F. Sullivan
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Karen L Svenson
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- George M. Weinstock
- Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, United States of America
- David W. Threadgill
- Department of Veterinary Pathobiology and Department of Molecular and Cellular Medicine, Texas A&M University, College Station, Texas, United States of America
- Daniel Pomp
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Fernando Pardo-Manuel de Villena
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
21
Abstract
Allergic asthma is a complex disease characterized in part by granulocytic inflammation of the airways. In addition to eosinophils, neutrophils (PMN) are also present, particularly in cases of severe asthma. We sought to identify the genetic determinants of neutrophilic inflammation in a mouse model of house dust mite (HDM)-induced asthma. We applied an HDM model of allergic asthma to the eight founder strains of the Collaborative Cross (CC) and 151 incipient lines of the CC (preCC). Lung lavage fluid was analyzed for PMN count and the concentration of CXCL1, a hallmark PMN chemokine. PMN and CXCL1 were strongly correlated in preCC mice. We used quantitative trait locus (QTL) mapping to identify three variants affecting PMN, one of which colocalized with a QTL for CXCL1 on chromosome (Chr) 7. We used lung eQTL data to implicate a variant in the gene Zfp30 in the CXCL1/PMN response. This genetic variant regulates both CXCL1 and PMN by altering Zfp30 expression, and we model the relationships between the QTL and these three endophenotypes. We show that Zfp30 is expressed in airway epithelia in the normal mouse lung and that altering Zfp30 expression in vitro affects CXCL1 responses to an immune stimulus. Our results provide strong evidence that Zfp30 is a novel regulator of neutrophilic airway inflammation.
Collapse
|
22
|
Liu EY, Morgan AP, Chesler EJ, Wang W, Churchill GA, Pardo-Manuel de Villena F. High-resolution sex-specific linkage maps of the mouse reveal polarized distribution of crossovers in male germline. Genetics 2014; 197:91-106. [PMID: 24578350 PMCID: PMC4012503 DOI: 10.1534/genetics.114.161653] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 01/16/2014] [Accepted: 02/20/2014] [Indexed: 12/31/2022]
Abstract
Since the publication of the first comprehensive linkage map for the laboratory mouse, the architecture of recombination as a basic biological process has become amenable to investigation in mammalian model organisms. Here we take advantage of high-density genotyping and the unique pedigree structure of the incipient Collaborative Cross to investigate the roles of sex and genetic background in mammalian recombination. Our results confirm the observation that map length is longer when measured through female meiosis than through male meiosis, but we find that this difference is modified by genotype at loci on both the X chromosome and the autosomes. In addition, we report a striking concentration of crossovers in the distal ends of autosomes in male meiosis that is absent in female meiosis. The presence of this pattern in both single- and double-recombinant chromosomes, combined with the absence of a corresponding asymmetry in the distribution of double-strand breaks, indicates a regulated sequence of events specific to male meiosis that is anchored by chromosome ends. This pattern is consistent with the timing of chromosome pairing and evolutionary constraints on male recombination. Finally, we identify large regions of reduced crossover frequency that together encompass 5% of the genome. Many of these "cold regions" are enriched for segmental duplications, suggesting an inverse local correlation between recombination rate and mutation rate for large copy number variants.
Affiliation(s)
- Eric Yi Liu
- Department of Computer Science, University of North Carolina, Chapel Hill, North Carolina 27599-3175
- Andrew P. Morgan
- Department of Genetics, Carolina Center for Genome Sciences and Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina 27599-7264
- Wei Wang
- Department of Computer Science, University of California, Los Angeles, California 90095-1596
- Fernando Pardo-Manuel de Villena
- Department of Genetics, Carolina Center for Genome Sciences and Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina 27599-7264
23
Hitzemann R, Bottomly D, Iancu O, Buck K, Wilmot B, Mooney M, Searles R, Zheng C, Belknap J, Crabbe J, McWeeney S. The genetics of gene expression in complex mouse crosses as a tool to study the molecular underpinnings of behavior traits. Mamm Genome 2013; 25:12-22. [PMID: 24374554 PMCID: PMC3916704 DOI: 10.1007/s00335-013-9495-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/01/2013] [Accepted: 11/25/2013] [Indexed: 02/06/2023]
Abstract
Complex Mus musculus crosses provide increased resolution to examine the relationships between gene expression and behavior. While the advantages are clear, there are numerous analytical and technological concerns that arise from the increased genetic complexity that must be considered. Each of these issues is discussed, providing an initial framework for complex cross study design and planning.
Affiliation(s)
- Robert Hitzemann
- Portland Alcohol Research Center, Veterans Affairs Medical Center, Portland, 97239, OR, USA
24
Illingworth CJR, Parts L, Bergström A, Liti G, Mustonen V. Inferring genome-wide recombination landscapes from advanced intercross lines: application to yeast crosses. PLoS One 2013; 8:e62266. [PMID: 23658715 PMCID: PMC3642125 DOI: 10.1371/journal.pone.0062266] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 01/17/2013] [Accepted: 03/19/2013] [Indexed: 01/23/2023]
Abstract
Accurate estimates of recombination rates are of great importance for understanding evolution. In an experimental genetic cross, recombination breaks apart and rejoins genetic material, such that the genomes of the resulting isolates are comprised of distinct blocks of differing parental origin. We here describe a method exploiting this fact to infer genome-wide recombination profiles from sequenced isolates from an advanced intercross line (AIL). We verified the accuracy of the method against simulated data. Next, we sequenced 192 isolates from a twelve-generation cross between West African and North American yeast Saccharomyces cerevisiae strains and inferred the underlying recombination landscape at a fine genomic resolution (mean segregating site distance 0.22 kb). Comparison was made with landscapes inferred for a similar cross between four yeast strains, and with a previous single-generation, intra-strain cross (Mancera et al., Nature 2008). Moderate congruence was identified between landscapes (correlation 0.58-0.77 at 5 kb resolution), albeit with variance between mean genome-wide recombination rates. The multiple generations of mating undergone in the AILs gave more precise inference of recombination rates than could be achieved from a single-generation cross, in particular in identifying recombination cold-spots. The recombination landscapes we describe have particular utility; both AILs are part of a resource to study complex yeast traits (see e.g. Parts et al., Genome Res 2011). Our results will enable future applications of this resource to take better account of local linkage structure heterogeneities. Our method has general applicability to other crossing experiments, including a variety of experimental designs.
Affiliation(s)
- Leopold Parts
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Anders Bergström
- Institute of Research on Cancer and Ageing of Nice, Université de Nice Sophia Antipolis, Nice, France
- Gianni Liti
- Institute of Research on Cancer and Ageing of Nice, Université de Nice Sophia Antipolis, Nice, France
- Ville Mustonen
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
25
Abstract
The Collaborative Cross Consortium reports here on the development of a unique genetic resource population. The Collaborative Cross (CC) is a multiparental recombinant inbred panel derived from eight laboratory mouse inbred strains. Breeding of the CC lines was initiated at multiple international sites using mice from The Jackson Laboratory. Currently, this innovative project is breeding independent CC lines at the University of North Carolina (UNC), at Tel Aviv University (TAU), and at Geniad in Western Australia (GND). These institutions aim to make publicly available the completed CC lines and their genotypes and sequence information. We genotyped, and report here, results from 458 extant lines from UNC, TAU, and GND using a custom genotyping array with 7500 SNPs designed to be maximally informative in the CC and used a novel algorithm to infer inherited haplotypes directly from hybridization intensity patterns. We identified lines with breeding errors and cousin lines generated by splitting incipient lines into two or more cousin lines at early generations of inbreeding. We then characterized the genome architecture of 350 genetically independent CC lines. Results showed that founder haplotypes are inherited at the expected frequency, although we also consistently observed highly significant transmission ratio distortion at specific loci across all three populations. On chromosome 2, there is significant overrepresentation of WSB/EiJ alleles, and on chromosome X, there is a large deficit of CC lines with CAST/EiJ alleles. Linkage disequilibrium decays as expected and we saw no evidence of gametic disequilibrium in the CC population as a whole or in random subsets of the population. Gametic equilibrium in the CC population is in marked contrast to the gametic disequilibrium present in a large panel of classical inbred strains. Finally, we discuss access to the CC population and to the associated raw data describing the genetic structure of individual lines. 
Integration of rich phenotypic and genomic data over time and across a wide variety of fields will be vital to delivering on one of the key attributes of the CC, a common genetic reference platform for identifying causative variants and genetic networks determining traits in mammals.
26
Abstract
The mouse Collaborative Cross (CC) is a panel of eight-way recombinant inbred lines: eight diverse parental strains are intermated, followed by repeated sibling mating, many times in parallel, to create a new set of inbred lines whose genomes are random mosaics of the genomes of the original eight strains. Many generations are required to reach inbreeding, and so a number of investigators have sought to make use of phenotype and genotype data on mice from intermediate generations during the formation of the CC lines (so-called pre-CC mice). The development of a hidden Markov model for genotype reconstruction in such pre-CC mice, on the basis of incompletely informative genetic markers (such as single-nucleotide polymorphisms), formally requires the two-locus genotype probabilities at an arbitrary generation along the path to inbreeding. In this article, I describe my efforts to calculate such probabilities. While closed-form solutions for the two-locus genotype probabilities could not be derived, I provide a prescription for calculating such probabilities numerically. In addition, I present a number of useful quantities, including single-locus genotype probabilities, two-locus haplotype probabilities, and the fixation probability and map expansion at each generation along the course to inbreeding.
27
28
Expression quantitative trait Loci for extreme host response to influenza A in pre-collaborative cross mice. G3-GENES GENOMES GENETICS 2012; 2:213-21. [PMID: 22384400 PMCID: PMC3284329 DOI: 10.1534/g3.111.001800] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Received: 07/08/2011] [Accepted: 12/08/2011] [Indexed: 01/05/2023]
Abstract
Outbreaks of influenza occur on a yearly basis, causing a wide range of symptoms across the human population. Although evidence exists that the host response to influenza infection is influenced by genetic differences in the host, this has not been studied in a system with genetic diversity mirroring that of the human population. Here we used mice from 44 influenza-infected pre-Collaborative Cross lines determined to have extreme phenotypes with regard to the host response to influenza A virus infection. Global transcriptome profiling identified 2671 transcripts that were significantly differentially expressed between mice that showed a severe ("high") and mild ("low") response to infection. Expression quantitative trait loci mapping was performed on those transcripts that were differentially expressed because of differences in host response phenotype to identify putative regulatory regions potentially controlling their expression. Twenty-one significant expression quantitative trait loci were identified, which allowed direct examination of genes associated with regulation of host response to infection. To perform initial validation of our findings, quantitative polymerase chain reaction was performed in the infected founder strains, and we were able to confirm or partially confirm more than 70% of those tested. In addition, we explored putative causal and reactive (downstream) relationships between the significantly regulated genes and others in the high or low response groups using structural equation modeling. By using systems approaches and a genetically diverse population, we were able to develop a novel framework for identifying the underlying biological subnetworks under host genetic control during influenza virus infection.
29
Genetic analysis of hematological parameters in incipient lines of the collaborative cross. G3-GENES GENOMES GENETICS 2012; 2:157-65. [PMID: 22384394 PMCID: PMC3284323 DOI: 10.1534/g3.111.001776] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Received: 07/11/2011] [Accepted: 12/20/2011] [Indexed: 12/19/2022]
Abstract
Hematological parameters, including red and white blood cell counts and hemoglobin concentration, are widely used clinical indicators of health and disease. These traits are tightly regulated in healthy individuals and are under genetic control. Mutations in key genes that affect hematological parameters have important phenotypic consequences, including multiple variants that affect susceptibility to malarial disease. However, most variation in hematological traits is continuous and is presumably influenced by multiple loci and variants with small phenotypic effects. We used a newly developed mouse resource population, the Collaborative Cross (CC), to identify genetic determinants of hematological parameters. We surveyed the eight founder strains of the CC and performed a mapping study using 131 incipient lines of the CC. Genome scans identified quantitative trait loci for several hematological parameters, including mean red cell volume (Chr 7 and Chr 14), white blood cell count (Chr 18), percent neutrophils/lymphocytes (Chr 11), and monocyte number (Chr 1). We used evolutionary principles and unique bioinformatics resources to reduce the size of candidate intervals and to view functional variation in the context of phylogeny. Many quantitative trait loci regions could be narrowed sufficiently to identify a small number of promising candidate genes. This approach not only expands our knowledge about hematological traits but also demonstrates the unique ability of the CC to elucidate the genetic architecture of complex traits.
30
Accelerating the inbreeding of multi-parental recombinant inbred lines generated by sibling matings. G3-GENES GENOMES GENETICS 2012; 2:191-8. [PMID: 22384397 PMCID: PMC3284326 DOI: 10.1534/g3.111.001784] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 07/11/2011] [Accepted: 11/06/2011] [Indexed: 11/26/2022]
Abstract
Inbred model organisms are powerful tools for genetic studies because they provide reproducible genomes for use in mapping and genetic manipulation. Generating inbred lines via sibling matings, however, is a costly undertaking that requires many successive generations of breeding, during which time many lines fail. We evaluated several approaches for accelerating inbreeding, including the systematic use of back-crosses and marker-assisted breeder selection, which we contrasted with randomized sib-matings. Using simulations, we explored several alternative breeder-selection methods and monitored the gain and loss of genetic diversity, measured by the number of recombination-induced founder intervals, as a function of generation. For each approach we simulated 100,000 independent lines to estimate distributions of generations to achieve full-fixation as well as to achieve a mean heterozygosity level equal to 20 generations of randomized sib-mating. Our analyses suggest that the number of generations to fully inbred status can be substantially reduced with minimal impact on genetic diversity through combinations of parental backcrossing and marker-assisted inbreeding. Although simulations do not consider all confounding factors underlying the inbreeding process, such as a loss of fecundity, our models suggest many viable alternatives for accelerating the inbreeding process.
31
Abstract
The February 2012 issues of GENETICS and G3: Genes, Genomes, Genetics present a collection of articles reporting recent advances from the international Collaborative Cross (CC) project. The goal of the CC project is to develop a new resource that will enhance quantitative trait locus (QTL) and systems genetic analyses in mice. The CC consists of hundreds of independently bred, octo-parental recombinant inbred lines (Figure 1). The work reported in these issues represents progress toward completion of the CC, proof-of-principle experiments using incipient inbred CC mice, and new research areas and complementary resources facilitated by the CC project.
32
Quantitative trait Loci association mapping by imputation of strain origins in multifounder crosses. Genetics 2011; 190:459-73. [PMID: 22143921 DOI: 10.1534/genetics.111.135095] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/17/2022]
Abstract
Although mapping quantitative traits in inbred strains is simpler than mapping the analogous traits in humans, classical inbred crosses suffer from reduced genetic diversity compared to experimental designs involving outbred animal populations. Multiple crosses, for example the Complex Trait Consortium's eight-way cross, circumvent these difficulties. However, complex mating schemes and systematic inbreeding raise substantial computational difficulties. Here we present a method for locally imputing the strain origins of each genotyped animal along its genome. Imputed origins then serve as mean effects in a multivariate Gaussian model for testing association between trait levels and local genomic variation. Imputation is a combinatorial process that assigns the maternal and paternal strain origin of each animal on the basis of observed genotypes and prior pedigree information. Without smoothing, imputation is likely to be ill-defined or jump erratically from one strain to another as an animal's genome is traversed. In practice, one expects to see long stretches where strain origins are invariant. Smoothing can be achieved by penalizing strain changes from one marker to the next. A dynamic programming algorithm then solves the strain imputation process in one quick pass through the genome of an animal. Imputation accuracy exceeds 99% in practical examples and leads to high-resolution mapping in simulated and real data. The previous fastest quantitative trait loci (QTL) mapping software for dense genome scans reduced compute times to hours. Our implementation further reduces compute times from hours to minutes with no loss in statistical power. Indeed, power is enhanced for full pedigree data.
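The penalized smoothing described in this abstract (a cost per strain change, solved by dynamic programming in one pass along the genome) can be illustrated with a minimal sketch. It assumes a single strain origin per marker (the method above assigns both maternal and paternal origins and uses pedigree priors, which this sketch omits), a hypothetical per-marker mismatch cost `costs[i][s]` for each founder strain `s`, and one constant `switch_penalty`. The function name `impute_origins` is hypothetical and this is not the authors' implementation.

```python
from typing import List, Sequence


def impute_origins(costs: Sequence[Sequence[float]], switch_penalty: float) -> List[int]:
    """Return the strain path minimizing total mismatch cost plus a penalty per strain change.

    costs[i][s] is a hypothetical mismatch cost of assigning strain s at marker i.
    Runs in O(n * k) time for n markers and k strains: one forward pass plus a traceback.
    """
    if switch_penalty < 0:
        raise ValueError("switch_penalty must be non-negative")
    if not costs:
        return []
    k = len(costs[0])
    score = list(costs[0])          # best total cost of any path ending in each strain
    back: List[List[int]] = []      # back[i - 1][s] = predecessor strain of s at marker i
    for i in range(1, len(costs)):
        best_prev = score.index(min(score))  # cheapest strain to switch away from
        new_score: List[float] = []
        ptr: List[int] = []
        for s in range(k):
            stay = score[s]                               # keep the same strain
            switch = score[best_prev] + switch_penalty    # change strain at marker i
            if stay <= switch:
                new_score.append(stay + costs[i][s])
                ptr.append(s)
            else:
                new_score.append(switch + costs[i][s])
                ptr.append(best_prev)
        score = new_score
        back.append(ptr)
    # Trace back from the cheapest final strain to recover the full path.
    state = score.index(min(score))
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Nine markers, two founder strains (0 and 1); marker 2 is a noisy call.
    costs = [[0, 1], [0, 1], [1, 0], [0, 1], [0, 1], [1, 0], [1, 0], [1, 0], [1, 0]]
    print(impute_origins(costs, switch_penalty=2.0))  # [0, 0, 0, 0, 0, 1, 1, 1, 1]
    print(impute_origins(costs, switch_penalty=0.0))  # [0, 0, 1, 0, 0, 1, 1, 1, 1]
```

With a zero penalty the path simply follows the cheapest strain at each marker and flips at the noisy call; a positive penalty absorbs the isolated mismatch and yields the long invariant stretches the abstract describes. Keeping only the best predecessor per step is what makes a constant switch penalty solvable in linear time.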
33
Aylor DL, Valdar W, Foulds-Mathes W, Buus RJ, Verdugo RA, Baric RS, Ferris MT, Frelinger JA, Heise M, Frieman MB, Gralinski LE, Bell TA, Didion JD, Hua K, Nehrenberg DL, Powell CL, Steigerwalt J, Xie Y, Kelada SNP, Collins FS, Yang IV, Schwartz DA, Branstetter LA, Chesler EJ, Miller DR, Spence J, Liu EY, McMillan L, Sarkar A, Wang J, Wang W, Zhang Q, Broman KW, Korstanje R, Durrant C, Mott R, Iraqi FA, Pomp D, Threadgill D, de Villena FPM, Churchill GA. Genetic analysis of complex traits in the emerging Collaborative Cross. Genome Res 2011; 21:1213-22. [PMID: 21406540 DOI: 10.1101/gr.111310.110] [Citation(s) in RCA: 260] [Impact Index Per Article: 20.0] [Indexed: 12/14/2022]
Abstract
The Collaborative Cross (CC) is a mouse recombinant inbred strain panel that is being developed as a resource for mammalian systems genetics. Here we describe an experiment that uses partially inbred CC lines to evaluate the genetic properties and utility of this emerging resource. Genome-wide analysis of the incipient strains reveals high genetic diversity, balanced allele frequencies, and dense, evenly distributed recombination sites, all ideal qualities for a systems genetics resource. We map discrete, complex, and biomolecular traits and contrast two quantitative trait locus (QTL) mapping approaches. Analysis based on inferred haplotypes improves power, reduces false discovery, and provides information to identify and prioritize candidate genes that is unique to multifounder crosses like the CC. The number of expression QTLs discovered here exceeds all previous efforts at eQTL mapping in mice, and we map local eQTL at 1-Mb resolution. We demonstrate that the genetic diversity of the CC, which derives from random mixing of eight founder strains, results in high phenotypic diversity and enhances our ability to map causative loci underlying complex disease-related traits.
Affiliation(s)
- David L Aylor
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina 27599, USA
34
Guzzetta G, Jurman G, Furlanello C. A machine learning pipeline for quantitative phenotype prediction from genotype data. BMC Bioinformatics 2010; 11 Suppl 8:S3. [PMID: 21034428 PMCID: PMC2966290 DOI: 10.1186/1471-2105-11-s8-s3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Background: Quantitative phenotypes emerge everywhere in systems biology and biomedicine, either because of a direct interest in quantitative traits or because high individual variability makes it hard or impossible to classify samples into distinct categories, as is often the case with complex common diseases. Machine learning approaches to genotype-phenotype mapping may significantly improve Genome-Wide Association Studies (GWAS) results by explicitly focusing on predictivity and optimal feature selection in a multivariate setting. It is, however, essential that stringent and well-documented Data Analysis Protocols (DAP) are used to control sources of variability and ensure reproducibility of results. We present a genome-to-phenotype pipeline of machine learning modules for quantitative phenotype prediction. The pipeline can be applied for the direct use of whole-genome information in functional studies. As a realistic example, we consider the problem of fitting complex phenotypic traits in heterogeneous stock mice from single nucleotide polymorphisms (SNPs). Methods: The core element in the pipeline is the L1L2 regularization method based on the naïve elastic net. The method simultaneously provides a regression model and a dimensionality reduction procedure suitable for correlated features. Model and SNP markers are selected through a DAP originally developed in the MAQC-II collaborative initiative of the U.S. FDA for the identification of clinical biomarkers from microarray data. The L1L2 approach is compared with standard Support Vector Regression (SVR) and with Recursive Jump Monte Carlo Markov Chain (MCMC). Algebraic indicators of stability of partial lists are used for model selection; the final panel of markers is obtained by a procedure at the chromosome scale, termed 'saturation', to recover SNPs in Linkage Disequilibrium with those selected. Results: The L1L2 pipeline achieves accuracies comparable to those of both MCMC and SVR.
Good agreement is also found between SNPs selected by the L1L2 algorithms and candidate loci previously identified by a standard GWAS. The combination of L1L2-based feature selection with a saturation procedure addresses a weakness of many feature selection algorithms, which tend to discard highly correlated features. Conclusions: The L1L2 pipeline has proven effective in terms of marker selection and prediction accuracy. This study indicates that machine learning techniques may support quantitative phenotype prediction, provided that adequate DAPs are employed to control bias in model selection.