1
Hyten DL. Genotyping Platforms for Genome-Wide Association Studies: Options and Practical Considerations. Methods Mol Biol 2022; 2481:29-42. [PMID: 35641757 DOI: 10.1007/978-1-0716-2237-7_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Genome-wide association studies (GWAS) in crops require genotyping platforms capable of producing accurate, high-density genotyping data on hundreds of plants in a cost-effective manner. Multiple commercial platforms are currently available and are being used effectively across crops. These platforms include genotyping arrays, such as the Illumina Infinium arrays and the Applied Biosystems Axiom arrays, along with a variety of resequencing methods. These methods are being used to genotype from tens of thousands up to millions of markers on GWAS panels, in crops ranging from those with simple genomes to those with very complex, large, polyploid genomes. Depending on the crop and the goal of the GWAS, there are several options and practical considerations to take into account when selecting a genotyping technology to ensure that the right coverage, accuracy, and cost for the study are achieved.
Affiliation(s)
- David L Hyten
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, USA.
2
Scott MF, Ladejobi O, Amer S, Bentley AR, Biernaskie J, Boden SA, Clark M, Dell'Acqua M, Dixon LE, Filippi CV, Fradgley N, Gardner KA, Mackay IJ, O'Sullivan D, Percival-Alwyn L, Roorkiwal M, Singh RK, Thudi M, Varshney RK, Venturini L, Whan A, Cockram J, Mott R. Multi-parent populations in crops: a toolbox integrating genomics and genetic mapping with breeding. Heredity (Edinb) 2020; 125:396-416. [PMID: 32616877 PMCID: PMC7784848 DOI: 10.1038/s41437-020-0336-6] [Citation(s) in RCA: 68] [Impact Index Per Article: 17.0] [Received: 01/27/2020] [Revised: 06/16/2020] [Accepted: 06/16/2020] [Indexed: 11/21/2022]
Abstract
Crop populations derived from experimental crosses enable the genetic dissection of complex traits and support modern plant breeding. Among these, multi-parent populations now play a central role. By mixing and recombining the genomes of multiple founders, multi-parent populations combine many commonly sought beneficial properties of genetic mapping populations. For example, they have high power and resolution for mapping quantitative trait loci, high genetic diversity and minimal population structure. Many multi-parent populations have been constructed in crop species, and their inbred germplasm and associated phenotypic and genotypic data serve as enduring resources. Their utility has grown from being a tool for mapping quantitative trait loci to a means of providing germplasm for breeding programmes. Genomics approaches, including de novo genome assemblies and gene annotations for the population founders, have allowed the imputation of rich sequence information into the descendant population, expanding the breadth of research and breeding applications of multi-parent populations. Here, we report recent successes from multi-parent populations in crops. We also propose an ideal genotypic, phenotypic and germplasm 'package' that multi-parent populations should feature to optimise their use as powerful community resources for crop research, development and breeding.
Affiliation(s)
- Samer Amer
- University of Reading, Reading, RG6 6AH, UK
- Faculty of Agriculture, Alexandria University, Alexandria, 23714, Egypt
- Alison R Bentley
- The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- Jay Biernaskie
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Scott A Boden
- School of Agriculture, Food and Wine, University of Adelaide, Glen Osmond, SA, 5064, Australia
- Laura E Dixon
- Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK
- Carla V Filippi
- Instituto de Agrobiotecnología y Biología Molecular (IABIMO), INTA-CONICET, Nicolas Repetto y Los Reseros s/n, 1686, Hurlingham, Buenos Aires, Argentina
- Nick Fradgley
- The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- Keith A Gardner
- The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- Ian J Mackay
- SRUC, West Mains Road, Kings Buildings, Edinburgh, EH9 3JG, UK
- Manish Roorkiwal
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Rakesh Kumar Singh
- International Center for Biosaline Agriculture, Academic City, Dubai, United Arab Emirates
- Mahendar Thudi
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Rajeev Kumar Varshney
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Alex Whan
- CSIRO, GPO Box 1700, Canberra, ACT, 2601, Australia
- James Cockram
- The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- Richard Mott
- UCL Genetics Institute, Gower Street, London, WC1E 6BT, UK
3
Can H, Kal U, Ozyigit II, Paksoy M, Turkmen O. Construction, characteristics and high throughput molecular screening methodologies in some special breeding populations: a horticultural perspective. J Genet 2019; 98:86. [PMID: 31544799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Advanced marker technologies are widely used for evaluation of genetic diversity in cultivated crops, wild ancestors, landraces or any special plant genotypes. Developing agricultural cultivars requires the following steps: (i) determining desired characteristics to be improved, (ii) screening genetic resources to help find a superior cultivar, (iii) intercrossing selected individuals, (iv) generating genetically hybrid populations and screening them for agro-morphological or molecular traits, (v) evaluating the superior cultivar candidates, (vi) testing field performance at different locations, and (vii) certifying. In the cultivar development process, valuable genes can be identified by creating special biparental or multiparental populations and analysing their association using suitable markers in given populations. These special populations and advanced marker technologies give us a deeper knowledge about the inherited agronomic characteristics. Unaffected by changing environmental conditions, they provide a better understanding of genome dynamics in plants. The last decade witnessed new applications for advanced molecular techniques in the area of breeding, with low costs per sample. These include, in particular, next-generation sequencing technologies such as reduced representation genome sequencing (genotyping by sequencing, restriction site-associated DNA). These enabled researchers to develop new markers, such as simple sequence repeats and single-nucleotide polymorphisms, for expanding the qualitative and quantitative information on population dynamics. Thus, the knowledge acquired from novel technologies is a valuable asset for the breeding process and for better understanding population dynamics, their properties, and analysis methods.
Affiliation(s)
- Hasan Can
- Faculty of Agriculture, Department of Field Crops and Horticulture, Kyrgyz-Turkish Manas University, Bishkek 720038, Kyrgyzstan.
4
Can H, Kal U, Ozyigit II, Paksoy M, Turkmen O. Construction, characteristics and high throughput molecular screening methodologies in some special breeding populations: a horticultural perspective. J Genet 2019. [DOI: 10.1007/s12041-019-1129-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/01/2022]
5
Zan Y, Payen T, Lillie M, Honaker CF, Siegel PB, Carlborg Ö. Genotyping by low-coverage whole-genome sequencing in intercross pedigrees from outbred founders: a cost-efficient approach. Genet Sel Evol 2019; 51:44. [PMID: 31412777 PMCID: PMC6694510 DOI: 10.1186/s12711-019-0487-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 09/19/2018] [Accepted: 08/07/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Experimental intercrosses between outbred founder populations are powerful resources for mapping loci that contribute to complex traits, i.e. quantitative trait loci (QTL). Here, we present an approach and its accompanying software for high-resolution reconstruction of founder mosaic genotypes in the intercross offspring from such populations using whole-genome high-coverage sequence data on founder individuals (~30×) and very low-coverage sequence data on intercross individuals (<0.5×). Sets of founder-line informative markers were selected for each full-sib family and used to infer the founder mosaic genotypes of the intercross individuals. The application of this approach and the quality of the estimated genome-wide genotypes are illustrated in a large F2 pedigree between two divergently selected lines of chickens. RESULTS: We describe how we obtained whole-genome genotype data for hundreds of individuals in a cost- and time-efficient manner by using a Tn5-based library preparation protocol and an imputation algorithm that was optimized for this application. In total, 7.6 million markers segregated in this pedigree and, within each full-sib family, between 10.0 and 13.7% of these were fully informative, i.e. fixed for alternative alleles in the founders from the divergent lines, and were used for reconstruction of the offspring mosaic genotypes. The genotypes that were estimated based on the low-coverage sequence data were highly consistent (>95% agreement) with those obtained using individual single nucleotide polymorphism (SNP) genotyping. The estimated resolution of the inferred recombination breakpoints was relatively high, with 50% of them being defined on regions shorter than 10 kb. CONCLUSIONS: A method and software for inferring founder mosaic genotypes in intercross offspring from low-coverage whole-genome sequencing in pedigrees from heterozygous founders are described. They provide high-quality, high-resolution genotypes in a time- and cost-efficient manner. The software is freely available at https://github.com/CarlborgGenomics/Stripes.
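The notion of a "fully informative" marker in the abstract above (fixed for alternative alleles in the founders from the two divergent lines) can be illustrated with a minimal Python sketch. It is not taken from the Stripes software: the function names, the 0/1/2 alternate-allele-count encoding, and the input layout are all assumptions made for illustration.

```python
from typing import List, Sequence


def is_fully_informative(line_a: Sequence[int], line_b: Sequence[int]) -> bool:
    """Return True if a marker is fixed for alternative alleles in two founder lines.

    Assumed encoding (hypothetical): each genotype is the alternate-allele
    count of one founder of the family, i.e. 0, 1, or 2.
    """
    if not line_a or not line_b:
        return False  # no founder data for one line: cannot be judged informative
    a, b = set(line_a), set(line_b)
    # All founders of one line are homozygous for one allele,
    # all founders of the other line are homozygous for the other allele.
    return (a == {0} and b == {2}) or (a == {2} and b == {0})


def informative_marker_indices(
    genos_a: Sequence[Sequence[int]], genos_b: Sequence[Sequence[int]]
) -> List[int]:
    """Indices of markers that are fully informative within one full-sib family."""
    return [
        i
        for i, (a, b) in enumerate(zip(genos_a, genos_b))
        if is_fully_informative(a, b)
    ]


# Toy family with four markers; founders of line A and line B per marker.
line_a_founders = [[0, 0], [0, 1], [2], [0]]
line_b_founders = [[2, 2], [2, 2], [0], [0]]
selected = informative_marker_indices(line_a_founders, line_b_founders)
```

In this toy family the second marker is excluded because one line-A founder is heterozygous, and the fourth because both lines carry the same allele.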
Affiliation(s)
- Yanjun Zan
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Thibaut Payen
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Mette Lillie
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Christa F Honaker
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Paul B Siegel
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Örjan Carlborg
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.
6
Happ MM, Wang H, Graef GL, Hyten DL. Generating High Density, Low Cost Genotype Data in Soybean [Glycine max (L.) Merr.]. G3 (Bethesda) 2019; 9:2153-2160. [PMID: 31072870 PMCID: PMC6643887 DOI: 10.1534/g3.119.400093] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/13/2019] [Accepted: 05/01/2019] [Indexed: 11/18/2022]
Abstract
Obtaining genome-wide genotype information for millions of SNPs in soybean [Glycine max (L.) Merr.] often involves completely resequencing a line at 5X or greater coverage. Currently, hundreds of soybean lines have been resequenced at high depth levels with their data deposited in the NCBI Short Read Archive. This publicly available dataset may be leveraged as an imputation reference panel in combination with skim (low coverage) sequencing of new soybean genotypes to economically obtain high-density SNP information. Ninety-nine soybean lines resequenced at an average of 17.1X were used to generate a reference panel, with over 10 million SNPs called using GATK's Haplotype Caller tool. Whole genome resequencing at approximately 1X depth was performed on 114 previously ungenotyped experimental soybean lines. Coverages down to 0.1X were analyzed by randomly subsetting raw reads from the original 1X sequence data. SNPs discovered in the reference panel were genotyped in the experimental lines after aligning to the soybean reference genome, and missing markers imputed using Beagle 4.1. Sequencing depth of the experimental lines could be reduced to 0.3X while still retaining an accuracy of 97.8%. Accuracy was inversely related to minor allele frequency, and highly correlated with marker linkage disequilibrium. The high accuracy of skim sequencing combined with imputation provides a low cost method for obtaining dense genotypic information that can be used for various genomics applications in soybean.
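The downsampling step described above (randomly subsetting raw reads from the ~1X data to reach lower coverages such as 0.3X or 0.1X) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, reads are represented as plain strings, and coverage is assumed to scale linearly with the number of reads kept.

```python
import random
from typing import List, Sequence


def subsample_fraction(current_coverage: float, target_coverage: float) -> float:
    """Fraction of reads to keep so that coverage drops from current to target."""
    if current_coverage <= 0:
        raise ValueError("current_coverage must be positive")
    if not 0 < target_coverage <= current_coverage:
        raise ValueError("target_coverage must be in (0, current_coverage]")
    return target_coverage / current_coverage


def subsample_reads(reads: Sequence[str], fraction: float, seed: int = 0) -> List[str]:
    """Randomly keep roughly `fraction` of the reads, reproducibly via `seed`."""
    rng = random.Random(seed)
    n_keep = round(len(reads) * fraction)
    return rng.sample(list(reads), n_keep)


# Toy example: 100 placeholder reads standing in for ~1X data, reduced to ~0.3X.
reads_1x = [f"read_{i}" for i in range(100)]
fraction = subsample_fraction(current_coverage=1.0, target_coverage=0.3)
reads_03x = subsample_reads(reads_1x, fraction, seed=42)
```

Fixing the seed makes the subset reproducible, which helps when imputation accuracy at several coverage levels is compared on the same lines.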
Affiliation(s)
- Mary M Happ
- University of Nebraska-Lincoln, Lincoln, NE 68503
7
8
D'Agostino N, Taranto F, Camposeo S, Mangini G, Fanelli V, Gadaleta S, Miazzi MM, Pavan S, di Rienzo V, Sabetta W, Lombardo L, Zelasco S, Perri E, Lotti C, Ciani E, Montemurro C. GBS-derived SNP catalogue unveiled wide genetic variability and geographical relationships of Italian olive cultivars. Sci Rep 2018; 8:15877. [PMID: 30367101 PMCID: PMC6203791 DOI: 10.1038/s41598-018-34207-y] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 10/25/2017] [Accepted: 10/12/2018] [Indexed: 11/08/2022]
Abstract
Information on the distribution of genetic variation is essential to preserve olive germplasm from erosion and to recover alleles lost through selective breeding. In addition, knowledge of population structure and genotype-phenotype associations is crucial to support modern olive breeding programs that must respond to new environmental conditions imposed by climate change and novel biotic/abiotic stressors. To further our understanding of genetic variation in the olive, we performed genotyping-by-sequencing on a panel of 94 Italian olive cultivars. A reference-based and a reference-independent SNP calling pipeline generated 22,088 and 8,088 high-quality SNPs, respectively. Both datasets were used to model population structure via parametric and nonparametric clustering. Although the two pipelines yielded a 3-fold difference in the number of SNPs, both described wide genetic variability among our study panel and allowed individuals to be grouped based on fruit weight and the geographical area of cultivation. Multidimensional scaling analysis on identity-by-state allele-sharing values as well as inference of population mixtures from genome-wide allele frequency data corroborated the clustering pattern we observed. These findings allowed us to formulate hypotheses about geographical relationships of Italian olive cultivars and to confirm known and uncover novel cases of synonymy.
Affiliation(s)
- Nunzio D'Agostino
- CREA Research Centre for Vegetable and Ornamental Crops, Pontecagnano Faiano, Italy.
- Francesca Taranto
- SINAGRI S.r.l. - Spin Off of the University of Bari "Aldo Moro", Bari, Italy.
- Salvatore Camposeo
- Department of Agricultural and Environmental Sciences, University of Bari "Aldo Moro", Bari, Italy
- Giacomo Mangini
- Department of Soil, Plant and Food Sciences, University of Bari "Aldo Moro", Bari, Italy
- Valentina Fanelli
- SINAGRI S.r.l. - Spin Off of the University of Bari "Aldo Moro", Bari, Italy
- Susanna Gadaleta
- SINAGRI S.r.l. - Spin Off of the University of Bari "Aldo Moro", Bari, Italy
- Monica Marilena Miazzi
- Department of Soil, Plant and Food Sciences, University of Bari "Aldo Moro", Bari, Italy
- Stefano Pavan
- Department of Soil, Plant and Food Sciences, University of Bari "Aldo Moro", Bari, Italy
- Valentina di Rienzo
- SINAGRI S.r.l. - Spin Off of the University of Bari "Aldo Moro", Bari, Italy
- Wilma Sabetta
- SINAGRI S.r.l. - Spin Off of the University of Bari "Aldo Moro", Bari, Italy
- Luca Lombardo
- Center for Agriculture, Food and Environment (C3A), University of Trento, San Michele all'Adige, Italy
- Samanta Zelasco
- CREA Research Centre for Olive, Citrus and Tree Fruit, Rende, Italy
- Enzo Perri
- CREA Research Centre for Olive, Citrus and Tree Fruit, Rende, Italy
- Concetta Lotti
- Department of the Sciences of Agriculture, Food and Environment, University of Foggia, Foggia, Italy
- Elena Ciani
- Department of Biosciences, Biotechnologies and Biopharmaceutics, University of Bari "Aldo Moro", Bari, Italy
- Cinzia Montemurro
- SINAGRI S.r.l. - Spin Off of the University of Bari "Aldo Moro", Bari, Italy
- Department of Soil, Plant and Food Sciences, University of Bari "Aldo Moro", Bari, Italy
9
Zheng C, Boer MP, van Eeuwijk FA. Accurate Genotype Imputation in Multiparental Populations from Low-Coverage Sequence. Genetics 2018; 210:71-82. [PMID: 30045858 PMCID: PMC6116951 DOI: 10.1534/genetics.118.300885] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 03/06/2018] [Accepted: 07/21/2018] [Indexed: 11/18/2022]
Abstract
Many different types of multiparental populations have recently been produced to increase genetic diversity and resolution in QTL mapping. Low-coverage, genotyping-by-sequencing (GBS) technology has become a cost-effective tool in these populations, despite large amounts of missing data in offspring and founders. In this work, we present a general statistical framework for genotype imputation in such experimental crosses from low-coverage GBS data. Generalizing a previously developed hidden Markov model for calculating ancestral origins of offspring DNA, we present an imputation algorithm that does not require parental data and that is applicable to bi- and multiparental populations. Our imputation algorithm allows heterozygosity of parents and offspring as well as error correction in observed genotypes. Further, our approach can combine imputation and genotype calling from sequencing reads, and it also applies to called genotypes from SNP array data. We evaluate our imputation algorithm by simulated and real data sets in four different types of populations: the F2, the advanced intercross recombinant inbred lines, the multiparent advanced generation intercross, and the cross-pollinated population. Because our approach uses marker data and population design information efficiently, the comparisons with previous approaches show that our imputation is accurate at even very low ([Formula: see text]) sequencing depth, in addition to having accurate genotype phasing and error detection.
Affiliation(s)
- Chaozhi Zheng
- Biometris, Wageningen University and Research, Wageningen, The Netherlands
- Martin P Boer
- Biometris, Wageningen University and Research, Wageningen, The Netherlands
- Fred A van Eeuwijk
- Biometris, Wageningen University and Research, Wageningen, The Netherlands
10
Torkamaneh D, Boyle B, Belzile F. Efficient genome-wide genotyping strategies and data integration in crop plants. Theor Appl Genet 2018; 131:499-511. [PMID: 29352324 DOI: 10.1007/s00122-018-3056-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 10/04/2017] [Accepted: 01/12/2018] [Indexed: 05/21/2023]
Abstract
Next-generation sequencing (NGS) has revolutionized plant and animal research by providing powerful genotyping methods. This review describes and discusses the advantages, challenges and, most importantly, solutions to facilitate data processing, the handling of missing data, and cross-platform data integration. Next-generation sequencing technologies provide powerful and flexible genotyping methods to plant breeders and researchers. These methods offer a wide range of applications from genome-wide analysis to routine screening with a high level of accuracy and reproducibility. Furthermore, they provide a straightforward workflow to identify, validate, and screen genetic variants in a short time and at a low cost. NGS-based genotyping methods include whole-genome re-sequencing, SNP arrays, and reduced representation sequencing, which are widely applied in crops. The main challenges facing breeders and geneticists today are how to choose an appropriate genotyping method and how to integrate genotyping data sets obtained from various sources. Here, we review and discuss the advantages and challenges of several NGS methods for genome-wide genetic marker development and genotyping in crop plants. We also discuss how imputation methods can be used both to fill in missing data in genotypic data sets and to integrate data sets obtained using different genotyping tools. It is our hope that this synthetic view of genotyping methods will help geneticists and breeders to integrate these NGS-based methods in crop plant breeding and research.
Affiliation(s)
- Davoud Torkamaneh
- Département de Phytologie, Université Laval, Québec City, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC, Canada
- Brian Boyle
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC, Canada
- François Belzile
- Département de Phytologie, Université Laval, Québec City, QC, Canada.
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC, Canada.
11
Accounting for Errors in Low Coverage High-Throughput Sequencing Data When Constructing Genetic Maps Using Biparental Outcrossed Populations. Genetics 2018; 209:65-76. [PMID: 29487138 PMCID: PMC5937187 DOI: 10.1534/genetics.117.300627] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 12/13/2017] [Accepted: 02/25/2018] [Indexed: 01/06/2023]
Abstract
Next-generation sequencing is an efficient method that allows for substantially more markers than previous technologies, providing opportunities for building high-density genetic linkage maps, which facilitate the development of nonmodel species' genomic assemblies and the investigation of their genes. However, constructing genetic maps using data generated via high-throughput sequencing technology (e.g., genotyping-by-sequencing) is complicated by the presence of sequencing errors and genotyping errors resulting from missing parental alleles due to low sequencing depth. If unaccounted for, these errors lead to inflated genetic maps. In addition, map construction in many species is performed using full-sibling family populations derived from the outcrossing of two individuals, where unknown parental phase and varying segregation types further complicate construction. We present a new methodology for modeling low coverage sequencing data in the construction of genetic linkage maps using full-sibling populations of diploid species, implemented in a package called GUSMap. Our model is based on the Lander-Green hidden Markov model but extended to account for errors present in sequencing data. We were able to obtain accurate estimates of the recombination fractions and overall map distance using GUSMap, while most existing mapping packages produced inflated genetic maps in the presence of errors. Our results demonstrate the feasibility of using low coverage sequencing data to produce genetic maps without requiring extensive filtering of potentially erroneous genotypes, provided that the associated errors are correctly accounted for in the model.
12
Approaches in Characterizing Genetic Structure and Mapping in a Rice Multiparental Population. G3: Genes, Genomes, Genetics 2017; 7:1721-1730. [PMID: 28592653 PMCID: PMC5473752 DOI: 10.1534/g3.117.042101] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 11/18/2022]
Abstract
Multi-parent Advanced Generation Intercross (MAGIC) populations are fast becoming mainstream tools for research and breeding, along with the technology and tools for analysis. This paper demonstrates the analysis of a rice MAGIC population from data filtering to imputation and processing of genetic data to characterizing genomic structure, and finally quantitative trait loci (QTL) mapping. In this study, 1316 S6:8 indica MAGIC (MI) lines and the eight founders were sequenced using Genotyping by Sequencing (GBS). As the GBS approach often includes missing data, the first step was to impute the missing SNPs. The observable number of recombinations in the population was then explored. Based on this case study, a general outline of procedures for a MAGIC analysis workflow is provided, as well as for QTL mapping of agronomic traits and biotic and abiotic stress, using the results from both association and interval mapping approaches. QTL for agronomic traits (yield, flowering time, and plant height), physical (grain length and grain width) and cooking properties (amylose content) of the rice grain, abiotic stress (submergence tolerance), and biotic stress (brown spot disease) were mapped. Through presenting this extensive analysis in the MI population in rice, we highlight important considerations when choosing analytical approaches. The methods and results reported in this paper will provide a guide to future genetic analysis methods applied to multi-parent populations.
13
Gonen S, Ros-Freixedes R, Battagin M, Gorjanc G, Hickey JM. A method for the allocation of sequencing resources in genotyped livestock populations. Genet Sel Evol 2017; 49:47. [PMID: 28521728 PMCID: PMC5437657 DOI: 10.1186/s12711-017-0322-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 08/10/2016] [Accepted: 05/12/2017] [Indexed: 11/18/2022]
Abstract
Background: This paper describes a method, called AlphaSeqOpt, for the allocation of sequencing resources in livestock populations with existing phased genomic data to maximise the ability to phase and impute sequenced haplotypes into the whole population. Methods: We present two algorithms. The first selects focal individuals that collectively represent the maximum possible portion of the haplotype diversity in the population. The second allocates a fixed sequencing budget among the families of focal individuals to enable phasing of their haplotypes at the sequence level. We tested the performance of the two algorithms in simulated pedigrees. For each pedigree, we evaluated the proportion of population haplotypes that are carried by the focal individuals and compared our results to a variant of the widely-used key ancestors approach and to two haplotype-based approaches. We calculated the expected phasing accuracy of the haplotypes of a focal individual at the sequence level given the proportion of the fixed sequencing budget allocated to its family. Results: AlphaSeqOpt maximises the ability to capture and phase the most frequent haplotypes in a population in three ways. First, it selects focal individuals that collectively represent a larger portion of the population haplotype diversity than existing methods. Second, it selects focal individuals from across the pedigree whose haplotypes can be easily phased using family-based phasing and imputation algorithms, thus maximising the ability to impute sequence into the rest of the population. Third, it allocates more of the fixed sequencing budget to focal individuals whose haplotypes are more frequent in the population than to focal individuals whose haplotypes are less frequent. Unlike existing methods, we additionally present an algorithm to allocate part of the sequencing budget to the families (i.e. immediate ancestors) of focal individuals to ensure that their haplotypes can be phased at the sequence level, which is essential for enabling and maximising subsequent sequence imputation. Conclusions: We present a new method for the allocation of a fixed sequencing budget to focal individuals and their families such that the final sequenced haplotypes, when phased at the sequence level, represent the maximum possible portion of the haplotype diversity in the population that can be sequenced and phased at that budget.
Affiliation(s)
- Serap Gonen
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Roger Ros-Freixedes
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Mara Battagin
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- John M Hickey
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK.
14
Fu YB, Peterson GW, Dong Y. Increasing Genome Sampling and Improving SNP Genotyping for Genotyping-by-Sequencing with New Combinations of Restriction Enzymes. G3 (Bethesda) 2016; 6:845-56. [PMID: 26818077 PMCID: PMC4825655 DOI: 10.1534/g3.115.025775] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 12/07/2015] [Accepted: 01/22/2016] [Indexed: 12/15/2022]
Abstract
Genotyping-by-sequencing (GBS) has emerged as a useful genomic approach for exploring genome-wide genetic variation. However, GBS commonly samples a genome unevenly and can generate a substantial amount of missing data. These technical features limit the power of various GBS-based genetic and genomic analyses. Here we present software called IgCoverage for in silico evaluation of genomic coverage through GBS with an individual or pair of restriction enzymes on one sequenced genome, and report a new set of 21 restriction enzyme combinations that can be applied to enhance GBS applications. These enzyme combinations were developed through an application of IgCoverage on 22 plant, animal, and fungus species with sequenced genomes, and some of them were empirically evaluated with different runs of Illumina MiSeq sequencing in 12 plant species. The in silico analysis of 22 organisms revealed up to eight times more genome coverage for the new combinations, which pair four- or five-cutter restriction enzymes, than for the commonly used enzyme combination PstI + MspI. The empirical evaluation of the new enzyme combination (HinfI + HpyCH4IV) in 12 plant species showed 1.7-6 times more genome coverage than PstI + MspI, and 2.3 times more genome coverage in dicots than monocots. Also, the SNP genotyping in 12 Arabidopsis and 12 rice plants revealed that HinfI + HpyCH4IV generated 7 and 1.3 times more SNPs (with 0-16.7% missing observations) than PstI + MspI, respectively. These findings demonstrate that these novel enzyme combinations can be utilized to increase genome sampling and improve SNP genotyping in various GBS applications.
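The idea of evaluating genome coverage in silico for a pair of restriction enzymes can be sketched in Python as below. This is not the IgCoverage algorithm, whose details are not given here. It is a simplified illustration with several assumptions: site positions are taken at the start of each recognition motif rather than at the exact cut point, only fragments flanked by one site of each enzyme are counted, the size window (`min_len`, `max_len`) is arbitrary, and all function names are hypothetical. The recognition sequences used are the standard ones for these enzymes (HinfI recognises GANTC, with N any base).

```python
import re
from typing import List

# Recognition sequences as regular expressions (N in GANTC expanded to [ACGT]).
SITES = {
    "PstI": "CTGCAG",
    "MspI": "CCGG",
    "HinfI": "GA[ACGT]TC",
    "HpyCH4IV": "ACGT",
}


def site_positions(seq: str, enzyme: str) -> List[int]:
    """Start positions of every (possibly overlapping) recognition site."""
    pattern = re.compile(f"(?={SITES[enzyme]})")  # lookahead allows overlaps
    return [m.start() for m in pattern.finditer(seq.upper())]


def sampled_fraction(
    seq: str, enzyme_a: str, enzyme_b: str, min_len: int = 100, max_len: int = 400
) -> float:
    """Fraction of `seq` in fragments bounded by one site of each enzyme.

    Only adjacent-site fragments within [min_len, max_len] bases are counted.
    """
    if not seq:
        return 0.0
    sites = sorted(
        [(p, enzyme_a) for p in site_positions(seq, enzyme_a)]
        + [(p, enzyme_b) for p in site_positions(seq, enzyme_b)]
    )
    covered = 0
    for (start, first), (end, second) in zip(sites, sites[1:]):
        length = end - start
        if first != second and min_len <= length <= max_len:
            covered += length
    return covered / len(seq)


# Toy sequence: one PstI site, a 10-base spacer, one MspI site, a short tail.
toy = "CTGCAG" + "A" * 10 + "CCGG" + "A" * 5
toy_fraction = sampled_fraction(toy, "PstI", "MspI", min_len=10, max_len=20)
```

Running the same function with different enzyme pairs on a real genome sequence would give a rough, relative comparison of how much of the genome each pair samples under these simplifying assumptions.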
Affiliation(s)
- Yong-Bi Fu
- Plant Gene Resources of Canada, Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan S7N 0X2, Canada
- Gregory W Peterson
- Plant Gene Resources of Canada, Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan S7N 0X2, Canada
- Yibo Dong
- Plant Gene Resources of Canada, Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan S7N 0X2, Canada
|
15
|
Kagale S, Koh C, Clarke WE, Bollina V, Parkin IAP, Sharpe AG. Analysis of Genotyping-by-Sequencing (GBS) Data. Methods Mol Biol 2016; 1374:269-284. [PMID: 26519412 DOI: 10.1007/978-1-4939-3167-5_15] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
The development of genotyping-by-sequencing (GBS) to rapidly detect nucleotide variation at the whole genome level, in many individuals simultaneously, has provided a transformative genetic profiling technique. GBS can be carried out in species with or without reference genome sequences and yields huge amounts of potentially informative data. One limitation of the approach is the paucity of tools to transform the raw data into a format that can be easily interrogated at the genetic level. In this chapter we describe bioinformatics tools developed to address this shortfall together with experimental design considerations to fully leverage the power of GBS for genetic analysis.
Affiliation(s)
- Sateesh Kagale
- National Research Council Canada, 110 Gymnasium Place, Saskatoon, SK, Canada, S7N 0W9
- Chushin Koh
- National Research Council Canada, 110 Gymnasium Place, Saskatoon, SK, Canada, S7N 0W9
- Wayne E Clarke
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada, S7N 0X2
- Venkatesh Bollina
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada, S7N 0X2
- Isobel A P Parkin
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada, S7N 0X2
- Andrew G Sharpe
- National Research Council Canada, 110 Gymnasium Place, Saskatoon, SK, Canada, S7N 0W9
|
16
|
Pootakham W, Sonthirod C, Naktang C, Jomchai N, Sangsrakru D, Tangphatsornruang S. Effects of methylation-sensitive enzymes on the enrichment of genic SNPs and the degree of genome complexity reduction in a two-enzyme genotyping-by-sequencing (GBS) approach: a case study in oil palm (Elaeis guineensis). Molecular Breeding 2016; 36:154. [PMID: 27942246 PMCID: PMC5104780 DOI: 10.1007/s11032-016-0572-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 04/20/2016] [Accepted: 10/20/2016] [Indexed: 05/08/2023]
Abstract
Advances in next generation sequencing have facilitated large-scale single nucleotide polymorphism (SNP) discovery in many crop species. The genotyping-by-sequencing (GBS) approach couples next generation sequencing with genome complexity reduction techniques to simultaneously identify and genotype SNPs. Choice of enzymes used in GBS library preparation depends on several factors including the number of markers required, the desired level of multiplexing, and whether the enrichment of genic SNPs is preferred. We evaluated various combinations of methylation-sensitive (AatII, PstI, MspI) and methylation-insensitive (SphI, MseI) enzymes for their effectiveness in genome complexity reduction and enrichment of genic SNPs. We discovered that the use of two methylation-sensitive enzymes effectively reduced genome complexity and did not require a size selection step. In contrast, the genome coverage of libraries constructed with methylation-insensitive enzymes was quite high, and an additional size selection step may be required to increase the overall read depth. We also demonstrated the effectiveness of methylation-sensitive enzymes in enriching for SNPs located in genic regions. When two methylation-insensitive enzymes were used, only 16% of SNPs identified were located in genes and 18% in the vicinity (± 5 kb) of the genic regions, while most SNPs resided in the intergenic regions. In contrast, a remarkable degree of enrichment was observed when two methylation-sensitive enzymes were employed. Almost two thirds of the SNPs were located either inside (32-36%) or in the vicinity (28-31%) of the genic regions. These results provide useful information to help researchers choose appropriate GBS enzymes in oil palm and other crop species.
Affiliation(s)
- Wirulda Pootakham
- National Center for Genetic Engineering and Biotechnology, 113 Thailand Science Park, Pathum Thani, 12120 Thailand
- Chutima Sonthirod
- National Center for Genetic Engineering and Biotechnology, 113 Thailand Science Park, Pathum Thani, 12120 Thailand
- Chaiwat Naktang
- National Center for Genetic Engineering and Biotechnology, 113 Thailand Science Park, Pathum Thani, 12120 Thailand
- Nukoon Jomchai
- National Center for Genetic Engineering and Biotechnology, 113 Thailand Science Park, Pathum Thani, 12120 Thailand
- Duangjai Sangsrakru
- National Center for Genetic Engineering and Biotechnology, 113 Thailand Science Park, Pathum Thani, 12120 Thailand
- Sithichoke Tangphatsornruang
- National Center for Genetic Engineering and Biotechnology, 113 Thailand Science Park, Pathum Thani, 12120 Thailand
|
17
|
Imputing Genotypes in Biallelic Populations from Low-Coverage Sequence Data. Genetics 2015; 202:487-95. [PMID: 26715670 DOI: 10.1534/genetics.115.182071] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 08/18/2015] [Accepted: 12/16/2015] [Indexed: 12/31/2022] Open
Abstract
Low-coverage next-generation sequencing methodologies are routinely employed to genotype large populations. Missing data in these populations manifest both as missing markers and markers with incomplete allele recovery. False homozygous calls at heterozygous sites resulting from incomplete allele recovery confound many existing imputation algorithms. These types of systematic errors can be minimized by incorporating depth-of-sequencing read coverage into the imputation algorithm. Accordingly, we developed Low-Coverage Biallelic Impute (LB-Impute) to resolve missing data issues. LB-Impute uses a hidden Markov model that incorporates marker read coverage to determine variable emission probabilities. Robust, highly accurate imputation results were reliably obtained with LB-Impute, even at extremely low (<1×) average per-marker coverage. This finding will have implications for the design of genotype imputation algorithms in the future. LB-Impute is publicly available on GitHub at https://github.com/dellaporta-laboratory/LB-Impute.
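To make concrete why read depth matters for emission probabilities, the following Python sketch computes the probability of observed read counts under each genotype with a simple binomial model. This is a generic genotype-likelihood illustration, not LB-Impute's actual emission model. The function name, genotype labels, and the 1% per-read error rate are assumptions.

```python
from math import comb

def emission(n_ref, n_alt, genotype, error=0.01):
    """P(observed read counts | true genotype) under a binomial read model."""
    n = n_ref + n_alt
    # Probability that a single read shows the alternate allele.
    p_alt = {"ref/ref": error, "ref/alt": 0.5, "alt/alt": 1.0 - error}[genotype]
    return comb(n, n_alt) * p_alt ** n_alt * (1.0 - p_alt) ** n_ref

# Chance that a true heterozygote yields only reference reads,
# i.e. looks homozygous, at increasing depth.
for depth in (1, 2, 4, 8):
    print(depth, emission(depth, 0, "ref/alt"))
```

At a depth of one read, a heterozygous site shows a single allele every time, which is the incomplete allele recovery the abstract describes. A hidden Markov model that lets emission probabilities vary with depth can down-weight such low-depth homozygous calls.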
|
18
|
High-density linkage map construction and mapping of seed trait QTLs in chickpea (Cicer arietinum L.) using Genotyping-by-Sequencing (GBS). Sci Rep 2015; 5:17512. [PMID: 26631981 PMCID: PMC4668357 DOI: 10.1038/srep17512] [Citation(s) in RCA: 77] [Impact Index Per Article: 8.6] [Received: 08/27/2015] [Accepted: 10/30/2015] [Indexed: 12/18/2022] Open
Abstract
This study reports the use of Genotyping-by-Sequencing (GBS) for large-scale SNP discovery and simultaneous genotyping of recombinant inbred lines (RILs) of an intra-specific mapping population of chickpea contrasting for seed traits. A total of 119,672 raw SNPs were discovered, which after stringent filtering revealed 3,977 high quality SNPs of which 39.5% were present in genic regions. Comparative analysis using physically mapped marker loci revealed a higher degree of synteny with Medicago in comparison to soybean. The SNP genotyping data was utilized to construct one of the most saturated intra-specific genetic linkage maps of chickpea having 3,363 mapped positions including 3,228 SNPs on 8 linkage groups spanning 1006.98 cM at an average inter marker distance of 0.33 cM. The map was utilized to identify 20 quantitative trait loci (QTLs) associated with seed traits accounting for phenotypic variations ranging from 9.97% to 29.71%. Analysis of the genomic sequence corresponding to five robust QTLs led to the identification of 684 putative candidate genes whose expression profiling revealed that 101 genes exhibited seed specific expression. The integrated approach utilizing the identified QTLs along with the available genome and transcriptome could serve as a platform for candidate gene identification for molecular breeding of chickpea.
|
19
|
Bajaj D, Das S, Badoni S, Kumar V, Singh M, Bansal KC, Tyagi AK, Parida SK. Genome-wide high-throughput SNP discovery and genotyping for understanding natural (functional) allelic diversity and domestication patterns in wild chickpea. Sci Rep 2015; 5:12468. [PMID: 26208313 PMCID: PMC4513697 DOI: 10.1038/srep12468] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 11/17/2014] [Accepted: 06/29/2015] [Indexed: 12/22/2022] Open
Abstract
We identified 82489 high-quality genome-wide SNPs from 93 wild and cultivated Cicer accessions through integrated reference genome- and de novo-based GBS assays. High intra- and inter-specific polymorphic potential (66-85%) and broader natural allelic diversity (6-64%) detected by genome-wide SNPs among accessions signify their efficacy for monitoring introgression and transferring target trait-regulating genomic (gene) regions/allelic variants from wild to cultivated Cicer gene pools for genetic improvement. The population-specific assignment of wild Cicer accessions pertaining to the primary gene pool is more influenced by geographical origin/phenotypic characteristics than species/gene-pools of origination. The functional significance of allelic variants (non-synonymous and regulatory SNPs) scanned from transcription factors and stress-responsive genes in differentiating wild accessions (with potential known sources of yield-contributing and stress tolerance traits) from cultivated desi and kabuli accessions, fine-mapping/map-based cloning of QTLs and determination of LD patterns across wild and cultivated gene-pools are suitably elucidated. The correlation between phenotypic (agromorphological traits) and molecular diversity-based admixed domestication patterns within six structured populations of wild and cultivated accessions via genome-wide SNPs was apparent. This suggests the utility of whole genome SNPs as a potential resource for identifying naturally selected trait-regulating genomic targets/functional allelic variants adaptive to diverse agroclimatic regions for genetic enhancement of cultivated gene-pools.
Affiliation(s)
- Deepak Bajaj
- National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi 110067, India
- Shouvik Das
- National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi 110067, India
- Saurabh Badoni
- National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi 110067, India
- Vinod Kumar
- National Research Centre on Plant Biotechnology (NRCPB), New Delhi-110012, India
- Mohar Singh
- National Bureau of Plant Genetic Resources (NBPGR), New Delhi-110012, India
- Kailash C. Bansal
- National Bureau of Plant Genetic Resources (NBPGR), New Delhi-110012, India
- Akhilesh K. Tyagi
- National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi 110067, India
- Swarup K. Parida
- National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi 110067, India
|
20
|
Torkamaneh D, Belzile F. Scanning and Filling: Ultra-Dense SNP Genotyping Combining Genotyping-By-Sequencing, SNP Array and Whole-Genome Resequencing Data. PLoS One 2015; 10:e0131533. [PMID: 26161900 PMCID: PMC4498655 DOI: 10.1371/journal.pone.0131533] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 04/03/2015] [Accepted: 06/03/2015] [Indexed: 01/07/2023] Open
Abstract
Genotyping-by-sequencing (GBS) represents a highly cost-effective high-throughput genotyping approach. By nature, however, GBS is subject to generating sizeable amounts of missing data and these will need to be imputed for many downstream analyses. The extent to which such missing data can be tolerated in calling SNPs has not been explored widely. In this work, we first explore the use of imputation to fill in missing genotypes in GBS datasets. Importantly, we use whole genome resequencing data to assess the accuracy of the imputed data. Using a panel of 301 soybean accessions, we show that over 62,000 SNPs could be called when tolerating up to 80% missing data, a five-fold increase over the number called when tolerating up to 20% missing data. At all levels of missing data examined (between 20% and 80%), the resulting SNP datasets were of uniformly high accuracy (96-98%). We then used imputation to combine complementary SNP datasets derived from GBS and a SNP array (SoySNP50K). We thus produced an enhanced dataset of >100,000 SNPs and the genotypes at the previously untyped loci were again imputed with a high level of accuracy (95%). Of the >4,000,000 SNPs identified through resequencing 23 accessions (among the 301 used in the GBS analysis), 1.4 million tag SNPs were used as a reference to impute this large set of SNPs on the entire panel of 301 accessions. These previously untyped loci could be imputed with around 90% accuracy. Finally, we used the 100K SNP dataset (GBS + SoySNP50K) to perform a GWAS on seed oil content within this collection of soybean accessions. Both the number of significant marker-trait associations and the peak significance levels were improved considerably using this enhanced catalog of SNPs relative to a smaller catalog resulting from GBS alone at ≤20% missing data. 
Our results demonstrate that imputation can be used to fill in both missing genotypes and untyped loci with very high accuracy and that this leads to more powerful genetic analyses.
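The missing-data tolerance discussed in this abstract (retaining SNPs with up to 20% versus up to 80% missing calls) amounts to a per-marker filter applied before imputation. A minimal Python sketch follows. The data layout (a dictionary of per-sample calls with `None` for a missing genotype), the function names, and the toy values are assumptions for illustration and are not taken from the paper's pipeline.

```python
def missing_fraction(calls):
    """Share of samples with no genotype call at one SNP."""
    return sum(c is None for c in calls) / len(calls)

def filter_snps(genotypes, max_missing):
    """Keep SNPs whose missing fraction does not exceed max_missing."""
    return {
        snp: calls
        for snp, calls in genotypes.items()
        if missing_fraction(calls) <= max_missing
    }

# Toy panel: 3 SNPs scored on 5 samples (0/1/2 = alternate allele count).
toy = {
    "snp1": [0, 2, None, 0, 2],           # 20% missing
    "snp2": [None, None, 2, None, None],  # 80% missing
    "snp3": [None] * 5,                   # 100% missing
}
print(sorted(filter_snps(toy, 0.2)))
print(sorted(filter_snps(toy, 0.8)))
```

Relaxing `max_missing` retains more markers at the cost of leaving more genotypes to be imputed, which is the trade-off the abstract evaluates against resequencing data.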
Affiliation(s)
- Davoud Torkamaneh
- Département de Phytologie and Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC, Canada
- Francois Belzile
- Département de Phytologie and Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC, Canada
|
21
|
Kujur A, Bajaj D, Upadhyaya HD, Das S, Ranjan R, Shree T, Saxena MS, Badoni S, Kumar V, Tripathi S, Gowda CLL, Sharma S, Singh S, Tyagi AK, Parida SK. A genome-wide SNP scan accelerates trait-regulatory genomic loci identification in chickpea. Sci Rep 2015; 5:11166. [PMID: 26058368 PMCID: PMC4461920 DOI: 10.1038/srep11166] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 01/05/2015] [Accepted: 05/18/2015] [Indexed: 01/09/2023] Open
Abstract
We identified 44844 high-quality SNPs by sequencing 92 diverse chickpea accessions belonging to a seed and pod trait-specific association panel using reference genome- and de novo-based GBS (genotyping-by-sequencing) assays. A GWAS (genome-wide association study) in an association panel of 211, including the 92 sequenced accessions, identified 22 major genomic loci showing significant association (explaining 23-47% phenotypic variation) with pod and seed number/plant and 100-seed weight. Eighteen trait-regulatory major genomic loci underlying 13 robust QTLs were validated and mapped on an intra-specific genetic linkage map by QTL mapping. A combinatorial approach of GWAS, QTL mapping and gene haplotype-specific LD mapping and transcript profiling uncovered one superior haplotype and favourable natural allelic variants in the upstream regulatory region of a CesA-type cellulose synthase (Ca_Kabuli_CesA3) gene regulating high pod and seed number/plant (explaining 47% phenotypic variation) in chickpea. The up-regulation of this superior gene haplotype correlated with increased transcript expression of Ca_Kabuli_CesA3 gene in the pollen and pod of high pod/seed number accession, resulting in higher cellulose accumulation for normal pollen and pollen tube growth. A rapid combinatorial genome-wide SNP genotyping-based approach has potential to dissect complex quantitative agronomic traits and delineate trait-regulatory genomic loci (candidate genes) for genetic enhancement in crop plants, including chickpea.
Affiliation(s)
- Alice Kujur
- National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi 110067, India
- Deepak Bajaj
- National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi 110067, India
- Hari D Upadhyaya
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru 502324, Andhra Pradesh, India
- Shouvik Das
- National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi 110067, India
- Rajeev Ranjan
- National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi 110067, India
- Tanima Shree
- National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi 110067, India
- Maneesha S Saxena
- National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi 110067, India
- Saurabh Badoni
- National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi 110067, India
- Vinod Kumar
- National Research Centre on Plant Biotechnology (NRCPB), New Delhi 110012, India
- Shailesh Tripathi
- Division of Genetics, Indian Agricultural Research Institute (IARI), New Delhi 110012, India
- C L L Gowda
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru 502324, Andhra Pradesh, India
- Shivali Sharma
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru 502324, Andhra Pradesh, India
- Sube Singh
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru 502324, Andhra Pradesh, India
- Akhilesh K Tyagi
- National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi 110067, India
- Swarup K Parida
- National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi 110067, India
|
22
|
Huang BE, Verbyla KL, Verbyla AP, Raghavan C, Singh VK, Gaur P, Leung H, Varshney RK, Cavanagh CR. MAGIC populations in crops: current status and future prospects. Theoretical and Applied Genetics 2015; 128:999-1017. [PMID: 25855139 DOI: 10.1007/s00122-015-2506-0] [Citation(s) in RCA: 129] [Impact Index Per Article: 14.3] [Received: 10/17/2014] [Accepted: 03/20/2015] [Indexed: 05/20/2023]
Abstract
MAGIC populations present novel challenges and opportunities in crops due to their complex pedigree structure. They offer great potential both for dissecting genomic structure and for improving breeding populations. The past decade has seen the rise of multiparental populations as a study design offering great advantages for genetic studies in plants. The genetic diversity of multiple parents, recombined over several generations, generates a genetic resource population with large phenotypic diversity suitable for high-resolution trait mapping. While there are many variations on the general design, this review focuses on populations where the parents have all been inter-mated, typically termed Multi-parent Advanced Generation Intercrosses (MAGIC). Such populations have already been created in model animals and plants, and are emerging in many crop species. However, there has been little consideration of the full range of factors which create novel challenges for design and analysis in these populations. We will present brief descriptions of large MAGIC crop studies currently in progress to motivate discussion of population construction, efficient experimental design, and genetic analysis in these populations. In addition, we will highlight some recent achievements and discuss the opportunities and advantages to exploit the unique structure of these resources post-QTL analysis for gene discovery.
Affiliation(s)
- B Emma Huang
- Digital Productivity and Agriculture Flagships, CSIRO, Dutton Park, QLD, 4102, Australia
|
23
|
Kujur A, Bajaj D, Upadhyaya HD, Das S, Ranjan R, Shree T, Saxena MS, Badoni S, Kumar V, Tripathi S, Gowda CLL, Sharma S, Singh S, Tyagi AK, Parida SK. Employing genome-wide SNP discovery and genotyping strategy to extrapolate the natural allelic diversity and domestication patterns in chickpea. Frontiers in Plant Science 2015; 6:162. [PMID: 25873920 PMCID: PMC4379880 DOI: 10.3389/fpls.2015.00162] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 11/14/2014] [Accepted: 03/01/2015] [Indexed: 05/19/2023]
Abstract
The genome-wide discovery and high-throughput genotyping of SNPs in chickpea natural germplasm lines is indispensable to extrapolate their natural allelic diversity, domestication, and linkage disequilibrium (LD) patterns leading to the genetic enhancement of this vital legume crop. We discovered 44,844 high-quality SNPs by sequencing of 93 diverse cultivated desi, kabuli, and wild chickpea accessions using reference genome- and de novo-based GBS (genotyping-by-sequencing) assays that were physically mapped across eight chromosomes of desi and kabuli. Of these, 22,542 SNPs were structurally annotated in different coding and non-coding sequence components of genes. Genes with 3296 non-synonymous and 269 regulatory SNPs could functionally differentiate accessions based on their contrasting agronomic traits. A high experimental validation success rate (92%) and reproducibility (100%) along with strong sensitivity (93-96%) and specificity (99%) of GBS-based SNPs was observed. This implies the robustness of GBS as a high-throughput assay for rapid large-scale mining and genotyping of genome-wide SNPs in chickpea with sub-optimal use of resources. With 23,798 genome-wide SNPs, a relatively high intra-specific polymorphic potential (49.5%) and broader molecular diversity (13-89%)/functional allelic diversity (18-77%) was apparent among 93 chickpea accessions, suggesting their tremendous applicability in rapid selection of desirable diverse accessions/inter-specific hybrids in chickpea crossbred varietal improvement programs. The genome-wide SNPs revealed a complex admixed domestication pattern, extensive LD estimates (0.54-0.68) and extended LD decay (400-500 kb) in a structured population inclusive of 93 accessions.
These findings reflect the utility of our identified SNPs for subsequent genome-wide association study (GWAS) and selective sweep-based domestication trait dissection analysis to identify potential genomic loci (gene-associated targets) specifically regulating important complex quantitative agronomic traits in chickpea. The numerous informative genome-wide SNPs, natural allelic diversity-led domestication pattern, and LD-based information generated in our study have multidimensional applicability with respect to chickpea genomics-assisted breeding.
Affiliation(s)
- Alice Kujur
- National Institute of Plant Genome Research (NIPGR), New Delhi, India
- Deepak Bajaj
- National Institute of Plant Genome Research (NIPGR), New Delhi, India
- Hari D. Upadhyaya
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Telangana, India
- Shouvik Das
- National Institute of Plant Genome Research (NIPGR), New Delhi, India
- Rajeev Ranjan
- National Institute of Plant Genome Research (NIPGR), New Delhi, India
- Tanima Shree
- National Institute of Plant Genome Research (NIPGR), New Delhi, India
- Saurabh Badoni
- National Institute of Plant Genome Research (NIPGR), New Delhi, India
- Vinod Kumar
- National Research Centre on Plant Biotechnology (NRCPB), New Delhi, India
- Shailesh Tripathi
- Division of Genetics, Indian Agricultural Research Institute (IARI), New Delhi, India
- C. L. L. Gowda
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Telangana, India
- Shivali Sharma
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Telangana, India
- Sube Singh
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Telangana, India
- Swarup K. Parida
- National Institute of Plant Genome Research (NIPGR), New Delhi, India
|
24
|
Bhakta MS, Jones VA, Vallejos CE. Punctuated distribution of recombination hotspots and demarcation of pericentromeric regions in Phaseolus vulgaris L. PLoS One 2015; 10:e0116822. [PMID: 25629314 PMCID: PMC4309454 DOI: 10.1371/journal.pone.0116822] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 10/06/2014] [Accepted: 12/15/2014] [Indexed: 11/18/2022] Open
Abstract
High density genetic maps are a reliable tool for genetic dissection of complex plant traits. Mapping resolution is often hampered by the variable crossover and non-crossover events occurring across the genome, with pericentromeric regions (pCENR) showing highly suppressed recombination rates. The efficiency of linkage mapping can further be improved by characterizing and understanding the distribution of recombinational activity along individual chromosomes. In order to evaluate the genome wide recombination rate in common beans (Phaseolus vulgaris L.) we developed a SNP-based linkage map using the genotype-by-sequencing approach with a family of 188 recombinant inbred lines generated from an inter gene pool cross (Andean x Mesoamerican). We identified 1,112 SNPs that were subsequently used to construct a robust linkage map with 11 groups, comprising 513 recombinationally unique marker loci spanning 943 cM (LOD 3.0). Comparative analysis showed that the linkage map spanned >95% of the physical map, indicating that the map is almost saturated. Evaluation of genome-wide recombination rate indicated that at least 45% of the genome is highly recombinationally suppressed, and allowed us to estimate locations of pCENRs. We observed an average recombination rate of 0.25 cM/Mb in pCENRs as compared to the rest of the genome, which showed 3.72 cM/Mb. However, several hot spots of recombination were also detected with recombination rates reaching as high as 34 cM/Mb. Hotspots were mostly found towards the ends of chromosomes, which also happened to be gene-rich regions. Analyzing relationships between the linkage and physical maps indicated a punctuated distribution of recombinational hot spots across the genome.
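Recombination rates in cM/Mb, as reported in this abstract, are obtained by dividing the genetic distance between two markers by the physical distance between them. The Python sketch below shows that arithmetic for adjacent marker pairs. The function name, the tuple layout, and the toy marker positions are assumptions and do not come from the study.

```python
def recombination_rates(markers):
    """Rate in cM/Mb for each pair of adjacent markers.

    markers: list of (physical_position_bp, genetic_position_cM),
    sorted by physical position.
    """
    rates = []
    for (bp1, cm1), (bp2, cm2) in zip(markers, markers[1:]):
        if bp2 == bp1:
            continue  # co-located markers carry no rate information
        megabases = (bp2 - bp1) / 1e6
        rates.append((bp1, bp2, (cm2 - cm1) / megabases))
    return rates

# Toy chromosome: a high-rate interval followed by a suppressed one.
toy = [(0, 0.0), (1_000_000, 3.0), (5_000_000, 4.0)]
for start, end, rate in recombination_rates(toy):
    print(start, end, rate)
```

Intervals with rates far below the chromosome average would be candidates for pericentromeric suppression, while intervals far above it would be candidate hotspots, subject to marker density and map error.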
Affiliation(s)
- Mehul S. Bhakta
- Horticultural Sciences Department, University of Florida, Gainesville, Florida, United States of America
- Valerie A. Jones
- Horticultural Sciences Department, University of Florida, Gainesville, Florida, United States of America
- C. Eduardo Vallejos
- Horticultural Sciences Department, University of Florida, Gainesville, Florida, United States of America
- Plant Molecular and Cellular Biology Program, University of Florida, Gainesville, Florida, United States of America
|
25
|
Heffelfinger C, Fragoso CA, Moreno MA, Overton JD, Mottinger JP, Zhao H, Tohme J, Dellaporta SL. Flexible and scalable genotyping-by-sequencing strategies for population studies. BMC Genomics 2014. [PMID: 25406744 DOI: 10.1186/1471-2164-15-979] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
Abstract
Background: Many areas critical to agricultural production and research, such as the breeding and trait mapping in plants and livestock, require robust and scalable genotyping platforms. Genotyping-by-sequencing (GBS) is one such method highly suited to non-human organisms. In the GBS protocol, genomic DNA is fractionated via restriction digest, then reduced representation is achieved through size selection. Since many restriction sites are conserved across a species, the sequenced portion of the genome is highly consistent within a population. This makes the GBS protocol highly suited for experiments that require surveying large numbers of markers within a population, such as those involving genetic mapping, breeding, and population genomics. We have modified the GBS technology in a number of ways. Custom, enzyme-specific adaptors have been replaced with standard Illumina adaptors compatible with blunt-end restriction enzymes. Multiplexing is achieved through a dual barcoding system, and bead-based library preparation protocols allow for in-solution size selection and eliminate the need for columns and gels. Results: A panel of eight restriction enzymes was selected for testing on B73 maize and Nipponbare rice genomic DNA. Quality of the data was demonstrated by identifying that the vast majority of reads from each enzyme aligned to restriction sites predicted in silico. The link between enzyme parameters and experimental outcome was demonstrated by showing that the sequenced portion of the genome was adaptable by selecting enzymes based on motif length, complexity, and methylation sensitivity. The utility of the new GBS protocol was demonstrated by correctly mapping several in a maize F2 population resulting from a B73 × Country Gentleman test cross. Conclusions: This technology is readily adaptable to different genomes, highly amenable to multiplexing and compatible with over forty commercially available restriction enzymes.
These advancements represent a major improvement in genotyping technology by providing a highly flexible and scalable GBS that is readily implemented for studies on genome-wide variation.
Affiliation(s)
- Stephen L Dellaporta
- Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06511, USA.
|
26
|
Heffelfinger C, Fragoso CA, Moreno MA, Overton JD, Mottinger JP, Zhao H, Tohme J, Dellaporta SL. Flexible and scalable genotyping-by-sequencing strategies for population studies. BMC Genomics 2014; 15:979. [PMID: 25406744 PMCID: PMC4253001 DOI: 10.1186/1471-2164-15-979] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 05/28/2014] [Accepted: 10/23/2014] [Indexed: 12/19/2022] Open
Abstract
Background: Many areas critical to agricultural production and research, such as the breeding and trait mapping in plants and livestock, require robust and scalable genotyping platforms. Genotyping-by-sequencing (GBS) is one such method highly suited to non-human organisms. In the GBS protocol, genomic DNA is fractionated via restriction digest, then reduced representation is achieved through size selection. Since many restriction sites are conserved across a species, the sequenced portion of the genome is highly consistent within a population. This makes the GBS protocol highly suited for experiments that require surveying large numbers of markers within a population, such as those involving genetic mapping, breeding, and population genomics. We have modified the GBS technology in a number of ways. Custom, enzyme-specific adaptors have been replaced with standard Illumina adaptors compatible with blunt-end restriction enzymes. Multiplexing is achieved through a dual barcoding system, and bead-based library preparation protocols allow for in-solution size selection and eliminate the need for columns and gels. Results: A panel of eight restriction enzymes was selected for testing on B73 maize and Nipponbare rice genomic DNA. Quality of the data was demonstrated by identifying that the vast majority of reads from each enzyme aligned to restriction sites predicted in silico. The link between enzyme parameters and experimental outcome was demonstrated by showing that the sequenced portion of the genome was adaptable by selecting enzymes based on motif length, complexity, and methylation sensitivity. The utility of the new GBS protocol was demonstrated by correctly mapping several in a maize F2 population resulting from a B73 × Country Gentleman test cross. Conclusions: This technology is readily adaptable to different genomes, highly amenable to multiplexing and compatible with over forty commercially available restriction enzymes.
These advancements represent a major improvement in genotyping technology by providing a highly flexible and scalable GBS that is readily implemented for studies on genome-wide variation.
Affiliation(s)
- Stephen L Dellaporta
- Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06511, USA.
|