1
Fernandes SB, Casstevens TM, Bradbury PJ, Lipka AE. A multi-trait multi-locus stepwise approach for conducting GWAS on correlated traits. Plant Genome 2022; 15:e20200. [PMID: 35307964 DOI: 10.1002/tpg2.20200] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/13/2021] [Accepted: 02/14/2022] [Indexed: 06/14/2023]
Abstract
The ability to accurately quantify the simultaneous effect of multiple genomic loci on multiple traits is now possible due to current and emerging high-throughput genotyping and phenotyping technologies. To date, most efforts to quantify these genotype-to-phenotype relationships have focused on either multi-trait models that test a single marker at a time or multi-locus models that quantify associations with a single trait. Therefore, the purpose of this study was to compare the performance of a multi-trait, multi-locus stepwise (MSTEP) model selection procedure we developed to (a) a commonly used multi-trait single-locus model and (b) a univariate multi-locus model. We used real marker data in maize (Zea mays L.) and soybean (Glycine max L.) to simulate multiple traits controlled by various combinations of pleiotropic and nonpleiotropic quantitative trait nucleotides (QTNs). In general, we found that both multi-trait models outperformed the univariate multi-locus model, especially when analyzing a trait of low heritability. For traits controlled by either a combination of pleiotropic and nonpleiotropic QTNs or a large number of QTNs (i.e., 50), our MSTEP model often outperformed at least one of the two alternative models. When applied to the analysis of two tocochromanol-related traits in maize grain, MSTEP identified the same peak-associated marker that has been reported in a previous study. We therefore conclude that MSTEP is a useful addition to the suite of statistical models that are commonly used to gain insight into the genetic architecture of agronomically important traits.
Affiliation(s)
- Samuel B Fernandes
- Dep. of Crop Sciences, Univ. of Illinois Urbana-Champaign, Urbana, IL, USA
- Alexander E Lipka
- Dep. of Crop Sciences, Univ. of Illinois Urbana-Champaign, Urbana, IL, USA
2
Romay MC, Millard MJ, Glaubitz JC, Peiffer JA, Swarts KL, Casstevens TM, Elshire RJ, Acharya CB, Mitchell SE, Flint-Garcia SA, McMullen MD, Holland JB, Buckler ES, Gardner CA. Comprehensive genotyping of the USA national maize inbred seed bank. Genome Biol 2013; 14:R55. [PMID: 23759205 PMCID: PMC3707059 DOI: 10.1186/gb-2013-14-6-r55] [Citation(s) in RCA: 308] [Impact Index Per Article: 28.0] [Received: 02/26/2013] [Revised: 04/30/2013] [Accepted: 06/11/2013] [Indexed: 02/06/2023]
Abstract
BACKGROUND: Genotyping by sequencing, a new low-cost, high-throughput sequencing technology, was used to genotype 2,815 maize inbred accessions, preserved mostly at the National Plant Germplasm System in the USA. The collection includes inbred lines from breeding programs all over the world. RESULTS: The method produced 681,257 single-nucleotide polymorphism (SNP) markers distributed across the entire genome, with the ability to detect rare alleles at high confidence levels. More than half of the SNPs in the collection are rare. Although most rare alleles have been incorporated into public temperate breeding programs, only a modest amount of the available diversity is present in the commercial germplasm. Analysis of genetic distances shows population stratification, including a small number of large clusters centered on key lines. Nevertheless, an average fixation index of 0.06 indicates moderate differentiation between the three major maize subpopulations. Linkage disequilibrium (LD) decays very rapidly, but the extent of LD is highly dependent on the particular group of germplasm and region of the genome. The utility of these data for performing genome-wide association studies was tested with two simply inherited traits and one complex trait. We identified trait associations at SNPs very close to known candidate genes for kernel color, sweet corn, and flowering time; however, results suggest that more SNPs are needed to better explore the genetic architecture of complex traits. CONCLUSIONS: The genotypic information described here allows this publicly available panel to be exploited by researchers facing the challenges of sustainable agriculture through better knowledge of the nature of genetic diversity.
Affiliation(s)
- Maria C Romay
- Institute for Genomic Diversity, Biotechnology bldg., Cornell University, Ithaca, NY, 14853, USA
- Mark J Millard
- United States Department of Agriculture (USDA) - Agricultural Research Service (USDA-ARS)
- North Central Regional Plant Introduction Station, Agronomy bldg., Department of Agronomy, Iowa State University, Ames, IA, 50001, USA
- Jeffrey C Glaubitz
- Institute for Genomic Diversity, Biotechnology bldg., Cornell University, Ithaca, NY, 14853, USA
- Jason A Peiffer
- Bioinformatics Research Center, Thomas Hall, North Carolina State University, Raleigh, NC, 27606, USA
- Kelly L Swarts
- Department of Plant Breeding and Genetics, Bradfield Hall, Cornell University, Ithaca, NY, 14853, USA
- Terry M Casstevens
- Institute for Genomic Diversity, Biotechnology bldg., Cornell University, Ithaca, NY, 14853, USA
- Robert J Elshire
- Institute for Genomic Diversity, Biotechnology bldg., Cornell University, Ithaca, NY, 14853, USA
- Charlotte B Acharya
- Institute for Genomic Diversity, Biotechnology bldg., Cornell University, Ithaca, NY, 14853, USA
- Sharon E Mitchell
- Institute for Genomic Diversity, Biotechnology bldg., Cornell University, Ithaca, NY, 14853, USA
- Sherry A Flint-Garcia
- United States Department of Agriculture (USDA) - Agricultural Research Service (USDA-ARS)
- Division of Plant Sciences, Curtis Hall, University of Missouri, Columbia, MO, 65211, USA
- Michael D McMullen
- United States Department of Agriculture (USDA) - Agricultural Research Service (USDA-ARS)
- Division of Plant Sciences, Curtis Hall, University of Missouri, Columbia, MO, 65211, USA
- James B Holland
- United States Department of Agriculture (USDA) - Agricultural Research Service (USDA-ARS)
- Department of Crop Science, Williams Hall, North Carolina State University, Raleigh, NC, 27695, USA
- Edward S Buckler
- Institute for Genomic Diversity, Biotechnology bldg., Cornell University, Ithaca, NY, 14853, USA
- United States Department of Agriculture (USDA) - Agricultural Research Service (USDA-ARS)
- Department of Plant Breeding and Genetics, Bradfield Hall, Cornell University, Ithaca, NY, 14853, USA
- Candice A Gardner
- United States Department of Agriculture (USDA) - Agricultural Research Service (USDA-ARS)
- North Central Regional Plant Introduction Station, Agronomy bldg., Department of Agronomy, Iowa State University, Ames, IA, 50001, USA
3
Zhang Z, Buckler ES, Casstevens TM, Bradbury PJ. Software engineering the mixed model for genome-wide association studies on large samples. Brief Bioinform 2010; 10:664-75. [PMID: 19933212 DOI: 10.1093/bib/bbp050] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Indexed: 01/19/2023]
Abstract
Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample size and number of markers used for GWAS is increasing dramatically, resulting in greater statistical power to detect those associations. The use of mixed models with increasingly large data sets depends on the availability of software for analyzing those models. While multiple software packages implement the mixed model method, no single package provides the best combination of fast computation, ability to handle large samples, flexible modeling and ease of use. Key elements of association analysis with mixed models are reviewed, including modeling phenotype-genotype associations using mixed models, population stratification, kinship and its estimation, variance component estimation, use of best linear unbiased predictors or residuals in place of raw phenotype, improving efficiency and software-user interaction. The available software packages are evaluated, and suggestions made for future software development.
Affiliation(s)
- Zhiwu Zhang
- Institute for Genomic Diversity, Cornell University, Ithaca, New York, USA.
4
Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 2007. [PMID: 17586829 DOI: 10.1371/journal.pgen.0030004.4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 04/19/2023]
Abstract
Association analyses that exploit the natural diversity of a genome to map at very high resolutions are becoming increasingly important. In most studies, however, researchers must contend with the confounding effects of both population and family structure. TASSEL (Trait Analysis by aSSociation, Evolution and Linkage) implements general linear model and mixed linear model approaches for controlling population and family structure. For result interpretation, the program allows for linkage disequilibrium statistics to be calculated and visualized graphically. Database browsing and data importation is facilitated by integrated middleware. Other features include analyzing insertions/deletions, calculating diversity statistics, integration of phenotypic and genotypic data, imputing missing data and calculating principal components.
Affiliation(s)
- Peter J Bradbury
- United States Department of Agriculture-Agricultural Research Service, Institute for Genomic Diversity, Cornell University, Ithaca, New York, USA.
6
Jaiswal P, Ni J, Yap I, Ware D, Spooner W, Youens-Clark K, Ren L, Liang C, Zhao W, Ratnapu K, Faga B, Canaran P, Fogleman M, Hebbard C, Avraham S, Schmidt S, Casstevens TM, Buckler ES, Stein L, McCouch S. Gramene: a bird's eye view of cereal genomes. Nucleic Acids Res 2006; 34:D717-23. [PMID: 16381966 PMCID: PMC1347516 DOI: 10.1093/nar/gkj154] [Citation(s) in RCA: 155] [Impact Index Per Article: 8.6] [Indexed: 01/07/2023]
Abstract
Rice, maize, sorghum, wheat, barley and the other major crop grasses from the family Poaceae (Gramineae) are mankind's most important source of calories and contribute tens of billions of dollars annually to the world economy (FAO 1999, http://www.fao.org; USDA 1997, http://www.usda.gov). Continued improvement of Poaceae crops is necessary in order to continue to feed an ever-growing world population. However, of the major crop grasses, only rice (Oryza sativa), with a compact genome of approximately 400 Mbp, has been sequenced and annotated. The Gramene database (http://www.gramene.org) takes advantage of the known genetic colinearity (synteny) between rice and the major crop plant genomes to provide maize, sorghum, millet, wheat, oat and barley researchers with the benefits of an annotated genome years before their own species are sequenced. Gramene is a one stop portal for finding curated literature, genetic and genomic datasets related to maps, markers, genes, genomes and quantitative trait loci. The addition of several new tools to Gramene has greatly facilitated the potential for comparative analysis among the grasses and contributes to our understanding of the anatomy, development, environmental responses and the factors influencing agronomic performance of cereal crops. Since the last publication on Gramene database by D. H. Ware, P. Jaiswal, J. Ni, I. V. Yap, X. Pan, K. Y. Clark, L. Teytelman, S. C. Schmidt, W. Zhao, K. Chang et al. [(2002), Plant Physiol., 130, 1606-1613], the database has undergone extensive changes that are described in this publication.
Affiliation(s)
- Doreen Ware
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- USDA-ARS NAA Plant, Soil & Nutrition Laboratory Research Unit, Cornell University, Ithaca, NY 14853, USA
- William Spooner
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Ken Youens-Clark
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Liya Ren
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Chengzhi Liang
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Wei Zhao
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Kiran Ratnapu
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Benjamin Faga
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Payan Canaran
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Shuly Avraham
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Steven Schmidt
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Edward S. Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- USDA-ARS NAA Plant, Soil & Nutrition Laboratory Research Unit, Cornell University, Ithaca, NY 14853, USA
- Lincoln Stein
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Susan McCouch
- To whom correspondence should be addressed. Tel: +1 607 255 0420; Fax: +1 607 255 6683;
7
Abstract
The goal of this project is to simplify access to genomic diversity and phenotype data, thereby encouraging reuse of these data. The Genomic Diversity and Phenotype Connection (GDPC) accomplishes this by retrieving data from one or more data sources and by allowing researchers to analyze integrated data in a standard format. GDPC is written in Java and provides (1) data sources available as web services that transfer XML-formatted data via the SOAP protocol; (2) a Java API for programmatic access to data sources; and (3) a front-end application that allows users to manage data sources, retrieve data based on filters, sort/group data based on property values and save/open the data as XML files. AVAILABILITY: The source code, compiled code, documentation and GDPC Browser are freely available at www.maizegenetics.net/gdpc/index.html. The current release of GDPC is version 1.0, with updated releases planned for the future. Comments are welcome.
Affiliation(s)
- Terry M Casstevens
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853-2703, USA.