1
Abstract
A single reference genome does not fully capture species diversity. By contrast, a pangenome incorporates multiple genomes to capture the entire set of nonredundant genes in a given species, along with its genome diversity. New sequencing technologies enable researchers to produce multiple high-quality genome sequences and catalog diverse genetic variations with better precision. Pangenomic studies have detected structural variants in plant genomes, dissected the genetic architecture of agronomic traits, and helped unravel molecular underpinnings and evolutionary origins of plant phenotypes. The pangenome concept has further evolved into a so-called super-pangenome that includes wild relatives within a genus or clade and shifted to graph-based reference systems. Nevertheless, building pangenomes and representing complex structural variants remain challenging in many crops. Standardized computing pipelines and common data structures are needed to compare and interpret pangenomes. The growing body of plant pangenomics data requires new algorithms, huge data storage capacity, and training to help researchers and breeders take advantage of newly discovered genes and genetic variants.
Affiliation(s)
- Murukarthick Jayakodi
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, USA
  - Texas A&M AgriLife Research Center at Dallas, Texas A&M University System, Dallas, Texas, USA
- Hyeonah Shim
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Seeland, Germany
- Martin Mascher
  - German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Leipzig, Germany
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Seeland, Germany
2
Marand AP, Jiang L, Gomez-Cano F, Minow MAA, Zhang X, Mendieta JP, Luo Z, Bang S, Yan H, Meyer C, Schlegel L, Johannes F, Schmitz RJ. The genetic architecture of cell type-specific cis regulation in maize. Science 2025; 388:eads6601. [PMID: 40245149 DOI: 10.1126/science.ads6601] [Received: 08/23/2024] [Accepted: 02/04/2025] [Indexed: 04/19/2025]
Abstract
Gene expression and complex phenotypes are determined by the activity of cis-regulatory elements. However, an understanding of how extant genetic variants affect cis regulation remains limited. Here, we investigated the consequences of cis-regulatory diversity using single-cell genomics of more than 0.7 million nuclei across 172 Zea mays (maize) inbreds. Our analyses pinpointed cis-regulatory elements distinct to domesticated maize and revealed how historical transposon activity has shaped the cis-regulatory landscape. Leveraging population genetics principles, we fine-mapped about 22,000 chromatin accessibility-associated genetic variants with widespread cell type-specific effects. Variants in TEOSINTE BRANCHED1/CYCLOIDEA/PROLIFERATING CELL FACTOR-binding sites were the most prevalent determinants of chromatin accessibility. Finally, integrating chromatin accessibility-associated variants, organismal trait variation, and population differentiation revealed how local adaptation has rewired regulatory networks in unique cellular contexts to alter maize flowering.
Affiliation(s)
- Luguang Jiang
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI, USA
- Fabio Gomez-Cano
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI, USA
- Mark A A Minow
  - Department of Genetics, University of Georgia, Athens, GA, USA
- Xuan Zhang
  - Department of Genetics, University of Georgia, Athens, GA, USA
- John P Mendieta
  - Department of Genetics, University of Georgia, Athens, GA, USA
- Ziliang Luo
  - Department of Genetics, University of Georgia, Athens, GA, USA
- Sohyun Bang
  - Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Haidong Yan
  - Department of Genetics, University of Georgia, Athens, GA, USA
- Cullan Meyer
  - Department of Genetics, University of Georgia, Athens, GA, USA
- Luca Schlegel
  - Plant Epigenomics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Frank Johannes
  - Plant Epigenomics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
3
Sun H, Tusso S, Dent CI, Goel M, Wijfjes RY, Baus LC, Dong X, Campoy JA, Kurdadze A, Walkemeier B, Sänger C, Huettel B, Hutten RCB, van Eck HJ, Dehmer KJ, Schneeberger K. The phased pan-genome of tetraploid European potato. Nature 2025:10.1038/s41586-025-08843-0. [PMID: 40240601 DOI: 10.1038/s41586-025-08843-0] [Received: 02/19/2024] [Accepted: 02/26/2025] [Indexed: 04/18/2025]
Abstract
Potatoes were first brought to Europe in the sixteenth century [1,2]. Two hundred years later, one of the species had become one of the most important food sources across the entire continent and, later, even the entire world [3]. However, its highly heterozygous, autotetraploid genome has complicated its improvement since then [4-7]. Here we present the pan-genome of European potatoes generated from phased genome assemblies of ten historical potato cultivars, which includes approximately 85% of all haplotypes segregating in Europe. Sequence diversity between the haplotypes was extremely high (for example, 20× higher than in humans), owing to numerous introgressions from wild potato species. By contrast, haplotype diversity was very low, in agreement with the population bottlenecks caused by domestication and transition to Europe. To illustrate a practical application of the pan-genome, we converted it into a haplotype graph and used it to generate phased, megabase-scale pseudo-genome assemblies of commercial potatoes (including the famous French fries potato 'Russet Burbank') using cost-efficient short reads only. In summary, we present a nearly complete pan-genome of autotetraploid European potato, we describe extraordinarily high sequence diversity in a domesticated crop, and we outline how this resource might be used to accelerate genomics-assisted breeding and research.
Affiliation(s)
- Hequan Sun
  - MOE Key Laboratory for Intelligent Networks & Network Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
  - Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Sergio Tusso
  - Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
  - Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Craig I Dent
  - Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
  - CEPLAS: Cluster of Excellence on Plant Sciences, Heinrich-Heine University, Düsseldorf, Germany
- Manish Goel
  - Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
  - Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Raúl Y Wijfjes
  - Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
- Lisa C Baus
  - Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
- Xiao Dong
  - Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- José A Campoy
  - Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
  - Department of Agronomical Engineering, Institute of Plant Biotechnology, Universidad Politécnica de Cartagena, Cartagena, Spain
- Ana Kurdadze
  - Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
- Birgit Walkemeier
  - Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Christine Sänger
  - Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Bruno Huettel
  - Max Planck Genome Center, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Ronald C B Hutten
  - Plant Breeding, Wageningen University & Research, Wageningen, The Netherlands
- Herman J van Eck
  - Plant Breeding, Wageningen University & Research, Wageningen, The Netherlands
- Klaus J Dehmer
  - CEPLAS: Cluster of Excellence on Plant Sciences, Heinrich-Heine University, Düsseldorf, Germany
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gross Luesewitz, Germany
- Korbinian Schneeberger
  - Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
  - Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
  - CEPLAS: Cluster of Excellence on Plant Sciences, Heinrich-Heine University, Düsseldorf, Germany
4
Zhao H, MacLeod IM, Keeble-Gagnere G, Barbulescu DM, Tibbits JF, Kaur S, Hayden M. Using genotype imputation to integrate Canola populations for genome-wide association and genomic prediction of blackleg resistance. BMC Genomics 2025; 26:215. [PMID: 40038585 PMCID: PMC11877698 DOI: 10.1186/s12864-025-11250-4] [Received: 04/22/2024] [Accepted: 01/16/2025] [Indexed: 03/06/2025] Open
Abstract
BACKGROUND: Integrating germplasm populations genotyped by different genotyping platforms via genotype imputation is a way to utilize accumulated genetic resources. In this study, we used 278 canola samples genotyped via whole-genome sequencing (WGS) at 10× coverage to evaluate the imputation accuracy of three imputation approaches. The optimal imputation methods were used to impute and integrate two canola genotype datasets: a diverse canola collection genotyped by genotyping-by-sequencing via transcriptome (GBS-t) and a double haploid (DH) line collection genotyped with low-coverage WGS (skim-WGS). The genomic predictive ability (GP) and detection power of marker-trait association (GWAS) of the combined population for blackleg resistance were evaluated.
RESULTS: The empirical imputation accuracy (r²), measured as the squared correlation between observed and imputed genotypes, was moderate for Minimac3 when imputing from the GBS-t density to the WGS. The accuracy dramatically improved from 0.64 to 0.82 by removing SNPs with poor Minimac3-reported Rsq (Rsq < 0.2) quality statistics. The r² for GLIMPSE was higher than that for Beagle when imputing from different low-coverage to full-coverage WGS. We imputed and integrated the diverse canola collection and the DH lines, and the combined population showed similar or slightly greater predictive ability (PA) for blackleg resistance traits than did each of the single populations with ~921 K SNPs. Higher marker-trait association (MTA) detection powers were indicated with the combined population; however, similar numbers of MTAs were discovered when each single population was combined in a meta-GWAS.
CONCLUSION: It is feasible to impute and integrate germplasms from different sequencing platforms for downstream analyses. However, genetic heterogeneity across populations could add complexity to the analysis. Increasing the sample size by combining datasets showed slightly greater predictive ability and greater detection power in GWASs in the present study.
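The accuracy measure described in this abstract (squared correlation between observed and imputed genotypes, optionally after dropping SNPs with a low reported Rsq) can be illustrated with a minimal sketch. The function names, the dictionary layout, and the toy dosage values below are assumptions made for illustration only; they are not taken from the study or from the Minimac3 output format.

```python
def squared_correlation(observed, imputed):
    """Squared Pearson correlation (r^2) between two equal-length dosage vectors."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_i = sum(imputed) / n
    cov = sum((o - mean_o) * (i - mean_i) for o, i in zip(observed, imputed))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_o == 0 or var_i == 0:
        return None  # r^2 is undefined when one vector has no variation
    return cov * cov / (var_o * var_i)


def mean_accuracy(snps, min_rsq=0.0):
    """Mean per-SNP r^2 over SNPs whose reported Rsq is at least min_rsq."""
    values = []
    for snp in snps:
        if snp["rsq"] < min_rsq:
            continue  # drop SNPs flagged as poorly imputed
        r2 = squared_correlation(snp["observed"], snp["imputed"])
        if r2 is not None:
            values.append(r2)
    return sum(values) / len(values) if values else None


# Hypothetical toy data: genotype dosages (0, 1, 2) for six samples at three SNPs.
snps = [
    {"rsq": 0.95, "observed": [0, 1, 2, 2, 0, 1], "imputed": [0.1, 0.9, 1.9, 2.0, 0.0, 1.2]},
    {"rsq": 0.10, "observed": [0, 2, 0, 2, 1, 1], "imputed": [1.0, 0.8, 1.2, 1.1, 0.9, 1.0]},
    {"rsq": 0.60, "observed": [2, 2, 1, 0, 0, 1], "imputed": [1.8, 1.5, 1.0, 0.4, 0.2, 1.1]},
]

accuracy_all = mean_accuracy(snps)
accuracy_filtered = mean_accuracy(snps, min_rsq=0.2)
print(f"all SNPs: {accuracy_all:.2f}, Rsq >= 0.2: {accuracy_filtered:.2f}")
```

In this toy set the poorly imputed SNP pulls the unfiltered mean down, so filtering on the reported Rsq raises the average r², which mirrors the direction of the effect reported in the abstract, not its magnitude.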
Affiliation(s)
- Huanhuan Zhao
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Iona M MacLeod
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
- Gabriel Keeble-Gagnere
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Denise M Barbulescu
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Josquin F Tibbits
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Sukhjiwan Kaur
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Matthew Hayden
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
5
Washburn JD, Varela JI, Xavier A, Chen Q, Ertl D, Gage JL, Holland JB, Lima DC, Romay MC, Lopez-Cruz M, de los Campos G, Barber W, Zimmer C, Trucillo Silva I, Rocha F, Rincent R, Ali B, Hu H, Runcie DE, Gusev K, Slabodkin A, Bax P, Aubert J, Gangloff H, Mary-Huard T, Vanrenterghem T, Quesada-Traver C, Yates S, Ariza-Suárez D, Ulrich A, Wyler M, Kick DR, Bellis ES, Causey JL, Soriano Chavez E, Wang Y, Piyush V, Fernando GD, Hu RK, Kumar R, Timon AJ, Venkatesh R, Segura Abá K, Chen H, Ranaweera T, Shiu SH, Wang P, Gordon MJ, Amos BK, Busato S, Perondi D, Gogna A, Psaroudakis D, Chen CPJ, Al-Mamun HA, Danilevicz MF, Upadhyaya SR, Edwards D, de Leon N. Global genotype by environment prediction competition reveals that diverse modeling strategies can deliver satisfactory maize yield estimates. Genetics 2025; 229:iyae195. [PMID: 39576009 PMCID: PMC12054733 DOI: 10.1093/genetics/iyae195] [Received: 09/13/2024] [Accepted: 11/13/2024] [Indexed: 11/27/2024] Open
Abstract
Predicting phenotypes from a combination of genetic and environmental factors is a grand challenge of modern biology. Slight improvements in this area have the potential to save lives, improve food and fuel security, permit better care of the planet, and create other positive outcomes. In 2022 and 2023, the first open-to-the-public Genomes to Fields initiative Genotype by Environment prediction competition was held using a large dataset including genomic variation, phenotype and weather measurements, and field management notes gathered by the project over 9 years. The competition attracted registrants from around the world with representation from academic, government, industry, and nonprofit institutions as well as unaffiliated. These participants came from diverse disciplines, including plant science, animal science, breeding, statistics, computational biology, and others. Some participants had no formal genetics or plant-related training, and some were just beginning their graduate education. The teams applied varied methods and strategies, providing a wealth of modeling knowledge based on a common dataset. The winner's strategy involved 2 models combining machine learning and traditional breeding tools: 1 model emphasized environment using features extracted by random forest, ridge regression, and least squares, and 1 focused on genetics. Other high-performing teams' methods included quantitative genetics, machine learning/deep learning, mechanistic models, and model ensembles. The dataset factors used, such as genetics, weather, and management data, were also diverse, demonstrating that no single model or strategy is far superior to all others within the context of this competition.
Affiliation(s)
- Jacob D Washburn
  - USDA-ARS, MWA-PGRU, 302-A Curtis Hall, University of Missouri, Columbia, MO 65211, USA
- José Ignacio Varela
  - Department of Plant and Agroecosystem Sciences, University of Wisconsin—Madison, 1575 Linden Drive, Madison, WI 53706, USA
  - Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA 50131, USA
- Alencar Xavier
  - Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA 50131, USA
  - Department of Agronomy, Purdue University, 915 Mitch Daniels Blvd, West Lafayette, IN 47907, USA
- Qiuyue Chen
  - Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
- David Ertl
  - Iowa Corn Promotion Board, Johnston, IA 50131, USA
- Joseph L Gage
  - Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
- James B Holland
  - Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
  - USDA-ARS, Plant Science Research Unit, Raleigh, NC 27695, USA
- Dayane Cristina Lima
  - Department of Plant and Agroecosystem Sciences, University of Wisconsin—Madison, 1575 Linden Drive, Madison, WI 53706, USA
- Maria Cinta Romay
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- Marco Lopez-Cruz
  - Departments of Epidemiology and Biostatistics and Statistics and Probability, and Institute for Quantitative Health Science and Engineering, Michigan State University, 775 Woodlot Dr, East Lansing, MI 48823, USA
- Gustavo de los Campos
  - Departments of Epidemiology and Biostatistics and Statistics and Probability, and Institute for Quantitative Health Science and Engineering, Michigan State University, 775 Woodlot Dr, East Lansing, MI 48823, USA
- Wesley Barber
  - Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA 50131, USA
- Fabiani Rocha
  - Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA 50131, USA
- Renaud Rincent
  - Université Paris—Saclay, INRAE, CNRS, AgroParisTech, GQE—Le Moulon, 91190 Gif-sur-Yvette, France
- Baber Ali
  - Université Paris—Saclay, INRAE, CNRS, AgroParisTech, GQE—Le Moulon, 91190 Gif-sur-Yvette, France
- Haixiao Hu
  - Department of Plant Sciences, University of California Davis, One Shield Drive, Davis, CA 95616, USA
- Daniel E Runcie
  - Department of Plant Sciences, University of California Davis, One Shield Drive, Davis, CA 95616, USA
- Kirill Gusev
  - Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE 19808, USA
- Andrei Slabodkin
  - Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE 19808, USA
- Phillip Bax
  - Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE 19808, USA
- Julie Aubert
  - Université Paris—Saclay, AgroParisTech, INRAE, UMR MIA Paris—Saclay, 91120 Palaiseau, France
- Hugo Gangloff
  - Université Paris—Saclay, AgroParisTech, INRAE, UMR MIA Paris—Saclay, 91120 Palaiseau, France
- Tristan Mary-Huard
  - Université Paris—Saclay, INRAE, CNRS, AgroParisTech, GQE—Le Moulon, 91190 Gif-sur-Yvette, France
  - Université Paris—Saclay, AgroParisTech, INRAE, UMR MIA Paris—Saclay, 91120 Palaiseau, France
- Theodore Vanrenterghem
  - Université Paris—Saclay, AgroParisTech, INRAE, UMR MIA Paris—Saclay, 91120 Palaiseau, France
- Carles Quesada-Traver
  - Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Steven Yates
  - Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Daniel Ariza-Suárez
  - Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Argeo Ulrich
  - Puregene AG, Etzmatt 273, CH-4314 Zeiningen, Switzerland
  - Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zürich, Switzerland
- Michele Wyler
  - MWSchmid GmbH, Hauptstrasse 34, CH-8750 Glarus, Switzerland
- Daniel R Kick
  - USDA-ARS, MWA-PGRU, 302-A Curtis Hall, University of Missouri, Columbia, MO 65211, USA
- Emily S Bellis
  - Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd, Jonesboro, AR 72401, USA
- Jason L Causey
  - Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd, Jonesboro, AR 72401, USA
- Emilio Soriano Chavez
  - Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd, Jonesboro, AR 72401, USA
- Yixing Wang
  - Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd, Jonesboro, AR 72401, USA
- Ved Piyush
  - Department of Statistics, University of Nebraska—Lincoln, 340 Hardin Hall North Wing, Lincoln, NE 68583, USA
- Gayara D Fernando
  - Department of Statistics, University of Nebraska—Lincoln, 340 Hardin Hall North Wing, Lincoln, NE 68583, USA
- Robert K Hu
  - Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
- Rachit Kumar
  - Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
  - Medical Scientist Training Program, Perelman School of Medicine at the University of Pennsylvania, University of Pennsylvania, 3400 Civic Center Blvd, Philadelphia, PA 19104, USA
- Annan J Timon
  - Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
- Rasika Venkatesh
  - Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
- Kenia Segura Abá
  - DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
  - Genetics and Genome Sciences Graduate Program, Michigan State University, East Lansing, MI 48824, USA
- Huan Chen
  - Genetics and Genome Sciences Graduate Program, Michigan State University, East Lansing, MI 48824, USA
- Thilanka Ranaweera
  - DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
  - Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- Shin-Han Shiu
  - DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
  - Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
  - Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Peiran Wang
  - NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC 27606, USA
  - Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC 27606, USA
- Max J Gordon
  - NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC 27606, USA
  - Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC 27606, USA
- B Kirtley Amos
  - NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC 27606, USA
  - Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC 27606, USA
- Sebastiano Busato
  - NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC 27606, USA
  - Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC 27606, USA
- Daniel Perondi
  - NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC 27606, USA
  - Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC 27606, USA
- Abhishek Gogna
  - Department of Breeding Research, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung, Corrensstraße 3, Gatersleben 6466, Germany
- Dennis Psaroudakis
  - Department of Molecular Genetics, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung, Corrensstraße 3, Gatersleben 6466, Germany
- Hawlader A Al-Mamun
  - School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA 6009, Australia
- Monica F Danilevicz
  - School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA 6009, Australia
- Shriprabha R Upadhyaya
  - School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA 6009, Australia
- David Edwards
  - School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA 6009, Australia
- Natalia de Leon
  - Department of Plant and Agroecosystem Sciences, University of Wisconsin—Madison, 1575 Linden Drive, Madison, WI 53706, USA
6
Liu HJ, Liu J, Zhai Z, Dai M, Tian F, Wu Y, Tang J, Lu Y, Wang H, Jackson D, Yang X, Qin F, Xu M, Fernie AR, Zhang Z, Yan J. Maize2035: A decadal vision for intelligent maize breeding. Molecular Plant 2025; 18:313-332. [PMID: 39827366 DOI: 10.1016/j.molp.2025.01.012] [Received: 12/05/2024] [Revised: 01/12/2025] [Accepted: 01/14/2025] [Indexed: 01/22/2025]
Abstract
Maize, a cornerstone of global food security, has undergone remarkable transformations through breeding, yet further increase in global maize production faces mounting challenges in a changing world. In this Perspective paper, we overview the historical successes of maize breeding that laid the foundation for present opportunities. We examine both the specific and shared breeding goals related to diverse geographies and end-use demands. Achieving these coordinated breeding objectives requires a holistic approach to trait improvement for sustainable agriculture. We discuss cutting-edge solutions, including multi-omics approaches from single-cell analysis to holobionts, smart breeding with advanced technologies and algorithms, and the transformative potential of rational design with synthetic biology approaches. A transition toward a data-driven future is currently underway, with large-scale precision agriculture and autonomous systems poised to revolutionize farming practice. Realizing these futuristic opportunities hinges on collaborative efforts spanning scientific discoveries, technology translations, and socioeconomic considerations in maximizing human and environmental well-being.
Affiliation(s)
- Hai-Jun Liu
  - Yazhouwan National Laboratory, Sanya 572024, China
- Jie Liu
  - Yazhouwan National Laboratory, Sanya 572024, China
- Zhiwen Zhai
  - Yazhouwan National Laboratory, Sanya 572024, China
- Mingqiu Dai
  - National Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China
- Feng Tian
  - State Key Laboratory of Plant Environmental Resilience, China Agricultural University, Beijing 100193, China; Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100193, China; National Maize Improvement Center of China, China Agricultural University, Beijing 100193, China
- Yongrui Wu
  - National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Shanghai Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai 200032, China
- Jihua Tang
  - National Key Laboratory of Wheat and Maize Crop Science, Collaborative Innovation Center of Henan Grain Crops, College of Agronomy, Henan Agricultural University, Zhengzhou 450002, China
- Yanli Lu
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Haiyang Wang
  - Yazhouwan National Laboratory, Sanya 572024, China
- David Jackson
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Xiaohong Yang
  - State Key Laboratory of Plant Environmental Resilience, China Agricultural University, Beijing 100193, China; Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100193, China; National Maize Improvement Center of China, China Agricultural University, Beijing 100193, China
- Feng Qin
  - State Key Laboratory of Plant Environmental Resilience, China Agricultural University, Beijing 100193, China; Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100193, China
- Mingliang Xu
  - State Key Laboratory of Plant Environmental Resilience, China Agricultural University, Beijing 100193, China; National Maize Improvement Center of China, China Agricultural University, Beijing 100193, China
- Alisdair R Fernie
  - Max Planck Institute of Molecular Plant Physiology, Am Muehlenberg 1, 14476 Potsdam-Golm, Germany
- Zuxin Zhang
  - Yazhouwan National Laboratory, Sanya 572024, China
- Jianbing Yan
  - National Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China
7
Ruperao P, Rangan P, Shah T, Sharma V, Rathore A, Mayes S, Pandey MK. Developing pangenomes for large and complex plant genomes and their representation formats. J Adv Res 2025:S2090-1232(25)00071-2. [PMID: 39894347 DOI: 10.1016/j.jare.2025.01.052] [Received: 03/25/2024] [Revised: 01/27/2025] [Accepted: 01/27/2025] [Indexed: 02/04/2025] Open
Abstract
BACKGROUND: The development of pangenomes has revolutionized genomic studies by capturing the complete genetic diversity within a species. Pangenome assembly integrates data from multiple individuals to construct a comprehensive genomic landscape, revealing both core and accessory genomic elements. This approach enables the identification of novel genes, structural variations, and gene presence-absence variations, providing insights into species evolution, adaptation, and trait variation. Representing pangenomes requires innovative visualization formats that effectively convey the complex genomic structures and variations.
AIM: This review delves into contemporary methodologies and recent advancements in constructing pangenomes, particularly in plant genomes. It examines the structure of pangenome representation, including format comparison, conversion, visualization techniques, and their implications for enhancing crop improvement strategies.
KEY SCIENTIFIC CONCEPTS OF REVIEW: Earlier comparative studies have illuminated novel gene sequences, copy number variations, and presence-absence variations across diverse crop species. The concept of a pan-genome, which captures multiple genetic variations from a broad spectrum of genotypes, offers a holistic perspective of a species' genetic makeup. However, constructing a pan-genome for plants with larger genomes poses challenges, including managing vast genome sequence data and comprehending the genetic variations within the germplasm. To address these challenges, researchers have explored cost-effective alternatives to encapsulate species diversity in a single assembly known as a pangenome. This involves reducing the volume of genome sequences while focusing on genetic variations. With the growing prominence of the pan-genome concept in plant genomics, several software tools have emerged to facilitate pangenome construction. This review sheds light on developing and utilizing software tools tailored for constructing pan-genomes in plants. It also discusses representation formats suitable for downstream analyses, offering valuable insights into the genetic landscape and evolutionary dynamics of plant species. In summary, this review underscores the significance of pan-genome construction and representation formats in resolving the genetic architecture of plants, particularly those with complex genomes. It provides a comprehensive overview of recent advancements, aiding in exploring and understanding plant genetic diversity.
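The distinction between core and accessory elements and the notion of gene presence-absence variation mentioned in this abstract can be made concrete with a small sketch. The accession labels, gene identifiers, and function name below are hypothetical and chosen only for illustration; real pangenome pipelines work from annotated assemblies or graphs rather than hand-written sets.

```python
from collections import Counter


def classify_genes(presence):
    """Split genes into core (present in every accession), accessory
    (absent from at least one accession), and private (present in exactly one)."""
    if not presence:
        return set(), set(), set()
    gene_sets = [set(genes) for genes in presence.values()]
    all_genes = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    accessory = all_genes - core
    # Count in how many accessions each gene occurs to find private genes.
    counts = Counter(gene for genes in gene_sets for gene in genes)
    private = {gene for gene, n in counts.items() if n == 1}
    return core, accessory, private


# Hypothetical presence-absence data: accession -> genes found in its assembly.
presence = {
    "acc_A": {"g1", "g2", "g3"},
    "acc_B": {"g1", "g2", "g4"},
    "acc_C": {"g1", "g3", "g4", "g5"},
}

core, accessory, private = classify_genes(presence)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
print("private:", sorted(private))
```

Here only `g1` occurs in all three accessions, so it is the sole core gene; `g5` is private to one accession when more than one accession is analysed. With a single accession every gene is trivially both core and private, which is why such classifications are only meaningful across several genomes.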
Affiliation(s)
- Pradeep Ruperao
  - Center of Excellence in Genomics and Systems Biology (CEGSB) and Center for Pre-Breeding Research (CPBR), International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Parimalan Rangan
  - ICAR-National Bureau of Plant Genetic Resources (NBPGR), New Delhi, India; Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, Australia
- Trushar Shah
  - International Institute of Tropical Agriculture (IITA), Nairobi, Kenya
- Vinay Sharma
  - Center of Excellence in Genomics and Systems Biology (CEGSB) and Center for Pre-Breeding Research (CPBR), International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Abhishek Rathore
  - International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
- Sean Mayes
  - Center of Excellence in Genomics and Systems Biology (CEGSB) and Center for Pre-Breeding Research (CPBR), International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Manish K Pandey
  - Center of Excellence in Genomics and Systems Biology (CEGSB) and Center for Pre-Breeding Research (CPBR), International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
|
8
|
Stack GM, Quade MA, Wilkerson DG, Monserrate LA, Bentz PC, Carey SB, Grimwood J, Toth JA, Crawford S, Harkess A, Smart LB. Comparison of Recombination Rate, Reference Bias, and Unique Pangenomic Haplotypes in Cannabis sativa Using Seven De Novo Genome Assemblies. Int J Mol Sci 2025; 26:1165. [PMID: 39940933; PMCID: PMC11818205; DOI: 10.3390/ijms26031165] [Received: 11/20/2024] [Revised: 01/20/2025] [Accepted: 01/27/2025] [Indexed: 02/16/2025]
Abstract
Genomic characterization of Cannabis sativa has accelerated rapidly in the last decade as sequencing costs have decreased and public and private interest in the species has increased. Here, we present seven new chromosome-level haplotype-phased genomes of C. sativa. All of these genotypes were alive at the time of publication, and several have numerous years of associated phenotype data. We performed a k-mer-based pangenome analysis to contextualize these assemblies within over 200 existing assemblies. This allowed us to identify unique haplotypes and genomic diversity among Cannabis sativa genotypes. We leveraged linkage maps constructed from F2 progeny of two of the assembled genotypes to characterize the recombination rate across the genome showing strong periphery-biased recombination. Lastly, we re-aligned a bulk segregant analysis dataset for the major-effect flowering locus Early1 to several of the new assemblies to evaluate the impact of reference bias on the mapping results and narrow the locus to a smaller region of the chromosome. These new assemblies, combined with the continued propagation of the genotypes, will contribute to the growing body of genomic resources for C. sativa to accelerate future research efforts.
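As a rough illustration of what a k-mer-based comparison against a panel of existing assemblies can look like, the sketch below collects the k-mers of a new sequence that are absent from every panel sequence. This is a toy under stated assumptions, not the authors' pipeline: the helper names `kmers` and `unique_kmers` are hypothetical, the sequences are made up, and reverse complements and canonical k-mers are deliberately ignored.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(new_seq: str, panel: Iterable[str], k: int) -> Set[str]:
    """Return k-mers found in new_seq but in none of the panel sequences."""
    seen: Set[str] = set()
    for panel_seq in panel:
        seen |= kmers(panel_seq, k)
    return kmers(new_seq, k) - seen


# Toy sequences (made up); real analyses use much larger k and whole assemblies.
panel = ["ACGTAC", "CGTACG"]
novel = unique_kmers("ACGTTT", panel, k=3)
```

Here the k-mers `GTT` and `TTT` occur only in the new sequence. At genome scale, a plain Python set would be replaced by a dedicated k-mer counter or sketching structure.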
Affiliation(s)
- George M. Stack
  - Horticulture Section, School of Integrative Plant Science, Cornell University, Geneva, NY 14456, USA
- Michael A. Quade
  - Horticulture Section, School of Integrative Plant Science, Cornell University, Geneva, NY 14456, USA
- Dustin G. Wilkerson
  - Horticulture Section, School of Integrative Plant Science, Cornell University, Geneva, NY 14456, USA
- Luis A. Monserrate
  - Horticulture Section, School of Integrative Plant Science, Cornell University, Geneva, NY 14456, USA
- Philip C. Bentz
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Sarah B. Carey
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Jane Grimwood
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Jacob A. Toth
  - Horticulture Section, School of Integrative Plant Science, Cornell University, Geneva, NY 14456, USA
- Alex Harkess
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Lawrence B. Smart
  - Horticulture Section, School of Integrative Plant Science, Cornell University, Geneva, NY 14456, USA
|
9
|
Gage JL, Romay MC, Buckler ES. Maize inbreds show allelic variation for diel transcription patterns. bioRxiv 2024:2024.12.16.628400. [PMID: 39763849; PMCID: PMC11702552; DOI: 10.1101/2024.12.16.628400] [Indexed: 01/19/2025]
Abstract
Circadian entrainment and external cues can cause gene transcript abundance to oscillate throughout the day, and these patterns of diel transcript oscillation vary across genes and plant species. Less is known about within-species allelic variation for diel patterns of transcript oscillation, or about how regulatory sequence variation influences diel transcription patterns. In this study, we evaluated diel transcript abundance for 24 diverse maize inbred lines. We observed extensive natural variation in diel transcription patterns, with two-fold variation in the number of genes that oscillate over the course of the day. A convolutional neural network trained to predict oscillation from promoter sequence identified sequences previously reported as binding motifs for known circadian clock genes in other plant systems. Genes showing diel transcription patterns that cosegregate with promoter sequence haplotypes are enriched for associations with photoperiod sensitivity and may have been indirect targets of selection as maize was adapted to longer day lengths at higher latitudes. These findings support the idea that cis-regulatory sequence variation influences patterns of gene expression, which in turn can have effects on phenotypic plasticity and local adaptation.
Affiliation(s)
- Joseph L. Gage
  - Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695
  - NC Plant Sciences Initiative, North Carolina State University, Raleigh, NC 27606
- M. Cinta Romay
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853
- Edward S. Buckler
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853
  - USDA-ARS, Ithaca, NY 14850
  - School of Integrative Plant Science, Plant Breeding and Genetics Section, Cornell University, Ithaca, NY 14853
|
10
|
Kaur H, Shannon LM, Samac DA. A stepwise guide for pangenome development in crop plants: an alfalfa (Medicago sativa) case study. BMC Genomics 2024; 25:1022. [PMID: 39482604; PMCID: PMC11526573; DOI: 10.1186/s12864-024-10931-w] [Received: 06/13/2024] [Accepted: 10/21/2024] [Indexed: 11/03/2024]
Abstract
BACKGROUND The concept of pangenomics and the importance of structural variants are gaining recognition within the plant genomics community. Due to advancements in sequencing and computational technology, it has become feasible to sequence the entire genome of numerous individuals of a single species at a reasonable cost. Pangenomes have been constructed for many major diploid crops, including rice, maize, soybean, sorghum, pearl millet, peas, sunflower, grapes, and mustards. However, pangenomes for polyploid species are relatively scarce and are available in only a few crops, including wheat, cotton, rapeseed, and potatoes. MAIN BODY In this review, we explore the various methods used in crop pangenome development, discussing the challenges and implications of these techniques based on insights from published pangenome studies. We offer a systematic guide and discuss the tools available for constructing a pangenome and conducting downstream analyses. Alfalfa, a highly heterozygous, cross-pollinated, and autotetraploid forage crop species, is used as an example to discuss the concerns and challenges posed by polyploid crop species. We conducted a comparative analysis using linear and graph-based methods by constructing an alfalfa graph pangenome using three publicly available genome assemblies. To illustrate the intricacies captured by pangenome graphs for a complex crop genome, we used five different gene sequences and aligned them against the three graph-based pangenomes. The comparison of the three graph pangenome methods reveals notable differences in the genomic variation captured by each pipeline. CONCLUSION Pangenome resources are proving invaluable by offering insights into core and dispensable genes, novel gene discovery, and genome-wide patterns of variation. Developing user-friendly online portals for linear pangenome visualization has made these resources accessible to the broader scientific and breeding community. However, challenges remain with graph-based pangenomes, including compatibility with other tools, extraction of sequence for regions of interest, and visualization of genetic variation captured in pangenome graphs. These issues necessitate further refinement of tools and pipelines to effectively address the complexities of polyploid, highly heterozygous, and cross-pollinated species.
Affiliation(s)
- Harpreet Kaur
  - Department of Horticultural Science, University of Minnesota, St. Paul, MN, 55108, USA
- Laura M Shannon
  - Department of Horticultural Science, University of Minnesota, St. Paul, MN, 55108, USA
- Deborah A Samac
  - USDA-ARS, Plant Science Research Unit, St. Paul, MN, 55108, USA
|
11
|
Chandra G, Hossen MH, Scholz S, Dilthey AT, Gibney D, Jain C. Integer programming framework for pangenome-based genome inference. bioRxiv 2024:2024.10.27.620212. [PMID: 39554168; PMCID: PMC11565907; DOI: 10.1101/2024.10.27.620212] [Indexed: 11/19/2024]
Abstract
Affordable genotyping methods are essential in genomics. Commonly used genotyping methods primarily support single nucleotide variants and short indels but neglect structural variants. Additionally, accuracy of read alignments to a reference genome is unreliable in highly polymorphic and repetitive regions, further impacting genotyping performance. Recent works highlight the advantage of haplotype-resolved pangenome graphs in addressing these challenges. Building on these developments, we propose a rigorous alignment-free genotyping framework. Our formulation seeks a path through the pangenome graph that maximizes the matches between the path and substrings of sequencing reads (e.g., k-mers) while minimizing recombination events (haplotype switches) along the path. We prove that this problem is NP-Hard and develop efficient integer-programming solutions. We benchmarked the algorithm using downsampled short-read datasets from homozygous human cell lines with coverage ranging from 0.1× to 10×. Our algorithm accurately estimates complete major histocompatibility complex (MHC) haplotype sequences with small edit distances from the ground-truth sequences, providing a significant advantage over existing methods on low-coverage inputs. Although our algorithm is designed for haploid samples, we discuss future extensions to diploid samples.
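The trade-off described in this abstract (reward matches to the reads, penalize haplotype switches) can be made concrete with a deliberately simplified toy. The sketch below is not the authors' integer program and does not operate on a general pangenome graph: it assumes the graph has been reduced to a linear series of blocks, that a per-block match count is already known for every haplotype, and that each switch costs a fixed penalty. Under those assumptions the best mosaic path can be found by dynamic programming. The function name `best_mosaic_path` and all numbers are hypothetical.

```python
from typing import List, Tuple


def best_mosaic_path(match_scores: List[List[int]], switch_penalty: int) -> Tuple[int, List[int]]:
    """Pick one haplotype per block, maximizing matches minus switch penalties.

    match_scores[h][b] is the (assumed precomputed) number of read k-mers
    matching haplotype h in block b. Returns (objective, haplotype per block).
    """
    n_haps = len(match_scores)
    n_blocks = len(match_scores[0])

    # best[h]: best objective of any path that ends on haplotype h at the current block.
    best = [match_scores[h][0] for h in range(n_haps)]
    back: List[List[int]] = []  # back[b - 1][h]: haplotype used at block b - 1

    for b in range(1, n_blocks):
        # Any switch is best made from the top-scoring previous haplotype.
        top = max(range(n_haps), key=lambda h: best[h])
        new_best, pointers = [], []
        for h in range(n_haps):
            stay = best[h]
            switch = best[top] - switch_penalty
            if stay >= switch:
                new_best.append(stay + match_scores[h][b])
                pointers.append(h)
            else:
                new_best.append(switch + match_scores[h][b])
                pointers.append(top)
        best = new_best
        back.append(pointers)

    # Trace back from the best final haplotype.
    end = max(range(n_haps), key=lambda h: best[h])
    path = [end]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[end], path


# Two haplotypes, four blocks (toy numbers).
scores = [[5, 5, 0, 0],
          [0, 0, 5, 5]]
cheap_score, cheap_path = best_mosaic_path(scores, switch_penalty=3)
costly_score, costly_path = best_mosaic_path(scores, switch_penalty=20)
```

With a penalty of 3 the toy path switches once between the second and third blocks; with a penalty of 20 it stays on a single haplotype. The abstract states that the actual problem on a pangenome graph is NP-hard, so this linear-block simplification should be read only as an illustration of the objective.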
Affiliation(s)
- Ghanshyam Chandra
  - Department of Computational and Data Sciences, Indian Institute of Science, Bangalore KA 560012, India
- Md Helal Hossen
  - Department of Computer Science, The University of Texas at Dallas, TX 75080, USA
- Stephan Scholz
  - Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Alexander T Dilthey
  - Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Daniel Gibney
  - Department of Computer Science, The University of Texas at Dallas, TX 75080, USA
- Chirag Jain
  - Department of Computational and Data Sciences, Indian Institute of Science, Bangalore KA 560012, India
|
12
|
Washburn JD, Varela JI, Xavier A, Chen Q, Ertl D, Gage JL, Holland JB, Lima DC, Romay MC, Lopez-Cruz M, de los Campos G, Barber W, Zimmer C, Silva IT, Rocha F, Rincent R, Ali B, Hu H, Runcie DE, Gusev K, Slabodkin A, Bax P, Aubert J, Gangloff H, Mary-Huard T, Vanrenterghem T, Quesada-Traver C, Yates S, Ariza-Suárez D, Ulrich A, Wyler M, Kick DR, Bellis ES, Causey JL, Chavez ES, Wang Y, Piyush V, Fernando GD, Hu RK, Kumar R, Timon AJ, Venkatesh R, Abá KS, Chen H, Ranaweera T, Shiu SH, Wang P, Gordon MJ, Amos BK, Busato S, Perondi D, Gogna A, Psaroudakis D, Chen CPJ, Al-Mamun HA, Danilevicz MF, Upadhyaya SR, Edwards D, de Leon N. Global Genotype by Environment Prediction Competition Reveals That Diverse Modeling Strategies Can Deliver Satisfactory Maize Yield Estimates. bioRxiv 2024:2024.09.13.612969. [PMID: 39345633; PMCID: PMC11429743; DOI: 10.1101/2024.09.13.612969] [Indexed: 10/01/2024]
Abstract
Predicting phenotypes from a combination of genetic and environmental factors is a grand challenge of modern biology. Slight improvements in this area have the potential to save lives, improve food and fuel security, permit better care of the planet, and create other positive outcomes. In 2022 and 2023 the first open-to-the-public Genomes to Fields (G2F) initiative Genotype by Environment (GxE) prediction competition was held using a large dataset including genomic variation, phenotype and weather measurements, and field management notes, gathered by the project over nine years. The competition attracted registrants from around the world, with representation from academic, government, industry, and non-profit institutions as well as unaffiliated individuals. These participants came from diverse disciplines, including plant science, animal science, breeding, statistics, computational biology, and others. Some participants had no formal genetics or plant-related training, and some were just beginning their graduate education. The teams applied varied methods and strategies, providing a wealth of modeling knowledge based on a common dataset. The winner's strategy involved two models combining machine learning and traditional breeding tools: one model emphasized environment using features extracted by Random Forest, Ridge Regression and Least-squares, and one focused on genetics. Other high-performing teams' methods included quantitative genetics, classical machine learning/deep learning, mechanistic models, and model ensembles. The dataset factors used, such as genetics, weather, and management data, were also diverse, demonstrating that no single model or strategy is far superior to all others within the context of this competition.
Affiliation(s)
- Jacob D. Washburn
  - USDA-ARS-MWA-PGRU, 302-A Curtis Hall, U. of MO., Columbia, MO, 65211, USA
- José Ignacio Varela
  - Department of Plant and Agroecosystem Sciences, University of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
  - Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
- Alencar Xavier
  - Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
  - Department of Agronomy, Purdue University, 915 Mitch Daniels Blvd, West Lafayette, IN 47907, United States
- Qiuyue Chen
  - Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, 27695, USA
- David Ertl
  - Iowa Corn Promotion Board, Johnston, IA, 50131, USA
- Joseph L. Gage
  - Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, 27695, USA
- James B. Holland
  - Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, 27695, USA
  - USDA-ARS Plant Science Research Unit, Raleigh, NC, 27695, USA
- Dayane Cristina Lima
  - Department of Plant and Agroecosystem Sciences, University of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
- Maria Cinta Romay
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Marco Lopez-Cruz
  - Departments of Epidemiology & Biostatistics and Statistics & Probability, and Institute for Quantitative Health Science and Engineering, Michigan State University, 775 Woodlot Dr., East Lansing, MI, 48823, USA
- Gustavo de los Campos
  - Departments of Epidemiology & Biostatistics and Statistics & Probability, and Institute for Quantitative Health Science and Engineering, Michigan State University, 775 Woodlot Dr., East Lansing, MI, 48823, USA
- Wesley Barber
  - Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
- Cristiano Zimmer
  - Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
- Fabiani Rocha
  - Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
- Renaud Rincent
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
- Baber Ali
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
- Haixiao Hu
  - Department of Plant Sciences, University of California Davis, One Shield Drive, Davis, CA, 95616, USA
- Daniel E Runcie
  - Department of Plant Sciences, University of California Davis, One Shield Drive, Davis, CA, 95616, USA
- Kirill Gusev
  - Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE, 19808, USA
- Andrei Slabodkin
  - Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE, 19808, USA
- Phillip Bax
  - Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE, 19808, USA
- Julie Aubert
  - Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
- Hugo Gangloff
  - Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
- Tristan Mary-Huard
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
  - Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
- Theodore Vanrenterghem
  - Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
- Carles Quesada-Traver
  - Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Steven Yates
  - Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Daniel Ariza-Suárez
  - Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Argeo Ulrich
  - Puregene AG, Etzmatt 273, CH-4314 Zeiningen, Switzerland
  - Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zürich, Switzerland
- Michele Wyler
  - MWSchmid GmbH, Hauptstrasse 34, CH-8750 Glarus, Switzerland
- Daniel R. Kick
  - USDA-ARS-MWA-PGRU, 302-A Curtis Hall, U. of MO., Columbia, MO, 65211, USA
- Emily S. Bellis
  - Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd., Jonesboro, AR, 72401, USA
- Jason L. Causey
  - Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd., Jonesboro, AR, 72401, USA
- Emilio Soriano Chavez
  - Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd., Jonesboro, AR, 72401, USA
- Yixing Wang
  - Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd., Jonesboro, AR, 72401, USA
- Ved Piyush
  - Department of Statistics, University of Nebraska - Lincoln, 340 Hardin Hall North Wing, Lincoln, NE, 68583, USA
- Gayara D. Fernando
  - Department of Statistics, University of Nebraska - Lincoln, 340 Hardin Hall North Wing, Lincoln, NE, 68583, USA
- Robert K Hu
  - Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
- Rachit Kumar
  - Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
  - Medical Scientist Training Program, Perelman School of Medicine at the University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA, 19104, USA
- Annan J. Timon
  - Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
- Rasika Venkatesh
  - Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
- Kenia Segura Abá
  - DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI, 48824, USA
  - Genetics and Genome Sciences Graduate Program, Michigan State University, East Lansing, MI, 48824, USA
- Huan Chen
  - Genetics and Genome Sciences Graduate Program, Michigan State University, East Lansing, MI, 48824, USA
- Thilanka Ranaweera
  - DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI, 48824, USA
  - Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Shin-Han Shiu
  - DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI, 48824, USA
  - Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
  - Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Peiran Wang
  - NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
  - Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
- Max J. Gordon
  - NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
  - Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
- B K. Amos
  - NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
  - Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
- Sebastiano Busato
  - NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
  - Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
- Daniel Perondi
  - NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
  - Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
- Abhishek Gogna
  - Department of Breeding Research, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung, Corrensstraße 3, Gatersleben, 06466, Germany
- Dennis Psaroudakis
  - Department of Molecular Genetics, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung, Corrensstraße 3, Gatersleben, 06466, Germany
- C. P. James Chen
  - School of Animal Sciences, Virginia Tech, Blacksburg, VA, 24061, USA
- Hawlader A. Al-Mamun
  - School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
- Monica F. Danilevicz
  - School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
- Shriprabha R. Upadhyaya
  - School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
- David Edwards
  - School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
- Natalia de Leon
  - Department of Plant and Agroecosystem Sciences, University of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
|
13
|
Wang H, Chen M, Wei X, Xia R, Pei D, Huang X, Han B. Computational tools for plant genomics and breeding. Sci China Life Sci 2024; 67:1579-1590. [PMID: 38676814; DOI: 10.1007/s11427-024-2578-6] [Received: 02/05/2024] [Accepted: 03/25/2024] [Indexed: 04/29/2024]
Abstract
Plant genomics and crop breeding are at the intersection of biotechnology and information technology. Driven by a combination of high-throughput sequencing, molecular biology and data science, great advances have been made in omics technologies at every step along the central dogma, especially in genome assembly, genome annotation, epigenomic profiling, and transcriptome profiling. These advances have in turn transformed three directions of development. The first is genetic dissection of complex traits in crops, along with genomic prediction and selection. The second is comparative genomics and evolution, which open up new opportunities to depict the evolutionary constraints of biological sequences for deleterious variant discovery. The third is the development of deep learning approaches for the rational design of biological sequences, especially proteins, for synthetic biology. All three directions serve as the foundation for a new era of crop breeding where agronomic traits are enhanced by genome design.
Affiliation(s)
- Hai Wang
  - State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193, China
  - Sanya Institute of China Agricultural University, Sanya, 572025, China
  - Hainan Yazhou Bay Seed Laboratory, Sanya, 572025, China
- Mengjiao Chen
  - State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of the State Forestry and Grassland Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, 100091, China
- Xin Wei
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, 200234, China
- Rui Xia
  - College of Horticulture, South China Agricultural University, Guangzhou, 510640, China
- Dong Pei
  - State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of the State Forestry and Grassland Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, 100091, China
- Xuehui Huang
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, 200234, China
- Bin Han
  - National Center for Gene Research, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200233, China
|
14
|
Schreiber M, Jayakodi M, Stein N, Mascher M. Plant pangenomes for crop improvement, biodiversity and evolution. Nat Rev Genet 2024; 25:563-577. [PMID: 38378816; PMCID: PMC7616794; DOI: 10.1038/s41576-024-00691-4] [Accepted: 12/14/2023] [Indexed: 02/22/2024]
Abstract
Plant genome sequences catalogue genes and the genetic elements that regulate their expression. Such inventories further research aims as diverse as mapping the molecular basis of trait diversity in domesticated plants or inquiries into the origin of evolutionary innovations in flowering plants millions of years ago. The transformative technological progress of DNA sequencing in the past two decades has enabled researchers to sequence ever more genomes with greater ease. Pangenomes - complete sequences of multiple individuals of a species or higher taxonomic unit - have now entered the geneticists' toolkit. The genomes of crop plants and their wild relatives are being studied with translational applications in breeding in mind. But pangenomes are applicable also in ecological and evolutionary studies, as they help classify and monitor biodiversity across the tree of life, deepen our understanding of how plant species diverged and show how plants adapt to changing environments or new selection pressures exerted by human beings.
Affiliation(s)
- Mona Schreiber
  - Department of Biology, University of Marburg, Marburg, Germany
- Murukarthick Jayakodi
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Nils Stein
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
  - Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Martin Mascher
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
|
15
|
Conover JL, Grover CE, Sharbrough J, Sloan DB, Peterson DG, Wendel JF. Little evidence for homoeologous gene conversion and homoeologous exchange events in Gossypium allopolyploids. Am J Bot 2024; 111:e16386. [PMID: 39107998; DOI: 10.1002/ajb2.16386] [Received: 05/27/2024] [Revised: 07/01/2024] [Accepted: 07/01/2024] [Indexed: 08/24/2024]
Abstract
PREMISE A complicating factor in analyzing allopolyploid genomes is the possibility of physical interactions between homoeologous chromosomes during meiosis, resulting in either crossover (homoeologous exchanges) or non-crossover products (homoeologous gene conversion). Homoeologous gene conversion was first described in cotton by comparing SNP patterns in sequences from two diploid progenitors with those from the allopolyploid subgenomes. These analyses, however, did not explicitly consider other evolutionary scenarios that may give rise to SNP patterns similar to those of homoeologous gene conversion, creating uncertainties about the reality of the inferred gene conversion events. METHODS Here, we use an expanded phylogenetic sampling of high-quality genome assemblies from seven allopolyploid Gossypium species (all derived from the same polyploidy event), four diploid species (two closely related to each subgenome), and a diploid outgroup to derive a robust method for identifying potential genomic regions of gene conversion and homoeologous exchange. RESULTS We found little evidence for homoeologous gene conversion in allopolyploid cottons, and only two of the 40 best-supported events were shared by more than one species. We did, however, reveal a single, shared homoeologous exchange event at one end of chromosome 1, which occurred shortly after allopolyploidization but prior to divergence of the descendant species. CONCLUSIONS Overall, our analyses demonstrated that homoeologous gene conversion and homoeologous exchanges are uncommon in Gossypium, affecting between zero and 24 genes per subgenome (0.0-0.065%) across the seven species. More generally, we highlighted the potential problems of using simple four-taxon tests to investigate patterns of homoeologous gene conversion in established allopolyploids.
Affiliation(s)
- Justin L Conover
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, 50010, IA, USA
- Ecology and Evolutionary Biology Department, University of Arizona, Tucson, 85718, AZ, USA
- Molecular and Cellular Biology Department, University of Arizona, Tucson, 85718, AZ, USA
| | - Corrinne E Grover
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, 50010, IA, USA
| | - Joel Sharbrough
- Biology Department, New Mexico Institute of Mining and Technology, Socorro, 87801, NM, USA
| | - Daniel B Sloan
- Biology Department, Colorado State University, Fort Collins, 80521, CO, USA
| | - Daniel G Peterson
- Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, Mississippi State, 39762, MS, USA
| | - Jonathan F Wendel
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, 50010, IA, USA
| |
|
16
|
Fernandes IK, Vieira CC, Dias KOG, Fernandes SB. Using machine learning to combine genetic and environmental data for maize grain yield predictions across multi-environment trials. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2024; 137:189. [PMID: 39044035 PMCID: PMC11266441 DOI: 10.1007/s00122-024-04687-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Accepted: 06/29/2024] [Indexed: 07/25/2024]
Abstract
KEY MESSAGE: Incorporating feature-engineered environmental data into machine learning-based genomic prediction models is an efficient approach to indirectly model genotype-by-environment interactions. Complementing phenotypic traits and molecular markers with high-dimensional data such as climate and soil information is becoming a common practice in breeding programs. This study explored new ways to combine non-genetic information in genomic prediction models using machine learning. Using the multi-environment trial data from the Genomes To Fields initiative, different models to predict maize grain yield were fitted using various inputs: genetic, environmental, or a combination of both, either in an additive (genetic-and-environmental; G+E) or a multiplicative (genotype-by-environment interaction; GEI) manner. When including environmental data, the mean prediction accuracy of machine learning genomic prediction models increased by up to 7% over the well-established Factor Analytic Multiplicative Mixed Model among the three cross-validation scenarios evaluated. Moreover, using the G+E model was more advantageous than the GEI model given the superior, or at least comparable, prediction accuracy, the lower usage of computational memory and time, and the flexibility of accounting for interactions by construction. Our results illustrate the flexibility provided by the ML framework, particularly with feature engineering. We show that the feature engineering stage offers a viable option for envirotyping and generates valuable information for machine learning-based genomic prediction models. Furthermore, we verified that genotype-by-environment interactions may be considered using tree-based approaches without explicitly including interactions in the model. These findings support the growing interest in merging high-dimensional genotypic and environmental data into predictive modeling.
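The difference between the additive (G+E) and multiplicative (GEI) inputs can be illustrated with a minimal sketch. It assumes one common construction: G+E concatenates a hybrid's marker vector with the environment's covariate vector, while GEI forms all pairwise marker-by-covariate products. The abstract does not specify the exact encoding, so the function names and toy values below are hypothetical.

```python
def additive_features(markers, covariates):
    """G+E: concatenate genetic and environmental features (p + q columns)."""
    return list(markers) + list(covariates)


def multiplicative_features(markers, covariates):
    """GEI: every marker-by-covariate product (p * q columns)."""
    return [m * c for m in markers for c in covariates]


# Hypothetical toy inputs: three marker dosages, two environmental covariates.
markers = [0, 1, 2]
covariates = [0.5, 2.0]

g_plus_e = additive_features(markers, covariates)
gei = multiplicative_features(markers, covariates)

print(len(g_plus_e), g_plus_e)  # 5 columns
print(len(gei), gei)            # 6 columns
```

With p markers and q covariates, the additive input has p + q columns and the multiplicative input has p × q, which is consistent with the lower memory and time reported for G+E once p and q are large.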
Affiliation(s)
- Igor K Fernandes
- Department of Crop, Soil, and Environmental Sciences, Center for Agricultural Data Analytics, University of Arkansas, Fayetteville, AR, USA
| | - Caio C Vieira
- Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, USA
| | - Kaio O G Dias
- Department of General Biology, Federal University of Viçosa, Viçosa, Brazil
| | - Samuel B Fernandes
- Department of Crop, Soil, and Environmental Sciences, Center for Agricultural Data Analytics, University of Arkansas, Fayetteville, AR, USA.
| |
|
17
|
Song B, Buckler ES, Stitzer MC. New whole-genome alignment tools are needed for tapping into plant diversity. Trends in Plant Science 2024; 29:355-369. [PMID: 37749022 DOI: 10.1016/j.tplants.2023.08.013] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 04/24/2023] [Revised: 07/19/2023] [Accepted: 08/23/2023] [Indexed: 09/27/2023]
Abstract
Genome alignment is one of the most foundational methods for genome sequence studies. With rapid advances in sequencing and assembly technologies, newly assembled genomes challenge alignment tools to cope with increased complexity and scale. Plant genome alignment is technologically challenging because of frequent whole-genome duplications (WGDs) as well as chromosome rearrangements and fractionation, high nucleotide diversity, widespread structural variation, and high transposable element (TE) activity causing large proportions of repeat elements. We summarize classical pairwise and multiple genome alignment (MGA) methods, and highlight techniques that are widely used or are being developed by the plant research community. We also outline the remaining challenges for precise genome alignment and the interpretation of alignment results in plants.
Affiliation(s)
- Baoxing Song
- National Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agriculture Sciences in Weifang, Weifang, Shandong 261325, China; Key Laboratory of Maize Biology and Genetic Breeding in Arid Area of Northwest Region of the Ministry of Agriculture, College of Agronomy, Northwest A&F University, Yangling, Shaanxi 712100, China.
| | - Edward S Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA; Section of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853, USA; Agricultural Research Service, United States Department of Agriculture, Ithaca, NY 14853, USA
| | - Michelle C Stitzer
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA; Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA.
| |
|
18
|
Wang H, Bernardo A, St Amand P, Bai G, Bowden RL, Guttieri MJ, Jordan KW. Skim exome capture genotyping in wheat. The Plant Genome 2023; 16:e20381. [PMID: 37604795 DOI: 10.1002/tpg2.20381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 07/12/2023] [Accepted: 07/29/2023] [Indexed: 08/23/2023]
Abstract
Next-generation sequencing (NGS) technology advancements continue to reduce the cost of high-throughput genome-wide genotyping for breeding and genetics research. Skim sequencing, which surveys the entire genome at low coverage, has become feasible for quantitative trait locus (QTL) mapping and genomic selection in various crops. However, the genome complexity of allopolyploid crops such as wheat (Triticum aestivum L.) still poses a significant challenge for genome-wide genotyping. Targeted sequencing of the protein-coding regions (i.e., exome) reduces sequencing costs compared to whole genome re-sequencing and can be used for marker discovery and genotyping. We developed a method called skim exome capture (SEC) that combines the strengths of these existing technologies and produces targeted genotyping data while decreasing the cost on a per-sample basis compared to traditional exome capture. Specifically, we fragmented genomic DNA using a tagmentation approach, then enriched those fragments for the low-copy genic portion of the genome using commercial wheat exome baits and multiplexed the sequencing at different levels to achieve desired coverage. We demonstrated that for a library of 48 samples, ∼7-8× target coverage was sufficient for high-quality variant detection. For higher multiplexing levels of 528 and 1056 samples per library, we achieved an average coverage of 0.76× and 0.32×, respectively. Combining these lower coverage SEC sequencing data with genotype imputation using a customized wheat practical haplotype graph database that we developed, we identified hundreds of thousands of high-quality genic variants across the genome. The SEC method can be used for high-resolution QTL mapping, genome-wide association studies, genomic selection, and other downstream applications.
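The trade-off between multiplexing level and per-sample coverage can be checked with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each library receives the same total sequencing output, so per-sample coverage scales inversely with the number of samples; the abstract does not state this. The reference value of 7.5× is the midpoint of the reported ∼7-8× range, and the function name is hypothetical.

```python
def expected_coverage(ref_coverage, ref_samples, samples):
    """Scale per-sample coverage inversely with the multiplexing level.

    Assumes a fixed total sequencing output per library (an illustrative
    simplification, not a statement of the authors' protocol).
    """
    return ref_coverage * ref_samples / samples


# Reference point from the abstract: 48 samples at roughly 7-8x (midpoint 7.5x).
for n_samples in (528, 1056):
    print(n_samples, round(expected_coverage(7.5, 48, n_samples), 2))
```

Under this simplification the expected values are roughly 0.68× and 0.34×, in the same range as the reported averages of 0.76× and 0.32×.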
Affiliation(s)
- Hongliang Wang
- USDA-ARS, Hard Winter Wheat Genetics Research Unit, Center for Grain and Animal Health Research, Manhattan, Kansas, USA
| | - Amy Bernardo
- USDA-ARS, Hard Winter Wheat Genetics Research Unit, Center for Grain and Animal Health Research, Manhattan, Kansas, USA
| | - Paul St Amand
- USDA-ARS, Hard Winter Wheat Genetics Research Unit, Center for Grain and Animal Health Research, Manhattan, Kansas, USA
| | - Guihua Bai
- USDA-ARS, Hard Winter Wheat Genetics Research Unit, Center for Grain and Animal Health Research, Manhattan, Kansas, USA
| | - Robert L Bowden
- USDA-ARS, Hard Winter Wheat Genetics Research Unit, Center for Grain and Animal Health Research, Manhattan, Kansas, USA
| | - Mary J Guttieri
- USDA-ARS, Hard Winter Wheat Genetics Research Unit, Center for Grain and Animal Health Research, Manhattan, Kansas, USA
| | - Katherine W Jordan
- USDA-ARS, Hard Winter Wheat Genetics Research Unit, Center for Grain and Animal Health Research, Manhattan, Kansas, USA
| |
|
19
|
Lopez-Cruz M, Aguate FM, Washburn JD, de Leon N, Kaeppler SM, Lima DC, Tan R, Thompson A, De La Bretonne LW, de Los Campos G. Leveraging data from the Genomes-to-Fields Initiative to investigate genotype-by-environment interactions in maize in North America. Nat Commun 2023; 14:6904. [PMID: 37903778 PMCID: PMC10616096 DOI: 10.1038/s41467-023-42687-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/08/2023] [Accepted: 10/18/2023] [Indexed: 11/01/2023] [Open Access]
Abstract
Genotype-by-environment (G×E) interactions can significantly affect crop performance and stability. Investigating G×E requires extensive data sets with diverse cultivars tested over multiple locations and years. The Genomes-to-Fields (G2F) Initiative has tested maize hybrids in more than 130 year-locations in North America since 2014. Here, we curate and expand this data set by generating environmental covariates (using a crop model) for each of the trials. The resulting data set includes DNA genotypes and environmental data linked to more than 70,000 phenotypic records of grain yield and flowering traits for more than 4000 hybrids. We show how this valuable data set can serve as a benchmark in agricultural modeling and prediction, paving the way for countless G×E investigations in maize. We use multivariate analyses to characterize the data set's genetic and environmental structure, study the association of key environmental factors with traits, and provide benchmarks using genomic prediction models.
Affiliation(s)
- Marco Lopez-Cruz
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA.
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA.
| | - Fernando M Aguate
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
| | - Jacob D Washburn
- United States Department of Agriculture, Agricultural Research Service, University of Missouri, Columbia, MO, 65211, USA
| | - Natalia de Leon
- Department of Agronomy, University of Wisconsin, Madison, WI, 53706, USA
| | - Shawn M Kaeppler
- Department of Agronomy, University of Wisconsin, Madison, WI, 53706, USA
- Wisconsin Crop Innovation Center, University of Wisconsin, Middleton, WI, 53562, USA
| | | | - Ruijuan Tan
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, 48824, USA
| | - Addie Thompson
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, 48824, USA
- Plant Resilience Institute, Michigan State University, East Lansing, MI, 48824, USA
| | | | - Gustavo de Los Campos
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA.
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA.
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, 48824, USA.
| |
|
20
|
Aylward AJ, Petrus S, Mamerto A, Hartwick NT, Michael TP. PanKmer: k-mer-based and reference-free pangenome analysis. Bioinformatics 2023; 39:btad621. [PMID: 37846049 PMCID: PMC10603592 DOI: 10.1093/bioinformatics/btad621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 08/29/2023] [Accepted: 10/13/2023] [Indexed: 10/18/2023] [Open Access]
Abstract
SUMMARY: Pangenomes are replacing single reference genomes as the definitive representation of DNA sequence within a species or clade. Pangenome analysis predominantly leverages graph-based methods that require computationally intensive multiple genome alignments, do not scale to highly complex eukaryotic genomes, limit their scope to identifying structural variants (SVs), or incur bias by relying on a reference genome. Here, we present PanKmer, a toolkit designed for reference-free analysis of pangenome datasets consisting of dozens to thousands of individual genomes. PanKmer decomposes a set of input genomes into a table of observed k-mers and their presence-absence values in each genome. These are stored in an efficient k-mer index data format that encodes SNPs, INDELs, and SVs. It also includes functions for downstream analysis of the k-mer index, such as calculating sequence similarity statistics between individuals at whole-genome or local scales. For example, k-mers can be "anchored" in any individual genome to quantify sequence variability or conservation at a specific locus. This facilitates workflows with various biological applications, e.g. identifying cases of hybridization between plant species. PanKmer provides researchers with a valuable and convenient means to explore the full scope of genetic variation in a population, without reference bias. AVAILABILITY AND IMPLEMENTATION: PanKmer is implemented as a Python package with components written in Rust, released under a BSD license. The source code is available from the Python Package Index (PyPI) at https://pypi.org/project/pankmer/ as well as GitLab at https://gitlab.com/salk-tm/pankmer. Full documentation is available at https://salk-tm.gitlab.io/pankmer/.
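The idea of a k-mer presence-absence table can be sketched in a few lines of plain Python. This is an illustrative toy only: it does not use or reproduce the PanKmer API or its index format, it considers the forward strand only, and the function names and sequences are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-length substring of seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_presence_table(genomes, k):
    """Map each observed k-mer to the set of genome names that contain it."""
    table = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq.upper(), k):
            table[kmer].add(name)
    return table


def jaccard(table, a, b):
    """Jaccard similarity between the k-mer sets of genomes a and b."""
    shared = sum(1 for names in table.values() if a in names and b in names)
    union = sum(1 for names in table.values() if a in names or b in names)
    return shared / union if union else 0.0


# Two hypothetical toy "genomes" that differ by a single substitution.
genomes = {"g1": "ACGTACGTGA", "g2": "ACGTACCTGA"}
table = build_presence_table(genomes, k=4)
similarity = jaccard(table, "g1", "g2")
print(similarity)
```

In this toy case the single substitution changes every 4-mer that overlaps it, which is how a presence-absence table can register a SNP without any alignment to a reference.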
Affiliation(s)
- Anthony J Aylward
- The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States
| | - Semar Petrus
- The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States
| | - Allen Mamerto
- The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States
| | - Nolan T Hartwick
- The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States
| | - Todd P Michael
- The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States
| |
|
21
|
Lima DC, Aviles AC, Alpers RT, McFarland BA, Kaeppler S, Ertl D, Romay MC, Gage JL, Holland J, Beissinger T, Bohn M, Buckler E, Edwards J, Flint-Garcia S, Hirsch CN, Hood E, Hooker DC, Knoll JE, Kolkman JM, Liu S, McKay J, Minyo R, Moreta DE, Murray SC, Nelson R, Schnable JC, Sekhon RS, Singh MP, Thomison P, Thompson A, Tuinstra M, Wallace J, Washburn JD, Weldekidan T, Wisser RJ, Xu W, de Leon N. 2018-2019 field seasons of the Maize Genomes to Fields (G2F) G x E project. BMC Genom Data 2023; 24:29. [PMID: 37231352 DOI: 10.1186/s12863-023-01129-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/12/2022] [Accepted: 05/16/2023] [Indexed: 05/27/2023] [Open Access]
Abstract
OBJECTIVES: This report provides information about the public release of the 2018-2019 Maize G x E project datasets of the Genomes to Fields (G2F) Initiative. G2F is an umbrella initiative that evaluates maize hybrids and inbred lines across multiple environments and makes available phenotypic, genotypic, environmental, and metadata information. The initiative recognizes the need to characterize and deploy public sources of genetic diversity to face the challenges for more sustainable agriculture in the context of variable environmental conditions. DATA DESCRIPTION: Datasets include phenotypic, climatic, and soil measurements, metadata information, and inbred genotypic information for each combination of location and year. Collaborators in the G2F initiative collected data for each location and year; members of the group responsible for coordination and data processing combined all the collected information and removed obviously erroneous data. The collaborators received the data before the DOI release to verify and declare that the data generated in their own locations were accurate. ReadMe and description files are available for each dataset. Previous years of evaluation are already publicly available, with common hybrids present to connect across all locations and years evaluated since this project's inception.
Affiliation(s)
| | | | | | - Bridget A McFarland
- Panama-USA Commission for the Eradication and Prevention of Screwworm (COPEG), USDA-APHIS-IS, Pacora, Panama
| | - Shawn Kaeppler
- Department of Agronomy, University of WI - Madison, Madison, WI, 53706, USA
| | - David Ertl
- Iowa Corn Promotion Board, Johnston, IA, 50131, USA
| | - Maria Cinta Romay
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
| | - Joseph L Gage
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, 27695, USA
| | - James Holland
- USDA-ARS Plant Science Research Unit, Raleigh, NC, 27606, USA
| | - Timothy Beissinger
- Department of Crop Science, University of Göttingen Center for Integrated Breeding Research, Carl-Sprengel-Weg 1, 37075, Göttingen, Germany
| | - Martin Bohn
- University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| | | | - Jode Edwards
- USDA ARS CICGRU, 716 Farmhouse Ln, Ames, IA, 50011-1051, USA
| | | | - Candice N Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, St Paul, MN, 55108, USA
| | - Elizabeth Hood
- College of Agriculture, Arkansas Biosciences Institute, Arkansas State University, Jonesboro, AR, 72404, USA
| | - David C Hooker
- Department of Plant Agriculture, University of Guelph, Ridgetown Campus, Ridgetown, ON, Canada
| | - Joseph E Knoll
- USDA-ARS Crop Genetics and Breeding Research Unit, Tifton, GA, 31793, USA
| | - Judith M Kolkman
- School of Integrative Plant Science, Cornell University, Ithaca, NY, 14850, USA
| | - Sanzhen Liu
- Department of Plant Pathology, Kansas State University, Manhattan, KS, 66503, USA
| | - John McKay
- Department of Agricultural Biology, Colorado State University, Fort Collins, CO, 80523, USA
| | - Richard Minyo
- Department of Horticulture and Crop Science, Ohio State University College of Food, Agricultural, and Environmental Sciences, Wooster, OH, 44691, USA
| | - Danilo E Moreta
- School of Integrative Plant Science, Cornell University, Ithaca, NY, 14850, USA
| | - Seth C Murray
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX, 77843, USA
| | | | - James C Schnable
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
| | - Rajandeep S Sekhon
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, 29634, USA
| | - Maninder P Singh
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, 48824, USA
| | | | - Addie Thompson
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, 48824, USA
| | - Mitchell Tuinstra
- Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA
| | - Jason Wallace
- Department of Crop & Soil Sciences, University of Georgia, Athens, GA, 30602, USA
| | | | | | - Randall J Wisser
- Department of Plant and Soil Sciences, University of Delaware, Newark, DE, 19716, USA
- Laboratoire d'Ecophysiologie Des Plantes Sous Stress Environmentaux, INRAE, 34060, Montpellier, France
| | - Wenwei Xu
- Texas A&M University, College Station, TX, 77843, USA
| | - Natalia de Leon
- Department of Agronomy, University of WI - Madison, Madison, WI, 53706, USA
| |
|
22
|
Brown PJ. Haplotyping interspecific hybrids by dual alignment to both parental genomes. The Plant Genome 2023:e20324. [PMID: 37057366 DOI: 10.1002/tpg2.20324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 01/30/2023] [Accepted: 02/21/2023] [Indexed: 06/19/2023]
Abstract
Sequencing-based genotyping of heterozygous diploids requires sufficient depth to accurately call heterozygous genotypes. In interspecific hybrids, alignment of reads to both parental genomes simultaneously can generate haploid data, potentially eliminating the problem of heterozygosity. Two populations of interspecific hybrid rootstocks of walnut (Juglans) and pistachio (Pistacia) were genotyped using alignment to the maternal genome, paternal genome, and dual alignment to both genomes simultaneously. Downsampling was used to examine concordance of imputed genotype calls as a function of sequencing depth. Dual alignment resulted in datasets essentially free of heterozygous genotypes, simplifying the identification and removal of cross-contaminated samples. Concordance between full and downsampled genotype calls was always highest after dual alignment. Nearly all single nucleotide polymorphisms (SNPs) in dual alignment datasets were shared with the corresponding single-parent datasets, but 60%-90% of single-parent SNPs were private to that dataset. Private SNPs in single-parent datasets had higher rates of heterozygosity, lower levels of concordance, and were enriched in fixed differences between parental genomes ("homeo-SNPs") compared to shared SNPs in the same dataset. In multi-parental walnut hybrids, the paternal-aligned dataset was ineffective at resolving population structure in the maternal parent. Overall, the dual alignment strategy effectively produced phased, haploid data, increasing data quality and reducing cost.
Affiliation(s)
- Pat J Brown
- Department of Plant Sciences, University of California, Davis, California, USA
| |
|