1. Arlt C, van Inghelandt D, Li J, Stich B. Assessment of genomic prediction capabilities of transcriptome data in a barley multi-parent RIL population. RESEARCH SQUARE 2025:rs.3.rs-6145169. [PMID: 40235487 PMCID: PMC11998762 DOI: 10.21203/rs.3.rs-6145169/v1] [Indexed: 04/17/2025]
Abstract
The field of genomic selection (GS) is advancing rapidly on many fronts, including the use of multi-omics datasets, with the goals of increasing prediction ability (PA) and of becoming an integral part of an increasing number of breeding programs that help ensure future food security. In this study, we used RNA sequencing (RNA-Seq) data to perform genomic prediction (GP) on three related barley RIL populations, investigating the potential of increasing PA by combining genomic and transcriptomic datasets, adding whole genome sequencing (WGS) SNP data, functional parameter filtering, and empirical quality filtering. Our RNA-Seq data were generated cost-efficiently using small-footprint plant cultivation, high-throughput RNA extraction, and library preparation miniaturization. We also examined sequencing depth as an additional cost-saving measure. We used five-fold cross-validation to evaluate the PA of the gene expression dataset, the RNA-Seq SNP dataset, and the consensus SNP dataset between the RNA-Seq and parental WGS data, obtaining PAs between 0.73 and 0.78. The consensus SNP dataset performed best: for five of eight traits it predicted significantly better than a 50K SNP array, which served as a benchmark. The advantage of the consensus SNP dataset was most prominent in the inter-population predictions, in which the training and validation sets originated from different RIL sub-populations. We therefore showed not only that RNA-Seq data alone can predict various complex traits in barley RILs, but also that performance can be further increased by WGS data, whose public availability will steadily increase.
2. Gibbs PM, Paril JF, Fournier-Level A. Trait genetic architecture and population structure determine model selection for genomic prediction in natural Arabidopsis thaliana populations. Genetics 2025; 229:iyaf003. [PMID: 39814947 DOI: 10.1093/genetics/iyaf003] [Received: 11/26/2024] [Accepted: 12/29/2024] [Indexed: 01/18/2025]
Abstract
Genomic prediction applies to any agro- or ecologically relevant trait, with distinct ontologies and genetic architectures. Selecting the most appropriate model for the distribution of genetic effects and their associated allele frequencies in the training population is crucial. Linear regression models are often preferred for genomic prediction. However, linear models may not suit all genetic architectures and training populations. Machine learning approaches have been proposed to improve genomic prediction owing to their capacity to capture complex biology, including epistasis. However, the applicability of different genomic prediction models, including non-linear, non-parametric approaches, has not been rigorously assessed across a wide variety of plant traits in natural outbreeding populations. This study evaluates the sensitivity of genomic prediction to trait ontology and the impact of population structure on model selection and prediction accuracy. Examining 36 quantitative traits in 1,000+ natural genotypes of the model plant Arabidopsis thaliana, we assessed the performance of penalized regression, random forest, and multilayer perceptron at producing genomic predictions. Regression models were generally the most accurate, except for biochemical traits, where random forest performed best. We link this result to the genetic architecture of each trait, notably that biochemical traits have a simpler genetic architecture than macroscopic traits. Moreover, complex macroscopic traits, particularly those related to flowering time and yield, were strongly correlated with population structure, while molecular traits were better predicted by fewer, independent markers. This study highlights the relevance of machine learning approaches for simple molecular traits and underscores the need to consider ancestral population history when designing training samples.
Affiliation(s)
- Patrick M Gibbs
- School of BioSciences, The University of Melbourne, Royal Parade, Parkville, VIC 3010, Australia
- Jefferson F Paril
- School of BioSciences, The University of Melbourne, Royal Parade, Parkville, VIC 3010, Australia
- Agriculture Victoria Research, Department of Energy, Environment and Climate Action, La Trobe University, AgriBio, 5 Ring Road, Bundoora, VIC 3083, Australia
- Alexandre Fournier-Level
- School of BioSciences, The University of Melbourne, Royal Parade, Parkville, VIC 3010, Australia
3. Zhu W, Li W, Zhang H, Li L. Big data and artificial intelligence-aided crop breeding: Progress and prospects. JOURNAL OF INTEGRATIVE PLANT BIOLOGY 2025; 67:722-739. [PMID: 39467106 PMCID: PMC11951406 DOI: 10.1111/jipb.13791] [Received: 06/30/2024] [Revised: 08/25/2024] [Accepted: 09/10/2024] [Indexed: 10/30/2024]
Abstract
The past decade has witnessed rapid developments in gene discovery, biological big data (BBD), artificial intelligence (AI)-aided technologies, and molecular breeding. These advancements are expected to accelerate crop breeding under the pressure of increasing demands for food. Here, we first summarize current breeding methods and discuss the need for new ways to support breeding efforts. Then, we review how to combine BBD and AI technologies for genetic dissection, exploring functional genes, predicting regulatory elements and functional domains, and phenotypic prediction. Finally, we propose the concept of intelligent precision design breeding (IPDB) driven by AI technology and offer ideas about how to implement IPDB. We hope that IPDB will improve the predictability, efficiency, and cost-effectiveness of crop breeding compared with current technologies. As an example of IPDB, we explore the possibilities offered by CropGPT, which combines biological techniques, bioinformatics, and the breeding art of breeders, and presents an open, shareable, and cooperative breeding system. IPDB provides integrated services and communication platforms for biologists, bioinformatics experts, germplasm resource specialists, breeders, dealers, and farmers, and should be well suited for future breeding.
Affiliation(s)
- Wanchao Zhu
- Key Laboratory of Biology and Genetic Improvement of Maize in Arid Area of Northwest Region, College of Agronomy, Northwest A&F University, Yangling 712100, China
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Weifu Li
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Engineering Research Center of Intelligent Technology for Agriculture, Ministry of Education, Wuhan 430070, China
- Hongwei Zhang
- State Key Laboratory of Crop Gene Resources and Breeding, National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Lin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
4. Klimkowski Arango N, Morgante F. Comparing statistical learning methods for complex trait prediction from gene expression. PLoS One 2025; 20:e0317516. [PMID: 39932918 PMCID: PMC11813155 DOI: 10.1371/journal.pone.0317516] [Received: 06/13/2024] [Accepted: 12/30/2024] [Indexed: 02/13/2025]
Abstract
Accurate prediction of complex traits is an important task in quantitative genetics. Genotypes have been used for trait prediction with a variety of methods, such as mixed models, Bayesian methods, penalized regression methods, dimension reduction methods, and machine learning methods. Recent studies have shown that gene expression levels can produce higher prediction accuracy than genotypes. However, only a few prediction methods were tested in these studies. Thus, a comprehensive assessment of methods is needed to fully evaluate the potential of gene expression as a predictor of complex trait phenotypes. Here, we used data from the Drosophila Genetic Reference Panel (DGRP) to compare the ability of several existing statistical learning methods to predict starvation resistance and startle response from gene expression in the two sexes separately. The methods considered differ in their assumptions about the distribution of gene effects (ranging from models that assume every gene affects the trait to sparser models) and in their ability to capture gene-gene interactions. We also used functional annotation (i.e., Gene Ontology (GO)) as a source of biological information to inform prediction models. The results show that differences in prediction accuracy exist. For example, methods performing variable selection achieved higher prediction accuracy for starvation resistance in females, while they generally had lower accuracy for startle response in both sexes. Incorporating GO annotations further improved prediction accuracy for a few GO terms of biological significance. Biological significance extended to the genes underlying highly predictive GO terms. Notably, the Insulin-like Receptor (InR) was prevalent across methods and sexes for starvation resistance. For startle response, crumbs (crb) and imaginal disc growth factor 2 (Idgf2) were found for females and males, respectively.
Our results confirmed the potential of transcriptomic prediction and highlighted the importance of selecting appropriate methods and strategies in order to achieve accurate predictions.
Affiliation(s)
- Noah Klimkowski Arango
- Center for Human Genetics, Clemson University, Greenwood, SC, United States of America
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, United States of America
- Fabio Morgante
- Center for Human Genetics, Clemson University, Greenwood, SC, United States of America
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, United States of America
5. Torres-Rodríguez JV, Li D, Schnable JC. Evolving best practices for transcriptome-wide association studies accelerate discovery of gene-phenotype links. CURRENT OPINION IN PLANT BIOLOGY 2025; 83:102670. [PMID: 39626491 DOI: 10.1016/j.pbi.2024.102670] [Received: 07/01/2024] [Revised: 10/20/2024] [Accepted: 11/01/2024] [Indexed: 02/01/2025]
Abstract
Transcriptome-wide association studies (TWAS) complement genome-wide association studies (GWAS) by using gene expression data to link specific genes to phenotypes. This review examines 37 TWAS studies across eight plant species, evaluating the impact of methodological choices on outcomes using maize and soybean datasets. Large sample sizes and synchronized sample collection for gene expression measurement appear to significantly increase power for discovering gene-phenotype linkages, while matching tissue, stage, and environment may matter much less than previously believed, making it feasible to reuse large and well-collected expression datasets across multiple studies. The development of statistical approaches and computational tools specifically optimized for plant TWAS data will ultimately be needed, but further potential remains to adapt advances developed in GWAS to TWAS contexts.
Affiliation(s)
- J Vladimir Torres-Rodríguez
- Quantitative Life Sciences Initiative, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- Delin Li
- Xianghu Laboratory, Hangzhou, 311231, China
- James C Schnable
- Quantitative Life Sciences Initiative, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
6. Palande S, Arsenault J, Basurto‐Lozada P, Bleich A, Brown BNI, Buysse SF, Connors NA, Das Adhikari S, Dobson KC, Guerra‐Castillo FX, Guerrero‐Carrillo MF, Harlow S, Herrera‐Orozco H, Hightower AT, Izquierdo P, Jacobs M, Johnson NA, Leuenberger W, Lopez‐Hernandez A, Luckie‐Duque A, Martínez‐Avila C, Mendoza‐Galindo EJ, Plancarte DC, Schuster JM, Shomer H, Sitar SC, Steensma AK, Thomson JE, Villaseñor‐Amador D, Waterman R, Webster BM, Whyte M, Zorilla‐Azcué S, Montgomery BL, Husbands AY, Krishnan A, Percival S, Munch E, VanBuren R, Chitwood DH, Rougon‐Cardoso A. Expression-based machine learning models for predicting plant tissue identity. APPLICATIONS IN PLANT SCIENCES 2025; 13:e11621. [PMID: 39906497 PMCID: PMC11788907 DOI: 10.1002/aps3.11621] [Received: 02/23/2024] [Revised: 06/25/2024] [Accepted: 06/28/2024] [Indexed: 02/06/2025]
Abstract
Premise: The selection of Arabidopsis as a model organism played a pivotal role in advancing genomic science. The competing frameworks to select an agricultural- or ecological-based model species were rejected in favor of building knowledge in a species that would facilitate genome-enabled research. Methods: Here, we examine the ability of models based on Arabidopsis gene expression data to predict tissue identity in other flowering plants. Comparing different machine learning algorithms, models trained and tested on Arabidopsis data achieved near-perfect precision and recall values, whereas when tissue identity is predicted across the flowering plants using models trained on Arabidopsis data, precision values range from 0.69 to 0.74 and recall from 0.54 to 0.64. Results: The identity of belowground tissue can be predicted more accurately than that of other tissue types, and the ability to predict tissue identity is not correlated with phylogenetic distance from Arabidopsis. The k-nearest neighbors algorithm is the most successful, suggesting that gene expression signatures, rather than marker genes, are more valuable for creating models for tissue and cell type prediction in plants. Discussion: Our data-driven results highlight that the assertion that knowledge from Arabidopsis is translatable to other plants is not always true. Considering the current landscape of abundant sequencing data, we should reevaluate the scientific emphasis on Arabidopsis and prioritize plant diversity.
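Because k-nearest neighbors performed best here, a minimal sketch of k-NN tissue classification on expression profiles may help. It assumes each profile is a vector over one shared, ordered gene set (for cross-species prediction, orthologs mapped to a common gene set) and uses Euclidean distance with a majority vote. The distance choice, the toy values, and the name knn_predict are hypothetical, not the authors' configuration.

```python
import math
from collections import Counter


def knn_predict(train_profiles, train_labels, query, k=3):
    """Predict a tissue label for `query` by majority vote among the k
    training expression profiles closest to it (Euclidean distance)."""
    dists = sorted(
        (math.dist(profile, query), label)
        for profile, label in zip(train_profiles, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

The whole profile drives each prediction, which matches the abstract's point that expression signatures, rather than individual marker genes, carry the tissue information.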
Affiliation(s)
- Sourabh Palande
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, USA
- Jeremy Arsenault
- Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, USA
- Patricia Basurto‐Lozada
- Laboratorio Internacional de Investigación sobre el Genoma Humano (LIIGH), Universidad Nacional Autónoma de México, Juriquilla, Querétaro, Mexico
- Andrew Bleich
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
- Sophia F. Buysse
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, Michigan, USA
- Kellogg Biological Station, Michigan State University, East Lansing, Michigan, USA
- Noelle A. Connors
- Department of Horticulture, Michigan State University, East Lansing, Michigan, USA
- Sikta Das Adhikari
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, USA
- Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, USA
- Kara C. Dobson
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, Michigan, USA
- Department of Integrative Biology, Michigan State University, East Lansing, Michigan, USA
- Francisco Xavier Guerra‐Castillo
- Unidad de Investigación Médica en Inmunología e Infectología, Instituto Mexicano del Seguro Social, Ciudad de México, Mexico
- Programa de Posgrado en Ciencias Biológicas, Facultad de Medicina, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
- Maria F. Guerrero‐Carrillo
- Laboratory of Agrigenomic Sciences, Escuela Nacional de Estudios Superiores Unidad León, Universidad Nacional Autónoma de México, León, Guanajuato, Mexico
- Sophia Harlow
- Department of Horticulture, Michigan State University, East Lansing, Michigan, USA
- Héctor Herrera‐Orozco
- Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
- Laboratorio de Ecología Evolutiva y Conservación de Anfibios y Reptiles, Facultad de Estudios Superiores Iztacala, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
- Asia T. Hightower
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, Michigan, USA
- Paulo Izquierdo
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
- MacKenzie Jacobs
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
- Molecular Plant Sciences Program, Michigan State University, East Lansing, Michigan, USA
- Nicholas A. Johnson
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, Michigan, USA
- Genetics and Genome Sciences, Michigan State University, East Lansing, Michigan, USA
- Wendy Leuenberger
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, Michigan, USA
- Department of Integrative Biology, Michigan State University, East Lansing, Michigan, USA
- Alessandro Lopez‐Hernandez
- Laboratorio Internacional de Investigación sobre el Genoma Humano (LIIGH), Universidad Nacional Autónoma de México, Juriquilla, Querétaro, Mexico
- Computational Population Genetics Group, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
- Alicia Luckie‐Duque
- Laboratory of Agrigenomic Sciences, Escuela Nacional de Estudios Superiores Unidad León, Universidad Nacional Autónoma de México, León, Guanajuato, Mexico
- Camila Martínez‐Avila
- Colección Nacional de Aves, Posgrado en Ciencias Biológicas, Instituto de Biología, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
- Eddy J. Mendoza‐Galindo
- Laboratory of Agrigenomic Sciences, Escuela Nacional de Estudios Superiores Unidad León, Universidad Nacional Autónoma de México, León, Guanajuato, Mexico
- David Cruz Plancarte
- Departamento de Botánica, Posgrado en Ciencias Biológicas, Instituto de Biología, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
- Jenny M. Schuster
- Molecular Plant Sciences Program, Michigan State University, East Lansing, Michigan, USA
- Cell and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
- Harry Shomer
- Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, USA
- Sidney C. Sitar
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
- Plant Breeding, Genetics, and Biotechnology, Michigan State University, East Lansing, Michigan, USA
- Crop and Soil Sciences Program, Michigan State University, East Lansing, Michigan, USA
- Anne K. Steensma
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
- Molecular Plant Sciences Program, Michigan State University, East Lansing, Michigan, USA
- MSU‐DOE Plant Research Laboratory, Michigan State University, East Lansing, Michigan, USA
- Joanne Elise Thomson
- Molecular Plant Sciences Program, Michigan State University, East Lansing, Michigan, USA
- Cell and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
- Damián Villaseñor‐Amador
- Programa de Posgrado en Ciencias Biológicas, Facultad de Ciencias, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
- Robin Waterman
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, Michigan, USA
- Kellogg Biological Station, Michigan State University, East Lansing, Michigan, USA
- Brandon M. Webster
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
- Madison Whyte
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
- Sofía Zorilla‐Azcué
- Programa de Posgrado en Ciencias Biológicas, Escuela Nacional de Estudios Superiores (ENES) Unidad Morelia, Universidad Nacional Autónoma de México, Morelia, Michoacán, Mexico
- Aman Y. Husbands
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Arjun Krishnan
- Department of Biomedical Informatics, Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Sarah Percival
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, USA
- Elizabeth Munch
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, USA
- Department of Mathematics, Michigan State University, East Lansing, Michigan, USA
- Robert VanBuren
- Department of Horticulture, Michigan State University, East Lansing, Michigan, USA
- Plant Resilience Institute, Michigan State University, East Lansing, Michigan, USA
- Daniel H. Chitwood
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, USA
- Department of Horticulture, Michigan State University, East Lansing, Michigan, USA
- Alejandra Rougon‐Cardoso
- Laboratory of Agrigenomic Sciences, Escuela Nacional de Estudios Superiores Unidad León, Universidad Nacional Autónoma de México, León, Guanajuato, Mexico
- Plantecc National Laboratory, ENES‐León, León, Guanajuato, Mexico
7. Wu B, Xiong H, Zhuo L, Xiao Y, Yan J, Yang W. Multi-view BLUP: a promising solution for post-omics data integrative prediction. J Genet Genomics 2024:S1673-8527(24)00332-1. [PMID: 39645028 DOI: 10.1016/j.jgg.2024.11.017] [Received: 07/23/2024] [Revised: 11/27/2024] [Accepted: 11/27/2024] [Indexed: 12/09/2024]
Abstract
Phenotypic prediction is a promising strategy for accelerating plant breeding. Data from multiple sources (called multi-view data) can provide complementary information that characterizes a biological object from various aspects. To integrate multi-view information into phenotypic prediction, this paper proposes a multi-view best linear unbiased prediction (MVBLUP) method. To measure the importance of the individual data views, a differential evolution algorithm with an early stopping mechanism is used to obtain a multi-view kinship matrix, which is then incorporated into the BLUP model for phenotypic prediction. To further illustrate the characteristics of MVBLUP, we performed empirical experiments on four multi-view datasets from different crops. Compared with the single-view method, MVBLUP improved prediction accuracy by 0.038 to 0.201 on average. The results demonstrate that MVBLUP is an effective integrative prediction method for multi-view data.
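A minimal sketch of the two steps named above, assuming the multi-view kinship matrix is a weighted sum of per-view kinship matrices (the abstract does not give its exact form) and that BLUP is done in kernel form, ĝ_test = K_test,train (K_train + λI)⁻¹ (y_train − μ). The differential-evolution search for the weights is omitted: weights are supplied directly, λ (residual-to-genetic variance ratio) is treated as known, and μ is the plain training mean rather than a GLS estimate. All function names are hypothetical.

```python
def combine_kinships(kinships, weights):
    """Weighted sum of per-view kinship matrices (weights rescaled to sum to 1)."""
    total = sum(weights)
    n = len(kinships[0])
    return [[sum(w / total * K[i][j] for K, w in zip(kinships, weights))
             for j in range(n)]
            for i in range(n)]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def kernel_blup(K, y, train, test, lam):
    """BLUP of test phenotypes from training phenotypes under kinship K.

    y holds one entry per individual; only entries listed in `train` are read.
    lam is the residual-to-genetic variance ratio, assumed known here.
    The intercept is the plain training mean (a simplification of GLS).
    """
    mu = sum(y[i] for i in train) / len(train)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(A, [y[i] - mu for i in train])
    return [mu + sum(K[t][j] * a for j, a in zip(train, alpha)) for t in test]
```

For example, kernel_blup(combine_kinships([K_snp, K_expr], [0.7, 0.3]), y, train, test, 1.0) would predict the test individuals from a hypothetical genotype view and expression view; in MVBLUP the weights would come from the optimization rather than being fixed by hand.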
Affiliation(s)
- Bingjie Wu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Huijuan Xiong
- College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Lin Zhuo
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Yingjie Xiao
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
- Jianbing Yan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
- Wenyu Yang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
8. Xu F, Che Z, Qiao J, Han P, Miao N, Dai X, Fu Y, Li X, Zhu M. Integrating Gene Expression Data into Single-Step Method (ssBLUP) Improves Genomic Prediction Accuracy for Complex Traits of Duroc × Erhualian F2 Pig Population. Curr Issues Mol Biol 2024; 46:13713-13724. [PMID: 39727947 PMCID: PMC11727526 DOI: 10.3390/cimb46120819] [Received: 10/31/2024] [Revised: 11/15/2024] [Accepted: 11/20/2024] [Indexed: 12/28/2024]
Abstract
The development of multi-omics has increased the likelihood of further improving genomic prediction (GP) of complex traits. Gene expression data can directly reflect the genotype effect, and thus, they are widely used for GP. Generally, the gene expression data are integrated into multiple random effect models as independent data layers or used to replace genotype data for genomic prediction. In this study, we integrated pedigree, genotype, and gene expression data into the single-step method and investigated the effects of this integration on prediction accuracy. The integrated single-step method improved the genomic prediction accuracy of more than 90% of the 54 traits in the Duroc × Erhualian F2 pig population dataset. On average, the prediction accuracy of the single-step method integrating gene expression data was 20.6% and 11.8% higher than that of the pedigree-based best linear unbiased prediction (ABLUP) and genome-based best linear unbiased prediction (GBLUP) when the weighting factor (w) was set as 0, and it was 5.3% higher than that of the single-step best linear unbiased prediction (ssBLUP) under different w values. Overall, the analyses confirmed that the integration of gene expression data into a single-step method could effectively improve genomic prediction accuracy. Our findings enrich the application of multi-omics data to genomic prediction and provide a valuable reference for integrating multi-omics data into the genomic prediction model.
Affiliation(s)
- Fangjun Xu
- Key Lab of Agricultural Animal Genetics, Breeding, and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Zhaoxuan Che
- Key Lab of Agricultural Animal Genetics, Breeding, and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Jiakun Qiao
- Key Lab of Agricultural Animal Genetics, Breeding, and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Pingping Han
- Key Lab of Agricultural Animal Genetics, Breeding, and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Na Miao
- Key Lab of Agricultural Animal Genetics, Breeding, and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Xiangyu Dai
- Key Lab of Agricultural Animal Genetics, Breeding, and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Yuhua Fu
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan 430070, China
- Xinyun Li
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan 430070, China
- Mengjin Zhu
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan 430070, China
- College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
9. Fan Y, Waldmann P. Tabular deep learning: a comparative study applied to multi-task genome-wide prediction. BMC Bioinformatics 2024; 25:322. [PMID: 39367318 PMCID: PMC11452967 DOI: 10.1186/s12859-024-05940-1] [Received: 06/06/2024] [Accepted: 09/19/2024] [Indexed: 10/06/2024]
Abstract
PURPOSE: More accurate prediction of phenotypic traits can increase the success of genomic selection in both plant and animal breeding studies and provide more reliable disease risk prediction in humans. Traditional approaches typically use regression models that assume linear relationships between the genetic markers and the traits of interest. Non-linear models have been considered as an alternative tool for modeling genomic interactions (i.e., non-additive effects) and other subtle non-linear patterns between markers and phenotype. Deep learning has become a state-of-the-art non-linear prediction method for sound, image, and language data. However, genomic data are better represented in a tabular format. The existing literature on deep learning for tabular data proposes a wide range of novel architectures and reports successful results on various datasets, but tabular deep learning applications in genome-wide prediction (GWP) are still rare. In this work, we review the main families of recent deep learning architectures for tabular data and apply them to multi-trait regression and multi-class classification for GWP on real gene datasets. METHODS: The study involves an extensive overview of recent deep learning architectures for tabular data learning: NODE, TabNet, TabR, TabTransformer, FT-Transformer, AutoInt, GANDALF, SAINT, and LassoNet. These architectures are applied to multi-trait GWP. Comprehensive benchmarks of various tabular deep learning methods are conducted to identify best practices and determine their effectiveness compared with traditional methods. RESULTS: Extensive experimental results on several genomic datasets (three for multi-trait regression and two for multi-class classification) highlight LassoNet as a standout performer, surpassing both the other tabular deep learning models and the highly efficient tree-based LightGBM method in both prediction accuracy and computing efficiency.
CONCLUSION: Through a series of evaluations on real-world genomic datasets, the study identifies LassoNet as a standout performer, surpassing decision tree methods like LightGBM and other tabular deep learning architectures in both predictive accuracy and computing efficiency. Moreover, the inherent variable selection property of LassoNet provides a systematic way to find important genetic markers that contribute to phenotype expression.
Affiliation(s)
- Yuhua Fan
- Research Unit of Mathematical Sciences, University of Oulu, P.O. Box 8000, 90014, Univesity of Oulu, Finland
- Patrik Waldmann
- Research Unit of Mathematical Sciences, University of Oulu, P.O. Box 8000, 90014, University of Oulu, Finland.

10
Tanaka R, Kawai T, Kawakatsu T, Tanaka N, Shenton M, Yabe S, Uga Y. Transcriptome-based prediction for polygenic traits in rice using different gene subsets. BMC Genomics 2024; 25:915. [PMID: 39354337 PMCID: PMC11443665 DOI: 10.1186/s12864-024-10803-3] [Received: 06/24/2024] [Accepted: 09/13/2024] [Indexed: 10/03/2024]
Abstract
BACKGROUND Transcriptome-based prediction of complex phenotypes is a relatively new statistical method that links genetic variation to phenotypic variation. Selecting large-effect genes based on a priori biological knowledge is beneficial for predicting oligogenic traits. However, such a simple gene selection method is not applicable to polygenic traits because causal genes or large-effect loci are often unknown. Here, we used several gene-level features and tested whether we could select a gene subset that yields better predictive ability for a polygenic trait than using all genes. RESULTS We used the phenotypic values of shoot and root traits, together with transcript abundances in the leaves and roots of 57 rice accessions, to evaluate the predictive abilities of transcriptome-based prediction models. Leaf transcripts predicted shoot phenotypes, such as plant height, more accurately than root transcripts did. Conversely, root transcripts predicted root phenotypes, such as crown root length, more accurately than leaf transcripts did. Furthermore, we used three features to select the gene subsets used to train the prediction model: (1) tissue specificity of the transcripts, (2) ontology annotations, and (3) co-expression modules. Models trained on a gene subset often had lower predictive abilities than the model trained on all genes, but some gene subsets improved predictive ability. For example, using genes expressed in roots but not in leaves improved the predictive ability for crown root diameter by more than 10% (R² = 0.59 using all genes; R² = 0.66 using 1,554 root-specifically expressed genes). Similarly, genes annotated with "gibberellic acid sensitivity" gave higher predictive ability for root dry weight than all genes did.
CONCLUSIONS Our results highlight both the possibility and difficulty of selecting an appropriate gene subset to predict polygenic traits from transcript abundance, given the current biological knowledge and information. Further integration of multiple sources of information, as well as improvements in gene characterization, may enable the selection of an optimal gene set for the prediction of polygenic phenotypes.
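The "more than 10%" figure for crown root diameter can be checked against the two R² values reported in the abstract. This assumes the improvement is read as a relative gain; the absolute gain is 0.07.

```latex
\[
\frac{R^2_{\text{subset}} - R^2_{\text{all}}}{R^2_{\text{all}}}
  = \frac{0.66 - 0.59}{0.59}
  = \frac{0.07}{0.59}
  \approx 0.119 \quad (\approx 11.9\%)
\]
```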
Affiliation(s)
- Ryokei Tanaka
- Institute of Crop Sciences, National Agriculture & Food Research Organization, Tsukuba, Ibaraki, 305-8518, Japan.
- Tsubasa Kawai
- Institute of Crop Sciences, National Agriculture & Food Research Organization, Tsukuba, Ibaraki, 305-8518, Japan
- Taiji Kawakatsu
- Institute of Agrobiological Sciences, National Agriculture & Food Research Organization, Tsukuba, Ibaraki, 305-8604, Japan
- Nobuhiro Tanaka
- Institute of Crop Sciences, National Agriculture & Food Research Organization, Tsukuba, Ibaraki, 305-8518, Japan
- Matthew Shenton
- Institute of Crop Sciences, National Agriculture & Food Research Organization, Tsukuba, Ibaraki, 305-8518, Japan
- Shiori Yabe
- Institute of Crop Sciences, National Agriculture & Food Research Organization, Tsukuba, Ibaraki, 305-8518, Japan
- Yusaku Uga
- Institute of Crop Sciences, National Agriculture & Food Research Organization, Tsukuba, Ibaraki, 305-8518, Japan

11
Khalilisamani N, Li Z, Pettolino FA, Moncuquet P, Reverter A, MacMillan CP. Leveraging transcriptomics-based approaches to enhance genomic prediction: integrating SNPs and gene networks for cotton fibre quality improvement. Frontiers in Plant Science 2024; 15:1420837. [PMID: 39372856 PMCID: PMC11450228 DOI: 10.3389/fpls.2024.1420837] [Received: 04/21/2024] [Accepted: 08/19/2024] [Indexed: 10/08/2024]
Abstract
Cultivated cotton plants are the world's largest source of natural fibre, and yield and quality are key traits for this renewable and biodegradable commodity. The Gossypium hirsutum cotton genome contains ~80K protein-coding genes, making precision breeding of complex traits a challenge. This study tested approaches to improving the genomic prediction (GP) accuracy of valuable cotton fibre traits to help accelerate precision breeding. On a biology-informed basis, a novel approach was tested for improving GP of key cotton fibre traits. It used transcriptomics at key time points during fibre development, namely fibre cells undergoing primary, transition, and secondary wall development. Three approaches were tested, each weighting SNPs according to a different source of prior information in the GP models:

- SNPs in differentially expressed (DE) genes overall;
- SNPs in target DE gene lists informed by gene annotation;
- SNPs in clusters of a gene co-expression network (GCN) created with partial correlation and information theory (PCIT), a novel approach.

The GCN clusters were nucleated with known genes for fibre biomechanics, i.e., fasciclin-like arabinogalactan proteins, and cluster size effects were evaluated. The most promising improvements in GP accuracy were achieved with GCN clusters: 4.6% for fibre elongation and 4.7% for fibre strength. Cluster sizes of two and three neighbours proved most effective. Furthermore, with the GCN cluster approach the improvements in GP were due to only a small number of SNPs, on the order of 30 per trait. Non-trait-specific biological time points and genes were found to have neutral effects, or even to reduce GP accuracy for certain traits. Because the GCN clusters were generated from known genes for fibre biomechanics, additional candidate genes were identified for fibre elongation and strength. These results demonstrate that GCN clusters make a specific and unique contribution to improving the GP of cotton fibre traits.
The findings also indicate that there is room for incorporating biology-based GCNs into GP models of genomic selection pipelines for cotton breeding to help improve precision breeding of target traits. The PCIT-GCN cluster approach may also hold potential application in other crops and trees for enhancing breeding of complex traits.
Affiliation(s)
- Nima Khalilisamani
- Cotton Biotechnology, Agriculture and Food, CSIRO, Canberra, ACT, Australia
- Zitong Li
- Cotton Biotechnology, Agriculture and Food, CSIRO, Canberra, ACT, Australia
- Philippe Moncuquet
- Cotton Biotechnology, Agriculture and Food, CSIRO, Canberra, ACT, Australia
- Antonio Reverter
- Livestock and Aquatic Genomics, Agriculture and Food, CSIRO, St Lucia, QLD, Australia

12
Jiang Y, Guo S, Wang D, Tu L, Liu P, Guo X, Wang A, Zhu Y, Lu X, Chen Z, Wu X. Integrated GWAS, linkage, and transcriptome analysis to identify genetic loci and candidate genes for photoperiod sensitivity in maize. Frontiers in Plant Science 2024; 15:1441288. [PMID: 39351024 PMCID: PMC11440433 DOI: 10.3389/fpls.2024.1441288] [Received: 05/30/2024] [Accepted: 07/12/2024] [Indexed: 10/04/2024]
Abstract
Introduction Maize photosensitivity and the control of flowering are not only important for reproduction but also play pivotal roles in domestication and environmental adaptation. This is especially true for strategies that use tropical maize in high-latitude regions. Methods In this study, we used a linkage mapping population and an inbred association panel, phenotyped for the photoperiod sensitivity index (PSI) under different environments. We also performed transcriptome analysis of T32 and QR273 under long-day and short-day conditions. Results The PSIs of days to tasseling (DTT), days to pollen shedding (DTP), and days to silking (DTS) indicated efficacious interactions with photoperiod sensitivity for maize latitude adaptation. A total of 48 quantitative trait loci (QTLs) and 252 quantitative trait nucleotides (QTNs) were detected using the linkage population and the inbred association panel. Thirteen candidate genes were identified by combining the genome-wide association study (GWAS) approach, linkage analysis, and transcriptome analysis. Of these, five critical candidate genes, MYB163, bif1, burp8, CADR3, and Zm00001d050238, were significantly associated with photoperiod sensitivity. Discussion These results provide a richer theoretical basis for revealing the genetic basis of photoperiod sensitivity. They will help in understanding the genetic changes during domestication and improvement and contribute to reducing the barriers to the use of tropical germplasm.
Affiliation(s)
- Yulin Jiang
- Institute of Upland Food Crops, Guizhou Academy of Agricultural Sciences, Guiyang, Guizhou, China
- Ministry of Agriculture and Rural Affairs Key Laboratory of Crop Genetic Resources and Germplasm Innovation in Karst Region, Guiyang, China
- Shuang Guo
- Institute of Upland Food Crops, Guizhou Academy of Agricultural Sciences, Guiyang, Guizhou, China
- College of Agriculture, Guizhou University, Guiyang, China
- Dong Wang
- Institute of Upland Food Crops, Guizhou Academy of Agricultural Sciences, Guiyang, Guizhou, China
- College of Agriculture, Guizhou University, Guiyang, China
- Liang Tu
- Institute of Upland Food Crops, Guizhou Academy of Agricultural Sciences, Guiyang, Guizhou, China
- Pengfei Liu
- Institute of Upland Food Crops, Guizhou Academy of Agricultural Sciences, Guiyang, Guizhou, China
- Xiangyang Guo
- Institute of Upland Food Crops, Guizhou Academy of Agricultural Sciences, Guiyang, Guizhou, China
- Angui Wang
- Institute of Upland Food Crops, Guizhou Academy of Agricultural Sciences, Guiyang, Guizhou, China
- Yunfang Zhu
- Institute of Upland Food Crops, Guizhou Academy of Agricultural Sciences, Guiyang, Guizhou, China
- Xuefeng Lu
- Institute of Upland Food Crops, Guizhou Academy of Agricultural Sciences, Guiyang, Guizhou, China
- Ministry of Agriculture and Rural Affairs Key Laboratory of Crop Genetic Resources and Germplasm Innovation in Karst Region, Guiyang, China
- Zehui Chen
- Institute of Upland Food Crops, Guizhou Academy of Agricultural Sciences, Guiyang, Guizhou, China
- Xun Wu
- Institute of Upland Food Crops, Guizhou Academy of Agricultural Sciences, Guiyang, Guizhou, China
- Ministry of Agriculture and Rural Affairs Key Laboratory of Crop Genetic Resources and Germplasm Innovation in Karst Region, Guiyang, China

13
Yanarella CF, Fattel L, Lawrence-Dill CJ. Genome-wide association studies from spoken phenotypic descriptions: a proof of concept from maize field studies. G3 (Bethesda, Md.) 2024; 14:jkae161. [PMID: 39099140 PMCID: PMC11373645 DOI: 10.1093/g3journal/jkae161] [Received: 05/12/2024] [Accepted: 06/23/2024] [Indexed: 08/06/2024]
Abstract
We present a novel approach to genome-wide association studies (GWAS) by leveraging unstructured, spoken phenotypic descriptions to identify genomic regions associated with maize traits. Utilizing the Wisconsin Diversity panel, we collected spoken descriptions of Zea mays ssp. mays traits, converting these qualitative observations into quantitative data amenable to GWAS analysis. First, we determined that visually striking phenotypes could be detected from unstructured spoken phenotypic descriptions. Next, we developed two methods to process the same descriptions to derive the trait plant height, a well-characterized phenotypic feature in maize: (1) a semantic similarity metric that assigns a score based on the resemblance of each observation to the concept of 'tallness' and (2) a manual scoring system that categorizes and assigns values to phrases related to plant height. Our analysis successfully corroborated known genomic associations and uncovered novel candidate genes potentially linked to plant height. Some of these genes are associated with gene ontology terms that suggest a plausible involvement in determining plant stature. This proof-of-concept demonstrates the viability of spoken phenotypic descriptions in GWAS and introduces a scalable framework for incorporating unstructured language data into genetic association studies. This methodology has the potential not only to enrich the phenotypic data used in GWAS and to enhance the discovery of genetic elements linked to complex traits but also to expand the repertoire of phenotype data collection methods available for use in the field environment.
Affiliation(s)
- Colleen F Yanarella
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Leila Fattel
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- Interdepartmental Genetics and Genomics Program, Iowa State University, Ames, IA 50011, USA
- Carolyn J Lawrence-Dill
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Interdepartmental Genetics and Genomics Program, Iowa State University, Ames, IA 50011, USA
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- College of Agriculture and Life Sciences, Iowa State University, Ames, IA 50011, USA

14
Wang P, Lehti-Shiu MD, Lotreck S, Segura Abá K, Krysan PJ, Shiu SH. Prediction of plant complex traits via integration of multi-omics data. Nat Commun 2024; 15:6856. [PMID: 39127735 PMCID: PMC11316822 DOI: 10.1038/s41467-024-50701-6] [Received: 11/19/2023] [Accepted: 07/18/2024] [Indexed: 08/12/2024]
Abstract
The formation of complex traits is the consequence of genotype and activities at multiple molecular levels. However, connecting genotypes and these activities to complex traits remains challenging. Here, we investigate whether integrating genomic, transcriptomic, and methylomic data can improve prediction for six Arabidopsis traits. We find that transcriptome- and methylome-based models have performances comparable to those of genome-based models. However, models built for flowering time using different omics data identify different benchmark genes. Nine additional genes identified as important for flowering time from our models are experimentally validated as regulating flowering. Gene contributions to flowering time prediction are accession-dependent and distinct genes contribute to trait prediction in different genotypes. Models integrating multi-omics data perform best and reveal known and additional gene interactions, extending knowledge about existing regulatory networks underlying flowering time determination. These results demonstrate the feasibility of revealing molecular mechanisms underlying complex traits through multi-omics data integration.
Affiliation(s)
- Peipei Wang
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI, USA.
- Kunpeng Institute of Modern Agriculture at Foshan, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China.
- Department of Plant Biology, Michigan State University, East Lansing, MI, USA.
- Serena Lotreck
- Department of Plant Biology, Michigan State University, East Lansing, MI, USA
- Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI, USA
- Kenia Segura Abá
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI, USA
- Genetics and Genome Sciences Program, Michigan State University, East Lansing, MI, USA
- Patrick J Krysan
- Department of Plant and Agroecosystem Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Shin-Han Shiu
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI, USA.
- Department of Plant Biology, Michigan State University, East Lansing, MI, USA.
- Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI, USA.
- Genetics and Genome Sciences Program, Michigan State University, East Lansing, MI, USA.

15
Arango NK, Morgante F. Comparing statistical learning methods for complex trait prediction from gene expression. bioRxiv: The Preprint Server for Biology 2024:2024.06.01.596951. [PMID: 38895364 PMCID: PMC11185554 DOI: 10.1101/2024.06.01.596951] [Indexed: 06/21/2024]
Abstract
Accurate prediction of complex traits is an important task in quantitative genetics that has become increasingly relevant for personalized medicine. Genotypes have traditionally been used for trait prediction using a variety of methods such as mixed models, Bayesian methods, penalized regressions, dimension reductions, and machine learning methods. Recent studies have shown that gene expression levels can produce higher prediction accuracy than genotypes. However, only a few prediction methods were used in these studies. Thus, a comprehensive assessment of methods is needed to fully evaluate the potential of gene expression as a predictor of complex trait phenotypes. Here, we used data from the Drosophila Genetic Reference Panel (DGRP) to compare the ability of several existing statistical learning methods to predict starvation resistance from gene expression in the two sexes separately. The methods considered differ in assumptions about the distribution of gene effect sizes - ranging from models that assume that every gene affects the trait to more sparse models - and their ability to capture gene-gene interactions. We also used functional annotation (i.e., Gene Ontology (GO)) as an external source of biological information to inform prediction models. The results show that differences in prediction accuracy between methods exist, although they are generally not large. Methods performing variable selection gave higher accuracy in females while methods assuming a more polygenic architecture performed better in males. Incorporating GO annotations further improved prediction accuracy for a few GO terms of biological significance. Biological significance extended to the genes underlying highly predictive GO terms with different genes emerging between sexes. Notably, the Insulin-like Receptor (InR) was prevalent across methods and sexes. 
Our results confirmed the potential of transcriptomic prediction and highlighted the importance of selecting appropriate methods and strategies in order to achieve accurate predictions.
Affiliation(s)
- Noah Klimkowski Arango
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
- Fabio Morgante
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA

16
Wu TY, Li YR, Chang KJ, Fang JC, Urano D, Liu MJ. Modeling alternative translation initiation sites in plants reveals evolutionarily conserved cis-regulatory codes in eukaryotes. Genome Res 2024; 34:272-285. [PMID: 38479836 PMCID: PMC10984385 DOI: 10.1101/gr.278100.123] [Received: 05/15/2023] [Accepted: 02/15/2024] [Indexed: 03/22/2024]
Abstract
mRNA translation relies on identifying translation initiation sites (TISs) in mRNAs. Alternative TISs are prevalent across plant transcriptomes, but the mechanisms for their recognition are unclear. Using ribosome profiling and machine learning, we developed models for predicting alternative TISs in the tomato (Solanum lycopersicum). Distinct feature sets were predictive of AUG and nonAUG TISs in 5' untranslated regions and coding sequences, including a novel CU-rich sequence that promoted plant TIS activity, a translational enhancer found across dicots and monocots, and humans and viruses. Our results elucidate the mechanistic and evolutionary basis of TIS recognition, whereby cis-regulatory RNA signatures affect start site selection. The TIS prediction model provides global estimates of TISs to discover neglected protein-coding genes across plant genomes. The prevalence of cis-regulatory signatures across plant species, humans, and viruses suggests their broad and critical roles in reprogramming the translational landscape.
Affiliation(s)
- Ting-Ying Wu
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei 11529, Taiwan
- Ya-Ru Li
- Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan 711, Taiwan
- Kai-Jyun Chang
- Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan 711, Taiwan
- Institute of Tropical Plant Sciences, National Cheng Kung University, Tainan 701, Taiwan
- Jhen-Cheng Fang
- Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan 711, Taiwan
- Daisuke Urano
- Temasek Life Sciences Laboratory, Singapore 117604, Singapore
- Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore
- Ming-Jung Liu
- Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan 711, Taiwan
- Institute of Tropical Plant Sciences, National Cheng Kung University, Tainan 701, Taiwan
- Agricultural Biotechnology Research Center, Academia Sinica, Taipei 115, Taiwan

17
Guo Z, Wang S, Zhang F, Xiang D, Yang J, Li D, Bai B, Dai M, Luo J, Xiong L. Common and specific genetic basis of metabolite-mediated drought responses in rice. Stress Biology 2024; 4:6. [PMID: 38253937 PMCID: PMC10803723 DOI: 10.1007/s44154-024-00150-4] [Received: 11/18/2023] [Accepted: 01/08/2024] [Indexed: 01/24/2024]
Abstract
Plants orchestrate drought responses at the metabolic level, but the genetic basis of these responses remains elusive in rice. In this study, 233 drought-responsive metabolites (DRMs) were quantified at the reproductive stage in a large rice population comprising 510 diverse accessions. Large metabolic variations in drought responses were detected, and little correlation of metabolite levels between drought and normal conditions was observed. Interestingly, most of these DRMs could predict drought resistance with high accuracy. A genome-wide association study revealed 2522 significant association signals for the 233 DRMs. Of these signals, 98% (2471/2522) co-localized with association loci for drought-related phenotypic traits in the same population or with linkage-mapped QTLs for drought resistance in other populations. In total, 10 candidate genes were efficiently identified for nine DRMs, seven of which harbored cis-eQTLs under drought conditions. Based on comparative GWAS of common DRMs in rice and maize, representing irrigated and upland crops, we identified three pairs of homologous genes associated with three DRMs in the two crops. Among these homologous genes, a transferase gene responsible for metabolic variation of N-feruloylputrescine was confirmed to confer enhanced drought resistance in rice. Our study provides not only the genetic architecture of metabolic responses to drought stress in rice but also metabolic data resources for revealing the common and specific metabolite-mediated drought responses in different crops.
Affiliation(s)
- Zilong Guo
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Shouchuang Wang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Sanya Nanfan Research Institute of Hainan University, Hainan Yazhou Bay Seed Laboratory, Sanya, 572025, China
- Feng Zhang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Denghao Xiang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Jun Yang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Sanya Nanfan Research Institute of Hainan University, Hainan Yazhou Bay Seed Laboratory, Sanya, 572025, China
- Dong Li
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Baowei Bai
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Mingqiu Dai
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Jie Luo
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China.
- Sanya Nanfan Research Institute of Hainan University, Hainan Yazhou Bay Seed Laboratory, Sanya, 572025, China.
- Lizhong Xiong
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China.

18
Martins FB, Aono AH, Moraes ADCL, Ferreira RCU, Vilela MDM, Pessoa-Filho M, Rodrigues-Motta M, Simeão RM, de Souza AP. Genome-wide family prediction unveils molecular mechanisms underlying the regulation of agronomic traits in Urochloa ruziziensis. Frontiers in Plant Science 2023; 14:1303417. [PMID: 38148869 PMCID: PMC10749977 DOI: 10.3389/fpls.2023.1303417] [Received: 09/28/2023] [Accepted: 11/15/2023] [Indexed: 12/28/2023]
Abstract
Tropical forage grasses, particularly those belonging to the Urochloa genus, play a crucial role in cattle production and serve as the main food source for animals in tropical and subtropical regions. The majority of these species are apomictic and tetraploid, highlighting the significance of U. ruziziensis, a sexual diploid species that can be tetraploidized for use in interspecific crosses with apomictic species. As a means to support breeding programs, our study investigates the feasibility of genome-wide family prediction in U. ruziziensis families to predict agronomic traits. Fifty half-sibling families were assessed for green matter yield, dry matter yield, regrowth capacity, leaf dry matter, and stem dry matter across different clippings established in contrasting seasons with varying available water capacity. Genotyping was performed using a genotyping-by-sequencing approach based on DNA samples from family pools. In addition to conventional genomic prediction methods, machine learning and feature selection algorithms were employed to reduce the necessary number of markers for prediction and enhance predictive accuracy across phenotypes. To explore the regulation of agronomic traits, our study evaluated the significance of selected markers for prediction using a tree-based approach, potentially linking these regions to quantitative trait loci (QTLs). In a multiomic approach, genes from the species transcriptome were mapped and correlated to those markers. A gene coexpression network was modeled with gene expression estimates from a diverse set of U. ruziziensis genotypes, enabling a comprehensive investigation of molecular mechanisms associated with these regions. The heritabilities of the evaluated traits ranged from 0.44 to 0.92. A total of 28,106 filtered SNPs were used to predict phenotypic measurements, achieving a mean predictive ability of 0.762. 
By employing feature selection techniques, we could reduce the dimensionality of SNP datasets, revealing potential genotype-phenotype associations. The functional annotation of genes near these markers revealed associations with auxin transport and biosynthesis of lignin, flavonol, and folic acid. Further exploration with the gene coexpression network uncovered associations with DNA metabolism, stress response, and circadian rhythm. These genes and regions represent important targets for expanding our understanding of the metabolic regulation of agronomic traits and offer valuable insights applicable to species breeding. Our work represents an innovative contribution to molecular breeding techniques for tropical forages, presenting a viable marker-assisted breeding approach and identifying target regions for future molecular studies on these agronomic traits.
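Predictive ability in genomic prediction studies is commonly reported as the Pearson correlation between predicted and observed phenotypes in held-out data, averaged over cross-validation folds. The abstract above does not spell out its exact definition, so what follows is only a minimal sketch under that common assumption. The round-robin fold assignment and the function names are illustrative choices, not the study's procedure.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fold_indices(n, k):
    """Assign sample i to fold i % k (round-robin, no shuffling)."""
    return [[i for i in range(n) if i % k == f] for f in range(k)]


def mean_predictive_ability(observed, predicted, k):
    """Average per-fold correlation between observed and predicted values.

    Assumes predicted[i] was produced out-of-fold, i.e. by a model that
    did not see sample i during training.
    """
    rs = []
    for idx in fold_indices(len(observed), k):
        obs = [observed[i] for i in idx]
        pred = [predicted[i] for i in idx]
        rs.append(pearson(obs, pred))
    return fmean(rs)


if __name__ == "__main__":
    # Hypothetical values: predictions perfectly linear in the observations.
    print(mean_predictive_ability([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12], k=2))
```

Averaging per-fold correlations is one of several conventions. Some studies instead pool all out-of-fold predictions before correlating, or average over repeated random partitions.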
Affiliation(s)
- Felipe Bitencourt Martins
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Alexandre Hild Aono
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Aline da Costa Lima Moraes
- Department of Plant Biology, Biology Institute, University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Marco Pessoa-Filho
- Embrapa Cerrados, Brazilian Agricultural Research Corporation, Brasília, Brazil
- Rosangela Maria Simeão
- Embrapa Gado de Corte, Brazilian Agricultural Research Corporation, Campo Grande, Mato Grosso, Brazil
- Anete Pereira de Souza
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Department of Plant Biology, Biology Institute, University of Campinas (UNICAMP), Campinas, São Paulo, Brazil

19
Tanaka R, Wu D, Li X, Tibbs-Cortes LE, Wood JC, Magallanes-Lundback M, Bornowski N, Hamilton JP, Vaillancourt B, Li X, Deason NT, Schoenbaum GR, Buell CR, DellaPenna D, Yu J, Gore MA. Leveraging prior biological knowledge improves prediction of tocochromanols in maize grain. The Plant Genome 2023; 16:e20276. [PMID: 36321716 DOI: 10.1002/tpg2.20276] [Received: 07/29/2022] [Accepted: 09/21/2022] [Indexed: 06/16/2023]
Abstract
With an essential role in human health, tocochromanols are mostly obtained by consuming seed oils; however, the vitamin E content of the most abundant tocochromanols in maize (Zea mays L.) grain is low. Several large-effect genes with cis-acting variants affecting messenger RNA (mRNA) expression are mostly responsible for tocochromanol variation in maize grain, with other relevant associated quantitative trait loci (QTL) yet to be fully resolved. Leveraging existing genomic and transcriptomic information for maize inbreds could improve prediction when selecting for higher vitamin E content. Here, we first evaluated a multikernel genomic best linear unbiased prediction (MK-GBLUP) approach for modeling known QTL in the prediction of nine tocochromanol grain phenotypes (12-21 QTL per trait) within and between two panels of 1,462 and 242 maize inbred lines. On average, MK-GBLUP models improved predictive abilities by 7.0-13.6% when compared with GBLUP. In a second approach with a subset of 545 lines from the larger panel, the highest average improvement in predictive ability relative to GBLUP was achieved with a multi-trait GBLUP model (15.4%) that had a tocochromanol phenotype and transcript abundances in developing grain for a few large-effect candidate causal genes (1-3 genes per trait) as multiple response variables. Taken together, our study illustrates the enhancement of prediction models when informed by existing biological knowledge pertaining to QTL and candidate causal genes.
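A multikernel GBLUP model fits separate genomic relationship matrices (kernels) for different marker sets, each with its own variance component. For example, one kernel can cover markers near known QTL and another the remaining genome-wide markers. The minimal sketch below only builds such kernels from 0/1/2 genotype codes. It uses VanRaden's first method, G = ZZ' / (2 * sum_j p_j (1 - p_j)) with Z = M - 2p. The QTL/background split and all function names are hypothetical. Fitting the mixed model itself (e.g., estimating variance components by REML) is not shown.

```python
def allele_freqs(M):
    """Frequency p_j of the counted allele for each marker (0/1/2 coding)."""
    n = len(M)
    return [sum(row[j] for row in M) / (2 * n) for j in range(len(M[0]))]


def vanraden_g(M, markers):
    """Genomic relationship matrix built from a subset of marker columns.

    G = Z Z' / (2 * sum_j p_j (1 - p_j)), with Z = M - 2p (VanRaden method 1).
    """
    p = allele_freqs(M)
    denom = 2 * sum(p[j] * (1 - p[j]) for j in markers)
    if denom == 0:
        raise ValueError("selected markers are all monomorphic")
    # Centre each selected marker by twice its allele frequency.
    z = [[row[j] - 2 * p[j] for j in markers] for row in M]
    n = len(M)
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / denom for k in range(n)]
        for i in range(n)
    ]


def multikernel(M, qtl_markers):
    """Return (QTL kernel, background kernel) for a hypothetical marker split."""
    qtl = set(qtl_markers)
    background = [j for j in range(len(M[0])) if j not in qtl]
    return vanraden_g(M, list(qtl_markers)), vanraden_g(M, background)


if __name__ == "__main__":
    # Hypothetical genotypes: 2 lines x 2 markers; marker 0 treated as "QTL".
    M = [[0, 2], [2, 0]]
    G_qtl, G_background = multikernel(M, [0])
    print(G_qtl, G_background)
```

Each kernel would then enter the model as a separate random genetic effect with covariance proportional to that kernel.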
Affiliation(s)
- Ryokei Tanaka
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Di Wu
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Xiaowei Li
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Joshua C Wood
  - Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Nolan Bornowski
  - Dep. of Plant Biology, Michigan State Univ., East Lansing, MI, 48824, USA
- John P Hamilton
  - Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Brieanne Vaillancourt
  - Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Xianran Li
  - USDA ARS, Wheat Health, Genetics, and Quality Research Unit, Pullman, WA, 99164, USA
- Nicholas T Deason
  - Dep. of Biochemistry and Molecular Biology, Michigan State Univ., East Lansing, MI, 48824, USA
- C Robin Buell
  - Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Dean DellaPenna
  - Dep. of Biochemistry and Molecular Biology, Michigan State Univ., East Lansing, MI, 48824, USA
- Jianming Yu
  - Dep. of Agronomy, Iowa State Univ., Ames, IA, 50011, USA
- Michael A Gore
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
20
Palande S, Kaste JAM, Roberts MD, Segura Abá K, Claucherty C, Dacon J, Doko R, Jayakody TB, Jeffery HR, Kelly N, Manousidaki A, Parks HM, Roggenkamp EM, Schumacher AM, Yang J, Percival S, Pardo J, Husbands AY, Krishnan A, Montgomery BL, Munch E, Thompson AM, Rougon-Cardoso A, Chitwood DH, VanBuren R. Topological data analysis reveals a core gene expression backbone that defines form and function across flowering plants. PLoS Biol 2023; 21:e3002397. [PMID: 38051702 PMCID: PMC10723737 DOI: 10.1371/journal.pbio.3002397] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/10/2023] [Revised: 12/15/2023] [Accepted: 10/20/2023] [Indexed: 12/07/2023]
Abstract
Since they emerged approximately 125 million years ago, flowering plants have evolved to dominate the terrestrial landscape and survive in the most inhospitable environments on earth. At their core, these adaptations have been shaped by changes in numerous, interconnected pathways and genes that collectively give rise to emergent biological phenomena. Linking gene expression to morphological outcomes remains a grand challenge in biology, and new approaches are needed to begin to address this gap. Here, we implemented topological data analysis (TDA) to summarize the high dimensionality and noisiness of gene expression data using lens functions that delineate plant tissue and stress responses. Using this framework, we created a topological representation of the shape of gene expression across plant evolution, development, and environment for the phylogenetically diverse flowering plants. The TDA-based Mapper graphs form a well-defined gradient of tissues from leaves to seeds, or from healthy to stressed samples, depending on the lens function. This suggests that there are distinct and conserved expression patterns across angiosperms that delineate different tissue types or responses to biotic and abiotic stresses. Genes that correlate with the tissue lens function are enriched in central processes such as photosynthetic, growth and development, housekeeping, or stress responses. Together, our results highlight the power of TDA for analyzing complex biological data and reveal a core expression backbone that defines plant form and function.
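The Mapper construction behind the TDA graphs described above can be sketched in a few steps: compute a lens value for each sample, cover the lens range with overlapping intervals, cluster the samples that fall in each interval, and connect clusters that share samples. The code below is a minimal illustration assuming Euclidean distance, single-linkage clustering with a fixed threshold `eps`, and an equal-width cover; the authors' lens functions, clustering method, and parameters are not reproduced here, and all function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def cover(lo: float, hi: float, n_intervals: int, overlap: float) -> List[Tuple[float, float]]:
    """Equal-length intervals over [lo, hi], each widened so neighbours overlap."""
    length = (hi - lo) / n_intervals
    pad = length * overlap / 2
    return [(lo + i * length - pad, lo + (i + 1) * length + pad)
            for i in range(n_intervals)]


def single_linkage(points: Sequence[Sequence[float]], members: List[int],
                   eps: float) -> List[Set[int]]:
    """Clusters = connected components of the graph linking points within eps."""
    parent = {i: i for i in members}

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(members, 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)
    groups: dict = {}
    for i in members:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def mapper(points: Sequence[Sequence[float]], lens: Sequence[float],
           n_intervals: int = 4, overlap: float = 0.3, eps: float = 1.0):
    """Return Mapper nodes (sets of sample indices) and edges (node pairs sharing samples)."""
    nodes: List[Set[int]] = []
    for a, b in cover(min(lens), max(lens), n_intervals, overlap):
        members = [i for i, v in enumerate(lens) if a <= v <= b]
        nodes.extend(single_linkage(points, members, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges
```

With samples spread evenly along a single lens gradient, the resulting graph is a path, which is the kind of gradient (for example, leaves to seeds) the abstract describes; real expression data would use many more dimensions and a biologically chosen lens.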
Affiliation(s)
- Sourabh Palande
  - Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, United States of America
- Joshua A. M. Kaste
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
- Miles D. Roberts
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
- Kenia Segura Abá
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
- Carly Claucherty
  - Department of Plant, Soil & Microbial Sciences, Michigan State University, East Lansing, Michigan, United States of America
- Jamell Dacon
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America
- Rei Doko
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America
- Thilani B. Jayakody
  - Department of Plant, Soil & Microbial Sciences, Michigan State University, East Lansing, Michigan, United States of America
- Hannah R. Jeffery
  - Department of Plant, Soil & Microbial Sciences, Michigan State University, East Lansing, Michigan, United States of America
- Nathan Kelly
  - Department of Horticulture, Michigan State University, East Lansing, Michigan, United States of America
- Andriana Manousidaki
  - Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, United States of America
- Hannah M. Parks
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
- Emily M. Roggenkamp
  - Department of Plant, Soil & Microbial Sciences, Michigan State University, East Lansing, Michigan, United States of America
- Ally M. Schumacher
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
- Jiaxin Yang
  - Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, United States of America
- Sarah Percival
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
- Jeremy Pardo
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
- Aman Y. Husbands
  - Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Arjun Krishnan
  - Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, United States of America
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
- Beronda L Montgomery
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
  - Department of Microbiology & Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
  - MSU-DOE Plant Research Laboratory, Michigan State University, East Lansing, Michigan, United States of America
- Elizabeth Munch
  - Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, United States of America
  - Department of Mathematics, Michigan State University, East Lansing, Michigan, United States of America
- Addie M. Thompson
  - Department of Plant, Soil & Microbial Sciences, Michigan State University, East Lansing, Michigan, United States of America
  - Plant Resilience Institute, Michigan State University, East Lansing, Michigan, United States of America
- Alejandra Rougon-Cardoso
  - Laboratory of Agrigenomic Sciences, Universidad Nacional Autónoma de México, ENES-León, León, Mexico
  - Laboratorio Nacional Plantecc, ENES-León, León, Mexico
- Daniel H. Chitwood
  - Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, United States of America
  - Department of Horticulture, Michigan State University, East Lansing, Michigan, United States of America
- Robert VanBuren
  - Department of Horticulture, Michigan State University, East Lansing, Michigan, United States of America
  - Plant Resilience Institute, Michigan State University, East Lansing, Michigan, United States of America
21
Bonnot T, Somayanda I, Jagadish SVK, Nagel DH. Time of day and genotype sensitivity adjust molecular responses to temperature stress in sorghum. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2023; 116:1081-1096. [PMID: 37715988 DOI: 10.1111/tpj.16467] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/26/2023] [Revised: 08/30/2023] [Accepted: 09/05/2023] [Indexed: 09/18/2023]
Abstract
Sorghum is one of the four major C4 crops that are considered to be tolerant to environmental extremes. Sorghum shows distinct growth responses to temperature stress depending on the sensitivity of the genetic background. About half of the transcripts in sorghum exhibit rhythmic diurnal expression, indicating substantial coordination with the environment. However, how molecular dynamics contribute to genotype-specific stress responses in the context of the time of day is not known. We examined whether temperature stress and the time of day affect gene expression dynamics in thermo-sensitive and thermo-tolerant sorghum genotypes. We found that the time of day strongly influences temperature stress responses, which can be explained by the rhythmic expression of most thermo-responsive genes. This effect is more pronounced in thermo-tolerant genotypes, suggesting stronger regulation of gene expression by the time of day and/or by the circadian clock. Genotypic differences were mostly observed in average gene expression levels, which may be responsible for the contrasting sensitivities to temperature stress of tolerant versus susceptible sorghum varieties. We also identified groups of genes altered by temperature stress in a time-of-day- and genotype-specific manner. These include transcriptional regulators and several members of the Ca2+-binding EF-hand protein family. We hypothesize that expression variation of these genes between genotypes, along with time-of-day-independent regulation, may contribute to genotype-specific fine-tuning of thermo-responsive pathways. These findings offer a new opportunity to selectively target specific genes in efforts to develop climate-resilient crops, based on how their responses to temperature stress vary with time of day and genotype.
Affiliation(s)
- Titouan Bonnot
  - Department of Botany and Plant Sciences, University of California, Riverside, Riverside, California, 92507, USA
- Impa Somayanda
  - Department of Plant and Soil Science, Texas Tech University, Lubbock, Texas, 79409-2122, USA
- S V Krishna Jagadish
  - Department of Plant and Soil Science, Texas Tech University, Lubbock, Texas, 79409-2122, USA
- Dawn H Nagel
  - Department of Botany and Plant Sciences, University of California, Riverside, Riverside, California, 92507, USA
22
Della Coletta R, Fernandes SB, Monnahan PJ, Mikel MA, Bohn MO, Lipka AE, Hirsch CN. Importance of genetic architecture in marker selection decisions for genomic prediction. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2023; 136:220. [PMID: 37819415 DOI: 10.1007/s00122-023-04469-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 09/25/2023] [Indexed: 10/13/2023]
Abstract
KEY MESSAGE: We demonstrate potential for improved multi-environment genomic prediction accuracy using structural variant markers. However, the degree of observed improvement is highly dependent on the genetic architecture of the trait. Breeders commonly use genetic markers to predict the performance of untested individuals as a way to improve the efficiency of breeding programs. These genomic prediction models have almost exclusively used single nucleotide polymorphisms (SNPs) as their source of genetic information, even though other types of markers exist, such as structural variants (SVs). Given that SVs are associated with environmental adaptation and not all of them are in linkage disequilibrium (LD) with SNPs, SVs have the potential to bring additional information to multi-environment prediction models that is not captured by SNPs alone. Here, we evaluated the effect of different marker types (SNPs and/or SVs) on prediction accuracy across a range of genetic architectures for simulated traits across multiple environments. Our results show that SVs can improve prediction accuracy, but the improvement is highly dependent on the genetic architecture of the trait and the relative gain in accuracy is minimal. When SVs are the only causative variant type, SV predictors outperform SNP predictors 70% of the time. However, the improvement in accuracy in these instances is only 1.5% on average. Further simulations with predictors in varying degrees of LD with causative variants of different types (e.g., SNPs, SVs, SNPs and SVs) showed that prediction accuracy increased as LD between causative variants and predictors increased, regardless of the marker type. This study demonstrates that, when deciding which markers to use for large-scale genomic prediction modeling in a breeding program, knowing the genetic architecture of the trait is more important than the type of marker used.
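Since this conclusion rests on LD between predictors and causative variants, a short sketch of one common way to quantify it may help. It assumes genotypes coded as allele dosages (0/1/2) and uses the squared Pearson correlation of dosages as a phasing-free proxy for r²; this is an illustrative choice, not necessarily the LD measure used in the study, and the function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def dosage_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between genotype dosages (0/1/2) at two loci."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 individuals")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) * (a - mx) for a in x)
    syy = sum((b - my) * (b - my) for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic locus: r2 is undefined; reported as 0 by this sketch's convention
    return sxy * sxy / (sxx * syy)


def best_tag(causal: Sequence[float],
             markers: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the marker (SNP or SV) in strongest LD with a causative variant, and its r2."""
    name = max(markers, key=lambda m: dosage_r2(causal, markers[m]))
    return name, dosage_r2(causal, markers[name])
```

Under this view, a marker type helps only insofar as it raises the r² between predictors and causative variants, which matches the abstract's finding that LD matters more than marker type.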
Affiliation(s)
- Rafael Della Coletta
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
- Samuel B Fernandes
  - Department of Crop, Soil and Environmental Sciences at University of Arkansas, Fayetteville, AR, 72701, USA
- Patrick J Monnahan
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
- Mark A Mikel
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
  - Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Martin O Bohn
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Alexander E Lipka
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Candice N Hirsch
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
23
Zhang Y, Zhang N, Chai X, Sun T. Machine learning for image-based multi-omics analysis of leaf veins. JOURNAL OF EXPERIMENTAL BOTANY 2023; 74:4928-4941. [PMID: 37410807 DOI: 10.1093/jxb/erad251] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/01/2023] [Accepted: 06/29/2023] [Indexed: 07/08/2023]
Abstract
Veins are a critical component of the plant growth and development system, playing an integral role in supporting and protecting leaves, as well as transporting water, nutrients, and photosynthetic products. A comprehensive understanding of the form and function of veins requires a dual approach that combines plant physiology with cutting-edge image recognition technology. The latest advances in computer vision and machine learning have enabled algorithms that can identify vein networks and explore their developmental progression. Here, we review the functional, environmental, and genetic factors associated with vein networks, along with the current status of research on image analysis. In addition, we discuss methods for extracting vein phenotypes and for multi-omics association analysis using machine learning, which could provide a theoretical basis for improving crop productivity by optimizing vein network architecture.
Affiliation(s)
- Yubin Zhang
  - Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
- Ning Zhang
  - Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
- Xiujuan Chai
  - Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
- Tan Sun
  - Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing, China
  - Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
24
Peng L, Li Y, Tan W, Wu S, Hao Q, Tong N, Wang Z, Liu Z, Shu Q. Combined genome-wide association studies and expression quantitative trait locus analysis uncovers a genetic regulatory network of floral organ number in a tree peony (Paeonia suffruticosa Andrews) breeding population. HORTICULTURE RESEARCH 2023; 10:uhad110. [PMID: 37577399 PMCID: PMC10419549 DOI: 10.1093/hr/uhad110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 05/16/2023] [Indexed: 08/15/2023]
Abstract
Great progress has been made in understanding floral organ identity determination and its regulatory network in many species; however, the quantitative genetic basis of floral organ number variation is far less well understood for species-specific traits from the perspective of population variation. Here, using a tree peony (Paeonia suffruticosa Andrews, Paeoniaceae) cultivar population as a model, phenotypic polymorphism and genetic variation were analyzed using genome-wide association studies (GWAS) and expression quantitative trait locus (eQTL) analysis. Twenty-four phenotypic traits were assessed in 271 representative cultivars, and transcript profiles were obtained for 119 cultivars, indicating abundant genetic variation in tree peony. In total, 86 GWAS-related cis-eQTLs and 3188 trans-eQTL gene pairs were found to be associated with the numbers of petals, stamens, and carpels. In addition, weighted gene co-expression network analysis identified 19 hub genes related to floral organ number, with 121 cis-eQTLs; five of these hub genes belong to the ABCE genes of the MADS-box family, and their spatial-temporal co-expression and regulatory network was constructed. These results not only improve our understanding of the genetic basis of floral organ number variation during domestication, but also pave the way for studying the quantitative genetics and evolution of floral organ number and its regulatory network within populations.
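The hub-gene step relies on weighted gene co-expression network analysis (WGCNA). A minimal sketch of its core idea, assuming an unsigned network with adjacency a_ij = |cor(x_i, x_j)|^β and connectivity k_i = Σ_{j≠i} a_ij, treats the most highly connected genes as hubs. Here β = 6 is an assumed default, connectivity is computed over all genes rather than within detected modules, and the function names are hypothetical; the full WGCNA workflow additionally builds topological overlap and detects modules before ranking hubs.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; constant profiles are treated as uncorrelated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) * (a - mx) for a in x)
    syy = sum((b - my) * (b - my) for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def connectivity(expr: Dict[str, Sequence[float]], beta: float = 6.0) -> Dict[str, float]:
    """Unsigned soft-threshold connectivity k_i = sum over j != i of |cor(x_i, x_j)|**beta."""
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            k[gi] += a
            k[gj] += a
    return k


def top_hubs(expr: Dict[str, Sequence[float]], n_hubs: int = 1,
             beta: float = 6.0) -> List[str]:
    """Genes ranked by connectivity (ties keep input order)."""
    k = connectivity(expr, beta)
    return sorted(k, key=k.get, reverse=True)[:n_hubs]
```

Raising |cor| to a power suppresses weak correlations while keeping strong ones, so connectivity is dominated by tightly co-expressed partners.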
Affiliation(s)
- Liping Peng
  - Key Laboratory of Plant Resources, Institute of Botany, the Chinese Academy of Sciences, Beijing 100093, China
  - China National Botanical Garden, Beijing 100093, China
- Yang Li
  - Key Laboratory of Plant Resources, Institute of Botany, the Chinese Academy of Sciences, Beijing 100093, China
  - China National Botanical Garden, Beijing 100093, China
- Wanqing Tan
  - Key Laboratory of Plant Resources, Institute of Botany, the Chinese Academy of Sciences, Beijing 100093, China
  - China National Botanical Garden, Beijing 100093, China
  - College of Life Science, University of Chinese Academy of Sciences, Beijing 100049, China
- Shangwei Wu
  - Key Laboratory of Plant Resources, Institute of Botany, the Chinese Academy of Sciences, Beijing 100093, China
  - China National Botanical Garden, Beijing 100093, China
  - College of Life Science, University of Chinese Academy of Sciences, Beijing 100049, China
- Qing Hao
  - College of Landscape Architecture and Forestry, Qingdao Agricultural University, Qingdao 266109, China
- Ningning Tong
  - Key Laboratory of Plant Resources, Institute of Botany, the Chinese Academy of Sciences, Beijing 100093, China
  - China National Botanical Garden, Beijing 100093, China
- Zhanying Wang
  - Peony Research Institute, Luoyang Academy of Agricultural and Forestry Sciences, Luoyang 471000, China
- Zheng’an Liu
  - Key Laboratory of Plant Resources, Institute of Botany, the Chinese Academy of Sciences, Beijing 100093, China
  - China National Botanical Garden, Beijing 100093, China
  - College of Life Science, University of Chinese Academy of Sciences, Beijing 100049, China
- Qingyan Shu
  - Key Laboratory of Plant Resources, Institute of Botany, the Chinese Academy of Sciences, Beijing 100093, China
  - China National Botanical Garden, Beijing 100093, China
  - College of Life Science, University of Chinese Academy of Sciences, Beijing 100049, China
25
De Meyer S, Cruz DF, De Swaef T, Lootens P, De Block J, Bird K, Sprenger H, Van de Voorde M, Hawinkel S, Van Hautegem T, Inzé D, Nelissen H, Roldán-Ruiz I, Maere S. Predicting yield of individual field-grown rapeseed plants from rosette-stage leaf gene expression. PLoS Comput Biol 2023; 19:e1011161. [PMID: 37253069 PMCID: PMC10256231 DOI: 10.1371/journal.pcbi.1011161] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/04/2023] [Revised: 06/09/2023] [Accepted: 05/05/2023] [Indexed: 06/01/2023]
Abstract
In the plant sciences, results of laboratory studies often do not translate well to the field. To help close this lab-field gap, we developed a strategy for studying the wiring of plant traits directly in the field, based on molecular profiling and phenotyping of individual plants. Here, we use this single-plant omics strategy on winter-type Brassica napus (rapeseed). We investigate to what extent early and late phenotypes of field-grown rapeseed plants can be predicted from their autumnal leaf gene expression, and find that autumnal leaf gene expression not only has substantial predictive power for autumnal leaf phenotypes but also for final yield phenotypes in spring. Many of the top predictor genes are linked to developmental processes known to occur in autumn in winter-type B. napus accessions, such as the juvenile-to-adult and vegetative-to-reproductive phase transitions, indicating that the yield potential of winter-type B. napus is influenced by autumnal development. Our results show that single-plant omics can be used to identify genes and processes influencing crop yield in the field.
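One way to estimate how much predictive power gene expression has for a later phenotype is k-fold cross-validation, with predictive ability measured as the Pearson correlation between observed and cross-validated predicted values. The minimal sketch below assumes a deliberately simple model (the single gene most correlated with the phenotype, chosen within each training fold, followed by ordinary least squares) and interleaved fold assignment; it illustrates fold-internal predictor selection, not the models used by the authors, and all names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) * (a - mx) for a in x)
    syy = sum((b - my) * (b - my) for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares slope and intercept for y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) * (a - mx) for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def cv_predictive_ability(expr: Dict[str, Sequence[float]],
                          pheno: Sequence[float], k: int = 5) -> float:
    """Cross-validated correlation between observed and predicted phenotypes."""
    n = len(pheno)
    folds = [list(range(f, n, k)) for f in range(k)]  # interleaved folds (assumption)
    preds = [0.0] * n
    for test in folds:
        if not test:
            continue
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        y_tr = [pheno[i] for i in train]
        # Select the predictor gene on training plants only, to avoid information leakage.
        best = max(expr, key=lambda g: abs(pearson([expr[g][i] for i in train], y_tr)))
        slope, icpt = fit_line([expr[best][i] for i in train], y_tr)
        for i in test:
            preds[i] = icpt + slope * expr[best][i]
    return pearson(pheno, preds)
```

Selecting predictor genes on the full dataset before splitting would inflate the estimate, which is why selection happens inside each fold here.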
Affiliation(s)
- Sam De Meyer
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Daniel Felipe Cruz
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Tom De Swaef
  - Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Melle, Belgium
- Peter Lootens
  - Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Melle, Belgium
- Jolien De Block
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Kevin Bird
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Heike Sprenger
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Michael Van de Voorde
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Stijn Hawinkel
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Tom Van Hautegem
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Dirk Inzé
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Hilde Nelissen
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Isabel Roldán-Ruiz
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Melle, Belgium
- Steven Maere
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
26
Zhao W, Qadri QR, Zhang Z, Wang Z, Pan Y, Wang Q, Zhang Z. PyAGH: a python package to fast construct kinship matrices based on different levels of omic data. BMC Bioinformatics 2023; 24:153. [PMID: 37072709 PMCID: PMC10111838 DOI: 10.1186/s12859-023-05280-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2022] [Accepted: 04/10/2023] [Indexed: 04/20/2023]
Abstract
BACKGROUND Construction of kinship matrices among individuals is an important step for both association studies and prediction studies based on different levels of omic data. Methods for constructing kinship matrices are becoming diverse, and each method has its own appropriate scenarios. However, software that can comprehensively calculate kinship matrices for a variety of scenarios is still urgently needed. RESULTS In this study, we developed an efficient and user-friendly Python module, PyAGH, that can accomplish (1) conventional additive kinship matrix construction based on pedigree, genotypes, or abundance data from the transcriptome or microbiome; (2) genomic kinship matrix construction in a combined population; (3) dominance and epistatic effect kinship matrix construction; (4) pedigree selection, tracing, detection, and visualization; and (5) visualization of clustering, heatmaps, and PCA based on kinship matrices. The output from PyAGH can be easily integrated into other mainstream software according to users' purposes. Compared with other software, PyAGH integrates multiple methods for calculating kinship matrices and has advantages in terms of speed and data size. PyAGH is developed in Python and C++ and can be easily installed with the pip tool. Installation instructions and a manual are freely available from https://github.com/zhaow-01/PyAGH. CONCLUSION PyAGH is a fast and user-friendly Python package for calculating kinship matrices using pedigree, genotype, microbiome, and transcriptome data, as well as for processing, analyzing, and visualizing data and results. This package makes it easier to perform prediction and association studies based on different levels of omic data.
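Two of the constructions listed in this abstract, an additive genomic kinship matrix from genotypes and a similarity matrix from transcriptome abundance, can be sketched in pure Python. This assumes VanRaden's first method for genotypes coded 0/1/2 and, for expression, per-gene standardization followed by ZZ'/g over the g non-constant genes. It does not reproduce PyAGH's API or necessarily its exact formulas (the package manual at the GitHub URL above documents those), and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def vanraden_grm(geno: Matrix) -> Matrix:
    """Additive genomic relationship matrix G = Z Z' / (2 * sum p_j (1 - p_j)).

    Rows are individuals, columns are markers coded as 0/1/2 allele dosages;
    Z is the dosage matrix centered by twice the allele frequency p_j.
    """
    n, m = len(geno), len(geno[0])
    p = [sum(row[j] for row in geno) / (2 * n) for j in range(m)]
    denom = 2 * sum(pj * (1 - pj) for pj in p)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in geno]
    return [[sum(a * b for a, b in zip(zi, zk)) / denom for zk in z] for zi in z]


def expression_kinship(expr: Matrix) -> Matrix:
    """Similarity matrix Z Z' / g from expression standardized per gene (rows = individuals)."""
    n, m = len(expr), len(expr[0])
    cols = []
    for j in range(m):
        col = [row[j] for row in expr]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        if sd > 0:  # genes with constant expression carry no information and are dropped
            cols.append([(v - mu) / sd for v in col])
    g = len(cols)
    if g == 0:
        raise ValueError("no gene varies across individuals")
    return [[sum(c[i] * c[k] for c in cols) / g for k in range(n)] for i in range(n)]
```

Standardizing each gene gives every gene equal weight and makes the diagonal average exactly 1, loosely mirroring how the VanRaden scaling puts G on a comparable scale to pedigree relationships.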
Affiliation(s)
- Wei Zhao
  - Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, 800# Dongchuan Road, Shanghai, China
- Qamar Raza Qadri
  - Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, 800# Dongchuan Road, Shanghai, China
- Zhenyang Zhang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, 310058, China
- Zhen Wang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, 310058, China
- Yuchun Pan
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, 310058, China
  - Hainan Research Institute, Zhejiang University, 11# Yonyou Industrial Park, Yazhou Bay Science and Technology City, Sanya, 572025, China
- Qishan Wang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, 310058, China
- Zhe Zhang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, 310058, China
27
Wang K, Abid MA, Rasheed A, Crossa J, Hearne S, Li H. DNNGP, a deep neural network-based method for genomic prediction using multi-omics data in plants. MOLECULAR PLANT 2023; 16:279-293. [PMID: 36366781 DOI: 10.1016/j.molp.2022.11.004] [Citation(s) in RCA: 59] [Impact Index Per Article: 29.5] [Received: 05/06/2022] [Revised: 09/28/2022] [Accepted: 11/08/2022] [Indexed: 06/16/2023]
Abstract
Genomic prediction is an effective way to accelerate the rate of agronomic trait improvement in plants. Traditional methods typically use linear regression models with clear assumptions; such methods are unable to capture the complex relationships between genotypes and phenotypes. Non-linear models (e.g., deep neural networks) have been proposed as a superior alternative to linear models because they can capture complex non-additive effects. Here we introduce a deep learning (DL) method, deep neural network genomic prediction (DNNGP), for integration of multi-omics data in plants. We trained DNNGP on four datasets and compared its performance with methods built with five classic models: genomic best linear unbiased prediction (GBLUP); two methods based on a machine learning (ML) framework, light gradient boosting machine (LightGBM) and support vector regression (SVR); and two methods based on a DL framework, deep learning genomic selection (DeepGS) and deep learning genome-wide association study (DLGWAS). DNNGP is novel in five ways. First, it can be applied to a variety of omics data to predict phenotypes. Second, the multilayered hierarchical structure of DNNGP dynamically learns features from raw data, avoiding overfitting and improving the convergence rate by using a batch normalization layer, early stopping, and rectified linear unit (ReLU) activation functions. Third, when small datasets were used, DNNGP produced results that were competitive with those of the other five methods, and it showed greater prediction accuracy than the other methods when large-scale breeding data were used. Fourth, the computation time required by DNNGP was comparable with that of commonly used methods, and up to 10 times faster than DeepGS. Fifth, hyperparameters can easily be batch tuned on a local machine. Overall, DNNGP is superior to these widely used existing genomic selection (GS) methods: GBLUP, LightGBM, SVR, DeepGS, and DLGWAS. Moreover, DNNGP can generate robust assessments from diverse datasets, including omics data, and quickly incorporate complex and large datasets into usable models, making it a promising and practical approach for straightforward integration into existing GS platforms.
Affiliation(s)
- Kelin Wang
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT - China Office, 12 Zhongguancun South Street, Beijing 100081, China; Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
- Awais Rasheed
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT - China Office, 12 Zhongguancun South Street, Beijing 100081, China; Department of Plant Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
- Jose Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, Texcoco, D.F. 06600, Mexico
- Sarah Hearne
  - International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, Texcoco, D.F. 06600, Mexico
- Huihui Li
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT - China Office, 12 Zhongguancun South Street, Beijing 100081, China; Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
Collapse
|
28
|
Malinowska M, Ruud AK, Jensen J, Svane SF, Smith AG, Bellucci A, Lenk I, Nagy I, Fois M, Didion T, Thorup-Kristensen K, Jensen CS, Asp T. Relative importance of genotype, gene expression, and DNA methylation on complex traits in perennial ryegrass. THE PLANT GENOME 2022; 15:e20253. [PMID: 35975565 DOI: 10.1002/tpg2.20253] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/07/2022] [Accepted: 06/30/2022] [Indexed: 06/15/2023]
Abstract
The growing demand for food and feed crops worldwide, driven by a growing population and more extreme weather events, requires high-yielding and resilient crops. Many agriculturally important traits are polygenic, are controlled by multiple regulatory layers, and interact strongly with the environment. In this study, 120 F2 families of perennial ryegrass (Lolium perenne L.) were grown across a water gradient in a semifield facility with subsoil irrigation. Genomic (single-nucleotide polymorphism [SNP]), transcriptomic (gene expression [GE]), and DNA methylomic (MET) data were integrated with feed quality trait data collected from control and drought sections in the semifield facility, providing a treatment effect. Deep root length (DRL) below 110 cm was assessed with convolutional neural network image analysis. Bayesian prediction models were used to partition phenotypic variance into its components and to evaluate the proportion of phenotypic variance in all traits captured by different regulatory layers (SNP, GE, and MET). The spatial effects and the effects of SNP, GE, MET, the interaction between GE and MET (GE × MET), and the GE × treatment interaction (GE_Control and GE_Drought) were investigated. Gene expression explained a substantial part of the genetic and spatial variance for all the investigated phenotypes, whereas MET explained residual variance not accounted for by SNPs or GE. For DRL, MET also contributed to explaining spatial variance. The study provides a statistically elegant analytical paradigm that integrates genomic, transcriptomic, and MET information to understand the regulatory mechanisms of polygenic effects for complex traits.
Affiliation(s)
- Marta Malinowska
- Center for Quantitative Genetics and Genomics, Aarhus Univ., Slagelse, Denmark
- Anja Karine Ruud
- Center for Quantitative Genetics and Genomics, Aarhus Univ., Slagelse, Denmark
- Just Jensen
- Center for Quantitative Genetics and Genomics, Aarhus Univ., Slagelse, Denmark
- Simon Fiil Svane
- Dep. of Plant and Environmental Sciences, Univ. of Copenhagen, Taastrup, Denmark
- Andrea Bellucci
- Center for Quantitative Genetics and Genomics, Aarhus Univ., Slagelse, Denmark
- Ingo Lenk
- Research Division, DLF Seeds A/S, Store Heddinge, Denmark
- Istvan Nagy
- Center for Quantitative Genetics and Genomics, Aarhus Univ., Slagelse, Denmark
- Mattia Fois
- Center for Quantitative Genetics and Genomics, Aarhus Univ., Slagelse, Denmark
- Thomas Didion
- Research Division, DLF Seeds A/S, Store Heddinge, Denmark
- Torben Asp
- Center for Quantitative Genetics and Genomics, Aarhus Univ., Slagelse, Denmark
29
Liang Z, Myers ZA, Petrella D, Engelhorn J, Hartwig T, Springer NM. Mapping responsive genomic elements to heat stress in a maize diversity panel. Genome Biol 2022; 23:234. [PMID: 36345007 PMCID: PMC9639295 DOI: 10.1186/s13059-022-02807-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/20/2022] [Accepted: 10/29/2022] [Indexed: 11/09/2022]
Abstract
BACKGROUND Many plant species exhibit genetic variation for coping with environmental stress. However, approaches that can effectively uncover the genomic regions regulating distinct response patterns of genes across multiple varieties of the same species under abiotic stress are still limited. RESULTS By analyzing the transcriptomes of more than 100 maize inbreds, we reveal many cis- and trans-acting eQTLs that influence the expression response to heat stress. The cis-acting eQTLs in response to heat stress are identified in genes with differential responses to heat stress between genotypes as well as in genes that are only expressed under heat stress. The cis-acting variants for heat stress-responsive expression likely result from distinct promoter activities, and the differential heat responses of the alleles are confirmed for selected genes using transient expression assays. Global footprinting of transcription factor binding is performed in control and heat stress conditions to document regions with heat-enriched transcription factor binding occupancies. CONCLUSIONS Footprints enriched near proximal regions of characterized heat-responsive genes in a large association panel can be utilized for prioritizing functional genomic regions that regulate genotype-specific responses under heat stress.
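The cis/trans distinction above depends on where an associated variant lies relative to the gene whose expression it affects. The following minimal sketch classifies expression-genotype associations as cis or trans by physical distance. It is a simple correlation screen on synthetic data, not the study's eQTL model (which would normally include covariates, population structure and proper significance testing); the function name `classify_eqtls`, the 1 Mb window and the |r| threshold are assumptions.

```python
import numpy as np


def classify_eqtls(expr, geno, gene_pos, snp_pos, window=1_000_000, r_min=0.5):
    """Return (gene, snp, r, 'cis' or 'trans') for every pair with |Pearson r| >= r_min.

    expr: (n_lines, n_genes) expression values; geno: (n_lines, n_snps) dosages 0/1/2.
    gene_pos / snp_pos: lists of (chromosome, base-pair position) tuples.
    """
    # Standardizing columns makes the scaled cross-product equal to Pearson correlations.
    ze = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    zg = (geno - geno.mean(axis=0)) / geno.std(axis=0)
    r = ze.T @ zg / expr.shape[0]  # (n_genes, n_snps)
    hits = []
    for g, s in zip(*np.nonzero(np.abs(r) >= r_min)):
        same_chr = gene_pos[g][0] == snp_pos[s][0]
        near = abs(gene_pos[g][1] - snp_pos[s][1]) <= window
        kind = "cis" if same_chr and near else "trans"
        hits.append((int(g), int(s), float(r[g, s]), kind))
    return hits


# Synthetic panel: 100 inbreds, 3 SNPs, 2 genes.
rng = np.random.default_rng(1)
n = 100
geno = rng.binomial(2, 0.5, size=(n, 3)).astype(float)
snp_pos = [(1, 10_000), (1, 5_000_000), (2, 300_000)]
gene_pos = [(1, 12_000), (3, 50_000)]
expr = rng.normal(size=(n, 2))
expr[:, 0] += 2 * geno[:, 0]  # planted local effect: SNP 0 lies 2 kb from gene 0
expr[:, 1] += 2 * geno[:, 2]  # planted distant effect: SNP 2 is on another chromosome

hits = classify_eqtls(expr, geno, gene_pos, snp_pos)
labels = {(g, s): kind for g, s, _, kind in hits}
print(labels)
```

A real analysis would replace the fixed |r| threshold with a regression-based test and multiple-testing control; the distance rule for labeling cis versus trans is the part this sketch is meant to show.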
Affiliation(s)
- Zhikai Liang
- Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, MN, 55108, USA.
- Zachary A Myers
- Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, MN, 55108, USA
- Dominic Petrella
- Department of Horticulture, University of Minnesota, Saint Paul, MN, 55108, USA
- Present address: Agricultural Technical Institute, The Ohio State University, Wooster, OH, 44691, USA
- Julia Engelhorn
- Max Planck Institute for Plant Breeding Research, 50829, Cologne, Germany
- Heinrich-Heine University, 40225, Dusseldorf, Germany
- Thomas Hartwig
- Max Planck Institute for Plant Breeding Research, 50829, Cologne, Germany
- Heinrich-Heine University, 40225, Dusseldorf, Germany
- Nathan M Springer
- Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, MN, 55108, USA.
30
Robert P, Goudemand E, Auzanneau J, Oury FX, Rolland B, Heumez E, Bouchet S, Caillebotte A, Mary-Huard T, Le Gouis J, Rincent R. Phenomic selection in wheat breeding: prediction of the genotype-by-environment interaction in multi-environment breeding trials. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2022; 135:3337-3356. [PMID: 35939074 DOI: 10.1007/s00122-022-04170-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Accepted: 06/28/2022] [Indexed: 06/15/2023]
Abstract
Phenomic prediction of wheat grain yield and heading date in different multi-environmental trial scenarios is accurate. Modelling the genotype-by-environment interaction effect using phenomic data is a potentially low-cost complement to genomic prediction. The performance of wheat cultivars in multi-environmental trials (MET) is difficult to predict because of genotype-by-environment interactions (G × E). Phenomic selection is expected to be efficient for modelling the G × E effect because it can account for non-additive effects. Here, phenomic data are near-infrared (NIR) spectra obtained from plant material. While phenomic selection has recently been shown to accurately predict wheat grain yield in single environments, its accuracy in MET remains to be investigated. We used four datasets from two winter wheat breeding programs to test and compare the predictive abilities of phenomic and genomic models for grain yield and heading date in different MET scenarios. We also compared different methods to model the G × E using different covariance matrices based on spectra. On average, phenomic and genomic predictive abilities were similar in all MET scenarios. Better predictive abilities were obtained when G × E effects were modelled with NIR spectra than without them, and it was better to use all the spectra of all genotypes in all environments for modelling the G × E. To facilitate the implementation of phenomic prediction, we tested MET designs where the NIR spectra were measured only on the genotype-environment combinations phenotyped for the target trait. Missing spectra were predicted with a weighted multivariate ridge regression. Intermediate predictive abilities for grain yield were obtained in a sparse testing scenario and for new genotypes, which shows that phenomic selection is an efficient and practicable prediction method for dealing with G × E.
Affiliation(s)
- Pauline Robert
- INRAE, CNRS, AgroParisTech, GQE - Le Moulon, Université Paris-Saclay, 91190, Gif-sur-Yvette, France
- INRAE - Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
- Agri-Obtentions, Ferme de Gauvilliers, 78660, Orsonville, France
- Florimond-Desprez Veuve & Fils SAS, 3 rue Florimond-Desprez, BP 41, 59242, Cappelle-en-Pévèle, France
- Ellen Goudemand
- Florimond-Desprez Veuve & Fils SAS, 3 rue Florimond-Desprez, BP 41, 59242, Cappelle-en-Pévèle, France
- Jérôme Auzanneau
- Agri-Obtentions, Ferme de Gauvilliers, 78660, Orsonville, France
- François-Xavier Oury
- INRAE - Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
- Bernard Rolland
- INRAE-Agrocampus Ouest-Université Rennes 1, UMR1349, IGEPP, Domaine de la Motte, 35653, Le Rheu, France
- Emmanuel Heumez
- INRAE, UE 972, Grandes Cultures Innovation Environnement, 2 Chaussée Brunehaut, 80200, Estrées-Mons, France
- Sophie Bouchet
- INRAE - Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
- Antoine Caillebotte
- INRAE, CNRS, AgroParisTech, GQE - Le Moulon, Université Paris-Saclay, 91190, Gif-sur-Yvette, France
- Tristan Mary-Huard
- INRAE, CNRS, AgroParisTech, GQE - Le Moulon, Université Paris-Saclay, 91190, Gif-sur-Yvette, France
- MIA, INRAE, AgroParisTech, Université Paris-Saclay, 75005, Paris, France
- Jacques Le Gouis
- INRAE - Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
- Renaud Rincent
- INRAE, CNRS, AgroParisTech, GQE - Le Moulon, Université Paris-Saclay, 91190, Gif-sur-Yvette, France.
- INRAE - Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France.
31
Mural RV, Sun G, Grzybowski M, Tross MC, Jin H, Smith C, Newton L, Andorf CM, Woodhouse MR, Thompson AM, Sigmon B, Schnable JC. Association mapping across a multitude of traits collected in diverse environments in maize. Gigascience 2022; 11:giac080. [PMID: 35997208 PMCID: PMC9396454 DOI: 10.1093/gigascience/giac080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/13/2022] [Revised: 05/25/2022] [Indexed: 11/14/2022]
Abstract
Classical genetic studies have identified many cases of pleiotropy where mutations in individual genes alter many different phenotypes. Quantitative genetic studies of natural genetic variants frequently examine one or a few traits, limiting their potential to identify pleiotropic effects of natural genetic variants. Widely adopted community association panels have been employed by plant genetics communities to study the genetic basis of naturally occurring phenotypic variation in a wide range of traits. High-density genetic marker data (18M markers) from 2 partially overlapping maize association panels, comprising 1,014 unique genotypes grown in field trials across at least 7 US states and scored for 162 distinct trait data sets, enabled the identification of 2,154 suggestive marker-trait associations and 697 confident associations in the maize genome using a resampling-based genome-wide association strategy. The precision of individual marker-trait associations was estimated at 3 genes, based on a reference set of genes with known phenotypes. Examples were observed both of genetic loci associated with variation in diverse traits (e.g., above-ground and below-ground traits) and of individual loci associated with the same or similar traits across diverse environments. Many significant signals are located near genes whose functions were previously entirely unknown or estimated purely via functional data on homologs. This study demonstrates the potential of mining community association panel data using new higher-density genetic marker sets combined with resampling-based genome-wide association tests to develop testable hypotheses about gene functions, identify potential pleiotropic effects of natural genetic variants, and study genotype-by-environment interaction.
Affiliation(s)
- Ravi V Mural
- Center for Plant Science Innovation, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Department of Agronomy and Horticulture, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Guangchao Sun
- Center for Plant Science Innovation, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Department of Agronomy and Horticulture, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Marcin Grzybowski
- Center for Plant Science Innovation, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Department of Agronomy and Horticulture, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Michael C Tross
- Center for Plant Science Innovation, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Department of Agronomy and Horticulture, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Hongyu Jin
- Center for Plant Science Innovation, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Department of Agronomy and Horticulture, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Christine Smith
- Center for Plant Science Innovation, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Linsey Newton
- Department of Plant Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA
- Carson M Andorf
- USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, IA 50010, USA
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- Addie M Thompson
- Department of Plant Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA
- Brandi Sigmon
- Department of Plant Pathology, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- James C Schnable
- Center for Plant Science Innovation, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Department of Agronomy and Horticulture, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
32
Wade AR, Duruflé H, Sanchez L, Segura V. eQTLs are key players in the integration of genomic and transcriptomic data for phenotype prediction. BMC Genomics 2022; 23:476. [PMID: 35764918 PMCID: PMC9238188 DOI: 10.1186/s12864-022-08690-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2021] [Accepted: 06/11/2022] [Indexed: 11/10/2022]
Abstract
Background Multi-omics represent a promising link between phenotypes and genome variation. Few studies have yet addressed their integration to understand genetic architecture and improve predictability. Results Our study used 241 poplar genotypes, phenotyped in two common gardens, with xylem and cambium RNA sequenced at one site, yielding large phenotypic, genomic (SNP), and transcriptomic datasets. Prediction models for each trait were built separately for SNPs and transcripts, and compared to a third model integrating both omics by concatenation. The advantage of integration varied across traits and, to understand such differences, an eQTL analysis was performed to characterize the interplay between the genome and transcriptome and classify the predicting features into cis or trans relationships. A strong, significant negative correlation was found between the change in predictability and the change in predictor ranking for trans eQTLs for traits evaluated at the site of transcriptomic sampling. Conclusions Consequently, beneficial integration happens when the redundancy of predictors is decreased, likely leaving the stage to other less prominent but complementary predictors. An additional gene ontology (GO) enrichment analysis appeared to corroborate this statistical output. To our knowledge, this is a novel finding delineating a promising method to explore data integration. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08690-7.
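Integration by concatenation, as described above, can be illustrated with a small ridge-regression sketch on synthetic data: SNP and transcript matrices are standardized, used separately or side by side, and predictive ability is taken as the Pearson correlation between predicted and observed values in a held-out set. Ridge regression, the penalty `lam`, the helper names and the simulated data are assumptions for illustration, not the models used by Wade et al.

```python
import numpy as np


def standardize(X):
    """Center columns and scale them to unit variance (constant columns stay at zero)."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - X.mean(axis=0)) / sd


def ridge_predict(X_train, y_train, X_test, lam):
    """Closed-form ridge regression; the intercept is handled by centering y."""
    mu = y_train.mean()
    p = X_train.shape[1]
    beta = np.linalg.solve(X_train.T @ X_train + lam * np.eye(p),
                           X_train.T @ (y_train - mu))
    return mu + X_test @ beta


def predictive_ability(X, y, train, test, lam=10.0):
    """Pearson correlation between predicted and observed values in the test set."""
    y_hat = ridge_predict(X[train], y[train], X[test], lam)
    return float(np.corrcoef(y_hat, y[test])[0, 1])


# Synthetic data: 200 lines, 300 SNPs (0/1/2 dosages) and 100 transcripts.
rng = np.random.default_rng(0)
n = 200
snps = standardize(rng.binomial(2, 0.3, size=(n, 300)).astype(float))
transcripts = standardize(rng.normal(size=(n, 100)))
# Hypothetical trait influenced by a few SNPs and a few transcripts, plus noise.
y = 0.5 * snps[:, :10].sum(axis=1) + transcripts[:, :5].sum(axis=1) + rng.normal(size=n)

train, test = np.arange(160), np.arange(160, n)
feature_sets = {
    "SNP": snps,
    "transcripts": transcripts,
    "concatenated": np.hstack([snps, transcripts]),  # integration by concatenation
}
pa = {name: predictive_ability(X, y, train, test) for name, X in feature_sets.items()}
print(pa)
```

Standardizing each block before concatenation keeps one omic layer from dominating purely because of its measurement scale; the number of columns in each block still influences how much weight it receives under a single shared penalty.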
33
Hershberger J, Tanaka R, Wood JC, Kaczmar N, Wu D, Hamilton JP, DellaPenna D, Buell CR, Gore MA. Transcriptome-wide association and prediction for carotenoids and tocochromanols in fresh sweet corn kernels. THE PLANT GENOME 2022; 15:e20197. [PMID: 35262278 DOI: 10.1002/tpg2.20197] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/01/2021] [Accepted: 01/23/2022] [Indexed: 06/14/2023]
Abstract
Sweet corn (Zea mays L.) is consistently one of the most highly consumed vegetables in the United States, providing a valuable opportunity to increase nutrient intake through biofortification. Significant variation for carotenoid (provitamin A, lutein, zeaxanthin) and tocochromanol (vitamin E, antioxidants) levels is present in temperate sweet corn germplasm, yet previous genome-wide association studies (GWAS) of these traits have been limited by low statistical power and mapping resolution. Here, we employed a high-quality transcriptomic dataset collected from fresh sweet corn kernels to conduct transcriptome-wide association studies (TWAS) and transcriptome prediction studies for 39 carotenoid and tocochromanol traits. In agreement with previous GWAS findings, TWAS detected significant associations for four causal genes, β-carotene hydroxylase (crtRB1), lycopene epsilon cyclase (lcyE), γ-tocopherol methyltransferase (vte4), and homogentisate geranylgeranyltransferase (hggt1), on a transcriptome-wide level. Pathway-level analysis revealed additional associations for deoxy-xylulose synthase2 (dxs2), diphosphocytidyl methyl erythritol synthase2 (dmes2), cytidine methyl kinase1 (cmk1), and geranylgeranyl hydrogenase1 (ggh1), of which dmes2, cmk1, and ggh1 have not previously been identified through maize association studies. Evaluation of prediction models incorporating genome-wide markers and transcriptome-wide abundances revealed a trait-dependent benefit to the inclusion of both genomic and transcriptomic data over solely genomic data, but both transcriptome- and genome-wide datasets outperformed a priori candidate gene-targeted prediction models for most traits. Altogether, this study represents an important step toward understanding the role of regulatory variation in the accumulation of vitamins in fresh sweet corn kernels.
Affiliation(s)
- Jenna Hershberger
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Ryokei Tanaka
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Joshua C Wood
- Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Nicholas Kaczmar
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Di Wu
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- John P Hamilton
- Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Dean DellaPenna
- Dep. of Biochemistry and Molecular Biology, Michigan State Univ., East Lansing, MI, 48824, USA
- C Robin Buell
- Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Michael A Gore
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
34
Conaty WC, Broughton KJ, Egan LM, Li X, Li Z, Liu S, Llewellyn DJ, MacMillan CP, Moncuquet P, Rolland V, Ross B, Sargent D, Zhu QH, Pettolino FA, Stiller WN. Cotton Breeding in Australia: Meeting the Challenges of the 21st Century. FRONTIERS IN PLANT SCIENCE 2022; 13:904131. [PMID: 35646011 PMCID: PMC9136452 DOI: 10.3389/fpls.2022.904131] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/25/2022] [Accepted: 04/08/2022] [Indexed: 06/15/2023]
Abstract
The Commonwealth Scientific and Industrial Research Organisation (CSIRO) cotton breeding program is the sole breeding effort for cotton in Australia, developing high-performing cultivars for the local industry, which is worth ~AU$3 billion per annum. The program is supported by Cotton Breeding Australia, a joint venture between CSIRO and the program's commercial partner, Cotton Seed Distributors Ltd. (CSD). While the Australian industry is the focus, CSIRO cultivars have global impact in North America, South America, and Europe. The program is unique compared with many other public and commercial breeding programs because it focuses on diverse and integrated research with commercial outcomes. It represents the full research pipeline, supporting extensive long-term fundamental molecular research; native and genetically modified (GM) trait development; germplasm enhancement focused on yield and fiber quality improvements; and integration of third-party GM traits; all culminating in the release of new commercial cultivars. This review presents evidence of past breeding successes and outlines current breeding efforts in the areas of yield and fiber quality improvement, as well as the development of germplasm that is resistant to pests, diseases and abiotic stressors. The success of the program is based on the development of superior germplasm largely through field phenotyping, together with strong commercial partnerships with CSD and Bayer CropScience. These relationships assist in maintaining a shared focus and commercial impact, while also providing access to markets, traits, and technology. The historical successes, current foci and future requirements of the CSIRO cotton breeding program have been used to develop a framework designed to augment our breeding system for the future.
This will focus on utilizing emerging technologies from the genome to phenome, as well as a panomics approach with data management and integration to develop, test and incorporate new technologies into a breeding program. In addition to streamlining the breeding pipeline for increased genetic gain, this technology will increase the speed of trait and marker identification for use in genome editing, genomic selection and molecular assisted breeding, ultimately producing novel germplasm that will meet the coming challenges of the 21st Century.
Affiliation(s)
- Lucy M. Egan
- CSIRO Agriculture and Food, Narrabri, NSW, Australia
- Xiaoqing Li
- CSIRO Agriculture and Food, Canberra, ACT, Australia
- Zitong Li
- CSIRO Agriculture and Food, Canberra, ACT, Australia
- Shiming Liu
- CSIRO Agriculture and Food, Narrabri, NSW, Australia
- Brett Ross
- Cotton Seed Distributors Ltd., Wee Waa, NSW, Australia
- Demi Sargent
- CSIRO Agriculture and Food, Narrabri, NSW, Australia
- Hawkesbury Institute for the Environment, Western Sydney University, Richmond, NSW, Australia
- Qian-Hao Zhu
- CSIRO Agriculture and Food, Canberra, ACT, Australia
35
Mathew B, Hauptmann A, Léon J, Sillanpää MJ. NeuralLasso: Neural Networks Meet Lasso in Genomic Prediction. FRONTIERS IN PLANT SCIENCE 2022; 13:800161. [PMID: 35574107 PMCID: PMC9100816 DOI: 10.3389/fpls.2022.800161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/22/2021] [Accepted: 03/18/2022] [Indexed: 06/15/2023]
Abstract
Prediction of complex traits based on genome-wide marker information is of central importance for both animal and plant breeding. Numerous models have been proposed for the prediction of complex traits, and considerable effort is still devoted to improving their prediction accuracy, because various genetic factors, such as additive, dominance and epistatic effects, can influence the prediction accuracy of such models. Recently, machine learning (ML) methods have been widely applied for prediction in both animal and plant breeding programs. In this study, we propose a new algorithm for genomic prediction which is based on neural networks but incorporates classical elements of LASSO. Our new method is able to account for local epistasis (higher-order interactions between neighboring markers) in the prediction. We compare the prediction accuracy of our new method with the most commonly used prediction methods, such as BayesA, BayesB, Bayesian Lasso (BL), genomic BLUP and Elastic Net (EN), using the heterogeneous stock mouse and rice field data sets.
Affiliation(s)
- Boby Mathew
- Bayer CropScience, Monheim am Rhein, Germany
- Institute of Crop Science and Resource Conservation, University of Bonn, Bonn, Germany
- Andreas Hauptmann
- Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
- Department of Computer Science, University College London, London, United Kingdom
- Jens Léon
- Institute of Crop Science and Resource Conservation, University of Bonn, Bonn, Germany
- Mikko J. Sillanpää
- Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
36
Robert P, Auzanneau J, Goudemand E, Oury FX, Rolland B, Heumez E, Bouchet S, Le Gouis J, Rincent R. Phenomic selection in wheat breeding: identification and optimisation of factors influencing prediction accuracy and comparison to genomic selection. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2022; 135:895-914. [PMID: 34988629 DOI: 10.1007/s00122-021-04005-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 07/23/2021] [Accepted: 11/23/2021] [Indexed: 05/15/2023]
Abstract
Phenomic selection is a promising alternative or complement to genomic selection in wheat breeding. Models combining spectra from different environments maximise the predictive ability of grain yield and heading date of wheat breeding lines. Phenomic selection (PS) is a recent breeding approach similar to genomic selection (GS) except that genotyping is replaced by near-infrared (NIR) spectroscopy. PS can potentially account for non-additive effects and has the major advantage of being low cost and high throughput. Factors influencing GS predictive abilities have been intensively studied, but little is known about those influencing PS. We tested and compared the abilities of PS and GS to predict grain yield and heading date from several datasets of bread wheat lines corresponding to the first or second years of trial evaluation from two breeding companies and one research institute in France. We evaluated several factors affecting PS predictive abilities, including the possibility of combining spectra collected in different environments. A simple H-BLUP model predicted both traits with predictive abilities from 0.26 to 0.62 and with an efficient computation time. Our results showed that the environment in which lines were grown, and therefore in which the spectra were acquired, had a crucial impact on predictive ability, and that this impact was specific to the trait considered. Models combining NIR spectra from different environments were the best PS models and were at least as accurate as GS in most of the datasets. Furthermore, a GH-BLUP model combining genotyping and NIR spectra was the best model of all (predictive abilities from 0.31 to 0.73). We also demonstrated that, as for GS, the size and the composition of the training set have a crucial impact on predictive ability. PS could therefore replace or complement GS for efficient wheat breeding programs.
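H-BLUP replaces the marker-based relationship matrix of G-BLUP with one computed from NIR spectra, and GH-BLUP combines both sources. A minimal numerical sketch on simulated data, assuming a linear relationship matrix from standardized features, a variance ratio `lam` that is fixed rather than estimated (e.g. by REML), and an equally weighted average of G and H as the GH kernel; the published GH-BLUP model may instead fit separate random effects, and the helper names are hypothetical.

```python
import numpy as np


def relationship(X):
    """Linear relationship matrix Z Z' / p from column-standardized features (n x p -> n x n)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return Z @ Z.T / X.shape[1]


def kernel_blup(K, y, train, test, lam=1.0):
    """Predict test-set values under y = mu + u + e with u ~ N(0, K * s2_u).

    lam = s2_e / s2_u is fixed here (an assumption); mu is approximated by the
    training mean instead of its generalized least-squares estimate.
    """
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)]
    K_vt = K[np.ix_(test, train)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K_vt @ alpha


# Simulated lines: marker dosages, NIR-like spectra partly driven by the markers, and a trait.
rng = np.random.default_rng(42)
n = 150
markers = rng.binomial(2, 0.4, size=(n, 500)).astype(float)
spectra = 0.1 * markers[:, :50] @ rng.normal(size=(50, 200)) + rng.normal(size=(n, 200))
y = 0.3 * markers[:, :50] @ rng.normal(size=50) + rng.normal(size=n)

G = relationship(markers)  # genomic relationship (G-BLUP)
H = relationship(spectra)  # phenomic relationship from spectra (H-BLUP)
kernels = {"G-BLUP": G, "H-BLUP": H, "GH-BLUP": (G + H) / 2}  # equal weights: an assumption

train, test = np.arange(120), np.arange(120, n)
for name, K in kernels.items():
    pred = kernel_blup(K, y, train, test)
    print(name, round(float(np.corrcoef(pred, y[test])[0, 1]), 2))
```

Predictive ability is computed here as the Pearson correlation between predicted and observed values in the test set, a common definition in genomic and phenomic prediction studies.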
Affiliation(s)
- Pauline Robert
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France
- INRAE-Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
- Agri-Obtentions, Ferme de Gauvilliers, 78660, Orsonville, France
- Florimond-Desprez Veuve & Fils SAS, 3 rue Florimond-Desprez, BP 41, 59242, Cappelle-en-Pévèle, France
- Jérôme Auzanneau
- Agri-Obtentions, Ferme de Gauvilliers, 78660, Orsonville, France
- Ellen Goudemand
- Florimond-Desprez Veuve & Fils SAS, 3 rue Florimond-Desprez, BP 41, 59242, Cappelle-en-Pévèle, France
- François-Xavier Oury
- INRAE-Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
- Bernard Rolland
- INRAE-Agrocampus Ouest-Université Rennes 1, UMR1349, IGEPP, Domaine de la Motte, 35653, Le Rheu, France
- Emmanuel Heumez
- INRAE, UE 972, Grandes Cultures Innovation Environnement, 2 Chaussée Brunehaut, 80200, Estrées-Mons, France
- Sophie Bouchet
- INRAE-Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
- Jacques Le Gouis
- INRAE-Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
- Renaud Rincent
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France.
- INRAE-Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France.
37
Van Tassel DL, DeHaan LR, Diaz-Garcia L, Hershberger J, Rubin MJ, Schlautman B, Turner K, Miller AJ. Re-imagining crop domestication in the era of high throughput phenomics. CURRENT OPINION IN PLANT BIOLOGY 2022; 65:102150. [PMID: 34883308 DOI: 10.1016/j.pbi.2021.102150] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/01/2021] [Revised: 10/19/2021] [Accepted: 10/25/2021] [Indexed: 06/13/2023]
Abstract
De novo domestication is an exciting option for increasing species diversity and ecosystem service functionality of agricultural landscapes. Genomic selection (GS), the application of genomic markers to predict phenotypic traits in a breeding population, offers the possibility of rapid genetic improvement, making GS especially attractive for modifying traits of long-lived species. However, for some wild species just entering the domestication pipeline, especially those with large and complex genomes or lacking funding and/or prior genome characterization, GS is often out of reach. High throughput phenomics has the potential to augment traditional pedigree selection, reduce costs and amplify impacts of genomic selection, and even create new predictive selection approaches independent of sequencing or pedigrees.
Affiliation(s)
- Lee R DeHaan
- The Land Institute, 2440 E Water Well Rd., Salina, KS, 67401, USA
- Jenna Hershberger
- The Land Institute, 2440 E Water Well Rd., Salina, KS, 67401, USA; Donald Danforth Plant Science Center, 975 North Warson Road, Saint Louis, MO, 63132, USA
- Matthew J Rubin
- Donald Danforth Plant Science Center, 975 North Warson Road, Saint Louis, MO, 63132, USA
- Kathryn Turner
- The Land Institute, 2440 E Water Well Rd., Salina, KS, 67401, USA
- Allison J Miller
- Donald Danforth Plant Science Center, 975 North Warson Road, Saint Louis, MO, 63132, USA; Saint Louis University Department of Biology, 3507 Laclede Avenue, St. Louis, MO, 63103, USA

38

Abstract
In this chapter, we discuss the motivation for integrating other types of omics data into genomic prediction methods. We give an overview of literature investigating the performance of omics-enhanced predictions, and highlight potential pitfalls when applying these methods in breeding. We emphasize that the statistical methods available for genomic data can be transferred to the general omics case. However, when using a framework of omic relationship matrices, the standardization of the variables may be more relevant than it is for a genomic relationship matrix based on single-nucleotide polymorphisms.
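The following minimal Python sketch shows why standardization can matter more for omic relationship matrices. It assumes the common construction K = WWᵀ/p, where W is the n × p matrix of centered (and optionally standardized) omic features. The function name `omic_relationship_matrix` and the simulated data are hypothetical and are not taken from the chapter. Omic features such as transcript or metabolite abundances can differ in scale by orders of magnitude, so without per-feature scaling a few high-variance features can dominate K.

```python
import numpy as np


def omic_relationship_matrix(omics: np.ndarray, standardize: bool = True) -> np.ndarray:
    """Return the n x n relationship matrix K = W W^T / p for an n x p omics matrix.

    Features (columns) are always centered. With standardize=True they are also
    scaled to unit variance, so every feature contributes equally to K.
    """
    w = omics - omics.mean(axis=0)
    if standardize:
        sd = w.std(axis=0)  # population SD (ddof=0)
        keep = sd > 0       # drop constant features to avoid dividing by zero
        w = w[:, keep] / sd[keep]
    p = w.shape[1]
    return w @ w.T / p


# Hypothetical data: 5 lines x 100 features, one feature on a far larger scale.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 100))
x[:, 0] *= 1000.0

k_raw = omic_relationship_matrix(x, standardize=False)
k_std = omic_relationship_matrix(x, standardize=True)

# Share of the unscaled matrix's diagonal (total variance) contributed by feature 0.
centered0 = x[:, 0] - x[:, 0].mean()
share_raw = (centered0 @ centered0 / x.shape[1]) / np.trace(k_raw)
```

With standardization, the average diagonal element of K is exactly 1 by construction. Without it, the single rescaled feature accounts for most of the diagonal.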
Affiliation(s)
- Johannes W R Martini
- International Maize and Wheat Improvement Center (CIMMYT), Veracruz, CP, Mexico
- Ning Gao
- School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Veracruz, CP, Mexico

39

Dan Z, Chen Y, Li H, Zeng Y, Xu W, Zhao W, He R, Huang W. The metabolomic landscape of rice heterosis highlights pathway biomarkers for predicting complex phenotypes. PLANT PHYSIOLOGY 2021; 187:1011-1025. [PMID: 34608951 PMCID: PMC8491067 DOI: 10.1093/plphys/kiab273] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/22/2021] [Accepted: 05/27/2021] [Indexed: 06/13/2023]
Abstract
Understanding the molecular mechanisms underlying complex phenotypes requires systematic analyses of complicated metabolic networks and contributes to improvements in the breeding efficiency of staple cereal crops and diagnostic accuracy for human diseases. Here, we selected rice (Oryza sativa) heterosis as a complex phenotype and investigated the mechanisms of both vegetative and reproductive traits using an untargeted metabolomics strategy. Heterosis-associated analytes were identified, and the overlapping analytes were shown to underlie the association patterns for six agronomic traits. The heterosis-associated analytes of four yield components and plant height collectively contributed to yield heterosis, and the degree of contribution differed among the five traits. We performed dysregulated network analyses of the high- and low-better parent heterosis hybrids and found multiple types of metabolic pathways involved in heterosis. The metabolite levels of the significantly enriched pathways (especially those from amino acid and carbohydrate metabolism) were predictive of yield heterosis (area under the curve = 0.907 with 10 features), and the predictability of these pathway biomarkers was validated with hybrids across environments and populations. Our findings elucidate the metabolomic landscape of rice heterosis and highlight the potential application of pathway biomarkers in achieving accurate predictions of complex phenotypes.
Affiliation(s)
- Zhiwu Dan
- State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, the Ministry of Agriculture, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Yunping Chen
- State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, the Ministry of Agriculture, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Hui Li
- State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, the Ministry of Agriculture, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Yafei Zeng
- State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, the Ministry of Agriculture, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Wuwu Xu
- State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, the Ministry of Agriculture, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Weibo Zhao
- State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, the Ministry of Agriculture, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Ruifeng He
- Institute of Biological Chemistry, Washington State University, Pullman, Washington 99164-6414, USA
- Wenchao Huang
- State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, the Ministry of Agriculture, College of Life Sciences, Wuhan University, Wuhan 430072, China

40

Haplotype associated RNA expression (HARE) improves prediction of complex traits in maize. PLoS Genet 2021; 17:e1009568. [PMID: 34606492 PMCID: PMC8516254 DOI: 10.1371/journal.pgen.1009568] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/27/2021] [Revised: 10/14/2021] [Accepted: 09/07/2021] [Indexed: 11/19/2022]
Abstract
Genomic prediction typically relies on associations between single-site polymorphisms and traits of interest. This representation of genomic variability has been successful for predicting many complex traits. However, it usually cannot capture the combination of alleles in haplotypes and it has generated little insight about the biological function of polymorphisms. Here we present a novel and cost-effective method for imputing cis haplotype associated RNA expression (HARE), study its transferability across tissues, and evaluate genomic prediction models within and across populations. HARE focuses on tightly linked cis-acting causal variants in the immediate vicinity of the gene, while excluding trans effects from diffusion and metabolism. Therefore, HARE estimates were more transferable across different tissues and populations than measured transcript expression. We also showed that HARE estimates captured one-third of the variation in gene expression. HARE estimates were used in genomic prediction models evaluated within and across two diverse maize panels, a diverse association panel (Goodman Association panel) and a large half-sib panel (Nested Association Mapping panel), for predicting 26 complex traits. HARE resulted in up to 15% higher prediction accuracy than control approaches that preserved haplotype structure, suggesting that HARE carried functional information in addition to information about haplotype structure. The largest increase was observed when the model was trained in the Nested Association Mapping panel and tested in the Goodman Association panel. Additionally, HARE yielded higher within-population prediction accuracy than measured expression values. The accuracy achieved by measured expression was variable across tissues, whereas accuracy by HARE was more stable across tissues. Therefore, imputing RNA expression of genes by haplotype is stable, cost-effective, and transferable across populations.
Genomic marker data is widely used in the prediction of many traits. However, prediction has been primarily carried out within populations and without explicit modeling of RNA or protein expression. In this study, we explored the prediction of field traits within and across populations using estimated RNA expression attributable to only the DNA sequence around a gene. We showed that the estimated RNA expression was more transferable across populations and tissues than measured RNA expression. We improved prediction of field traits up to 15% using estimated gene expression as compared to observed expression or gene sequence alone. Overall, these findings indicate that structural and functional information in the gene sequence is highly transferable.

41

Sharma S, Pinson SRM, Gealy DR, Edwards JD. Genomic prediction and QTL mapping of root system architecture and above-ground agronomic traits in rice (Oryza sativa L.) with a multitrait index and Bayesian networks. G3 (BETHESDA, MD.) 2021; 11:jkab178. [PMID: 34568907 PMCID: PMC8496310 DOI: 10.1093/g3journal/jkab178] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/30/2020] [Accepted: 02/17/2021] [Indexed: 11/13/2022]
Abstract
Root system architecture (RSA) is a crucial factor in resource acquisition and plant productivity. Roots are difficult to phenotype in the field; thus, new tools for predicting phenotype from genotype are particularly valuable for plant breeders aiming to improve RSA. This study identifies quantitative trait loci (QTLs) for RSA and agronomic traits in a rice (Oryza sativa) recombinant inbred line (RIL) population derived from parents with contrasting RSA traits (PI312777 × Katy). The lines were phenotyped for agronomic traits in the field, and separately grown as seedlings on agar plates that were imaged to extract RSA trait measurements. QTLs were discovered from conventional linkage analysis and from a machine learning approach using a Bayesian network (BN) consisting of genome-wide SNP data and phenotypic data. The genomic prediction abilities (GPAs) of multi-QTL models and the BN analysis were compared with those of several standard genomic prediction (GP) methods. We found that GPAs were improved using multitrait (BN) compared to single-trait GP in traits with low to moderate heritability. Two groups of individuals were selected based on GPs and a modified rank sum index (GSRI) indicating their divergence across multiple RSA traits. Selections made on GPs did result in differences between the group means for numerous RSA traits. The ranking accuracy across RSA traits among the individual selected RILs ranged from 0.14 for root volume to 0.59 for lateral root tips. We conclude that the multitrait GP model using BN can in some cases improve the GPA of RSA and agronomic traits, and that the GSRI approach is useful to simultaneously select for a desired set of RSA traits in a segregating population.
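The abstract does not specify the authors' modification of the rank sum index (GSRI). The sketch below therefore shows only a plain, unmodified rank-sum index as a point of reference. Each line is ranked within every trait, with rank 1 the most desirable and tied values sharing the average rank. The ranks are then summed, so a lower total marks a line that is favorable across more traits. The function names, trait names and values are hypothetical. In a genomic-selection setting, the inputs would be predicted genomic values rather than observed phenotypes.

```python
from typing import Dict, List


def average_ranks(values: List[float], descending: bool) -> List[float]:
    """Rank values 1..n (1 = most desirable); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_index(traits: Dict[str, List[float]],
                   higher_is_better: Dict[str, bool]) -> List[float]:
    """Sum each line's per-trait ranks; lower totals indicate better overall lines."""
    n = len(next(iter(traits.values())))
    totals = [0.0] * n
    for name, values in traits.items():
        for idx, r in enumerate(average_ranks(values, descending=higher_is_better[name])):
            totals[idx] += r
    return totals


# Hypothetical predicted values for three lines and two RSA traits.
predicted = {
    "root_volume": [3.0, 5.0, 4.0],
    "lateral_root_tips": [10.0, 30.0, 20.0],
}
higher_is_better = {"root_volume": True, "lateral_root_tips": True}
index = rank_sum_index(predicted, higher_is_better)
```

Summing ranks rather than raw values keeps traits measured on different scales from dominating the index.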
Affiliation(s)
- Santosh Sharma
- Dale Bumpers National Rice Research Center, United States Department of Agriculture-Agricultural Research Service, Stuttgart, AR 72160, USA
- Shannon R M Pinson
- Dale Bumpers National Rice Research Center, United States Department of Agriculture-Agricultural Research Service, Stuttgart, AR 72160, USA
- David R Gealy
- Dale Bumpers National Rice Research Center, United States Department of Agriculture-Agricultural Research Service, Stuttgart, AR 72160, USA
- Jeremy D Edwards
- Dale Bumpers National Rice Research Center, United States Department of Agriculture-Agricultural Research Service, Stuttgart, AR 72160, USA

42

Nakhle F, Harfouche AL. Ready, Steady, Go AI: A practical tutorial on fundamentals of artificial intelligence and its applications in phenomics image analysis. PATTERNS (NEW YORK, N.Y.) 2021; 2:100323. [PMID: 34553170 PMCID: PMC8441561 DOI: 10.1016/j.patter.2021.100323] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/28/2022]
Abstract
High-throughput image-based technologies are now widely used in the rapidly developing field of digital phenomics and are generating ever-increasing amounts and diversity of data. Artificial intelligence (AI) is becoming a game changer in turning the vast seas of data into valuable predictions and insights. However, this requires specialized programming skills and an in-depth understanding of machine learning, deep learning, and ensemble learning algorithms. Here, we attempt to methodically review the usage of different tools, technologies, and services available to the phenomics data community and show how they can be applied to selected problems in explainable AI-based image analysis. This tutorial provides practical and useful resources for novices and experts to harness the potential of the phenomic data in explainable AI-led breeding programs.
Affiliation(s)
- Farid Nakhle
- Department for Innovation in Biological, Agro-food and Forest systems, University of Tuscia, Via S. Camillo de Lellis, Viterbo 01100, Italy
- Antoine L. Harfouche
- Department for Innovation in Biological, Agro-food and Forest systems, University of Tuscia, Via S. Camillo de Lellis, Viterbo 01100, Italy

43

Zhang T, Jiang L, Ruan L, Qian Y, Liang S, Lin F, Lu H, Dai H, Zhao H. Heterotic quantitative trait loci analysis and genomic prediction of seedling biomass-related traits in maize triple testcross populations. PLANT METHODS 2021; 17:85. [PMID: 34330310 PMCID: PMC8325263 DOI: 10.1186/s13007-021-00785-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2020] [Accepted: 07/23/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND Heterosis has been widely used in maize breeding. However, we know little about heterotic quantitative trait loci and their roles in genomic prediction. In this study, we sought to identify heterotic quantitative trait loci for seedling biomass-related traits using a triple testcross design and to compare prediction accuracies obtained by fitting molecular markers and heterotic quantitative trait loci. RESULTS A triple testcross population comprising 366 genotypes was constructed by crossing each of 122 intermated B73 × Mo17 genotypes with B73, Mo17, and B73 × Mo17. The mid-parent heterosis of seedling biomass-related traits (leaf length, leaf width, leaf area, and seedling dry weight) displayed a large range, from less than 50% to ~150%. Relationships among the heterosis values of seedling biomass-related traits were congruent with those among their performances. Based on a linkage map comprising 1631 markers, 14 augmented additive, two augmented dominance, and three dominance × additive epistatic quantitative trait loci for heterosis of seedling biomass-related traits were identified, each individually explaining 4.1-20.5% of the phenotypic variation. All modes of gene action, i.e., additive, partially dominant, dominant, and overdominant, were observed. In addition, ten additive × additive and six dominance × dominance epistatic interactions were identified. Using the general and specific combining ability model, we found that prediction accuracy ranged from 0.29 for leaf length to 0.56 for leaf width. An analysis with different numbers of markers showed that ~800 markers captured nearly the maximum prediction accuracy. When the heterotic quantitative trait loci were incorporated into the model, we found no significant change in prediction accuracy; only leaf length showed a marginal improvement of 1.7%.
CONCLUSIONS Our results demonstrated that the triple testcross design is suitable for detecting heterotic quantitative trait loci and evaluating the prediction accuracy. Seedling leaf width can be used as the representative trait for seedling prediction. The heterotic quantitative trait loci are not necessary for genomic prediction of seedling biomass-related traits.
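Mid-parent heterosis is conventionally expressed as the percentage by which the hybrid (F1) deviates from the mean of its two parents. Better-parent heterosis, used in the rice metabolomics study above, compares the F1 with the better parent instead. The minimal Python sketch below covers both standard definitions and assumes a trait for which larger values are better. The function names and numbers are illustrative and are not data from either study.

```python
def mid_parent_heterosis(f1: float, p1: float, p2: float) -> float:
    """Percent deviation of the hybrid F1 from the mid-parent value (P1 + P2) / 2."""
    mid_parent = (p1 + p2) / 2
    if mid_parent == 0:
        raise ValueError("mid-parent value must be non-zero")
    return 100 * (f1 - mid_parent) / mid_parent


def better_parent_heterosis(f1: float, p1: float, p2: float) -> float:
    """Percent deviation of F1 from the better parent, assuming larger values are better."""
    better = max(p1, p2)
    if better == 0:
        raise ValueError("better-parent value must be non-zero")
    return 100 * (f1 - better) / better


# Illustrative values: parents 80 and 120, hybrid 150.
mph = mid_parent_heterosis(150.0, 80.0, 120.0)
bph = better_parent_heterosis(150.0, 80.0, 120.0)
```

On this scale, a mid-parent heterosis of ~150% corresponds to a hybrid value about 2.5 times the mid-parent value.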
Affiliation(s)
- Tifu Zhang
- Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
- Lu Jiang
- Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Industrial Crops, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
- Long Ruan
- Institute of Tobacco, Anhui Academy of Agricultural Sciences, Hefei, 230001, China
- Yiliang Qian
- Institute of Tobacco, Anhui Academy of Agricultural Sciences, Hefei, 230001, China
- Shuaiqiang Liang
- Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
- Feng Lin
- Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
- Haiyan Lu
- Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
- Huixue Dai
- Nanjing Institute of Vegetable Sciences, Nanjing, 210042, China
- Han Zhao
- Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China

44

Pazhamala LT, Kudapa H, Weckwerth W, Millar AH, Varshney RK. Systems biology for crop improvement. THE PLANT GENOME 2021; 14:e20098. [PMID: 33949787 DOI: 10.1002/tpg2.20098] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 12/05/2020] [Accepted: 03/09/2021] [Indexed: 05/19/2023]
Abstract
In recent years, generation of large-scale data from the genome, transcriptome, proteome, metabolome, epigenome, and other levels has become routine in several plant species. Most of these datasets in different crop species, however, were studied independently; as a result, full insight could not be gained into the molecular basis of complex traits and biological networks. A systems biology approach involving integration of multiple omics data, modeling, and prediction of cellular functions is required to understand the flow of biological information that underlies complex traits. In this context, systems biology with multiomics data integration is crucial and allows a holistic understanding of the dynamic system, in which the different levels of biological organization interact with the external environment to produce a phenotype. Here, we present recent progress in various omics studies and in integrative and systems biology approaches, with a special focus on application to crop improvement. We also discuss the challenges and opportunities in multiomics data integration, modeling, and understanding of the biology of complex traits underpinning yield and stress tolerance in major cereals and legumes.
Affiliation(s)
- Lekha T Pazhamala
- Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, 502 324, India
- Himabindu Kudapa
- Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, 502 324, India
- Wolfram Weckwerth
- Department of Ecogenomics and Systems Biology, University of Vienna, Vienna, Austria
- Vienna Metabolomics Center, University of Vienna, Vienna, Austria
- A Harvey Millar
- ARC Centre of Excellence in Plant Energy Biology and School of Molecular Sciences, The University of Western Australia, Perth, WA, Australia
- Rajeev K Varshney
- Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, 502 324, India
- State Agricultural Biotechnology Centre, Crop Research Innovation Centre, Food Futures Institute, Murdoch University, Murdoch, WA, Australia

45

Jocković M, Jocić S, Cvejić S, Marjanović-Jeromela A, Jocković J, Radanović A, Miladinović D. Genetic Improvement in Sunflower Breeding—Integrated Omics Approach. PLANTS 2021; 10:plants10061150. [PMID: 34200113 PMCID: PMC8228292 DOI: 10.3390/plants10061150] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/30/2021] [Revised: 05/31/2021] [Accepted: 06/01/2021] [Indexed: 01/23/2023]
Abstract
Foresight in climate change and the challenges ahead requires a systematic approach to sunflower breeding that will encompass all available technologies. Desirable genetic variation is scarce, or rather undiscovered, because it has not been sufficiently researched: detecting and designing favorable genetic variation depends largely on thorough genome sequencing through broad and deep resequencing. Basic exploration of genomes is insufficient to gain insight into important physiological and molecular mechanisms unique to crops. Integrating information from genomics, epigenomics, transcriptomics, proteomics, metabolomics and phenomics therefore enables a comprehensive understanding of the molecular mechanisms underlying the architecture of many important quantitative traits. Omics technologies offer novel possibilities for deciphering complex pathways and for molecular profiling at the level of systems biology, and can provide important answers that can be used for more efficient breeding of sunflower. In this review, we present omics profiling approaches in order to address their possibilities and usefulness as potential breeding tools in sunflower genetic improvement.
Affiliation(s)
- Milan Jocković
- Institute of Field and Vegetable Crops, Maksima Gorkog 30, 21000 Novi Sad, Serbia
- Siniša Jocić
- Institute of Field and Vegetable Crops, Maksima Gorkog 30, 21000 Novi Sad, Serbia
- Sandra Cvejić
- Institute of Field and Vegetable Crops, Maksima Gorkog 30, 21000 Novi Sad, Serbia
- Ana Marjanović-Jeromela
- Institute of Field and Vegetable Crops, Maksima Gorkog 30, 21000 Novi Sad, Serbia
- Jelena Jocković
- Department of Biology and Ecology, Faculty of Sciences, University of Novi Sad, Dositeja Obradovića 3, 21000 Novi Sad, Serbia
- Aleksandra Radanović
- Institute of Field and Vegetable Crops, Maksima Gorkog 30, 21000 Novi Sad, Serbia
- Dragana Miladinović
- Institute of Field and Vegetable Crops, Maksima Gorkog 30, 21000 Novi Sad, Serbia

46

Arouisse B, Theeuwen TPJM, van Eeuwijk FA, Kruijer W. Improving Genomic Prediction Using High-Dimensional Secondary Phenotypes. Front Genet 2021; 12:667358. [PMID: 34108993 PMCID: PMC8181460 DOI: 10.3389/fgene.2021.667358] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2021] [Accepted: 04/14/2021] [Indexed: 11/17/2022]
Abstract
In the past decades, genomic prediction has had a large impact on plant breeding. Given the current advances of high-throughput phenotyping and sequencing technologies, it is increasingly common to observe a large number of traits, in addition to the target trait of interest. This raises the important question whether these additional or “secondary” traits can be used to improve genomic prediction for the target trait. With only a small number of secondary traits, this is known to be the case, given sufficiently high heritabilities and genetic correlations. Here we focus on the more challenging situation with a large number of secondary traits, which is increasingly common since the arrival of high-throughput phenotyping. In this case, secondary traits are usually incorporated through additional relatedness matrices. This approach is however infeasible when secondary traits are not measured on the test set, and cannot distinguish between genetic and non-genetic correlations. An alternative direction is to extend the classical selection indices using penalized regression. So far, penalized selection indices have not been applied in a genomic prediction setting, and require plot-level data in order to reliably estimate genetic correlations. Here we aim to overcome these limitations, using two novel approaches. Our first approach relies on a dimension reduction of the secondary traits, using either penalized regression or random forests (LS-BLUP/RF-BLUP). We then compute the bivariate GBLUP with the dimension reduction as secondary trait. For simulated data (with available plot-level data), we also use bivariate GBLUP with the penalized selection index as secondary trait (SI-BLUP). In our second approach (GM-BLUP), we follow existing multi-kernel methods but replace secondary traits by their genomic predictions, with the advantage that genomic prediction is also possible when secondary traits are only measured on the training set. 
For most of our simulated data, SI-BLUP was most accurate, often closely followed by RF-BLUP or LS-BLUP. In real datasets, involving metabolites in Arabidopsis and transcriptomics in maize, no method could substantially improve over univariate prediction when secondary traits were only available on the training set. LS-BLUP and RF-BLUP were most accurate when secondary traits were available also for the test set.
Affiliation(s)
- Bader Arouisse
- Biometris, Wageningen University and Research, Wageningen, Netherlands
- Tom P J M Theeuwen
- Laboratory of Genetics, Wageningen University and Research, Wageningen, Netherlands
- Willem Kruijer
- Biometris, Wageningen University and Research, Wageningen, Netherlands

47

Rohde PD, Kristensen TN, Sarup P, Muñoz J, Malmendal A. Prediction of complex phenotypes using the Drosophila melanogaster metabolome. Heredity (Edinb) 2021; 126:717-732. [PMID: 33510469 PMCID: PMC8102504 DOI: 10.1038/s41437-021-00404-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/18/2020] [Revised: 01/04/2021] [Accepted: 01/04/2021] [Indexed: 01/30/2023]
Abstract
Understanding the genotype-phenotype map and how variation at different levels of biological organization is associated are central topics in modern biology. Fast developments in sequencing technologies and other molecular omic tools enable researchers to obtain detailed information on variation at the DNA level and on intermediate endophenotypes, such as RNA, proteins and metabolites. This can facilitate our understanding of the link between genotypes and molecular and functional organismal phenotypes. Here, we use the Drosophila melanogaster Genetic Reference Panel and nuclear magnetic resonance (NMR) metabolomics to investigate the ability of the metabolome to predict organismal phenotypes. We performed NMR metabolomics on four replicate pools of male flies from each of 170 different isogenic lines. First, our results show that metabolite profiles are variable among the investigated lines and that this variation is highly heritable. Second, we identify genes associated with metabolome variation. Third, using the metabolome gave better prediction accuracies than genomic information for four of five quantitative traits analyzed. Our comprehensive characterization of population-scale diversity of metabolomes and its genetic basis illustrates that metabolites have large potential as predictors of organismal phenotypes. This finding is of great importance, e.g., in human medicine, evolutionary biology and animal and plant breeding.
Affiliation(s)
- Palle Duun Rohde
- Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Torsten Nygaard Kristensen
- Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Department of Animal Science, Aarhus University, Tjele, Denmark
- Pernille Sarup
- Department of Molecular Biology and Genetics, Aarhus University, Tjele, Denmark
- Nordic Seed A/S, Odder, Denmark
- Joaquin Muñoz
- Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Anders Malmendal
- Department of Science and Environment, Roskilde University, Roskilde, Denmark

48

Canales J, Verdejo J, Carrasco-Puga G, Castillo FM, Arenas-M A, Calderini DF. Transcriptome Analysis of Seed Weight Plasticity in Brassica napus. Int J Mol Sci 2021; 22:4449. [PMID: 33923211 PMCID: PMC8123204 DOI: 10.3390/ijms22094449] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/24/2021] [Revised: 04/22/2021] [Accepted: 04/23/2021] [Indexed: 11/17/2022]
Abstract
A critical barrier to improving crop yield is the trade-off between seed weight (SW) and seed number (SN), which has been commonly reported in several crops, including Brassica napus. Despite the agronomic relevance of this issue, the molecular factors involved in the interaction between SW and SN are largely unknown in crops. In this work, we performed a detailed transcriptomic analysis of 48 seed samples obtained from two rapeseed spring genotypes subjected to different source-sink (S-S) ratios in order to examine the relationship between SW and SN under different field conditions. A multifactorial analysis of the RNA-seq data was used to identify a group of 1014 genes exclusively regulated by the S-S ratio. We found that a reduction in the S-S ratio during seed filling induces the expression of genes involved in sucrose transport, seed weight, and stress responses. Moreover, we identified five co-expression modules that are positively correlated with SW and negatively correlated with SN. Interestingly, one of these modules was significantly enriched in transcription factors (TFs). Furthermore, our network analysis predicted several NAC TFs as major hubs underlying SW and SN compensation. Taken together, our study provides novel insights into the molecular factors associated with the SW-SN relationship in rapeseed and identifies TFs as potential targets when improving crop yield.
Collapse
Affiliation(s)
- Javier Canales
- Institute of Biochemistry and Microbiology, Faculty of Sciences, Universidad Austral de Chile, 5110566 Valdivia, Chile
- ANID–Millennium Science Initiative Program-Millennium Institute for Integrative Biology (iBio), 8331150 Santiago, Chile
- José Verdejo
- Graduate School, Faculty of Agricultural Sciences, Universidad Austral de Chile, 5110566 Valdivia, Chile
- Plant Production and Plant Protection Institute, Faculty of Agricultural Sciences, Universidad Austral de Chile, 5110566 Valdivia, Chile
- Gabriela Carrasco-Puga
- Plant Production and Plant Protection Institute, Faculty of Agricultural Sciences, Universidad Austral de Chile, 5110566 Valdivia, Chile
- Francisca M. Castillo
- Institute of Biochemistry and Microbiology, Faculty of Sciences, Universidad Austral de Chile, 5110566 Valdivia, Chile
- ANID–Millennium Science Initiative Program-Millennium Institute for Integrative Biology (iBio), 8331150 Santiago, Chile
- Anita Arenas-M
- Institute of Biochemistry and Microbiology, Faculty of Sciences, Universidad Austral de Chile, 5110566 Valdivia, Chile
- ANID–Millennium Science Initiative Program-Millennium Institute for Integrative Biology (iBio), 8331150 Santiago, Chile
- Daniel F. Calderini
- Plant Production and Plant Protection Institute, Faculty of Agricultural Sciences, Universidad Austral de Chile, 5110566 Valdivia, Chile
49
Urzúa-Traslaviña CG, Leeuwenburgh VC, Bhattacharya A, Loipfinger S, van Vugt MATM, de Vries EGE, Fehrmann RSN. Improving gene function predictions using independent transcriptional components. Nat Commun 2021; 12:1464. [PMID: 33674610 PMCID: PMC7935959 DOI: 10.1038/s41467-021-21671-w] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/12/2020] [Accepted: 02/05/2021] [Indexed: 02/07/2023]
Abstract
The interpretation of high-throughput sequencing data is limited by our incomplete functional understanding of coding and non-coding transcripts. Reliably predicting the function of such transcripts can overcome this limitation. Here we report the use of a consensus independent component analysis and guilt-by-association approach to predict over 23,000 functional groups composed of over 55,000 coding and non-coding transcripts using publicly available transcriptomic profiles. We show that, compared to using Principal Component Analysis, Independent Component Analysis-derived transcriptional components enable more confident functionality predictions, improve predictions when new members are added to the gene sets, and are less affected by gene multi-functionality. Predictions generated using human or mouse transcriptomic data are made available for exploration in a publicly available web portal.
Affiliation(s)
- Carlos G Urzúa-Traslaviña
- Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Vincent C Leeuwenburgh
- Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands; The Stratingh Institute for Chemistry, University of Groningen, Groningen, The Netherlands
- Arkajyoti Bhattacharya
- Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Stefan Loipfinger
- Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Marcel A T M van Vugt
- Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Elisabeth G E de Vries
- Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Rudolf S N Fehrmann
- Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
50
Scossa F, Alseekh S, Fernie AR. Integrating multi-omics data for crop improvement. JOURNAL OF PLANT PHYSIOLOGY 2021; 257:153352. [PMID: 33360148 DOI: 10.1016/j.jplph.2020.153352] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 11/01/2020] [Revised: 12/13/2020] [Accepted: 12/14/2020] [Indexed: 05/26/2023]
Abstract
Our agricultural systems now urgently need to secure food for a growing world population. To meet this challenge, we need a better characterization of plant genetic and phenotypic diversity. The combination of genomics, transcriptomics and metabolomics enables a deeper understanding of the mechanisms underlying the complex architecture of many phenotypic traits of agricultural relevance. We review recent advances in plant genomics to see how these can be integrated with broad molecular profiling approaches to improve our understanding of plant phenotypic variation and inform crop breeding strategies.
Affiliation(s)
- Federico Scossa
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, 14476, Potsdam, Golm, Germany; Council for Agricultural Research and Economics (CREA), Research Centre for Genomics and Bioinformatics (CREA-GB), 00178, Rome, Italy.
- Saleh Alseekh
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, 14476, Potsdam, Golm, Germany; Center of Plant Systems Biology and Biotechnology (CPSBB), Plovdiv, Bulgaria
- Alisdair R Fernie
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, 14476, Potsdam, Golm, Germany; Center of Plant Systems Biology and Biotechnology (CPSBB), Plovdiv, Bulgaria