1
Pierotti S, Fitzgerald T, Birney E. FlexLMM: a Nextflow linear mixed model framework for GWAS. Bioinformatics 2024; 41:btaf021. [PMID: 39814073 PMCID: PMC11783306 DOI: 10.1093/bioinformatics/btaf021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Revised: 12/10/2024] [Accepted: 01/13/2025] [Indexed: 01/18/2025]
Abstract
SUMMARY Linear mixed models (LMMs) are a commonly used statistical approach in genome-wide association studies when population structure is present. However, naive permutations of the phenotype to empirically estimate the null distribution of a statistic of interest are not appropriate in the presence of population structure or covariates. This is because the samples are not exchangeable with each other under the null hypothesis, and because permuting the phenotypes breaks the relationship between the phenotypes and any covariates. For this reason, we developed FlexLMM, a Nextflow pipeline that can perform appropriate permutations in LMMs while allowing for flexibility in the definition of the exact statistical model to be used. FlexLMM can set a significance threshold via permutations, thanks to a two-step process where the population structure is first regressed out, and only then are the permutations performed on the uncorrelated residuals. We envision this pipeline will be particularly useful for researchers working on multi-parental crosses among inbred lines of model organisms or farm animals and plants. AVAILABILITY AND IMPLEMENTATION The source code and documentation for FlexLMM are available at https://github.com/birneylab/flexlmm.
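The two-step idea in the abstract above (remove the population structure, then permute uncorrelated residuals) can be illustrated with a minimal standard-library sketch. It is not taken from the FlexLMM code base. It assumes the phenotype covariance matrix `cov` (for example a kinship-based covariance with variance components already estimated) and the fixed-effect fit `fitted` are given. The function names are hypothetical, and mapping the permuted residuals back through the Cholesky factor is an assumption of this sketch rather than something the abstract states.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L x = b for a lower-triangular matrix L."""
    x = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def permute_phenotype(y, fitted, cov, rng):
    """Hypothetical helper: permute a phenotype without breaking its covariance structure.

    Step 1: subtract the fixed-effect fit and decorrelate the residuals with L^-1.
    Step 2: shuffle the decorrelated residuals, which are exchangeable under the null.
    Assumption of this sketch: re-apply L and add the fit back to obtain a phenotype
    that can be fed to the same association test.
    """
    L = cholesky(cov)
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    white = forward_solve(L, resid)
    rng.shuffle(white)
    recolored = matvec(L, white)
    return [fi + ri for fi, ri in zip(fitted, recolored)]
```

Shuffling `y` directly would instead treat related samples as exchangeable, which is the problem the abstract describes.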
Affiliation(s)
- Saul Pierotti
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge CB10 1SD, United Kingdom
- Tomas Fitzgerald
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge CB10 1SD, United Kingdom
- Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge CB10 1SD, United Kingdom
2
Hlaing MM, Win KT, Yasui H, Yoshimura A, Yamagata Y. A genome-wide association study using Myanmar indica diversity panel reveals a significant genomic region associated with heading date in rice. BREEDING SCIENCE 2024; 74:415-426. [PMID: 39897663 PMCID: PMC11780332 DOI: 10.1270/jsbbs.23083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 07/29/2024] [Indexed: 02/04/2025]
Abstract
Heading date is a key agronomic trait for adapting rice varieties to different growing areas and crop seasons. The genetic mechanism of heading date in Myanmar rice accessions was investigated using a genome-wide association study (GWAS) in a 250-variety Myanmar indica diversity panel (MIDP) collected from different geographical regions. Using the days-to-heading data collected in 2019 and 2020, a major genomic region associated with heading date, designated MTA3, was found on chromosome 3. The linkage disequilibrium block of MTA3 contained the coding sequence (CDS) of the phytochrome gene PhyC but not its promoter region. Haplotype analysis of the 2-kb promoter and gene regions of PhyC revealed six haplotypes, PHYCHapA, B, C, D, E, and F. The most prominent haplotypes, PHYCHapA and PHYCHapC, had different CDS and were associated with late heading and early heading phenotypes in MIDP, respectively. The difference in CDS effects between PHYCHapB, which has a CDS identical to that of PHYCHapA, and PHYCHapC was validated by QTL analysis using an F2 population. The distribution of PHYCHapA in the southern coastal and delta regions and of PHYCHapC in the northern highlands appears to ensure heading at the appropriate time in each area under the local day-length conditions in Myanmar. The natural variation in PhyC would be a major determinant of heading date in Myanmar accessions.
Affiliation(s)
- Moe Moe Hlaing
- Plant Breeding Laboratory, Faculty of Agriculture, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
- Khin Thanda Win
- Plant Breeding Laboratory, Faculty of Agriculture, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
- Hideshi Yasui
- Plant Breeding Laboratory, Faculty of Agriculture, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
- Atsushi Yoshimura
- Plant Breeding Laboratory, Faculty of Agriculture, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
- Yoshiyuki Yamagata
- Plant Breeding Laboratory, Faculty of Agriculture, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
3
John M, Korte A, Todesco M, Grimm DG. Population-aware permutation-based significance thresholds for genome-wide association studies. BIOINFORMATICS ADVANCES 2024; 4:vbae168. [PMID: 39678204 PMCID: PMC11639184 DOI: 10.1093/bioadv/vbae168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2024] [Revised: 10/02/2024] [Accepted: 10/25/2024] [Indexed: 12/17/2024]
Abstract
Motivation Permutation-based significance thresholds have been shown to be a robust alternative to classical Bonferroni significance thresholds in genome-wide association studies (GWAS) for skewed phenotype distributions. The recently published method permGWAS introduced a batch-wise approach to efficiently compute permutation-based GWAS. However, running multiple univariate tests in parallel leads to many repetitive computations and increased computational resources. More importantly, traditional permutation methods that permute only the phenotype break the underlying population structure. Results We propose permGWAS2, an improved method that does not break the population structure during permutations and uses an elegant block matrix decomposition to optimize computations, thereby reducing redundancies. We show on synthetic data that this improved approach yields a lower false discovery rate for skewed phenotype distributions compared to the previous version and the commonly used Bonferroni correction. In addition, we re-analyze a dataset covering phenotypic variation in 86 traits in a population of 615 wild sunflowers (Helianthus annuus L.). This led to the identification of dozens of novel associations with putatively adaptive traits, and removed several likely false-positive associations with limited biological support. Availability and implementation permGWAS2 is open-source and publicly available on GitHub for download: https://github.com/grimmlab/permGWAS.
Affiliation(s)
- Maura John
- Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, 94315 Straubing, Germany
- Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, 94315 Straubing, Germany
- Arthur Korte
- Faculty of Biology, University of Würzburg, 97074 Würzburg, Germany
- Marco Todesco
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Department of Biology, University of British Columbia, Kelowna, BC V1V 1V7, Canada
- Dominik G Grimm
- Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, 94315 Straubing, Germany
- Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, 94315 Straubing, Germany
- Technical University of Munich, TUM School of Computation, Information and Technology, 85748 Garching, Germany
4
Torkler P, Sauer M, Schwartz U, Corbacioglu S, Sommer G, Heise T. LoDEI: a robust and sensitive tool to detect transcriptome-wide differential A-to-I editing in RNA-seq data. Nat Commun 2024; 15:9121. [PMID: 39443485 PMCID: PMC11500352 DOI: 10.1038/s41467-024-53298-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Accepted: 10/02/2024] [Indexed: 10/25/2024]
Abstract
RNA editing is a highly conserved process. Adenosine deaminase acting on RNA (ADAR) mediated deamination of adenosine (A-to-I editing) is associated with human disease and immune checkpoint control. Functional implications of A-to-I editing are currently of broad interest to academic and industrial research as underscored by the fast-growing number of clinical studies applying base editors as therapeutic tools. Analyzing the dynamics of A-to-I editing, in a biological or therapeutic context, requires the sensitive detection of differential A-to-I editing, a currently unmet need. We introduce the local differential editing index (LoDEI) to detect differential A-to-I editing in RNA-seq datasets using a sliding-window approach coupled with an empirical q value calculation that detects more A-to-I editing sites at the same false-discovery rate compared to existing methods. LoDEI is validated on known and novel datasets revealing that the oncogene MYCN increases and that a specific small non-coding RNA reduces A-to-I editing.
Affiliation(s)
- Phillipp Torkler
- Faculty of Computer Science, Deggendorf Institute of Technology, Dieter-Görlitz-Platz 1, Deggendorf, 94469, Bavaria, Germany
- Marina Sauer
- Department for Pediatric Hematology, Oncology and Stem Cell Transplantation, University Hospital Regensburg, Franz-Josef-Strauß-Allee 11, Regensburg, 93053, Bavaria, Germany
- Uwe Schwartz
- NGS Analysis Center, University of Regensburg, Universitätsstraße 31, Regensburg, 93053, Bavaria, Germany
- Selim Corbacioglu
- Department for Pediatric Hematology, Oncology and Stem Cell Transplantation, University Hospital Regensburg, Franz-Josef-Strauß-Allee 11, Regensburg, 93053, Bavaria, Germany
- Gunhild Sommer
- Department for Pediatric Hematology, Oncology and Stem Cell Transplantation, University Hospital Regensburg, Franz-Josef-Strauß-Allee 11, Regensburg, 93053, Bavaria, Germany
- Tilman Heise
- Department for Pediatric Hematology, Oncology and Stem Cell Transplantation, University Hospital Regensburg, Franz-Josef-Strauß-Allee 11, Regensburg, 93053, Bavaria, Germany
5
Freda PJ, Ghosh A, Bhandary P, Matsumoto N, Chitre AS, Zhou J, Hall MA, Palmer AA, Obafemi-Ajayi T, Moore JH. PAGER: A novel genotype encoding strategy for modeling deviations from additivity in complex trait association studies. BioData Min 2024; 17:41. [PMID: 39394173 PMCID: PMC11468469 DOI: 10.1186/s13040-024-00393-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Accepted: 09/30/2024] [Indexed: 10/13/2024]
Abstract
BACKGROUND The additive model of inheritance assumes that heterozygotes (Aa) are exactly intermediate in respect to homozygotes (AA and aa). While this model is commonly used in single-locus genetic association studies, significant deviations from additivity are well-documented and contribute to phenotypic variance across many traits and systems. This assumption can introduce type I and type II errors by overestimating or underestimating the effects of variants that deviate from additivity. Alternative genotype encoding strategies have been explored to account for different inheritance patterns, but they often incur significant computational or methodological costs. To address these challenges, we introduce PAGER (Phenotype Adjusted Genotype Encoding and Ranking), an efficient pre-processing method that encodes each genetic variant based on normalized mean phenotypic differences between diallelic genotype classes (AA, Aa, and aa). This approach more accurately reflects each variant's true inheritance model, improving model precision while minimizing the costs associated with alternative encoding strategies. RESULTS Through extensive benchmarking on SNPs simulated with both binary and continuous phenotypes, we demonstrate that PAGER accurately represents various inheritance patterns (including additive, dominant, recessive, and heterosis), achieves levels of statistical power that meet or exceed other encoding strategies, and attains computation speeds up to 55 times faster than a similar method, EDGE. We also apply PAGER to publicly available real-world data and identify a novel, relevant putative QTL associated with body mass index in rats (Rattus norvegicus) that is not detected with the additive model. CONCLUSIONS Overall, we show that PAGER is an efficient genotype encoding approach that can uncover sources of missing heritability and reveal novel insights in the study of complex traits while incurring minimal costs.
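The encoding idea described above (one value per genotype class, derived from normalized mean phenotypic differences) can be sketched roughly as follows. The abstract does not give PAGER's exact normalization. This sketch assumes differences are taken relative to an anchor class (`AA`) and then min-max scaled to [0, 1]. The function name and both choices are illustrative assumptions, not the published algorithm.

```python
from statistics import mean


def phenotype_adjusted_encoding(genotypes, phenotypes, anchor="AA"):
    """Hypothetical sketch: map each genotype class to a phenotype-derived code.

    genotypes  -- list of class labels such as "AA", "Aa", "aa"
    phenotypes -- list of numeric phenotype values, same length
    anchor     -- reference class for the differences (assumed here; it must
                  occur in `genotypes`, otherwise a KeyError is raised)
    """
    by_class = {}
    for g, p in zip(genotypes, phenotypes):
        by_class.setdefault(g, []).append(p)

    # Mean phenotype per genotype class, expressed relative to the anchor class.
    means = {g: mean(values) for g, values in by_class.items()}
    diffs = {g: m - means[anchor] for g, m in means.items()}

    # Min-max scaling is an assumption of this sketch, not a documented PAGER step.
    lo, hi = min(diffs.values()), max(diffs.values())
    span = hi - lo
    if span == 0:
        return {g: 0.0 for g in diffs}
    return {g: (d - lo) / span for g, d in diffs.items()}
```

With an additive effect the heterozygote lands midway between the homozygotes, while with a dominant effect it receives the same code as one homozygote, so the encoding follows the observed inheritance pattern rather than assuming 0/1/2.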
Affiliation(s)
- Philip J Freda
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vincente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Attri Ghosh
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vincente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Priyanka Bhandary
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vincente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Nicholas Matsumoto
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vincente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Apurva S Chitre
- Department of Psychiatry, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093-0667, USA
- Jiayan Zhou
- Department of Medicine, Stanford University School of Medicine, 291 Campus Dr., Li Ka Shing Building, Stanford, CA, 94305, USA
- Molly A Hall
- Department of Genetics and Institute for Biomedical Informatics, University of Pennsylvania, 3700 Hamilton Walk, Richards Building A301, Philadelphia, PA, 19104, USA
- Abraham A Palmer
- Department of Psychiatry, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093-0667, USA
- Institute for Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093-0667, USA
- Tayo Obafemi-Ajayi
- Cooperative Engineering Program, Missouri State University, 901 S. National Ave, Springfield, MO, 65897, USA
- Jason H Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vincente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA.
6
John M, Korte A, Grimm DG. The benefits of permutation-based genome-wide association studies. JOURNAL OF EXPERIMENTAL BOTANY 2024; 75:5377-5389. [PMID: 38954539 PMCID: PMC11389838 DOI: 10.1093/jxb/erae280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 07/01/2024] [Indexed: 07/04/2024]
Abstract
Linear mixed models (LMMs) are a commonly used method for genome-wide association studies (GWAS) that aim to detect associations between genetic markers and phenotypic measurements in a population of individuals while accounting for population structure and cryptic relatedness. In a standard GWAS, hundreds of thousands to millions of statistical tests are performed, requiring control for multiple hypothesis testing. Typically, static corrections that penalize the number of tests performed are used to control for the family-wise error rate, which is the probability of making at least one false positive. However, it has been shown that in practice this threshold is too conservative for normally distributed phenotypes and not stringent enough for non-normally distributed phenotypes. Therefore, permutation-based LMM approaches have recently been proposed to provide a more realistic threshold that takes phenotypic distributions into account. In this work, we discuss the advantages of permutation-based GWAS approaches, including new simulations and results from a re-analysis of all publicly available Arabidopsis phenotypes from the AraPheno database.
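The contrast drawn above between a static correction and a permutation-based threshold can be made concrete with a small sketch. It assumes that each permutation of the GWAS has already been reduced to its smallest p-value, and takes the threshold as a lower quantile of those minima, which is one common way to target a family-wise error rate. The function names and the exact quantile convention are assumptions of this sketch, not the authors' implementation.

```python
def bonferroni_threshold(alpha, n_tests):
    """Static family-wise threshold: depends only on the number of tests."""
    return alpha / n_tests


def permutation_threshold(min_pvalues, alpha):
    """Empirical family-wise threshold from per-permutation minimum p-values.

    min_pvalues -- one value per permutation: the smallest p-value over all markers
    alpha       -- target family-wise error rate
    Returns the k-th smallest minimum with k = floor(alpha * n_permutations),
    using at least the smallest value (a convention assumed for this sketch).
    """
    if not min_pvalues:
        raise ValueError("at least one permutation is required")
    ordered = sorted(min_pvalues)
    k = max(int(alpha * len(ordered)) - 1, 0)
    return ordered[k]
```

Unlike `bonferroni_threshold`, the permutation threshold depends on the data through `min_pvalues`, so it adapts to the phenotype distribution and to correlation among markers.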
Affiliation(s)
- Maura John
- Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Petersgasse 18, 94315 Straubing, Germany
- Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Petersgasse 18, 94315 Straubing, Germany
- Arthur Korte
- University of Würzburg, Faculty of Biology, Julius-von-Sachs Institute, Julius-von-Sachs-Platz 3, 97082 Würzburg, Germany
- Dominik G Grimm
- Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Petersgasse 18, 94315 Straubing, Germany
- Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Petersgasse 18, 94315 Straubing, Germany
- Technical University of Munich, TUM School of Computation, Information and Technology, Boltzmannstraße 3, 85748 Garching, Germany
7
Zhou J, Yu JZ, Zhu MY, Yang FX, Hao JP, He Y, Zhu XL, Hou ZC, Zhu F. Genome-Wide Association Analysis and Genetic Parameters for Egg Production Traits in Peking Ducks. Animals (Basel) 2024; 14:1891. [PMID: 38998005 PMCID: PMC11240742 DOI: 10.3390/ani14131891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Revised: 06/08/2024] [Accepted: 06/12/2024] [Indexed: 07/14/2024]
Abstract
Egg production traits are crucial in the poultry industry, including age at first egg (AFE), egg number (EN) at different stages, and laying rate (LR). Ducks exhibit higher egg production capacity than other poultry species, but the genetic mechanisms are still poorly understood. In this study, we collected egg-laying data of 618 Peking ducks from 22 to 66 weeks of age and genotyped them by whole-genome resequencing. Genetic parameters were calculated based on SNPs, and a genome-wide association study (GWAS) was performed for these traits. The SNP-based heritability of egg production traits ranged from 0.09 to 0.54. The GWAS identified nine significant SNP loci associated with AFE and egg number from 22 to 66 weeks. These loci showed that the corresponding alleles were positively correlated with a decrease in the traits. Moreover, three potential candidate genes (ENSAPLG00020011445, ENSAPLG00020012564, TMEM260) were identified. Functional enrichment analyses suggest that specific immune responses may have a critical impact on egg production capacity by influencing ovarian function and oocyte maturation processes. In conclusion, this study deepens the understanding of egg-laying genetics in Peking duck and provides a sound theoretical basis for future genetic improvement and genomic selection strategies in poultry.
Affiliation(s)
- Jun Zhou
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Jiang-Zhou Yu
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Mei-Yi Zhu
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Fang-Xi Yang
- Beijing Nankou Duck Breeding Technology Co., Ltd., Beijing 102202, China
- Jin-Ping Hao
- Beijing Nankou Duck Breeding Technology Co., Ltd., Beijing 102202, China
- Yong He
- Cherry Valley Breeding Technology Co., Ltd., Beijing 100088, China
- Xiao-Liang Zhu
- Cherry Valley Breeding Technology Co., Ltd., Beijing 100088, China
- Zhuo-Cheng Hou
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Feng Zhu
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
8
Chang-Brahim I, Koppensteiner LJ, Beltrame L, Bodner G, Saranti A, Salzinger J, Fanta-Jende P, Sulzbachner C, Bruckmüller F, Trognitz F, Samad-Zamini M, Zechner E, Holzinger A, Molin EM. Reviewing the essential roles of remote phenotyping, GWAS and explainable AI in practical marker-assisted selection for drought-tolerant winter wheat breeding. FRONTIERS IN PLANT SCIENCE 2024; 15:1319938. [PMID: 38699541 PMCID: PMC11064034 DOI: 10.3389/fpls.2024.1319938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 03/13/2024] [Indexed: 05/05/2024]
Abstract
Marker-assisted selection (MAS) plays a crucial role in crop breeding improving the speed and precision of conventional breeding programmes by quickly and reliably identifying and selecting plants with desired traits. However, the efficacy of MAS depends on several prerequisites, with precise phenotyping being a key aspect of any plant breeding programme. Recent advancements in high-throughput remote phenotyping, facilitated by unmanned aerial vehicles coupled to machine learning, offer a non-destructive and efficient alternative to traditional, time-consuming, and labour-intensive methods. Furthermore, MAS relies on knowledge of marker-trait associations, commonly obtained through genome-wide association studies (GWAS), to understand complex traits such as drought tolerance, including yield components and phenology. However, GWAS has limitations that artificial intelligence (AI) has been shown to partially overcome. Additionally, AI and its explainable variants, which ensure transparency and interpretability, are increasingly being used as recognised problem-solving tools throughout the breeding process. Given these rapid technological advancements, this review provides an overview of state-of-the-art methods and processes underlying each MAS, from phenotyping, genotyping and association analyses to the integration of explainable AI along the entire workflow. In this context, we specifically address the challenges and importance of breeding winter wheat for greater drought tolerance with stable yields, as regional droughts during critical developmental stages pose a threat to winter wheat production. Finally, we explore the transition from scientific progress to practical implementation and discuss ways to bridge the gap between cutting-edge developments and breeders, expediting MAS-based winter wheat breeding for drought tolerance.
Affiliation(s)
- Ignacio Chang-Brahim
- Unit Bioresources, Center for Health & Bioresources, AIT Austrian Institute of Technology, Tulln, Austria
- Lorenzo Beltrame
- Unit Assistive and Autonomous Systems, Center for Vision, Automation & Control, AIT Austrian Institute of Technology, Vienna, Austria
- Gernot Bodner
- Department of Crop Sciences, Institute of Agronomy, University of Natural Resources and Life Sciences Vienna, Tulln, Austria
- Anna Saranti
- Human-Centered AI Lab, Department of Forest- and Soil Sciences, Institute of Forest Engineering, University of Natural Resources and Life Sciences Vienna, Vienna, Austria
- Jules Salzinger
- Unit Assistive and Autonomous Systems, Center for Vision, Automation & Control, AIT Austrian Institute of Technology, Vienna, Austria
- Phillipp Fanta-Jende
- Unit Assistive and Autonomous Systems, Center for Vision, Automation & Control, AIT Austrian Institute of Technology, Vienna, Austria
- Christoph Sulzbachner
- Unit Assistive and Autonomous Systems, Center for Vision, Automation & Control, AIT Austrian Institute of Technology, Vienna, Austria
- Felix Bruckmüller
- Unit Assistive and Autonomous Systems, Center for Vision, Automation & Control, AIT Austrian Institute of Technology, Vienna, Austria
- Friederike Trognitz
- Unit Bioresources, Center for Health & Bioresources, AIT Austrian Institute of Technology, Tulln, Austria
- Elisabeth Zechner
- Verein zur Förderung einer nachhaltigen und regionalen Pflanzenzüchtung, Zwettl, Austria
- Andreas Holzinger
- Human-Centered AI Lab, Department of Forest- and Soil Sciences, Institute of Forest Engineering, University of Natural Resources and Life Sciences Vienna, Vienna, Austria
- Eva M. Molin
- Unit Bioresources, Center for Health & Bioresources, AIT Austrian Institute of Technology, Tulln, Austria
- Human-Centered AI Lab, Department of Forest- and Soil Sciences, Institute of Forest Engineering, University of Natural Resources and Life Sciences Vienna, Vienna, Austria
9
Liu S, Cheng H, Zhang Y, He M, Zuo D, Wang Q, Lv L, Lin Z, Song G. Fingerprint Finder: Identifying Genomic Fingerprint Sites in Cotton Cohorts for Genetic Analysis and Breeding Advancement. Genes (Basel) 2024; 15:378. [PMID: 38540437 PMCID: PMC10970022 DOI: 10.3390/genes15030378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 03/17/2024] [Accepted: 03/18/2024] [Indexed: 06/14/2024]
Abstract
Genomic data in Gossypium provide numerous data resources for the cotton genomics community. However, to fill the gap between genomic analysis and breeding field work, detecting the featured genomic items of a subset cohort is essential for geneticists. We developed FPFinder v1.0 software to identify the fingerprint genomic sites of a subset of a cohort. FPFinder was developed based on the term frequency-inverse document frequency algorithm. With the short-read sequencing of an elite cotton pedigree, we identified 453 pedigree fingerprint genomic sites and found that these pedigree-featured sites had a role in cotton development. In addition, we applied FPFinder to evaluate the geographical bias of fiber-length-related genomic sites from a modern cotton cohort consisting of 410 accessions. Enrichment of elite sites in cultivars from the Yangtze River region resulted in the longer fiber length of Yangtze River-sourced accessions. Apart from characterizing functional sites, we also identified 12,536 region-specific genomic sites. Combining the transcriptome data of multiple tissues and samples under various abiotic stresses, we found that several region-specific sites contributed to environmental adaptation. In this research, FPFinder revealed the role of the cotton pedigree fingerprint and region-specific sites in cotton development and environmental adaptation, respectively. FPFinder can be applied broadly in other crops and contribute to genetic breeding in the future.
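The term frequency-inverse document frequency (TF-IDF) scoring named in the abstract can be sketched generically as follows. How FPFinder maps genomic sites and cohorts onto "terms" and "documents" is not stated here, so the mapping in this sketch (each cohort is a document, each observed site identifier is a term) is purely hypothetical, as are the function name and the unsmoothed `log(N / df)` form of the inverse document frequency.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Score terms per document with plain TF-IDF.

    documents -- dict mapping a document name (here: a cohort, by assumption)
                 to a list of terms (here: site identifiers, by assumption)
    Returns a dict: name -> {term: tf * idf}.
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for terms in documents.values():
        doc_freq.update(set(terms))

    scores = {}
    for name, terms in documents.items():
        counts = Counter(terms)
        total = len(terms)
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores
```

A term that appears in every document scores zero, while a term that is frequent in one document and absent from the others scores highest, which is the property that makes the measure suitable for picking out features characteristic of a subset.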
Affiliation(s)
- Shang Liu
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang 455000, China
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Hailiang Cheng
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Zhengzhou Research Base, Zhengzhou University, Zhengzhou 450001, China
- Youping Zhang
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Man He
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Dongyun Zuo
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Qiaolian Wang
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Limin Lv
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Zhongxv Lin
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Guoli Song
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Zhengzhou Research Base, Zhengzhou University, Zhengzhou 450001, China
10
Reichelt N, Korte A, Krischke M, Mueller MJ, Maag D. Natural variation of warm temperature-induced raffinose accumulation identifies TREHALOSE-6-PHOSPHATE SYNTHASE 1 as a modulator of thermotolerance. PLANT, CELL & ENVIRONMENT 2023; 46:3392-3404. [PMID: 37427798 DOI: 10.1111/pce.14664] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/21/2023] [Revised: 06/27/2023] [Accepted: 06/28/2023] [Indexed: 07/11/2023]
Abstract
High-temperature stress limits plant growth and reproduction. Exposure to high temperature, however, also elicits a physiological response, which protects plants from the damage evoked by heat. This response involves a partial reconfiguration of the metabolome including the accumulation of the trisaccharide raffinose. In this study, we explored the intraspecific variation of warm temperature-induced raffinose accumulation as a metabolic marker for temperature responsiveness with the aim to identify genes that contribute to thermotolerance. By combining raffinose measurements in 250 Arabidopsis thaliana accessions following a mild heat treatment with genome-wide association studies, we identified five genomic regions that were associated with the observed trait variation. Subsequent functional analyses confirmed a causal relationship between TREHALOSE-6-PHOSPHATE SYNTHASE 1 (TPS1) and warm temperature-dependent raffinose synthesis. Moreover, complementation of the tps1-1 null mutant with functionally distinct TPS1 isoforms differentially affected carbohydrate metabolism under more severe heat stress. While higher TPS1 activity was associated with reduced endogenous sucrose levels and thermotolerance, disruption of trehalose 6-phosphate signalling resulted in higher accumulation of transitory starch and sucrose and was associated with enhanced heat resistance. Taken together, our findings suggest a role of trehalose 6-phosphate in thermotolerance, most likely through its regulatory function in carbon partitioning and sucrose homoeostasis.
Affiliation(s)
- Niklas Reichelt
  - Department of Pharmaceutical Biology, Julius-von-Sachs-Institute of Biosciences, University of Würzburg, Würzburg, Germany
- Arthur Korte
  - Center for Computational and Theoretical Biology, University of Würzburg, Würzburg, Germany
- Markus Krischke
  - Department of Pharmaceutical Biology, Julius-von-Sachs-Institute of Biosciences, University of Würzburg, Würzburg, Germany
- Martin J Mueller
  - Department of Pharmaceutical Biology, Julius-von-Sachs-Institute of Biosciences, University of Würzburg, Würzburg, Germany
- Daniel Maag
  - Department of Pharmaceutical Biology, Julius-von-Sachs-Institute of Biosciences, University of Würzburg, Würzburg, Germany
|
11
|
Kim M, Munyaneza JP, Cho E, Jang A, Jo C, Nam KC, Choo HJ, Lee JH. Genome-Wide Association Study on the Content of Nucleotide-Related Compounds in Korean Native Chicken Breast Meat. Animals (Basel) 2023; 13:2966. [PMID: 37760369 PMCID: PMC10525433 DOI: 10.3390/ani13182966] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/10/2023] [Revised: 09/12/2023] [Accepted: 09/18/2023] [Indexed: 09/29/2023] Open
Abstract
Meat flavor is an important factor that influences the palatability of chicken meat. Inosine 5'-monophosphate (IMP), inosine, and hypoxanthine are nucleotide-related compounds that serve as taste-active compounds, mainly enhancing flavor in muscle tissue. For this study, we performed a genome-wide association study (GWAS) using a mixed linear model to identify single-nucleotide polymorphisms (SNPs) that are significantly associated with changes in the contents of the nucleotide-related compounds of breast meat in the Korean native chicken (KNC) population. The genomic region on chicken chromosome 5 containing an SNP (rs316338889) was significantly (p < 0.05) associated with all three traits. The trait-related candidate genes located in this significant genomic region were investigated by performing a functional enrichment analysis and a protein-protein interaction (PPI) database search. We found six candidate genes whose functions possibly affect the content of nucleotide-related compounds in the muscle, namely, the TNNT3 and TNNT2 genes that regulate muscle contractions; the INS, IGF2, and DUSP8 genes associated with insulin sensitivity; and the C5NT1AL gene that is presumably related to the nucleotide metabolism process. This study is the first of its kind to find candidate genes associated with the content of all three types of nucleotide-related compounds in chicken meat using GWAS. The candidate genes identified in this study can be used for genomic selection to breed better-quality chickens in the future.
Affiliation(s)
- Minjun Kim
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Republic of Korea
- Jean Pierre Munyaneza
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Republic of Korea
- Eunjin Cho
  - Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Republic of Korea
- Aera Jang
  - Department of Applied Animal Science, College of Animal Life Science, Kangwon National University, Chuncheon 24341, Republic of Korea
- Cheorun Jo
  - Department of Agricultural Biotechnology, Center for Food and Bioconvergence, Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul 08826, Republic of Korea
- Ki-Chang Nam
  - Department of Animal Science and Technology, Sunchon National University, Suncheon 57922, Republic of Korea
- Hyo Jun Choo
  - Poultry Research Institute, National Institute of Animal Science, Rural Development Administration, Pyeongchang 25342, Republic of Korea
- Jun Heon Lee
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Republic of Korea
  - Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Republic of Korea
|
12
|
Sypniewski M, Szydlowski M. A Study of 41 Canine Orthologues of Human Genes Involved in Monogenic Obesity Reveals Marker in the ADCY3 for Body Weight in Labrador Retrievers. Vet Sci 2023; 10:390. [PMID: 37368776 DOI: 10.3390/vetsci10060390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Revised: 05/05/2023] [Accepted: 05/06/2023] [Indexed: 06/29/2023] Open
Abstract
Obesity and overweight are common conditions in dogs, but individual susceptibility varies with numerous risk factors, including diet, age, sterilization, and gender. In addition to environmental and biological factors, genetic and epigenetic risk factors can influence predisposition to canine obesity; however, they remain unknown. Labrador Retrievers are one of the breeds that are prone to obesity. The purpose of this study was to analyse 41 canine orthologues of genes linked to monogenic obesity in humans to identify genes associated with body weight in Labrador Retriever dogs. We analysed 11,520 variants from 50 dogs using a linear mixed model with sex, age, and sterilization as covariates and population structure as a random effect. Estimates obtained from the model were subjected to a maxT permutation procedure to adjust p-values for FWER < 0.05. Only the ADCY3 gene showed a statistically significant association: a TA>T deletion located at 17:19,222,459 in intron 1/20 (per allele effect of 5.56 kg, SE 0.018, p-value = 5.83 × 10⁻⁵, TA/TA: 11 dogs; TA/T: 32 dogs; T/T: 7 dogs). Mutations in the ADCY3 gene have already been associated with obesity in mice and humans, making it a promising marker for canine obesity research. Our results provide further evidence that the genetic makeup of obesity in Labrador Retriever dogs contains genes with large effect sizes.
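The maxT adjustment named in this abstract can be illustrated with a minimal sketch. It assumes a plain absolute-correlation statistic and naive shuffling of the phenotype, which is a simplification: the study itself fitted a linear mixed model with covariates and population structure, and naive shuffling is not appropriate when such structure is present. The function names `abs_corr` and `maxt_adjusted_pvalues` are hypothetical and do not come from the paper.

```python
import random
from statistics import mean


def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no association signal
    return abs(sxy) / (sxx * syy) ** 0.5


def maxt_adjusted_pvalues(genotypes, phenotype, n_perm=1000, seed=0):
    """Single-step maxT: compare each observed statistic with the
    permutation distribution of the maximum statistic over all variants."""
    rng = random.Random(seed)
    observed = [abs_corr(g, phenotype) for g in genotypes]
    exceed = [0] * len(genotypes)
    permuted = list(phenotype)
    for _ in range(n_perm):
        rng.shuffle(permuted)  # assumption: samples are exchangeable
        max_stat = max(abs_corr(g, permuted) for g in genotypes)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # The +1 terms count the observed data as one permutation,
    # so an adjusted p-value is never exactly zero.
    return [(1 + c) / (1 + n_perm) for c in exceed]
```

A variant would be declared significant at a family-wise error rate of 0.05 when its adjusted p-value falls below 0.05. Because every variant is compared with the same distribution of maxima, the correction accounts for correlation among variants without assuming independence.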
Affiliation(s)
- Mateusz Sypniewski
  - Department of Genetics and Animal Breeding, Poznan University of Life Sciences, Wołyńska 33, 60-637 Poznań, Poland
- Maciej Szydlowski
  - Department of Genetics and Animal Breeding, Poznan University of Life Sciences, Wołyńska 33, 60-637 Poznań, Poland
|
13
|
Gullotta G, Korte A, Marquardt S. Functional variation in the non-coding genome: molecular implications for food security. Journal of Experimental Botany 2023; 74:2338-2351. [PMID: 36316269 DOI: 10.1093/jxb/erac395] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/22/2022] [Accepted: 10/06/2022] [Indexed: 06/06/2023]
Abstract
The growing world population, in combination with the anticipated effects of climate change, is pressuring food security. Plants display an impressive arsenal of cellular mechanisms conferring resilience to adverse environmental conditions, and humans rely on these mechanisms for stable food production. The elucidation of the molecular basis of the mechanisms used by plants to achieve resilience promises knowledge-based approaches to enhance food security. DNA sequence polymorphisms can reveal genomic regions that are linked to beneficial traits of plants. However, our ability to interpret how a given DNA sequence polymorphism confers a fitness advantage at the molecular level often remains poor. A key factor is that these polymorphisms largely localize to the enigmatic non-coding genome. Here, we review the functional impact of sequence variations in the non-coding genome on plant biology in the context of crop breeding and agricultural traits. We focus on examples of non-coding variation with particularly convincing functional support. Our survey combines findings that are consistent with the view that the non-coding genome contributes to cellular mechanisms assisting many plant traits. Understanding how DNA sequence polymorphisms in the non-coding genome shape plant traits at the molecular level offers a largely unexplored reservoir of solutions to address future challenges in plant growth and resilience.
Affiliation(s)
- Giorgio Gullotta
  - Copenhagen Plant Science Centre, Department of Plant and Environmental Sciences, University of Copenhagen, Bülowsvej 21A, 1871 Frederiksberg, Denmark
- Arthur Korte
  - Center for Computational and Theoretical Biology, University of Würzburg, Hubland Nord 32, 97074 Würzburg, Germany
- Sebastian Marquardt
  - Copenhagen Plant Science Centre, Department of Plant and Environmental Sciences, University of Copenhagen, Bülowsvej 21A, 1871 Frederiksberg, Denmark
|
14
|
John M, Grimm D, Korte A. Predicting Gene Regulatory Interactions Using Natural Genetic Variation. Methods Mol Biol 2023; 2698:301-322. [PMID: 37682482 DOI: 10.1007/978-1-0716-3354-0_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
Genome-wide association studies (GWAS) are a powerful tool to elucidate the genotype-phenotype map. Although GWAS are usually used to assess simple univariate associations between genetic markers and traits of interest, it is also possible to infer the underlying genetic architecture and to predict gene regulatory interactions. In this chapter, we describe the latest methods and tools to perform GWAS by calculating permutation-based significance thresholds. For this purpose, we first provide guidelines on univariate GWAS analyses that are extended in the second part of this chapter to more complex models that enable the inference of gene regulatory networks and how these networks vary.
Affiliation(s)
- Maura John
  - Technical University of Munich & Weihenstephan-Triesdorf University of Applied Sciences, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
- Dominik Grimm
  - Technical University of Munich & Weihenstephan-Triesdorf University of Applied Sciences, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
- Arthur Korte
  - Center for Computational and Theoretical Biology, University of Würzburg, Würzburg, Germany
|
15
|
Bercovich N, Genze N, Todesco M, Owens GL, Légaré JS, Huang K, Rieseberg LH, Grimm DG. HeliantHOME, a public and centralized database of phenotypic sunflower data. Sci Data 2022; 9:735. [PMID: 36450875 PMCID: PMC9712528 DOI: 10.1038/s41597-022-01842-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Accepted: 11/11/2022] [Indexed: 12/02/2022] Open
Abstract
Genomic studies often attempt to link natural genetic variation with important phenotypic variation. To succeed, robust and reliable phenotypic data, as well as curated genomic assemblies, are required. Wild sunflowers, originally from North America, are adapted to diverse and often extreme environments and have historically been a widely used model plant system for the study of population genomics, adaptation, and speciation. Moreover, cultivated sunflower, domesticated from a wild relative (Helianthus annuus), is a global oil crop, ranking fourth in production of vegetable oils worldwide. Publicly available data resources, both for the plant research community and for the associated agricultural sector, are extremely valuable. We have created HeliantHOME (http://www.helianthome.org), a curated, public, and interactive database of phenotypes including developmental, structural and environmental ones, obtained from a large collection of both wild and cultivated sunflower individuals. Additionally, the database is enriched with external genomic data and results of genome-wide association studies. Finally, being a community open-source platform, HeliantHOME is expected to expand as new knowledge and resources become available.
Affiliation(s)
- Natalia Bercovich
  - Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
  - Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
- Nikita Genze
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany
- Marco Todesco
  - Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
  - Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
- Gregory L. Owens
  - Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
  - Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
  - Department of Biology, University of Victoria, Victoria, BC, Canada
- Jean-Sébastien Légaré
  - Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
  - Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
  - Department of Computer Science, University of British Columbia, Vancouver, British Columbia, Canada
  - Data Science Institute, University of British Columbia, Vancouver, British Columbia, Canada
- Kaichi Huang
  - Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
  - Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
- Loren H. Rieseberg
  - Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
  - Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
- Dominik G. Grimm
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany
  - Technical University of Munich, Department of Informatics, Garching, Germany
|
16
|
John M, Haselbeck F, Dass R, Malisi C, Ricca P, Dreischer C, Schultheiss SJ, Grimm DG. A comparison of classical and machine learning-based phenotype prediction methods on simulated data and three plant species. Frontiers in Plant Science 2022; 13:932512. [PMID: 36407627 PMCID: PMC9673477 DOI: 10.3389/fpls.2022.932512] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/29/2022] [Accepted: 07/25/2022] [Indexed: 06/16/2023]
Abstract
Genomic selection is an integral tool for breeders to accurately select plants directly from genotype data, leading to faster and more resource-efficient breeding programs. Several prediction methods have been established in the last few years. These range from classical linear mixed models to complex non-linear machine learning approaches, such as Support Vector Regression, and modern deep learning-based architectures. Many of these methods have been extensively evaluated on different crop species with varying outcomes. In this work, our aim is to systematically compare 12 different phenotype prediction models, ranging from basic genomic selection methods to more advanced deep learning-based techniques. More importantly, we assess the performance of these models on simulated phenotype data as well as on real-world data from Arabidopsis thaliana and two breeding datasets from soy and corn. The synthetic phenotypic data allow us to analyze all prediction models and especially the selected markers under controlled and predefined settings. We show that Bayes B and linear regression models with sparsity constraints perform best under different simulation settings with respect to explained variance. Further, we can confirm results from other studies that there is no superiority of more complex neural network-based architectures for phenotype prediction compared to well-established methods. However, on real-world data, for which several prediction models yield comparable results with slight advantages for Elastic Net, this picture is less clear, suggesting that there is a lot of room for future research.
Affiliation(s)
- Maura John
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany
- Florian Haselbeck
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany
- Dominik G. Grimm
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany
  - Technical University of Munich, Department of Informatics, Garching, Germany
|