1
Hafner A, DeLeo V, Deng CH, Elsik CG, Fleming DS, Harrison PW, Kalbfleisch TS, Petry B, Pucker B, Quezada-Rodríguez EH, Tuggle CK, Koltes JE. Data reuse in agricultural genomics research: challenges and recommendations. GigaScience 2025; 14:giae106. [PMID: 39804724 PMCID: PMC11727710 DOI: 10.1093/gigascience/giae106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 09/17/2024] [Accepted: 11/26/2024] [Indexed: 01/16/2025] Open
Abstract
The scientific community has long benefited from the opportunities provided by data reuse. Recognizing the need to identify the challenges and bottlenecks to reuse in the agricultural research community and propose solutions for them, the data reuse working group was started within the AgBioData consortium framework. Here, we identify the limitations of data standards, metadata deficiencies, data interoperability, data ownership, data availability, user skill level, resource availability, and equity issues, with a specific focus on agricultural genomics research. We propose possible solutions stakeholders could implement to mitigate and overcome these challenges and provide an optimistic perspective on the future of genomics and transcriptomics data reuse.
Affiliation(s)
- Alenka Hafner
- Department of Biology, Frear North, Pennsylvania State University, University Park, PA, 16802, US
- Intercollege Graduate Degree Program in Plant Biology, Pennsylvania State University, University Park, PA, 16802, US
- Cecilia H Deng
- New Cultivar Innovation, The New Zealand Institute for Plant and Food Research Limited, Auckland, 1025, New Zealand
- Christine G Elsik
- Division of Animal Sciences and Division of Plant Science & Technology, University of Missouri, MO, 65211, US
- Institute for Data Science & Informatics, University of Missouri, MO, 65211, US
- Damarius S Fleming
- Animal Parasitic Diseases Laboratory, United States Department of Agriculture Agricultural Research Service, Beltsville, MD, 20705, US
- Peter W Harrison
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire, CB10 1SD, UK
- Theodore S Kalbfleisch
- Department of Veterinary Science, Martin-Gatton College of Agriculture, Food, and Environment, University of Kentucky, Lexington, KY, 40202, US
- Bruna Petry
- Department of Animal Science, Iowa State University, Ames, IA, 50011, US
- Boas Pucker
- Institute of Plant Biology & BRICS, TU Braunschweig, Braunschweig, 38106, Germany
- Elsa H Quezada-Rodríguez
- Departamento de Producción Agrícola y Animal, Universidad Autónoma Metropolitana-Xochimilco, Ciudad de México, 04510, México
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Ciudad de México, 04510, México
- James E Koltes
- Department of Animal Science, Iowa State University, Ames, IA, 50011, US
2
Cordeiro D, Pizarro A, Vélez MD, Guevara MÁ, de María N, Ramos P, Cobo-Simón I, Diez-Galán A, Benavente A, Ferreira V, Martín MÁ, Rodríguez-González PM, Solla A, Cervera MT, Diez-Casero JJ, Cabezas JA, Díaz-Sala C. Breeding Alnus species for resistance to Phytophthora disease in the Iberian Peninsula. Frontiers in Plant Science 2024; 15:1499185. [PMID: 39717726 PMCID: PMC11663675 DOI: 10.3389/fpls.2024.1499185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Accepted: 11/20/2024] [Indexed: 12/25/2024]
Abstract
Alders are widely distributed riparian trees in Europe, North Africa and Western Asia. Recently, a strong reduction of alder stands has been detected in Europe due to infection by Phytophthora species (Stramenopila kingdom). This infection causes a disease known as alder dieback, characterized by leaf yellowing, dieback of branches, increased fruit production, and bark necrosis in the collar and basal part of the stem. In the Iberian Peninsula, the drastic alder decline has been confirmed in the Spanish Ulla and Ebro basins, the Portuguese Mondego and Sado basins and the Northern and Western transboundary hydrographic basins of Miño and Sil, Limia, Douro and Tagus. The damaging effects of alder decline require management solutions that promote forest resilience while keeping genetic diversity. Breeding programs involve phenotypic selection of asymptomatic individuals in populations where severe damage is observed, confirmation of tree resistance via inoculation trials under controlled conditions, vegetative propagation of selected trees, further planting and assessment in areas with high disease pressure and different environmental conditions and conservation of germplasm of tolerant genotypes for reforestation. In this way, forest biotechnology provides essential tools for the conservation and sustainable management of forest genetic resources, including material characterization for tolerance, propagation for conservation purposes, and genetic resource traceability, as well as identification and characterization of Phytophthora species. The advancement of biotechnological techniques enables improved monitoring and management of natural resources by studying genetic variability and function through molecular biology methods. In addition, in vitro culture techniques make possible large-scale plant propagation and long-term conservation within breeding programs to preserve selected outstanding genotypes.
Affiliation(s)
- Daniela Cordeiro
- Departamento de Ciencias de la Vida, Facultad de Ciencias, Universidad de Alcalá, Madrid, Spain
- Alberto Pizarro
- Departamento de Ciencias de la Vida, Facultad de Ciencias, Universidad de Alcalá, Madrid, Spain
- M. Dolores Vélez
- Departamento de Ecología y Genética Forestal, Instituto de Ciencias Forestales (ICIFOR), Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria - Consejo Superior de Investigaciones Científicas (ICIFOR-INIA, CSIC), Madrid, Spain
- M. Ángeles Guevara
- Departamento de Ecología y Genética Forestal, Instituto de Ciencias Forestales (ICIFOR), Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria - Consejo Superior de Investigaciones Científicas (ICIFOR-INIA, CSIC), Madrid, Spain
- Nuria de María
- Departamento de Ecología y Genética Forestal, Instituto de Ciencias Forestales (ICIFOR), Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria - Consejo Superior de Investigaciones Científicas (ICIFOR-INIA, CSIC), Madrid, Spain
- Paula Ramos
- Departamento de Ecología y Genética Forestal, Instituto de Ciencias Forestales (ICIFOR), Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria - Consejo Superior de Investigaciones Científicas (ICIFOR-INIA, CSIC), Madrid, Spain
- Irene Cobo-Simón
- Departamento de Ecología y Genética Forestal, Instituto de Ciencias Forestales (ICIFOR), Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria - Consejo Superior de Investigaciones Científicas (ICIFOR-INIA, CSIC), Madrid, Spain
- Alba Diez-Galán
- Instituto Universitario de Investigación en Gestión Forestal Sostenible (iuFOR), Universidad de Valladolid, Palencia, Spain
- Departamento de Producción Vegetal y Recursos Forestales, Escuela Técnica Superior de Ingenierías Agrarias (ETSIIAA), Universidad de Valladolid, Palencia, Spain
- Alfredo Benavente
- Instituto Universitario de Investigación en Gestión Forestal Sostenible (iuFOR), Universidad de Valladolid, Palencia, Spain
- Departamento de Producción Vegetal y Recursos Forestales, Escuela Técnica Superior de Ingenierías Agrarias (ETSIIAA), Universidad de Valladolid, Palencia, Spain
- Verónica Ferreira
- MARE – Marine and Environmental Sciences Centre, ARNET – Aquatic Research Network, Department of Life Sciences, University of Coimbra, Coimbra, Portugal
- M. Ángela Martín
- Departamento de Genética, Escuela Técnica Superior de Ingeniería Agronómica y de Montes (ETSIAM), Universidad de Córdoba, Córdoba, Spain
- Alejandro Solla
- Ingeniería Forestal y Medio Natural, Centro Universitario de Plasencia, Instituto Universitario de Investigación de la Dehesa (INDEHESA), Universidad de Extremadura, Plasencia, Spain
- M. Teresa Cervera
- Departamento de Ecología y Genética Forestal, Instituto de Ciencias Forestales (ICIFOR), Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria - Consejo Superior de Investigaciones Científicas (ICIFOR-INIA, CSIC), Madrid, Spain
- Julio Javier Diez-Casero
- Instituto Universitario de Investigación en Gestión Forestal Sostenible (iuFOR), Universidad de Valladolid, Palencia, Spain
- Departamento de Producción Vegetal y Recursos Forestales, Escuela Técnica Superior de Ingenierías Agrarias (ETSIIAA), Universidad de Valladolid, Palencia, Spain
- José Antonio Cabezas
- Departamento de Ecología y Genética Forestal, Instituto de Ciencias Forestales (ICIFOR), Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria - Consejo Superior de Investigaciones Científicas (ICIFOR-INIA, CSIC), Madrid, Spain
- Carmen Díaz-Sala
- Departamento de Ciencias de la Vida, Facultad de Ciencias, Universidad de Alcalá, Madrid, Spain
3
Sunil RS, Lim SC, Itharajula M, Mutwil M. The gene function prediction challenge: Large language models and knowledge graphs to the rescue. Current Opinion in Plant Biology 2024; 82:102665. [PMID: 39579414 DOI: 10.1016/j.pbi.2024.102665] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/13/2024] [Revised: 10/23/2024] [Accepted: 10/24/2024] [Indexed: 11/25/2024]
Abstract
Elucidating gene function is one of the ultimate goals of plant science. Despite this, only ~15% of all genes in the model plant Arabidopsis thaliana have comprehensively experimentally verified functions. While bioinformatic gene function prediction approaches can guide biologists in their experimental efforts, neither the performance of the gene function prediction methods nor the number of experimentally characterized genes has increased dramatically in recent years. In this review, we will discuss the status quo and the trajectory of gene function elucidation and outline the recent advances in gene function prediction approaches. We will then discuss how recent artificial intelligence advances in large language models and knowledge graphs can be leveraged to accelerate gene function predictions and keep us updated with scientific literature.
Affiliation(s)
- Rohan Shawn Sunil
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Shan Chun Lim
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Manoj Itharajula
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Marek Mutwil
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
4
Kaňovská I, Biová J, Škrabišová M. New perspectives of post-GWAS analyses: From markers to causal genes for more precise crop breeding. Current Opinion in Plant Biology 2024; 82:102658. [PMID: 39549685 DOI: 10.1016/j.pbi.2024.102658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2024] [Revised: 10/08/2024] [Accepted: 10/19/2024] [Indexed: 11/18/2024]
Abstract
Crop breeding advancement is hindered by the imperfection of methods to reveal genes underlying key traits. Genome-wide Association Study (GWAS) is one such method, identifying genomic regions linked to phenotypes. Post-GWAS analyses predict candidate genes and assist in causative mutation (CM) recognition. Here, we assess post-GWAS approaches, address limitations in omics data integration and stress the importance of evaluating associated variants within a broader context of publicly available datasets. Recent advances in bioinformatics tools and genomic strategies for CM identification and allelic variation exploration are reviewed. We discuss the role of markers and marker panel development for more precise breeding. Finally, we highlight the perspectives and challenges of GWAS-based CM prediction for complex quantitative traits.
Affiliation(s)
- Ivana Kaňovská
- Department of Biochemistry, Faculty of Science, Palacký University in Olomouc, Šlechtitelů 27, Olomouc 77900, Czech Republic
- Jana Biová
- Department of Biochemistry, Faculty of Science, Palacký University in Olomouc, Šlechtitelů 27, Olomouc 77900, Czech Republic
- Mária Škrabišová
- Department of Biochemistry, Faculty of Science, Palacký University in Olomouc, Šlechtitelů 27, Olomouc 77900, Czech Republic
5
Liu C, Li T, Cui L, Wang N, Huang G, Li R. OrangeExpDB: an integrative gene expression database for Citrus spp. BMC Genomics 2024; 25:521. [PMID: 38802746 PMCID: PMC11129468 DOI: 10.1186/s12864-024-10445-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 05/22/2024] [Indexed: 05/29/2024] Open
Abstract
BACKGROUND: Citrus is a major fruit crop, and RNA-sequencing (RNA-seq) data can be used to investigate its gene functions, heredity, evolution, and development, and to detect genes linked to essential traits or resistance to pathogens. However, public RNA-seq datasets are challenging to use for researchers without bioinformatics training and expertise. RESULTS: OrangeExpDB is a web-based database that integrates transcriptome data of various Citrus spp., including C. limon (L.) Burm., C. maxima (Burm.) Merr., C. reticulata Blanco, C. sinensis (L.) Osbeck, and Poncirus trifoliata (L.) Raf., downloaded from the NCBI SRA database. It features a BLAST tool for browsing and searching, enabling quick download of expression matrices for different transcriptome samples. Expression profiles for genes of interest can be easily retrieved by searching gene IDs or by sequence similarity. Expression data can be downloaded in text format and presented as a heatmap, with additional sample information provided at the bottom of the webpage. CONCLUSIONS: Researchers can use OrangeExpDB to facilitate functional genomic analysis and identify key candidate genes, leveraging publicly available citrus RNA-seq datasets. OrangeExpDB can be accessed at http://www.orangeexpdb.com/.
Affiliation(s)
- Chang Liu
- College of Life Sciences, Gannan Normal University, Ganzhou, Jiangxi, 341000, China
- Tingting Li
- College of Agriculture, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Licao Cui
- College of Bioscience and Engineering, Jiangxi Agricultural University, Nanchang, Jiangxi, 330045, China
- Nian Wang
- Citrus Research and Education Center, Department of Microbiology and Cell Science, IFAS, University of Florida, Lake Alfred, FL, USA
- Guiyan Huang
- College of Life Sciences, Gannan Normal University, Ganzhou, Jiangxi, 341000, China
- Ruimin Li
- College of Life Sciences, Gannan Normal University, Ganzhou, Jiangxi, 341000, China
6
Reiser L, Bakker E, Subramaniam S, Chen X, Sawant S, Khosa K, Prithvi T, Berardini TZ. The Arabidopsis Information Resource in 2024. Genetics 2024; 227:iyae027. [PMID: 38457127 PMCID: PMC11075553 DOI: 10.1093/genetics/iyae027] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Received: 11/03/2023] [Accepted: 02/07/2024] [Indexed: 03/09/2024] Open
Abstract
Since 1999, The Arabidopsis Information Resource (www.arabidopsis.org) has been curating data about the Arabidopsis thaliana genome. Its primary focus is integrating experimental gene function information from the peer-reviewed literature and codifying it as controlled vocabulary annotations. Our goal is to produce a "gold standard" functional annotation set that reflects the current state of knowledge about the Arabidopsis genome. At the same time, the resource serves as a nexus for community-based collaborations aimed at improving data quality, access, and reuse. For the past decade, our work has been made possible by subscriptions from our global user base. This update covers our ongoing biocuration work, some of our modernization efforts that contribute to the first major infrastructure overhaul since 2011, the introduction of JBrowse2, and the resource's role in community activities such as organizing the structural reannotation of the genome. For gene function assessment, we used gene ontology annotations as a metric to evaluate: (1) what is currently known about Arabidopsis gene function and (2) the set of "unknown" genes. Currently, 74% of the proteome has been annotated to at least one gene ontology term. Of those loci, half have experimental support for at least one of the following aspects: molecular function, biological process, or cellular component. Our work sheds light on the genes for which we have not yet identified any published experimental data and have no functional annotation. Drawing attention to these unknown genes highlights knowledge gaps and potential sources of novel discoveries.