1
Taddese AA, Addis AC, Tam BT. Data stewardship and curation practices in AI-based genomics and automated microscopy image analysis for high-throughput screening studies: promoting robust and ethical AI applications. Hum Genomics 2025; 19:16. [PMID: 39988670 PMCID: PMC11849233 DOI: 10.1186/s40246-025-00716-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Accepted: 01/09/2025] [Indexed: 02/25/2025] Open
Abstract
BACKGROUND Researchers have increasingly adopted AI and next-generation sequencing (NGS), revolutionizing genomics and high-throughput screening (HTS) and transforming our understanding of cellular processes and disease mechanisms. However, these advancements generate vast datasets that require effective data stewardship and curation practices to maintain data integrity, privacy, and accessibility. This review consolidates existing knowledge on key aspects, including data governance, quality management, privacy measures, ownership, access control, accountability, traceability, curation frameworks, and storage systems.
METHODS We conducted a systematic literature search up to January 10, 2024, across PubMed, MEDLINE, EMBASE, Scopus, and additional scholarly platforms to examine recent advances and challenges in managing the vast and complex datasets generated by these technologies. Our search strategy employed structured keyword queries focused on four key thematic areas: data governance and management, curation frameworks, algorithmic bias and fairness, and data storage, all within the context of AI applications in genomics and microscopy. Using a realist synthesis methodology, we integrated insights from diverse frameworks to explore the multifaceted challenges associated with data stewardship in these domains. Three independent reviewers conducted data extraction and analysis, systematically categorizing the information across critical themes, including data governance, quality management, security, privacy, ownership, and access control. The study also examined specific AI considerations, such as algorithmic bias, model explainability, and the application of advanced cryptographic techniques. The review process included six stages, starting with an extensive search across multiple research databases that yielded 273 documents. Subsequent screening based on broad criteria, titles, abstracts, and full texts narrowed the pool to 38 highly relevant citations.
RESULTS Our findings indicated that significant research was conducted in 2023, highlighting the increasing recognition of robust data governance frameworks in AI-driven genomics and microscopy. While 36 articles extensively discussed data interoperability and sharing, AI model explainability and data augmentation remained underexplored, indicating significant gaps. The integration of diverse data types, ranging from sequencing and clinical data to proteomic and imaging data, highlighted the complexity and expansive scope of AI applications in these fields. The current challenges identified in AI-based data stewardship and curation practices are a lack of infrastructure and cost optimization, ethical and privacy considerations, access control and sharing mechanisms, large-scale data handling and analysis, and transparent data-sharing policies and practices. Proposed solutions to address issues related to data quality, privacy, and bias management include advanced cryptographic techniques, federated learning, and blockchain technology. Robust data governance measures, such as GA4GH standards, DUO versioning, and attribute-based access control, are essential for ensuring data integrity, security, and ethical use. The study also emphasized the critical role of Data Management Plans (DMPs), meticulous metadata curation, and advanced cryptographic techniques in mitigating risks related to data security and identifiability. Despite advancements, significant challenges persisted in balancing data ownership with research accessibility, integrating heterogeneous data sources, ensuring platform interoperability, and maintaining data quality. Ongoing risks of unauthorized access and data breaches underscored the need for continuous innovation in data management practices and stricter adherence to legal and ethical standards.
CONCLUSIONS These findings describe current practices and challenges in data stewardship and offer a roadmap for strengthening the governance, security, and ethical use of AI in genomics and microscopy. While robust governance frameworks and ethical practices have established a foundation for data integrity and transparency, there remains an urgent need for collaborative efforts to develop interoperable platforms and transparent data-sharing policies. Additionally, evolving legal and ethical frameworks will be crucial to addressing emerging challenges posed by AI technologies. Fostering transparency, accountability, and ethical responsibility within the research community will be key to ensuring trust and driving ethically sound scientific advancements.
Affiliation(s)
- Asefa Adimasu Taddese
  - Academy of Wellness and Human Development, Faculty of Arts and Social Sciences, Hong Kong Baptist University, Hong Kong SAR, China
- Assefa Chekole Addis
  - Department of Information Science, College of Informatics, University of Gondar, Gondar, Ethiopia
- Bjorn T Tam
  - Academy of Wellness and Human Development, Faculty of Arts and Social Sciences, Hong Kong Baptist University, Hong Kong SAR, China
  - Dr. Stephen Hui Research Centre for Physical Recreation and Wellness, Faculty of Arts and Social Sciences, Hong Kong Baptist University, Hong Kong SAR, China
2
Ahmad A, Liew AXW, Venturini F, Kalogeras A, Candiani A, Di Benedetto G, Ajibola S, Cartujo P, Romero P, Lykoudi A, De Grandis MM, Xouris C, Lo Bianco R, Doddy I, Elegbede I, D'Urso Labate GF, García del Moral LF, Martos V. AI can empower agriculture for global food security: challenges and prospects in developing nations. Front Artif Intell 2024; 7:1328530. [PMID: 38726306 PMCID: PMC11081032 DOI: 10.3389/frai.2024.1328530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 03/11/2024] [Indexed: 05/12/2024] Open
Abstract
Food and nutrition are essential to all living organisms. For humans in particular, supplying food sufficiently and efficiently is a challenge as the world population continues to grow. Artificial Intelligence (AI) could be identified as a plausible technology in this 5th industrial revolution for bringing us closer to achieving zero hunger by 2030, which is Goal 2 of the United Nations Sustainable Development Goals (UNSDG). This goal cannot be achieved unless the digital divide between developed and underdeveloped countries is addressed. Although developing and underdeveloped regions fall behind in economic resources, they harbor untapped potential to effectively address the impending demands posed by the soaring world population. Therefore, this study explores in depth the potential of AI in the agriculture sector for developing and underdeveloped countries. It also aims to emphasize the proven efficiency and spin-off applications of AI in the advancement of agriculture. Currently, AI is being utilized in various spheres of agriculture, including but not limited to crop surveillance, irrigation management, disease identification, fertilization practices, task automation, image manipulation, data processing, yield forecasting, supply chain optimization, implementation of decision support systems (DSS), weed control, and the enhancement of resource utilization. AI also supports food safety and security by enabling higher crop yields through multi-temporal remote sensing (RS) techniques that accurately discern diverse crop phenotypes, monitor land cover dynamics, assess variations in soil organic matter, predict soil moisture levels, conduct plant biomass modeling, and enable comprehensive crop monitoring.
The present study identifies various challenges in the adoption of AI in developing nations, including finance, infrastructure, expertise, data availability, customization, regulatory frameworks, cultural norms and attitudes, access to markets, and interdisciplinary collaboration, along with their corresponding remedies. The identification of challenges and opportunities in the implementation of AI could ignite further research and action in these regions, thereby supporting sustainable development.
Affiliation(s)
- Ali Ahmad
  - Research Institute for Integrated Coastal Zone Management, Polytechnic University of Valencia, Grau de Gandia, Valencia, Spain
- Francesca Venturini
  - Institute of Applied Mathematics and Physics, Zurich University of Applied Sciences, Winterthur, Switzerland
  - TOELT LLC, Dübendorf, Switzerland
- Segun Ajibola
  - Afridat UG, Bonn, Germany
  - NOVA IMS, Universidade Nova de Lisboa, Campus de Campolide, Lisbon, Portugal
- Pedro Cartujo
  - Department of Electronic and Computer Technology, University of Granada, Granada, Spain
- Pablo Romero
  - GRANIOT Satellite Technologies S.L, Granada, Spain
- Christos Xouris
  - Gaia Robotics Idiotiki Kefalaiouxiki Etaireia, Patras, Greece
- Riccardo Lo Bianco
  - Department of Agricultural, Food and Forest Sciences, University of Palermo, Viale delle Scienze, Palermo, Italy
- Irawan Doddy
  - Department of Mechanical Engineering, Universitas Muhammadiyah Pontianak – Universitas, Kalimantan Barat, Indonesia
- Luis F. García del Moral
  - Department of Plant Physiology, Institute of Biotechnology, University of Granada, Granada, Spain
- Vanessa Martos
  - Department of Plant Physiology, Institute of Biotechnology, University of Granada, Granada, Spain
3
Harfouche AL, Nakhle F, Harfouche AH, Sardella OG, Dart E, Jacobson D. A primer on artificial intelligence in plant digital phenomics: embarking on the data to insights journey. Trends in Plant Science 2023; 28:154-184. [PMID: 36167648 DOI: 10.1016/j.tplants.2022.08.021] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/04/2022] [Revised: 08/22/2022] [Accepted: 08/25/2022] [Indexed: 06/16/2023]
Abstract
Artificial intelligence (AI) has emerged as a fundamental component of global agricultural research and is poised to impact many aspects of plant science. In digital phenomics, AI is capable of learning intricate structures and patterns in large datasets. We provide a perspective and primer on AI applications to phenome research. We propose a novel human-centric explainable AI (X-AI) system architecture consisting of data architecture, technology infrastructure, and AI architecture design. We clarify the difference between post hoc models and 'interpretable by design' models. We include guidance for effectively using an 'interpretable by design' model in phenomic analysis. We also provide directions to sources of tools and resources for making data analytics increasingly accessible. This primer is accompanied by an interactive online tutorial.
Affiliation(s)
- Antoine L Harfouche
  - Department for Innovation in Biological, Agro-Food, and Forest Systems, University of Tuscia, Viterbo, VT 01100, Italy
- Farid Nakhle
  - Department for Innovation in Biological, Agro-Food, and Forest Systems, University of Tuscia, Viterbo, VT 01100, Italy
- Antoine H Harfouche
  - Unité de Formation et de Recherche en Sciences Économiques, Gestion, Mathématiques, et Informatique, Université Paris Nanterre, 92001 Nanterre, France
- Orlando G Sardella
  - Department for Innovation in Biological, Agro-Food, and Forest Systems, University of Tuscia, Viterbo, VT 01100, Italy
- Eli Dart
  - Energy Sciences Network (ESnet), Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Daniel Jacobson
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
4
Fahlgren N, Kapoor M, Yordanova G, Papatheodorou I, Waese J, Cole B, Harrison P, Ware D, Tickle T, Paten B, Burdett T, Elsik CG, Tuggle CK, Provart NJ. Toward a data infrastructure for the Plant Cell Atlas. Plant Physiology 2023; 191:35-46. [PMID: 36200899 PMCID: PMC9806565 DOI: 10.1093/plphys/kiac468] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/14/2022] [Accepted: 09/18/2022] [Indexed: 06/16/2023]
Abstract
We review how a data infrastructure for the Plant Cell Atlas might be built using existing infrastructure and platforms. The Human Cell Atlas has developed an extensive infrastructure for human and mouse single-cell data, while the European Bioinformatics Institute has developed a Single Cell Expression Atlas that currently houses several plant data sets. We discuss issues related to appropriate ontologies for describing a plant single-cell experiment. We imagine how such an infrastructure will enable biologists and data scientists to glean new insights into plant biology in the coming decades, as long as such data are made accessible to the community in an open manner.
Affiliation(s)
- Noah Fahlgren
  - Donald Danforth Plant Science Center, Saint Louis, Missouri 63132, USA
- Muskan Kapoor
  - Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
- Jamie Waese
  - Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
- Benjamin Cole
  - DOE-Joint Genome Institute, Lawrence Berkeley National Laboratory, 1, Cyclotron Road, Berkeley, California 94720, USA
- Peter Harrison
  - EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Doreen Ware
  - Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, New York 11724, USA
  - USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Ithaca, New York 14853, USA
- Timothy Tickle
  - Data Sciences Platform, The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, Massachusetts 02142, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, Baskin School of Engineering, 1156 High Street, Santa Cruz, California 95064, USA
- Tony Burdett
  - EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Christine G Elsik
  - Division of Animal Sciences/Division of Plant Science & Technology/Institute for Data Science & Informatics, University of Missouri, Columbia, Missouri 65211, USA
- Christopher K Tuggle
  - Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
- Nicholas J Provart
  - Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
5
Khan MHU, Wang S, Wang J, Ahmar S, Saeed S, Khan SU, Xu X, Chen H, Bhat JA, Feng X. Applications of Artificial Intelligence in Climate-Resilient Smart-Crop Breeding. Int J Mol Sci 2022; 23:11156. [PMID: 36232455 PMCID: PMC9570104 DOI: 10.3390/ijms231911156] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/14/2022] [Revised: 09/18/2022] [Accepted: 09/19/2022] [Indexed: 11/21/2022] Open
Abstract
Recently, artificial intelligence (AI) has emerged as a revolutionary field that offers great opportunities for shaping modern crop breeding, and it is extensively used indoors for plant science. Advances in crop phenomics and enviromics, together with other "omics" approaches, are paving the way for elucidating the detailed, complex biological mechanisms that drive crop functions in response to environmental perturbations. These "omics" approaches have provided plant researchers with precise tools to evaluate important agronomic traits in larger germplasm collections, in less time, and at early growth stages. However, big data and the complex relationships within them impede the understanding of the mechanisms by which genes drive the formation of agronomic traits. AI brings huge computational power and many new tools and strategies for future breeding. The present review covers how applications of AI technology in current breeding practice help solve problems in high-throughput phenotyping and gene functional analysis, and how advances in AI technologies bring new opportunities for future breeding by making envirotyping data widely usable. Furthermore, in current breeding methods, linking genotype to phenotype remains a massive challenge and impedes the optimal application of high-throughput field phenotyping, genomics, and enviromics. In this review, we elaborate on how AI will be the preferred tool for increasing the accuracy of high-throughput crop phenotyping, genotyping, and envirotyping data; moreover, we explore developing approaches and challenges for the integration of multiomics big data. The integration of AI with "omics" tools can therefore enable rapid gene identification and eventually accelerate crop-improvement programs.
Affiliation(s)
- Muhammad Hafeez Ullah Khan
  - Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China
  - Zhejiang Lab, Hangzhou 310012, China
- Shoudong Wang
  - Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China
  - Zhejiang Lab, Hangzhou 310012, China
- Jun Wang
  - Zhejiang Lab, Hangzhou 310012, China
- Sunny Ahmar
  - Institute of Biology, Biotechnology and Environmental Protection, Faculty of Natural Sciences, University of Silesia, Jagiellonska 28, 40-032 Katowice, Poland
- Sumbul Saeed
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Shahid Ullah Khan
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Xianzhong Feng
  - Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China
  - Zhejiang Lab, Hangzhou 310012, China