1
Temesgen SA, Ahmad B, Grace-Mercure BK, Liu M, Liu L, Lin H, Deng K. Exploring species taxonomic kingdom using information entropy and nucleotide compositional features of coding sequences based on machine learning methods. Methods 2025; 240:165-179. [PMID: 40280261 DOI: 10.1016/j.ymeth.2025.03.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2025] [Revised: 03/08/2025] [Accepted: 03/31/2025] [Indexed: 04/29/2025] Open
Abstract
The flow of genetic information from DNA to protein is governed by the central dogma of molecular biology. Genetic drift and mutations usually lead to changes in DNA composition, thereby affecting the coding sequences (CDS) that encode functional proteins. Analyzing the nucleotide distribution in the coding regions of species is crucial for understanding their evolution. In this study, we applied Markov processes to analyze codon formation in 37,031,061 CDSs across 3,735 species genomes, spanning viruses, archaea, bacteria, and eukaryotes, to explore compositional changes. Our results revealed species preferences for different nucleotides. Information entropies and Markov information densities show that eukaryotes exhibit higher redundancy, followed by viruses, suggesting more gene duplication in eukaryotes and high mutation rates in viruses. Evolutionary trends showed an increase in information entropy and a decrease in Markov entropy, with negative correlations between first- and second-order Markov information densities. Furthermore, uniform manifold approximation and projection (UMAP) was used to reduce information redundancy for revealing unique evolutionary patterns in species classification. The machine learning methods demonstrated excellent performance in species classification accuracy, providing profound insights into CDS evolution and protein synthesis.
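As a rough illustration of the two kinds of quantity this abstract contrasts, the sketch below computes the Shannon entropy of a sequence's base composition and a first-order Markov (conditional) entropy over adjacent nucleotides. The abstract does not define its "Markov information density", so these are generic textbook definitions and not the authors' exact measures; the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Zeroth-order entropy of the base composition, in bits per nucleotide.

    Assumes a non-empty sequence.
    """
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def markov_entropy(seq):
    """First-order conditional entropy H(X_t | X_{t-1}), in bits.

    Generic definition (an assumption, not taken from the paper):
    sum over adjacent pairs (a, b) of p(a, b) * log2(1 / p(b | a)).
    Assumes the sequence has at least two symbols.
    """
    pairs = Counter(zip(seq, seq[1:]))   # counts of adjacent pairs
    prev = Counter(seq[:-1])             # counts of the conditioning symbol
    total = len(seq) - 1
    h = 0.0
    for (a, _b), c in pairs.items():
        h -= (c / total) * math.log2(c / prev[a])
    return h


if __name__ == "__main__":
    cds = "ATGGCGTACGCTAGCTAA"  # toy sequence, not real data
    print(shannon_entropy(cds), markov_entropy(cds))
```

A perfectly periodic sequence such as `ACGTACGT` has maximal composition entropy but zero first-order Markov entropy, since each base fully determines the next; this is the sense in which the two measures can move in opposite directions.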
Affiliation(s)
- Sebu Aboma Temesgen
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Basharat Ahmad
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Minghao Liu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Li Liu
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Hao Lin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Kejun Deng
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China.
2
Jamal QMS, Ahmad V. Bacterial metabolomics: current applications for human welfare and future aspects. J Asian Nat Prod Res 2025; 27:207-230. [PMID: 39078342 DOI: 10.1080/10286020.2024.2385365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 07/22/2024] [Accepted: 07/24/2024] [Indexed: 07/31/2024]
Abstract
An imbalanced microbiome is linked to several diseases, such as cancer, inflammatory bowel disease, obesity, and even neurological disorders. Bacteria and their by-products are used for various industrial and clinical purposes. The metabolites under discussion were chosen based on their biological impacts on host and gut microbiota interactions as established by metabolome research. The separation of bacterial metabolites by using statistics and machine learning analysis creates new opportunities for applications of bacteria and their metabolites in the environmental and medical sciences. Thus, the metabolite production strategies, methodologies, and importance of bacterial metabolites for human well-being are discussed in this review.
Affiliation(s)
- Qazi Mohammad Sajid Jamal
- Department of Health Informatics, College of Applied Medical Sciences, Qassim University, Buraydah 51452, Saudi Arabia
- Varish Ahmad
- Health Information Technology Department, The Applied College, King Abdulaziz University, Jeddah 21589, Saudi Arabia
3
Ashikhmina MS, Zenkin AM, Ivanova AO, Pavlishina IR, Orlova OY, Pantiukhin IS, Skorb EV. Large Language Model for Automating the Analysis of Cryoprotectants. J Chem Inf Model 2025; 65:162-172. [PMID: 39723911 DOI: 10.1021/acs.jcim.4c02049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2024]
Abstract
The rapid expansion of scientific literature necessitates developing efficient data extraction and analysis methods. This study presents an innovative approach to automating the extraction of cryoprotectant information from scientific publications using a generative pretrained transformer (GPT) model integrated with a Telegram bot interface. Our system processes and analyzes scientific articles to identify and extract relevant data on cryoprotectants and bacteria, significantly reducing the time required for researchers to gather essential information. Our method optimizes the workflow for researchers in cryopreservation and related fields by utilizing modern artificial intelligence technologies, specifically large language models. The Telegram bot, designed to be user-friendly, provides a comfortable and easy platform for quick data access, enhancing scientific research efficiency. The study's methodology involves data preparation, algorithm development, and system validation using a substantial data set of scientific articles. Results demonstrate the model's capability to accurately recognize and extract critical information, although some limitations in term specificity were noted. Our findings suggest that further refinement and training of the model can enhance its accuracy and reliability for specialized scientific applications.
Affiliation(s)
- Artemii M Zenkin
- ITMO University, 9, Lomonosova str, St. Petersburg 191002, Russia
- Olga Y Orlova
- ITMO University, 9, Lomonosova str, St. Petersburg 191002, Russia
4
Bobbo T, Biscarini F, Yaddehige SK, Alberghini L, Rigoni D, Bianchi N, Taccioli C. Machine learning classification of archaea and bacteria identifies novel predictive genomic features. BMC Genomics 2024; 25:955. [PMID: 39402493 PMCID: PMC11472548 DOI: 10.1186/s12864-024-10832-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Accepted: 09/24/2024] [Indexed: 10/19/2024] Open
Abstract
BACKGROUND: Archaea and Bacteria are distinct domains of life that are adapted to a variety of ecological niches. Several genome-based methods have been developed for their accurate classification, yet many aspects of the specific genomic features that determine these differences are not fully understood. In this study, we used publicly available whole-genome sequences from bacteria (N = 2546) and archaea (N = 109). From these, a set of genomic features (nucleotide frequencies and proportions, coding sequences (CDS), non-coding, ribosomal and transfer RNA genes (ncRNA, rRNA, tRNA), Chargaff's, topological entropy and Shannon's entropy scores) was extracted and used as input data to develop machine learning models for the classification of archaea and bacteria.
RESULTS: The classification accuracy ranged from 0.993 (Random Forest) to 0.998 (Neural Networks). Over the four models, only 11 examples were misclassified, especially those belonging to the minority class (Archaea). From variable importance, tRNA topological and Shannon's entropy, nucleotide frequencies in tRNA, rRNA and ncRNA, CDS, tRNA and rRNA Chargaff's scores have emerged as the top discriminating factors. In particular, tRNA entropy (both topological and Shannon's) was the most important genomic feature for classification, pointing at the complex interactions between the genetic code, tRNAs and the translational machinery.
CONCLUSIONS: tRNA, rRNA and ncRNA genes emerged as the key genomic elements that underpin the classification of archaea and bacteria. In particular, higher nucleotide diversity was found in tRNA from bacteria compared to archaea. The analysis of the few classification errors reflects the complex phylogenetic relationships between bacteria, archaea and eukaryotes.
Affiliation(s)
- Tania Bobbo
- Institute for Biomedical Technologies, National Research Council (CNR), Via Fratelli Cervi 93, Segrate (MI), 20054, Italy
- Filippo Biscarini
- Institute of Agricultural Biology and Biotechnology, National Research Council (CNR), Via Edoardo Bassini 15, Milano, 20133, Italy.
- Sachithra K Yaddehige
- Department of Animal Medicine, Health and Production, University of Padova, Viale dell'Universitá 16, Legnaro, 35020, Italy
- Leonardo Alberghini
- Department of Animal Medicine, Health and Production, University of Padova, Viale dell'Universitá 16, Legnaro, 35020, Italy
- Davide Rigoni
- Department of Pharmaceutical and Pharmacological Sciences, University of Padova, Via Francesco Marzolo 5, Padova, 35131, Italy
- Nicoletta Bianchi
- Department of Translational Medicine, University of Ferrara, Via Luigi Borsari 46, Ferrara, 44121, Italy.
- Cristian Taccioli
- Department of Animal Medicine, Health and Production, University of Padova, Viale dell'Universitá 16, Legnaro, 35020, Italy.
5
Taveira IC, Carraro CB, Nogueira KMV, Pereira LMS, Bueno JGR, Fiamenghi MB, dos Santos LV, Silva RN. Structural and biochemical insights of xylose MFS and SWEET transporters in microbial cell factories: challenges to lignocellulosic hydrolysates fermentation. Front Microbiol 2024; 15:1452240. [PMID: 39397797 PMCID: PMC11466781 DOI: 10.3389/fmicb.2024.1452240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Accepted: 09/16/2024] [Indexed: 10/15/2024] Open
Abstract
The production of bioethanol from lignocellulosic biomass requires the efficient conversion of glucose and xylose to ethanol, a process that depends on the ability of microorganisms to internalize these sugars. Although glucose transporters exist in several species, xylose transporters are less common. Several types of transporters have been identified in diverse microorganisms, including members of the Major Facilitator Superfamily (MFS) and Sugars Will Eventually be Exported Transporter (SWEET) families. Considering that Saccharomyces cerevisiae lacks an effective xylose transport system, engineered yeast strains capable of efficiently consuming this sugar are critical for obtaining high ethanol yields. This article reviews the structure-function relationship of sugar transporters from the MFS and SWEET families. It provides information on several tools and approaches used to identify and characterize them to optimize xylose consumption and, consequently, second-generation ethanol production.
Affiliation(s)
- Iasmin Cartaxo Taveira
- Molecular Biotechnology Laboratory, Department of Biochemistry and Immunology, Ribeirao Preto Medical School (FMRP), University of São Paulo, São Paulo, Brazil
- Cláudia Batista Carraro
- Molecular Biotechnology Laboratory, Department of Biochemistry and Immunology, Ribeirao Preto Medical School (FMRP), University of São Paulo, São Paulo, Brazil
- Karoline Maria Vieira Nogueira
- Molecular Biotechnology Laboratory, Department of Biochemistry and Immunology, Ribeirao Preto Medical School (FMRP), University of São Paulo, São Paulo, Brazil
- Lucas Matheus Soares Pereira
- Molecular Biotechnology Laboratory, Department of Biochemistry and Immunology, Ribeirao Preto Medical School (FMRP), University of São Paulo, São Paulo, Brazil
- João Gabriel Ribeiro Bueno
- Genetics and Molecular Biology Graduate Program, Institute of Biology, University of Campinas (UNICAMP), Campinas, Brazil
- Mateus Bernabe Fiamenghi
- Genetics and Molecular Biology Graduate Program, Institute of Biology, University of Campinas (UNICAMP), Campinas, Brazil
- Leandro Vieira dos Santos
- Genetics and Molecular Biology Graduate Program, Institute of Biology, University of Campinas (UNICAMP), Campinas, Brazil
- Manchester Institute of Biotechnology, University of Manchester, Manchester, United Kingdom
- Roberto N. Silva
- Molecular Biotechnology Laboratory, Department of Biochemistry and Immunology, Ribeirao Preto Medical School (FMRP), University of São Paulo, São Paulo, Brazil
6
Konno N, Maeno S, Tanizawa Y, Arita M, Endo A, Iwasaki W. Evolutionary paths toward multi-level convergence of lactic acid bacteria in fructose-rich environments. Commun Biol 2024; 7:902. [PMID: 39048718 PMCID: PMC11269746 DOI: 10.1038/s42003-024-06580-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 07/11/2024] [Indexed: 07/27/2024] Open
Abstract
Convergence provides clues to unveil the non-random nature of evolution. Intermediate paths toward convergence inform us of the stochasticity and the constraint of evolutionary processes. Although previous studies have suggested that substantial constraints exist in microevolutionary paths, it remains unclear whether macroevolutionary convergence follows stochastic or constrained paths. Here, we performed comparative genomics for hundreds of lactic acid bacteria (LAB) species, including clades showing a convergent gene repertoire and sharing fructose-rich habitats. By adopting phylogenetic comparative methods we showed that the genomic convergence of distinct fructophilic LAB (FLAB) lineages was caused by parallel losses of more than a hundred orthologs and the gene losses followed significantly similar orders. Our results further suggested that the loss of adhE, a key gene for phenotypic convergence to FLAB, follows a specific evolutionary path of domain architecture decay and amino acid substitutions in multiple LAB lineages sharing fructose-rich habitats. These findings unveiled the constrained evolutionary paths toward the convergence of free-living bacterial clades at the genomic and molecular levels.
Affiliation(s)
- Naoki Konno
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Bunkyo-ku, Tokyo, Japan.
- Shintaro Maeno
- Research Center for Advance Science and Innovation Organization for Research Initiatives, Yamaguchi University, Yamaguchi, Yamaguchi, Japan
- Yasuhiro Tanizawa
- Department of Informatics, National Institute of Genetics, Mishima, Shizuoka, Japan
- Masanori Arita
- Department of Informatics, National Institute of Genetics, Mishima, Shizuoka, Japan
- Akihito Endo
- Department of Nutritional Science and Food Safety, Faculty of Applied Bioscience, Tokyo University of Agriculture, Tokyo, Japan
- Wataru Iwasaki
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Bunkyo-ku, Tokyo, Japan.
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan.
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan.
- Atmosphere and Ocean Research Institute, The University of Tokyo, Kashiwa, Chiba, Japan.
- Institute for Quantitative Biosciences, The University of Tokyo, Bunkyo-ku, Tokyo, Japan.
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Bunkyo-ku, Tokyo, Japan.
7
Wolfe JM. Pangenomes at the limits of evolution. Trends Ecol Evol 2024; 39:419-420. [PMID: 38580497 DOI: 10.1016/j.tree.2024.03.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Accepted: 03/25/2024] [Indexed: 04/07/2024]
Abstract
Evolutionary pathways can be random or deterministic. In a recent article, Beavan et al. investigate this balance by applying machine learning models to microbial pangenomes. The presence of almost one-third of genes can be reliably inferred, indicating a surprising amount of predictable evolution.
Affiliation(s)
- Joanna M Wolfe
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA; Department of Organismic & Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA.
8
Garzon MH, Colorado FA. Towards an Analytical Biology. Curr Genomics 2024; 25:65-68. [PMID: 38751597 PMCID: PMC11092911 DOI: 10.2174/0113892029283759231227075715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 11/22/2023] [Accepted: 12/14/2023] [Indexed: 05/18/2024] Open
Abstract
This article draws a perspective on the increasingly unavoidable question of whether steps can be taken in genomics and biology at large to move them more rapidly towards more analytical and deductive biology, akin to similar developments that occurred in other natural sciences, such as physics and chemistry, centuries ago. It provides a summary of recent advances in other relevant sciences in the last 3 decades that are likely to pull it in that direction in the next decade or so, as well as what methods and tools will make it possible.
Affiliation(s)
- Max H. Garzon
- Department of Computer Science, University of Memphis, 373 Dunn, USA
- Fredy A. Colorado
- Department of Biology, National University of Colombia, Bogotá, Colombia
9
Hwang Y, Cornman AL, Kellogg EH, Ovchinnikov S, Girguis PR. Genomic language model predicts protein co-regulation and function. Nat Commun 2024; 15:2880. [PMID: 38570504 PMCID: PMC10991518 DOI: 10.1038/s41467-024-46947-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 03/13/2024] [Indexed: 04/05/2024] Open
Abstract
Deciphering the relationship between a gene and its genomic context is fundamental to understanding and engineering biological systems. Machine learning has shown promise in learning latent relationships underlying the sequence-structure-function paradigm from massive protein sequence datasets. However, to date, limited attempts have been made in extending this continuum to include higher order genomic context information. Evolutionary processes dictate the specificity of genomic contexts in which a gene is found across phylogenetic distances, and these emergent genomic patterns can be leveraged to uncover functional relationships between gene products. Here, we train a genomic language model (gLM) on millions of metagenomic scaffolds to learn the latent functional and regulatory relationships between genes. gLM learns contextualized protein embeddings that capture the genomic context as well as the protein sequence itself, and encode biologically meaningful and functionally relevant information (e.g. enzymatic function, taxonomy). Our analysis of the attention patterns demonstrates that gLM is learning co-regulated functional modules (i.e. operons). Our findings illustrate that gLM's unsupervised deep learning of the metagenomic corpus is an effective and promising approach to encode functional semantics and regulatory syntax of genes in their genomic contexts and uncover complex relationships between genes in a genomic region.
Affiliation(s)
- Yunha Hwang
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA.
- Elizabeth H Kellogg
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, USA.
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Peter R Girguis
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA.
10
Beavan A, Domingo-Sananes MR, McInerney JO. Contingency, repeatability, and predictability in the evolution of a prokaryotic pangenome. Proc Natl Acad Sci U S A 2024; 121:e2304934120. [PMID: 38147560 PMCID: PMC10769857 DOI: 10.1073/pnas.2304934120] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 03/27/2023] [Accepted: 11/05/2023] [Indexed: 12/28/2023] Open
Abstract
Pangenomes exhibit remarkable variability in many prokaryotic species, much of which is maintained through the processes of horizontal gene transfer and gene loss. Repeated acquisitions of near-identical homologs can easily be observed across pangenomes, leading to the question of whether these parallel events potentiate similar evolutionary trajectories, or whether the remarkably different genetic backgrounds of the recipients mean that postacquisition evolutionary trajectories end up being quite different. In this study, we present a machine learning method that predicts the presence or absence of genes in the Escherichia coli pangenome based on complex patterns of the presence or absence of other accessory genes within a genome. Our analysis leverages the repeated transfer of genes through the E. coli pangenome to observe patterns of repeated evolution following similar events. We find that the presence or absence of a substantial set of genes is highly predictable from other genes alone, indicating that selection potentiates and maintains gene-gene co-occurrence and avoidance relationships deterministically over long-term bacterial evolution and is robust to differences in host evolutionary history. We propose that at least part of the pangenome can be understood as a set of genes with relationships that govern their likely cohabitants, analogous to an ecosystem's set of interacting organisms. Our findings indicate that intragenomic gene fitness effects may be key drivers of prokaryotic evolution, influencing the repeated emergence of complex gene-gene relationships across the pangenome.
Affiliation(s)
- Alan Beavan
- School of Life Sciences, The University of Nottingham, Nottingham NG7 2UH, United Kingdom
- Maria Rosa Domingo-Sananes
- School of Life Sciences, The University of Nottingham, Nottingham NG7 2UH, United Kingdom
- School of Science and Technology, Nottingham Trent University, Nottingham NG1 4FQ, United Kingdom
- James O. McInerney
- School of Life Sciences, The University of Nottingham, Nottingham NG7 2UH, United Kingdom
11
Nardulli P, Ballini A, Zamparella M, De Vito D. The Role of Stakeholders' Understandings in Emerging Antimicrobial Resistance: A One Health Approach. Microorganisms 2023; 11:2797. [PMID: 38004808 PMCID: PMC10673085 DOI: 10.3390/microorganisms11112797] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/23/2023] [Revised: 11/10/2023] [Accepted: 11/14/2023] [Indexed: 11/26/2023] Open
Abstract
The increasing misuse of antibiotics in human and veterinary medicine and in agroecosystems, and the consequent selective pressure favouring resistant strains, lead to antimicrobial resistance (AMR), an expanding global phenomenon. Indeed, this phenomenon represents a major public health target with significant clinical implications related to increased morbidity and mortality and prolonged hospital stays. The presence of multidrug-resistant microorganisms isolated from patients is a problem because of the additional burden of disease it places on the most fragile patients and the difficulty of finding effective therapies. In recent decades, international organizations like the World Health Organization (WHO) and the European Centre for Disease Prevention and Control (ECDC) have played significant roles in addressing the issue of AMR. The ECDC estimates that in the European Union alone, antibiotic resistance causes 33,000 deaths and approximately 880,000 cases of disability each year. The epidemiological impact of AMR inevitably also has direct economic consequences related not only to the loss of life but also to a reduction in the number of days worked, increased use of healthcare resources for diagnostic procedures and the use of second-line antibiotics when available. In 2015, the WHO, recognising AMR as a complex problem that can only be addressed by coordinated multi-sectoral interventions, promoted the One Health approach that considers human, animal, and environmental health in an integrated manner. In this review, the authors address why collaboration among all stakeholders involved in AMR growth and management is necessary to achieve optimal health for people, animals, plants, and the environment, highlighting that AMR is a growing threat to human and animal health, food safety and security, economic prosperity, and ecosystems worldwide.
Affiliation(s)
- Patrizia Nardulli
- S.C. Farmacia e UMACA IRCCS Istituto Tumori “Giovanni Paolo II”, Viale O. Flacco 65, 70124 Bari, Italy;
- Andrea Ballini
- Department of Clinical and Experimental Medicine, University of Foggia, 71122 Foggia, Italy
- Danila De Vito
- Department of Translational Biomedicine and Neuroscience, Medical School, University Aldo Moro of Bari, 70124 Bari, Italy;
12
Babele PK, Srivastava A, Young JD. Metabolic flux phenotyping of secondary metabolism in cyanobacteria. Trends Microbiol 2023; 31:1118-1130. [PMID: 37331829 DOI: 10.1016/j.tim.2023.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 05/10/2023] [Accepted: 05/15/2023] [Indexed: 06/20/2023]
Abstract
Cyanobacteria generate energy from photosynthesis and produce various secondary metabolites with diverse commercial and pharmaceutical applications. Unique metabolic and regulatory pathways in cyanobacteria present new challenges for researchers to enhance their product yields, titers, and rates. Therefore, further advancements are critically needed to establish cyanobacteria as a preferred bioproduction platform. Metabolic flux analysis (MFA) quantitatively determines the intracellular flows of carbon within complex biochemical networks, which elucidate the control of metabolic pathways by transcriptional, translational, and allosteric regulatory mechanisms. The emerging field of systems metabolic engineering (SME) involves the use of MFA and other omics technologies to guide the rational development of microbial production strains. This review highlights the potential of MFA and SME to optimize the production of cyanobacterial secondary metabolites and discusses the technical challenges that lie ahead.
Affiliation(s)
- Piyoosh K Babele
- College of Agriculture, Rani Lakshmi Bai Central Agricultural University Jhansi, 284003, Uttar Pradesh, India.
- Amit Srivastava
- University of Jyväskylä, Nanoscience Centre, Department of Biological and Environmental Science, 40014 Jyväskylä, Finland
- Jamey D Young
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, PMB 351604, Nashville, TN 37235-1604, USA; Department of Molecular Physiology and Biophysics, Vanderbilt University, PMB 351604, Nashville, TN 37235-1604, USA.
13
Castelli P, De Ruvo A, Bucciacchio A, D'Alterio N, Cammà C, Di Pasquale A, Radomski N. Harmonization of supervised machine learning practices for efficient source attribution of Listeria monocytogenes based on genomic data. BMC Genomics 2023; 24:560. [PMID: 37736708 PMCID: PMC10515079 DOI: 10.1186/s12864-023-09667-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 09/10/2023] [Indexed: 09/23/2023] Open
Abstract
BACKGROUND: Genomic data-based machine learning tools are promising for real-time surveillance activities performing source attribution of foodborne bacteria such as Listeria monocytogenes. Given the heterogeneity of machine learning practices, our aim was to identify those influencing the source prediction performance of the usual holdout method combined with the repeated k-fold cross-validation method.
METHODS: A large collection of 1,100 L. monocytogenes genomes with known sources was built according to several genomic metrics to ensure authenticity and completeness of genomic profiles. Based on these genomic profiles (i.e. 7-locus alleles, core alleles, accessory genes, core SNPs and pan kmers), we developed a versatile workflow assessing prediction performance of different combinations of training dataset splitting (i.e. 50, 60, 70, 80 and 90%), data preprocessing (i.e. with or without near-zero variance removal), and learning models (i.e. BLR, ERT, RF, SGB, SVM and XGB). The performance metrics included accuracy, Cohen's kappa, F1-score, area under the curves from receiver operating characteristic curve, precision recall curve or precision recall gain curve, and execution time.
RESULTS: The testing average accuracies from accessory genes and pan kmers were significantly higher than accuracies from core alleles or SNPs. While the accuracies from 70 and 80% of training dataset splitting were not significantly different, those from 80% were significantly higher than the other tested proportions. Near-zero variance removal prevented results from being produced for 7-locus alleles, did not significantly affect the accuracy for core alleles, accessory genes and pan kmers, and significantly decreased accuracy for core SNPs. The SVM and XGB models did not differ significantly in accuracy from each other and reached significantly higher accuracies than BLR, SGB, ERT and RF, in that order. However, the SVM model required more computing power than the XGB model, especially for large numbers of descriptors such as core SNPs and pan kmers.
CONCLUSIONS: In addition to recommendations about machine learning practices for L. monocytogenes source attribution based on genomic data, the present study also provides a freely available workflow to solve other balanced or unbalanced multiclass phenotypes from binary and categorical genomic profiles of other microorganisms without source code modifications.
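The evaluation scheme named in this abstract, a holdout split combined with repeated k-fold cross-validation on the training portion, can be sketched in a few lines. This is a minimal, standard-library illustration and not the authors' workflow: the function names, the seed handling, and the choice of 5 folds and 3 repeats are assumptions, and the split is not stratified by source class.

```python
import random


def holdout_split(n, train_fraction, seed=0):
    """Shuffle sample indices 0..n-1 and split them into (train, test)."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(n * train_fraction))
    return idx[:cut], idx[cut:]


def repeated_kfold(indices, k, repeats, seed=0):
    """Yield (train, validation) index lists for k folds, repeated `repeats` times.

    Each repeat reshuffles the indices so the fold membership differs.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]  # k near-equal folds
        for i in range(k):
            val = folds[i]
            train = [x for j, fold in enumerate(folds) if j != i for x in fold]
            yield train, val


if __name__ == "__main__":
    # 1,100 genomes with an 80% training split, as in the abstract;
    # k=5 and repeats=3 are illustrative values only.
    train_idx, test_idx = holdout_split(1100, 0.8)
    splits = list(repeated_kfold(train_idx, k=5, repeats=3))
    print(len(train_idx), len(test_idx), len(splits))
```

The held-out test indices are never passed to `repeated_kfold`, so model selection on the folds cannot leak into the final test accuracy.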
Affiliation(s)
- Pierluigi Castelli
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Andrea De Ruvo
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Andrea Bucciacchio
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Nicola D'Alterio
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Cesare Cammà
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Adriano Di Pasquale
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Nicolas Radomski
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy.
14
Thai TD, Lim W, Na D. Synthetic bacteria for the detection and bioremediation of heavy metals. Front Bioeng Biotechnol 2023; 11:1178680. [PMID: 37122866 PMCID: PMC10133563 DOI: 10.3389/fbioe.2023.1178680] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/03/2023] [Accepted: 04/04/2023] [Indexed: 05/02/2023] Open
Abstract
Toxic heavy metal accumulation is a form of anthropogenic environmental pollution that poses risks to human health and ecological systems. Conventional heavy metal remediation approaches rely on expensive chemical and physical processes that lead to the formation and release of other toxic waste products. Microbial bioremediation has therefore gained interest as a promising and cost-effective alternative to conventional methods, but the genetic complexity of microorganisms and the lack of appropriate genetic engineering technologies have impeded the development of bioremediating microorganisms. Recently, the emergence of synthetic biology has opened a new avenue for microbial bioremediation research and development by addressing these challenges and providing novel tools for constructing bacteria with enhanced capabilities: rapid detection and degradation of heavy metals together with enhanced tolerance to toxic heavy metals. Moreover, synthetic biology also offers new technologies to meet biosafety regulations, since genetically modified microorganisms may disrupt natural ecosystems. In this review, we introduce the use of microorganisms developed based on synthetic biology technologies for the detection and detoxification of heavy metals. Additionally, this review explores the technical strategies developed to meet the biosafety requirements associated with the use of genetically modified microorganisms.
Affiliation(s)
- Dokyun Na
- Department of Biomedical Engineering, Chung-Ang University, Seoul, Republic of Korea