1
Silva MKDP, Nicoleti VYU, Rodrigues BDPP, Araujo ASF, Ellwanger JH, de Almeida JM, Lemos LN. Exploring deep learning in phage discovery and characterization. Virology 2025; 609:110559. [PMID: 40359589 DOI: 10.1016/j.virol.2025.110559] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/12/2024] [Revised: 03/24/2025] [Accepted: 04/28/2025] [Indexed: 05/15/2025]
Abstract
Bacteriophages, or bacterial viruses, play diverse ecological roles by shaping bacterial populations and also hold significant biotechnological and medical potential, including the treatment of infections caused by multidrug-resistant bacteria. The discovery of novel bacteriophages using large-scale metagenomic data has been accelerated by the accessibility of deep learning (Artificial Intelligence), the increased computing power of graphics processing units (GPUs), and new bioinformatics tools. This review addresses the recent revolution in bacteriophage research, ranging from the adoption of neural network algorithms applied to metagenomic data to the use of pre-trained language models, such as BERT, which have improved the reconstruction of viral metagenome-assembled genomes (vMAGs). This article also discusses the main aspects of bacteriophage biology studied using deep learning, highlighting the advances and limitations of this approach. Finally, prospects for deep-learning-based metagenomic algorithms and recommendations for future investigations are described.
Affiliation(s)
- Vitória Yumi Uetuki Nicoleti
- Ilum School of Science, Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, São Paulo, Brazil.
- Joel Henrique Ellwanger
- Laboratory of Immunobiology and Immunogenetics, Department of Genetics, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil.
- James Moraes de Almeida
- Ilum School of Science, Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, São Paulo, Brazil.
- Leandro Nascimento Lemos
- Ilum School of Science, Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, São Paulo, Brazil.
2
Przymus P, Rykaczewski K, Martín-Segura A, Truu J, Carrillo De Santa Pau E, Kolev M, Naskinova I, Gruca A, Sampri A, Frohme M, Nechyporenko A. Deep learning in microbiome analysis: a comprehensive review of neural network models. Front Microbiol 2025; 15:1516667. [PMID: 39911715 PMCID: PMC11794229 DOI: 10.3389/fmicb.2024.1516667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2024] [Accepted: 12/16/2024] [Indexed: 02/07/2025] Open
Abstract
Microbiome research, the study of microbial communities in diverse environments, has seen significant advances due to the integration of deep learning (DL) methods. These computational techniques have become essential for addressing the inherent complexity and high-dimensionality of microbiome data, which consist of different types of omics datasets. Deep learning algorithms have shown remarkable capabilities in pattern recognition, feature extraction, and predictive modeling, enabling researchers to uncover hidden relationships within microbial ecosystems. By automating the detection of functional genes, microbial interactions, and host-microbiome dynamics, DL methods offer unprecedented precision in understanding microbiome composition and its impact on health, disease, and the environment. However, despite their potential, deep learning approaches face significant challenges in microbiome research. Additionally, the biological variability in microbiome datasets requires tailored approaches to ensure robust and generalizable outcomes. As microbiome research continues to generate vast and complex datasets, addressing these challenges will be crucial for advancing microbiological insights and translating them into practical applications with DL. This review provides an overview of different deep learning models in microbiome research, discussing their strengths, practical uses, and implications for future studies. We examine how these models are being applied to solve key problems and highlight potential pathways to overcome current limitations, emphasizing the transformative impact DL could have on the field moving forward.
Affiliation(s)
- Piotr Przymus
- Faculty of Mathematics and Computer Science, Nicolaus Copernicus University in Toruń, Toruń, Pomeranian, Poland
- Krzysztof Rykaczewski
- Faculty of Mathematics and Computer Science, Nicolaus Copernicus University in Toruń, Toruń, Pomeranian, Poland
- Jaak Truu
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Mikhail Kolev
- Department of Mathematics, University of Architecture, Civil Engineering and Geodesy, Sofia, Bulgaria
- Department of Applied Computer Science and Mathematical Modeling, Faculty of Mathematics and Computer Science, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Irina Naskinova
- Department of Mathematics, University of Architecture, Civil Engineering and Geodesy, Sofia, Bulgaria
- Aleksandra Gruca
- Department of Computer Networks and Systems, Silesian University of Technology, Gliwice, Poland
- Alexia Sampri
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, United Kingdom
- Marcus Frohme
- Molecular Biotechnology and Functional Genomics, Technical University of Applied Sciences Wildau, Wildau, Brandenburg, Germany
- Alina Nechyporenko
- Molecular Biotechnology and Functional Genomics, Technical University of Applied Sciences Wildau, Wildau, Brandenburg, Germany
- Department of System Engineering, Kharkiv National University of Radioelectronics, Kharkiv, Ukraine
3
Roy G, Prifti E, Belda E, Zucker JD. Deep learning methods in metagenomics: a review. Microb Genom 2024; 10:001231. [PMID: 38630611 PMCID: PMC11092122 DOI: 10.1099/mgen.0.001231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 03/27/2024] [Indexed: 04/19/2024] Open
Abstract
The ever-decreasing cost of sequencing and the growing potential applications of metagenomics have led to an unprecedented surge in data generation. One of the most prevalent applications of metagenomics is the study of microbial environments, such as the human gut. The gut microbiome plays a crucial role in human health, providing vital information for patient diagnosis and prognosis. However, analysing metagenomic data remains challenging due to several factors, including reference catalogues, sparsity and compositionality. Deep learning (DL) enables novel and promising approaches that complement state-of-the-art microbiome pipelines. DL-based methods can address almost all aspects of microbiome analysis, including novel pathogen detection, sequence classification, patient stratification and disease prediction. Beyond generating predictive models, a key aspect of these methods is also their interpretability. This article reviews DL approaches in metagenomics, including convolutional networks, autoencoders and attention-based models. These methods aggregate contextualized data and pave the way for improved patient care and a better understanding of the microbiome's key role in our health.
Affiliation(s)
- Gaspar Roy
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Edi Prifti
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
- Eugeni Belda
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
- Jean-Daniel Zucker
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
4
Yan W, Tan L, Meng-Shan L, Sheng S, Jun W, Fu-an W. SaPt-CNN-LSTM-AR-EA: a hybrid ensemble learning framework for time series-based multivariate DNA sequence prediction. PeerJ 2023; 11:e16192. [PMID: 37810796 PMCID: PMC10559882 DOI: 10.7717/peerj.16192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 09/06/2023] [Indexed: 10/10/2023] Open
Abstract
Biological sequence data mining is a hot spot in bioinformatics. A biological sequence can be regarded as a set of characters. Time series are similar to biological sequences in terms of both representation and mechanism. Therefore, in this article, biological sequences are represented as time series to obtain biological time sequences (BTS). A hybrid ensemble learning framework (SaPt-CNN-LSTM-AR-EA) for BTS is proposed. Single-sequence and multi-sequence models are constructed, respectively, with a self-adaption pre-training one-dimensional convolutional recurrent neural network and an autoregressive fractional integrated moving average model fused with an evolutionary algorithm. In DNA sequence experiments with six viruses, SaPt-CNN-LSTM-AR-EA achieved good overall prediction performance, with prediction accuracy and correlation reaching 1.7073 and 0.9186, respectively. SaPt-CNN-LSTM-AR-EA was compared with five other benchmark models to verify its effectiveness and stability, and it increased the average accuracy by about 30%. The framework proposed in this article is significant in biology, biomedicine, and computer science, and can be widely applied in sequence splicing, computational biology, bioinformation, and other fields.
Affiliation(s)
- Wu Yan
- School of Biotechnology, Jiangsu University of Science & Technology, Zhenjiang, China
- School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, Jiangxi, China
- Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, China
- Li Tan
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou, China
- Li Meng-Shan
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou, China
- Sheng Sheng
- School of Biotechnology, Jiangsu University of Science & Technology, Zhenjiang, China
- Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, China
- Wang Jun
- School of Biotechnology, Jiangsu University of Science & Technology, Zhenjiang, China
- Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, China
- Wu Fu-an
- School of Biotechnology, Jiangsu University of Science & Technology, Zhenjiang, China
- Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, China