1
Feng T, Chen X, Wu S, Tang W, Zhou H, Fang Z. Predicting the bacterial host range of plasmid genomes using the language model-based one-class support vector machine algorithm. Microb Genom 2025; 11. [PMID: 39932495 DOI: 10.1099/mgen.0.001355] [Indexed: 05/08/2025]
Abstract
The prediction of the plasmid host range is crucial for investigating the dissemination of plasmids and the plasmid-mediated transfer of resistance and virulence genes. Several machine learning-based tools have been developed to predict plasmid host ranges; these tools were trained and tested on the bacterial host records of plasmids in related databases. Typically, a plasmid genome in databases such as that of the National Center for Biotechnology Information is annotated with only one or a few bacterial hosts, which does not cover all of its possible hosts. Consequently, existing methods may substantially underestimate the host ranges of mobile plasmids. In this work, we propose a novel method named HRPredict, which uses a word vector model to represent the proteins encoded on plasmid genomes numerically. Because it is difficult to confirm that a particular plasmid definitely cannot enter a given host, we framed the prediction of whether a plasmid can enter a specific bacterium as a learning task without negative samples. Using multiple one-class support vector machine (SVM) models, which require no negative samples for training, HRPredict predicts the host range of plasmids across 45 families, 56 genera and 56 species. On the benchmark test set, for which we constructed reliable negative samples for each host taxonomic unit via two indirect methods, the area under the curve (AUC), F1-score, recall, precision and accuracy of most taxonomic-unit prediction models exceeded 0.9. Among the 13 broad-host-range plasmid types, HRPredict achieved greater coverage than HOTSPOT and PlasmidHostFinder and successfully predicted most of the previously reported hosts. By calculating feature importance for each SVM model, we found that genes closely related to the plasmid host range are involved in functions such as bacterial adaptability, pathogenicity and survival.
These findings provide significant insight into the mechanisms by which bacteria adapt to diverse environments via plasmids. The HRPredict algorithm is expected to facilitate in-depth research on the spread of broad-host-range plasmids and to enable host-range prediction for novel plasmids reconstructed from microbiome sequencing data.
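The abstract describes a two-step workflow. First, each encoded protein is represented with a word vector model. Second, one one-class model per host taxon is trained on positive examples only, and a plasmid's predicted host range is the set of taxa whose models accept it. The sketch below illustrates only that workflow, and it rests on loud assumptions:

- The protein vectors are hypothetical toy numbers, not HRPredict's embeddings.
- A plasmid is represented as the mean of its protein vectors, which is an assumed pooling choice.
- Each per-taxon model is a simple centroid-distance rule, swapped in for the one-class SVM so that the example runs with the standard library alone.

```python
import math
from statistics import mean

Vector = list[float]


def mean_vector(vectors: list[Vector]) -> Vector:
    """Element-wise mean; used here (as an assumption) to pool protein vectors into a plasmid vector."""
    dim = len(vectors[0])
    return [mean(v[i] for v in vectors) for i in range(dim)]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CentroidOneClass:
    """Hypothetical stand-in for a one-class SVM, fitted on positive examples only.

    A sample is accepted when its distance to the training centroid is at most
    the largest training distance multiplied by `margin`.
    """

    def __init__(self, margin: float = 1.0):
        self.margin = margin

    def fit(self, positives: list[Vector]) -> "CentroidOneClass":
        self.centroid = mean_vector(positives)
        self.radius = self.margin * max(distance(p, self.centroid) for p in positives)
        return self

    def accepts(self, x: Vector) -> bool:
        return distance(x, self.centroid) <= self.radius


def predict_host_range(models: dict[str, CentroidOneClass], x: Vector) -> list[str]:
    """Return every host taxon whose one-class model accepts the plasmid vector."""
    return sorted(taxon for taxon, model in models.items() if model.accepts(x))


# Toy 2-D vectors of known plasmids for two hypothetical genera (positives only).
models = {
    "GenusA": CentroidOneClass(margin=1.2).fit([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]]),
    "GenusB": CentroidOneClass(margin=1.2).fit([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]),
}
# A new plasmid carrying two proteins, pooled into one vector.
query = mean_vector([[0.1, 0.2], [0.0, 0.1]])
print(predict_host_range(models, query))  # ['GenusA']
```

Because each taxon has its own independent model, a plasmid can be accepted by several taxa at once. This is what allows a broad host range to be expressed without ever labelling a host as impossible.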
Affiliation(s)
- Tao Feng
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, PR China
- Guangzhou Chest Hospital, Hengzhigang Road 1066, Guangzhou, 510095, PR China
- Xirao Chen
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, PR China
- Shufang Wu
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, PR China
- Waijiao Tang
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, PR China
- Hongwei Zhou
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, PR China
- Zhencheng Fang
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, PR China
2
Przymus P, Rykaczewski K, Martín-Segura A, Truu J, Carrillo De Santa Pau E, Kolev M, Naskinova I, Gruca A, Sampri A, Frohme M, Nechyporenko A. Deep learning in microbiome analysis: a comprehensive review of neural network models. Front Microbiol 2025; 15:1516667. [PMID: 39911715 PMCID: PMC11794229 DOI: 10.3389/fmicb.2024.1516667] [Received: 10/24/2024] [Accepted: 12/16/2024] [Indexed: 02/07/2025]
Abstract
Microbiome research, the study of microbial communities in diverse environments, has advanced significantly through the integration of deep learning (DL) methods. These computational techniques have become essential for addressing the inherent complexity and high dimensionality of microbiome data, which span different types of omics datasets. DL algorithms have shown remarkable capabilities in pattern recognition, feature extraction, and predictive modeling, enabling researchers to uncover hidden relationships within microbial ecosystems. By automating the detection of functional genes, microbial interactions, and host-microbiome dynamics, DL methods offer unprecedented precision in understanding microbiome composition and its impact on health, disease, and the environment. However, despite their potential, DL approaches face significant challenges in microbiome research; in particular, the biological variability of microbiome datasets requires tailored approaches to ensure robust and generalizable outcomes. As microbiome research continues to generate vast and complex datasets, addressing these challenges will be crucial for advancing microbiological insight and translating it into practical applications with DL. This review provides an overview of different DL models in microbiome research, discussing their strengths, practical uses, and implications for future studies. We examine how these models are being applied to solve key problems and highlight potential pathways to overcome current limitations, emphasizing the transformative impact DL could have on the field.
Affiliation(s)
- Piotr Przymus
- Faculty of Mathematics and Computer Science, Nicolaus Copernicus University in Toruń, Toruń, Pomeranian, Poland
- Krzysztof Rykaczewski
- Faculty of Mathematics and Computer Science, Nicolaus Copernicus University in Toruń, Toruń, Pomeranian, Poland
- Jaak Truu
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Mikhail Kolev
- Department of Mathematics, University of Architecture, Civil Engineering and Geodesy, Sofia, Bulgaria
- Department of Applied Computer Science and Mathematical Modeling, Faculty of Mathematics and Computer Science, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Irina Naskinova
- Department of Mathematics, University of Architecture, Civil Engineering and Geodesy, Sofia, Bulgaria
- Aleksandra Gruca
- Department of Computer Networks and Systems, Silesian University of Technology, Gliwice, Poland
- Alexia Sampri
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, United Kingdom
- Marcus Frohme
- Molecular Biotechnology and Functional Genomics, Technical University of Applied Sciences Wildau, Wildau, Brandenburg, Germany
- Alina Nechyporenko
- Molecular Biotechnology and Functional Genomics, Technical University of Applied Sciences Wildau, Wildau, Brandenburg, Germany
- Department of System Engineering, Kharkiv National University of Radioelectronics, Kharkiv, Ukraine
3
Roy G, Prifti E, Belda E, Zucker JD. Deep learning methods in metagenomics: a review. Microb Genom 2024; 10:001231. [PMID: 38630611 PMCID: PMC11092122 DOI: 10.1099/mgen.0.001231] [Received: 12/20/2023] [Accepted: 03/27/2024] [Indexed: 04/19/2024]
Abstract
The ever-decreasing cost of sequencing and the growing range of potential applications of metagenomics have led to an unprecedented surge in data generation. One of the most prevalent applications of metagenomics is the study of microbial environments such as the human gut. The gut microbiome plays a crucial role in human health and provides vital information for patient diagnosis and prognosis. However, analysing metagenomic data remains challenging owing to several factors, including dependence on reference catalogues, sparsity and compositionality. Deep learning (DL) enables novel and promising approaches that complement state-of-the-art microbiome pipelines. DL-based methods can address almost all aspects of microbiome analysis, including novel pathogen detection, sequence classification, patient stratification and disease prediction. Beyond generating predictive models, a key aspect of these methods is their interpretability. This article reviews DL approaches in metagenomics, including convolutional networks, autoencoders and attention-based models. These methods aggregate contextualized data and pave the way for improved patient care and a better understanding of the microbiome's key role in our health.
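Compositionality means that sequencing yields relative rather than absolute abundances, so only ratios between taxa carry information. The review abstract does not prescribe a preprocessing method. As an illustrative sketch of one widely used option, the centered log-ratio (CLR) transform is shown below; it is not taken from the review. The pseudocount, which keeps the many zeros of a sparse count table from producing log(0), is an assumed parameter.

```python
import math


def clr(counts: list[float], pseudocount: float = 1.0) -> list[float]:
    """Centered log-ratio transform of one sample's taxon counts.

    Each value becomes log(x_i / g(x)), where g(x) is the geometric mean of the
    sample. The pseudocount (an assumed default) avoids log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


sample = [0, 9, 99, 999]
print([round(v, 3) for v in clr(sample)])  # [-3.454, -1.151, 1.151, 3.454]
```

With a zero pseudocount, multiplying every count by a constant (for example, deeper sequencing of the same community) leaves the CLR values unchanged. This property is what makes the transform suitable for compositional data. The outputs of one sample always sum to zero.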
Affiliation(s)
- Gaspar Roy
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Edi Prifti
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
- Eugeni Belda
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
- Jean-Daniel Zucker
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
4
Wang Y, Gai J, Hou Q, Zhao H, Shan C, Guo Z. Ultra-high-depth macrogenomic sequencing revealed differences in microbial composition and function between high temperature and medium-high temperature Daqu. World J Microbiol Biotechnol 2023; 39:337. [PMID: 37814055 DOI: 10.1007/s11274-023-03772-4] [Received: 05/22/2023] [Accepted: 09/18/2023] [Indexed: 10/11/2023]
Abstract
Complex microbial communities in Daqu produced at different temperatures play a vital role in the taste, flavor and quality of Baijiu during fermentation. However, understanding the functional diversity of the whole microbial community in Daqu made at two different temperatures (high-temperature Daqu, HD, and medium-high-temperature Daqu, MD) remains a major challenge. Here, we describe a systematic study of the microbial diversity and functions, as well as the physiological and biochemical indexes, of Daqu. The results revealed that the Daqu exhibited unique characteristics. In particular, microbial diversity in both HD and MD was high: 44 species, including 14 novel species (mainly Sphingomonas sp.), were detected in all samples. Their profiles of carbohydrate-active enzymes and specific functional components supported the involvement of these species in flavor formation. The Daqu microbiome contained a high proportion of phages, providing evidence of phage infection/genome integration and horizontal gene transfer from phages to bacteria. Such processes may also regulate Daqu microbiomes and thus flavor quality. These results enrich current knowledge of Daqu and can be used to promote the development of Baijiu fermentation technology.
Affiliation(s)
- Yurong Wang
- Hubei Provincial Engineering and Technology Research Center for Food Ingredients, Hubei University of Arts and Science, Xiangyang, Hubei Province, People's Republic of China
- Xiangyang Lactic Acid Bacteria Biotechnology and Engineering Key Laboratory, Hubei University of Arts and Science, Xiangyang, Hubei, People's Republic of China
- Jianshe Gai
- Xinjiang Sishi Avenue Wine Co., Ltd, Huyanghe, Xinjiang Autonomous Region, People's Republic of China
- Qiangchuan Hou
- Hubei Provincial Engineering and Technology Research Center for Food Ingredients, Hubei University of Arts and Science, Xiangyang, Hubei Province, People's Republic of China
- Xiangyang Lactic Acid Bacteria Biotechnology and Engineering Key Laboratory, Hubei University of Arts and Science, Xiangyang, Hubei, People's Republic of China
- Huijun Zhao
- Hubei Provincial Engineering and Technology Research Center for Food Ingredients, Hubei University of Arts and Science, Xiangyang, Hubei Province, People's Republic of China
- Xiangyang Lactic Acid Bacteria Biotechnology and Engineering Key Laboratory, Hubei University of Arts and Science, Xiangyang, Hubei, People's Republic of China
- Chunhui Shan
- School of Food Science, Shihezi University, Shihezi, Xinjiang Autonomous Region, People's Republic of China
- Zhuang Guo
- Hubei Provincial Engineering and Technology Research Center for Food Ingredients, Hubei University of Arts and Science, Xiangyang, Hubei Province, People's Republic of China
- Xiangyang Lactic Acid Bacteria Biotechnology and Engineering Key Laboratory, Hubei University of Arts and Science, Xiangyang, Hubei, People's Republic of China
- Xinjiang Sishi Avenue Wine Co., Ltd, Huyanghe, Xinjiang Autonomous Region, People's Republic of China
5
Li M, Wang C, Guo Q, Xu C, Xie Z, Tan J, Wu S, Wang P, Guo J, Fang Z, Zhu S, Duan L, Jiang X, Zhu H. More Positive or More Negative? Metagenomic Analysis Reveals Roles of Virome in Human Disease-Related Gut Microbiome. Front Cell Infect Microbiol 2022; 12:846063. [PMID: 35493727 PMCID: PMC9040671 DOI: 10.3389/fcimb.2022.846063] [Received: 12/30/2021] [Accepted: 03/07/2022] [Indexed: 12/04/2022]
Abstract
Viruses are increasingly viewed as vital components of the human gut microbiota, but their roles in health and disease remain incompletely understood. Here, we first sequenced and analyzed 37 metagenomic and 18 host metabolomic samples related to irritable bowel syndrome (IBS) and found that some viruses whose abundance shifted between IBS patients and controls covaried with shifted bacteria and metabolites. In particular, phages infecting beneficial lactic acid bacteria depleted in IBS covaried with their hosts. We also retrieved public whole-genome metagenomic datasets for four other diseases (type 2 diabetes, Crohn’s disease, colorectal cancer and liver cirrhosis), totaling 438 samples including IBS, and performed a uniform analysis of the gut viruses across diseases. By constructing disease-specific co-occurrence networks, we found that viruses actively interacted with bacteria, were negatively correlated with possible dysbiosis-related and inflammation-mediating bacteria, increased the connectivity between bacterial modules and contributed to the robustness of the networks. Functional enrichment analysis showed that phages interact with bacteria through predation or by expressing genes involved in transporter and secretion systems, metabolic enzymes, etc. We further built a viral database to facilitate systematic functional classification and explored the functions of viral genes in interactions with bacteria. Our analyses provide a systematic view of the gut virome in the disease-related microbial community and suggest possible positive roles of viruses in gut health.
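The abstract does not specify how the co-occurrence networks were built. A common approach, used here only as an illustrative assumption, has two steps:

1. Compute the pairwise Spearman correlation between the abundance profiles of viruses and bacteria across samples.
2. Keep an edge wherever |rho| exceeds a cutoff.

The 0.6 threshold, the taxon names and the abundances below are hypothetical. No significance testing or multiple-testing correction is performed, although a real analysis would normally add both.

```python
import math
from itertools import combinations


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cooccurrence_edges(profiles: dict[str, list[float]], threshold: float = 0.6):
    """Return (taxon_a, taxon_b, rho) for every pair with |rho| >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        rho = spearman(profiles[a], profiles[b])
        if abs(rho) >= threshold:
            edges.append((a, b, round(rho, 3)))
    return edges


# Hypothetical abundances across five samples.
profiles = {
    "phage_1": [5, 3, 8, 1, 6],
    "bacterium_A": [10, 6, 15, 2, 12],  # rises and falls with phage_1
    "bacterium_B": [1, 4, 0, 7, 2],     # moves opposite to phage_1
}
for edge in cooccurrence_edges(profiles):
    print(edge)
```

On these toy profiles the sketch reports a positive edge between phage_1 and bacterium_A (rho = 1.0). It also reports negative edges (rho = -0.9) between bacterium_B and each of the other two taxa.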
Affiliation(s)
- Mo Li
- Peking University-Tsinghua University-National Institute of Biological Sciences (PTN) Joint Ph.D. Program, School of Life Sciences, Peking University, Beijing, China
- Department of Biomedical Engineering, College of Future Technology, Peking University, Beijing, China
- Chunhui Wang
- Peking University-Tsinghua University-National Institute of Biological Sciences (PTN) Joint Ph.D. Program, School of Life Sciences, Peking University, Beijing, China
- Department of Biomedical Engineering, College of Future Technology, Peking University, Beijing, China
- Qian Guo
- Department of Biomedical Engineering, College of Future Technology, Peking University, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing, China
- Congmin Xu
- Department of Biomedical Engineering, College of Future Technology, Peking University, Beijing, China
- Zhongjie Xie
- Department of Biomedical Engineering, College of Future Technology, Peking University, Beijing, China
- Jie Tan
- Department of Biomedical Engineering, College of Future Technology, Peking University, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing, China
- Shufang Wu
- Department of Biomedical Engineering, College of Future Technology, Peking University, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing, China
- Peihong Wang
- Department of Biomedical Engineering, College of Future Technology, Peking University, Beijing, China
- Jinyuan Guo
- Department of Biomedical Engineering, College of Future Technology, Peking University, Beijing, China
- Zhencheng Fang
- Department of Biomedical Engineering, College of Future Technology, Peking University, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing, China
- Shiwei Zhu
- Department of Gastroenterology, Peking University Third Hospital, Beijing, China
- Liping Duan
- Department of Gastroenterology, Peking University Third Hospital, Beijing, China
- Xiaoqing Jiang
- Department of Biomedical Engineering, College of Future Technology, Peking University, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing, China
- *Correspondence: Huaiqiu Zhu; Xiaoqing Jiang
- Huaiqiu Zhu
- Peking University-Tsinghua University-National Institute of Biological Sciences (PTN) Joint Ph.D. Program, School of Life Sciences, Peking University, Beijing, China
- Department of Biomedical Engineering, College of Future Technology, Peking University, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing, China
- Institute of Medical Technology, Peking University Health Science Center, Beijing, China
6
Fang Z, Zhou H. VirionFinder: Identification of Complete and Partial Prokaryote Virus Virion Protein From Virome Data Using the Sequence and Biochemical Properties of Amino Acids. Front Microbiol 2021; 12:615711. [PMID: 33613485 PMCID: PMC7894196 DOI: 10.3389/fmicb.2021.615711] [Received: 10/09/2020] [Accepted: 01/04/2021] [Indexed: 01/22/2023]
Abstract
Viruses are among the most abundant biological entities on Earth, and prokaryotic viruses are the dominant members of the viral community. Because prokaryotic viruses are so diverse, many genes from newly discovered prokaryotic viruses cannot be functionally annotated by searching current databases; alignment-free algorithms for the functional annotation of prokaryotic virus proteins are therefore important for understanding the viral community. The identification of prokaryote virus virion proteins (PVVPs) is a critical step in many viral analyses, such as species classification, phylogenetic analysis and the exploration of how prokaryotic viruses interact with their hosts. Although a series of PVVP prediction tools have been developed, their performance is still not satisfactory. Moreover, viral metagenomic data contain fragmented sequences, so some genes are incomplete; a tool that can identify partial PVVPs is therefore also needed. In this work, we present a novel algorithm, called VirionFinder, to distinguish complete and partial PVVPs from non-prokaryote virus virion proteins (non-PVVPs). VirionFinder encodes protein sequences using the sequence and biochemical properties of the 20 amino acids and uses a deep learning technique to identify whether a given protein is a PVVP. In comparisons with state-of-the-art tools on artificial benchmark datasets, the sensitivity (Sn) of VirionFinder at the same specificity (Sp) was approximately 10-34% higher than that of these tools on both complete and partial proteins. When the related tools were evaluated on real virome data, VirionFinder's recognition rate for PVVP-like sequences was also much higher than that of the other tools.
We expect that VirionFinder will be a powerful tool for identifying novel virion proteins from both complete prokaryote virus genomes and viral metagenomic data. VirionFinder is freely available at https://github.com/zhenchengfang/VirionFinder.
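The abstract states that VirionFinder encodes proteins using the sequence and biochemical properties of the 20 amino acids, but it does not list which properties are used. As a hypothetical illustration, the sketch below builds a per-residue feature matrix from two components: a one-hot identity vector and the Kyte-Doolittle hydropathy value. The following are assumptions, not details of VirionFinder:

- the choice of hydropathy as the biochemical property;
- the fixed length `max_len`;
- zero-padding so that partial proteins fit a fixed input size.

The deep learning model that would consume this matrix is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_protein(seq: str, max_len: int = 8) -> list[list[float]]:
    """Encode a protein as max_len rows of 21 features: 20 one-hot + 1 hydropathy.

    Longer sequences are truncated and shorter (e.g. partial) ones are
    zero-padded. Unknown residues such as 'X' get an all-zero row.
    """
    rows = []
    for aa in seq.upper()[:max_len]:
        one_hot = [1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS]
        rows.append(one_hot + [KYTE_DOOLITTLE.get(aa, 0.0)])
    while len(rows) < max_len:
        rows.append([0.0] * (len(AMINO_ACIDS) + 1))
    return rows


matrix = encode_protein("MKV")
print(len(matrix), len(matrix[0]))  # 8 21
print(matrix[0][AMINO_ACIDS.index("M")], matrix[0][-1])  # 1.0 1.9
```

A fixed-size matrix lets complete and fragmentary proteins share one model input shape. Truncation and padding are only one way to achieve this, chosen here for simplicity.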
Affiliation(s)
- Zhencheng Fang
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Center for Quantitative Biology, Peking University, Beijing, China
- Hongwei Zhou
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- State Key Laboratory of Organ Failure Research, Southern Medical University, Guangzhou, China
7
Rios Miguel AB, Jetten MS, Welte CU. The role of mobile genetic elements in organic micropollutant degradation during biological wastewater treatment. Water Research X 2020; 9:100065. [PMID: 32984801 PMCID: PMC7494797 DOI: 10.1016/j.wroa.2020.100065] [Received: 06/29/2020] [Revised: 08/19/2020] [Accepted: 08/28/2020] [Indexed: 05/24/2023]
Abstract
Wastewater treatment plants (WWTPs) are crucial for producing clean effluents from polluting sources such as hospitals, industries, and municipalities. In recent decades, many new organic compounds have ended up in surface waters at concentrations that, while very low, cause (chronic) toxicity to countless organisms. These organic micropollutants (OMPs) are usually quite recalcitrant and are not sufficiently removed during wastewater treatment. Microbial degradation plays a pivotal role in OMP conversion. Microorganisms can adapt their metabolism to use novel molecules through mutations and the rearrangement of existing genes into new clusters. Many catabolic genes have been found adjacent to mobile genetic elements (MGEs), which provide a stable scaffold to host new catabolic pathways and spread these genes through the microbial community. These mobile systems could be engineered to enhance OMP degradation in WWTPs, and this review aims to summarize and better understand the role that MGEs might play in degradation and the wastewater treatment process. Available data on the presence of catabolic MGEs in WWTPs are reviewed, and current methods used to identify and measure MGEs in environmental samples are critically evaluated. Finally, examples of how these MGEs could be used to improve micropollutant degradation in WWTPs are outlined. In the near future, advances in the use of MGEs will hopefully enable us to apply selective augmentation strategies to improve OMP conversion in WWTPs.
Affiliation(s)
- Ana B. Rios Miguel
- Department of Microbiology, Institute for Water and Wetland Research, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, the Netherlands
- Mike S.M. Jetten
- Department of Microbiology, Institute for Water and Wetland Research, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, the Netherlands
- Soehngen Institute of Anaerobic Microbiology, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, the Netherlands
- Cornelia U. Welte
- Department of Microbiology, Institute for Water and Wetland Research, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, the Netherlands
- Soehngen Institute of Anaerobic Microbiology, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, the Netherlands
8
Fang Z, Zhou H. Identification of the conjugative and mobilizable plasmid fragments in the plasmidome using sequence signatures. Microb Genom 2020; 6:mgen000459. [PMID: 33074084 PMCID: PMC7725325 DOI: 10.1099/mgen.0.000459] [Received: 05/20/2020] [Accepted: 10/03/2020] [Indexed: 12/24/2022]
Abstract
Plasmids are key elements in horizontal gene transfer in the microbial community. Recently, a large number of experimental and computational methods have been developed to obtain the plasmidomes of microbial communities. Distinguishing transmissible plasmid sequences, which are derived from conjugative or at least mobilizable plasmids, from non-transmissible plasmid sequences in the plasmidome is essential for understanding the diversity of plasmids and how they regulate the microbial community. Unfortunately, because DNA sequences in the plasmidome are highly fragmented, effective identification methods are lacking. In this work, we used information entropy to assess the randomness of synonymous codon usage across 4424 plasmid genomes. The results showed that for all amino acids, the choice of synonymous codon in conjugative and mobilizable plasmids is more random than in non-transmissible plasmids, indicating that transmissible plasmids have different sequence signatures from non-transmissible plasmids. Inspired by this phenomenon, we further developed a novel algorithm named PlasTrans. PlasTrans takes the triplet code sequences and base sequences of plasmid DNA fragments as input and uses a deep convolutional neural network to extract more complex signatures of the plasmid sequences and identify conjugative and mobilizable DNA fragments. Tests showed that PlasTrans could achieve AUCs of up to 84-91%, even when the fragments contained only hundreds of base pairs. To the best of our knowledge, this is the first quantitative analysis of the difference in sequence signatures between transmissible and non-transmissible plasmids, and PlasTrans is the first tool to perform transferability annotation for DNA fragments in the plasmidome.
We expect that PlasTrans will be a useful tool for researchers who analyse the properties of novel plasmids in the microbial community and horizontal gene transfer, especially the spread of resistance genes and virulence factors associated with plasmids. PlasTrans is freely available via https://github.com/zhenchengfang/PlasTrans.
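The central measurement in the PlasTrans abstract is the information entropy of synonymous codon choice for each amino acid. The minimal sketch below rests on several assumptions:

- the standard genetic code, whose amino-acid assignments NCBI translation table 11 (bacteria) shares;
- reading frame 0;
- Shannon entropy in bits.

The abstract does not state the paper's exact formula or normalization, so this is an illustration of the general idea rather than the authors' computation.

```python
import math
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code listed in TCAG order ('*' = stop codon).
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AA_TABLE[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_codon_entropy(dna: str) -> dict[str, float]:
    """Shannon entropy (bits) of codon choice within each amino acid, frame 0.

    Higher values mean synonymous codons are used more evenly ("more random").
    Stop codons and codons containing non-ACGT characters are skipped.
    """
    dna = dna.upper()
    usage: dict[str, Counter] = defaultdict(Counter)
    for pos in range(0, len(dna) - 2, 3):
        codon = dna[pos:pos + 3]
        aa = CODON_TO_AA.get(codon)
        if aa is None or aa == "*":
            continue
        usage[aa][codon] += 1
    entropy = {}
    for aa, counts in usage.items():
        total = sum(counts.values())
        # H = sum p * log2(1/p) over the synonymous codons actually observed
        entropy[aa] = sum((n / total) * math.log2(total / n) for n in counts.values())
    return entropy


# Alanine (GCN): each of four codons used once -> 2 bits; lysine: always AAA -> 0 bits.
print(synonymous_codon_entropy("GCTGCCGCAGCG" + "AAAAAAAAA"))  # {'A': 2.0, 'K': 0.0}
```

For an amino acid with k synonymous codons, the entropy ranges from 0 (always the same codon) to log2(k) (all codons equally used). Comparing values across plasmid groups would therefore usually involve normalizing by that maximum, which is a further assumption on our part.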
Affiliation(s)
- Zhencheng Fang
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, PR China
- Center for Quantitative Biology, Peking University, No. 5 Yiheyuan Road Haidian District, Beijing 100871, PR China
- Hongwei Zhou
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, PR China
- State Key Laboratory of Organ Failure Research, Southern Medical University, Guangzhou, PR China