1
Hera MR, Liu S, Wei W, Rodriguez JS, Ma C, Koslicki D. Metagenomic functional profiling: to sketch or not to sketch? Bioinformatics 2024; 40:ii165-ii173. [PMID: 39230701 PMCID: PMC11373326 DOI: 10.1093/bioinformatics/btae397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024] [Open access]
Abstract
Motivation: Functional profiling of metagenomic samples is essential to decipher the functional capabilities of microbial communities. The traditional and most widely used functional profilers in metagenomics rely on aligning reads against a known reference database. However, aligning sequencing reads against a large and fast-growing database is computationally expensive. k-mer-based sketching techniques have been used successfully in metagenomics to address this bottleneck, notably in taxonomic profiling. In this work, we describe how FracMinHash, a k-mer-sketching algorithm implemented in the publicly available software sourmash, can be used to obtain functional profiles of metagenome samples.
Results: We show how pieces of the sourmash software (and the resulting FracMinHash sketches) can be put together in a pipeline to functionally profile a metagenomic sample. We named our pipeline fmh-funprofiler. On simulated metagenomic data, the functional profiles obtained with this pipeline show comparable completeness and better purity than profiles obtained with alignment-based methods. We also report that fmh-funprofiler is 39-99× faster in wall-clock time and consumes up to 40-55× less memory. Coupled with the KEGG database, this method not only replicates fundamental biological insights but also highlights novel signals from the Human Microbiome Project datasets.
Availability and implementation: This fast and lightweight metagenomic functional profiler is freely available at https://github.com/KoslickiLab/fmh-funprofiler. All scripts for the analyses presented in this manuscript can be found on GitHub.
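To make the sketching idea in this abstract concrete, the following is a minimal, illustrative sketch of FracMinHash and a containment estimate. It is not the fmh-funprofiler pipeline and does not use sourmash's API or its hash function: the function names (`kmers`, `hash_kmer`, `frac_minhash`, `containment`), the BLAKE2b stand-in hash, and the default `k` and `scaled` values are all assumptions chosen for illustration. The one property taken from the FracMinHash definition is that a k-mer is kept when its hash falls below a fixed fraction (1/`scaled`) of the hash space.

```python
import hashlib

# Size of the 64-bit hash space used by the stand-in hash below (an assumption).
MAX_HASH = 2**64


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (BLAKE2b is a stand-in hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash(seq, k=7, scaled=4):
    """Keep only the k-mer hashes that fall in the lowest 1/scaled of the hash space."""
    threshold = MAX_HASH // scaled
    return {h for h in map(hash_kmer, kmers(seq, k)) if h < threshold}


def containment(query_sketch, ref_sketch):
    """Estimate the fraction of the query's k-mers that also occur in the reference."""
    if not query_sketch:
        return 0.0
    return len(query_sketch & ref_sketch) / len(query_sketch)


if __name__ == "__main__":
    # Hypothetical toy sequences; real use would sketch genes and metagenome reads.
    gene = "ACGTACGTTAGCCGATAGGCTTAACGGATCCA"
    sample = "TTGACC" + gene + "GGATTACA"
    print(containment(frac_minhash(gene), frac_minhash(sample)))
```

Because the same threshold is applied to every input, sketches of different sequences stay comparable, so a gene's sketch can be compared against a sample's sketch without aligning reads. With `scaled=1` every k-mer hash is kept and the estimate becomes the exact k-mer containment.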
Affiliation(s)
- Mahmudur Rahman Hera
  - School of Electrical Engineering and Computer Science, Pennsylvania State University, University Park, Pennsylvania 16802, United States
- Shaopeng Liu
  - Bioinformatics and Genomics, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, United States
- Wei Wei
  - Bioinformatics and Genomics, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, United States
- Judith S Rodriguez
  - Bioinformatics and Genomics, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, United States
- Chunyu Ma
  - Bioinformatics and Genomics, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, United States
- David Koslicki
  - School of Electrical Engineering and Computer Science, Pennsylvania State University, University Park, Pennsylvania 16802, United States
  - Bioinformatics and Genomics, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, United States
  - Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, United States
2
Abdullah K, Wilkins D, Ferrari BC. Utilization of -Omic technologies in cold climate hydrocarbon bioremediation: a text-mining approach. Front Microbiol 2023; 14:1113102. [PMID: 37396353 PMCID: PMC10313077 DOI: 10.3389/fmicb.2023.1113102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 05/02/2023] [Indexed: 07/04/2023] [Open access]
Abstract
Hydrocarbon spills in cold climates are a prominent and enduring form of anthropogenic contamination. Bioremediation is one of a suite of remediation tools and has emerged as a cost-effective strategy for transforming these contaminants in soil, ideally into less harmful products. However, little is understood about the molecular mechanisms driving these complex, microbially mediated processes. The emergence of -omic technologies has led to a revolution in environmental microbiology, allowing the identification and study of so-called 'unculturable' organisms. In the last decade, -omic technologies have become a powerful tool for filling this gap in our knowledge of the interactions between these organisms and their environment in vivo. Here, we use the text-mining software VOSviewer to process metadata and visualize key trends relating to cold climate bioremediation projects. Text mining of the literature revealed a shift over time from optimizing bioremediation experiments at the macro/community level to, in more recent years, focusing on individual organisms of interest, interactions within the microbiome, and the investigation of novel metabolic degradation pathways. This shift in research focus was made possible in large part by the rise of omics studies, which allow research to focus not only on which organisms and metabolic pathways are present but also on which are functional. However, all is not harmonious: the development of downstream analytical methods and associated processing tools has outpaced that of sample preparation methods, especially given the unique challenges of analyzing soil-based samples.
Affiliation(s)
- Kristopher Abdullah
  - Faculty of Science, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Daniel Wilkins
  - Environmental Stewardship Program, Australian Antarctic Division, Department of Climate Change, Energy, Environment and Water, Kingston, TAS, Australia
- Belinda C. Ferrari
  - Faculty of Science, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
3
Sun J, Yuan Y, Cai L, Zeng M, Li X, Yao F, Chen W, Huang Y, Shafiq M, Xie Q, Zhang Q, Wong N, Wang Z, Jiao X. Metagenomic evidence for antibiotics-driven co-evolution of microbial community, resistome and mobilome in hospital sewage. Environmental Pollution (Barking, Essex: 1987) 2023; 327:121539. [PMID: 37019259 DOI: 10.1016/j.envpol.2023.121539] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/13/2022] [Revised: 03/11/2023] [Accepted: 03/29/2023] [Indexed: 06/19/2023]
Abstract
Overconsumption of antibiotics is an immediate cause of the emergence of antimicrobial resistance (AMR) and antibiotic resistant bacteria (ARB), though its environmental impact remains inadequately clarified. There is an urgent need to dissect the complex links underpinning the dynamic co-evolution of ARB and their resistome and mobilome in hospital sewage. Metagenomic and bioinformatic methods were employed to analyze the microbial community, resistome and mobilome in hospital sewage in relation to data on clinical antibiotic use collected from a tertiary-care hospital. In this study, a resistome (1,568 antibiotic resistance genes, ARGs, corresponding to 29 antibiotic types/subtypes) and a mobilome (247 types of mobile genetic elements, MGEs) were identified. Networks connecting co-occurring ARGs with MGEs encompassed 176 nodes and 578 edges, in which over 19 types of ARGs had significant correlations with MGEs. Prescribed dosage and time-dependent antibiotic consumption were associated with the abundance and distribution of ARGs, and with conjugative transfer of ARGs via MGEs. Variation partitioning analyses showed that the effects of conjugative transfer were most likely the main contributors to the transient propagation and persistence of AMR. We present the first evidence supporting the idea that the use of clinical antibiotics is a potent driving force for the development of a co-evolving resistome and mobilome, which in turn supports the growth and evolution of ARB in hospital sewage. The use of clinical antibiotics calls for greater attention in antibiotic stewardship and management.
Affiliation(s)
- Jiayu Sun
  - Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, 515041, China
  - Guangdong Province Center for Disease Control and Prevention, Guangzhou, 511400, China
- Yumeng Yuan
  - Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, 515041, China
- Leshan Cai
  - The First Affiliated Hospital of Shantou University Medical College, Shantou, 515041, China
  - Guangdong Provincial Key Laboratory of Infectious Diseases and Molecular Immunopathology, Shantou, 515041, China
- Mi Zeng
  - Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, 515041, China
- Xin Li
  - Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, 515041, China
- Fen Yao
  - Department of Pharmacology, Shantou University Medical College, Shantou, 515041, China
- Weidong Chen
  - The First Affiliated Hospital of Shantou University Medical College, Shantou, 515041, China
- Yuanchun Huang
  - The First Affiliated Hospital of Shantou University Medical College, Shantou, 515041, China
- Muhammad Shafiq
  - Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, 515041, China
- Qingdong Xie
  - Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, 515041, China
- Qiaoxin Zhang
  - The First Affiliated Hospital of Shantou University Medical College, Shantou, 515041, China
- Naikei Wong
  - Department of Pharmacology, Shantou University Medical College, Shantou, 515041, China
- Zhen Wang
  - Guangdong Provincial Key Laboratory of Marine Biotechnology, Shantou University, Shantou, 515041, China
- Xiaoyang Jiao
  - Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, 515041, China
  - Guangdong Provincial Key Laboratory of Infectious Diseases and Molecular Immunopathology, Shantou, 515041, China
4
Li D, Hu J, Zhang L, Li L, Yin Q, Shi J, Guo H, Zhang Y, Zhuang P. Deep learning and machine intelligence: New computational modeling techniques for discovery of the combination rules and pharmacodynamic characteristics of Traditional Chinese Medicine. Eur J Pharmacol 2022; 933:175260. [PMID: 36116517 DOI: 10.1016/j.ejphar.2022.175260] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 05/28/2022] [Revised: 08/15/2022] [Accepted: 09/05/2022] [Indexed: 11/19/2022]
Abstract
It is increasingly accepted that multi-ingredient interventions offer advantages over single-target therapy for complex diseases. As Traditional Chinese Medicine (TCM) develops and its holistic view is continually refined, its "multi-target" and "multi-pathway" integration characteristics are gaining acceptance. However, its effector substances, its efficacy targets, and especially its combination rules and mechanisms remain unclear, and more powerful strategies for interpreting the synergy are urgently needed. Artificial intelligence (AI) and computer vision are expanding rapidly in many fields, including diagnosis and treatment in TCM. AI technology significantly improves the reliability and accuracy of diagnostics, target screening, and new drug research. While all AI techniques are capable of fitting models to biological big data, the specific methods are complex and varied. We retrieved literature using keywords such as "artificial intelligence", "machine learning", "deep learning", "traditional Chinese medicine" and "Chinese medicine", searching PubMed, Web of Science, China National Knowledge Infrastructure (CNKI), Elsevier and Springer for applications of computer algorithms to TCM published between 2000 and 2021. This review concentrates on the application of computational methods to herb quality evaluation, drug target discovery, optimized compatibility and medical diagnosis in TCM. We describe the characteristics of the biological data to which different AI techniques are applicable, and discuss some of the best data mining methods and the problems faced by deep learning and machine learning methods applied to Chinese medicine.
Affiliation(s)
- Dongna Li
  - State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, 301617, China
- Jing Hu
  - State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, 301617, China
- Lin Zhang
  - State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, 301617, China
- Lili Li
  - State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, 301617, China
- Qingsheng Yin
  - State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, 301617, China
- Jiangwei Shi
  - First Teaching Hospital of Tianjin University of Traditional Chinese Medicine, Tianjin, China
  - National Clinical Research Center for Chinese Medicine Acupuncture and Moxibustion, China
- Hong Guo
  - State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, 301617, China
- Yanjun Zhang
  - State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, 301617, China
  - First Teaching Hospital of Tianjin University of Traditional Chinese Medicine, Tianjin, China
  - National Clinical Research Center for Chinese Medicine Acupuncture and Moxibustion, China
- Pengwei Zhuang
  - State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, 301617, China
5
Voigt B, Fischer O, Krumnow C, Herta C, Dabrowski PW. NGS read classification using AI. PLoS One 2021; 16:e0261548. [PMID: 34936673 PMCID: PMC8694450 DOI: 10.1371/journal.pone.0261548] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/24/2021] [Accepted: 12/03/2021] [Indexed: 11/19/2022] [Open access]
Abstract
Clinical metagenomics is a powerful diagnostic tool, as it offers an open view into all DNA in a patient's sample. This allows the detection of pathogens that would slip through the cracks of classical specific assays. However, because metagenomic sequencing is unspecific by nature, the sequencing itself generates a huge amount of unspecific data, and the diagnosis only takes place at the data analysis stage, where the relevant sequences are extracted. Typically, this is done by comparison to reference databases. While this approach has been optimized over the past years and works well for detecting pathogens that are represented in the databases used, a common challenge in analysing a metagenomic patient sample arises when no pathogen sequences are found: how can one determine whether there is truly no evidence of a pathogen in the data, or whether the pathogen's genome is simply absent from the database and the sequences in the dataset could therefore not be classified? Here, we present a novel approach to this problem of detecting novel pathogens in metagenomic datasets by classifying the (segments of) proteins encoded by the sequences in the datasets. We train a neural network on coding sequences labeled by taxonomic domain, and use this neural network to predict the taxonomic classification of sequences that cannot be classified by comparison to a reference database, thus facilitating the detection of potential novel pathogens.
Affiliation(s)
- Benjamin Voigt
  - Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Oliver Fischer
  - Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Christian Krumnow
  - Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Christian Herta
  - Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Piotr Wojciech Dabrowski
  - Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany