1
Ferchiou S, Caza F, Villemur R, Betoulle S, St-Pierre Y. From shells to sequences: A proof-of-concept study for on-site analysis of hemolymphatic circulating cell-free DNA from sentinel mussels using Nanopore technology. Sci Total Environ 2024; 934:172969. [PMID: 38754506 DOI: 10.1016/j.scitotenv.2024.172969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 04/30/2024] [Accepted: 05/01/2024] [Indexed: 05/18/2024]
Abstract
Blue mussels are often abundant and widely distributed in polar marine coastal ecosystems. Because of their wide distribution, ecological importance, and relatively stationary lifestyle, bivalves have long been considered suitable indicators of ecosystem health and changes. Monitoring the population dynamics of blue mussels can provide information on the overall biodiversity, species interactions, and ecosystem functioning. In the present work, we combined the concept of liquid biopsy (LB), an emerging concept in medicine based on the sequencing of free circulating DNA, with the Oxford Nanopore Technologies (ONT) platform using a portable laboratory in a remote area. Our results demonstrate that this platform is ideally suited for sequencing hemolymphatic circulating cell-free DNA (ccfDNA) fragments found in blue mussels. The percentage of non-self ccfDNA accounted for >50 % of ccfDNA at certain sampling Sites, allowing the quick, on-site acquisition of a global view of the biodiversity of a coastal marine ecosystem. These ccfDNA fragments originated from viruses, bacteria, plants, arthropods, algae, and multiple Chordata. Aside from non-self ccfDNA, we found DNA fragments from all 14 blue mussel chromosomes, as well as those originating from the mitochondrial genomes. However, the distribution of nuclear and mitochondrial DNA was significantly different between Sites. Similarly, analyses between various sampling Sites showed that the biodiversity varied significantly within microhabitats. Our work shows that the ONT platform is well-suited for LB in sentinel blue mussels in remote and challenging conditions, enabling faster fieldwork for conservation strategies and resource management in diverse settings.
Affiliation(s)
- Sophia Ferchiou
  - INRS-Centre Armand-Frappier Santé Technologie, 531 Boul. des Prairies, Laval, QC H7V 1B7, Canada
- France Caza
  - INRS-Centre Armand-Frappier Santé Technologie, 531 Boul. des Prairies, Laval, QC H7V 1B7, Canada
- Richard Villemur
  - INRS-Centre Armand-Frappier Santé Technologie, 531 Boul. des Prairies, Laval, QC H7V 1B7, Canada
- Stéphane Betoulle
  - Université Reims Champagne-Ardenne, UMR-I 02 SEBIO Stress environnementaux et Biosurveillance des milieux aquatiques, Campus Moulin de la Housse, 51687 Reims, France
- Yves St-Pierre
  - INRS-Centre Armand-Frappier Santé Technologie, 531 Boul. des Prairies, Laval, QC H7V 1B7, Canada.
2
Agustinho DP, Fu Y, Menon VK, Metcalf GA, Treangen TJ, Sedlazeck FJ. Unveiling microbial diversity: harnessing long-read sequencing technology. Nat Methods 2024; 21:954-966. [PMID: 38689099 DOI: 10.1038/s41592-024-02262-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Accepted: 03/29/2024] [Indexed: 05/02/2024]
Abstract
Long-read sequencing has recently transformed metagenomics, enhancing strain-level pathogen characterization, enabling accurate and complete metagenome-assembled genomes, and improving microbiome taxonomic classification and profiling. These advancements are not only due to improvements in sequencing accuracy, but also happening across rapidly changing analysis methods. In this Review, we explore long-read sequencing's profound impact on metagenomics, focusing on computational pipelines for genome assembly, taxonomic characterization and variant detection, to summarize recent advancements in the field and provide an overview of available analytical methods to fully leverage long reads. We provide insights into the advantages and disadvantages of long reads over short reads and their evolution from the early days of long-read sequencing to their recent impact on metagenomics and clinical diagnostics. We further point out remaining challenges for the field such as the integration of methylation signals in sub-strain analysis and the lack of benchmarks.
Affiliation(s)
- Daniel P Agustinho
  - Human Genome Sequencing center, Baylor College of Medicine, Houston, TX, USA
- Yilei Fu
  - Department of Computer Science, Rice University, Houston, TX, USA
- Vipin K Menon
  - Human Genome Sequencing center, Baylor College of Medicine, Houston, TX, USA
  - Senior research project manager, Human Genetics, Genentech, South San Francisco, CA, USA
- Ginger A Metcalf
  - Human Genome Sequencing center, Baylor College of Medicine, Houston, TX, USA
- Todd J Treangen
  - Department of Computer Science, Rice University, Houston, TX, USA
  - Department of Bioengineering, Rice University, Houston, TX, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing center, Baylor College of Medicine, Houston, TX, USA.
  - Department of Computer Science, Rice University, Houston, TX, USA.
3
Kim J, Steinegger M. Metabuli: sensitive and specific metagenomic classification via joint analysis of amino acid and DNA. Nat Methods 2024; 21:971-973. [PMID: 38769467 DOI: 10.1038/s41592-024-02273-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Accepted: 04/11/2024] [Indexed: 05/22/2024]
Abstract
Metagenomic taxonomic classifiers analyze either DNA or amino acid (AA) sequences. Metabuli (https://metabuli.steineggerlab.com), however, jointly analyzes both DNA and AA to leverage AA conservation for sensitive homology detection and DNA mutations for specific differentiation of closely related taxa. In the Critical Assessment of Metagenome Interpretation 2 plant-associated dataset, Metabuli covered 99% and 98% of classifications of state-of-the-art DNA- and AA-based classifiers, respectively.
Affiliation(s)
- Jaebeom Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Martin Steinegger
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea.
  - Institute of Molecular Biology and Genetics, Seoul National University, Seoul, Republic of Korea.
  - Artificial Intelligence Institute, Seoul National University, Seoul, Republic of Korea.
4
Acheampong DA, Jenjaroenpun P, Wongsurawat T, Krulilung A, Pomyen Y, Kunadirek P, Chuaypen N, Kusonmano K, Nookaew I. CAIM: Coverage-based Analysis for Identification of Microbiome. bioRxiv 2024:2024.04.25.591018. [PMID: 38746391 PMCID: PMC11091946 DOI: 10.1101/2024.04.25.591018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Accurate taxonomic profiling of microbial taxa in a metagenomic sample is vital to gain insights into microbial ecology. Recent advancements in sequencing technologies have contributed tremendously toward understanding these microbes at species resolution through a whole shotgun metagenomic (WMS) approach. In this study, we developed a new bioinformatics tool, CAIM, for accurate taxonomic classification and quantification within both long- and short-read metagenomic samples using an alignment-based method. CAIM depends on two different containment techniques to identify species in metagenomic samples, using their genome coverage information to filter out false positives rather than the traditional approach of relative abundance. In addition, we propose a nucleotide-count-based abundance estimation, which yields a lower root mean square error than the traditional read-count approach. We evaluated the performance of CAIM on 28 metagenomic mock communities and 2 synthetic datasets by comparing it with other top-performing tools. CAIM consistently performed better than other tools across datasets in identifying microbial taxa and in estimating relative abundances. CAIM was then applied to a real dataset sequenced on both Nanopore (with and without amplification) and Illumina sequencing platforms, revealing high similarity of taxonomic profiles between the sequencing platforms. Lastly, CAIM was applied to fecal shotgun metagenomic datasets of 232 colorectal cancer patients and 229 controls obtained from 4 different countries, and of 44 primary liver cancer patients and 76 controls. The predictive performance of models using the genome-coverage cutoff was better than that of models using the relative-abundance cutoffs in discriminating colorectal cancer and primary liver cancer patients from healthy controls with highly confident species markers.
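The two ideas in this abstract, filtering candidate species by genome coverage rather than relative abundance and estimating abundance from aligned nucleotides rather than read counts, can be illustrated with a minimal sketch. It is not CAIM's implementation: the `Alignment` record, the function names, and the `min_breadth` cutoff of 0.1 are hypothetical choices made only for illustration, and CAIM's two containment techniques are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical alignment record: one read aligned to one genome."""
    genome: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def breadth_of_coverage(alignments, genome_length):
    """Fraction of genome positions covered by at least one alignment."""
    covered = 0
    cur_start = cur_end = None
    for s, e in sorted((a.start, a.end) for a in alignments):
        if cur_end is None or s > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent interval: extend the current one.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


def profile(alignments, genome_lengths, min_breadth=0.1):
    """Return (read-count abundances, nucleotide-count abundances).

    Genomes whose breadth of coverage is below min_breadth (an assumed,
    illustrative cutoff) are dropped as likely false positives.
    """
    by_genome = {}
    for a in alignments:
        by_genome.setdefault(a.genome, []).append(a)

    kept = {
        g: alns for g, alns in by_genome.items()
        if breadth_of_coverage(alns, genome_lengths[g]) >= min_breadth
    }
    read_counts = {g: len(alns) for g, alns in kept.items()}
    nt_counts = {g: sum(a.end - a.start for a in alns) for g, alns in kept.items()}

    def normalize(counts):
        total = sum(counts.values())
        return {g: v / total for g, v in counts.items()} if total else {}

    return normalize(read_counts), normalize(nt_counts)
```

With long reads of very different lengths, the two estimates can diverge noticeably: one 10 kb read and ten 1 kb reads contribute the same number of nucleotides but very different read counts.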
Affiliation(s)
- Daniel A. Acheampong
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
  - Stowers Institute for Medical Research, Kansas City, MO, USA
- Piroon Jenjaroenpun
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
  - Division of Medical Bioinformatics, Department of Research, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Thidathip Wongsurawat
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
  - Division of Medical Bioinformatics, Department of Research, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Alongkorn Krulilung
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Yotsawat Pomyen
  - Translational Research Unit, Chulabhorn Research Institute, Bangkok, 10210, Thailand
- Pattapon Kunadirek
  - Center of Excellence in Hepatitis and Liver Cancer, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Natthaya Chuaypen
  - Center of Excellence in Hepatitis and Liver Cancer, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Kanthida Kusonmano
  - Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut’s University of Technology Thonburi, Bangkok, 10150, Thailand
  - Systems Biology and Bioinformatics Research Laboratory, Pilot Plant Development and Training Institute, King Mongkut’s University of Technology Thonburi, Bangkok, 10150, Thailand
- Intawat Nookaew
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
5
Song L, Langmead B. Centrifuger: lossless compression of microbial genomes for efficient and accurate metagenomic sequence classification. Genome Biol 2024; 25:106. [PMID: 38664753 PMCID: PMC11046777 DOI: 10.1186/s13059-024-03244-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 04/10/2024] [Indexed: 04/28/2024] Open
Abstract
Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database. In Centrifuger, the Burrows-Wheeler transformed genome sequences are losslessly compressed using a novel scheme called run-block compression. Run-block compression achieves sublinear space complexity and is effective at compressing diverse microbial databases like RefSeq while supporting fast rank queries. Combining this compression method with other strategies for compacting the Ferragina-Manzini (FM) index, Centrifuger reduces the memory footprint by half compared to other FM-index-based approaches. Furthermore, the lossless compression and the unconstrained match length help Centrifuger achieve greater accuracy than competing methods at lower taxonomic levels.
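To make the building blocks named in this abstract concrete, the sketch below computes a Burrows-Wheeler transform, compresses it with plain run-length encoding, and answers rank queries directly on the runs. This is deliberately not run-block compression, whose details the abstract does not give; it only shows why a BWT with long runs is compressible while still supporting the rank queries an FM index needs. All function names are illustrative, and the quadratic rotation sort is suitable only for toy inputs.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (toy-sized inputs only)."""
    assert sentinel not in text, "sentinel must not occur in the text"
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s):
    """Collapse consecutive identical characters into (char, length) runs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1
        else:
            runs.append([ch, 1])
    return [(c, n) for c, n in runs]


def rank(runs, ch, i):
    """Number of occurrences of ch among the first i characters of the
    run-length-encoded string (linear scan over runs, for clarity)."""
    count = 0
    pos = 0
    for c, n in runs:
        if pos >= i:
            break
        take = min(n, i - pos)
        if c == ch:
            count += take
        pos += n
    return count
```

For example, `run_length_encode(bwt("banana"))` stores the seven-character transform in five runs; a practical index would add sampled checkpoints so that `rank` does not have to scan every run.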
Affiliation(s)
- Li Song
  - Department of Biomedical Data Science, Dartmouth College, Hanover, NH, USA.
  - Department of Computer Science, Dartmouth College, Hanover, NH, USA.
  - Department of Microbiology and Immunology, Dartmouth College, Hanover, NH, USA.
- Ben Langmead
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
6
Zheng A, Shaw J, Yu YW. Mora: abundance aware metagenomic read re-assignment for disentangling similar strains. BMC Bioinformatics 2024; 25:161. [PMID: 38649836 PMCID: PMC11035124 DOI: 10.1186/s12859-024-05768-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 04/05/2024] [Indexed: 04/25/2024] Open
Abstract
BACKGROUND: Taxonomic classification of reads obtained by metagenomic sequencing is often a first step for understanding a microbial community, but correctly assigning sequencing reads to the strain or sub-species level has remained a challenging computational problem. RESULTS: We introduce Mora, a MetagenOmic read Re-Assignment algorithm capable of assigning short and long metagenomic reads with high precision, even at the strain level. Mora is able to accurately re-assign reads by first estimating abundances through an expectation-maximization algorithm and then utilizing abundance information to re-assign query reads. The key idea behind Mora is to maximize read re-assignment qualities while simultaneously minimizing the difference from estimated abundance levels, allowing Mora to avoid over-assigning reads to the same genomes. On simulated diverse reads, this allows Mora to achieve F1 scores comparable to other algorithms while having less runtime. However, Mora significantly outshines other algorithms on very similar reads. We show that the high penalty of over-assigning reads to a common reference genome allows Mora to accurately infer correct strains for real data in the form of E. coli reads. CONCLUSIONS: Mora is a fast and accurate read re-assignment algorithm that is modularized, allowing it to be incorporated into general metagenomics and genomics workflows. It is freely available at https://github.com/AfZheng126/MORA.
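The first stage described here, estimating abundances by expectation-maximization and then using them to resolve ambiguous reads, can be sketched as follows. This is a textbook mixture-model EM under stated assumptions, not Mora's code: the input format (one dictionary of per-genome likelihoods per read) and both function names are hypothetical, and the final step is a simple abundance-weighted argmax rather than Mora's joint optimization with a penalty for over-assignment.

```python
def em_abundances(read_likelihoods, n_iter=50):
    """Estimate genome abundances from per-read candidate likelihoods.

    read_likelihoods: list of dicts mapping genome name -> likelihood of
    the read under that genome (hypothetical input format).
    """
    genomes = sorted({g for lik in read_likelihoods for g in lik})
    abundance = {g: 1.0 / len(genomes) for g in genomes}

    for _ in range(n_iter):
        totals = {g: 0.0 for g in genomes}
        # E-step: split each read across its candidates in proportion to
        # (current abundance) x (read likelihood).
        for lik in read_likelihoods:
            weights = {g: abundance[g] * p for g, p in lik.items()}
            norm = sum(weights.values())
            if norm == 0:
                continue
            for g, w in weights.items():
                totals[g] += w / norm
        # M-step: new abundances are the normalized expected read counts.
        total = sum(totals.values())
        if total == 0:
            break
        abundance = {g: t / total for g, t in totals.items()}
    return abundance


def reassign(read_likelihoods, abundance):
    """Assign each read to its most probable genome given the abundances."""
    return [max(lik, key=lambda g: abundance[g] * lik[g]) for lik in read_likelihoods]
```

In a toy case with three reads unique to strain A, one unique to strain B, and two reads matching both equally well, the estimate converges to roughly 0.75 for A and 0.25 for B, so the ambiguous reads go to A.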
Affiliation(s)
- Andrew Zheng
  - Mathematics, University of Toronto, 27 King's College Circle, Toronto, Ontario, M3R 0A3, Canada
- Jim Shaw
  - Mathematics, University of Toronto, 27 King's College Circle, Toronto, Ontario, M3R 0A3, Canada.
- Yun William Yu
  - Mathematics, University of Toronto, 27 King's College Circle, Toronto, Ontario, M3R 0A3, Canada.
  - Computer and Mathematical Sciences, University of Toronto at Scarborough, 1265 Military Trail, Toronto, Ontario, M1C 1A4, Canada.
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, Pennsylvania, 15213, USA.
7
Peres da Silva R, Suphavilai C, Nagarajan N. MetageNN: a memory-efficient neural network taxonomic classifier robust to sequencing errors and missing genomes. BMC Bioinformatics 2024; 25:153. [PMID: 38627615 PMCID: PMC11022314 DOI: 10.1186/s12859-024-05760-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 03/22/2024] [Indexed: 04/19/2024] Open
Abstract
BACKGROUND: With the rapid increase in throughput of long-read sequencing technologies, recent studies have explored their potential for taxonomic classification by using alignment-based approaches to reduce the impact of higher sequencing error rates. While alignment-based methods are generally slower, k-mer-based taxonomic classifiers can overcome this limitation, potentially at the expense of lower sensitivity for strains and species that are not in the database. RESULTS: We present MetageNN, a memory-efficient long-read taxonomic classifier that is robust to sequencing errors and missing genomes. MetageNN is a neural network model that uses short k-mer profiles of sequences to reduce the impact of distribution shifts on error-prone long reads. Benchmarking MetageNN against other machine learning approaches for taxonomic classification (GeNet) showed substantial improvements with long-read data (20% improvement in F1 score). By utilizing nanopore sequencing data, MetageNN exhibits improved sensitivity in situations where the reference database is incomplete. It surpasses the alignment-based MetaMaps and MEGAN-LR, as well as the k-mer-based Kraken2 tools, with improvements of 100%, 36%, and 23% respectively at the read-level analysis. Notably, at the community level, MetageNN consistently demonstrated higher sensitivities than the previously mentioned tools. Furthermore, MetageNN requires < 1/4th of the database storage used by Kraken2, MEGAN-LR and MMseqs2 and is > 7× faster than MetaMaps and GeNet and > 2× faster than MEGAN-LR and MMseqs2. CONCLUSION: This proof of concept work demonstrates the utility of machine-learning-based methods for taxonomic classification using long reads. MetageNN can be used on sequences not classified by conventional methods and offers an alternative approach for memory-efficient classifiers that can be optimized further.
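A short k-mer profile of the kind this abstract refers to can be sketched as a fixed-length frequency vector. The code below is only an illustration of that feature representation: the function name, the default `k`, and the choice to skip k-mers containing non-ACGT characters are assumptions, and the neural network that MetageNN trains on such profiles is not shown.

```python
from itertools import product


def kmer_profile(sequence, k=3):
    """Normalized k-mer frequency vector over the alphabet ACGT.

    Returns a list of length 4**k ordered lexicographically by k-mer.
    K-mers containing any other character (for example N) are skipped,
    which is an assumed convention for this sketch.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)

    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])
        if pos is not None:
            counts[pos] += 1

    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)
```

Because a single substitution error changes at most `k` of the overlapping k-mers in a read, short values of `k` keep the profile of a noisy long read close to that of its error-free source, which is the intuition behind using short k-mers on error-prone data.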
Affiliation(s)
- Rafael Peres da Silva
  - School of Computing, National University of Singapore, Singapore, 117417, Republic of Singapore.
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), Singapore, 138672, Republic of Singapore.
- Chayaporn Suphavilai
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), Singapore, 138672, Republic of Singapore
- Niranjan Nagarajan
  - School of Computing, National University of Singapore, Singapore, 117417, Republic of Singapore.
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), Singapore, 138672, Republic of Singapore.
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 119228, Republic of Singapore.
8
Bitsikas V, Cubizolles F, Schier AF. A vertebrate family without a functional Hypocretin/Orexin arousal system. Curr Biol 2024; 34:1532-1540.e4. [PMID: 38490200 DOI: 10.1016/j.cub.2024.02.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 12/20/2023] [Accepted: 02/12/2024] [Indexed: 03/17/2024]
Abstract
The Hypocretin/Orexin signaling pathway suppresses sleep and promotes arousal, whereas the loss of Hypocretin/Orexin results in narcolepsy, including the involuntary loss of muscle tone (cataplexy) [1]. Here, we show that the South Asian fish species Chromobotia macracanthus exhibits a sleep-like state during which individuals stop swimming and rest on their side. Strikingly, we discovered that the Hypocretin/Orexin system is pseudogenized in C. macracanthus, but in contrast to Hypocretin-deficient mammals, C. macracanthus does not suffer from sudden behavioral arrests. Similarly, zebrafish mutations in hypocretin/orexin show no evident signs of cataplectic-like episodes. Notably, four additional species in the Botiidae family also lack a functional Hypocretin/Orexin system. These findings identify the first vertebrate family that does not rely on a functional Hypocretin/Orexin system for the regulation of sleep and arousal.
Affiliation(s)
- Vassilis Bitsikas
  - Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA; Biozentrum, University of Basel, Spitalstrasse 41, 4056 Basel, Switzerland
- Fabien Cubizolles
  - Biozentrum, University of Basel, Spitalstrasse 41, 4056 Basel, Switzerland
- Alexander F Schier
  - Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA; Biozentrum, University of Basel, Spitalstrasse 41, 4056 Basel, Switzerland.
9
Buytaers FE, Verhaegen B, Van Nieuwenhuysen T, Roosens NHC, Vanneste K, Marchal K, De Keersmaecker SCJ. Strain-level characterization of foodborne pathogens without culture enrichment for outbreak investigation using shotgun metagenomics facilitated with nanopore adaptive sampling. Front Microbiol 2024; 15:1330814. [PMID: 38495515 PMCID: PMC10940517 DOI: 10.3389/fmicb.2024.1330814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 02/12/2024] [Indexed: 03/19/2024] Open
Abstract
Introduction: Shotgun metagenomics has previously proven effective in the investigation of foodborne outbreaks by providing rapid and comprehensive insights into the microbial contaminant. However, culture enrichment of the sample has remained a prerequisite, despite the potential impact on pathogen detection resulting from the growth competition. To circumvent the need for culture enrichment, we explored the use of adaptive sampling using various databases for a targeted nanopore sequencing, compared to shotgun metagenomics alone. Methods: The adaptive sampling method was first tested on DNA of mashed potatoes mixed with DNA of a Staphylococcus aureus strain previously associated with a foodborne outbreak. The selective sequencing was used to either deplete the potato sequencing reads or enrich for the pathogen sequencing reads, and compared to a shotgun sequencing. Then, living S. aureus were spiked at 10^5 CFU into 25 g of mashed potatoes. Three DNA extraction kits were tested, in combination with enrichment using adaptive sampling, following whole genome amplification. After data analysis, the possibility to characterize the contaminant with the different sequencing and extraction methods, without culture enrichment, was assessed. Results: Overall, the adaptive sampling outperformed the shotgun sequencing. While the use of a host removal DNA extraction kit and targeted sequencing using a database of foodborne pathogens allowed rapid detection of the pathogen, the most complete characterization was achieved when using solely a database of S. aureus combined with a conventional DNA extraction kit, enabling accurate placement of the strain on a phylogenetic tree alongside outbreak cases. Discussion: This method shows great potential for strain-level analysis of foodborne outbreaks without the need for culture enrichment, thereby enabling faster investigations and facilitating precise pathogen characterization. The integration of adaptive sampling with metagenomics presents a valuable strategy for more efficient and targeted analysis of microbial communities in foodborne outbreaks, contributing to improved food safety and public health.
Affiliation(s)
- Florence E. Buytaers
  - Transversal activities in Applied Genomics, Sciensano, Brussels, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Bavo Verhaegen
  - National Reference Laboratory for Foodborne Outbreaks (NRL-FBO) and for Coagulase Positive Staphylococci (NRL-CPS), Foodborne Pathogens, Sciensano, Brussels, Belgium
- Tom Van Nieuwenhuysen
  - National Reference Laboratory for Foodborne Outbreaks (NRL-FBO) and for Coagulase Positive Staphylococci (NRL-CPS), Foodborne Pathogens, Sciensano, Brussels, Belgium
- Kevin Vanneste
  - Transversal activities in Applied Genomics, Sciensano, Brussels, Belgium
- Kathleen Marchal
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Department of Information Technology, IDlab, IMEC, Ghent University, Ghent, Belgium
10
Abulfaraj AA, Shami AY, Alotaibi NM, Alomran MM, Aloufi AS, Al-Andal A, AlHamdan NR, Alshehrei FM, Sefrji FO, Alsaadi KH, Abuauf HW, Alshareef SA, Jalal RS. Exploration of genes encoding KEGG pathway enzymes in rhizospheric microbiome of the wild plant Abutilon fruticosum. AMB Express 2024; 14:27. [PMID: 38381255 PMCID: PMC10881953 DOI: 10.1186/s13568-024-01678-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 01/28/2024] [Indexed: 02/22/2024] Open
Abstract
The operative mechanisms and advantageous synergies existing between the rhizobiome and the wild plant species Abutilon fruticosum were studied. Within the purview of this scientific study, the reservoir of genes in the rhizobiome, encoding the most highly enriched enzymes, was dominantly constituted by members of phylum Thaumarchaeota within the archaeal kingdom, phylum Proteobacteria within the bacterial kingdom, and the phylum Streptophyta within the eukaryotic kingdom. The ensemble of enzymes encoded through plant exudation exhibited affiliations with 15 crosstalking KEGG (Kyoto Encyclopaedia of Genes and Genomes) pathways. The ultimate goal underlying root exudation, as surmised from the present investigation, was the biosynthesis of saccharides, amino acids, and nucleic acids, which are imperative for the sustenance, propagation, or reproduction of microbial consortia. The symbiotic companionship existing between the wild plant and its associated rhizobiome amplifies the resilience of the microbial community against adverse abiotic stresses, achieved through the orchestration of ABA (abscisic acid) signaling and its cascading downstream effects. Emergent from the process of exudation are pivotal bioactive compounds including ATP, D-ribose, pyruvate, glucose, glutamine, and thiamine diphosphate. In conclusion, we hypothesize that future efforts to enhance the growth and productivity of commercially important crop plants under both favorable and unfavorable environmental conditions may focus on manipulating plant rhizobiomes.
Affiliation(s)
- Aala A Abulfaraj
  - Biological Sciences Department, College of Science & Arts, King Abdulaziz University, Rabigh 21911, Saudi Arabia.
- Ashwag Y Shami
  - Department of Biology, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Nahaa M Alotaibi
  - Department of Biology, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Maryam M Alomran
  - Department of Biology, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Abeer S Aloufi
  - Department of Biology, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Abeer Al-Andal
  - Department of Biology, College of Science, King Khalid University, Abha 61413, Saudi Arabia
- Fatimah M Alshehrei
  - Department of Biology, Jumum College University, Umm Al-Qura University, P.O. Box 7388, Makkah 21955, Saudi Arabia
- Fatmah O Sefrji
  - Department of Biology, College of Science, Taibah University, Al-Madinah Al-Munawarah 30002, Saudi Arabia
- Khloud H Alsaadi
  - Department of Biological Science, College of Science, University of Jeddah, Jeddah 21493, Saudi Arabia
- Haneen W Abuauf
  - Department of Biology, Faculty of Applied Science, Umm Al-Qura University, Makkah 24381, Saudi Arabia
- Sahar A Alshareef
  - Department of Biological Science, College of Science and Arts at Khulis, University of Jeddah, Jeddah 21921, Saudi Arabia
- Rewaa S Jalal
  - Department of Biological Science, College of Science, University of Jeddah, Jeddah 21493, Saudi Arabia.
11
Spohr P, Scharf S, Rommerskirchen A, Henrich B, Jäger P, Klau GW, Haas R, Dilthey A, Pfeffer K. Insights into gut microbiomes in stem cell transplantation by comprehensive shotgun long-read sequencing. Sci Rep 2024; 14:4068. [PMID: 38374282 PMCID: PMC10876974 DOI: 10.1038/s41598-024-53506-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Accepted: 02/01/2024] [Indexed: 02/21/2024] Open
Abstract
The gut microbiome is a diverse ecosystem, dominated by bacteria; however, fungi, phages/viruses, archaea, and protozoa are also important members of the gut microbiota. Exploration of taxonomic compositions beyond bacteria as well as an understanding of the interaction between the bacteriome with the other members is limited using 16S rDNA sequencing. Here, we developed a pipeline enabling the simultaneous interrogation of the gut microbiome (bacteriome, mycobiome, archaeome, eukaryome, DNA virome) and of antibiotic resistance genes based on optimized long-read shotgun metagenomics protocols and custom bioinformatics. Using our pipeline we investigated the longitudinal composition of the gut microbiome in an exploratory clinical study in patients undergoing allogeneic hematopoietic stem cell transplantation (alloHSCT; n = 31). Pre-transplantation microbiomes exhibited a 3-cluster structure, characterized by Bacteroides spp./Phocaeicola spp., mixed composition and Enterococcus abundances. We revealed substantial inter-individual and temporal variabilities of microbial domain compositions, human DNA, and antibiotic resistance genes during the course of alloHSCT. Interestingly, viruses and fungi accounted for substantial proportions of microbiome content in individual samples. In the course of HSCT, bacterial strains were stable or newly acquired. Our results demonstrate the disruptive potential of alloHSCT on the gut microbiome and pave the way for future comprehensive microbiome studies based on long-read metagenomics.
Affiliation(s)
- Philipp Spohr
  - Chair Algorithmic Bioinformatics, Faculty of Mathematics and Natural Sciences, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Center for Digital Medicine, Düsseldorf, Germany
- Sebastian Scharf
  - Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, University Hospital Düsseldorf, Düsseldorf, Germany
- Anna Rommerskirchen
  - Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, University Hospital Düsseldorf, Düsseldorf, Germany
- Birgit Henrich
  - Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, University Hospital Düsseldorf, Düsseldorf, Germany
- Paul Jäger
  - Department of Hematology, Immunology, and Clinical Immunology, Heinrich Heine University Düsseldorf, University Hospital Düsseldorf, Düsseldorf, Germany
- Gunnar W Klau
  - Chair Algorithmic Bioinformatics, Faculty of Mathematics and Natural Sciences, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
  - Center for Digital Medicine, Düsseldorf, Germany.
- Rainer Haas
  - Department of Hematology, Immunology, and Clinical Immunology, Heinrich Heine University Düsseldorf, University Hospital Düsseldorf, Düsseldorf, Germany.
- Alexander Dilthey
  - Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, University Hospital Düsseldorf, Düsseldorf, Germany.
  - Center for Digital Medicine, Düsseldorf, Germany.
- Klaus Pfeffer
  - Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, University Hospital Düsseldorf, Düsseldorf, Germany.
12
|
Kim C, Pongpanich M, Porntaveetus T. Unraveling metagenomics through long-read sequencing: a comprehensive review. J Transl Med 2024; 22:111. [PMID: 38282030 PMCID: PMC10823668 DOI: 10.1186/s12967-024-04917-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Accepted: 01/21/2024] [Indexed: 01/30/2024]
Abstract
The study of microbial communities has undergone significant advancements, starting from the initial use of 16S rRNA sequencing to the adoption of shotgun metagenomics. However, a new era has emerged with the advent of long-read sequencing (LRS), which offers substantial improvements over its predecessor, short-read sequencing (SRS). LRS produces reads that are several kilobases long, enabling researchers to obtain more complete and contiguous genomic information, characterize structural variations, and study epigenetic modifications. The current leaders in LRS technologies are Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), each offering a distinct set of advantages. This review covers the workflow of long-read metagenomics sequencing, including sample preparation (sample collection, sample extraction, and library preparation), sequencing, processing (quality control, assembly, and binning), and analysis (taxonomic annotation and functional annotation). Each section provides a concise outline of the key concept of the methodology, presenting the original concept as well as how it is challenged or modified in the context of LRS. Additionally, each section introduces a range of tools that are compatible with LRS and can be used to execute the LRS process. This review aims to present the workflow of metagenomics, highlight the transformative impact of LRS, and provide researchers with a selection of tools suitable for this task.
Affiliation(s)
- Chankyung Kim
  - Center of Excellence in Genomics and Precision Dentistry, Department of Physiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
  - Graduate Program in Bioinformatics and Computational Biology, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Monnat Pongpanich
  - Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
  - Center of Excellence for Cancer and Inflammation, Chulalongkorn University, Bangkok, Thailand
- Thantrira Porntaveetus
  - Center of Excellence in Genomics and Precision Dentistry, Department of Physiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
  - Graduate Program in Geriatric and Special Patients Care, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
|
13
|
Marić J, Križanović K, Riondet S, Nagarajan N, Šikić M. Comparative analysis of metagenomic classifiers for long-read sequencing datasets. BMC Bioinformatics 2024; 25:15. [PMID: 38212694 PMCID: PMC10782538 DOI: 10.1186/s12859-024-05634-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Accepted: 01/02/2024] [Indexed: 01/13/2024]
Abstract
BACKGROUND: Long reads have gained popularity in the analysis of metagenomics data. Therefore, we comprehensively assessed metagenomics classification tools at the species taxonomic level. We analysed k-mer-based tools, mapping-based tools, and two general-purpose long-read mappers. We evaluated more than 20 pipelines that use either nucleotide or protein databases and selected 13 for an extensive benchmark. We prepared seven synthetic datasets to test various scenarios, including the presence of a host, unknown species, and related species. Moreover, we used available sequencing data from three well-defined mock communities, including a dataset with abundance varying from 0.0001 to 20%, and six real gut microbiomes. RESULTS: The general-purpose mappers Minimap2 and Ram achieved similar or better accuracy on most testing metrics than the best-performing classification tools. They were up to ten times slower than the fastest k-mer-based tools while requiring up to four times less RAM. All tested tools except CLARK-S were prone to report organisms not present in the datasets, and they underperformed when the host's genetic material was highly abundant. Tools that use a protein database performed worse than those based on a nucleotide database. Longer read lengths made classification easier, but because read length distributions differ among species, using only the longest reads reduced accuracy. The comparison of real gut microbiome datasets shows similar abundance profiles for the same type of tools but discordance in the number of reported organisms and abundances between types. Most assessments showed the influence of database completeness on the reports. CONCLUSION: The findings indicate that k-mer-based tools are well-suited for rapid analysis of long-read data. However, when heightened accuracy is essential, mappers demonstrate slightly superior performance, albeit at a considerably slower pace. Nevertheless, a combination of diverse categories of tools and databases will likely be necessary to analyse complex samples. Discrepancies observed among tools when applied to real gut datasets, as well as reduced performance in cases where unknown species or a significant proportion of the host genome is present in the sample, highlight the need for continuous improvement of existing tools. Additionally, regular updates and curation of databases are important to ensure their effectiveness.
Affiliation(s)
- Josip Marić
  - Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000, Zagreb, Croatia
- Krešimir Križanović
  - Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000, Zagreb, Croatia
- Sylvain Riondet
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117596, Republic of Singapore
- Niranjan Nagarajan
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117596, Republic of Singapore
- Mile Šikić
  - Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000, Zagreb, Croatia
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
|
14
|
Cheng WY, Liu WX, Ding Y, Wang G, Shi Y, Chu ESH, Wong S, Sung JJY, Yu J. High Sensitivity of Shotgun Metagenomic Sequencing in Colon Tissue Biopsy by Host DNA Depletion. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:1195-1205. [PMID: 36174929 PMCID: PMC11082407 DOI: 10.1016/j.gpb.2022.09.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/08/2021] [Revised: 08/29/2022] [Accepted: 09/19/2022] [Indexed: 06/16/2023]
Abstract
The high host genetic background of tissue biopsies hinders the application of shotgun metagenomic sequencing in characterizing the tissue microbiota. We proposed an optimized method that removed host DNA from colon biopsies and examined the effect on metagenomic analysis. Human or mouse colon biopsies were divided into two groups, with one group undergoing host DNA depletion and the other serving as the control. Host DNA was removed through differential lysis of mammalian and bacterial cells before sequencing. The impact of host DNA depletion on microbiota was compared based on phylogenetic diversity analyses and regression analyses. Removing host DNA enhanced bacterial sequencing depth and improved species discovery, increasing bacterial reads by 2.46 ± 0.20 fold while reducing host reads by 6.80% ± 1.06%. Moreover, 2.40 times more bacterial species were detected after host DNA depletion. This was confirmed in mouse colon tissues, where bacterial reads increased by 5.46 ± 0.42 fold while host reads decreased by 10.2% ± 0.83%. Similarly, significantly more bacterial species were detected in the mouse colon tissue upon host DNA depletion (P < 0.001). Furthermore, increased microbial richness was evident in the host DNA-depleted samples compared with non-depleted controls in human colon biopsies and mouse colon tissues (P < 0.001). Our optimized method of host DNA depletion improves the sensitivity of shotgun metagenomic sequencing for detecting bacteria in biopsies, which may yield a more accurate taxonomic profile of the tissue microbiota and identify bacteria that are important for disease initiation or progression.
Affiliation(s)
- Wing Yin Cheng
  - State Key Laboratory of Digestive Disease, Institute of Digestive Disease and The Department of Medicine and Therapeutics, Li Ka Shing Institute of Health Sciences, Shenzhen Research Institute, The Chinese University of Hong Kong, Hong Kong Special Administrative Region 999077, China
- Wei-Xin Liu
  - State Key Laboratory of Digestive Disease, Institute of Digestive Disease and The Department of Medicine and Therapeutics, Li Ka Shing Institute of Health Sciences, Shenzhen Research Institute, The Chinese University of Hong Kong, Hong Kong Special Administrative Region 999077, China
- Yanqiang Ding
  - State Key Laboratory of Digestive Disease, Institute of Digestive Disease and The Department of Medicine and Therapeutics, Li Ka Shing Institute of Health Sciences, Shenzhen Research Institute, The Chinese University of Hong Kong, Hong Kong Special Administrative Region 999077, China
- Guoping Wang
  - State Key Laboratory of Digestive Disease, Institute of Digestive Disease and The Department of Medicine and Therapeutics, Li Ka Shing Institute of Health Sciences, Shenzhen Research Institute, The Chinese University of Hong Kong, Hong Kong Special Administrative Region 999077, China
- Yu Shi
  - State Key Laboratory of Digestive Disease, Institute of Digestive Disease and The Department of Medicine and Therapeutics, Li Ka Shing Institute of Health Sciences, Shenzhen Research Institute, The Chinese University of Hong Kong, Hong Kong Special Administrative Region 999077, China
- Eagle S H Chu
  - State Key Laboratory of Digestive Disease, Institute of Digestive Disease and The Department of Medicine and Therapeutics, Li Ka Shing Institute of Health Sciences, Shenzhen Research Institute, The Chinese University of Hong Kong, Hong Kong Special Administrative Region 999077, China
- Sunny Wong
  - State Key Laboratory of Digestive Disease, Institute of Digestive Disease and The Department of Medicine and Therapeutics, Li Ka Shing Institute of Health Sciences, Shenzhen Research Institute, The Chinese University of Hong Kong, Hong Kong Special Administrative Region 999077, China
- Joseph J Y Sung
  - State Key Laboratory of Digestive Disease, Institute of Digestive Disease and The Department of Medicine and Therapeutics, Li Ka Shing Institute of Health Sciences, Shenzhen Research Institute, The Chinese University of Hong Kong, Hong Kong Special Administrative Region 999077, China; Lee Kong Chian School of Medicine, Nanyang Technology University, Singapore 639798, Singapore
- Jun Yu
  - State Key Laboratory of Digestive Disease, Institute of Digestive Disease and The Department of Medicine and Therapeutics, Li Ka Shing Institute of Health Sciences, Shenzhen Research Institute, The Chinese University of Hong Kong, Hong Kong Special Administrative Region 999077, China
|
15
|
Zheng H, Marçais G, Kingsford C. Creating and Using Minimizer Sketches in Computational Genomics. J Comput Biol 2023; 30:1251-1276. [PMID: 37646787 PMCID: PMC11082048 DOI: 10.1089/cmb.2023.0094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
Processing large data sets has become an essential part of computational genomics. Greatly increased availability of sequence data from multiple sources has fueled breakthroughs in genomics and related fields but has led to computational challenges processing large sequencing experiments. The minimizer sketch is a popular method for sequence sketching that underlies core steps in computational genomics such as read mapping, sequence assembling, k-mer counting, and more. In most applications, minimizer sketches are constructed using one of few classical approaches. More recently, efforts have been put into building minimizer sketches with desirable properties compared with the classical constructions. In this survey, we review the history of the minimizer sketch, the theories developed around the concept, and the plethora of applications taking advantage of such sketches. We aim to provide the readers a comprehensive picture of the research landscape involving minimizer sketches, in anticipation of better fusion of theory and application in the future.
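To make the idea of a minimizer sketch concrete, the following is a minimal Python sketch of the classical window-based construction: from every window of `w` consecutive k-mers, the k-mer with the smallest value under some ordering is kept. It is an illustration only and is not taken from the survey. The function names, the BLAKE2b-based ordering, the leftmost tie-breaking rule, and the example sequence are all assumptions chosen for readability; real tools typically use faster hashes and canonical k-mers.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash used as the k-mer ordering (an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizer_sketch(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) pairs selected as window minimizers.

    Each window covers w consecutive k-mers; the k-mer with the smallest
    hash is selected, with ties broken by the leftmost position.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    hashes = [kmer_hash(x) for x in kmers]
    selected: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        best = min(range(start, start + w), key=lambda i: (hashes[i], i))
        # Adjacent windows often share a minimizer; record each position once.
        if not selected or selected[-1][0] != best:
            selected.append((best, kmers[best]))
    return selected


if __name__ == "__main__":
    for pos, kmer in minimizer_sketch("ACGTACGTGACCTTGA", k=3, w=4):
        print(pos, kmer)
```

Because every window contributes at least one selected k-mer, two sequences that share a stretch of `w + k - 1` identical bases are guaranteed to share a minimizer, which is what makes the sketch useful for seeding read mapping.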
Affiliation(s)
- Hongyu Zheng
  - Computer Science Department, Princeton University, Princeton, New Jersey, USA
- Guillaume Marçais
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Carl Kingsford
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
|
16
|
Zhao Y, Huang F, Wang W, Gao R, Fan L, Wang A, Gao SH. Application of high-throughput sequencing technologies and analytical tools for pathogen detection in urban water systems: Progress and future perspectives. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 900:165867. [PMID: 37516185 DOI: 10.1016/j.scitotenv.2023.165867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Revised: 07/25/2023] [Accepted: 07/26/2023] [Indexed: 07/31/2023]
Abstract
The ubiquitous presence of pathogenic microorganisms, such as viruses, bacteria, fungi, and protozoa, in urban water systems poses a significant risk to public health. The emergence of infectious waterborne diseases mediated by urban water systems has become one of the leading global causes of mortality. However, the detection and monitoring of these pathogenic microorganisms have been limited by the complexity and diversity of environmental samples. Conventional methods were restricted by long assay times, high benchmarks of identification, and narrow application scenarios. Novel technologies, such as high-throughput sequencing technologies, enable potentially full-spectrum detection of trace pathogenic microorganisms in complex environmental matrices. This review concisely summarizes the current state of high-throughput sequencing technologies for identifying pathogenic microorganisms in urban water systems. Furthermore, future perspectives in pathogen research emphasize the need for detection methods with high accuracy and sensitivity, the establishment of precise detection standards and procedures, and the significance of bioinformatics software and platforms. We have compiled a list of pathogen analysis software, platforms, and databases that offer robust engines and high accuracy for reference. We highlight the significance of analyses that combine targeted and non-targeted sequencing technologies, short- and long-read technologies, and bioinformatic tools in pursuing upgraded biosafety in urban water systems.
Affiliation(s)
- Yanmei Zhao
  - State Key Laboratory of Urban Water Resource and Environment, School of Civil & Environmental Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China
- Fang Huang
  - State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Wenxiu Wang
  - Department of Ocean Science and Engineering, Southern University of Science and Technology (SUSTech), Shenzhen, China
- Rui Gao
  - State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Lu Fan
  - Department of Ocean Science and Engineering, Southern University of Science and Technology (SUSTech), Shenzhen, China; Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, China
- Aijie Wang
  - State Key Laboratory of Urban Water Resource and Environment, School of Civil & Environmental Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China; State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Shu-Hong Gao
  - State Key Laboratory of Urban Water Resource and Environment, School of Civil & Environmental Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China
|
17
|
Zhu X, Zhao L, Huang L, Yang W, Wang L, Yu R. cgMSI: pathogen detection within species from nanopore metagenomic sequencing data. BMC Bioinformatics 2023; 24:387. [PMID: 37821827 PMCID: PMC10568937 DOI: 10.1186/s12859-023-05512-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Accepted: 10/02/2023] [Indexed: 10/13/2023]
Abstract
BACKGROUND: Metagenomic sequencing is an unbiased approach to pathogen detection that can potentially detect all known and unidentified strains. Recently, nanopore sequencing has emerged as a highly promising tool for rapid pathogen detection due to its fast turnaround time. However, identifying pathogens within species is nontrivial for nanopore sequencing data due to the high sequencing error rate. RESULTS: We developed the core gene alleles metagenome strain identification (cgMSI) tool, which uses a two-stage maximum a posteriori probability estimation method to detect pathogens at strain level from nanopore metagenomic sequencing data at low computational cost. The cgMSI tool can accurately identify strains and estimate relative abundance at 1× coverage. CONCLUSIONS: We developed cgMSI for nanopore metagenomic pathogen detection within species. cgMSI is available at https://github.com/ZHU-XU-xmu/cgMSI.
Affiliation(s)
- Xu Zhu
  - School of Informatics, Xiamen University, Xiamen, Fujian, China
- Lili Zhao
  - Women and Children's Hospital, School of Medicine, Xiamen University, Xiamen, Fujian, China
- Lihong Huang
  - Computer Management Center, The First Affiliated Hospital of Xiamen University, Xiamen, Fujian, China
- Liansheng Wang
  - School of Informatics, Xiamen University, Xiamen, Fujian, China
  - National Institute for Data Science in Health and Medicine, Informatics, Xiamen University, Xiamen, Fujian, China
- Rongshan Yu
  - School of Informatics, Xiamen University, Xiamen, Fujian, China
  - National Institute for Data Science in Health and Medicine, Informatics, Xiamen University, Xiamen, Fujian, China
|
18
|
Kille B, Garrison E, Treangen TJ, Phillippy AM. Minmers are a generalization of minimizers that enable unbiased local Jaccard estimation. Bioinformatics 2023; 39:btad512. [PMID: 37603771 PMCID: PMC10505501 DOI: 10.1093/bioinformatics/btad512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 07/19/2023] [Accepted: 08/18/2023] [Indexed: 08/23/2023]
Abstract
MOTIVATION: The Jaccard similarity on k-mer sets has been shown to be a convenient proxy for sequence identity. By avoiding expensive base-level alignments and comparing reduced sequence representations, tools such as MashMap can scale to massive numbers of pairwise comparisons while still providing useful similarity estimates. However, due to their reliance on minimizer winnowing, previous versions of MashMap were shown to be biased and inconsistent estimators of Jaccard similarity. This directly impacts downstream tools that rely on the accuracy of these estimates. RESULTS: To address this, we propose the minmer winnowing scheme, which generalizes the minimizer scheme by use of a rolling minhash with multiple sampled k-mers per window. We show both theoretically and empirically that minmers yield an unbiased estimator of local Jaccard similarity, and we implement this scheme in an updated version of MashMap. The minmer-based implementation is over 10 times faster than the minimizer-based version under the default ANI threshold, making it well-suited for large-scale comparative genomics applications. AVAILABILITY AND IMPLEMENTATION: MashMap3 is available at https://github.com/marbl/MashMap.
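The quantity being estimated can be illustrated with a short Python sketch that computes the exact Jaccard similarity of two k-mer sets and a bottom-s MinHash estimate of it. This is the classical whole-sequence MinHash estimator, shown only to clarify what "Jaccard similarity on k-mer sets" means; it is not the minmer winnowing scheme and not MashMap's implementation. The function names, the BLAKE2b hash, and the example sequences are assumptions made for illustration.

```python
import hashlib


def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| (defined as 1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def _hash(kmer: str) -> int:
    """Deterministic 64-bit hash (an illustrative choice, not a tool default)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest(), "big")


def bottom_s(kmers: set[str], s: int) -> set[int]:
    """Sketch of a k-mer set: its s smallest hash values."""
    return set(sorted(_hash(x) for x in kmers)[:s])


def minhash_jaccard(a: set[str], b: set[str], s: int) -> float:
    """Bottom-s MinHash estimate: among the s smallest hashes of the union,
    the fraction that appears in both sketches."""
    sa, sb = bottom_s(a, s), bottom_s(b, s)
    merged = set(sorted(sa | sb)[:s])
    return len(merged & sa & sb) / len(merged) if merged else 1.0


if __name__ == "__main__":
    x = kmer_set("ACGTACGTGACCTTGAACGT", k=4)
    y = kmer_set("ACGTACGAGACCTTGAACGT", k=4)
    print("exact:", jaccard(x, y))
    print("estimate (s=8):", minhash_jaccard(x, y, s=8))
```

When `s` is at least the size of the union, the estimate coincides with the exact value (barring hash collisions); smaller sketches trade accuracy for memory and speed.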
Affiliation(s)
- Bryce Kille
  - Department of Computer Science, Rice University, Houston, TX, United States
- Erik Garrison
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, United States
- Todd J Treangen
  - Department of Computer Science, Rice University, Houston, TX, United States
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, United States
|
19
|
Keenum I, Player R, Kralj J, Servetas S, Sussman MD, Russell JA, Stone J, Chandrapati S, Sozhamannan S. Amplicon Sequencing Minimal Information (ASqMI): Quality and Reporting Guidelines for Actionable Calls in Biodefense Applications. J AOAC Int 2023; 106:1424-1430. [PMID: 37067472 PMCID: PMC10472743 DOI: 10.1093/jaoacint/qsad047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Revised: 03/30/2023] [Accepted: 04/07/2023] [Indexed: 04/18/2023]
Abstract
BACKGROUND: Accurate, high-confidence data is critical for assessing potential biothreat incidents. In a biothreat event, false-negative and false-positive results have serious consequences. Worst-case scenarios can result in unnecessary shutdowns or fatalities at an exorbitant monetary and psychological cost, respectively. Quantitative PCR assays for agents of interest have been successfully used for routine biosurveillance. Recently, there has been increased impetus for adoption of amplicon sequencing (AS) for biosurveillance because it enables discrimination of true positives from near-neighbor false positives, as well as broad, simultaneous detection of many targets in many pathogens in a high-throughput scheme. However, the high sensitivity of AS can lead to false positives. Appropriate controls and workflow reporting can help address these challenges. OBJECTIVES: Data reporting standards are critical to data trustworthiness. The standards presented herein aim to provide a framework for method quality assessment in biodetection. METHODS: We present a set of standards, Amplicon Sequencing Minimal Information (ASqMI), developed under the auspices of the AOAC INTERNATIONAL Stakeholder Program on Agent Detection Assays for making actionable calls in biosurveillance applications. In addition to the first minimum information guidelines for AS, we provide a controls checklist and scoring scheme to assure AS run quality and assess potential sample contamination. RESULTS: Adoption of the ASqMI guidelines will improve data quality, help track workflow performance, and ultimately give decision makers the confidence to trust the results of this new and powerful technology. CONCLUSION: AS workflows can provide robust, confident calls for biodetection; however, due diligence in reporting and controls is needed. The ASqMI guideline is the first AS minimum reporting guidance document that also provides the means for end users to evaluate their workflows to improve confidence. HIGHLIGHTS: Standardized reporting guidance for actionable calls is critical to ensuring trustworthy data.
Affiliation(s)
- Ishi Keenum
  - National Institute of Standards and Technology, Biosystems and Biomaterials Division, Complex Microbial Systems Group, Gaithersburg, MD 20899, USA
- Robert Player
  - The Johns Hopkins University, Applied Physics Laboratory, Laurel, MD 20723, USA
  - Datirium, LLC, Cincinnati, OH 45526, USA
- Jason Kralj
  - National Institute of Standards and Technology, Biosystems and Biomaterials Division, Complex Microbial Systems Group, Gaithersburg, MD 20899, USA
- Stephanie Servetas
  - National Institute of Standards and Technology, Biosystems and Biomaterials Division, Complex Microbial Systems Group, Gaithersburg, MD 20899, USA
- Michael D Sussman
  - US Department of Agriculture, Agricultural Analytics Division, Livestock and Poultry Programs, Agricultural Marketing Service, Washington, DC 20250 USA
- Shanmuga Sozhamannan
  - Joint Program Executive Office for Chemical, Biological, Radiological and Nuclear Defense (JPEO-CBRND), Joint Project Lead for CBRND Enabling Biotechnologies (JPL CBRND EB), Frederick, MD 21702, USA
  - Joint Research and Development, Inc., Stafford, VA 22556, USA
|
20
|
Qi F, Fan S, Fang C, Ge L, Lyu J, Huang Z, Zhao S, Zou Y, Huang L, Liu X, Liang Y, Zhang Y, Zhong Y, Zhang H, Xiao L, Zhang X. Orally administrated Lactobacillus gasseri TM13 and Lactobacillus crispatus LG55 can restore the vaginal health of patients recovering from bacterial vaginosis. Front Immunol 2023; 14:1125239. [PMID: 37575226 PMCID: PMC10415204 DOI: 10.3389/fimmu.2023.1125239] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/20/2022] [Accepted: 06/19/2023] [Indexed: 08/15/2023]
Abstract
Bacterial vaginosis (BV) is a common infection of the lower genital tract characterized by vaginal microbiome dysbiosis caused by a decrease in lactobacilli. Previous studies suggested that supplementation with live Lactobacillus may benefit recovery from BV; however, the outcomes vary in people from different regions. Herein, we aim to evaluate the effectiveness of oral Chinese-origin Lactobacillus with adjuvant metronidazole (MET) in treating Chinese BV patients. In total, 67 Chinese women with BV were enrolled in this parallel controlled trial and randomly assigned to two study groups: a control group treated with MET vaginal suppositories for 7 days and a probiotic group treated with oral Lactobacillus gasseri TM13 and Lactobacillus crispatus LG55 as an adjuvant to MET for 30 days. By comparing the participants with Nugent Scores ≥ 7 and < 7 on days 14, 30, and 90, we found that oral administration of probiotics did not improve BV cure rates (72.73% and 84.00% at day 14, 57.14% and 60.00% at day 30, 32.14% and 48.39% at day 90 for the probiotic and control groups, respectively). However, the probiotics were effective in restoring vaginal health after cure, as shown by a higher proportion of participants with Nugent Scores < 4 in the probiotic group compared to the control group (87.50% and 71.43% on day 14, 93.75% and 88.89% on day 30, and 77.78% and 66.67% on day 90). The relative abundance of the probiotic strains was significantly increased in the intestinal microbiome of the probiotic group compared to the control group at day 14, but no significant difference was detected after 30 and 90 days. Also, the probiotics were not detected in the vaginal microbiome, suggesting that L. gasseri TM13 and L. crispatus LG55 mainly acted through the intestine. A higher abundance of Prevotella timonensis at baseline was significantly associated with long-term cure failure of BV and greatly contributed to the enrichment of the lipid IVA synthesis pathway, which could aggravate the inflammatory response. To sum up, L. gasseri TM13 and L. crispatus LG55 can restore the vaginal health of patients recovering from BV, and individualized interventions should be developed for this purpose. Clinical trial registration: https://classic.clinicaltrials.gov/ct2/show/, identifier NCT04771728.
Collapse
Affiliation(s)
- Fengyuan Qi
- Department of Obstetrics and Gynecology, Peking University Shenzhen Hospital, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
- ShenZhen Engineering Laboratory of Detection and Intervention of Human Intestinal Microbiome, Shenzhen, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Shangrong Fan
- Department of Obstetrics and Gynecology, Peking University Shenzhen Hospital, Shenzhen, China
- Institute of Obstetrics and Gynecology, Shenzhen Peking University Hong Kong University of Science and Technology Medical Center, Shenzhen, China
- Shenzhen Key Laboratory on Technology for Early Diagnosis of Major Gynecological Diseases, Peking University Shenzhen Hospital, Shenzhen, China
| | - Chao Fang
- BGI-Shenzhen, Shenzhen, China
- ShenZhen Engineering Laboratory of Detection and Intervention of Human Intestinal Microbiome, Shenzhen, China
| | - Lan Ge
- BGI Precision Nutrition (Shenzhen) Technology Co., Ltd, Shenzhen, China
| | - Jinli Lyu
- Department of Obstetrics and Gynecology, Peking University Shenzhen Hospital, Shenzhen, China
- Institute of Obstetrics and Gynecology, Shenzhen Peking University Hong Kong University of Science and Technology Medical Center, Shenzhen, China
- Shenzhen Key Laboratory on Technology for Early Diagnosis of Major Gynecological Diseases, Peking University Shenzhen Hospital, Shenzhen, China
| | - Zhuoqi Huang
- BGI-Shenzhen, Shenzhen, China
- ShenZhen Engineering Laboratory of Detection and Intervention of Human Intestinal Microbiome, Shenzhen, China
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| | - Shaowei Zhao
- BGI-Shenzhen, Shenzhen, China
- ShenZhen Engineering Laboratory of Detection and Intervention of Human Intestinal Microbiome, Shenzhen, China
| | - Yuanqiang Zou
- BGI-Shenzhen, Shenzhen, China
- ShenZhen Engineering Laboratory of Detection and Intervention of Human Intestinal Microbiome, Shenzhen, China
| | - Liting Huang
- Department of Obstetrics and Gynecology, Peking University Shenzhen Hospital, Shenzhen, China
- Institute of Obstetrics and Gynecology, Shenzhen Peking University Hong Kong University of Science and Technology Medical Center, Shenzhen, China
- Shenzhen Key Laboratory on Technology for Early Diagnosis of Major Gynecological Diseases, Peking University Shenzhen Hospital, Shenzhen, China
| | - Xinyang Liu
- Department of Obstetrics and Gynecology, Peking University Shenzhen Hospital, Shenzhen, China
- Institute of Obstetrics and Gynecology, Shenzhen Peking University Hong Kong University of Science and Technology Medical Center, Shenzhen, China
- Shenzhen Key Laboratory on Technology for Early Diagnosis of Major Gynecological Diseases, Peking University Shenzhen Hospital, Shenzhen, China
| | - Yiheng Liang
- Department of Obstetrics and Gynecology, Peking University Shenzhen Hospital, Shenzhen, China
- Institute of Obstetrics and Gynecology, Shenzhen Peking University Hong Kong University of Science and Technology Medical Center, Shenzhen, China
- Shenzhen Key Laboratory on Technology for Early Diagnosis of Major Gynecological Diseases, Peking University Shenzhen Hospital, Shenzhen, China
| | - Yongke Zhang
- Department of Obstetrics and Gynecology, Peking University Shenzhen Hospital, Shenzhen, China
- Institute of Obstetrics and Gynecology, Shenzhen Peking University Hong Kong University of Science and Technology Medical Center, Shenzhen, China
- Shenzhen Key Laboratory on Technology for Early Diagnosis of Major Gynecological Diseases, Peking University Shenzhen Hospital, Shenzhen, China
| | - Yiyi Zhong
- BGI Precision Nutrition (Shenzhen) Technology Co., Ltd, Shenzhen, China
| | - Haifeng Zhang
- BGI Precision Nutrition (Shenzhen) Technology Co., Ltd, Shenzhen, China
| | - Liang Xiao
- BGI-Shenzhen, Shenzhen, China
- ShenZhen Engineering Laboratory of Detection and Intervention of Human Intestinal Microbiome, Shenzhen, China
| | - Xiaowei Zhang
- Department of Obstetrics and Gynecology, Peking University Shenzhen Hospital, Shenzhen, China
- Institute of Obstetrics and Gynecology, Shenzhen Peking University Hong Kong University of Science and Technology Medical Center, Shenzhen, China
- Shenzhen Key Laboratory on Technology for Early Diagnosis of Major Gynecological Diseases, Peking University Shenzhen Hospital, Shenzhen, China
|
21
|
Baker DN, Langmead B. Genomic sketching with multiplicities and locality-sensitive hashing using Dashing 2. Genome Res 2023; 33:1218-1227. [PMID: 37414575 PMCID: PMC10538361 DOI: 10.1101/gr.277655.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 06/30/2023] [Indexed: 07/08/2023]
Abstract
A genomic sketch is a small, probabilistic representation of the set of k-mers in a sequencing data set. Sketches are building blocks for large-scale analyses that consider similarities between many pairs of sequences or sequence collections. Although existing tools can easily compare tens of thousands of genomes, data sets can reach millions of sequences and beyond. Popular tools also fail to consider k-mer multiplicities, making them less applicable in quantitative settings. Here, we describe a method called Dashing 2 that builds on the SetSketch data structure. SetSketch is related to HyperLogLog (HLL) but discards use of leading zero count in favor of a truncated logarithm of adjustable base. Unlike HLL, SetSketch can perform multiplicity-aware sketching when combined with the ProbMinHash method. Dashing 2 integrates locality-sensitive hashing to scale all-pairs comparisons to millions of sequences. It achieves superior similarity estimates for the Jaccard coefficient and average nucleotide identity compared with the original Dashing, in much less time and with the same-sized sketch. Dashing 2 is free, open-source software.
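To make the idea of a genomic sketch concrete, the following is a minimal illustration of estimating the Jaccard coefficient from small k-mer sketches. It uses a plain bottom-k MinHash sketch, which is not the SetSketch structure that Dashing 2 builds on; the function names, the choice of BLAKE2b as hash, and the example sequences are all assumptions made for illustration.

```python
import hashlib

def kmers(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def h64(kmer):
    """Deterministic 64-bit hash of a k-mer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def bottom_k_sketch(seq, k, size):
    """Bottom-k sketch: the `size` smallest hash values of the k-mer set."""
    return sorted(h64(x) for x in kmers(seq, k))[:size]

def jaccard_estimate(sketch_a, sketch_b, size):
    """Estimate the Jaccard coefficient from two bottom-k sketches.

    Take the `size` smallest hashes of the union of both sketches and
    count how many of them occur in both sketches.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for x in merged if x in shared) / len(merged)

# Hypothetical toy sequences; real inputs would be genomes or read sets.
seq_a = "ACGTACGTGG"
seq_b = "ACGTACGTCC"
estimate = jaccard_estimate(
    bottom_k_sketch(seq_a, 4, 64), bottom_k_sketch(seq_b, 4, 64), 64
)
print(estimate)
```

When the sketch size is at least the number of distinct k-mers in the union, as in this toy case, the estimate coincides with the exact Jaccard coefficient; with smaller sketches it becomes a sample-based approximation.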
Affiliation(s)
- Daniel N Baker
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218-2683, USA
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218-2683, USA
|
22
|
Yang C, Lo T, Nip KM, Hafezqorani S, Warren RL, Birol I. Characterization and simulation of metagenomic nanopore sequencing data with Meta-NanoSim. Gigascience 2023; 12:giad013. [PMID: 36939007 PMCID: PMC10025935 DOI: 10.1093/gigascience/giad013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Revised: 01/19/2023] [Accepted: 02/17/2023] [Indexed: 03/21/2023] Open
Abstract
BACKGROUND Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, sequencing platform-specific challenges, including high base-call error rate, nonuniform read lengths, and the presence of chimeric artifacts, necessitate specifically designed analytical algorithms. The use of simulated datasets with characteristics that are true to the sequencing platform under evaluation is a cost-effective way to assess the performance of bioinformatics tools with the ground truth in a controlled environment. RESULTS Here, we present Meta-NanoSim, a fast and versatile utility that characterizes and simulates the unique properties of nanopore metagenomic reads. It improves upon state-of-the-art methods on microbial abundance estimation through a base-level quantification algorithm. Meta-NanoSim can simulate complex microbial communities composed of both linear and circular genomes and can stream reference genomes from online servers directly. Simulated datasets showed high congruence with experimental data in terms of read length, error profiles, and abundance levels. We demonstrate that Meta-NanoSim simulated data can facilitate the development of metagenomic algorithms and guide experimental design through a metagenome assembly benchmarking task. CONCLUSIONS The Meta-NanoSim characterization module investigates read features, including chimeric information and abundance levels, while the simulation module simulates large and complex multisample microbial communities with different abundance profiles. All trained models and the software are freely accessible at GitHub: https://github.com/bcgsc/NanoSim.
Affiliation(s)
- Chen Yang
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Bioinformatics Graduate Program, University of British Columbia, Genome Sciences Centre, BCCA 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
| | - Theodora Lo
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Bioinformatics Graduate Program, University of British Columbia, Genome Sciences Centre, BCCA 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
| | - Ka Ming Nip
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Bioinformatics Graduate Program, University of British Columbia, Genome Sciences Centre, BCCA 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
| | - Saber Hafezqorani
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Bioinformatics Graduate Program, University of British Columbia, Genome Sciences Centre, BCCA 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
| | - René L Warren
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
| | - Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Department of Medical Genetics, University of British Columbia, Life Sciences Centre Room 1364 – 2350 Health Science Mall Vancouver, BC V6T 1Z3, Canada
|
23
|
Zheng V, Sariyuce AE, Zola J. Identifying Taxonomic Units in Metagenomic DNA Streams on Mobile Devices. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1092-1103. [PMID: 35511831 DOI: 10.1109/tcbb.2022.3172661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
With the emergence of portable DNA sequencers, such as Oxford Nanopore Technology MinION, metagenomic DNA sequencing can be performed in real-time and directly in the field. However, because metagenomic DNA analysis tasks, e.g., classification, taxonomic units assignment, etc., are compute and memory intensive, and the available methods are designed for batch processing, the current metagenomic tools are not well suited for mobile devices. In this work, we propose a new memory-efficient approach to identify Operational Taxonomic Units (OTUs) in metagenomic DNA streams on mobile devices. Our method is based on finding connected components in overlap graphs constructed over a real-time stream of long DNA reads as produced by the MinION platform. We propose an efficient algorithm to maintain connected components when an overlap graph is streamed and show how redundant information can be removed from the stream by transitive closures. We also propose how our algorithms can be integrated into a larger DNA analysis pipeline tailored for mobile computing. Through experiments on simulated and real-world metagenomic data, executed on the actual mobile device, we demonstrate that our resulting solution is able to recover OTUs with high precision. Our experiments also demonstrate the compounding benefits of introducing feedback loops in the DNA analysis pipeline.
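The core idea of maintaining connected components while overlap edges arrive one at a time can be sketched with a union-find (disjoint-set) structure. This is only an illustrative stand-in under stated assumptions: the class name, method names, and read identifiers are hypothetical, and treating an edge between two already-connected reads as redundant is a simplification of the transitive-closure reduction the abstract describes, not the authors' algorithm.

```python
class StreamingComponents:
    """Union-find over read IDs, updated one overlap edge at a time."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        """Return the representative of x's component, registering x if new."""
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def add_overlap(self, a, b):
        """Process one overlap edge; return True if it merged two components."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # redundant edge: both reads are already connected
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # union by size
        self.size[ra] += self.size[rb]
        return True

    def components(self):
        """Return the current components as a list of sets of read IDs."""
        groups = {}
        for x in list(self.parent):
            groups.setdefault(self.find(x), set()).add(x)
        return list(groups.values())

# Hypothetical stream of read overlaps.
stream = [("r1", "r2"), ("r2", "r3"), ("r1", "r3"), ("r4", "r5")]
tracker = StreamingComponents()
kept = [edge for edge in stream if tracker.add_overlap(*edge)]
print(kept, tracker.components())
```

Only the edges that change connectivity need to be retained, which keeps the memory footprint proportional to the number of reads rather than the number of overlaps.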
|
24
|
Sadasivan H, Wadden J, Goliya K, Ranjan P, Dickson RP, Blaauw D, Das R, Narayanasamy S. Rapid Real-time Squiggle Classification for Read until using RawMap. ARCHIVES OF CLINICAL AND BIOMEDICAL RESEARCH 2023; 7:45-57. [PMID: 36938368 PMCID: PMC10022530 DOI: 10.26502/acbr.50170318] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 04/28/2023]
Abstract
Read Until enables Oxford Nanopore Technology's (ONT) sequencers to selectively sequence reads of target species in real-time. This enables efficient microbial enrichment for applications such as microbial abundance estimation and is particularly beneficial for metagenomic samples with a very high fraction of non-target reads (> 99% can be human reads). However, Read Until requires a fast and accurate software filter that analyzes a short prefix of a read and determines whether it belongs to a microbe of interest (target) or not. The baseline Read Until pipeline uses a deep neural network-based basecaller called Guppy and is slow and inaccurate for this task (~60% of bases sequenced are unclassified). We present RawMap, an efficient CPU-only microbial species-agnostic Read Until classifier for filtering non-target human reads in the squiggle space. RawMap uses a Support Vector Machine (SVM), which is trained to distinguish human from microbe using non-linear and non-stationary characteristics of ONT's squiggle output (continuous electrical signals). Compared to the baseline Read Until pipeline, RawMap classifies reads 1327× faster and reduces sequencing time, sequencing cost, and compute time. We show that RawMap-augmented pipelines reduce sequencing time and cost by ~24% and computing cost by 22%. Additionally, since RawMap is agnostic to microbial species, it can also classify microbial species it is not trained on. We also discuss how RawMap may be used as an alternative to the RT-PCR test for viral load quantification of SARS-CoV-2.
Affiliation(s)
- Harisankar Sadasivan
- Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, 48109, USA
| | - Jack Wadden
- Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, 48109, USA
| | - Kush Goliya
- Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, 48109, USA
| | - Piyush Ranjan
- Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, 48109, USA
| | - Robert P Dickson
- Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, 48109, USA
| | - David Blaauw
- Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, 48109, USA
| | - Reetuparna Das
- Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, 48109, USA
| | - Satish Narayanasamy
- Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, 48109, USA
|
25
|
Ma S, Li H. Statistical and Computational Methods for Microbial Strain Analysis. Methods Mol Biol 2023; 2629:231-245. [PMID: 36929080 DOI: 10.1007/978-1-0716-2986-4_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023]
Abstract
A microbial strain is interpreted as a lineage derived from a recent ancestor that has not experienced "too many" recombination events; strains can be successfully retrieved with culture-independent techniques using metagenomic sequencing. Such strain variability has been increasingly shown to display additional phenotypic heterogeneities that affect host health, such as virulence, transmissibility, and antibiotic resistance. New statistical and computational methods have recently been developed to track the strains in samples from shotgun metagenomics data, using either reference genome sequences or metagenome-assembled genomes (MAGs). In this paper, we review some recent statistical methods for strain identification based on frequency counts at a set of single nucleotide variants (SNVs) within a set of single-copy marker genes. These methods differ in terms of whether reference genome sequences are needed, how SNVs are called, what methods of deconvolution are used, and whether the methods can be applied to multiple samples. We conclude our review with areas that require further research.
Affiliation(s)
- Siyuan Ma
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
| | - Hongzhe Li
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA.
|
26
|
Portik DM, Brown CT, Pierce-Ward NT. Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets. BMC Bioinformatics 2022; 23:541. [PMID: 36513983 PMCID: PMC9749362 DOI: 10.1186/s12859-022-05103-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/04/2022] [Accepted: 12/07/2022] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Long-read shotgun metagenomic sequencing is gaining in popularity and offers many advantages over short-read sequencing. The higher information content in long reads is useful for a variety of metagenomics analyses, including taxonomic classification and profiling. The development of long-read specific tools for taxonomic classification is accelerating, yet there is a lack of information regarding their relative performance. Here, we perform a critical benchmarking study using 11 methods, including five methods designed specifically for long reads. We applied these tools to several mock community datasets generated using Pacific Biosciences (PacBio) HiFi or Oxford Nanopore Technology sequencing, and evaluated their performance based on read utilization, detection metrics, and relative abundance estimates. RESULTS Our results show that long-read classifiers generally performed best. Several short-read classification and profiling methods produced many false positives (particularly at lower abundances), required heavy filtering to achieve acceptable precision (at the cost of reduced recall), and produced inaccurate abundance estimates. By contrast, two long-read methods (BugSeq, MEGAN-LR & DIAMOND) and one generalized method (sourmash) displayed high precision and recall without any filtering required. Furthermore, in the PacBio HiFi datasets these methods detected all species down to the 0.1% abundance level with high precision. Some long-read methods, such as MetaMaps and MMseqs2, required moderate filtering to reduce false positives to resemble the precision and recall of the top-performing methods. We found read quality affected performance for methods relying on protein prediction or exact k-mer matching, and these methods performed better with PacBio HiFi datasets. 
We also found that long-read datasets with a large proportion of shorter reads (< 2 kb length) resulted in lower precision and worse abundance estimates, relative to length-filtered datasets. Finally, for classification methods, we found that the long-read datasets produced significantly better results than short-read datasets, demonstrating clear advantages for long-read metagenomic sequencing. CONCLUSIONS Our critical assessment of available methods provides best-practice recommendations for current research using long reads and establishes a baseline for future benchmarking studies.
Affiliation(s)
- Daniel M. Portik
- Pacific Biosciences, 1305 O’Brien Dr, Menlo Park, CA 93025 USA (GRID grid.423340.2; ISNI 0000 0004 0640 9878)
| | - C. Titus Brown
- Department of Population Health and Reproduction, University of California Davis, Davis, CA USA (GRID grid.27860.3b; ISNI 0000 0004 1936 9684)
| | - N. Tessa Pierce-Ward
- Department of Population Health and Reproduction, University of California Davis, Davis, CA USA (GRID grid.27860.3b; ISNI 0000 0004 1936 9684)
|
27
|
Roux S, Emerson JB. Diversity in the soil virosphere: to infinity and beyond? Trends Microbiol 2022; 30:1025-1035. [PMID: 35644779 DOI: 10.1016/j.tim.2022.05.003] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 12/22/2021] [Revised: 05/02/2022] [Accepted: 05/03/2022] [Indexed: 01/13/2023]
Abstract
Viruses are key members of Earth's microbiomes, shaping microbial community composition and metabolism. Here, we describe recent advances in 'soil viromics', that is, virus-focused metagenome and metatranscriptome analyses that offer unprecedented windows into the soil virosphere. Given the emerging picture of high soil viral activity, diversity, and dynamics over short spatiotemporal scales, we then outline key eco-evolutionary processes that we hypothesize are the major diversity drivers for soil viruses. We argue that a community effort is needed to establish a 'global soil virosphere atlas' that can be used to address the roles of viruses in soil microbiomes and terrestrial biogeochemical cycles across spatiotemporal scales.
Affiliation(s)
- Simon Roux
- DOE (Department of Energy) Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
| | - Joanne B Emerson
- Department of Plant Pathology, University of California, Davis, Davis, CA, USA; Genome Center, University of California, Davis, Davis, CA, USA.
|
28
|
Das A, Schatz MC. Sketching and sampling approaches for fast and accurate long read classification. BMC Bioinformatics 2022; 23:452. [PMID: 36316646 PMCID: PMC9624007 DOI: 10.1186/s12859-022-05014-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Accepted: 10/27/2022] [Indexed: 11/05/2022] Open
Abstract
BACKGROUND In modern sequencing experiments, quickly and accurately identifying the sources of the reads is a crucial need. In metagenomics, where each read comes from one of potentially many members of a community, it can be important to identify the exact species the read is from. In other settings, it is important to distinguish which reads are from the targeted sample and which are from potential contaminants. In both cases, identification of the correct source of a read enables further investigation of relevant reads, while minimizing wasted work. This task is particularly challenging for long reads, which can have a substantial error rate that obscures the origins of each read. RESULTS Existing tools for the read classification problem are often alignment or index-based, but such methods can have large time and/or space overheads. In this work, we investigate the effectiveness of several sampling and sketching-based approaches for read classification. In these approaches, a chosen sampling or sketching algorithm is used to generate a reduced representation (a "screen") of potential source genomes for a query readset before reads are streamed in and compared against this screen. Using a query read's similarity to the elements of the screen, the methods predict the source of the read. Such an approach requires limited pre-processing, stores and works with only a subset of the input data, and is able to perform classification with a high degree of accuracy. CONCLUSIONS The sampling and sketching approaches investigated include uniform sampling, methods based on MinHash and its weighted and order variants, a minimizer-based technique, and a novel clustering-based sketching approach. We demonstrate the effectiveness of these techniques both in identifying the source microbial genomes for reads from a metagenomic long read sequencing experiment, and in distinguishing between long reads from organisms of interest and potential contaminant reads. 
We then compare these approaches to existing alignment, index and sketching-based tools for read classification, and demonstrate how such a method is a viable alternative for determining the source of query reads. Finally, we present a reference implementation of these approaches at https://github.com/arun96/sketching .
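The screen-then-classify workflow described in this abstract can be illustrated with a short sketch. It assumes a simple hash-based k-mer subsampling rule (keep a k-mer when its hash is divisible by `rate`) as a stand-in for the sampling and sketching schemes the authors evaluate; all function names, the hash choice, and the toy genomes are hypothetical and are not taken from the reference implementation.

```python
import hashlib

def kmers(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def keep(kmer, rate):
    """Keep roughly one k-mer in `rate`, decided by a deterministic hash."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % rate == 0

def build_screen(genomes, k, rate):
    """Reduced representation ("screen"): sampled k-mers per source genome."""
    return {
        name: {x for x in kmers(seq, k) if keep(x, rate)}
        for name, seq in genomes.items()
    }

def classify(read, screen, k, rate):
    """Assign a read to the genome sharing the most sampled k-mers, or None."""
    sampled = {x for x in kmers(read, k) if keep(x, rate)}
    scores = {name: len(sampled & kept) for name, kept in screen.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Hypothetical toy genomes; rate=1 keeps every k-mer so the demo is exact.
genomes = {"A": "ACGTACGTTTGACCA", "B": "GGGCCCTTTAAAGGG"}
screen = build_screen(genomes, k=5, rate=1)
print(classify("ACGTTTGACC", screen, k=5, rate=1))
```

Because the same hash rule is applied to genomes and reads, a larger `rate` shrinks the screen while still letting true matches share sampled k-mers; the trade-off is that short or error-rich reads may retain too few k-mers to classify.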
Affiliation(s)
- Arun Das
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA (GRID grid.21107.35; ISNI 0000 0001 2171 9311)
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA (GRID grid.21107.35; ISNI 0000 0001 2171 9311)
|
29
|
Curry KD, Wang Q, Nute MG, Tyshaieva A, Reeves E, Soriano S, Wu Q, Graeber E, Finzer P, Mendling W, Savidge T, Villapol S, Dilthey A, Treangen TJ. Emu: species-level microbial community profiling of full-length 16S rRNA Oxford Nanopore sequencing data. Nat Methods 2022; 19:845-853. [PMID: 35773532 PMCID: PMC9939874 DOI: 10.1038/s41592-022-01520-4] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Received: 04/30/2021] [Accepted: 05/10/2022] [Indexed: 12/12/2022]
Abstract
16S ribosomal RNA-based analysis is the established standard for elucidating the composition of microbial communities. While short-read 16S rRNA analyses are largely confined to genus-level resolution at best, given that only a portion of the gene is sequenced, full-length 16S rRNA gene amplicon sequences have the potential to provide species-level accuracy. However, existing taxonomic identification algorithms are not optimized for the increased read length and error rate often observed in long-read data. Here we present Emu, an approach that uses an expectation-maximization algorithm to generate taxonomic abundance profiles from full-length 16S rRNA reads. Results produced from simulated datasets and mock communities show that Emu is capable of accurate microbial community profiling while obtaining fewer false positives and false negatives than alternative methods. Additionally, we illustrate a real-world application of Emu by comparing clinical sample composition estimates generated by an established whole-genome shotgun sequencing workflow with those returned by full-length 16S rRNA gene sequences processed with Emu.
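As a rough illustration of how an expectation-maximization algorithm can turn ambiguous read assignments into an abundance profile, consider the sketch below. It is a generic mixture-model EM and not Emu's actual implementation; the function name, the likelihood matrix layout, and the toy likelihood values are assumptions for illustration (in practice such likelihoods would be derived from read alignments).

```python
def em_abundance(likelihoods, n_taxa, iters=100):
    """Estimate taxon abundances from per-read likelihoods.

    likelihoods[r][t] is an assumed P(read r | taxon t).
    """
    abundance = [1.0 / n_taxa] * n_taxa
    for _ in range(iters):
        counts = [0.0] * n_taxa
        for row in likelihoods:
            # E-step: posterior probability that this read came from each taxon.
            weighted = [abundance[t] * row[t] for t in range(n_taxa)]
            total = sum(weighted)
            if total == 0:
                continue  # read is incompatible with every taxon; skip it
            for t in range(n_taxa):
                counts[t] += weighted[t] / total
        assigned = sum(counts)
        if assigned == 0:
            break  # nothing could be assigned; keep the current estimate
        # M-step: abundances are the normalized expected read counts.
        abundance = [c / assigned for c in counts]
    return abundance

# Toy data: 6 reads unique to taxon 0, 2 unique to taxon 1, 2 ambiguous.
reads = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] * 2 + [[0.5, 0.5]] * 2
print(em_abundance(reads, n_taxa=2))
```

In this toy case the ambiguous reads are split in proportion to the current abundance estimate, so the estimate for taxon 0 satisfies a = (6 + 2a)/10 at convergence, giving 0.75 rather than the 0.7 that an even split of ambiguous reads would produce.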
Affiliation(s)
- Kristen D. Curry
- Rice University, Department of Computer Science, Houston, TX, USA (corresponding author)
| | - Qi Wang
- Rice University, Department of Systems, Synthetic, and Physical Biology Science, Houston, TX, USA
| | - Michael G. Nute
- Rice University, Department of Computer Science, Houston, TX, USA
| | - Alona Tyshaieva
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Elizabeth Reeves
- Rice University, Department of Computer Science, Houston, TX, USA
| | - Sirena Soriano
- Houston Methodist Research Institute, Center for Neuroregeneration, Houston, TX, USA
| | - Qinglong Wu
- Baylor College of Medicine, Department of Pathology and Immunology, Houston, TX, USA; Texas Children’s Microbiome Center, Department of Pathology, Texas Children’s Hospital, Houston, Texas, USA
| | - Enid Graeber
- Rice University, Department of Systems, Synthetic, and Physical Biology Science, Houston, TX, USA
| | - Patrick Finzer
- Rice University, Department of Systems, Synthetic, and Physical Biology Science, Houston, TX, USA
| | - Werner Mendling
- German Center for Infections in Gynaecology and Obstetrics at Helios University Clinic Wuppertal, Wuppertal, Germany
| | - Tor Savidge
- Baylor College of Medicine, Department of Pathology and Immunology, Houston, TX, USA; Texas Children’s Microbiome Center, Department of Pathology, Texas Children’s Hospital, Houston, Texas, USA
| | - Sonia Villapol
- Houston Methodist Research Institute, Center for Neuroregeneration, Houston, TX, USA
| | - Alexander Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
| | - Todd J. Treangen
- Rice University, Department of Computer Science, Houston, TX, USA (corresponding author)
|
30
|
Liu S, Koslicki D. CMash: fast, multi-resolution estimation of k-mer-based Jaccard and containment indices. Bioinformatics 2022; 38:i28-i35. [PMID: 35758788 PMCID: PMC9235470 DOI: 10.1093/bioinformatics/btac237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022] Open
Abstract
MOTIVATION K-mer-based methods are used ubiquitously in the field of computational biology. However, determining the optimal value of k for a specific application often remains heuristic. Simply reconstructing a new k-mer set with another k-mer size is computationally expensive, especially in metagenomic analysis where datasets are large. Here, we introduce a hashing-based technique that leverages a kind of bottom-m sketch as well as a k-mer ternary search tree (KTST) to obtain k-mer-based similarity estimates for a range of k values. By truncating k-mers stored in a pre-built KTST with a large k=kmax value, we can simultaneously obtain k-mer-based estimates for all k values up to kmax. This truncation approach circumvents the reconstruction of new k-mer sets when changing k values, making analysis more time and space-efficient. RESULTS We derived the theoretical expression of the bias factor due to truncation and showed that the biases are negligible in practice: when using a KTST to estimate the containment index between a RefSeq-based microbial reference database and simulated metagenome data for 10 values of k, the running time was close to 10× faster compared to a classic MinHash approach while using less than one-fifth the space to store the data structure. AVAILABILITY AND IMPLEMENTATION A Python implementation of this method, CMash, is available at https://github.com/dkoslicki/CMash. The reproduction of all experiments presented herein can be accessed via https://github.com/KoslickiLab/CMASH-reproducibles. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
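The truncation idea can be illustrated in a few lines: k-mer sets for smaller k are derived from a single stored set of kmax-mers by keeping only prefixes. The sketch below uses plain Python sets rather than a ternary search tree or a bottom-m sketch, so it shows only the truncation step and the source of its bias; the function names and the toy sequence are assumptions and do not reflect the CMash API.

```python
def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def truncated_sets(seq, k_max, ks):
    """Derive a k-mer set for each k in ks by truncating the k_max-mers.

    Only the k_max-mers are computed from the sequence; every smaller k
    is obtained by taking the length-k prefix of each stored k_max-mer.
    """
    base = kmer_set(seq, k_max)
    return {k: {x[:k] for x in base} for k in ks}

def containment(a, b):
    """Containment index of set a in set b: |a ∩ b| / |a|."""
    return len(a & b) / len(a)

# Hypothetical toy sequence.
seq = "ACGTTGCAAGCT"
derived = truncated_sets(seq, k_max=6, ks=[4, 5, 6])
missing = kmer_set(seq, 4) - derived[4]
print(sorted(missing))
```

The truncated set is always a subset of the true k-mer set and can miss only k-mers that start in the last kmax - k positions of a sequence, which is one intuition for why the truncation bias stays small for long sequences.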
Affiliation(s)
- Shaopeng Liu
- Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA
| | - David Koslicki
- Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA; Department of Computer Science and Engineering, Pennsylvania State University, State College, PA 16801, USA; Department of Biology, Pennsylvania State University, State College, PA 16801, USA
|
31
|
Identification of Fungi in Flaxseed (L. usitatissimum L.) Using the ITS1 and ITS2 Intergenic Regions. MICROBIOLOGY RESEARCH 2022. [DOI: 10.3390/microbiolres13020024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
Flaxseed (Linum usitatissimum L.) displays functional properties and contains α-linolenic acid (omega-3). It also contains soluble and insoluble fiber, lignans, phenolic acids, flavonoids, phytic acid, vitamins, and minerals. However, its microbiota can cause fungal contaminations, drastically reducing its quality. The objective of this work was to identify the fungi present in bulk flaxseed through the internal transcribed spacer (ITS1) intergenic region using a metataxonomics approach. Fungal identification was performed via high-performance sequencing of the ITS1 region using ITS1 (GAACCWGCGGARGGATCA) and ITS2 (GCTGCGTTCTTCATCGATGC) as primers with 300 cycles and single-end sequencing in the MiSeq Sequencing System equipment (Illumina Inc., San Diego, CA, USA). Six genera and eight species of fungi were found in the sample. The genus Aspergillus stood out with three xerophilic species found, A. cibarius, A. appendiculatus, and A. amstelodami, the first being the most abundant. The second most abundant genus was Wallemia, with the species W. muriae. This is one of the fungal taxa with great xerophilic potential, and some strains can produce toxins. Metataxonomics has proved to be a complete, fast, and efficient method to identify different fungi. Furthermore, high-performance genetic sequencing is an important ally in research, helping to develop novel technological advances related to food safety.
|
32
|
Li YJ, Chuang CH, Cheng WC, Chen SH, Chen WL, Lin YJ, Lin CY, Shih YH. A metagenomics study of hexabromocyclododecane degradation with a soil microbial community. JOURNAL OF HAZARDOUS MATERIALS 2022; 430:128465. [PMID: 35739659 DOI: 10.1016/j.jhazmat.2022.128465] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/26/2021] [Revised: 01/27/2022] [Accepted: 01/27/2022] [Indexed: 06/15/2023]
Abstract
Hexabromocyclododecanes (HBCDs) are globally prevalent and persistent organic pollutants (POPs) listed by the Stockholm Convention in 2013. They have been detected in many environmental media from waterbodies to Plantae and even in the human body. Due to their highly bioaccumulative nature, they pose an urgent public health issue. Here, we demonstrate that the indigenous microbial community in the agricultural soil in Taiwan could decompose HBCDs with no additional carbon source incentive. The degradation rate constant reached 0.173 day⁻¹ after the first treatment and 0.104 day⁻¹ after the second exposure. With additional C-sources, the rate constants decreased to 0.054-0.097 day⁻¹. The hydroxylic debromination metabolites and ring cleavage long-chain alkane metabolites were identified to support the potential metabolic pathways utilized by the soil microbial communities. The metagenome established by Nanopore sequencing showed significant compositional alteration in the soil microbial community after the HBCD treatment. After ranking, comparing relative abundances, and performing network analyses, several novel bacterial taxa were identified to contribute to HBCD biotransformation, including Herbaspirillum, Sphingomonas, Brevundimonas, Azospirillum, Caulobacter, and Microvirga, through halogenated / aromatic compound degradation, glutathione-S-transferase, and hydrolase activity. We present a compelling and applicable approach combining metagenomics research, degradation kinetics, and metabolomics strategies, which allowed us to decipher the natural attenuation and remediation mechanisms of HBCDs.
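To put the reported rate constants in perspective, they can be converted to half-lives. The sketch below assumes (pseudo-)first-order decay, which the day⁻¹ units suggest but the abstract does not state explicitly; the function names are hypothetical.

```python
import math

def half_life(rate_constant_per_day):
    """Half-life in days for first-order decay: t_half = ln(2) / k."""
    return math.log(2) / rate_constant_per_day

def fraction_remaining(rate_constant_per_day, days):
    """Fraction of the initial amount left after `days`: exp(-k * t)."""
    return math.exp(-rate_constant_per_day * days)

# Rate constants quoted in the abstract (first and second exposure).
for k in (0.173, 0.104):
    print(k, round(half_life(k), 2), round(fraction_remaining(k, 7), 3))
```

Under this assumption, 0.173 day⁻¹ corresponds to a half-life of about 4.0 days and 0.104 day⁻¹ to about 6.7 days.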
Affiliation(s)
- Yi-Jie Li
- Department of Agricultural Chemistry, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan; Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Chia-Hsien Chuang
- Institute of Information Science, Academia Sinica, No. 128, Sec. 2, Academia Road, Nankang, Taipei 11529, Taiwan
| | - Wen-Chih Cheng
- Institute of Information Science, Academia Sinica, No. 128, Sec. 2, Academia Road, Nankang, Taipei 11529, Taiwan
| | - Shu-Hwa Chen
- TMU Research Center of Cancer Translational Medicine, Taipei Medical University (TMU), No. 250 Wu-Hsing St., Taipei, Taiwan
| | - Wen-Ling Chen
- Department of Agricultural Chemistry, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan; Institute of Food Safety and Health, College of Public Health, National Taiwan University, No. 17, Xuzhou Rd., Taipei 100, Taiwan; Department of Public Health, College of Public Health, National Taiwan University, No. 17, Xuzhou Rd., Taipei 100, Taiwan
| | - Yu-Jie Lin
- Department of Agricultural Chemistry, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan
| | - Chung-Yen Lin
- Institute of Information Science, Academia Sinica, No. 128, Sec. 2, Academia Road, Nankang, Taipei 11529, Taiwan.
| | - Yang-Hsin Shih
- Department of Agricultural Chemistry, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan.
| |
Collapse
|
33
|
Jin S, Wetzel D, Schirmer M. Deciphering mechanisms and implications of bacterial translocation in human health and disease. Curr Opin Microbiol 2022; 67:102147. [PMID: 35461008 DOI: 10.1016/j.mib.2022.102147] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/23/2021] [Revised: 02/28/2022] [Accepted: 03/03/2022] [Indexed: 12/12/2022]
Abstract
Significant increases in potential microbial translocation, especially along the oral-gut axis, have been identified in many immune-related and inflammatory diseases, such as inflammatory bowel disease, colorectal cancer, rheumatoid arthritis, and liver cirrhosis, for which we currently have no cure or long-term treatment options. Recent advances in computational and experimental omics approaches now enable strain tracking, functional profiling, and strain isolation in unprecedented detail, which has the potential to elucidate the causes and consequences of microbial translocation. In this review, we discuss current evidence for the detection of bacterial translocation, examine different translocation axes with a primary focus on the oral-gut axis, and outline currently known translocation mechanisms and how they adversely affect the host in disease. Finally, we conclude with an overview of state-of-the-art computational and experimental tools for strain tracking and highlight the required next steps to elucidate the role of bacterial translocation in human health.
Affiliation(s)
- Shen Jin
- ZIEL - Institute for Food and Health, Technical University of Munich, Gregor-Mendel-Str. 2, 85354 Freising, Germany
- Daniela Wetzel
- ZIEL - Institute for Food and Health, Technical University of Munich, Gregor-Mendel-Str. 2, 85354 Freising, Germany
- Melanie Schirmer
- ZIEL - Institute for Food and Health, Technical University of Munich, Gregor-Mendel-Str. 2, 85354 Freising, Germany.
|
34
|
Ko KKK, Chng KR, Nagarajan N. Metagenomics-enabled microbial surveillance. Nat Microbiol 2022; 7:486-496. [PMID: 35365786 DOI: 10.1038/s41564-022-01089-w] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Received: 08/29/2021] [Accepted: 02/22/2022] [Indexed: 12/13/2022]
Abstract
Lessons learnt from the COVID-19 pandemic include increased awareness of the potential for zoonoses and emerging infectious diseases that can adversely affect human health. Although emergent viruses are currently in the spotlight, we must not forget the ongoing toll of morbidity and mortality owing to antimicrobial resistance in bacterial pathogens and to vector-borne, foodborne and waterborne diseases. Population growth, planetary change, international travel and medical tourism all contribute to the increasing frequency of infectious disease outbreaks. Surveillance is therefore of crucial importance, but the diversity of microbial pathogens, coupled with resource-intensive methods, compromises our ability to scale-up such efforts. Innovative technologies that are both easy to use and able to simultaneously identify diverse microorganisms (viral, bacterial or fungal) with precision are necessary to enable informed public health decisions. Metagenomics-enabled surveillance methods offer the opportunity to improve detection of both known and yet-to-emerge pathogens.
Affiliation(s)
- Karrie K K Ko
- Laboratory of Metagenomic Technologies and Microbial Systems, Genome Institute of Singapore, Singapore, Singapore; Department of Microbiology, Singapore General Hospital, Singapore, Singapore; Department of Molecular Pathology, Singapore General Hospital, Singapore, Singapore; Duke-NUS Medical School, Singapore, Singapore; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Kern Rei Chng
- Laboratory of Metagenomic Technologies and Microbial Systems, Genome Institute of Singapore, Singapore, Singapore; National Centre for Food Science, Singapore Food Agency, Singapore, Singapore
- Niranjan Nagarajan
- Laboratory of Metagenomic Technologies and Microbial Systems, Genome Institute of Singapore, Singapore, Singapore; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
|
35
|
Strain identification and quantitative analysis in microbial communities. J Mol Biol 2022; 434:167582. [DOI: 10.1016/j.jmb.2022.167582] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/01/2022] [Revised: 03/31/2022] [Accepted: 04/03/2022] [Indexed: 12/14/2022]
|
36
|
Adler A, Poirier S, Pagni M, Maillard J, Holliger C. Disentangle genus microdiversity within a complex microbial community by using a multi-distance long-read binning method: example of Candidatus Accumulibacter. Environ Microbiol 2022; 24:2136-2156. [PMID: 35315560 PMCID: PMC9311429 DOI: 10.1111/1462-2920.15947] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/16/2021] [Accepted: 02/19/2022] [Indexed: 11/26/2022]
Abstract
Complete genomes can be recovered from metagenomes by assembling and binning DNA sequences into metagenome assembled genomes (MAGs). Yet, the presence of microdiversity can hamper the assembly and binning processes, possibly yielding chimeric, highly fragmented and incomplete genomes. Here, the metagenomes of four samples of aerobic granular sludge bioreactors containing Candidatus (Ca.) Accumulibacter, a phosphate-accumulating organism of interest for wastewater treatment, were sequenced with both PacBio and Illumina. Different strategies of genome assembly and binning were investigated, including published protocols and a binning procedure adapted to the binning of long contigs (MuLoBiSC). Multiple criteria were considered to select the best strategy for Ca. Accumulibacter, whose multiple strains in every sample represent a challenging microdiversity. In this case, the best strategy relies on long-read only assembly and a custom binning procedure including MuLoBiSC in metaWRAP. Several high-quality Ca. Accumulibacter MAGs, including a novel species, were obtained independently from different samples. Comparative genomic analysis showed that MAGs retrieved in different samples harbour genomic rearrangements in addition to accumulation of point mutations. The microdiversity of Ca. Accumulibacter, likely driven by mobile genetic elements, causes major difficulties in recovering MAGs, but it is also a hallmark of the panmictic lifestyle of these bacteria.
Affiliation(s)
- Aline Adler
- Laboratory for Environmental Biotechnology, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Simon Poirier
- Laboratory for Environmental Biotechnology, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Marco Pagni
- Laboratory for Environmental Biotechnology, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Vital-IT Group, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Julien Maillard
- Laboratory for Environmental Biotechnology, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; IFP Energie nouvelles, 1 et 4 avenue de Bois-Préau, 92852, Rueil-Malmaison Cedex, France
- Christof Holliger
- Laboratory for Environmental Biotechnology, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
|
37
|
Yang Y, Che Y, Liu L, Wang C, Yin X, Deng Y, Yang C, Zhang T. Rapid absolute quantification of pathogens and ARGs by nanopore sequencing. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 809:152190. [PMID: 34890655 DOI: 10.1016/j.scitotenv.2021.152190] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 09/27/2021] [Revised: 11/30/2021] [Accepted: 12/01/2021] [Indexed: 06/13/2023]
Abstract
The compositional nature of relative abundance data in current standard microbiome studies limits the interpretation of microbial dynamics and cross-sample comparisons. Here, we demonstrate the first rapid (1-h sequencing) method coupling Nanopore metagenomic sequencing with cellular spike-in to facilitate the absolute quantification and removal assessment of pathogens and antibiotic resistance genes (ARGs) in wastewater treatment plants (WWTPs). Nanopore sequencing-based quantification results for both a simple mock community and complex real environmental samples showed high consistency with those from the widely used Illumina and culture-based approaches. Using this method, we quantified 46 predominant putative pathogenic species and 361 ARGs in three WWTP sample sets. Although high log removals of dominant pathogens (2.23 logs) and ARGs (1.98 logs) were achieved, complete removal of all pathogens and ARGs was not. Notably, Mycobacterium spp., Clostridium_P perfringens, and Borrelia hermsii exhibited low removal, and 13 ARGs even increased in absolute abundance after the treatment. The proposed approach can provide absolute quantitation to guide wastewater-based epidemiological surveillance and quantitative risk assessment, facilitating the management of microbial hazards.
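The two quantities this abstract relies on, spike-in based absolute abundance and log removal, can be illustrated with a short sketch. This is a generic formulation under stated assumptions, not the authors' pipeline: it assumes that sequencing depth scales linearly with cell number for both the target and the spiked organism, and every input value below is hypothetical.

```python
import math


def absolute_cells(target_bases, target_genome_bp, spike_bases, spike_genome_bp, spike_cells_added):
    """Estimate target cell number from depth relative to a spike-in of known cell count.

    Assumes one genome copy per cell and equal extraction and sequencing
    efficiency for target and spike-in (both are simplifying assumptions).
    """
    target_depth = target_bases / target_genome_bp
    spike_depth = spike_bases / spike_genome_bp
    return target_depth / spike_depth * spike_cells_added


def log_removal(influent_abundance, effluent_abundance):
    """Log10 reduction between influent and effluent absolute abundances."""
    return math.log10(influent_abundance / effluent_abundance)


# Hypothetical inputs: sequenced bases, genome sizes (bp), and spiked cell counts.
influent = absolute_cells(5e7, 5e6, 2e7, 4e6, 1e7)
effluent = absolute_cells(2e5, 5e6, 4e7, 4e6, 1e7)

print(f"influent cells: {influent:.3g}")
print(f"effluent cells: {effluent:.3g}")
print(f"log removal:    {log_removal(influent, effluent):.2f}")
```

Because both samples are anchored to a known spike-in, the ratio between them is meaningful in absolute terms, which relative abundances alone cannot provide.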
Affiliation(s)
- Yu Yang
- Environmental Microbiome Engineering and Biotechnology Laboratory, Centre for Environmental Engineering Research, Department of Civil Engineering, The University of Hong Kong, Hong Kong, China
- You Che
- Environmental Microbiome Engineering and Biotechnology Laboratory, Centre for Environmental Engineering Research, Department of Civil Engineering, The University of Hong Kong, Hong Kong, China
- Lei Liu
- Environmental Microbiome Engineering and Biotechnology Laboratory, Centre for Environmental Engineering Research, Department of Civil Engineering, The University of Hong Kong, Hong Kong, China
- Chunxiao Wang
- Environmental Microbiome Engineering and Biotechnology Laboratory, Centre for Environmental Engineering Research, Department of Civil Engineering, The University of Hong Kong, Hong Kong, China
- Xiaole Yin
- Environmental Microbiome Engineering and Biotechnology Laboratory, Centre for Environmental Engineering Research, Department of Civil Engineering, The University of Hong Kong, Hong Kong, China
- Yu Deng
- Environmental Microbiome Engineering and Biotechnology Laboratory, Centre for Environmental Engineering Research, Department of Civil Engineering, The University of Hong Kong, Hong Kong, China
- Chao Yang
- Key Laboratory of Molecular Microbiology and Technology for Ministry of Education, College of Life Sciences, Nankai University, Tianjin 300071, China
- Tong Zhang
- Environmental Microbiome Engineering and Biotechnology Laboratory, Centre for Environmental Engineering Research, Department of Civil Engineering, The University of Hong Kong, Hong Kong, China.
|
38
|
BugSplit enables genome-resolved metagenomics through highly accurate taxonomic binning of metagenomic assemblies. Commun Biol 2022; 5:151. [PMID: 35194141 PMCID: PMC8864044 DOI: 10.1038/s42003-022-03114-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2021] [Accepted: 02/03/2022] [Indexed: 11/13/2022] Open
Abstract
A large gap remains between sequencing a microbial community and characterizing all of the organisms inside of it. Here we develop a novel method to taxonomically bin metagenomic assemblies through alignment of contigs against a reference database. We show that this workflow, BugSplit, bins metagenome-assembled contigs to species with a 33% absolute improvement in F1-score when compared to alternative tools. We perform nanopore mNGS on patients with COVID-19, and using a reference database predating COVID-19, demonstrate that BugSplit’s taxonomic binning enables sensitive and specific detection of a novel coronavirus not possible with other approaches. When applied to nanopore mNGS data from cases of Klebsiella pneumoniae and Neisseria gonorrhoeae infection, BugSplit’s taxonomic binning accurately separates pathogen sequences from those of the host and microbiota, and unlocks the possibility of sequence typing, in silico serotyping, and antimicrobial resistance prediction of each organism within a sample. BugSplit is available at https://bugseq.com/academic. A new computational method, BugSplit, teases out individual species’ genomes from metagenomic samples. The authors show that BugSplit is able to identify the presence of a novel coronavirus in COVID-19 patients using a database from 2019 predating the pandemic and can separate host and pathogen sequences in other clinical samples with much higher specificity and accuracy than competing tools.
|
39
|
Grealey J, Lannelongue L, Saw WY, Marten J, Méric G, Ruiz-Carmona S, Inouye M. The Carbon Footprint of Bioinformatics. Mol Biol Evol 2022; 39:6526403. [PMID: 35143670 PMCID: PMC8892942 DOI: 10.1093/molbev/msac034] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Indexed: 11/14/2022] Open
Abstract
Bioinformatic research relies on large-scale computational infrastructures which have a nonzero carbon footprint but so far, no study has quantified the environmental costs of bioinformatic tools and commonly run analyses. In this work, we estimate the carbon footprint of bioinformatics (in kilograms of CO2 equivalent units, kgCO2e) using the freely available Green Algorithms calculator (www.green-algorithms.org, last accessed 2022). We assessed 1) bioinformatic approaches in genome-wide association studies (GWAS), RNA sequencing, genome assembly, metagenomics, phylogenetics, and molecular simulations, as well as 2) computation strategies, such as parallelization, CPU (central processing unit) versus GPU (graphics processing unit), cloud versus local computing infrastructure, and geography. In particular, we found that biobank-scale GWAS emitted substantial kgCO2e and simple software upgrades could make it greener, for example, upgrading from BOLT-LMM v1 to v2.3 reduced carbon footprint by 73%. Moreover, switching from the average data center to a more efficient one can reduce carbon footprint by approximately 34%. Memory over-allocation can also be a substantial contributor to an algorithm’s greenhouse gas emissions. The use of faster processors or greater parallelization reduces running time but can lead to greater carbon footprint. Finally, we provide guidance on how researchers can reduce power consumption and minimize kgCO2e. Overall, this work elucidates the carbon footprint of common analyses in bioinformatics and provides solutions which empower a move toward greener research.
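The kind of estimate described in this abstract can be sketched as energy drawn by cores and memory, scaled by the data centre's power usage effectiveness (PUE) and the carbon intensity of the local grid. The form below follows the general structure of such calculators, but every numeric value (core power, memory power per GB, PUE figures, carbon intensity) is an illustrative assumption rather than a figure from the paper, and the function names are ours.

```python
def energy_kwh(runtime_h, n_cores, core_power_w, core_usage,
               memory_gb, memory_power_w_per_gb, pue):
    """Energy used by a job: (core draw + memory draw) x runtime x data-centre overhead."""
    power_w = n_cores * core_power_w * core_usage + memory_gb * memory_power_w_per_gb
    return runtime_h * power_w * pue / 1000.0


def footprint_kgco2e(energy, carbon_intensity_kg_per_kwh):
    """Convert energy (kWh) to kg CO2 equivalent using grid carbon intensity."""
    return energy * carbon_intensity_kg_per_kwh


# Hypothetical job: 10 h on 16 fully used cores with 64 GB of allocated memory.
job = dict(runtime_h=10, n_cores=16, core_power_w=12.0, core_usage=1.0,
           memory_gb=64, memory_power_w_per_gb=0.3725)
CARBON_INTENSITY = 0.475  # kg CO2e per kWh, assumed grid average

average_dc = footprint_kgco2e(energy_kwh(**job, pue=1.67), CARBON_INTENSITY)
efficient_dc = footprint_kgco2e(energy_kwh(**job, pue=1.10), CARBON_INTENSITY)
reduction = 1 - efficient_dc / average_dc

print(f"average data centre:   {average_dc:.2f} kgCO2e")
print(f"efficient data centre: {efficient_dc:.2f} kgCO2e")
print(f"reduction:             {reduction:.0%}")
```

In this form the memory term depends on the memory allocated rather than the memory used, which is why over-allocation shows up directly in the estimate, and the PUE factor multiplies the whole job, so a more efficient facility lowers the footprint proportionally.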
Affiliation(s)
- Jason Grealey
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Department of Mathematics and Statistics, La Trobe University, Melbourne, Australia
- Loïc Lannelongue
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Woei-Yuh Saw
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Jonathan Marten
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Guillaume Méric
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Australia
- Sergio Ruiz-Carmona
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK; British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK; The Alan Turing Institute, London, UK
|
40
|
Liao H, Cai D, Sun Y. VirStrain: a strain identification tool for RNA viruses. Genome Biol 2022; 23:38. [PMID: 35101081 PMCID: PMC8801933 DOI: 10.1186/s13059-022-02609-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/21/2021] [Accepted: 01/12/2022] [Indexed: 12/18/2022] Open
Abstract
Viruses change constantly during replication, leading to high intra-species diversity. Although many changes are neutral or deleterious, some can confer on the virus different biological properties such as better adaptability. In addition, viral genotypes often have associated metadata, such as host residence, which can help with inferring viral transmission during pandemics. Thus, subspecies analysis can provide important insights into virus characterization. Here, we present VirStrain, a tool taking short reads as input with viral strain composition as output. We rigorously test VirStrain on multiple simulated and real virus sequencing datasets. VirStrain outperforms the state-of-the-art tools in both sensitivity and accuracy.
Affiliation(s)
- Herui Liao
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, China
- Dehan Cai
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, China
- Yanni Sun
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, China.
|
41
|
Yang S, Johnson MA, Hansen MA, Bush E, Li S, Vinatzer BA. Metagenomic sequencing for detection and identification of the boxwood blight pathogen Calonectria pseudonaviculata. Sci Rep 2022; 12:1399. [PMID: 35082361 PMCID: PMC8791934 DOI: 10.1038/s41598-022-05381-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/27/2021] [Accepted: 01/10/2022] [Indexed: 11/16/2022] Open
Abstract
Pathogen detection and identification are key elements in outbreak control of human, animal, and plant diseases. Since many fungal plant pathogens cause similar symptoms, are difficult to distinguish morphologically, and grow slowly in culture, culture-independent, sequence-based diagnostic methods are desirable. Whole genome metagenomic sequencing has emerged as a promising technique because it can potentially detect any pathogen without culturing and without the need for pathogen-specific probes. However, efficient DNA extraction protocols, computational tools, and sequence databases are required. Here we applied metagenomic sequencing with the Oxford Nanopore Technologies MinION to the detection of the fungus Calonectria pseudonaviculata, the causal agent of boxwood (Buxus spp.) blight disease. Two DNA extraction protocols, several DNA purification kits, and various computational tools were tested. All DNA extraction methods and purification kits provided sufficient quantity and quality of DNA. Several bioinformatics tools for taxonomic identification were found suitable to assign sequencing reads to the pathogen with an extremely low false positive rate. Over 9% of total reads were identified as C. pseudonaviculata in a severely diseased sample and identification at strain-level resolution was approached as the number of sequencing reads was increased. We discuss how metagenomic sequencing could be implemented in routine plant disease diagnostics.
Affiliation(s)
- Shu Yang
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, USA
- Marcela A Johnson
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, USA; Graduate Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA, USA
- Mary Ann Hansen
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, USA
- Elizabeth Bush
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, USA
- Song Li
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, USA
- Boris A Vinatzer
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, USA.
|
42
|
Martin S, Heavens D, Lan Y, Horsfield S, Clark MD, Leggett RM. Nanopore adaptive sampling: a tool for enrichment of low abundance species in metagenomic samples. Genome Biol 2022; 23:11. [PMID: 35067223 PMCID: PMC8785595 DOI: 10.1186/s13059-021-02582-x] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Received: 05/08/2021] [Accepted: 12/20/2021] [Indexed: 12/13/2022] Open
Abstract
Adaptive sampling is a method of software-controlled enrichment unique to nanopore sequencing platforms. To test its potential for enrichment of rarer species within metagenomic samples, we create a synthetic mock community and construct sequencing libraries with a range of mean read lengths. Enrichment is up to 13.87-fold for the least abundant species in the longest read length library; factoring in reduced yields from rejecting molecules, the calculated efficiency raises this to 4.93-fold. Finally, we introduce a mathematical model of enrichment based on molecule length and relative abundance, whose predictions correlate strongly with mock and complex real-world microbial communities.
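The two enrichment figures in this abstract can be related with a small sketch. It assumes that the first figure compares the target's share of sequenced bases with and without adaptive sampling, and that the yield-adjusted figure is that ratio multiplied by the ratio of total yields; this is our reading of the abstract, not the paper's published model, and the function names are hypothetical.

```python
def enrichment_by_composition(target_fraction_adaptive, target_fraction_control):
    """Fold change in the target's share of sequenced bases."""
    return target_fraction_adaptive / target_fraction_control


def enrichment_by_yield(composition_enrichment, total_yield_adaptive, total_yield_control):
    """Fold change in target bases obtained, accounting for lost throughput."""
    return composition_enrichment * (total_yield_adaptive / total_yield_control)


def implied_yield_ratio(composition_enrichment, yield_enrichment):
    """Total-yield ratio (adaptive / control) implied by the two enrichment values."""
    return yield_enrichment / composition_enrichment


# Values quoted in the abstract for the least abundant species.
ratio = implied_yield_ratio(13.87, 4.93)
print(f"implied total-yield ratio (adaptive / control): {ratio:.2f}")
```

Under that assumption, rejecting off-target molecules costs a substantial share of total throughput, so the gain in usable target bases is smaller than the change in composition alone would suggest.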
Affiliation(s)
- Samuel Martin
- Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
- Darren Heavens
- Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
- Yuxuan Lan
- Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
|
43
|
Hoang MTV, Irinyi L, Hu Y, Schwessinger B, Meyer W. Long-Reads-Based Metagenomics in Clinical Diagnosis With a Special Focus on Fungal Infections. Front Microbiol 2022; 12:708550. [PMID: 35069461 PMCID: PMC8770865 DOI: 10.3389/fmicb.2021.708550] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/12/2021] [Accepted: 12/03/2021] [Indexed: 12/12/2022] Open
Abstract
Identification of the causative infectious agent is essential in the management of infectious diseases, with the ideal diagnostic method being rapid, accurate, and informative, while remaining cost-effective. Traditional diagnostic techniques rely on culturing and cell propagation to isolate and identify the causative pathogen. These techniques are limited by the ability and the time required to grow or propagate an agent in vitro, and by the fact that identification based on morphological traits is non-specific, insensitive, and reliant on technical expertise. The evolution of next-generation sequencing has revolutionized genomic studies by generating more data at a lower cost. Next-generation technologies are divided into short- and long-read sequencing, depending on the length of reads generated during sequencing runs. Long-read sequencing, also called third-generation sequencing, emerged commercially through the instruments released by Pacific Biosciences and Oxford Nanopore Technologies. Although the two platforms rely on different sequencing chemistries, with the former being more accurate, both can generate ultra-long sequence reads. Long-read sequencing is capable of entirely spanning previously established genomic identification regions or potentially small whole genomes, drastically improving the accuracy of the identification of pathogens directly from clinical samples. Long-read sequencing may also provide additional important clinical information, such as antimicrobial resistance profiles and epidemiological data, from a single sequencing run. While initial applications of long-read sequencing in clinical diagnosis showed that it could be a promising diagnostic technique, they have also highlighted the need for further optimization. In this review, we show the potential long-read sequencing has in clinical diagnosis of fungal infections and discuss the pros and cons of its implementation.
Affiliation(s)
- Minh Thuy Vi Hoang
- Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, Australia
- Westmead Institute for Medical Research, Westmead, NSW, Australia
- Laszlo Irinyi
- Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, Australia
- Westmead Institute for Medical Research, Westmead, NSW, Australia
- Sydney Infectious Disease Institute, The University of Sydney, Sydney, NSW, Australia
- Yiheng Hu
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- Wieland Meyer
- Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, Australia
- Westmead Institute for Medical Research, Westmead, NSW, Australia
- Sydney Infectious Disease Institute, The University of Sydney, Sydney, NSW, Australia
- Westmead Hospital (Research and Education Network), Westmead, NSW, Australia
|
44
|
Retrospective detection of asymptomatic monkeypox virus infections among male sexual health clinic attendees in Belgium. Nat Med 2022; 28:2288-2292. [PMID: 35961373 PMCID: PMC9671802 DOI: 10.1038/s41591-022-02004-w] [Citation(s) in RCA: 125] [Impact Index Per Article: 62.5] [Received: 07/08/2022] [Accepted: 08/10/2022] [Indexed: 01/14/2023]
Abstract
The magnitude of the 2022 multi-country monkeypox virus (MPXV) outbreak has surpassed any preceding outbreak. It is unclear whether asymptomatic or otherwise undiagnosed infections are fuelling this epidemic. In this study, we aimed to assess whether undiagnosed infections occurred among men attending a Belgian sexual health clinic in May 2022. We retrospectively screened 224 samples collected for gonorrhea and chlamydia testing using an MPXV PCR assay and identified MPXV-DNA-positive samples from four men. At the time of sampling, one man had a painful rash, and three men had reported no symptoms. Upon clinical examination 21-37 days later, these three men were free of clinical signs, and they reported not having experienced any symptoms. Serology confirmed MPXV exposure in all three men, and MPXV was cultured from two cases. These findings show that certain cases of monkeypox remain undiagnosed and suggest that testing and quarantining of individuals reporting symptoms may not suffice to contain the outbreak.
|
45
|
Curry KD, Nute MG, Treangen TJ. It takes guts to learn: machine learning techniques for disease detection from the gut microbiome. Emerg Top Life Sci 2021; 5:815-827. [PMID: 34779841 PMCID: PMC8786294 DOI: 10.1042/etls20210213] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/17/2021] [Revised: 09/29/2021] [Accepted: 10/06/2021] [Indexed: 02/01/2023]
Abstract
Associations between the human gut microbiome and expression of host illness have been noted in a variety of conditions ranging from gastrointestinal dysfunctions to neurological deficits. Machine learning (ML) methods have generated promising results for disease prediction from gut metagenomic information for diseases including liver cirrhosis and irritable bowel disease, but have lacked efficacy when predicting other illnesses. Here, we review current ML methods designed for disease classification from microbiome data. We highlight the computational challenges these methods have effectively overcome and discuss the biological components that have been overlooked to offer perspectives on future work in this area.
Affiliation(s)
- Kristen D. Curry
- Department of Computer Science, Rice University, Houston, TX 77005, USA
- Michael G. Nute
- Department of Computer Science, Rice University, Houston, TX 77005, USA
- Todd J. Treangen
- Department of Computer Science, Rice University, Houston, TX 77005, USA
|
46
|
Latorre-Pérez A, Gimeno-Valero H, Tanner K, Pascual J, Vilanova C, Porcar M. A Round Trip to the Desert: In situ Nanopore Sequencing Informs Targeted Bioprospecting. Front Microbiol 2021; 12:768240. [PMID: 34966365 PMCID: PMC8710813 DOI: 10.3389/fmicb.2021.768240] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/31/2021] [Accepted: 11/15/2021] [Indexed: 12/26/2022] Open
Abstract
Bioprospecting expeditions are often performed in remote locations, in order to access previously unexplored samples. Nevertheless, the actual potential of those samples is only assessed once scientists are back in the laboratory, where a time-consuming screening must take place. This work evaluates the suitability of using Nanopore sequencing during a journey to the Tabernas Desert (Spain) for forecasting the potential of specific samples in terms of bacterial diversity and prevalence of radiation- and desiccation-resistant taxa, which were the target of the bioprospecting activities. Samples collected during the first day were analyzed through 16S rRNA gene sequencing using a mobile laboratory. Results enabled the identification of locations showing the greatest and the least potential, and a second, informed sampling was performed focusing on those sites. After finishing the expedition, a culture collection of 166 strains belonging to 50 different genera was established. Overall, Nanopore and culturing data correlated well, since samples holding a greater potential at the microbiome level also yielded a more interesting set of microbial isolates, whereas samples showing less biodiversity resulted in a reduced (and redundant) set of culturable bacteria. Thus, we anticipate that portable sequencers hold potential as key, easy-to-use tools for in situ-informed bioprospecting strategies.
Affiliation(s)
- Manuel Porcar
  - Darwin Bioprospecting Excellence S.L., Paterna, Spain
  - Institute for Integrative Systems Biology I2SysBio (University of València-CSIC), Paterna, Spain
47
Siekaniec G, Roux E, Lemane T, Guédon E, Nicolas J. Identification of isolated or mixed strains from long reads: a challenge met on Streptococcus thermophilus using a MinION sequencer. Microb Genom 2021; 7. [PMID: 34812718 PMCID: PMC8743539 DOI: 10.1099/mgen.0.000654] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023]
Abstract
This study aimed to provide efficient recognition of bacterial strains on personal computers from MinION (Nanopore) long read data. Thanks to the fall in sequencing costs, the identification of bacteria can now proceed by whole genome sequencing. MinION is a fast, but highly error-prone sequencing device and it is a challenge to successfully identify the strain content of unknown simple or complex microbial samples. It is heavily constrained by memory management and fast access to the read and genome fragments. Our strategy involves three steps: indexing of known genomic sequences for a given or several bacterial species; a request process to assign a read to a strain by matching it to the closest reference genomes; and a final step looking for a minimum set of strains that best explains the observed reads. We have applied our method, called ORI, on 77 strains of Streptococcus thermophilus. We worked on several genomic distances and obtained a detailed classification of the strains, together with a criterion that allows merging of what we termed 'sibling' strains, only separated by a few mutations. Overall, isolated strains can be safely recognized from MinION data. For mixtures of several non-sibling strains, results depend on strain abundance.
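The third step described above, finding a minimum set of strains that explains the observed reads, can be illustrated with a small sketch. This is not the ORI implementation: it assumes read-to-strain matches are already available as a dictionary, and it swaps in a greedy set-cover heuristic, which only approximates a true minimum. The function name `greedy_min_strains` and the input layout are hypothetical.

```python
from typing import Dict, List, Set


def greedy_min_strains(read_hits: Dict[str, Set[str]]) -> List[str]:
    """Pick a small set of strains that together explain every matched read.

    read_hits maps a read ID to the set of strain names it matched
    (hypothetical input format). Greedy set cover is an approximation
    and is not guaranteed to return the smallest possible set.
    """
    # Invert the mapping: strain -> reads it could explain.
    strain_reads: Dict[str, Set[str]] = {}
    for read_id, strains in read_hits.items():
        for strain in strains:
            strain_reads.setdefault(strain, set()).add(read_id)

    # Reads with no match at all cannot be explained and are ignored.
    unexplained = {read_id for read_id, strains in read_hits.items() if strains}
    chosen: List[str] = []
    while unexplained:
        # Take the strain covering the most still-unexplained reads;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(strain_reads),
                   key=lambda s: len(strain_reads[s] & unexplained))
        chosen.append(best)
        unexplained -= strain_reads[best]
    return chosen


if __name__ == "__main__":
    hits = {
        "r1": {"A"},
        "r2": {"A", "B"},
        "r3": {"B", "C"},
        "r4": {"C"},
        "r5": set(),  # unmatched read
    }
    print(greedy_min_strains(hits))
```

In this toy input, strain B is never needed because every read it matches is also explained by A or C, which mirrors the idea of reporting only the strains required to account for the data.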
Affiliation(s)
- Grégoire Siekaniec
  - Univ Rennes, INRIA, Campus de Beaulieu 35042 Rennes cedex, Rennes, France
  - INRAE, Institut Agro, STLO, F-35000, Rennes, France
- Emeline Roux
  - Univ Rennes, INRIA, Campus de Beaulieu 35042 Rennes cedex, Rennes, France
  - CALBINOTOX (Composés ALimentaire BIofonctionnalités et risques NeuTOXiques) EA7488 Université de Lorraine, France
- Téo Lemane
  - Univ Rennes, INRIA, Campus de Beaulieu 35042 Rennes cedex, Rennes, France
- Eric Guédon
  - INRAE, Institut Agro, STLO, F-35000, Rennes, France
  - *Correspondence: Eric Guédon,
- Jacques Nicolas
  - Univ Rennes, INRIA, Campus de Beaulieu 35042 Rennes cedex, Rennes, France
  - *Correspondence: Jacques Nicolas,
48
Buytaers FE, Saltykova A, Denayer S, Verhaegen B, Vanneste K, Roosens NHC, Piérard D, Marchal K, De Keersmaecker SCJ. Towards Real-Time and Affordable Strain-Level Metagenomics-Based Foodborne Outbreak Investigations Using Oxford Nanopore Sequencing Technologies. Front Microbiol 2021; 12:738284. [PMID: 34803953 PMCID: PMC8602914 DOI: 10.3389/fmicb.2021.738284] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/08/2021] [Accepted: 10/13/2021] [Indexed: 11/18/2022]
Abstract
The current routine laboratory practices to investigate food samples in case of foodborne outbreaks still rely on attempts to isolate the pathogen in order to characterize it. We present in this study a proof of concept using Shiga toxin-producing Escherichia coli spiked food samples for a strain-level metagenomics foodborne outbreak investigation method using the MinION and Flongle flow cells from Oxford Nanopore Technologies, and we compared this to Illumina short-read-based metagenomics. After 12 h of MinION sequencing, strain-level characterization could be achieved, linking the food containing a pathogen to the related human isolate of the affected patient, by means of a single-nucleotide polymorphism (SNP)-based phylogeny. The inferred strain harbored the same virulence genes as the spiked isolate and could be serotyped. This was achieved by applying a bioinformatics method on the long reads using reference-based classification. The same result could be obtained after 24-h sequencing on the more recent lower output Flongle flow cell, on an extract treated with eukaryotic host DNA removal. Moreover, an alternative approach based on in silico DNA walking allowed to obtain rapid confirmation of the presence of a putative pathogen in the food sample. The DNA fragment harboring characteristic virulence genes could be matched to the E. coli genus after sequencing only 1 h with the MinION, 1 h with the Flongle if using a host DNA removal extraction, or 5 h with the Flongle with a classical DNA extraction. This paves the way towards the use of metagenomics as a rapid, simple, one-step method for foodborne pathogen detection and for fast outbreak investigation that can be implemented in routine laboratories on samples prepared with the current standard practices.
Affiliation(s)
- Florence E. Buytaers
  - Transversal Activities in Applied Genomics, Sciensano, Brussels, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Assia Saltykova
  - Transversal Activities in Applied Genomics, Sciensano, Brussels, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Sarah Denayer
  - National Reference Laboratory for Shiga Toxin-Producing Escherichia coli (NRL STEC), Foodborne Pathogens, Sciensano, Brussels, Belgium
- Bavo Verhaegen
  - National Reference Laboratory for Shiga Toxin-Producing Escherichia coli (NRL STEC), Foodborne Pathogens, Sciensano, Brussels, Belgium
- Kevin Vanneste
  - Transversal Activities in Applied Genomics, Sciensano, Brussels, Belgium
- Denis Piérard
  - National Reference Center for Shiga Toxin-Producing Escherichia coli (NRC STEC), Department of Microbiology and Infection Control, Universitair Ziekenhuis Brussel (UZ Brussel), Vrije Universiteit Brussel (VUB), Brussels, Belgium
- Kathleen Marchal
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Department of Information Technology, IDlab, IMEC, Ghent University, Ghent, Belgium
49
Croville G, Corrand L, Lucas MN, Le Loc'h G, Donnadieu C, Lopez-Roques C, Manno M, Blondel V, Delverdier M, Guérin JL. Detection and Typing of a Fowl Adenovirus Type 1 Agent of Pancreatitis in Guinea Fowl. Avian Dis 2021; 65:429-437. [PMID: 34699140 DOI: 10.1637/0005-2086-65.3.429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2020] [Accepted: 04/29/2020] [Indexed: 11/05/2022]
Abstract
Adenoviral pancreatitis has been amply described for decades in guinea fowl. Although its pathologic picture has been characterized fairly well, its etiology still remains only partially clarified. Based on several outbreaks diagnosed on commercial guinea flocks raised in France since 2017, we performed direct whole-genome sequencing from pancreatic lesional tissue by using the Oxford Nanopore Technologies (ONT) sequencing method. We generated 4781 viral reads and assembled a whole genome of 43,509 bp, clustering within fowl adenovirus type 1 (FAdV-1). A phylogenetic analysis based on a partial sequence of the hexon and short fiber genes on viruses collected in France showed 98.7% and 99.8% nucleotide identity, respectively. Altogether, these results confirm that an FAdV-1 closely related to chicken and other avian strains is the agent of pancreatitis in guinea fowl. This study illustrates the potential of ONT sequencing method to achieve rapid whole-genome sequencing directly from pathologic material.
Affiliation(s)
- Léni Corrand
  - Université de Toulouse, ENVT, INRAe, UMR IHAP, 31076 Toulouse, France
  - ABIOPOLE, 64410 Arzacq-Arraziguet, France
- Maxime Manno
  - GeT-PlaGe, Genotoul, INRAE, 31326, Castanet-Tolosan, France
- Jean-Luc Guérin
  - Université de Toulouse, ENVT, INRAe, UMR IHAP, 31076 Toulouse, France
50
Comprehensive Wet-Bench and Bioinformatics Workflow for Complex Microbiota Using Oxford Nanopore Technologies. mSystems 2021; 6:e0075021. [PMID: 34427527 PMCID: PMC8407471 DOI: 10.1128/msystems.00750-21] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/26/2022]
Abstract
The advent of high-throughput sequencing techniques has recently provided an astonishing insight into the composition and function of the human microbiome. Next-generation sequencing (NGS) has become the gold standard for advanced microbiome analysis; however, 3rd generation real-time sequencing, such as Oxford Nanopore Technologies (ONT), enables rapid sequencing from several kilobases to >2 Mb with high resolution. Despite the wide availability and the enormous potential for clinical and translational applications, ONT is poorly standardized in terms of sampling and storage conditions, DNA extraction, library creation, and bioinformatic classification. Here, we present a comprehensive analysis pipeline with sampling, storage, DNA extraction, library preparation, and bioinformatic evaluation for complex microbiomes sequenced with ONT. Our findings from buccal and rectal swabs and DNA extraction experiments indicate that methods that were approved for NGS microbiome analysis cannot be simply adapted to ONT. We recommend using swabs and DNA extractions protocols with extended washing steps. Both 16S rRNA and metagenomic sequencing achieved reliable and reproducible results. Our benchmarking experiments reveal thresholds for analysis parameters that achieved excellent precision, recall, and area under the precision recall values and is superior to existing classifiers (Kraken2, Kaiju, and MetaMaps). Hence, our workflow provides an experimental and bioinformatic pipeline to perform a highly accurate analysis of complex microbial structures from buccal and rectal swabs.
IMPORTANCE: Advanced microbiome analysis relies on sequencing of short DNA fragments from microorganisms like bacteria, fungi, and viruses. More recently, long fragment DNA sequencing of 3rd generation sequencing has gained increasing importance and can be rapidly conducted within a few hours due to its potential real-time sequencing. However, the analysis and correct identification of the microbiome relies on a multitude of factors, such as the method of sampling, DNA extraction, sequencing, and bioinformatic analysis. Scientists have used different protocols in the past that do not allow us to compare results across different studies and research fields. Here, we provide a comprehensive workflow from DNA extraction, sequencing, and bioinformatic workflow that allows rapid and accurate analysis of human buccal and rectal swabs with reproducible protocols. This workflow can be readily applied by many scientists from various research fields that aim to use long-fragment microbiome sequencing.