1. Compendium of Metabolomic and Genomic Datasets for Cyanobacteria: Mined the Gap. Water Res 2024; 256:121492. [PMID: 38593604 DOI: 10.1016/j.watres.2024.121492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 03/09/2024] [Accepted: 03/18/2024] [Indexed: 04/11/2024]
Abstract
Cyanobacterial blooms, which produce toxic secondary metabolites, are becoming increasingly common as global temperatures rise. Cyanobacteria are the world's most abundant photosynthetic organisms and owe much of their success to a range of highly diverse and complex natural products with a broad spectrum of bioactivities. Over 2600 compounds have been isolated from cyanobacteria thus far, and their characterisation has revealed unusual and useful chemistries and motifs, including alkynes, halogens, and non-canonical amino acids. Genome sequencing of cyanobacteria lags behind natural product isolation: only 19% of cyanobacterial natural products are associated with a sequenced organism. Recent advances in (meta)genomics promise to narrow this gap and have also facilitated the rise of combined genomic and metabolomic approaches, heralding a new era of discovery of novel compounds. Analyses of the datasets described in this manuscript reveal the asynchrony of current genomic and metabolomic data and highlight the chemical diversity of cyanobacterial natural products. Alongside this manuscript, we make these manually curated datasets freely available to the public to facilitate further research in this important area.
2. Does regulation hold the key to optimizing lipopeptide production in Pseudomonas for biotechnology? Front Bioeng Biotechnol 2024; 12:1363183. [PMID: 38476965 PMCID: PMC10928948 DOI: 10.3389/fbioe.2024.1363183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Accepted: 02/12/2024] [Indexed: 03/14/2024]
Abstract
Lipopeptides (LPs) produced by Pseudomonas spp. are specialized metabolites with diverse structures and functions, including powerful biosurfactant and antimicrobial properties. Despite their enormous potential in environmental and industrial biotechnology, low yield and high production cost limit their practical use. While genome mining and functional genomics have identified a multitude of LP biosynthetic gene clusters, the regulatory mechanisms underlying their biosynthesis remain poorly understood. We propose that regulation holds the key to unlocking LP production in Pseudomonas for biotechnology. In this review, we summarize the structure and function of Pseudomonas-derived LPs and describe the molecular basis for their biosynthesis and regulation. We examine the global and specific regulator-driven mechanisms controlling LP synthesis including the influence of environmental signals. Understanding LP regulation is key to modulating production of these valuable compounds, both quantitatively and qualitatively, for industrial and environmental biotechnology.
3. Exploring Pan-Genomes: An Overview of Resources and Tools for Unraveling Structure, Function, and Evolution of Crop Genes and Genomes. Biomolecules 2023; 13:1403. [PMID: 37759803 PMCID: PMC10527062 DOI: 10.3390/biom13091403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 08/29/2023] [Accepted: 09/12/2023] [Indexed: 09/29/2023]
Abstract
The availability of multiple sequenced genomes from a single species has made it possible to perform intra- and inter-specific genomic comparisons at higher resolution and to build clade-specific pan-genomes of several crops. Crop pan-genomes constructed from various cultivars, accessions, landraces, and wild ancestral species represent a compendium of genes and structural variations, and allow researchers to search for novel genes and alleles that were inadvertently lost during crop domestication or extensive plant breeding. Fortunately, many valuable genes and alleles associated with desirable traits such as disease resistance, abiotic stress tolerance, plant architecture, and nutritional quality persist in landraces, ancestral species, and crop wild relatives. These genes can be reintroduced into high-yielding modern varieties through classical plant breeding, genomic selection, and transgenic or gene-editing approaches. Pan-genomics thus represents a great leap in plant research and offers new avenues for targeted breeding to mitigate the impact of global climate change. Here, we summarize the tools used for pan-genome assembly and annotation, as well as the web portals hosting plant pan-genomes. Furthermore, we highlight a few discoveries made in crops using the pan-genomic approach and the future potential of this emerging field of study.
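The pan-genome idea above can be made concrete with a gene presence/absence table. The following is a minimal sketch, assuming gene families have already been clustered across genomes; the core/dispensable/private labels, the function `partition_pangenome`, and the genome and gene names are illustrative only and are not taken from any tool covered in the review.

```python
from collections import Counter


def partition_pangenome(presence: dict[str, set[str]], core_frac: float = 1.0) -> dict[str, set[str]]:
    """Split gene families into core, dispensable and private sets.

    `presence` maps each genome (e.g. a cultivar, landrace or wild relative)
    to the set of gene families detected in it. A family is "core" if it is
    found in at least `core_frac` of genomes, "private" if it occurs in exactly
    one genome, and "dispensable" otherwise.
    """
    n_genomes = len(presence)
    # Count how many genomes each gene family occurs in.
    counts = Counter(family for families in presence.values() for family in families)
    core = {f for f, c in counts.items() if c >= core_frac * n_genomes}
    private = {f for f, c in counts.items() if c == 1 and f not in core}
    dispensable = set(counts) - core - private
    return {"core": core, "dispensable": dispensable, "private": private}


# Hypothetical presence/absence data for three genomes of one crop species.
genomes = {
    "modern_cultivar": {"A", "B", "C"},
    "landrace": {"A", "B", "C", "D"},
    "wild_relative": {"A", "B", "D", "E"},
}
parts = partition_pangenome(genomes)
# Families found elsewhere in the pan-genome but absent from the cultivar,
# i.e. candidates for "lost" genes that breeding could reintroduce.
missing_from_cultivar = set().union(*genomes.values()) - genomes["modern_cultivar"]
print(parts)
print(missing_from_cultivar)
```

Here "D" and "E" are the families a breeder might look to recover from the landrace and wild relative. Real pipelines set `core_frac` below 1.0 to tolerate annotation or assembly gaps.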
4. Composition of the alfalfa pathobiome in commercial fields. Front Microbiol 2023; 14:1225781. [PMID: 37692394 PMCID: PMC10491455 DOI: 10.3389/fmicb.2023.1225781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Accepted: 07/31/2023] [Indexed: 09/12/2023]
Abstract
Through recent advances in high-throughput sequencing technologies, the "one microbe, one disease" dogma is gradually being replaced by the principle of the "pathobiome". The pathobiome is the comprehensive biotic environment that includes not only the diverse community of disease-causing organisms within the plant but also their mutual interactions and resulting effect on plant health. To date, the pathobiome concept has not been applied to plant health and sustainable production of alfalfa (Medicago sativa L.), the most extensively cultivated forage legume in the world. Here, we approached this subject by characterizing the biodiversity of the alfalfa pathobiome using high-throughput sequencing technology. Our metagenomic study revealed a remarkable abundance of diverse pathogenic communities associated with alfalfa in the natural ecosystem. Profiling the alfalfa pathobiome is a starting point for assessing known stress challenges and identifying new and emerging ones in the context of plant disease management. It also allows us to address the complexity of microbial interactions within the plant host and their impact on the development and evolution of pathogenesis.
5. Long-Read Metagenomics of Marine Microbes Reveals Diversely Expressed Secondary Metabolites. Microbiol Spectr 2023; 11:e0150123. [PMID: 37409950 PMCID: PMC10434046 DOI: 10.1128/spectrum.01501-23] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/10/2023] [Accepted: 06/14/2023] [Indexed: 07/07/2023]
Abstract
Microbial secondary metabolites play crucial roles in microbial competition, communication, resource acquisition, antibiotic production, and a variety of other biotechnological processes. Retrieving full-length BGC (biosynthetic gene cluster) sequences from uncultivated bacteria is difficult because of the technical constraints of short-read sequencing, which has made it impossible to determine BGC diversity. Using long-read sequencing and genome mining, this study recovered 339 mostly full-length BGCs, illuminating the wide range of BGCs from uncultivated lineages in seawater from Aoshan Bay, Yellow Sea, China. Many highly diverse BGCs were found in bacterial phyla such as Proteobacteria, Bacteroidota, Acidobacteriota, and Verrucomicrobiota, as well as in the previously uncultured archaeal phylum "Candidatus Thermoplasmatota." Metatranscriptomic data showed that 30.1% of secondary metabolic genes were expressed and also revealed the expression patterns of BGC core biosynthetic genes and tailoring enzymes. Taken together, our results demonstrate that long-read metagenomic sequencing combined with metatranscriptomic analysis provides a direct view into the functional expression of BGCs in environmental processes. IMPORTANCE Genome mining of metagenomic data has become the preferred method for bioprospecting novel compounds by cataloguing secondary metabolite potential. However, accurate BGC detection requires unfragmented genomic assemblies, which were technically difficult to obtain from metagenomes until the recent arrival of long-read technologies. We used high-quality metagenome-assembled genomes generated from long-read data to determine the biosynthetic potential of microbes in the surface water of the Yellow Sea. We recovered 339 highly diverse and mostly full-length BGCs from largely uncultured and underexplored bacterial and archaeal phyla.
We also present long-read metagenomic sequencing combined with metatranscriptomic analysis as a potential method for accessing the largely underutilized genetic reservoir of specialized metabolite gene clusters in the uncultured majority of microbes. Combining long-read metagenomic and metatranscriptomic analyses is significant because it can more accurately assess the mechanisms of microbial adaptation to the environment through BGC expression inferred from metatranscriptomic data.
6. Advanced Methods for Natural Products Discovery: Bioactivity Screening, Dereplication, Metabolomics Profiling, Genomic Sequencing, Databases and Informatic Tools, and Structure Elucidation. Mar Drugs 2023; 21:md21050308. [PMID: 37233502 DOI: 10.3390/md21050308] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/23/2023] [Revised: 05/11/2023] [Accepted: 05/12/2023] [Indexed: 05/27/2023]
Abstract
Natural products (NPs) are essential for the discovery of novel drugs and of products for numerous biotechnological applications. The NP discovery process is expensive and time-consuming; its major hurdles are dereplication (the early identification of known compounds) and structure elucidation, particularly determining the absolute configuration of metabolites with stereogenic centers. This review focuses comprehensively on recent technological and instrumental advances, highlighting methods that alleviate these obstacles and pave the way for faster NP discovery for biotechnological applications. We emphasize the most innovative high-throughput tools and methods for advancing bioactivity screening, NP chemical analysis, dereplication, metabolite profiling, metabolomics, genome sequencing and/or genomics approaches, databases, bioinformatics, chemoinformatics, and three-dimensional NP structure elucidation.
7. Snake River alfalfa virus, a persistent virus infecting alfalfa (Medicago sativa L.) in Washington State, USA. Virol J 2023; 20:32. [PMID: 36803436 PMCID: PMC9938972 DOI: 10.1186/s12985-023-01991-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 02/14/2023] [Indexed: 02/21/2023]
Abstract
Here we report the occurrence of Snake River alfalfa virus (SRAV) in Washington State, USA. SRAV was recently identified in alfalfa (Medicago sativa L.) plants and western flower thrips in south-central Idaho and proposed to be the first flavi-like virus identified in a plant host. We argue that SRAV, based on its prevalence in alfalfa plants, readily detectable dsRNA, genome structure, presence in alfalfa seeds, and seed-mediated transmission, is a new persistent virus distantly resembling members of the family Endornaviridae.
8. TaxiBGC: a Taxonomy-Guided Approach for Profiling Experimentally Characterized Microbial Biosynthetic Gene Clusters and Secondary Metabolite Production Potential in Metagenomes. mSystems 2022; 7:e0092522. [PMID: 36378489 PMCID: PMC9765181 DOI: 10.1128/msystems.00925-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Biosynthetic gene clusters (BGCs) in microbial genomes encode bioactive secondary metabolites (SMs), which can play important roles in microbe-microbe and host-microbe interactions. Given the biological significance of SMs and the current profound interest in the metabolic functions of microbiomes, the unbiased identification of BGCs from high-throughput metagenomic data could offer novel insights into the complex chemical ecology of microbial communities. Currently available tools for predicting BGCs from shotgun metagenomes have several limitations, including the need for computationally demanding read assembly, prediction of only a narrow breadth of BGC classes, and failure to report the SM product. To overcome these limitations, we developed taxonomy-guided identification of biosynthetic gene clusters (TaxiBGC), a command-line tool for predicting experimentally characterized BGCs (and inferring their known SMs) in metagenomes by first pinpointing the microbial species likely to harbor them. We benchmarked TaxiBGC on various simulated metagenomes, showing that our taxonomy-guided approach predicts BGCs with much-improved performance (mean F1 score, 0.56; mean PPV score, 0.80) compared with directly identifying BGCs by mapping sequencing reads onto the BGC genes (mean F1 score, 0.49; mean PPV score, 0.41). Next, by applying TaxiBGC to 2,650 metagenomes from the Human Microbiome Project and various case-control gut microbiome studies, we were able to associate BGCs (and their SMs) with different human body sites and with multiple diseases, including Crohn's disease and liver cirrhosis. In all, TaxiBGC provides an in silico platform to predict experimentally characterized BGCs and their SM production potential in metagenomic data while demonstrating important advantages over existing techniques.
IMPORTANCE Currently available bioinformatics tools for identifying BGCs in metagenomic sequencing data are limited in predictive capability or in ease of use, even for computationally oriented researchers. We present an automated computational pipeline called TaxiBGC, which predicts experimentally characterized BGCs (and infers their known SMs) in shotgun metagenomes by first considering the microbial species source. Through rigorous benchmarking on simulated metagenomes, we show that TaxiBGC offers a significant advantage over existing methods. Applying TaxiBGC to thousands of human microbiome samples, we associate BGCs encoding bacteriocins with different human body sites and diseases, suggesting a possible novel role for this antibiotic class in maintaining the stability of microbial ecosystems throughout the human body. Furthermore, we report for the first time gut microbial BGC associations shared among multiple pathologies. Ultimately, we expect our tool to facilitate future investigations into the chemical ecology of microbial communities across diverse niches and pathologies.
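The benchmark figures above combine two standard metrics. PPV (precision) is TP / (TP + FP), and F1 is the harmonic mean of PPV and sensitivity, TP / (TP + FN). The sketch below uses those textbook definitions on a hypothetical set of BGC identifiers (not data from the study) to show how a high PPV can coexist with a modest F1 when many true BGCs are missed.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value (precision): fraction of predicted BGCs that are true."""
    return tp / (tp + fp) if tp + fp else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (recall): fraction of true BGCs that were predicted."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of PPV and sensitivity."""
    p, r = ppv(tp, fp), sensitivity(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical ground truth for one simulated metagenome vs. one tool's predictions.
truth = {"bgc1", "bgc2", "bgc3", "bgc4", "bgc5"}
predicted = {"bgc1", "bgc2", "bgc6"}
tp = len(truth & predicted)   # 2 correctly predicted BGCs
fp = len(predicted - truth)   # 1 spurious prediction
fn = len(truth - predicted)   # 3 missed BGCs
print(f"PPV={ppv(tp, fp):.2f} sensitivity={sensitivity(tp, fn):.2f} F1={f1(tp, fp, fn):.2f}")
# PPV=0.67 sensitivity=0.40 F1=0.50
```

The abstract reports means of these scores across many simulated metagenomes. This toy case only illustrates how each per-sample score is formed.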
9. Long-Read Metagenome-Assembled Genomes Improve Identification of Novel Complete Biosynthetic Gene Clusters in a Complex Microbial Activated Sludge Ecosystem. mSystems 2022; 7:e0063222. [PMID: 36445112 PMCID: PMC9765116 DOI: 10.1128/msystems.00632-22] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/03/2022]
Abstract
Microorganisms produce a wide variety of secondary/specialized metabolites (SMs), the majority of which are yet to be discovered. These natural products play multiple roles in microbiomes and are important for microbial competition, communication, and success in the environment. SMs have been our major source of antibiotics and are used in a range of biotechnological applications. In silico mining for biosynthetic gene clusters (BGCs) encoding the production of SMs is commonly used to assess the genetic potential of organisms. However, because BGCs span tens to over 200 kb, identifying complete BGCs requires genome data with minimal assembly gaps within the BGCs, a prerequisite previously met only by individually sequenced genomes. Here, we assess the performance of the genome mining platform antiSMASH on 1,080 high-quality metagenome-assembled bacterial genomes (HQ MAGs) previously produced from wastewater treatment plants (WWTPs) using a combination of long-read (Oxford Nanopore) and short-read (Illumina) sequencing technologies. More than 4,200 different BGCs were identified, 88% of which were complete. Sequence similarity clustering of the BGCs implies that most of this biosynthetic potential likely encodes novel compounds, and that few BGCs are shared between genera. We identify BGCs in abundant and functionally relevant genera in WWTPs, suggesting a role for secondary metabolism in this ecosystem. We find that assembling HQ MAGs using long-read sequencing is vital for exploring the genetic potential for SM production among the uncultured members of microbial communities. IMPORTANCE Cataloguing secondary metabolite (SM) potential using genome mining of metagenomic data has become the method of choice in bioprospecting for novel compounds.
However, accurate biosynthetic gene cluster (BGC) detection requires unfragmented genomic assemblies, which have been technically difficult to obtain from metagenomes until very recently with new long-read technologies. Here, we determined the biosynthetic potential of activated sludge (AS), the microbial community used in resource recovery and wastewater treatment, by mining high-quality metagenome-assembled genomes generated from long-read data. We found over 4,000 BGCs, including BGCs in abundant process-critical bacteria, with no similarity to the BGCs of characterized products. We show how long-read MAGs are required to confidently assemble complete BGCs, and we determined that the AS BGCs from different studies have very little overlap, suggesting that AS is a rich source of biosynthetic potential and new bioactive compounds.
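The abstract mentions sequence similarity clustering of BGCs but does not state the method here. As a simplified illustration only, the sketch below groups BGCs by the Jaccard index of their domain content and links them by single linkage (via union-find). The domain sets, the 0.5 threshold, and the function names are assumptions made for this example. This is not the study's procedure or any specific tool's algorithm.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index of two domain sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_bgcs(domains: dict[str, set[str]], threshold: float = 0.5) -> list[set[str]]:
    """Single-linkage clustering: link any two BGCs whose Jaccard index >= threshold."""
    parent = {bgc: bgc for bgc in domains}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(domains, 2):
        if jaccard(domains[a], domains[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, set[str]] = {}
    for bgc in domains:
        groups.setdefault(find(bgc), set()).add(bgc)
    return list(groups.values())


# Hypothetical BGCs from different MAGs, described by Pfam-style domain names.
bgcs = {
    "mag1_region1": {"AMP-binding", "Condensation", "PP-binding"},
    "mag2_region3": {"AMP-binding", "Condensation", "PP-binding", "Thioesterase"},
    "mag3_region2": {"ketoacyl-synt", "Acyl_transf_1", "PP-binding"},
    "mag4_region1": {"LANC_like"},
}
for family in cluster_bgcs(bgcs):
    print(sorted(family))
```

Singletons left at the end of such a clustering are the "few BGCs shared between genera" pattern the abstract describes. Production tools typically combine several similarity measures rather than domain sets alone.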
10. Progress and opportunities in microbial community metabolomics. Curr Opin Microbiol 2022; 70:102195. [PMID: 36063685 DOI: 10.1016/j.mib.2022.102195] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/07/2022] [Revised: 07/20/2022] [Accepted: 07/21/2022] [Indexed: 01/25/2023]
Abstract
The metabolome lies at the interface of host-microbiome crosstalk. Previous work has established links between chemically diverse microbial metabolites and a myriad of host physiological processes and diseases. Coupled with scalable and cost-effective technologies, metabolomics is thus gaining popularity as a tool for characterizing microbial communities, particularly when combined with metagenomics as a window into microbiome function. A systematic interrogation of microbial community metabolomes can uncover key microbial compounds and the metabolic capabilities of the microbiome, and can provide critical mechanistic insights into microbiome-linked host phenotypes. In this review, we discuss methods and accompanying resources that have been developed for these purposes. The accomplishments of these methods demonstrate that metabolomes can be used to functionally characterize microbial communities, and that microbial properties can be used to identify and investigate chemical compounds.
11. Deep-Sea Sediments from the Southern Gulf of Mexico Harbor a Wide Diversity of PKS I Genes. Antibiotics (Basel) 2022; 11:antibiotics11070887. [PMID: 35884142 PMCID: PMC9311598 DOI: 10.3390/antibiotics11070887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Revised: 06/08/2022] [Accepted: 06/20/2022] [Indexed: 11/19/2022]
Abstract
The excessive use of antibiotics has triggered the appearance of new resistant strains, and in recent years this has driven great interest in the search for new bioactive compounds capable of overcoming this emergency. High-throughput sequencing tools have enabled the detection of new microorganisms that cannot be cultured in the laboratory, opening the door to the search for new biosynthetic genes. The great variety of oceanic environments in terms of pressure, salinity, temperature, and nutrients enables marine microorganisms to develop unique biochemical and physiological properties for survival, enhancing the production of secondary metabolites that may differ from those produced by terrestrial microorganisms. We searched metagenomes obtained from deep-water marine sediments of the Gulf of Mexico for type I PKS genes using hidden Markov models. More than 2000 candidate genes coding for type I PKS domains were detected in the metagenomes, along with biosynthetic pathways that may code for other secondary metabolites. Our research demonstrates the great potential of Gulf of Mexico marine sediments for identifying genes that code for new secondary metabolites.
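The abstract says hidden Markov models were used to find candidate type I PKS genes, but it does not name the software. The sketch below assumes HMMER3's `hmmsearch --tblout` per-sequence table, which has 18 whitespace-separated columns followed by a free-text description. It shows how candidate genes might be called by filtering hits on the full-sequence E-value. The profile name `PKS_KS`, the gene names, the example rows, and the 1e-5 cutoff are all hypothetical.

```python
def parse_tblout(lines, max_evalue: float = 1e-5):
    """Return (target, profile, evalue, score) tuples for hits passing an E-value cutoff.

    Assumes HMMER3 `hmmsearch --tblout` per-sequence output: 18 whitespace-
    separated columns followed by a free-text target description.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(maxsplit=18)
        target, profile = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])  # full-sequence E-value and bit score
        if evalue <= max_evalue:
            hits.append((target, profile, evalue, score))
    return hits


# Made-up rows in the tblout layout; gene and profile names are hypothetical.
example = [
    "# target name  accession  query name  accession  E-value  score ...",
    "contig12_orf3 - PKS_KS - 2.1e-45 150.2 0.1 3.0e-45 149.7 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein",
    "contig40_orf7 - PKS_KS - 0.37 5.1 0.0 0.52 4.6 0.0 1.2 1 0 0 1 1 1 0 -",
]
hits = parse_tblout(example)
candidate_genes = {target for target, *_ in hits}
print(candidate_genes)  # {'contig12_orf3'}
```

The choice of cutoff trades sensitivity for specificity. Studies often use profile-specific gathering thresholds instead of a single global E-value.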
12. Insights into the Antimicrobial Activities and Metabolomes of Aquimarina (Flavobacteriaceae, Bacteroidetes) Species from the Rare Marine Biosphere. Mar Drugs 2022; 20:md20070423. [PMID: 35877716 PMCID: PMC9323603 DOI: 10.3390/md20070423] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/31/2022] [Revised: 06/16/2022] [Accepted: 06/24/2022] [Indexed: 12/17/2022]
Abstract
Two novel natural products, the polyketide cuniculene and the peptide antibiotic aquimarin, were recently discovered from the marine bacterial genus Aquimarina. However, the diversity of the secondary metabolite biosynthetic gene clusters (SM-BGCs) in Aquimarina genomes indicates a far greater biosynthetic potential. In this study, nine representative Aquimarina strains were tested for antimicrobial activity against diverse human-pathogenic and marine microorganisms and subjected to metabolomic and genomic profiling. Most Aquimarina strains showed inhibitory activity against Candida glabrata and marine Vibrio and Alphaproteobacteria species. Crude extracts of Aquimarina sp. Aq135 and Aquimarina muelleri showed particularly promising antimicrobial activities, including activity against methicillin-resistant Staphylococcus aureus. The metabolomic and functional genomic profiles of Aquimarina spp. followed similar patterns and were shaped by phylogeny. SM-BGC and metabolomic networks suggest the presence of novel polyketides and peptides, including cyclic depsipeptide-related compounds. Moreover, exploration of the ‘Sponge Microbiome Project’ dataset revealed that Aquimarina spp. occur at low abundance worldwide across multiple marine biotopes. Our study emphasizes the relevance of this member of the microbial rare biosphere as a promising source of novel natural products. We predict that future metabologenomics studies of Aquimarina species will expand the spectrum of known secondary metabolites and bioactivities from marine ecosystems.
13. Unveiling the genomic potential of Pseudomonas type strains for discovering new natural products. Microb Genom 2022; 8:000758. [PMID: 35195510 PMCID: PMC8942027 DOI: 10.1099/mgen.0.000758] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/08/2021] [Accepted: 12/07/2021] [Indexed: 12/20/2022]
Abstract
Microbes host a huge variety of biosynthetic gene clusters that produce an immeasurable array of secondary metabolites with many different biological activities, such as antimicrobial, anticarcinogenic and antiviral activity. Although isolating and characterizing novel natural products is a complex task, microbial genomic strategies can be useful for such studies. However, although genome-based research on secondary metabolism is increasing, there are still few reports focusing specifically on the genus Pseudomonas. In this work, we aimed (i) to unveil the main biosynthetic systems related to secondary metabolism in Pseudomonas type strains, (ii) to study the evolutionary processes that drive the diversification of their coding regions and (iii) to select Pseudomonas strains showing promising results in the search for useful natural products. We performed a comparative genomic study on 194 Pseudomonas species, paying special attention to the evolution and distribution of different classes of biosynthetic gene clusters and the coding features of antimicrobial peptides. Using EvoMining, a bioinformatic approach for studying evolutionary processes related to secondary metabolism, we sought to decipher the expansion of protein families of lipid-metabolism enzymes, which may have evolved toward the biosynthesis of novel secondary metabolites in Pseudomonas. The biosynthetic gene cluster types encoded in Pseudomonas type strains were predominantly non-ribosomal peptide synthetases, bacteriocins, N-acetylglutaminylglutamine amides and β-lactones. The evolution of genes related to secondary metabolites was also found to coincide with Pseudomonas species diversification. Interestingly, only a few Pseudomonas species encode polyketide synthases, which are related to lipid-metabolism enzymes broadly distributed among bacteria.
Thus, our EvoMining-based search may help to discover new types of secondary metabolite gene clusters that involve lipid-related enzymes. This work provides information about uncharacterized metabolites produced by Pseudomonas type strains, whose gene clusters have evolved in a species-specific way. Our results provide novel insight into the secondary metabolism of Pseudomonas and will serve as a basis for the prioritization of the isolated strains. This article contains data hosted by Microreact.
14. Petabase-scale sequence alignment catalyses viral discovery. Nature 2022; 602:142-147. [PMID: 35082445 DOI: 10.1038/s41586-021-04332-2] [Citation(s) in RCA: 138] [Impact Index Per Article: 69.0] [Received: 08/10/2020] [Accepted: 12/10/2021] [Indexed: 01/20/2023]
Abstract
Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially¹. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10⁵ novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.
15. Generating lineage-resolved, complete metagenome-assembled genomes from complex microbial communities. Nat Biotechnol 2022; 40:711-719. [PMID: 34980911 DOI: 10.1038/s41587-021-01130-z] [Citation(s) in RCA: 73] [Impact Index Per Article: 36.5] [Received: 05/04/2021] [Accepted: 10/13/2021] [Indexed: 12/18/2022]
Abstract
Microbial communities might include distinct lineages of closely related organisms that complicate metagenomic assembly and prevent the generation of complete metagenome-assembled genomes (MAGs). Here we show that deep sequencing using long (HiFi) reads combined with Hi-C binning can address this challenge even for complex microbial communities. Using existing methods, we sequenced the sheep fecal metagenome and identified 428 MAGs with more than 90% completeness, including 44 MAGs in single circular contigs. To resolve closely related strains (lineages), we developed MAGPhase, which separates lineages of related organisms by discriminating variant haplotypes across hundreds of kilobases of genomic sequence. MAGPhase identified 220 lineage-resolved MAGs in our dataset. The ability to resolve closely related microbes in complex microbial communities improves the identification of biosynthetic gene clusters and the precision of assigning mobile genetic elements to host genomes. We identified 1,400 complete and 350 partial biosynthetic gene clusters, most of which are novel, as well as 424 potential host-viral and 298 potential host-plasmid associations using Hi-C data.
16. coronaSPAdes: from biosynthetic gene clusters to RNA viral assemblies. Bioinformatics 2021; 38:1-8. [PMID: 34406356 DOI: 10.1093/bioinformatics/btab597] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 02/24/2021] [Revised: 07/20/2021] [Accepted: 08/16/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION The COVID-19 pandemic has ignited broad scientific interest in viral research in general and coronavirus research in particular. The identification and characterization of viral species in natural reservoirs typically involves de novo assembly. However, existing genome, metagenome and transcriptome assemblers are often unable to assemble many viruses (including coronaviruses) into a single contig. Coverage variation between and within datasets, the presence of closely related strains, splice variants and contamination set a high bar for assemblers processing viral datasets with diverse properties. RESULTS We developed coronaSPAdes, a novel assembler for RNA viral species recovery in general and coronaviruses in particular. coronaSPAdes leverages knowledge about viral genome structures to improve assembly, extending ideas initially implemented in biosyntheticSPAdes. We show that coronaSPAdes outperforms existing SPAdes modes and other popular short-read metagenome and viral assemblers in the recovery of full-length RNA viral genomes. AVAILABILITY AND IMPLEMENTATION The coronaSPAdes version used in this article is part of the SPAdes 3.15 release and is freely available at http://cab.spbu.ru/software/spades. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
17
Genome-Guided Discovery of Natural Products through Multiplexed Low-Coverage Whole-Genome Sequencing of Soil Actinomycetes on Oxford Nanopore Flongle. mSystems 2021; 6:e0102021. [PMID: 34812649 PMCID: PMC8609971 DOI: 10.1128/msystems.01020-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/10/2021] [Accepted: 10/31/2021] [Indexed: 12/02/2022]
Abstract
Genome mining is an important tool for the discovery of new natural products; however, the number of publicly available genomes for natural product-rich microbes such as actinomycetes, relative to human pathogens with smaller genomes, is small. To obtain contiguous DNA assemblies and identify large (ca. 10 to greater than 100 kb) biosynthetic gene clusters (BGCs) with high GC (>70%) and high-repeat content, it is necessary to use long-read sequencing methods when sequencing actinomycete genomes. One hurdle to long-read sequencing is its higher cost. In the current study, we assessed Flongle, a recently launched platform by Oxford Nanopore Technologies, as a low-cost DNA sequencing option to obtain contiguous DNA assemblies and analyze BGCs. To make the workflow more cost-effective, we multiplexed up to four samples in a single Flongle sequencing experiment while expecting low sequencing coverage per sample. We hypothesized that contiguous DNA assemblies might enable analysis of BGCs even at low sequencing depth. To assess the value of these assemblies, we collected high-resolution mass spectrometry data and conducted a multi-omics analysis to connect BGCs to secondary metabolites. In total, we assembled genomes for 20 distinct strains across seven sequencing experiments. In each experiment, 50% of the bases were in reads longer than 10 kb, which facilitated the assembly of reads into contigs with an average N50 value of 3.5 Mb. The programs antiSMASH and PRISM predicted 629 and 295 BGCs, respectively. We connected BGCs to metabolites for N,N-dimethyl cyclic-di-tryptophan, two novel lasso peptides, and three known actinomycete-associated siderophores, namely, mirubactin, heterobactin, and salinichelin.
IMPORTANCE Short-read sequencing of GC-rich genomes such as those from actinomycetes results in a fragmented genome assembly and truncated biosynthetic gene clusters (often 10 to >100 kb long), which hinders our ability to understand the biosynthetic potential of a given strain and predict the molecules that can be produced. The current study demonstrates that contiguous DNA assemblies, suitable for analysis of BGCs, can be obtained through low-coverage, multiplexed sequencing on Flongle, which provides a new low-cost workflow ($30 to 40 per strain) for sequencing actinomycete strain libraries.
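Both statistics quoted in this abstract, the average contig N50 of 3.5 Mb and "50% of the bases were in reads longer than 10 kb", are instances of the N50 statistic. A minimal sketch of the standard definition follows; it applies equally to contig or read lengths, and the toy lengths are illustrative rather than data from the study.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be a non-empty list of positive integers")


print(n50([2, 3, 4, 5, 6]))  # 5  (6 + 5 = 11 bases, at least half of 20)
```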
18
Abstract
Polyketide synthases (PKSs) and non-ribosomal peptide synthetases (NRPSs) are mega enzymes responsible for the biosynthesis of a large fraction of natural products (NPs). Molecular markers for biosynthetic genes, such as the ketosynthase (KS) domain of PKSs, have been used to assess the diversity and distribution of biosynthetic genes in complex microbial communities. More recently, metagenomic studies have complemented and enhanced this approach by allowing the recovery of complete biosynthetic gene clusters (BGCs) from environmental DNA. In this study, the distribution and diversity of biosynthetic genes and clusters in Arctic Ocean samples (NICE-2015 expedition) were assessed using PCR-based strategies coupled with high-throughput sequencing and metagenomic analysis. In total, 149 KS domain OTU sequences were recovered, 36% of which could not be assigned to any known BGC. In addition, 74 bacterial metagenome-assembled genomes were recovered, from which 179 BGCs were extracted. A network analysis identified potential new NP families, including non-ribosomal peptides and polyketides. Complete or near-complete BGCs were recovered, which will enable future heterologous expression efforts to uncover the respective NPs. Our study represents the first report of biosynthetic diversity assessed for Arctic Ocean metagenomes and highlights the potential of Arctic Ocean planktonic microbiomes for the discovery of novel secondary metabolites. The strategy employed in this study will enable future bioprospecting by identifying promising samples for bacterial isolation efforts, while also providing full-length BGCs for heterologous expression.
19
Metabolomics and genomics in natural products research: complementary tools for targeting new chemical entities. Nat Prod Rep 2021; 38:2041-2065. [PMID: 34787623 PMCID: PMC8691422 DOI: 10.1039/d1np00036e] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Indexed: 01/19/2023]
Abstract
Covering: 2010 to 2021
Organisms in nature have evolved into proficient synthetic chemists, utilizing specialized enzymatic machinery to biosynthesize an inspiring diversity of secondary metabolites. Often serving to boost competitive advantage for their producers, these secondary metabolites have widespread human impacts as antibiotics, anti-inflammatories, and antifungal drugs. The natural products discovery field has begun a shift away from traditional activity-guided approaches and is beginning to take advantage of increasingly available metabolomics and genomics datasets to explore undiscovered chemical space. Major strides have been made and now enable -omics-informed prioritization of chemical structures for discovery, including the prospect of confidently linking metabolites to their biosynthetic pathways. Over the last decade, more integrated strategies have come to provide researchers with pipelines for the simultaneous identification of expressed secondary metabolites and their biosynthetic machinery. However, continuous collaboration by the natural products community will be required to optimize strategies for effective evaluation of natural product biosynthetic gene clusters to accelerate discovery efforts. Here, we provide an evaluative guide to scientific literature as it relates to studying natural product biosynthesis using genomics, metabolomics, and their integrated datasets. Particular emphasis is placed on the unique insights that can be gained from large-scale integrated strategies, and we provide source organism-specific considerations to evaluate the gaps in our current knowledge.
20
The Methods of Digging for "Gold" within the Salt: Characterization of Halophilic Prokaryotes and Identification of Their Valuable Biological Products Using Sequencing and Genome Mining Tools. Genes (Basel) 2021; 12:genes12111756. [PMID: 34828362 PMCID: PMC8619533 DOI: 10.3390/genes12111756] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/18/2021] [Revised: 10/18/2021] [Accepted: 10/30/2021] [Indexed: 02/06/2023]
Abstract
Halophiles, the salt-loving organisms, have been investigated for at least a hundred years. They are found in all three domains of life, namely Archaea, Bacteria, and Eukarya, and occur in saline and hypersaline environments worldwide. They are already a valuable source of various biomolecules for biotechnological, pharmaceutical, cosmetological and industrial applications. In the present era of multidrug-resistant bacteria, cancer expansion, and extreme environmental pollution, the demand for new, effective compounds is higher and more urgent than ever before. Thus, the unique metabolism of halophilic microorganisms, their low nutritional requirements and their ability to adapt to harsh conditions (high salinity, high pressure and UV radiation, low oxygen concentration, hydrophobic conditions, extreme temperatures and pH, toxic compounds and heavy metals) make them promising candidates as a fruitful source of bioactive compounds. The main aim of this review is to highlight the nucleic acid sequencing strategies used in halophile studies, together with recent examples of bioproducts and functions discovered in silico in halophile genomes. We point out methodological gaps and solutions based on in silico methods that are helpful in the identification of valuable bioproducts synthesized by halophiles. We also show the potential of the increasing amount of publicly available genomic and metagenomic data for halophilic organisms, which can be analysed to identify such new bioproducts and their producers.
21
Screening Strategies for Biosurfactant Discovery. ADVANCES IN BIOCHEMICAL ENGINEERING/BIOTECHNOLOGY 2021; 181:17-52. [PMID: 34518910 DOI: 10.1007/10_2021_174] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
The isolation and screening of bacteria and fungi for the production of surface-active compounds has been the basis for the majority of the biosurfactants discovered to date. Hence, a wide variety of well-established and relatively simple methods are available for screening, mostly focused on the detection of surface or interfacial activity of the culture supernatant. However, the success of any biodiscovery effort, specifically one aiming to access novelty, relies directly on the characteristics being screened for and the uniqueness of the microorganisms being screened. Therefore, given that rather few novel biosurfactant structures have been discovered during the last decade, advanced strategies are now needed to widen access to novel chemistries and properties. In addition, modern Omics technologies should be considered as complements to the traditional culture-based approaches for biosurfactant discovery. This chapter summarizes the screening methods and strategies typically used for the discovery of biosurfactants and highlights some of the Omics-based approaches that have resulted in the discovery of unique biosurfactants. These studies illustrate the potentially enormous diversity that has yet to be unlocked and how we can begin to tap into these biological resources.
22
Abstract
Pangenomes are organized collections of the genomic information from related individuals or groups. Graphical pangenomics is the study of these pangenomes using graphical methods to identify and analyze genes, regions, and mutations of interest to an array of biological questions. This field has seen significant progress in recent years, including the development of graph-based models that better resolve biological phenomena and an explosion of new tools for mapping reads, creating graphical genomes, and performing pangenome analysis. In this review, we discuss recent developments in models and algorithms associated with graphical genomes, and compare similar tools. In addition, we briefly discuss what these developments may mean for the future of genomics.
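To make the idea of a graphical pangenome concrete, here is a toy sketch in which shared sequence is stored once in nodes and each genome is a path through them, so a single-nucleotide difference appears as a "bubble". It is an illustrative data structure under simplifying assumptions (nodes are read in one orientation only, no reverse complements), not the model of any particular tool covered by the review; exchange formats such as GFA additionally record node orientation.

```python
from dataclasses import dataclass, field


@dataclass
class VariationGraph:
    """Toy sequence graph: nodes hold sequence, edges join nodes, and each
    genome is stored as a path (an ordered list of node IDs)."""
    nodes: dict[int, str] = field(default_factory=dict)
    edges: set[tuple[int, int]] = field(default_factory=set)
    paths: dict[str, list[int]] = field(default_factory=dict)

    def add_path(self, name: str, node_ids: list[int]) -> None:
        missing = [n for n in node_ids if n not in self.nodes]
        if missing:
            raise KeyError(f"unknown node IDs: {missing}")
        # Every consecutive pair of nodes on the path becomes an edge.
        self.edges.update(zip(node_ids, node_ids[1:]))
        self.paths[name] = list(node_ids)

    def spell(self, name: str) -> str:
        # A genome is recovered by concatenating node sequences along its path.
        return "".join(self.nodes[n] for n in self.paths[name])


# Two hypothetical genomes sharing flanks and differing at one site.
g = VariationGraph(nodes={1: "ACG", 2: "T", 3: "C", 4: "GGA"})
g.add_path("genome_A", [1, 2, 4])
g.add_path("genome_B", [1, 3, 4])
print(g.spell("genome_A"))  # ACGTGGA
print(g.spell("genome_B"))  # ACGCGGA
print(sorted(g.edges))      # [(1, 2), (1, 3), (2, 4), (3, 4)]
```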
23
Abstract
SPAdes (St. Petersburg genome Assembler) was originally developed for de novo assembly of genome sequencing data produced for cultivated microbial isolates and for single-cell genomic DNA sequencing. Over time, the functionality of SPAdes was extended to enable assembly of Ion Torrent data, as well as hybrid assembly from short and long reads (PacBio and Oxford Nanopore). In this article we present protocols for five different assembly pipelines that comprise the SPAdes package and that are used for assembly of metagenomes and transcriptomes, as well as assembly of putative plasmids and biosynthetic gene clusters from whole-genome sequencing and metagenomic datasets. In addition, we present guidelines for understanding results, with use cases for each pipeline, and several support protocols that help in using SPAdes properly. © 2020 Wiley Periodicals LLC.
Basic Protocol 1: Assembling isolate bacterial datasets
Basic Protocol 2: Assembling metagenomic datasets
Basic Protocol 3: Assembling sets of putative plasmids
Basic Protocol 4: Assembling transcriptomes
Basic Protocol 5: Assembling putative biosynthetic gene clusters
Support Protocol 1: Installing SPAdes
Support Protocol 2: Providing input via command line
Support Protocol 3: Providing input data via YAML format
Support Protocol 4: Restarting previous run
Support Protocol 5: Determining strand-specificity of RNA-seq data
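The five basic protocols correspond to different SPAdes modes. The sketch below only assembles a `spades.py` command line for a paired-end library; it does not run anything. The mode flags (`--isolate`, `--meta`, `--plasmid`, `--rna`, `--bio`) are given as recalled from the SPAdes 3.14/3.15 documentation and should be checked against `spades.py --help` for the installed version. The dictionary keys, the helper name `spades_command`, and the file names are hypothetical.

```python
import shlex

# Assumed mapping from the basic protocols to spades.py mode flags;
# verify against the documentation of the installed SPAdes version.
MODE_FLAGS = {
    "isolate": "--isolate",     # Basic Protocol 1
    "metagenome": "--meta",     # Basic Protocol 2 (metaSPAdes)
    "plasmid": "--plasmid",     # Basic Protocol 3 (plasmidSPAdes)
    "transcriptome": "--rna",   # Basic Protocol 4 (rnaSPAdes)
    "bgc": "--bio",             # Basic Protocol 5 (biosyntheticSPAdes)
}


def spades_command(mode, left, right, outdir):
    """Build (but do not execute) a paired-end spades.py command as an argv list."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODE_FLAGS)}")
    return ["spades.py", MODE_FLAGS[mode], "-1", left, "-2", right, "-o", outdir]


cmd = spades_command("metagenome", "R1.fastq.gz", "R2.fastq.gz", "meta_out")
print(shlex.join(cmd))  # spades.py --meta -1 R1.fastq.gz -2 R2.fastq.gz -o meta_out
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell-quoting problems with unusual file names. An interrupted run can be resumed with `spades.py --continue -o <output_dir>`, the scenario covered by Support Protocol 4.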
24
ORFograph: search for novel insecticidal protein genes in genomic and metagenomic assembly graphs. MICROBIOME 2021; 9:149. [PMID: 34183047 PMCID: PMC8240309 DOI: 10.1186/s40168-021-01092-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/26/2021] [Accepted: 05/11/2021] [Indexed: 05/07/2023]
Abstract
BACKGROUND Since the prolonged use of insecticidal proteins has led to toxin resistance, it is important to search for novel insecticidal protein genes (IPGs) that are effective in controlling resistant insect populations. IPGs are usually encoded in the genomes of entomopathogenic bacteria, especially in large plasmids in strains of the ubiquitous soil bacterium Bacillus thuringiensis (Bt). Since such plasmids often encode multiple similar IPGs, their assemblies are typically fragmented and many IPGs are scattered across multiple contigs. As a result, existing gene prediction tools (which analyze individual contigs) typically predict partial rather than complete IPGs, making it difficult to conduct downstream IPG engineering efforts in agricultural genomics. METHODS Although it is difficult to assemble IPGs in a single contig, the structure of the genome assembly graph often provides clues on how to combine multiple contigs into segments encoding a single IPG. RESULTS We describe ORFograph, a pipeline for predicting IPGs in assembly graphs, benchmark it on (meta)genomic datasets, and discover nearly a hundred novel IPGs. This work shows that graph-aware gene prediction tools enable the discovery of greater diversity of IPGs from (meta)genomes. CONCLUSIONS We demonstrated that analysis of the assembly graphs reveals novel candidate IPGs. ORFograph identified both already known genes "hidden" in assembly graphs and potential novel IPGs that evaded existing tools for IPG identification. As ORFograph is fast, one could imagine a pipeline that processes many (meta)genomic assembly graphs to identify novel IPGs for phenotypic testing that would otherwise be inaccessible to traditional gene-finding methods. While here we demonstrated the results of ORFograph only for IPGs, the proposed approach can be generalized to any class of genes.
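The core observation, that a gene split across contigs can be recovered by following edges of the assembly graph, can be illustrated with a toy sketch. This does not reproduce ORFograph's algorithm; it simply enumerates short simple paths through a small graph, concatenates the node sequences, and reports the longest forward-strand ATG-to-stop ORF. It assumes, for simplicity, that adjacent nodes do not overlap (real de Bruijn assembly graphs share k-mer overlaps) and ignores the reverse strand. The contig sequences, graph, and `max_nodes` limit are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Longest ATG...stop ORF on the forward strand of seq ("" if none)."""
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > len(best):
                    best = seq[start:i + 3]
                start = None
    return best


def simple_paths(graph, node, max_nodes):
    """Yield paths starting at node with at most max_nodes distinct nodes."""
    yield [node]
    if max_nodes > 1:
        for nxt in graph.get(node, ()):
            for tail in simple_paths(graph, nxt, max_nodes - 1):
                if node not in tail:
                    yield [node] + tail


def best_orf_in_graph(seqs, graph, max_nodes=3):
    """Search for ORFs on concatenated node sequences along short paths."""
    best = ""
    for start in seqs:
        for path in simple_paths(graph, start, max_nodes):
            orf = longest_orf("".join(seqs[n] for n in path))
            if len(orf) > len(best):
                best = orf
    return best


# Hypothetical contigs: the ORF starts in "a" but its stop codon lies in "c".
seqs = {"a": "CCATGAAA", "b": "GGGCCC", "c": "TTTTAGCC", "d": "TAACC"}
graph = {"a": ["b"], "b": ["c", "d"]}
print(repr(longest_orf(seqs["a"])))    # ''  (no complete ORF within one contig)
print(best_orf_in_graph(seqs, graph))  # ATGAAAGGGCCCTTTTAG
```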
25
Dissecting Disease-Suppressive Rhizosphere Microbiomes by Functional Amplicon Sequencing and 10× Metagenomics. mSystems 2021; 6:e0111620. [PMID: 34100635 PMCID: PMC8269251 DOI: 10.1128/msystems.01116-20] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/11/2022]
Abstract
Disease-suppressive soils protect plants against soilborne fungal pathogens that would otherwise cause root infections. Soil suppressiveness is, in most cases, mediated by the antagonistic activity of the microbial community associated with the plant roots. Considering the enormous taxonomic and functional diversity of the root-associated microbiome, identification of the microbial genera and mechanisms underlying this phenotype is challenging. One approach to unravel the underlying mechanisms is to identify metabolic pathways enriched in the disease-suppressive microbial community, in particular, pathways that harbor natural products with antifungal properties. An important class of these natural products includes peptides produced by nonribosomal peptide synthetases (NRPSs). Here, we applied functional amplicon sequencing of NRPS-associated adenylation domains (A domains) to a collection of eight soils that are suppressive or nonsuppressive (i.e., conducive) to Fusarium culmorum, a fungal root pathogen of wheat. To identify functional elements in the root-associated bacterial community, we developed an open-source pipeline, referred to as dom2BGC, for amplicon annotation and putative gene cluster reconstruction through analyzing A domain co-occurrence across samples. We applied this pipeline to rhizosphere communities from four disease-suppressive and four conducive soils and found significant similarities in NRPS repertoires between suppressive soils. Specifically, several siderophore biosynthetic gene clusters were consistently associated with suppressive soils, hinting at competition for iron as a potential mechanism of suppression. Finally, to validate dom2BGC and to allow more unbiased functional metagenomics, we performed 10× metagenomic sequencing of one suppressive soil, leading to the identification of multiple gene clusters potentially associated with the disease-suppressive phenotype. 
IMPORTANCE Soilborne plant-pathogenic fungi continue to be a major threat to agriculture and horticulture. The genus Fusarium in particular is one of the most devastating groups of soilborne fungal pathogens for a wide range of crops. Our approach to developing novel sustainable strategies to control this fungal root pathogen is to explore and exploit an effective, yet poorly understood, naturally occurring form of protection: disease-suppressive soils. After screening 28 agricultural soils, we recently identified four soils that were suppressive to root disease of wheat caused by Fusarium culmorum. We also confirmed, via sterilization and transplantation, that the microbiomes of these soils play a significant role in the suppressive phenotype. By adopting nonribosomal peptide synthetase (NRPS) functional amplicon screening of suppressive and conducive soils, we show here how computationally driven comparative analysis of combined functional amplicon and metagenomic data can unravel putative mechanisms underlying microbiome-associated plant phenotypes.
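dom2BGC reconstructs putative gene clusters from A-domain co-occurrence across samples, but the abstract does not spell out its algorithm. The sketch below illustrates only the general co-occurrence idea under an assumed approach: amplicons whose abundance profiles across samples reach a Pearson correlation of at least `min_r` (a hypothetical threshold) are merged by single linkage into candidate groups. Amplicon names and counts are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-occurrence signal
    return cov / (sx * sy)


def cooccurrence_groups(profiles, min_r=0.9):
    """Single-linkage grouping of amplicons with strongly correlated profiles."""
    group = {name: {name} for name in profiles}
    for a, b in combinations(profiles, 2):
        if group[a] is not group[b] and pearson(profiles[a], profiles[b]) >= min_r:
            merged = group[a] | group[b]
            for member in merged:
                group[member] = merged
    unique = {id(g): g for g in group.values()}
    return sorted(sorted(g) for g in unique.values())


# Hypothetical A-domain amplicon counts across four soil samples.
profiles = {
    "A1": [10, 0, 8, 0],
    "A2": [20, 1, 15, 0],
    "A3": [0, 9, 0, 7],
    "A4": [0, 12, 1, 10],
}
print(cooccurrence_groups(profiles))  # [['A1', 'A2'], ['A3', 'A4']]
```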
26
Integrating genomics and metabolomics for scalable non-ribosomal peptide discovery. Nat Commun 2021; 12:3225. [PMID: 34050176 PMCID: PMC8163882 DOI: 10.1038/s41467-021-23502-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 09/23/2020] [Accepted: 05/04/2021] [Indexed: 02/07/2023]
Abstract
Non-ribosomal peptides (NRPs) represent a biomedically important class of natural products that includes a multitude of antibiotics and other clinically used drugs. NRPs are not directly encoded in the genome but are instead produced by metabolic pathways encoded by biosynthetic gene clusters (BGCs). Since existing genome mining tools predict many putative NRPs for a given BGC, it remains unclear which of these putative NRPs are correct and how to identify post-assembly modifications of amino acids in these NRPs in a blind mode, without knowing which modifications exist in the sample. To address this challenge, here we report NRPminer, a modification-tolerant tool for NRP discovery from large (meta)genomic and mass spectrometry datasets. We show that NRPminer is able to identify many NRPs from different environments, including four previously unreported NRP families from soil-associated microbes and NRPs from human microbiota. Furthermore, we demonstrate the anti-parasitic activities and determine the structures of two of these NRP families using direct bioactivity screening and nuclear magnetic resonance spectroscopy, illustrating the power of NRPminer for discovering bioactive NRPs.
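NRPminer's modification-tolerant scoring is far richer than can be shown here. The sketch below illustrates only the underlying arithmetic, under stated assumptions: the predicted singly protonated [M+H]+ m/z of a linear or head-to-tail cyclic peptide computed from standard monoisotopic residue masses, and the leftover mass offset when an observed precursor does not match, which is the kind of signal a blind modification search has to interpret. Non-standard residues common in NRPs are not in the table, and the tolerance `tol` is an assumed value.

```python
# Monoisotopic residue masses (Da) of the standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def precursor_mz(peptide, cyclic=False):
    """[M+H]+ m/z of a linear or head-to-tail cyclic peptide."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide)
    if not cyclic:
        mass += WATER  # free N- and C-termini add one water
    return mass + PROTON


def modification_shift(peptide, observed_mz, cyclic=False, tol=0.01):
    """Mass offset (Da) between observed and predicted m/z, or 0.0 if they
    agree within tol; a nonzero value hints at an unknown modification."""
    delta = observed_mz - precursor_mz(peptide, cyclic)
    return 0.0 if abs(delta) <= tol else delta


print(round(precursor_mz("GG"), 5))  # 133.06076
# Hypothetical observation: the peptide GA carrying one methyl group (+CH2).
shift = modification_shift("GA", precursor_mz("GA") + 14.01565)
print(f"{shift:.3f}")  # 14.016
```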
27
Abstract
The reconstruction of bacterial and archaeal genomes from shotgun metagenomes has enabled insights into the ecology and evolution of environmental and host-associated microbiomes. Here we applied this approach to >10,000 metagenomes collected from diverse habitats covering all of Earth's continents and oceans, including metagenomes from human and animal hosts, engineered environments, and natural and agricultural soils, to capture extant microbial, metabolic and functional potential. This comprehensive catalog includes 52,515 metagenome-assembled genomes representing 12,556 novel candidate species-level operational taxonomic units spanning 135 phyla. The catalog expands the known phylogenetic diversity of bacteria and archaea by 44% and is broadly available for streamlined comparative analyses, interactive exploration, metabolic modeling and bulk download. We demonstrate the utility of this collection for understanding secondary-metabolite biosynthetic potential and for resolving thousands of new host linkages to uncultivated viruses. This resource underscores the value of genome-centric approaches for revealing genomic properties of uncultivated microorganisms that affect ecosystem processes.
28
Metagenomic Data Assembly - The Way of Decoding Unknown Microorganisms. Front Microbiol 2021; 12:613791. [PMID: 33833738 PMCID: PMC8021871 DOI: 10.3389/fmicb.2021.613791] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 10/03/2020] [Accepted: 03/03/2021] [Indexed: 01/08/2023]
Abstract
Metagenomics is a branch of microbial genomics dedicated to the sequencing and analysis of the combined genomic DNA of entire environmental samples. The most critical step of metagenomic data analysis is the reconstruction of individual genes and genomes of the microorganisms in the community using metagenomic assemblers, computational programs that piece together the short DNA fragments generated by sequencing instruments. Here, we describe the challenges of metagenomic assembly and the wide spectrum of applications in which metagenomic assemblies were used to better understand the ecology and evolution of microbial ecosystems, and present one of the most efficient microbial assemblers, SPAdes, which was upgraded to become applicable to metagenomics.
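SPAdes is a de Bruijn graph assembler. The toy sketch below shows only the basic idea behind piecing short fragments together: reads are cut into k-mers, each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and an unambiguous stretch of the graph is walked to spell a contig. It deliberately ignores reverse complements, sequencing errors, coverage, and the multiple k-mer sizes and graph simplification steps that SPAdes applies; the reads and `k = 4` are illustrative.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds a prefix->suffix edge."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unitig(graph, start):
    """Extend from start while the path is unambiguous: exactly one
    successor, and that successor has exactly one predecessor."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for s in successors:
            indegree[s] += 1
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if indegree[nxt] != 1 or nxt in seen:
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


# Two overlapping reads collapse into a single contig.
graph = de_bruijn(["ATGGCGT", "GGCGTGC"], k=4)
print(walk_unitig(graph, "ATG"))  # ATGGCGTGC
```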
29
Production of the antimicrobial compound tetrabromopyrrole and the Pseudomonas quinolone system precursor, 2-heptyl-4-quinolone, by a novel marine species Pseudoalteromonas galatheae sp. nov. Sci Rep 2020; 10:21630. [PMID: 33303891 PMCID: PMC7730127 DOI: 10.1038/s41598-020-78439-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/06/2019] [Accepted: 11/25/2020] [Indexed: 01/23/2023]
Abstract
Novel antimicrobials are urgently needed due to the rapid spread of antibiotic-resistant bacteria. In a genome-wide analysis of Pseudoalteromonas strains, one strain (S4498) stood out due to its potent antibiotic activity. It did not produce the yellow antimicrobial pigment bromoalterochromide, which was produced by several related type strains with which it shared less than 95% average nucleotide identity. It also produced a sweet-smelling volatile not observed in other strains. Mining the genome of strain S4498 using the secondary metabolite prediction tool antiSMASH revealed eight biosynthetic gene clusters with no homology to known compounds, and synteny analyses indicated that the capacity to produce the yellow pigment bromoalterochromide was likely lost during evolution. Metabolome profiling of strain S4498 using HPLC-HRMS analyses revealed marked differences from the type strains. In particular, a series of quinolones known as pseudanes were identified and verified by NMR. The characteristic odor of the strain was linked to the pseudanes. The highly halogenated compound tetrabromopyrrole was detected as the major antibacterial component by bioassay-guided fractionation. Taken together, the polyphasic analysis demonstrates that strain S4498 belongs to a novel species within the genus Pseudoalteromonas, and we propose the name Pseudoalteromonas galatheae sp. nov. (type strain S4498T = NCIMB 15250T = LMG 31599T).
30
Discovery of Novel Biosynthetic Gene Cluster Diversity From a Soil Metagenomic Library. Front Microbiol 2020; 11:585398. [PMID: 33365020 PMCID: PMC7750434 DOI: 10.3389/fmicb.2020.585398] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/20/2020] [Accepted: 11/16/2020] [Indexed: 12/31/2022]
Abstract
Soil microorganisms historically have been a rich resource for natural product discovery, yet the majority of these microbes remain uncultivated and their biosynthetic capacity is left underexplored. To identify the biosynthetic potential of soil microorganisms using a culture-independent approach, we constructed a large-insert metagenomic library in Escherichia coli from a topsoil sampled from the Cullars Rotation (Auburn, AL, United States), a long-term crop rotation experiment. Library clones were screened for biosynthetic gene clusters (BGCs) using either PCR or an NGS (next-generation sequencing) multiplexed pooling strategy, coupled with bioinformatic analysis to identify contigs associated with each metagenomic clone. A total of 1,015 BGCs were detected from 19,200 clones, identifying 223 clones (1.2%) that carry a polyketide synthase (PKS) and/or a non-ribosomal peptide synthetase (NRPS) cluster, a dramatically improved hit rate compared to PCR screening that targeted type I polyketide ketosynthase (KS) domains. The NRPS and PKS clusters identified by NGS were distinct from known BGCs in the MIBiG database and from the PKS clusters identified by PCR. Likewise, 16S rRNA gene sequences obtained by NGS of the library included many representatives that were not recovered by PCR, consistent with the same bias observed in KS amplicon screening. This study provides novel resources for natural product discovery and circumvents amplification bias to allow annotation of a soil metagenomic library for a more complete picture of its functional and phylogenetic diversity.
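The abstract does not describe the pooling design, so the sketch below assumes a common one for illustration only: each clone on a plate is pooled once by row and once by column, pools are sequenced, and candidate clones are the intersections of positive row and column pools. With more than one true hit the intersections include false candidates, which is why such designs need a follow-up check. The well coordinates are hypothetical; the last line reproduces the reported hit rate of 223 out of 19,200 clones.

```python
from itertools import product


def positive_pools(positive_wells):
    """Row and column pools that would test positive, assuming each clone
    is pooled once by plate row and once by plate column (hypothetical design)."""
    rows = {r for r, _ in positive_wells}
    cols = {c for _, c in positive_wells}
    return rows, cols


def candidate_wells(pos_rows, pos_cols):
    """All row/column intersections; with more than one true hit this set
    also contains false candidates that need a follow-up check."""
    return sorted(product(pos_rows, pos_cols))


rows, cols = positive_pools({(2, 5), (7, 1)})
print(candidate_wells(rows, cols))  # [(2, 1), (2, 5), (7, 1), (7, 5)]
print(f"{223 / 19200:.1%}")         # 1.2%
```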
31
metaFlye: scalable long-read metagenome assembly using repeat graphs. Nat Methods 2020; 17:1103-1110. [PMID: 33020656 PMCID: PMC10699202 DOI: 10.1038/s41592-020-00971-x] [Citation(s) in RCA: 286] [Impact Index Per Article: 71.5] [Received: 07/16/2020] [Revised: 08/22/2020] [Accepted: 09/07/2020] [Indexed: 02/06/2023]
Abstract
Long-read sequencing technologies have substantially improved the assemblies of many isolate bacterial genomes as compared to fragmented short-read assemblies. However, assembling complex metagenomic datasets remains difficult even for state-of-the-art long-read assemblers. Here we present metaFlye, which addresses important long-read metagenomic assembly challenges, such as uneven bacterial composition and intra-species heterogeneity. First, we benchmarked metaFlye using simulated and mock bacterial communities and showed that it consistently produces assemblies with better completeness and contiguity than state-of-the-art long-read assemblers. Second, we performed long-read sequencing of the sheep microbiome and applied metaFlye to reconstruct 63 complete or nearly complete bacterial genomes within single contigs. Finally, we show that long-read assembly of human microbiomes enables the discovery of full-length biosynthetic gene clusters that encode biomedically important natural products.
32
Discovery of an Abundance of Biosynthetic Gene Clusters in Shark Bay Microbial Mats. Front Microbiol 2020; 11:1950. [PMID: 32973707 PMCID: PMC7472256 DOI: 10.3389/fmicb.2020.01950] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 05/19/2020] [Accepted: 07/24/2020] [Indexed: 01/27/2023]
Abstract
Microbial mats are geobiological multilayered ecosystems that have significant value for understanding the evolution of early life on Earth. Shark Bay, Australia has some of the best examples of modern microbial mats thriving under harsh conditions of high temperatures, salinity, desiccation, and ultraviolet (UV) radiation. Microorganisms living in extreme ecosystems are thought to produce secondary metabolites as a survival strategy. Many secondary metabolites are natural products encoded by groups of genes known as biosynthetic gene clusters (BGCs). Natural products have diverse chemical structures and functions that provide competitive advantages for microorganisms and can also have biotechnology applications. In the present study, the diversity of BGCs in Shark Bay microbial mats was described in detail for the first time. A total of 1477 BGCs were detected in metagenomic data over a 20 mm mat depth horizon, with the surface layer possessing over 200 BGCs and containing the highest relative abundance of BGCs of all mat layers. Terpene and bacteriocin BGCs were highly represented and their natural products are proposed to have important roles in ecosystem function in these mat systems. Interestingly, potentially novel BGCs were detected from Heimdallarchaeota and Lokiarchaeota, two evolutionarily significant archaeal phyla not previously known to possess BGCs. This study provides new insights into how secondary metabolites from BGCs may enable diverse microbial mat communities to adapt to extreme environments.
33
Bacterial Secondary Metabolite Biosynthetic Potential in Soil Varies with Phylum, Depth, and Vegetation Type. mBio 2020; 11:e00416-20. [PMID: 32546614 PMCID: PMC7298704 DOI: 10.1128/mbio.00416-20] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 02/21/2020] [Accepted: 05/08/2020] [Indexed: 01/12/2023]
Abstract
Bacteria isolated from soils are major sources of specialized metabolites, including antibiotics and other compounds with clinical value that likely shape interactions among microbial community members and impact biogeochemical cycles. Yet, isolated lineages represent a small fraction of all soil bacterial diversity. It remains unclear how the production of specialized metabolites varies across the phylogenetic diversity of bacterial species in soils and whether the genetic potential for production of these metabolites differs with soil depth and vegetation type within a geographic region. We sampled soils and saprolite from three sites in a northern California Critical Zone Observatory with various vegetation and bedrock characteristics and reconstructed 1,334 metagenome-assembled genomes containing diverse biosynthetic gene clusters (BGCs) for secondary metabolite production. We obtained genomes for prolific producers of secondary metabolites, including novel groups within the Actinobacteria, Chloroflexi, and candidate phylum "Candidatus Dormibacteraeota." Surprisingly, one genome of a candidate phyla radiation (CPR) bacterium coded for a ribosomally synthesized linear azole/azoline-containing peptide, a capacity we found in other publicly available CPR bacterial genomes. Overall, bacteria with higher biosynthetic potential were enriched in shallow soils and grassland soils, with patterns of abundance of BGC types varying by taxonomy.
IMPORTANCE Microbes produce specialized compounds to compete or communicate with one another and their environment. Some of these compounds, such as antibiotics, are also useful in medicine and biotechnology. Historically, most antibiotics have come from soil bacteria that can be isolated and grown in the lab. Though the vast majority of soil bacteria cannot be isolated, we can extract their genetic information and search it for genes that produce these specialized compounds. These understudied soil bacteria offer a wealth of potential for the discovery of new and important microbial products. Here, we identified the ability to produce these specialized compounds in diverse and novel bacteria in a range of soil environments. This information will be useful to other researchers who wish to isolate certain products. Beyond their use to humans, understanding the distribution and function of microbial products is key to understanding microbial communities and their effects on biogeochemical cycles.
34
Abstract
Microbial and plant specialized metabolites constitute an immense chemical diversity and play key roles in mediating ecological interactions between organisms. Also referred to as natural products, they have been widely applied in medicine, agriculture, and the cosmetic and food industries. Traditionally, the main discovery strategies have centered on activity-guided fractionation of metabolite extracts. Increasingly, omics data are being used to complement this approach, as they have the potential to reduce rediscovery rates, guide experimental work towards the most promising metabolites, and identify the enzymatic pathways that enable their biosynthetic production. In recent years, genomic and metabolomic analyses of specialized metabolic diversity have been scaled up to study thousands of samples simultaneously. Here, we survey data analysis technologies that facilitate the effective exploration of large genomic and metabolomic datasets, and discuss emerging strategies for integrating these two types of omics data to further accelerate discovery.
35
The Natural Products Atlas: An Open Access Knowledge Base for Microbial Natural Products Discovery. ACS CENTRAL SCIENCE 2019; 5:1824-1833. [PMID: 31807684 PMCID: PMC6891855 DOI: 10.1021/acscentsci.9b00806] [Citation(s) in RCA: 212] [Impact Index Per Article: 42.4] [Received: 08/09/2019] [Indexed: 05/06/2023]
Abstract
Despite rapid progress in microbial natural products chemistry, there is currently no open access database containing all microbially produced natural product structures. The lack of availability of these data is preventing the adoption of new technologies in natural products science. Specifically, the development of new computational strategies for compound characterization and identification is being hampered by the lack of a comprehensive database of known compounds against which to compare experimental data. An open access, community-maintained database of microbial natural product structures would enable the development of new technologies in natural products discovery and improve the interoperability of existing natural products data resources. However, these data are spread unevenly throughout the historical scientific literature, including both journal articles and international patents. These documents have no standard format, are often not digitized as machine-readable text, and are not publicly available. Further, none of these documents has associated structure files (e.g., MOL, InChI, or SMILES); instead, they contain images of structures. This makes extracting and formatting the relevant natural products data a formidable challenge. Using a combination of manual curation and automated data mining, we have created a database of microbial natural products (The Natural Products Atlas, www.npatlas.org) that includes 24 594 compounds and contains referenced data for structures, compound names, source organisms, isolation references, total syntheses, and instances of structural reassignment. This database is accompanied by an interactive web portal that permits searching by structure, substructure, and physical properties. The website also provides mechanisms for visualizing natural products chemical space and dashboards for displaying author and discovery timeline data.
These interactive tools offer a powerful knowledge base for natural products discovery, with a central interface for structure- and property-based searching, and present new viewpoints on structural diversity in natural products. The Natural Products Atlas has been developed under the FAIR principles (Findable, Accessible, Interoperable, and Reusable) and is integrated with other emerging natural product databases, including the Minimum Information About a Biosynthetic Gene Cluster (MIBiG) repository and the Global Natural Products Social Molecular Networking (GNPS) platform. It is designed as a community-supported resource that provides a central repository for known natural product structures from microorganisms, and it is the first comprehensive, open access resource of this type. The Natural Products Atlas is expected to enable the development of new natural products discovery modalities and to accelerate the structural characterization of complex natural products libraries.