1
Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024]
Abstract
The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Manvita Mareboina
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
2
Bhandari M, Poelstra JW, Kauffman M, Varghese B, Helmy YA, Scaria J, Rajashekara G. Genomic Diversity, Antimicrobial Resistance, Plasmidome, and Virulence Profiles of Salmonella Isolated from Small Specialty Crop Farms Revealed by Whole-Genome Sequencing. Antibiotics (Basel) 2023; 12:1637. [PMID: 37998839 PMCID: PMC10668983 DOI: 10.3390/antibiotics12111637] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/14/2023] [Revised: 11/10/2023] [Accepted: 11/15/2023] [Indexed: 11/25/2023]
Abstract
Salmonella is the leading cause of death associated with foodborne illnesses in the USA. Difficulty in treating human salmonellosis is attributed to the development of antimicrobial resistance and the pathogenicity of Salmonella strains. Therefore, it is important to study the genetic landscape of Salmonella, such as the diversity, plasmids, and presence of antimicrobial resistance genes (AMRs) and virulence genes. To this end, we isolated Salmonella from environmental samples from small specialty crop farms (SSCFs) in Northeast Ohio from 2016 to 2021; 80 Salmonella isolates from 29 Salmonella-positive samples were subjected to whole-genome sequencing (WGS). In silico serotyping revealed the presence of 15 serotypes. AMR genes were detected in 15% of the samples, with 75% exhibiting phenotypic and genotypic multidrug resistance (MDR). Plasmid analysis demonstrated the presence of nine different types of plasmids, and 75% of AMR genes were located on plasmids. Interestingly, five Salmonella Newport isolates and one Salmonella Dublin isolate carried the ACSSuT gene cassette on a plasmid, which confers resistance to ampicillin, chloramphenicol, streptomycin, sulfonamide, and tetracycline. Overall, our results show that SSCFs are a potential reservoir of Salmonella with MDR genes. Thus, regular monitoring is needed to prevent the transmission of MDR Salmonella from SSCFs to humans.
Affiliation(s)
- Menuka Bhandari
  - Center for Food Animal Health, Department of Animal Sciences, College of Food, Agricultural, and Environmental Sciences, The Ohio State University, Wooster, OH 44691, USA
- Jelmer W. Poelstra
  - Molecular and Cellular Imaging Center, College of Food, Agricultural, and Environmental Sciences, The Ohio State University, Wooster, OH 44691, USA
- Michael Kauffman
  - Center for Food Animal Health, Department of Animal Sciences, College of Food, Agricultural, and Environmental Sciences, The Ohio State University, Wooster, OH 44691, USA
- Binta Varghese
  - Department of Veterinary Pathobiology, Oklahoma State University, Stillwater, OK 74074, USA
- Yosra A. Helmy
  - Department of Veterinary Science, Martin-Gatton College of Agriculture, Food and Environment, University of Kentucky, Lexington, KY 40546, USA
- Joy Scaria
  - Department of Veterinary Pathobiology, Oklahoma State University, Stillwater, OK 74074, USA
- Gireesh Rajashekara
  - Center for Food Animal Health, Department of Animal Sciences, College of Food, Agricultural, and Environmental Sciences, The Ohio State University, Wooster, OH 44691, USA
3
Van Etten J, Stephens TG, Bhattacharya D. A k-mer-Based Approach for Phylogenetic Classification of Taxa in Environmental Genomic Data. Syst Biol 2023; 72:1101-1118. [PMID: 37314057 DOI: 10.1093/sysbio/syad037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 03/20/2023] [Accepted: 06/12/2023] [Indexed: 06/15/2023]
Abstract
In the age of genome sequencing, whole-genome data is readily and frequently generated, leading to a wealth of new information that can be used to advance various fields of research. New approaches, such as alignment-free phylogenetic methods that utilize k-mer-based distance scoring, are becoming increasingly popular given their ability to rapidly generate phylogenetic information from whole-genome data. However, these methods have not yet been tested using environmental data, which often tends to be highly fragmented and incomplete. Here, we compare the results of one alignment-free approach (which utilizes the D2 statistic) to traditional multi-gene maximum likelihood trees in 3 algal groups that have high-quality genome data available. In addition, we simulate lower-quality, fragmented genome data using these algae to test method robustness to genome quality and completeness. Finally, we apply the alignment-free approach to environmental metagenome assembled genome data of unclassified Saccharibacteria and Trebouxiophyte algae, and single-cell amplified data from uncultured marine stramenopiles to demonstrate its utility with real datasets. We find that in all instances, the alignment-free method produces phylogenies that are comparable to, and often more informative than, those created using the traditional multi-gene approach. The k-mer-based method performs well even when there are significant missing data that include marker genes traditionally used for tree reconstruction. Our results demonstrate the value of alignment-free approaches for classifying novel, often cryptic or rare, species that may not be culturable or are difficult to access using single-cell methods, but fill important gaps in the tree of life.
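The abstract above names the D2 statistic without defining it. In its basic form, D2 is the inner product of the k-mer count vectors of two sequences. The following is a minimal Python sketch of that basic form only; the function names are illustrative, the input is assumed to be plain uppercase strings, and the study itself may well use a normalized variant of D2 rather than this raw score.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Basic (unnormalized) D2: sum over shared k-mers of count_a * count_b."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # k-mers absent from either sequence contribute zero, so only
    # the k-mers present in both need to be visited.
    return sum(n * counts_b[w] for w, n in counts_a.items() if w in counts_b)
```

A larger D2 means more shared k-mer content. Because the raw score grows with sequence length, alignment-free tools usually normalize it before converting it to a distance.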
Affiliation(s)
- Julia Van Etten
  - Graduate Program in Ecology and Evolution, Rutgers, The State University of New Jersey, 14 College Farm Road, New Brunswick, NJ 08901, USA
- Timothy G Stephens
  - Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, 59 Dudley Road, New Brunswick, NJ 08901, USA
- Debashish Bhattacharya
  - Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, 59 Dudley Road, New Brunswick, NJ 08901, USA
4
Fontana F, Alessandri G, Tarracchini C, Bianchi MG, Rizzo SM, Mancabelli L, Lugli GA, Argentini C, Vergna LM, Anzalone R, Longhi G, Viappiani A, Taurino G, Chiu M, Turroni F, Bussolati O, van Sinderen D, Milani C, Ventura M. Designation of optimal reference strains representing the infant gut bifidobacterial species through a comprehensive multi-omics approach. Environ Microbiol 2022; 24:5825-5839. [PMID: 36123315 PMCID: PMC10092070 DOI: 10.1111/1462-2920.16205] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/01/2022] [Accepted: 09/10/2022] [Indexed: 01/12/2023]
Abstract
The genomic era has resulted in the generation of a massive amount of genetic data concerning the genomic diversity of bacterial taxa. As a result, the microbiological community is increasingly looking for ways to define reference bacterial strains to perform experiments that are representative of the entire bacterial species. Despite this, there is currently no established approach allowing a reliable identification of reference strains based on a comprehensive genomic, ecological, and functional context. In the current study, we developed a comprehensive multi-omics approach that will allow the identification of the optimal reference strains using the Bifidobacterium genus as a test case. Strain tracking analysis based on 1664 shotgun metagenomics datasets of healthy infant faecal samples was employed to identify bifidobacterial strains suitable for in silico and in vitro analyses. Subsequently, an ad hoc bioinformatic tool was developed to screen local strain collections for the most suitable species-representative strain alternative. The approach presented here was validated using in vitro trials followed by metagenomics and metatranscriptomics analyses. Altogether, these results demonstrated the validity of the proposed model for reference strain selection, thus allowing improved in silico and in vitro investigations both in terms of cross-laboratory reproducibility and relevance of research findings.
Affiliation(s)
- Federico Fontana
  - Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
  - GenProbio srl, Parma, Italy
- Giulia Alessandri
  - Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
- Chiara Tarracchini
  - Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
- Sonia Mirjam Rizzo
  - Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
- Leonardo Mancabelli
  - Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
- Gabriele Andrea Lugli
  - Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
- Chiara Argentini
  - Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
- Laura Maria Vergna
  - Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
- Giulia Longhi
  - Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
  - GenProbio srl, Parma, Italy
- Giuseppe Taurino
  - Laboratory of General Pathology, Department of Medicine and Surgery, University of Parma, Parma, Italy
  - Microbiome Research Hub, University of Parma, Parma, Italy
- Martina Chiu
  - Laboratory of General Pathology, Department of Medicine and Surgery, University of Parma, Parma, Italy
- Francesca Turroni
  - Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
  - Microbiome Research Hub, University of Parma, Parma, Italy
- Ovidio Bussolati
  - Laboratory of General Pathology, Department of Medicine and Surgery, University of Parma, Parma, Italy
  - Microbiome Research Hub, University of Parma, Parma, Italy
- Douwe van Sinderen
  - APC Microbiome Institute and School of Microbiology, Bioscience Institute, National University of Ireland, Cork, Ireland
- Christian Milani
  - Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
  - Microbiome Research Hub, University of Parma, Parma, Italy
- Marco Ventura
  - Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
  - Microbiome Research Hub, University of Parma, Parma, Italy
5
Ma Z, Lu YY, Wang Y, Lin R, Yang Z, Zhang F, Wang Y. Metric learning for comparing genomic data with triplet network. Brief Bioinform 2022; 23:6679451. [DOI: 10.1093/bib/bbac345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 07/20/2022] [Accepted: 07/26/2022] [Indexed: 11/13/2022]
Abstract
Many biological applications are essentially pairwise comparison problems, such as evolutionary relationships on genomic sequences, contig binning on metagenomic data, cell type identification on gene expression profiles of single cells, etc. To make pairwise comparisons, it is necessary to adopt a suitable dissimilarity metric. However, not all the metrics can be fully adapted to all possible biological applications. It is necessary to employ metric learning based on data adaptive to the application of interest. Therefore, in this study, we proposed MEtric Learning with Triplet network (MELT), which learns a nonlinear mapping from original space to the embedding space in order to keep similar data closer and dissimilar data far apart. MELT is a weakly supervised and data-driven comparison framework that offers more adaptive and accurate dissimilarity learned in the absence of the label information when the supervised methods are not applicable. We applied MELT in three typical applications of genomic data comparison, including hierarchical genomic sequences, longitudinal microbiome samples and longitudinal single-cell gene expression profiles, which have no distinctive grouping information. In the experiments, MELT demonstrated its empirical utility in comparison to many widely used dissimilarity metrics. MELT is also expected to accommodate a more extensive set of applications in large-scale genomic comparisons. MELT is available at https://github.com/Ying-Lab/MELT.
Affiliation(s)
- Zhi Ma
  - Department of Automation, Xiamen University, China
  - National Institute for Data Science in Health and Medicine, Xiamen University
- Yang Young Lu
  - Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
- Yiwen Wang
  - Department of Automation, Xiamen University, China
- Renhao Lin
  - Department of Automation, Xiamen University, China
- Zizi Yang
  - Department of Automation, Xiamen University, China
- Fang Zhang
  - Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
- Ying Wang
  - Department of Automation, Xiamen University, China
  - National Institute for Data Science in Health and Medicine, Xiamen University
  - Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Xiamen, Fujian 361005, China
  - Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, Xiamen, 361100, China
6
Lo R, Dougan KE, Chen Y, Shah S, Bhattacharya D, Chan CX. Alignment-Free Analysis of Whole-Genome Sequences From Symbiodiniaceae Reveals Different Phylogenetic Signals in Distinct Regions. Front Plant Sci 2022; 13:815714. [PMID: 35557718 PMCID: PMC9087856 DOI: 10.3389/fpls.2022.815714] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/15/2021] [Accepted: 04/04/2022] [Indexed: 05/24/2023]
Abstract
Dinoflagellates of the family Symbiodiniaceae are predominantly essential symbionts of corals and other marine organisms. Recent research reveals extensive genome sequence divergence among Symbiodiniaceae taxa and high phylogenetic diversity hidden behind subtly different cell morphologies. Using an alignment-free phylogenetic approach based on sub-sequences of fixed length k (i.e. k-mers), we assessed the phylogenetic signal among whole-genome sequences from 16 Symbiodiniaceae taxa (including the genera of Symbiodinium, Breviolum, Cladocopium, Durusdinium and Fugacium) and two strains of Polarella glacialis as outgroup. Based on phylogenetic trees inferred from k-mers in distinct genomic regions (i.e. repeat-masked genome sequences, protein-coding sequences, introns and repeats) and in protein sequences, the phylogenetic signal associated with protein-coding DNA and the encoded amino acids is largely consistent with the Symbiodiniaceae phylogeny based on established markers, such as large subunit rRNA. The other genome sequences (introns and repeats) exhibit distinct phylogenetic signals, supporting the expected differential evolutionary pressure acting on these regions. Our analysis of conserved core k-mers revealed the prevalence of conserved k-mers (>95% core 23-mers among all 18 genomes) in annotated repeats and non-genic regions of the genomes. We observed 180 distinct repeat types that are significantly enriched in genomes of the symbiotic versus free-living Symbiodinium taxa, suggesting an enhanced activity of transposable elements linked to the symbiotic lifestyle. We provide evidence that representation of alignment-free phylogenies as dynamic networks enhances the ability to generate new hypotheses about genome evolution in Symbiodiniaceae. 
These results demonstrate the potential of alignment-free phylogenetic methods as a scalable approach for inferring comprehensive, unbiased whole-genome phylogenies of dinoflagellates and more broadly of microbial eukaryotes.
Affiliation(s)
- Rosalyn Lo
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Katherine E. Dougan
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Yibi Chen
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Sarah Shah
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Debashish Bhattacharya
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
- Cheong Xin Chan
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
7
Dougan KE, González-Pech RA, Stephens TG, Shah S, Chen Y, Ragan MA, Bhattacharya D, Chan CX. Genome-powered classification of microbial eukaryotes: focus on coral algal symbionts. Trends Microbiol 2022; 30:831-840. [DOI: 10.1016/j.tim.2022.02.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/10/2021] [Revised: 01/20/2022] [Accepted: 02/01/2022] [Indexed: 12/20/2022]
8
Tang D, Li Y, Tan D, Fu J, Tang Y, Lin J, Zhao R, Du H, Zhao Z. KCOSS: an ultra-fast k-mer counter for assembled genome analysis. Bioinformatics 2022; 38:933-940. [PMID: 34849595 DOI: 10.1093/bioinformatics/btab797] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/13/2021] [Revised: 10/13/2021] [Accepted: 11/19/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION: The k-mer frequency in whole genome sequences provides researchers with an insightful perspective on genomic complexity, comparative genomics, metagenomics and phylogeny. The current k-mer counting tools are typically slow, and they require large memory and hard disk for assembled genome analysis. RESULTS: We propose a novel and ultra-fast k-mer counting algorithm, KCOSS, to fulfill k-mer counting mainly for assembled genomes using a segmented Bloom filter, a lock-free queue, a lock-free thread pool and a cuckoo hash table. We optimize running time and memory consumption by recycling memory blocks, merging multiple consecutive first-occurrence k-mers into a C-read, and writing a set of C-reads to disk asynchronously. KCOSS was comparatively tested with Jellyfish2, CHTKC and KMC3 on seven assembled genomes and three sequencing datasets in running time, memory consumption, and hard disk occupation. The experimental results show that KCOSS counts k-mers with less memory and disk while having a shorter running time on assembled genomes. KCOSS can be used to calculate the k-mer frequency not only for assembled genomes but also for sequencing data. AVAILABILITY AND IMPLEMENTATION: The KCOSS software is implemented in C++. It is freely available on GitHub: https://github.com/kcoss-2021/KCOSS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
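The abstract above combines a Bloom filter with a hash table. The general idea behind that pairing can be sketched as follows: a Bloom filter remembers which k-mers have been seen once, so only k-mers seen again need a slot in the hash table. This is a simplified, single-threaded Python illustration of that general idea and not the KCOSS implementation; the class and function names, the filter size, and the use of BLAKE2b hashing are all assumptions made for the example, and the segmented filter, lock-free structures, cuckoo hashing and C-reads are omitted.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Fixed-size Bloom filter over strings (illustrative parameters)."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting BLAKE2b differently each time.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.n_bits

    def add(self, item):
        """Insert item; return True if it was (probably) already present."""
        seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


def count_repeated_kmers(seq, k, bloom=None):
    """Return counts for k-mers that occur at least twice in seq."""
    if bloom is None:
        bloom = BloomFilter()
    extra = Counter()  # occurrences after the first one
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if bloom.add(kmer):
            extra[kmer] += 1
    # The first occurrence only went into the Bloom filter, so add it back.
    return {kmer: n + 1 for kmer, n in extra.items()}
```

Single-occurrence k-mers never reach the hash table, which is where the memory saving comes from. Bloom filters admit false positives, so a rare singleton can be reported with a count of 2; exact counters typically correct this with a verification step.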
Affiliation(s)
- Deyou Tang
  - School of Software Engineering, South China University of Technology, Guangzhou, Guangdong 510006, China
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Yucheng Li
  - School of Software Engineering, South China University of Technology, Guangzhou, Guangdong 510006, China
- Daqiang Tan
  - School of Software Engineering, South China University of Technology, Guangzhou, Guangdong 510006, China
- Juan Fu
  - School of Medicine, South China University of Technology, Guangzhou, Guangdong 510006, China
- Yelei Tang
  - School of Software Engineering, South China University of Technology, Guangzhou, Guangdong 510006, China
- Jiabin Lin
  - School of Software Engineering, South China University of Technology, Guangzhou, Guangdong 510006, China
- Rong Zhao
  - School of Software Engineering, South China University of Technology, Guangzhou, Guangdong 510006, China
- Hongli Du
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, Guangdong 510006, China
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA
9
Kauffman KM, Chang WK, Brown JM, Hussain FA, Yang J, Polz MF, Kelly L. Resolving the structure of phage-bacteria interactions in the context of natural diversity. Nat Commun 2022; 13:372. [PMID: 35042853 PMCID: PMC8766483 DOI: 10.1038/s41467-021-27583-z] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 08/10/2020] [Accepted: 11/12/2021] [Indexed: 12/12/2022]
Abstract
Microbial communities are shaped by viral predators. Yet, resolving which viruses (phages) and bacteria are interacting is a major challenge in the context of natural levels of microbial diversity. Thus, fundamental features of how phage-bacteria interactions are structured and evolve in the wild remain poorly resolved. Here we use large-scale isolation of environmental marine Vibrio bacteria and their phages to obtain estimates of strain-level phage predator loads, and use all-by-all host range assays to discover how phage and host genomic diversity shape interactions. We show that lytic interactions in environmental interaction networks (as observed in agar overlay) are sparse, with phage predator loads being low for most bacterial strains, and phages being host-strain-specific. Paradoxically, we also find that although overlap in killing is generally rare between tailed phages, recombination is common. Together, these results suggest that recombination during cryptic co-infections is an important mode of phage evolution in microbial communities. In the development of phages for bioengineering and therapeutics it is important to consider that nucleic acids of introduced phages may spread into local phage populations through recombination, and that the likelihood of transfer is not predictable based on lytic host range.
Affiliation(s)
- Kathryn M Kauffman
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
  - Department of Oral Biology, The University at Buffalo, Buffalo, NY, 14214, USA
- William K Chang
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Julia M Brown
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
  - Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, 04544, USA
- Fatima A Hussain
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
  - Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, 02139, USA
- Joy Yang
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Martin F Polz
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
  - Division of Microbial Ecology, Department of Microbiology and Ecosystem Science, Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria
- Libusha Kelly
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
  - Department of Microbiology and Immunology, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
10
Cummings TFM, Gori K, Sanchez-Pulido L, Gavriilidis G, Moi D, Wilson AR, Murchison E, Dessimoz C, Ponting CP, Christophorou MA. Citrullination Was Introduced into Animals by Horizontal Gene Transfer from Cyanobacteria. Mol Biol Evol 2021; 39:6420225. [PMID: 34730808 PMCID: PMC8826395 DOI: 10.1093/molbev/msab317] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/29/2022]
Abstract
Protein posttranslational modifications add great sophistication to biological systems. Citrullination, a key regulatory mechanism in human physiology and pathophysiology, is enigmatic from an evolutionary perspective. Although the citrullinating enzymes peptidylarginine deiminases (PADIs) are ubiquitous across vertebrates, they are absent from yeast, worms, and flies. Based on this distribution PADIs were proposed to have been horizontally transferred, but this has been contested. Here, we map the evolutionary trajectory of PADIs into the animal lineage. We present strong phylogenetic support for a clade encompassing animal and cyanobacterial PADIs that excludes fungal and other bacterial homologs. The animal and cyanobacterial PADI proteins share functionally relevant primary and tertiary synapomorphic sequences that are distinct from a second PADI type present in fungi and actinobacteria. Molecular clock calculations and sequence divergence analyses using the fossil record estimate the last common ancestor of the cyanobacterial and animal PADIs to be less than 1 billion years old. Additionally, under an assumption of vertical descent, PADI sequence change during this evolutionary time frame is anachronistically low, even when compared with products of likely endosymbiont gene transfer, mitochondrial proteins, and some of the most highly conserved sequences in life. The consilience of evidence indicates that PADIs were introduced from cyanobacteria into animals by horizontal gene transfer (HGT). The ancestral cyanobacterial PADI is enzymatically active and can citrullinate eukaryotic proteins, suggesting that the PADI HGT event introduced a new catalytic capability into the regulatory repertoire of animals. This study reveals the unusual evolution of a pleiotropic protein modification.
Collapse
Affiliation(s)
- Thomas F M Cummings
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom,Corresponding authors: E-mails: ;
| | - Kevin Gori
- Transmissible Cancer Group, Department of Veterinary Medicine, Cambridge, United Kingdom
| | - Luis Sanchez-Pulido
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
| | - Gavriil Gavriilidis
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
| | - David Moi
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland,Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Abigail R Wilson
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
| | - Elizabeth Murchison
- Transmissible Cancer Group, Department of Veterinary Medicine, Cambridge, United Kingdom
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland,Swiss Institute of Bioinformatics, Lausanne, Switzerland,Department of Genetics Evolution and Environment, University College London, London, United Kingdom,Department of Computer Science, University College London, London, United Kingdom
| | - Chris P Ponting
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
| | - Maria A Christophorou
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom; Epigenetics Department, The Babraham Institute, Cambridge, United Kingdom (corresponding author)
|
11
|
Allen JP, Snitkin E, Pincus NB, Hauser AR. Forest and Trees: Exploring Bacterial Virulence with Genome-wide Association Studies and Machine Learning. Trends Microbiol 2021; 29:621-633. [PMID: 33455849 PMCID: PMC8187264 DOI: 10.1016/j.tim.2020.12.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/28/2020] [Revised: 12/07/2020] [Accepted: 12/08/2020] [Indexed: 12/15/2022]
Abstract
The advent of inexpensive and rapid sequencing technologies has allowed bacterial whole-genome sequences to be generated at an unprecedented pace. This wealth of information has revealed an unanticipated degree of strain-to-strain genetic diversity within many bacterial species. Awareness of this genetic heterogeneity has corresponded with a greater appreciation of intraspecies variation in virulence. A number of comparative genomic strategies have been developed to link these genotypic and pathogenic differences with the aim of discovering novel virulence factors. Here, we review recent advances in comparative genomic approaches to identify bacterial virulence determinants, with a focus on genome-wide association studies and machine learning.
Affiliation(s)
- Jonathan P Allen
- Department of Microbiology and Immunology, Loyola University Chicago Stritch School of Medicine, Maywood, IL 60153, USA.
| | - Evan Snitkin
- Department of Microbiology and Immunology, Department of Internal Medicine/Division of Infectious Diseases, University of Michigan, Ann Arbor, MI 48109, USA
| | - Nathan B Pincus
- Department of Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Alan R Hauser
- Department of Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA; Department of Medicine/Division of Infectious Diseases, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
|
12
|
Tay AP, Hosking B, Hosking C, Bauer DC, Wilson LO. INSIDER: alignment-free detection of foreign DNA sequences. Comput Struct Biotechnol J 2021; 19:3810-3816. [PMID: 34285780 PMCID: PMC8273350 DOI: 10.1016/j.csbj.2021.06.045] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/15/2021] [Revised: 06/28/2021] [Accepted: 06/28/2021] [Indexed: 11/21/2022]
Abstract
External DNA sequences can be inserted into an organism's genome either through natural processes such as gene transfer, or through targeted genome engineering strategies. Being able to robustly identify such foreign DNA is a crucial capability for health and biosecurity applications, such as anti-microbial resistance (AMR) detection or monitoring gene drives. This capability does not exist for poorly characterised host genomes or with limited information about the integrated sequence. To address this, we developed the INserted Sequence Information DEtectoR (INSIDER). INSIDER analyses whole genome sequencing data and identifies segments of potentially foreign origin by their significant shift in k-mer signatures. We demonstrate the power of INSIDER to separate integrated DNA sequences from normal genomic sequences on a synthetic dataset simulating the insertion of a CRISPR-Cas gene drive into wild-type yeast. As a proof-of-concept, we use INSIDER to detect the exact AMR plasmid in whole genome sequencing data from a Citrobacter freundii patient isolate. INSIDER streamlines the process of identifying integrated DNA in poorly characterised wild species or when the insert is of unknown origin, thus enhancing the monitoring of emerging biosecurity threats.
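The idea of locating segments by a shift in k-mer signature can be illustrated with a minimal Python sketch. This is not INSIDER's actual algorithm: the function names, the Euclidean distance, the window and step sizes, and the z-score cutoff are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k):
    """Return the relative k-mer frequencies of seq as a sparse dict."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def profile_distance(p, q):
    """Euclidean distance between two sparse frequency profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def flag_windows(genome, k=3, window=200, step=100, z_cutoff=2.0):
    """Flag windows whose k-mer profile deviates from the genome-wide one.

    Window size, step, and cutoff are illustrative assumptions only.
    """
    background = kmer_profile(genome, k)
    starts = list(range(0, len(genome) - window + 1, step))
    dists = [profile_distance(kmer_profile(genome[s:s + window], k), background)
             for s in starts]
    if not dists:
        return []
    mean = sum(dists) / len(dists)
    sd = sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    if sd == 0:
        return []
    # Keep windows whose distance is an outlier relative to all windows.
    return [(s, s + window) for s, d in zip(starts, dists)
            if (d - mean) / sd > z_cutoff]
```

Calling `flag_windows` on a sequence returns a list of `(start, end)` coordinates; a real tool would need a statistically grounded significance test rather than a fixed z-score.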
Affiliation(s)
- Aidan P. Tay
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, New South Wales, Sydney, Australia
- Applied BioSciences, Faculty of Science and Engineering, Macquarie University, New South Wales, Sydney, Australia
| | - Brendan Hosking
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, New South Wales, Sydney, Australia
| | - Cameron Hosking
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, New South Wales, Sydney, Australia
| | - Denis C. Bauer
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, New South Wales, Sydney, Australia
- Department of Biomedical Sciences, Macquarie University, New South Wales, Sydney, Australia
- Applied BioSciences, Faculty of Science and Engineering, Macquarie University, New South Wales, Sydney, Australia
| | - Laurence O.W. Wilson
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, New South Wales, Sydney, Australia
- Applied BioSciences, Faculty of Science and Engineering, Macquarie University, New South Wales, Sydney, Australia
|
13
|
Lu YY, Bai J, Wang Y, Wang Y, Sun F. CRAFT: Compact genome Representation toward large-scale Alignment-Free daTabase. Bioinformatics 2021; 37:155-161. [PMID: 32766810 DOI: 10.1093/bioinformatics/btaa699] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/18/2019] [Revised: 03/11/2020] [Accepted: 07/28/2020] [Indexed: 01/02/2023]
Abstract
MOTIVATION Rapid developments in sequencing technologies have boosted generating high volumes of sequence data. To archive and analyze those data, one primary step is sequence comparison. Alignment-free sequence comparison based on k-mer frequencies offers a computationally efficient solution, yet in practice, the k-mer frequency vectors for large k of practical interest lead to excessive memory and storage consumption. RESULTS We report CRAFT, a general genomic/metagenomic search engine to learn compact representations of sequences and perform fast comparison between DNA sequences. Specifically, given genome or high throughput sequencing data as input, CRAFT maps the data into a much smaller embedding space and locates the best matching genome in the archived massive sequence repositories. With a 10^2- to 10^4-fold reduction of storage space, CRAFT performs fast query for gigabytes of data within seconds or minutes, achieving comparable performance as six state-of-the-art alignment-free measures. AVAILABILITY AND IMPLEMENTATION CRAFT offers a user-friendly graphical user interface with one-click installation on Windows and Linux operating systems, freely available at https://github.com/jiaxingbai/CRAFT. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
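The memory problem described here follows from the size of a dense DNA k-mer frequency vector, which has 4^k entries. The Python sketch below shows that growth and one generic way to fold k-mer counts into a fixed-size vector. Feature hashing is swapped in as a stand-in and is not CRAFT's learned embedding; the function names and the `dim` value are assumptions.

```python
import zlib


def full_vector_size(k, alphabet_size=4):
    """Number of entries in a dense k-mer frequency vector."""
    return alphabet_size ** k


def hashed_kmer_vector(seq, k, dim=1024):
    """Fold k-mer counts into `dim` buckets (feature hashing).

    Different k-mers may collide in one bucket, which trades accuracy
    for a fixed, small memory footprint.
    """
    vec = [0] * dim
    for i in range(len(seq) - k + 1):
        bucket = zlib.crc32(seq[i:i + k].encode("ascii")) % dim
        vec[bucket] += 1
    return vec


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

For example, `full_vector_size(12)` is 16,777,216 entries per sequence, whereas the hashed vector stays at `dim` entries regardless of k.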
Affiliation(s)
- Yang Young Lu
- Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
| | - Jiaxing Bai
- Department of Automation, Xiamen University, Xiamen 361000, China
| | - Yiwen Wang
- Department of Automation, Xiamen University, Xiamen 361000, China
| | - Ying Wang
- Department of Automation, Xiamen University, Xiamen 361000, China; Xiamen Key Lab. of Big Data Intelligent Analysis and Decision, Xiamen 361000, China
| | - Fengzhu Sun
- Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
|
14
|
Jacobus AP, Stephens TG, Youssef P, González-Pech R, Ciccotosto-Camp MM, Dougan KE, Chen Y, Basso LC, Frazzon J, Chan CX, Gross J. Comparative Genomics Supports That Brazilian Bioethanol Saccharomyces cerevisiae Comprise a Unified Group of Domesticated Strains Related to Cachaça Spirit Yeasts. Front Microbiol 2021; 12:644089. [PMID: 33936002 PMCID: PMC8082247 DOI: 10.3389/fmicb.2021.644089] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/19/2020] [Accepted: 03/08/2021] [Indexed: 01/05/2023]
Abstract
Ethanol production from sugarcane is a key renewable fuel industry in Brazil. Major drivers of this alcoholic fermentation are Saccharomyces cerevisiae strains that originally were contaminants to the system and yet prevail in the industrial process. Here we present newly sequenced genomes (using Illumina short-read and PacBio long-read data) of two monosporic isolates (H3 and H4) of the S. cerevisiae PE-2, a predominant bioethanol strain in Brazil. The assembled genomes of H3 and H4, together with 42 draft genomes of sugarcane-fermenting (fuel ethanol plus cachaça) strains, were compared against those of the reference S288C and diverse S. cerevisiae. All genomes of bioethanol yeasts have amplified SNO2(3)/SNZ2(3) gene clusters for vitamin B1/B6 biosynthesis, and display ubiquitous presence of a particular family of SAM-dependent methyl transferases, rare in S. cerevisiae. Widespread amplifications of quinone oxidoreductases YCR102C/YLR460C/YNL134C, and the structural or punctual variations among aquaporins and components of the iron homeostasis system, likely represent adaptations to industrial fermentation. Interesting is the pervasive presence among the bioethanol/cachaça strains of a five-gene cluster (Region B) that is a known phylogenetic signature of European wine yeasts. Combining genomes of H3, H4, and 195 yeast strains, we comprehensively assessed whole-genome phylogeny of these taxa using an alignment-free approach. The 197-genome phylogeny substantiates that bioethanol yeasts are monophyletic and closely related to the cachaça and wine strains. Our results support the hypothesis that biofuel-producing yeasts in Brazil may have been co-opted from a pool of yeasts that were pre-adapted to alcoholic fermentation of sugarcane for the distillation of cachaça spirit, which historically is a much older industry than the large-scale fuel ethanol production.
Affiliation(s)
- Ana Paula Jacobus
- Laboratory for Genomics and Experimental Evolution of Yeasts, Institute for Bioenergy Research, São Paulo State University, Rio Claro, Brazil
| | - Timothy G Stephens
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Pierre Youssef
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Raul González-Pech
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Michael M Ciccotosto-Camp
- Australian Centre for Ecogenomics, The University of Queensland, Brisbane, QLD, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
| | - Katherine E Dougan
- Australian Centre for Ecogenomics, The University of Queensland, Brisbane, QLD, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
| | - Yibi Chen
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia; Australian Centre for Ecogenomics, The University of Queensland, Brisbane, QLD, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
| | - Luiz Carlos Basso
- Biological Science Department, Escola Superior de Agricultura Luiz de Queiroz, University of São Paulo (USP), Piracicaba, Brazil
| | - Jeverson Frazzon
- Institute of Food Science and Technology, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
| | - Cheong Xin Chan
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia; Australian Centre for Ecogenomics, The University of Queensland, Brisbane, QLD, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
| | - Jeferson Gross
- Laboratory for Genomics and Experimental Evolution of Yeasts, Institute for Bioenergy Research, São Paulo State University, Rio Claro, Brazil
|
15
|
González-Pech RA, Stephens TG, Chen Y, Mohamed AR, Cheng Y, Shah S, Dougan KE, Fortuin MDA, Lagorce R, Burt DW, Bhattacharya D, Ragan MA, Chan CX. Comparison of 15 dinoflagellate genomes reveals extensive sequence and structural divergence in family Symbiodiniaceae and genus Symbiodinium. BMC Biol 2021; 19:73. [PMID: 33849527 PMCID: PMC8045281 DOI: 10.1186/s12915-021-00994-6] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 11/24/2020] [Accepted: 02/25/2021] [Indexed: 02/07/2023]
Abstract
Background Dinoflagellates in the family Symbiodiniaceae are important photosynthetic symbionts in cnidarians (such as corals) and other coral reef organisms. Breakdown of the coral-dinoflagellate symbiosis due to environmental stress (i.e. coral bleaching) can lead to coral death and the potential collapse of reef ecosystems. However, evolution of Symbiodiniaceae genomes, and its implications for the coral, is little understood. Genome sequences of Symbiodiniaceae remain scarce due in part to their large genome sizes (1–5 Gbp) and idiosyncratic genome features. Results Here, we present de novo genome assemblies of seven members of the genus Symbiodinium, of which two are free-living, one is an opportunistic symbiont, and the remainder are mutualistic symbionts. Integrating other available data, we compare 15 dinoflagellate genomes revealing high sequence and structural divergence. Divergence among some Symbiodinium isolates is comparable to that among distinct genera of Symbiodiniaceae. We also recovered hundreds of gene families specific to each lineage, many of which encode unknown functions. An in-depth comparison between the genomes of the symbiotic Symbiodinium tridacnidorum (isolated from a coral) and the free-living Symbiodinium natans reveals a greater prevalence of transposable elements, genetic duplication, structural rearrangements, and pseudogenisation in the symbiotic species. Conclusions Our results underscore the potential impact of lifestyle on lineage-specific gene-function innovation, genome divergence, and the diversification of Symbiodinium and Symbiodiniaceae. The divergent features we report, and their putative causes, may also apply to other microbial eukaryotes that have undergone symbiotic phases in their evolutionary history. Supplementary Information The online version contains supplementary material available at 10.1186/s12915-021-00994-6.
Affiliation(s)
- Raúl A González-Pech
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia; Present address: Department of Integrative Biology, University of South Florida, Tampa, FL, 33620, USA
| | - Timothy G Stephens
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia; Present address: Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, 08901, USA
| | - Yibi Chen
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia; Australian Centre for Ecogenomics, The University of Queensland, Brisbane, QLD, 4072, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Amin R Mohamed
- Commonwealth Scientific and Industrial Research Organisation (CSIRO) Agriculture and Food, Queensland Bioscience Precinct, St Lucia, QLD, 4072, Australia; Present address: Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Yuanyuan Cheng
- UQ Genomics Initiative, The University of Queensland, Brisbane, QLD, 4072, Australia; Present address: School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW, 2006, Australia
| | - Sarah Shah
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia; Australian Centre for Ecogenomics, The University of Queensland, Brisbane, QLD, 4072, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Katherine E Dougan
- Australian Centre for Ecogenomics, The University of Queensland, Brisbane, QLD, 4072, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Michael D A Fortuin
- Australian Centre for Ecogenomics, The University of Queensland, Brisbane, QLD, 4072, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Rémi Lagorce
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia; École Polytechnique Universitaire de l'Université de Nice, Université Nice-Sophia-Antipolis, 06410, Nice, Provence-Alpes-Côte d'Azur, France
| | - David W Burt
- UQ Genomics Initiative, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Debashish Bhattacharya
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, 08901, USA
| | - Mark A Ragan
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Cheong Xin Chan
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia; Australian Centre for Ecogenomics, The University of Queensland, Brisbane, QLD, 4072, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, 4072, Australia
|
16
|
Bize A, Midoux C, Mariadassou M, Schbath S, Forterre P, Da Cunha V. Exploring short k-mer profiles in cells and mobile elements from Archaea highlights the major influence of both the ecological niche and evolutionary history. BMC Genomics 2021; 22:186. [PMID: 33726663 PMCID: PMC7962313 DOI: 10.1186/s12864-021-07471-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/13/2020] [Accepted: 02/24/2021] [Indexed: 12/16/2022]
Abstract
BACKGROUND K-mer-based methods have greatly advanced in recent years, largely driven by the realization of their biological significance and by the advent of next-generation sequencing. Their speed and their independence from the annotation process are major advantages. Their utility in the study of the mobilome has recently emerged and they seem a priori adapted to the patchy gene distribution and the lack of universal marker genes of viruses and plasmids. To provide a framework for the interpretation of results from k-mer based methods applied to archaea or their mobilome, we analyzed the 5-mer DNA profiles of close to 600 archaeal cells, viruses and plasmids. Archaea is one of the three domains of life. Archaea seem enriched in extremophiles and are associated with a high diversity of viral and plasmid families, many of which are specific to this domain. We explored the dataset structure by multivariate and statistical analyses, seeking to identify the underlying factors. RESULTS For cells, the 5-mer profiles were inconsistent with the phylogeny of archaea. At a finer taxonomic level, the influence of the taxonomy and the environmental constraints on 5-mer profiles was very strong. These two factors were interdependent to a significant extent, and the respective weights of their contributions varied according to the clade. A convergent adaptation was observed for the class Halobacteria, for which a strong 5-mer signature was identified. For mobile elements, coevolution with the host had a clear influence on their 5-mer profile. This enabled us to identify one previously known and one new case of recent host transfer based on the atypical composition of the mobile elements involved. Beyond the effect of coevolution, extrachromosomal elements strikingly retain the specific imprint of their own viral or plasmid taxonomic family in their 5-mer profile. 
CONCLUSION This specific imprint confirms that the evolution of extrachromosomal elements is driven by multiple parameters and is not restricted to host adaptation. In addition, we detected only recent host transfer events, suggesting the fast evolution of short k-mer profiles. This calls for caution when using k-mers for host prediction, metagenomic binning or phylogenetic reconstruction.
Affiliation(s)
- Ariane Bize
- Université Paris-Saclay, INRAE, PROSE, F-92761, Antony, France
| | - Cédric Midoux
- Université Paris-Saclay, INRAE, PROSE, F-92761, Antony, France; Université Paris-Saclay, INRAE, MaIAGE, F-78350, Jouy-en-Josas, France; Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, F-78350, Jouy-en-Josas, France
| | - Mahendra Mariadassou
- Université Paris-Saclay, INRAE, MaIAGE, F-78350, Jouy-en-Josas, France; Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, F-78350, Jouy-en-Josas, France
| | - Sophie Schbath
- Université Paris-Saclay, INRAE, MaIAGE, F-78350, Jouy-en-Josas, France; Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, F-78350, Jouy-en-Josas, France
| | - Patrick Forterre
- Institut Pasteur, Unité de Virologie des Archées, Département de Microbiologie, 25 Rue du Docteur Roux, 75015, Paris, France; Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Violette Da Cunha
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
|
17
|
Arredondo-Alonso S, Top J, Corander J, Willems RJL, Schürch AC. Mode and dynamics of vanA-type vancomycin resistance dissemination in Dutch hospitals. Genome Med 2021; 13:9. [PMID: 33472670 PMCID: PMC7816424 DOI: 10.1186/s13073-020-00825-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 07/06/2020] [Accepted: 12/30/2020] [Indexed: 12/15/2022]
Abstract
BACKGROUND Enterococcus faecium is a commensal of the gastrointestinal tract of animals and humans but also a causative agent of hospital-acquired infections. Resistance against glycopeptides and to vancomycin has motivated the inclusion of E. faecium in the WHO global priority list. Vancomycin resistance can be conferred by the vanA gene cluster on the transposon Tn1546, which is frequently present in plasmids. The vanA gene cluster can be disseminated clonally but also horizontally either by plasmid dissemination or by Tn1546 transposition between different genomic locations. METHODS We performed a retrospective study of the genomic epidemiology of 309 vancomycin-resistant E. faecium (VRE) isolates across 32 Dutch hospitals (2012-2015). Genomic information regarding clonality and Tn1546 characterization was extracted using hierBAPS sequence clusters (SC) and TETyper, respectively. Plasmids were predicted using gplas in combination with a network approach based on shared k-mer content. Next, we conducted a pairwise comparison between isolates sharing a potential epidemiological link to elucidate whether clonal, plasmid, or Tn1546 spread accounted for vanA-type resistance dissemination. RESULTS On average, we estimated that 59% of VRE cases with a potential epidemiological link were unrelated, which was defined as VRE pairs with a distinct Tn1546 variant. Clonal dissemination accounted for 32% of cases, in which the same SC and Tn1546 variants were identified. Horizontal plasmid dissemination accounted for 7% of VRE cases, in which we observed VRE pairs belonging to a distinct SC but carrying an identical plasmid and Tn1546 variant. In 2% of cases, we observed the same Tn1546 variant in distinct SC and plasmid types, which could be explained by mixed and consecutive events of clonal and plasmid dissemination. CONCLUSIONS In related VRE cases, the dissemination of the vanA gene cluster in Dutch hospitals between 2012 and 2015 was dominated by clonal spread. However, we also identified outbreak settings with high frequencies of plasmid dissemination in which the spread of resistance was mainly driven by horizontal gene transfer (HGT). This study demonstrates the feasibility of distinguishing between modes of dissemination with short-read data and provides a novel assessment to estimate the relative contribution of nested genomic elements in the dissemination of vanA-type resistance.
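The pairwise categories defined in the RESULTS can be written as a small decision rule. The Python sketch below is only an illustration of those definitions: the `Isolate` type, its field names, and the category labels are hypothetical, and the order in which the criteria are checked is an assumption where the abstract leaves it open.

```python
from typing import NamedTuple


class Isolate(NamedTuple):
    sequence_cluster: str   # hypothetical label, e.g. a hierBAPS SC
    plasmid_type: str       # hypothetical label of the predicted plasmid
    tn1546_variant: str     # hypothetical label of the Tn1546 variant


def classify_pair(a: Isolate, b: Isolate) -> str:
    """Assign an epidemiologically linked VRE pair to a dissemination mode."""
    if a.tn1546_variant != b.tn1546_variant:
        return "unrelated"   # distinct Tn1546 variant
    if a.sequence_cluster == b.sequence_cluster:
        return "clonal"      # same SC and same Tn1546 variant
    if a.plasmid_type == b.plasmid_type:
        return "plasmid"     # distinct SC, identical plasmid and Tn1546
    return "mixed"           # same Tn1546 in distinct SC and plasmid types
```

Counting the returned labels over all linked pairs would give proportions of the kind reported in the abstract.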
Affiliation(s)
- Sergio Arredondo-Alonso
- Department of Medical Microbiology, University Medical Center Utrecht, Utrecht, The Netherlands; Department of Biostatistics, University of Oslo, Oslo, Norway
| | - Janetta Top
- Department of Medical Microbiology, University Medical Center Utrecht, Utrecht, The Netherlands
| | - Jukka Corander
- Department of Biostatistics, University of Oslo, Oslo, Norway; Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, UK; Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), FI-00014 University of Helsinki, Helsinki, Finland
| | - Rob J L Willems
- Department of Medical Microbiology, University Medical Center Utrecht, Utrecht, The Netherlands
| | - Anita C Schürch
- Department of Medical Microbiology, University Medical Center Utrecht, Utrecht, The Netherlands.
|
18
|
Abstract
Inferring phylogenetic relationships among hundreds or thousands of microbial genomes is an increasingly common task. The conventional phylogenetic approach adopts multiple sequence alignment to compare gene-by-gene, concatenated multigene or whole-genome sequences, from which a phylogenetic tree would be inferred. These alignments follow the implicit assumption of full-length contiguity among homologous sequences. However, common events in microbial genome evolution (e.g., structural rearrangements and genetic recombination) violate this assumption. Moreover, aligning hundreds or thousands of sequences is computationally intensive and not scalable to the rate at which genome data are generated. Therefore, alignment-free methods present an attractive alternative strategy. Here we describe a scalable alignment-free strategy to infer phylogenetic relationships using complete genome sequences of bacteria and archaea, based on short subsequences of length k (k-mers). We describe how this strategy can be extended to infer evolutionary relationships beyond a tree-like structure, to better capture both vertical and lateral signals of microbial evolution.
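A common first step in k-mer-based phylogenetics is a pairwise distance matrix computed without alignment, from which a tree or network is then inferred. The Python sketch below uses the Jaccard distance on k-mer sets purely as an illustrative measure; the abstract does not name the distance it uses, and the function names are assumptions.

```python
from itertools import combinations


def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """1 minus the fraction of k-mers shared by sets a and b."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(genomes, k):
    """Pairwise distances for a dict mapping genome name to sequence."""
    names = sorted(genomes)
    sets = {name: kmer_set(genomes[name], k) for name in names}
    dist = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(sets[x], sets[y])
        dist[x][y] = dist[y][x] = d  # the matrix is symmetric
    return dist
```

The resulting matrix could be passed to a distance-based tree builder such as neighbor joining, which is outside the scope of this sketch.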
|
19
|
Acman M, van Dorp L, Santini JM, Balloux F. Large-scale network analysis captures biological features of bacterial plasmids. Nat Commun 2020; 11:2452. [PMID: 32415210 PMCID: PMC7229196 DOI: 10.1038/s41467-020-16282-w] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 10/03/2019] [Accepted: 04/23/2020] [Indexed: 11/30/2022]
Abstract
Many bacteria can exchange genetic material through horizontal gene transfer (HGT) mediated by plasmids and plasmid-borne transposable elements. Here, we study the population structure and dynamics of over 10,000 bacterial plasmids, by quantifying their genetic similarities and reconstructing a network based on their shared k-mer content. We use a community detection algorithm to assign plasmids into cliques, which correlate with plasmid gene content, bacterial host range, GC content, and existing classifications based on replicon and mobility (MOB) types. Further analysis of plasmid population structure allows us to uncover candidates for yet undescribed replicon genes, and to identify transposable elements as the main drivers of HGT at broad phylogenetic scales. Our work illustrates the potential of network-based analyses of the bacterial 'mobilome' and opens up the prospect of a natural, exhaustive classification framework for bacterial plasmids.
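A network built from shared k-mer content can be sketched in a few lines of Python. This is a toy illustration rather than the authors' pipeline: the Jaccard similarity, the `min_jaccard` threshold, and the function names are assumptions, and connected components are used as a simple stand-in for the community detection algorithm mentioned in the abstract.

```python
from itertools import combinations


def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_network(plasmids, k=4, min_jaccard=0.5):
    """Build adjacency sets; link plasmids with Jaccard >= min_jaccard."""
    sets = {name: kmer_set(seq, k) for name, seq in plasmids.items()}
    graph = {name: set() for name in plasmids}
    for x, y in combinations(plasmids, 2):
        union = sets[x] | sets[y]
        if union and len(sets[x] & sets[y]) / len(union) >= min_jaccard:
            graph[x].add(y)
            graph[y].add(x)
    return graph


def connected_components(graph):
    """Group nodes that are reachable from one another."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups
```

At the scale of thousands of plasmids, exact all-against-all set comparison becomes expensive, so a sketching technique such as MinHash would likely be needed in practice.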
Affiliation(s)
- Mislav Acman
- UCL Genetics Institute, University College London, Gower Street, London, WC1E 6BT, UK.
| | - Lucy van Dorp
- UCL Genetics Institute, University College London, Gower Street, London, WC1E 6BT, UK
| | - Joanne M Santini
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
| | - Francois Balloux
- UCL Genetics Institute, University College London, Gower Street, London, WC1E 6BT, UK.
|
20
|
Li X, Wang H, Tong W, Feng L, Wang L, Rahman SU, Wei G, Tao S. Exploring the evolutionary dynamics of Rhizobium plasmids through bipartite network analysis. Environ Microbiol 2019; 22:934-951. [PMID: 31361937 DOI: 10.1111/1462-2920.14762] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/25/2019] [Revised: 06/24/2019] [Accepted: 07/25/2019] [Indexed: 10/26/2022]
Abstract
The genus Rhizobium usually has a multipartite genome architecture with a chromosome and several plasmids, making these bacteria a perfect candidate for plasmid biology studies. As there are no universally shared genes among typical plasmids, network analyses can complement traditional phylogenetics in a broad-scale study of plasmid evolution. Here, we present an exhaustive analysis of 216 plasmids from 49 complete genomes of Rhizobium by constructing a bipartite network that consists of two classes of nodes, the plasmids and homologous protein families that connect them. Dissection of the network using a hierarchical clustering strategy reveals extensive variety, with 34 homologous plasmid clusters. Four large clusters including one cluster of symbiotic plasmids and two clusters of chromids carrying some truly essential genes are widely distributed among Rhizobium. In contrast, the other clusters are quite small and rare. Symbiotic clusters and rare accessory clusters are exogenetic and do not appear to have co-evolved with the common accessory clusters; the latter ones have a large coding potential and functional complementarity for different lifestyles in Rhizobium. The bipartite network also provides preliminary evidence of Rhizobium plasmid variation and formation including genetic exchange, plasmid fusion and fission, exogenetic plasmid transfer, host plant selection, and environmental adaptation.
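The bipartite structure described here, with plasmids on one side and homologous protein families on the other, can be represented with plain dictionaries. The Python sketch below is a minimal illustration under assumed inputs: the function names and the example family labels are hypothetical, and the hierarchical clustering step of the study is not reproduced.

```python
from collections import defaultdict
from itertools import combinations


def build_bipartite(plasmid_families):
    """Invert plasmid -> families into family -> plasmids.

    Together, the two mappings describe every edge of the bipartite network.
    """
    family_to_plasmids = defaultdict(set)
    for plasmid, families in plasmid_families.items():
        for fam in families:
            family_to_plasmids[fam].add(plasmid)
    return dict(family_to_plasmids)


def shared_family_counts(plasmid_families):
    """Count protein families shared by each pair of plasmids."""
    shared = {}
    for a, b in combinations(sorted(plasmid_families), 2):
        n = len(set(plasmid_families[a]) & set(plasmid_families[b]))
        if n:
            shared[(a, b)] = n
    return shared
```

The pairwise counts are a one-mode projection of the network and could serve as input to a clustering method of the reader's choice.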
Affiliation(s)
- Xiangchen Li
- State Key Laboratory of Crop Stress Biology in Arid Areas, Shaanxi Key Laboratory of Agricultural and Environmental Microbiology, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China; Bioinformatics Center, Northwest A&F University, Yangling, Shaanxi, 712100, China
| | - Hao Wang
- State Key Laboratory of Crop Stress Biology in Arid Areas, Shaanxi Key Laboratory of Agricultural and Environmental Microbiology, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China; Bioinformatics Center, Northwest A&F University, Yangling, Shaanxi, 712100, China
| | - Wenjun Tong
- State Key Laboratory of Crop Stress Biology in Arid Areas, Shaanxi Key Laboratory of Agricultural and Environmental Microbiology, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
| | - Li Feng
- College of Enology, Northwest A&F University, Yangling, Shaanxi, 712100, China
| | - Lina Wang
- State Key Laboratory of Crop Stress Biology in Arid Areas, Shaanxi Key Laboratory of Agricultural and Environmental Microbiology, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China; Bioinformatics Center, Northwest A&F University, Yangling, Shaanxi, 712100, China
| | - Siddiq Ur Rahman
- State Key Laboratory of Crop Stress Biology in Arid Areas, Shaanxi Key Laboratory of Agricultural and Environmental Microbiology, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China; Bioinformatics Center, Northwest A&F University, Yangling, Shaanxi, 712100, China; Department of Computer Science and Bioinformatics, Khushal Khan Khattak University, Karak, Khyber Pakhtunkhwa, 27200, Pakistan
| | - Gehong Wei
- State Key Laboratory of Crop Stress Biology in Arid Areas, Shaanxi Key Laboratory of Agricultural and Environmental Microbiology, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
| | - Shiheng Tao
- State Key Laboratory of Crop Stress Biology in Arid Areas, Shaanxi Key Laboratory of Agricultural and Environmental Microbiology, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China; Bioinformatics Center, Northwest A&F University, Yangling, Shaanxi, 712100, China
| |
|
21
|
Zielezinski A, Girgis HZ, Bernard G, Leimeister CA, Tang K, Dencker T, Lau AK, Röhling S, Choi JJ, Waterman MS, Comin M, Kim SH, Vinga S, Almeida JS, Chan CX, James BT, Sun F, Morgenstern B, Karlowski WM. Benchmarking of alignment-free sequence comparison methods. Genome Biol 2019; 20:144. [PMID: 31345254 PMCID: PMC6659240 DOI: 10.1186/s13059-019-1755-7] [Citation(s) in RCA: 97] [Impact Index Per Article: 19.4] [Received: 05/16/2019] [Accepted: 07/03/2019] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND: Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment. RESULTS: Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference, and reconstruction of species trees under horizontal gene transfer and recombination events. CONCLUSION: The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions.
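As a rough illustration of what an alignment-free comparison can look like, the sketch below computes a cosine distance between k-mer count profiles of two sequences. It is a generic word-frequency measure chosen for brevity, not one of the specific methods or tools benchmarked in this study; the function names and the default k = 3 are assumptions.

```python
from collections import Counter
from math import sqrt

def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine_distance(seq_a, seq_b, k=3):
    """Return 1 - cosine similarity of the two k-mer count vectors.

    No alignment is computed; only k-mer frequencies are compared.
    Returns 1.0 if either sequence is shorter than k (empty profile).
    """
    a, b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)
```

Identical sequences give a distance near 0, and sequences that share no k-mers give exactly 1.0; the choice of k trades sensitivity against specificity and would need tuning for real data.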
Affiliation(s)
- Andrzej Zielezinski
- Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University Poznan, Uniwersytetu Poznańskiego 6, 61-614, Poznan, Poland
| | - Hani Z Girgis
- Tandy School of Computer Science, The University of Tulsa, 800 South Tucker Drive, Tulsa, OK, 74104, USA
| | | | - Chris-Andre Leimeister
- Department of Bioinformatics, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
| | - Kujin Tang
- Department of Biological Sciences, Quantitative and Computational Biology Program, University of Southern California, Los Angeles, CA, 90089, USA
| | - Thomas Dencker
- Department of Bioinformatics, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
| | - Anna Katharina Lau
- Department of Bioinformatics, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
| | - Sophie Röhling
- Department of Bioinformatics, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
| | - Jae Jin Choi
- Department of Chemistry, University of California, Berkeley, CA, 94720, USA
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Michael S Waterman
- Department of Biological Sciences, Quantitative and Computational Biology Program, University of Southern California, Los Angeles, CA, 90089, USA
- Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, 200433, China
| | - Matteo Comin
- Department of Information Engineering, University of Padova, Padova, Italy
| | - Sung-Hou Kim
- Department of Chemistry, University of California, Berkeley, CA, 94720, USA
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Susana Vinga
- INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001, Lisbon, Portugal
- IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001, Lisbon, Portugal
| | - Jonas S Almeida
- Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute (NIH/NCI), Bethesda, USA
| | - Cheong Xin Chan
- Institute for Molecular Bioscience, and School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Benjamin T James
- Tandy School of Computer Science, The University of Tulsa, 800 South Tucker Drive, Tulsa, OK, 74104, USA
| | - Fengzhu Sun
- Department of Biological Sciences, Quantitative and Computational Biology Program, University of Southern California, Los Angeles, CA, 90089, USA
- Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, 200433, China
| | - Burkhard Morgenstern
- Department of Bioinformatics, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
| | - Wojciech M Karlowski
- Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University Poznan, Uniwersytetu Poznańskiego 6, 61-614, Poznan, Poland
| |
|
22
|
Forsdyke DR. Success of alignment-free oligonucleotide (k-mer) analysis confirms relative importance of genomes not genes in speciation and phylogeny. Biol J Linn Soc Lond 2019. [DOI: 10.1093/biolinnean/blz096] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
The utility of DNA sequence substrings (k-mers) in alignment-free phylogenetic classification, including that of bacteria and viruses, is increasingly recognized. However, its biological basis eludes many 21st century practitioners. A path from the 19th century recognition of the informational basis of heredity to the modern era can be discerned. Crick’s DNA ‘unpairing postulate’ predicted that recombinational pairing of homologous DNAs during meiosis would be mediated by short k-mers in the loops of stem-loop structures extruded from classical duplex helices. The complementary ‘kissing’ duplex loops – like tRNA anticodon–codon k-mer duplexes – would seed a more extensive pairing that would then extend until limited by lack of homology or other factors. Indeed, this became the principle behind alignment-based methods that assessed similarity by degree of DNA–DNA reassociation in vitro. These are now seen as less sensitive than alignment-free methods that are closely consistent, both theoretically and mechanistically, with chromosomal anti-recombination models for the initiation of divergence into new species. The analytical power of k-mer differences supports the theses that evolutionary advance sometimes serves the needs of nucleic acids (genomes) rather than proteins (genes), and that such differences can play a role in early speciation events.
Affiliation(s)
- Donald R Forsdyke
- Department of Biomedical and Molecular Sciences, Queen’s University, Kingston, Ontario, Canada
| |
|