1
Steenwyk JL, King N. The promise and pitfalls of synteny in phylogenomics. PLoS Biol 2024; 22:e3002632. [PMID: 38768403 PMCID: PMC11105162 DOI: 10.1371/journal.pbio.3002632] [Indexed: 05/22/2024] Open
Abstract
Reconstructing the tree of life remains a central goal in biology. Early methods, which relied on small numbers of morphological or genetic characters, often yielded conflicting evolutionary histories, undermining confidence in the results. Investigations based on phylogenomics, which use hundreds to thousands of loci for phylogenetic inquiry, have provided a clearer picture of life's history, but certain branches remain problematic. To resolve difficult nodes on the tree of life, 2 recent studies tested the utility of synteny, the conserved collinearity of orthologous genetic loci in 2 or more organisms, for phylogenetics. Synteny exhibits compelling phylogenomic potential while also raising new challenges. This Essay identifies and discusses specific opportunities and challenges that bear on the value of synteny data and other rare genomic changes for phylogenomic studies. Synteny-based analyses of highly contiguous genome assemblies mark a new chapter in the phylogenomic era and the quest to reconstruct the tree of life.
Affiliation(s)
- Jacob L. Steenwyk
- Howard Hughes Medical Institute, University of California, Berkeley, California, United States of America
- Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America
- Nicole King
- Howard Hughes Medical Institute, University of California, Berkeley, California, United States of America
- Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America
2
Legeai F, Romain S, Capblancq T, Doniol-Valcroze P, Joron M, Lemaitre C, Després L. Chromosome-Level Assembly and Annotation of the Pearly Heath Coenonympha arcania Butterfly Genome. Genome Biol Evol 2024; 16:evae055. [PMID: 38491969 PMCID: PMC10980516 DOI: 10.1093/gbe/evae055] [Received: 12/21/2023] [Revised: 03/07/2024] [Accepted: 03/13/2024] [Indexed: 03/18/2024] Open
Abstract
We present the first chromosome-level genome assembly and annotation of the pearly heath Coenonympha arcania, generated with a PacBio HiFi sequencing approach and complemented with Hi-C data. We additionally compare synteny, gene, and repeat content between C. arcania and other Lepidopteran genomes. This reference genome will enable future population genomics studies with Coenonympha butterflies, a species-rich genus that encompasses some of the most highly endangered butterfly taxa in Europe.
Affiliation(s)
- Fabrice Legeai
- Inria, CNRS, IRISA, University of Rennes, 35000 Rennes, France
- IGEPP, INRAE, Institut Agro, University of Rennes, 35653 Le Rheu, France
- Sandra Romain
- Inria, CNRS, IRISA, University of Rennes, 35000 Rennes, France
- Thibaut Capblancq
- LECA, CNRS, Université Grenoble-Alpes, Université Savoie Mont Blanc, Grenoble, France
- Mathieu Joron
- CEFE, CNRS, EPHE, IRD, Université de Montpellier, Montpellier, France
- Claire Lemaitre
- Inria, CNRS, IRISA, University of Rennes, 35000 Rennes, France
- Laurence Després
- LECA, CNRS, Université Grenoble-Alpes, Université Savoie Mont Blanc, Grenoble, France
3
Pibiri GE, Fan J, Patro R. Meta-colored compacted de Bruijn graphs. bioRxiv 2023:2023.07.21.550101. [PMID: 37546988 PMCID: PMC10401949 DOI: 10.1101/2023.07.21.550101] [Indexed: 08/08/2023]
Abstract
MOTIVATION The colored compacted de Bruijn graph (c-dBG) has become a fundamental tool used across several areas of genomics and pangenomics. For example, it has been widely adopted by methods that perform read mapping or alignment, abundance estimation, and subsequent downstream analyses. These applications essentially regard the c-dBG as a map from k-mers to the set of references in which they appear. The c-dBG data structure should retrieve this set -- the color of the k-mer -- efficiently for any given k-mer, while using little memory. To aid retrieval, the colors are stored explicitly in the data structure and take considerable space for large reference collections, even when compressed. Reducing the space of the colors is therefore of utmost importance for large-scale sequence indexing. RESULTS We describe the meta-colored compacted de Bruijn graph (Mac-dBG) -- a new colored de Bruijn graph data structure where colors are represented holistically, i.e., taking into account their redundancy across the whole collection being indexed, rather than individually as atomic integer lists. This allows the factorization and compression of common sub-patterns across colors. While optimizing the space of our data structure is NP-hard, we propose a simple heuristic algorithm that yields practically good solutions. Results show that the Mac-dBG data structure improves substantially over the best previous space/time trade-off, by providing remarkably better compression effectiveness for the same (or better) query efficiency. This improved space/time trade-off is robust across different datasets and query workloads. Code availability: A C++17 implementation of the Mac-dBG is publicly available on GitHub at: https://github.com/jermp/fulgor.
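The view of the c-dBG as a map from k-mers to the set of references in which they appear can be illustrated with a minimal Python sketch. This is not the Mac-dBG or Fulgor implementation: the function names (`build_color_map`, `distinct_colors`) and the toy sequences are hypothetical, and the sketch ignores canonical k-mers, unitig compaction, and any compression. It only shows why colors are redundant: many k-mers share the same color set.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k from seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_color_map(references, k):
    """Map each k-mer to its color: the set of reference ids containing it.

    Illustrative only; a real c-dBG stores unitigs and compressed colors.
    """
    colors = defaultdict(set)
    for ref_id, seq in enumerate(references):
        for km in kmers(seq, k):
            colors[km].add(ref_id)
    return {km: frozenset(ids) for km, ids in colors.items()}


def distinct_colors(color_map):
    """Return the set of distinct colors across all k-mers."""
    return set(color_map.values())


# Toy collection of three hypothetical references.
refs = ["ACGTAC", "CGTACG", "TTTACG"]
cmap = build_color_map(refs, 3)
print(len(cmap), "k-mers share", len(distinct_colors(cmap)), "distinct colors")
```

Storing each distinct color once and pointing k-mers at it is the baseline that explicit color storage already exploits; the abstract's contribution concerns further redundancy between different colors, which this sketch does not attempt to model.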
4
Fruzangohar M, Moolhuijzen P, Bakaj N, Taylor J. CoreDetector: a flexible and efficient program for core-genome alignment of evolutionary diverse genomes. Bioinformatics 2023; 39:btad628. [PMID: 37878789 PMCID: PMC10663985 DOI: 10.1093/bioinformatics/btad628] [Received: 05/29/2023] [Revised: 09/20/2023] [Accepted: 10/23/2023] [Indexed: 10/27/2023] Open
Abstract
MOTIVATION Whole genome alignment of eukaryote species remains an important method for the determination of sequence and structural variations and can also be used to ascertain the representative non-redundant core-genome sequence of a population. Many whole genome alignment tools were first developed for the more mature analysis of prokaryote species with few current tools containing the functionality to process larger genomes of eukaryotes as well as genomes of more divergent species. In addition, the functionality of these tools becomes computationally prohibitive due to the significant compute resources needed to handle larger genomes. RESULTS In this research, we present CoreDetector, an easy-to-use general-purpose program that can align the core-genome sequences for a range of genome sizes and divergence levels. To illustrate the flexibility of CoreDetector, we conducted alignments of a large set of closely related fungal pathogen and hexaploid wheat cultivar genomes as well as more divergent fly and rodent species genomes. In all cases, compared to existing multiple genome alignment tools, CoreDetector exhibited improved flexibility, efficiency, and competitive accuracy in tested cases. AVAILABILITY AND IMPLEMENTATION CoreDetector was developed in the cross platform, and easily deployable, Java language. A packaged pipeline is readily executable in a bash terminal without any external need for Perl or Python environments. Installation, example data, and usage instructions for CoreDetector are freely available from https://github.com/mfruzan/CoreDetector.
Affiliation(s)
- Mario Fruzangohar
- The Biometry Hub, School of Agriculture, Food and Wine, University of Adelaide, Urrbrae, South Australia 5064, Australia
- Paula Moolhuijzen
- Centre for Crop Disease Management, School of Molecular and Life Sciences, Curtin University, Bentley, Western Australia 6102, Australia
- Nicolette Bakaj
- The Biometry Hub, School of Agriculture, Food and Wine, University of Adelaide, Urrbrae, South Australia 5064, Australia
- Julian Taylor
- The Biometry Hub, School of Agriculture, Food and Wine, University of Adelaide, Urrbrae, South Australia 5064, Australia
5
A X, Yang Y, Chen X, Tang C, Zhang F, Dong C, Wang B, Liu P, Dai L. Complete Genome Resource of a Hypervirulent Xanthomonas oryzae pv. oryzae Strain YNCX Isolated from Yunnan Plateau Japonica Rice. Plant Dis 2023; 107:3623-3626. [PMID: 37189043 DOI: 10.1094/pdis-04-23-0674-a] [Indexed: 05/17/2023]
Abstract
Xanthomonas oryzae pv. oryzae (Xoo), the causal agent of bacterial leaf blight (BLB), is one of the most destructive bacterial pathogens in rice production worldwide. Although several complete genome sequences of Xoo strains have been released in public databases, they are mainly isolated from low-altitude indica rice cultivating areas. Here, a hypervirulent strain, YNCX, isolated from the high-altitude japonica rice-growing region in Yunnan Plateau, was used to extract genomic DNA for PacBio sequencing and Illumina sequencing. After assembly, a high-quality complete genome consisting of a circular chromosome and six plasmids was generated. The genome sequence of YNCX provides a valuable resource for high-altitude races and enables the identification of new virulence TALE effectors, contributing to a better understanding of rice-Xoo interactions.
Affiliation(s)
- Xinxiang A
- Biotechnology and Germplasm Resources Institute, Yunnan Academy of Agricultural Sciences/Yunnan Provincial Key Lab of Agricultural Biotechnology/Key Laboratory of Southwestern Crop Gene Resources and Germplasm Innovation, Ministry of Agriculture and Rural Affairs/Scientific Observation Station for Rice Germplasm Resources of Yunnan, Kunming 650223, Yunnan, China
- Yayun Yang
- Biotechnology and Germplasm Resources Institute, Yunnan Academy of Agricultural Sciences/Yunnan Provincial Key Lab of Agricultural Biotechnology/Key Laboratory of Southwestern Crop Gene Resources and Germplasm Innovation, Ministry of Agriculture and Rural Affairs/Scientific Observation Station for Rice Germplasm Resources of Yunnan, Kunming 650223, Yunnan, China
- Xifeng Chen
- College of Life Sciences, Zhejiang Normal University, Jinhua 321004, Zhejiang, China
- Cuifeng Tang
- Biotechnology and Germplasm Resources Institute, Yunnan Academy of Agricultural Sciences/Yunnan Provincial Key Lab of Agricultural Biotechnology/Key Laboratory of Southwestern Crop Gene Resources and Germplasm Innovation, Ministry of Agriculture and Rural Affairs/Scientific Observation Station for Rice Germplasm Resources of Yunnan, Kunming 650223, Yunnan, China
- Feifei Zhang
- Biotechnology and Germplasm Resources Institute, Yunnan Academy of Agricultural Sciences/Yunnan Provincial Key Lab of Agricultural Biotechnology/Key Laboratory of Southwestern Crop Gene Resources and Germplasm Innovation, Ministry of Agriculture and Rural Affairs/Scientific Observation Station for Rice Germplasm Resources of Yunnan, Kunming 650223, Yunnan, China
- Chao Dong
- Biotechnology and Germplasm Resources Institute, Yunnan Academy of Agricultural Sciences/Yunnan Provincial Key Lab of Agricultural Biotechnology/Key Laboratory of Southwestern Crop Gene Resources and Germplasm Innovation, Ministry of Agriculture and Rural Affairs/Scientific Observation Station for Rice Germplasm Resources of Yunnan, Kunming 650223, Yunnan, China
- Bin Wang
- Biotechnology and Germplasm Resources Institute, Yunnan Academy of Agricultural Sciences/Yunnan Provincial Key Lab of Agricultural Biotechnology/Key Laboratory of Southwestern Crop Gene Resources and Germplasm Innovation, Ministry of Agriculture and Rural Affairs/Scientific Observation Station for Rice Germplasm Resources of Yunnan, Kunming 650223, Yunnan, China
- Pengcheng Liu
- College of Life Sciences, Zhejiang Normal University, Jinhua 321004, Zhejiang, China
- Luyuan Dai
- Biotechnology and Germplasm Resources Institute, Yunnan Academy of Agricultural Sciences/Yunnan Provincial Key Lab of Agricultural Biotechnology/Key Laboratory of Southwestern Crop Gene Resources and Germplasm Innovation, Ministry of Agriculture and Rural Affairs/Scientific Observation Station for Rice Germplasm Resources of Yunnan, Kunming 650223, Yunnan, China
6
Liao H, Ji Y, Sun Y. High-resolution strain-level microbiome composition analysis from short reads. Microbiome 2023; 11:183. [PMID: 37587527 PMCID: PMC10433603 DOI: 10.1186/s40168-023-01615-w] [Received: 08/24/2022] [Accepted: 07/07/2023] [Indexed: 08/18/2023]
Abstract
BACKGROUND Bacterial strains under the same species can exhibit different biological properties, making strain-level composition analysis an important step in understanding the dynamics of microbial communities. Metagenomic sequencing has become the major means for probing the microbial composition in host-associated or environmental samples. Although there are a plethora of composition analysis tools, they are not optimized to address the challenges in strain-level analysis: highly similar strain genomes and the presence of multiple strains under one species in a sample. Thus, this work aims to provide a high-resolution and more accurate strain-level analysis tool for short reads. RESULTS In this work, we present a new strain-level composition analysis tool named StrainScan that employs a novel tree-based k-mer indexing structure to strike a balance between the strain identification accuracy and the computational complexity. We tested StrainScan extensively on a large number of simulated and real sequencing datasets and benchmarked StrainScan with popular strain-level analysis tools including KrakenUniq, StrainSeeker, Pathoscope2, Sigma, StrainGE, and StrainEst. The results show that StrainScan has higher accuracy and resolution than the state-of-the-art tools on strain-level composition analysis. It improves the F1 score by 20% in identifying multiple strains at the strain level. CONCLUSIONS By using a novel k-mer indexing structure, StrainScan is able to provide strain-level analysis with higher resolution than existing tools, enabling it to return more informative strain composition analysis in one sample or across multiple samples. StrainScan takes short reads and a set of reference strains as input and its source code is freely available at https://github.com/liaoherui/StrainScan.
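For readers unfamiliar with the metric, the F1 score mentioned in this abstract is the harmonic mean of precision and recall. The sketch below applies the standard definition to sets of strain identifiers. The abstract does not state how the benchmark counted true and false positives, so the set-based counting, the function name `f1_score`, and the strain labels are all assumptions made for illustration.

```python
def f1_score(predicted, truth):
    """Harmonic mean of precision and recall over sets of strain ids.

    Assumes a prediction is correct only if the exact strain id is in truth.
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sample: three strains present, three reported, two correct.
truth = {"strain_A", "strain_B", "strain_C"}
predicted = {"strain_A", "strain_B", "strain_D"}
score = f1_score(predicted, truth)
print(round(score, 3))
```

With two of three reported strains correct and two of three true strains recovered, precision and recall are both 2/3, so the F1 score is also 2/3.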
Affiliation(s)
- Herui Liao
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, China
- Yongxin Ji
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, China
- Yanni Sun
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, China
7
Mathers TC, Wouters RHM, Mugford ST, Biello R, van Oosterhout C, Hogenhout SA. Hybridisation has shaped a recent radiation of grass-feeding aphids. BMC Biol 2023; 21:157. [PMID: 37443008 PMCID: PMC10347838 DOI: 10.1186/s12915-023-01649-4] [Received: 12/19/2022] [Accepted: 06/13/2023] [Indexed: 07/15/2023] Open
Abstract
BACKGROUND Aphids are common crop pests. These insects reproduce by facultative parthenogenesis involving several rounds of clonal reproduction interspersed with an occasional sexual cycle. Furthermore, clonal aphids give birth to live young that are already pregnant. These qualities enable rapid population growth and have facilitated the colonisation of crops globally. In several cases, so-called "super clones" have come to dominate agricultural systems. However, the extent to which the sexual stage of the aphid life cycle has shaped global pest populations has remained unclear, as have the origins of successful lineages. Here, we used chromosome-scale genome assemblies to disentangle the evolution of two global pests of cereals: the English (Sitobion avenae) and Indian (Sitobion miscanthi) grain aphids. RESULTS Genome-wide divergence between S. avenae and S. miscanthi is low. Moreover, comparison of haplotype-resolved assemblies revealed that the S. miscanthi isolate used for genome sequencing is likely a hybrid, with one of its diploid genome copies closely related to S. avenae (~0.5% divergence) and the other substantially more divergent (>1%). Population genomics analyses of UK and China grain aphids showed that S. avenae and S. miscanthi are part of a cryptic species complex with many highly differentiated lineages that predate the origins of agriculture. The complex consists of hybrid lineages that display a tangled history of hybridisation and genetic introgression. CONCLUSIONS Our analyses reveal that hybridisation has substantially contributed to grain aphid diversity, and hence, to the evolutionary potential of this important pest species. Furthermore, we propose that aphids are particularly well placed to exploit hybridisation events via the rapid propagation of live-born "frozen hybrids" via asexual reproduction, increasing the likelihood of hybrid lineage formation.
Affiliation(s)
- Thomas C Mathers
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, UK
- Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Roland H M Wouters
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, UK
- Sam T Mugford
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, UK
- Roberto Biello
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, UK
- Saskia A Hogenhout
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, UK
8
Joudaki A, Meterez A, Mustafa H, Groot Koerkamp R, Kahles A, Rätsch G. Aligning distant sequences to graphs using long seed sketches. Genome Res 2023; 33:1208-1217. [PMID: 37072187 PMCID: PMC10538362 DOI: 10.1101/gr.277659.123] [Received: 01/05/2023] [Accepted: 04/16/2023] [Indexed: 04/20/2023]
Abstract
Sequence-to-graph alignment is crucial for applications such as variant genotyping, read error correction, and genome assembly. We propose a novel seeding approach that relies on long inexact matches rather than short exact matches, and show that it yields a better time-accuracy trade-off in settings with up to a [Formula: see text] mutation rate. We use sketches of a subset of graph nodes, which are more robust to indels, and store them in a k-nearest neighbor index to avoid the curse of dimensionality. Our approach contrasts with existing methods and highlights the important role that sketching into vector space can play in bioinformatics applications. We show that our method scales to graphs with 1 billion nodes and has quasi-logarithmic query time for queries with an edit distance of [Formula: see text] For such queries, longer sketch-based seeds yield a [Formula: see text] increase in recall compared with exact seeds. Our approach can be incorporated into other aligners, providing a novel direction for sequence-to-graph alignment.
Affiliation(s)
- Amir Joudaki
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland
- University Hospital Zurich, Biomedical Informatics Research, Zurich 8091, Switzerland
- Alexandru Meterez
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland
- Harun Mustafa
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland
- University Hospital Zurich, Biomedical Informatics Research, Zurich 8091, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- André Kahles
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland
- University Hospital Zurich, Biomedical Informatics Research, Zurich 8091, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Gunnar Rätsch
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland
- University Hospital Zurich, Biomedical Informatics Research, Zurich 8091, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- ETH AI Center, 8092 Zurich, Switzerland
9
Javkar K, Rand H, Strain E, Pop M. PRAWNS: compact pan-genomic features for whole-genome population genomics. Bioinformatics 2022; 39:6965020. [PMID: 36579850 PMCID: PMC9825322 DOI: 10.1093/bioinformatics/btac844] [Received: 01/11/2022] [Revised: 11/09/2022] [Accepted: 12/28/2022] [Indexed: 12/30/2022] Open
Abstract
MOTIVATION Scientists seeking to understand the genomic basis of bacterial phenotypes, such as antibiotic resistance, today have access to an unprecedented number of complete and nearly complete genomes. Making sense of these data requires computational tools able to perform multiple-genome comparisons efficiently, yet currently available tools cannot scale beyond several tens of genomes. RESULTS We describe PRAWNS, an efficient and scalable tool for multiple-genome analysis. PRAWNS defines a concise set of genomic features (metablocks), as well as pairwise relationships between them, which can be used as a basis for large-scale genotype-phenotype association studies. We demonstrate the effectiveness of PRAWNS by identifying genomic regions associated with antibiotic resistance in Acinetobacter baumannii. AVAILABILITY AND IMPLEMENTATION PRAWNS is implemented in C++ and Python3, licensed under the GPLv3 license, and freely downloadable from GitHub (https://github.com/KiranJavkar/PRAWNS.git). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kiran Javkar
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Joint Institute for Food Safety and Applied Nutrition, University of Maryland, College Park, MD 20740, USA
- Hugh Rand
- Center for Food Safety and Applied Nutrition, United States Food and Drug Administration, College Park, MD 20740, USA
- Errol Strain
- Center for Veterinary Medicine, United States Food and Drug Administration, Laurel, MD 20708, USA
- Mihai Pop
- To whom correspondence should be addressed.
10
Shaskolskiy B, Kravtsov D, Kandinov I, Dementieva E, Gryadunov D. Genomic Diversity and Chromosomal Rearrangements in Neisseria gonorrhoeae and Neisseria meningitidis. Int J Mol Sci 2022; 23:ijms232415644. [PMID: 36555284 PMCID: PMC9778887 DOI: 10.3390/ijms232415644] [Received: 10/20/2022] [Revised: 11/18/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022] Open
Abstract
Chromosomal rearrangements in N. gonorrhoeae and N. meningitidis were studied with the determination of mobile elements and their role in rearrangements. The results of whole-genome sequencing and de novo genome assembly for 50 N. gonorrhoeae isolates collected in Russia were compared with 96 genomes of N. gonorrhoeae and 138 genomes of N. meningitidis from the databases. Rearrangement events with the determination of the coordinates of syntenic blocks were analyzed using the SibeliaZ software v.1.2.5; the minimum number of events that allow one genome to pass into another was calculated using the DCJ-indel model using the UniMoG program v.1.0. Population-level analysis revealed a stronger correlation between changes in the gene order and phylogenetic proximity for N. meningitidis in contrast to N. gonorrhoeae. Mobile elements were identified, including Correa elements; Spencer-Smith elements (in N. gonorrhoeae); Neisserial intergenic mosaic elements; IS elements of IS5, IS30, IS110, IS1595 groups; Nf1-Nf3 prophages; NgoΦ1-NgoΦ9 prophages; and Mu-like prophages Pnm1, Pnm2, MuMenB (in N. meningitidis). More than 44% of the observed rearrangements most likely occurred with the participation of mobile elements, including prophages. No differences were found between the Russian and global N. gonorrhoeae population both in terms of rearrangement events and in the number of transposable elements in genomes.
11
de Oliveira NR, Kremer FS, de Brito RSA, Zamboni R, Dellagostin OA, Jorge S. Pathogenesis and Genomic Analysis of a Virulent Leptospira Interrogans Serovar Copenhageni Isolated from a Dog with Lethal Infection. Trop Med Infect Dis 2022; 7:tropicalmed7110333. [PMID: 36355875 PMCID: PMC9698576 DOI: 10.3390/tropicalmed7110333] [Received: 09/28/2022] [Revised: 10/17/2022] [Accepted: 10/21/2022] [Indexed: 11/30/2022] Open
Abstract
Dogs are highly susceptible to leptospirosis and are a public health concern due to their important role as a source of spreading disease, particularly in urban settings. In this study, we present the pathogenesis, serological characterization, and complete genome sequencing of a virulent Brazilian strain (NEG7) of L. interrogans serovar Copenhageni isolated from the urine of a dog that died due to acute leptospirosis. Clinical investigation showed that the dog was presented with icteric mucous membranes, weakness, dehydration, anorexia, and kidney and liver failures. Necropsy followed by histopathological evaluation revealed lesions compatible with liver and kidney leptospirosis. The leptospires recovered from the urine were further characterized by genome analysis, which confirmed that the isolate belonged to L. interrogans serogroup icterohaemorrhagiae serovar Copenhageni. Multiple bioinformatics tools were used to characterize the genomic features, and comparisons with other available Copenhageni strains were performed. Characterization based on absence of an INDEL in the gene lic12008, associated with phylogenetic and ANI (99.99% identity) analyses, confirmed the genetic relatedness of the isolate with L. interrogans serovar Copenhageni. A better understanding of the diversity of the pathogenic Leptospira isolates could help in identifying genotypes responsible for severe infections. Moreover, it can be used to develop control and prevention strategies for Leptospira serovars associated with particular animal reservoirs.
Affiliation(s)
- Natasha Rodrigues de Oliveira
- Centro de Desenvolvimento Tecnológico, Núcleo de Biotecnologia, Universidade Federal de Pelotas, Pelotas 96160-000, RS, Brazil
- Frederico Schmitt Kremer
- Centro de Desenvolvimento Tecnológico, Núcleo de Biotecnologia, Universidade Federal de Pelotas, Pelotas 96160-000, RS, Brazil
- Rosimeri Zamboni
- Faculdade de Veterinária, Universidade Federal de Pelotas, Pelotas 96160-000, RS, Brazil
- Odir Antônio Dellagostin
- Centro de Desenvolvimento Tecnológico, Núcleo de Biotecnologia, Universidade Federal de Pelotas, Pelotas 96160-000, RS, Brazil
- Correspondence:
- Sérgio Jorge
- Faculdade de Veterinária, Universidade Federal de Pelotas, Pelotas 96160-000, RS, Brazil
12
Guzmán-Moreno J, García-Ortega LF, Torres-Saucedo L, Rivas-Noriega P, Ramírez-Santoyo RM, Sánchez-Calderón L, Quiroz-Serrano IN, Vidales-Rodríguez LE. Bacillus megaterium HgT21: a Promising Metal Multiresistant Plant Growth-Promoting Bacteria for Soil Biorestoration. Microbiol Spectr 2022; 10:e0065622. [PMID: 35980185 PMCID: PMC9604106 DOI: 10.1128/spectrum.00656-22] [Received: 02/21/2022] [Accepted: 07/26/2022] [Indexed: 12/30/2022] Open
Abstract
The environmental deterioration produced by heavy metals derived from anthropogenic activities has gradually increased. The worldwide dissemination of toxic metals in crop soils represents a threat for sustainability and biosafety in agriculture and requires strategies for the recovery of metal-polluted crop soils. The biorestoration of metal-polluted soils using technologies that combine plants and microorganisms has gained attention in recent decades due to the beneficial and synergistic effects produced by its biotic interactions. In this context, native and heavy metal-resistant plant growth-promoting bacteria (PGPB) play a crucial role in the development of strategies for sustainable biorestoration of metal-contaminated soils. In this study, we present a genomic analysis and characterization of the rhizospheric bacterium Bacillus megaterium HgT21 isolated from metal-polluted soil from Zacatecas, Mexico. The results reveal that this autochthonous bacterium contains an important set of genes related to a variety of operons associated with mercury, arsenic, copper, cobalt, cadmium, zinc and aluminum resistance. Additionally, halotolerance-, beta-lactam resistance-, phosphate solubilization-, and plant growth-promotion-related genes were identified. The analysis of resistance to metal ions revealed resistance to mercury (Hg2+), arsenate ([AsO4]3-), cobalt (Co2+), zinc (Zn2+), and copper (Cu2+). Moreover, the ability of the HgT21 strain to produce indole acetic acid (a phytohormone) and promote the growth of Arabidopsis thaliana seedlings in vitro was also demonstrated. The genotype and phenotype of Bacillus megaterium HgT21 reveal its potential to be used as a model of both plant growth-promoting and metal multiresistant bacteria.
IMPORTANCE Metal-polluted environments are natural sources of a wide variety of PGPB adapted to cope with toxic metal concentrations. In this work, the bacterial strain Bacillus megaterium HgT21 was isolated from metal-contaminated soil and is proposed as a model for the study of metal multiresistance in spore-forming Gram-positive bacteria due to the presence of a variety of metal resistance-associated genes similar to those encountered in the metal multiresistant Gram-negative Cupriavidus metallidurans CH34. The ability of B. megaterium HgT21 to promote the growth of plants also makes it suitable for the study of plant-bacteria interactions in metal-polluted environments, which is key for the development of techniques for the biorestoration of metal-contaminated soils used for agriculture.
Affiliation(s)
- Jesús Guzmán-Moreno
- Laboratorio de Biología de Bacterias y Hongos Filamentosos, Unidad Académica de Ciencias Biológicas, Universidad Autónoma de Zacatecas, Zacatecas, Zacatecas, Mexico
- Luis Fernando García-Ortega
- Departamento de Ingeniería Genética, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV), Irapuato, Guanajuato, Mexico
| | - Lilia Torres-Saucedo
- Laboratorio de Biología de Bacterias y Hongos Filamentosos, Unidad Académica de Ciencias Biológicas, Universidad Autónoma de Zacatecas, Zacatecas, Zacatecas, Mexico
| | - Paulina Rivas-Noriega
- Laboratorio de Biología de Bacterias y Hongos Filamentosos, Unidad Académica de Ciencias Biológicas, Universidad Autónoma de Zacatecas, Zacatecas, Zacatecas, Mexico
| | - Rosa María Ramírez-Santoyo
- Laboratorio de Biología de Bacterias y Hongos Filamentosos, Unidad Académica de Ciencias Biológicas, Universidad Autónoma de Zacatecas, Zacatecas, Zacatecas, Mexico
| | - Lenin Sánchez-Calderón
- Laboratorio de Genómica Evolutiva, Unidad Académica de Ciencias Biológicas, Universidad Autónoma de Zacatecas, Zacatecas, Zacatecas, Mexico
| | - Iliana Noemi Quiroz-Serrano
- Laboratorio de Biología de Bacterias y Hongos Filamentosos, Unidad Académica de Ciencias Biológicas, Universidad Autónoma de Zacatecas, Zacatecas, Zacatecas, Mexico
| | - Luz Elena Vidales-Rodríguez
- Laboratorio de Biología de Bacterias y Hongos Filamentosos, Unidad Académica de Ciencias Biológicas, Universidad Autónoma de Zacatecas, Zacatecas, Zacatecas, Mexico
| |
|
13
|
Gostinčar C, Sun X, Černoša A, Fang C, Gunde-Cimerman N, Song Z. Clonality, inbreeding, and hybridization in two extremotolerant black yeasts. Gigascience 2022; 11:giac095. [PMID: 36200832 PMCID: PMC9535773 DOI: 10.1093/gigascience/giac095] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/08/2022] [Revised: 07/29/2022] [Accepted: 09/12/2022] [Indexed: 11/04/2022] Open
Abstract
BACKGROUND The great diversity of lifestyles and survival strategies observed in fungi is reflected in the many ways in which they reproduce and recombine. Although a complete absence of recombination is rare, it has been reported for some species, among them 2 extremotolerant black yeasts from Dothideomycetes: Hortaea werneckii and Aureobasidium melanogenum. Therefore, the presence of diploid strains in these species cannot be explained as the product of conventional sexual reproduction. RESULTS Genome sequencing revealed that the ratio of diploid to haploid strains in both H. werneckii and A. melanogenum is about 2:1. Linkage disequilibrium between pairs of polymorphic loci and a high degree of concordance between the phylogenies of different genomic regions confirmed that both species are clonal. Heterozygosity of diploid strains is high, with several hybridizing genome pairs reaching the intergenomic distances typically seen between different fungal species. The origin of diploid strains collected worldwide can be traced to a handful of hybridization events that produced diploids, which were stable over long periods of time and distributed over large geographic areas. CONCLUSIONS Our results, based on the genomes of over 100 strains of 2 black yeasts, show that although they are clonal, they occasionally form stable and highly heterozygous diploid intraspecific hybrids. The mechanism of these apparently rare hybridization events, which are not followed by meiosis or haploidization, remains unknown. Both extremotolerant yeasts, H. werneckii and even more so A. melanogenum, a close relative of the intensely recombining and biotechnologically relevant Aureobasidium pullulans, provide an attractive model for studying the role of clonality and ploidy in extremotolerant fungi.
Affiliation(s)
- Cene Gostinčar
- Department of Biology, Biotechnical Faculty, University of Ljubljana, 1000 Ljubljana, Slovenia
- Lars Bolund Institute of Regenerative Medicine, BGI-Qingdao, Qingdao 266555, China
| | - Xiaohuan Sun
- BGI-Shenzhen, Beishan Industrial Zone, Shenzhen 518083, China
| | - Anja Černoša
- Department of Biology, Biotechnical Faculty, University of Ljubljana, 1000 Ljubljana, Slovenia
| | - Chao Fang
- BGI-Shenzhen, Beishan Industrial Zone, Shenzhen 518083, China
| | - Nina Gunde-Cimerman
- Department of Biology, Biotechnical Faculty, University of Ljubljana, 1000 Ljubljana, Slovenia
| | - Zewei Song
- BGI-Shenzhen, Beishan Industrial Zone, Shenzhen 518083, China
| |
|
14
|
Khan J, Kokot M, Deorowicz S, Patro R. Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2. Genome Biol 2022; 23:190. [PMID: 36076275 PMCID: PMC9454175 DOI: 10.1186/s13059-022-02743-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/17/2022] [Accepted: 08/01/2022] [Indexed: 11/13/2022] Open
Abstract
The de Bruijn graph is a key data structure in modern computational genomics, and construction of its compacted variant resides upstream of many genomic analyses. As the quantity of genomic data grows rapidly, this often forms a computational bottleneck. We present Cuttlefish 2, significantly advancing the state-of-the-art for this problem. On a commodity server, it reduces the graph construction time for 661K bacterial genomes, of size 2.58Tbp, from 4.5 days to 17-23 h; and it constructs the graph for 1.52Tbp white spruce reads in approximately 10 h, while the closest competitor requires 54-58 h, using considerably more memory.
Affiliation(s)
- Jamshed Khan
- Department of Computer Science, University of Maryland, College Park, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, USA
| | - Marek Kokot
- Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
| | - Sebastian Deorowicz
- Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
| | - Rob Patro
- Department of Computer Science, University of Maryland, College Park, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, USA
| |
|
15
|
Schulz T, Wittler R, Stoye J. Sequence-based pangenomic core detection. iScience 2022; 25:104413. [PMID: 35663029 PMCID: PMC9160775 DOI: 10.1016/j.isci.2022.104413] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/11/2022] [Revised: 04/20/2022] [Accepted: 05/09/2022] [Indexed: 11/17/2022] Open
Abstract
One of the most basic kinds of analysis to be performed on a pangenome is the detection of its core, i.e., the information shared among all members. Pangenomic core detection is classically done on the gene level and many tools focus exclusively on core detection in prokaryotes. Here, we present a new method for sequence-based pangenomic core detection. Our model generalizes from a strict core definition allowing us to flexibly determine suitable core properties depending on the research question and the dataset under consideration. We propose an algorithm based on a colored de Bruijn graph that runs in linear time with respect to the number of k-mers in the graph. An implementation of our method is called Corer. Because of the usage of a colored de Bruijn graph, it works alignment-free, is provided with a small memory footprint, and accepts as input assembled genomes as well as sequencing reads.
- Pangenomic core detection for large collections of prokaryotes or higher eukaryotes
- Whole-genome analysis with assemblies or even read data as input
- Alignment-free, linear time algorithm with small memory footprint
- Variation tolerance and quorum for flexible core detection
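The quorum idea in this abstract can be illustrated with a small Python sketch. It is not the Corer algorithm: it only records, for each canonical k-mer, the set of genomes ("colors") that contain it, and keeps k-mers seen in at least `quorum` genomes. It does not build graph paths or model variation tolerance. The function names, the toy genomes, and the parameter values are all hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_colors(genomes, k):
    """Map each canonical k-mer to the set of genome names (colors) containing it."""
    colors = defaultdict(set)
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            colors[canonical(seq[i:i + k])].add(name)
    return colors


def core_kmers(genomes, k, quorum):
    """Return canonical k-mers present in at least `quorum` genomes (hypothetical helper)."""
    return {kmer for kmer, names in kmer_colors(genomes, k).items()
            if len(names) >= quorum}


# Toy input, invented for illustration only.
genomes = {
    "g1": "ACGTACGGA",
    "g2": "ACGTACGTT",
    "g3": "TTGTACGGA",
}
strict_core = core_kmers(genomes, k=4, quorum=3)   # shared by all three genomes
relaxed_core = core_kmers(genomes, k=4, quorum=2)  # shared by at least two genomes
```

Lowering `quorum` below the number of genomes relaxes the strict "shared by all" definition, which is the kind of flexibility the abstract describes.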
Affiliation(s)
- Tizian Schulz
- Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, Bielefeld, Germany
- Graduate School “Digital Infrastructure for the Life Sciences” (DILS), Bielefeld University, Bielefeld, Germany
| | - Roland Wittler
- Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, Bielefeld, Germany
| | - Jens Stoye
- Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, Bielefeld, Germany
- Corresponding author
| |
|
16
|
Annotation-free delineation of prokaryotic homology groups. PLoS Comput Biol 2022; 18:e1010216. [PMID: 35675326 PMCID: PMC9212150 DOI: 10.1371/journal.pcbi.1010216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Revised: 06/21/2022] [Accepted: 05/16/2022] [Indexed: 11/19/2022] Open
Abstract
Phylogenomic studies of prokaryotic taxa often assume conserved marker genes are homologous across their length. However, processes such as horizontal gene transfer or gene duplication and loss may disrupt this homology by recombining only parts of genes, causing gene fission or fusion. We show using simulation that it is necessary to delineate homology groups in a set of bacterial genomes without relying on gene annotations to define the boundaries of homologous regions. To solve this problem, we have developed a graph-based algorithm to partition a set of bacterial genomes into Maximal Homologous Groups of sequences (MHGs) where each MHG is a maximal set of maximum-length sequences which are homologous across the entire sequence alignment. We applied our algorithm to a dataset of 19 Enterobacteriaceae species and found that MHGs cover much greater proportions of genomes than markers and, relatedly, are less biased in terms of the functions of the genes they cover. We zoomed in on the correlation between each individual marker and their overlapping MHGs, and show that few phylogenetic splits supported by the markers are supported by the MHGs while many marker-supported splits are contradicted by the MHGs. A comparison of the species tree inferred from marker genes with the species tree inferred from MHGs suggests that the increased bias and lack of genome coverage by markers causes incorrect inferences as to the overall relationship between bacterial taxa. Assuming genes to be the basic evolutionary unit has been commonplace in bacterial genomics. For example, when quantifying the extent of horizontal gene transfer it is common to infer gene trees and reconcile them against a species tree to account for recombination-based processes. 
We have developed a new method which challenges this assumption by identifying contiguous regions of true homology without regard to gene boundaries and applied it to Enterobacteriaceae, a family of bacteria containing several important human pathogens. Our results show that genes are composed of distinct homologous regions with conflicting phylogenetic histories. We further demonstrate that failing to take account of this conflict, together with the functional biases we show exist among single-copy marker genes, significantly changes the consensus evolutionary tree of Enterobacteriaceae.
|
17
|
Assembly and Comparison of Ca. Neoehrlichia mikurensis Genomes. Microorganisms 2022; 10:microorganisms10061134. [PMID: 35744652 PMCID: PMC9227406 DOI: 10.3390/microorganisms10061134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 05/27/2022] [Accepted: 05/30/2022] [Indexed: 11/17/2022] Open
Abstract
Ca. Neoehrlichia mikurensis is widely prevalent in I. ricinus across Europe and has been associated with human disease. However, diagnostic modalities are limited, and much is still unknown about its biology. Here, we present the first complete Ca. Neoehrlichia mikurensis genomes directly derived from wildlife reservoir host tissues, using both long- and short-read sequencing technologies. This pragmatic approach provides an alternative to obtaining sufficient material from clinical cases, a difficult task for emerging infectious diseases, and to expensive and challenging bacterial isolation and culture methods. Both genomes exhibit a larger chromosome than the currently available Ca. Neoehrlichia mikurensis genomes and expand the ability to find new targets for the development of supportive laboratory diagnostics in the future. Moreover, this method could be utilized for other tick-borne pathogens that are difficult to culture.
|
18
|
A Polymorphic Gene within the Mycobacterium smegmatis esx1 Locus Determines Mycobacterial Self-Identity and Conjugal Compatibility. mBio 2022; 13:e0021322. [PMID: 35297678 PMCID: PMC9040860 DOI: 10.1128/mbio.00213-22] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022] Open
Abstract
Mycobacteria mediate horizontal gene transfer (HGT) by a process called distributive conjugal transfer (DCT) that is mechanistically distinct from oriT-mediated plasmid transfer. The transfer of multiple, independent donor chromosome segments generates transconjugants with genomes that are mosaic blends of their parents. Previously, we had characterized contact-dependent conjugation between two independent isolates of Mycobacterium smegmatis. Here, we expand our analyses to include five independent isolates of M. smegmatis and establish that DCT is both active and prevalent among natural isolates of M. smegmatis. Two of these five strains were recipients but exhibited distinct conjugal compatibilities with donor strains, suggesting an ability to distinguish between potential donor partners. We determined that a single gene, Msmeg0070, was responsible for conferring mating compatibility using a combination of comparative DNA sequence analysis, bacterial genome-wide association studies (GWAS), and targeted mutagenesis. Msmeg0070 maps within the esx1 secretion locus, and we establish that it confers mycobacterial self-identity with parallels to kin recognition. Similar to other kin model systems, orthologs of Msmeg0070 are highly polymorphic. The identification of a kin recognition system in M. smegmatis reinforces the concept that communication between cells is an important checkpoint prior to DCT commitment and implies that there are likely to be other, unanticipated forms of social behaviors in mycobacteria. IMPORTANCE Conjugation, unlike other forms of HGT, requires direct interaction between two viable bacteria, which must be capable of distinguishing between mating types to allow successful DNA transfer from donor to recipient. We show that the conjugal compatibility of Mycobacterium smegmatis isolates is determined by a single, polymorphic gene located within the conserved esx1 secretion locus. 
This gene confers self-identity; the expression of identical Msmeg0070 proteins in both donor-recipient partners prevents DNA transfer. The presence of this polymorphic locus in many environmental mycobacteria suggests that kin identification is important in promoting beneficial gene flow between nonkin mycobacteria. Cell-cell communication, mediated by kin recognition and ESX secretion, is a key checkpoint in mycobacterial conjugation and likely plays a more global role in mycobacterial biology.
|
19
|
CovDif, a Tool to Visualize the Conservation between SARS-CoV-2 Genomes and Variants. Viruses 2022; 14:v14030561. [PMID: 35336968 PMCID: PMC8955889 DOI: 10.3390/v14030561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Revised: 03/01/2022] [Accepted: 03/03/2022] [Indexed: 11/17/2022] Open
Abstract
The spread of the newly emerged severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to more than 430 million confirmed cases, including more than 5.9 million deaths, reported worldwide as of 24 February 2022. Conservation of viral genomes is important for pathogen identification and diagnosis, therapeutics development and epidemiological surveillance to detect the emergence of new viral variants. An intense surveillance of virus variants has led to the identification of Variants of Interest and Variants of Concern. Although these classifications dynamically change as the pandemic evolves, they have been useful to guide public health efforts on containment and mitigation. In this work, we present CovDif, a tool to detect conserved regions between groups of viral genomes. CovDif creates a conservation landscape for each group of genomes of interest and a differential landscape able to highlight differences in the conservation level between groups. CovDif is able to identify loss in conservation due to point mutations, deletions, inversions and chromosomal rearrangements. In this work, we applied CovDif to SARS-CoV-2 clades (G, GH, GR, GV, L, O, S and G) and variants. We identified all regions for any defining SNPs. We also applied CovDif to a group of population genomes and evaluated the conservation of primer regions for current SARS-CoV-2 detection and diagnostic protocols. We found that some of these protocols should be applied with caution, as a few of the primer-template regions are no longer conserved in some SARS-CoV-2 variants. We conclude that CovDif is a tool that could be widely applied to study the conservation of any group of viral genomes as long as whole genomes exist.
|
20
|
A cattle graph genome incorporating global breed diversity. Nat Commun 2022; 13:910. [PMID: 35177600 PMCID: PMC8854726 DOI: 10.1038/s41467-022-28605-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/29/2020] [Accepted: 01/20/2022] [Indexed: 11/28/2022] Open
Abstract
Despite only 8% of cattle being found in Europe, European breeds dominate current genetic resources. This adversely impacts cattle research in other important global cattle breeds, especially those from Africa for which genomic resources are particularly limited, despite their disproportionate importance to the continent’s economies. To mitigate this issue, we have generated assemblies of African breeds, which have been integrated with genomic data for 294 diverse cattle into a graph genome that incorporates global cattle diversity. We illustrate how this more representative reference assembly contains an extra 116.1 Mb (4.2%) of sequence absent from the current Hereford sequence and consequently inaccessible to current studies. We further demonstrate how using this graph genome increases read mapping rates, reduces allelic biases and improves the agreement of structural variant calling with independent optical mapping data. Consequently, we present an improved, more representative, reference assembly that will improve global cattle research. Cattle reference genomes are valuable resources but are currently heavily biased towards European breeds. Here the authors integrate assemblies for African breeds into a more representative cattle graph genome capturing global breed diversity.
|
21
|
Ishiya K, Kosaka H, Inaoka T, Kimura K, Nakashima N. Comparative Genome Analysis of Three Komagataeibacter Strains Used for Practical Production of Nata-de-Coco. Front Microbiol 2022; 12:798010. [PMID: 35185823 PMCID: PMC8855687 DOI: 10.3389/fmicb.2021.798010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/19/2021] [Accepted: 12/22/2021] [Indexed: 12/04/2022] Open
Abstract
We determined the whole genome sequences of three bacterial strains, designated as FNDCR1, FNDCF1, and FNDCR2, isolated from a practical nata-de-coco producing bacterial culture. Only FNDCR1 and FNDCR2 strains had the ability to produce cellulose. The 16S rDNA sequence and phylogenetic analysis revealed that all strains belonged to the Komagataeibacter genus but belonged to a different clade within the genus. Comparative genomic analysis revealed cross-strain distribution of duplicated sequences in Komagataeibacter genomes. It is particularly interesting that FNDCR1 has many duplicated sequences within the genome independently of the phylogenetic clade, suggesting that these duplications might have been obtained specifically for this strain. Analysis of the cellulose biosynthesis operon of the three determined strain genomes indicated that several cellulose synthesis-related genes, which are present in FNDCR1 and FNDCR2, were lost in the FNDCF1 strain. These findings reveal important genetic insights into practical nata de coco-producing bacteria that can be used in food development. Furthermore, our results also shed light on the variation in their cellulose-producing abilities and illustrate why genetic traits are unstable for Komagataeibacter and Komagataeibacter-related acetic acid bacteria.
Affiliation(s)
- Koji Ishiya
- Bioproduction Research Institute, National Institute of Advanced Industrial Sciences and Technology (AIST), Sapporo, Japan
| | - Hideki Kosaka
- Research and Development Department, Fujicco Co., Ltd., Chuo-ku, Japan
| | - Takashi Inaoka
- Institute of Food Research, National Agriculture and Food Research Organization (NFRI/NARO), Tsukuba, Japan
| | - Keitarou Kimura
- Institute of Food Research, National Agriculture and Food Research Organization (NFRI/NARO), Tsukuba, Japan
| | - Nobutaka Nakashima
- Bioproduction Research Institute, National Institute of Advanced Industrial Sciences and Technology (AIST), Sapporo, Japan
- *Correspondence: Nobutaka Nakashima,
| |
|
22
|
Zabelkin A, Yakovleva Y, Bochkareva O, Alexeev N. PaReBrick: PArallel REarrangements and BReaks identification toolkit. Bioinformatics 2021; 38:357-363. [PMID: 34601581 PMCID: PMC8723149 DOI: 10.1093/bioinformatics/btab691] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/18/2021] [Revised: 08/25/2021] [Accepted: 09/29/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION High plasticity of bacterial genomes is provided by numerous mechanisms including horizontal gene transfer and recombination via numerous flanking repeats. Genome rearrangements such as inversions, deletions, insertions and duplications may independently occur in different strains, providing parallel adaptation or phenotypic diversity. Specifically, such rearrangements might be responsible for virulence, antibiotic resistance and antigenic variation. However, identification of such events requires laborious manual inspection and verification of phyletic pattern consistency. RESULTS Here, we define the term 'parallel rearrangements' as events that occur independently in phylogenetically distant bacterial strains and present a formalization of the problem of parallel rearrangements calling. We implement an algorithmic solution for the identification of parallel rearrangements in bacterial populations as a tool PaReBrick. The tool takes a collection of strains represented as a sequence of oriented synteny blocks and a phylogenetic tree as input data. It identifies rearrangements, tests them for consistency with a tree, and sorts the events by their parallelism score. The tool provides diagrams of the neighbors for each block of interest, allowing the detection of horizontally transferred blocks or their extra copies and the inversions in which copied blocks are involved. We demonstrated PaReBrick's efficiency and accuracy and showed its potential to detect genome rearrangements responsible for pathogenicity and adaptation in bacterial genomes. AVAILABILITY AND IMPLEMENTATION PaReBrick is written in Python and is available on GitHub: https://github.com/ctlab/parallel-rearrangements. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
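The input representation named in this abstract, strains as sequences of oriented synteny blocks, can be made concrete with a short Python sketch. It is not PaReBrick's algorithm and does not test events against a phylogenetic tree. It only shows, under the assumption of linear chromosomes and single-copy blocks, how signed block orders are turned into adjacency sets and how a rearrangement appears as adjacencies lost relative to a reference strain. All names and the toy strains are hypothetical.

```python
def extremities(block):
    """Return (left, right) ends of a signed block; the sign encodes orientation."""
    block_id = abs(block)
    if block > 0:
        return (block_id, "t"), (block_id, "h")  # tail first, then head
    return (block_id, "h"), (block_id, "t")      # inverted block: head first


def adjacencies(genome):
    """Unordered adjacencies between consecutive blocks of a linear block order."""
    result = set()
    for left_block, right_block in zip(genome, genome[1:]):
        right_end_of_left = extremities(left_block)[1]
        left_end_of_right = extremities(right_block)[0]
        result.add(frozenset((right_end_of_left, left_end_of_right)))
    return result


def breakpoints(reference, other):
    """Adjacencies present in `reference` but absent from `other`."""
    return adjacencies(reference) - adjacencies(other)


# Toy strains, invented for illustration: strain B carries an inversion of blocks 2..3.
strains = {
    "A": [1, 2, 3, 4, 5],
    "B": [1, -3, -2, 4, 5],
    "C": [1, 2, 3, 4, 5],
}
broken_in_b = breakpoints(strains["A"], strains["B"])
broken_in_c = breakpoints(strains["A"], strains["C"])
```

An inversion leaves the internal adjacency between blocks 2 and 3 intact and breaks only the two adjacencies at its ends, which is why adjacency sets are a convenient unit for comparing block orders across strains.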
Affiliation(s)
- Alexey Zabelkin
- Computer Technologies Laboratory, ITMO University, St Petersburg 197101, Russia
- Bioinformatics Institute, St Petersburg 194100, Russia
| | - Yulia Yakovleva
- Bioinformatics Institute, St Petersburg 194100, Russia
- Department of Microbiology, Faculty of Biology, Saint Petersburg State University, St Petersburg 199034, Russia
| | | | | |
|
23
|
Abstract
Long-read sequencing technologies have now reached a level of accuracy and yield that allows their application to variant detection at a scale of tens to thousands of samples. Concomitant with the development of new computational tools, the first population-scale studies involving long-read sequencing have emerged over the past 2 years and, given the continuous advancement of the field, many more are likely to follow. In this Review, we survey recent developments in population-scale long-read sequencing, highlight potential challenges of a scaled-up approach and provide guidance regarding experimental design. We provide an overview of current long-read sequencing platforms, variant calling methodologies and approaches for de novo assemblies and reference-based mapping approaches. Furthermore, we summarize strategies for variant validation, genotyping and predicting functional impact and emphasize challenges remaining in achieving long-read sequencing at a population scale.
Affiliation(s)
- Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Applied and Translational Neurogenomics Group, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
| |
|
24
|
Abstract
Pangenomes are organized collections of the genomic information from related individuals or groups. Graphical pangenomics is the study of these pangenomes using graphical methods to identify and analyze genes, regions, and mutations of interest to an array of biological questions. This field has seen significant progress in recent years including the development of graph based models that better resolve biological phenomena, and an explosion of new tools for mapping reads, creating graphical genomes, and performing pangenome analysis. In this review, we discuss recent developments in models, algorithms associated with graphical genomes, and comparisons between similar tools. In addition we briefly discuss what these developments may mean for the future of genomics.
|
25
|
Seferbekova Z, Zabelkin A, Yakovleva Y, Afasizhev R, Dranenko NO, Alexeev N, Gelfand MS, Bochkareva OO. High Rates of Genome Rearrangements and Pathogenicity of Shigella spp. Front Microbiol 2021; 12:628622. [PMID: 33912145 PMCID: PMC8072062 DOI: 10.3389/fmicb.2021.628622] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/12/2020] [Accepted: 03/22/2021] [Indexed: 02/01/2023] Open
Abstract
Shigella are pathogens originating within the Escherichia lineage but frequently classified as a separate genus. Shigella genomes contain numerous insertion sequences (ISs) that lead to pseudogenisation of affected genes and an increase of non-homologous recombination. Here, we study 414 genomes of E. coli and Shigella strains to assess the contribution of genomic rearrangements to Shigella evolution. We found that Shigella experienced exceptionally high rates of intragenomic rearrangements and had a decreased rate of homologous recombination compared to pathogenic and non-pathogenic E. coli. The high rearrangement rate resulted in independent disruption of syntenic regions and parallel rearrangements in different Shigella lineages. Specifically, we identified two types of chromosomally encoded E3 ubiquitin-protein ligases acquired independently by all Shigella strains that also showed a high level of sequence conservation in the promoter and further in the 5′-intergenic region. In the only available enteroinvasive E. coli (EIEC) strain, which is a pathogenic E. coli with a phenotype intermediate between Shigella and non-pathogenic E. coli, we found a rate of genome rearrangements comparable to those in other E. coli and no functional copies of the two Shigella-specific E3 ubiquitin ligases. These data indicate that the accumulation of ISs influenced many aspects of genome evolution and played an important role in the evolution of intracellular pathogens. Our research demonstrates the power of comparative genomics based on synteny block composition and an important role of non-coding regions in the evolution of genomic islands.
Affiliation(s)
- Zaira Seferbekova
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Institute for Information Transmission Problems (The Kharkevich Institute, RAS), Moscow, Russia
| | - Alexey Zabelkin
- Computer Technologies Laboratory, ITMO University, Saint Petersburg, Russia
- JetBrains Research, Saint Petersburg, Russia
- Bioinformatics Institute, Saint Petersburg, Russia
| | - Yulia Yakovleva
- Bioinformatics Institute, Saint Petersburg, Russia
- Department of Cytology and Histology, Saint Petersburg State University, Saint Petersburg, Russia
| | - Robert Afasizhev
- Institute for Information Transmission Problems (The Kharkevich Institute, RAS), Moscow, Russia
| | - Natalia O Dranenko
- Institute for Information Transmission Problems (The Kharkevich Institute, RAS), Moscow, Russia
| | - Nikita Alexeev
- Computer Technologies Laboratory, ITMO University, Saint Petersburg, Russia
| | - Mikhail S Gelfand
- Institute for Information Transmission Problems (The Kharkevich Institute, RAS), Moscow, Russia
- Skolkovo Institute of Science and Technology, Moscow, Russia
| | - Olga O Bochkareva
- Institute for Information Transmission Problems (The Kharkevich Institute, RAS), Moscow, Russia
- Institute of Science and Technology (IST Austria), Klosterneuburg, Austria
| |
|
26
|
Eizenga JM, Novak AM, Sibbesen JA, Heumos S, Ghaffaari A, Hickey G, Chang X, Seaman JD, Rounthwaite R, Ebler J, Rautiainen M, Garg S, Paten B, Marschall T, Sirén J, Garrison E. Pangenome Graphs. Annu Rev Genomics Hum Genet 2020; 21:139-162. [PMID: 32453966 DOI: 10.1146/annurev-genom-120219-080406] [Citation(s) in RCA: 96] [Impact Index Per Article: 24.0] [Indexed: 12/12/2022]
Abstract
Low-cost whole-genome assembly has enabled the collection of haplotype-resolved pangenomes for numerous organisms. In turn, this technological change is encouraging the development of methods that can precisely address the sequence and variation described in large collections of related genomes. These approaches often use graphical models of the pangenome to support algorithms for sequence alignment, visualization, functional genomics, and association studies. The additional information provided to these methods by the pangenome allows them to achieve superior performance on a variety of bioinformatic tasks, including read alignment, variant calling, and genotyping. Pangenome graphs stand to become a ubiquitous tool in genomics. Although it is unclear whether they will replace linear reference genomes, their ability to harmoniously relate multiple sequence and coordinate systems will make them useful irrespective of which pangenomic models become most common in the future.
Affiliation(s)
- Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
| | - Adam M Novak
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
| | - Jonas A Sibbesen
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
| | - Simon Heumos
- Quantitative Biology Center, University of Tübingen, 72076 Tübingen, Germany
| | - Ali Ghaffaari
- Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany; Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
| | - Glenn Hickey
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
| | - Xian Chang
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
| | - Josiah D Seaman
- Royal Botanic Gardens, Kew, Richmond TW9 3AB, United Kingdom; School of Biological and Chemical Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
| | - Robin Rounthwaite
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
| | - Jana Ebler
- Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany; Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
| | - Mikko Rautiainen
- Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany; Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
| | - Shilpa Garg
- Departments of Genetics and Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02215, USA; Department of Data Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
| | - Benedict Paten
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
| | - Tobias Marschall
- Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
| | - Jouni Sirén
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
| | - Erik Garrison
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
| |
|