1
Yang J, Wu Y, Zhang P, Ma J, Yao YJ, Ma YL, Zhang L, Yang Y, Zhao C, Wu J, Fang X, Liu J. Multiple independent losses of the biosynthetic pathway for two tropane alkaloids in the Solanaceae family. Nat Commun 2023; 14:8457. [PMID: 38114555 PMCID: PMC10730914 DOI: 10.1038/s41467-023-44246-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Accepted: 12/05/2023] [Indexed: 12/21/2023]
Abstract
Hyoscyamine and scopolamine (HS), two valuable tropane alkaloids of significant medicinal importance, are found in multiple distantly related lineages within the Solanaceae family. Here we sequence the genomes of three representative species that produce HS from these lineages, and one species that does not produce HS. Our analysis reveals a shared biosynthetic pathway responsible for HS production in the three HS-producing species. We observe a high level of gene collinearity related to HS synthesis across the family in both types of species. By introducing gain-of-function and loss-of-function mutations at key sites, we confirm the reduced/lost or re-activated functions of critical genes involved in HS synthesis in both types of species, respectively. These findings indicate independent and repeated losses of the HS biosynthesis pathway since its origin in the ancestral lineage. Our results hold promise for potential future applications in the artificial engineering of HS biosynthesis in Solanaceae crops.
Affiliation(s)
- Jiao Yang
- State Key Laboratory of Herbage Improvement and Grassland Agro-Ecosystem, College of Ecology, Lanzhou University, Lanzhou, China
- Ying Wu
- State Key Laboratory of Herbage Improvement and Grassland Agro-Ecosystem, College of Ecology, Lanzhou University, Lanzhou, China
- Pan Zhang
- State Key Laboratory of Herbage Improvement and Grassland Agro-Ecosystem, College of Ecology, Lanzhou University, Lanzhou, China
- Jianxiang Ma
- State Key Laboratory of Herbage Improvement and Grassland Agro-Ecosystem, College of Ecology, Lanzhou University, Lanzhou, China
- Ying Jun Yao
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Yan Lin Ma
- State Key Laboratory of Herbage Improvement and Grassland Agro-Ecosystem, College of Ecology, Lanzhou University, Lanzhou, China
- Lei Zhang
- Key Laboratory of Ecological Protection of Agro-Pastoral Ecotones in the Yellow River Basin, National Ethnic Affairs Commission of the People's Republic of China, College of Biological Science & Engineering, North Minzu University, Yinchuan, 750021, Ningxia, China
- Yongzhi Yang
- State Key Laboratory of Herbage Improvement and Grassland Agro-Ecosystem, College of Ecology, Lanzhou University, Lanzhou, China
- Changmin Zhao
- State Key Laboratory of Herbage Improvement and Grassland Agro-Ecosystem, College of Ecology, Lanzhou University, Lanzhou, China
- Jihua Wu
- State Key Laboratory of Herbage Improvement and Grassland Agro-Ecosystem, College of Ecology, Lanzhou University, Lanzhou, China
- Xiangwen Fang
- State Key Laboratory of Herbage Improvement and Grassland Agro-Ecosystem, College of Ecology, Lanzhou University, Lanzhou, China
- Jianquan Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-Ecosystem, College of Ecology, Lanzhou University, Lanzhou, China
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
2
Ruiz JL, Reimering S, Escobar-Prieto JD, Brancucci NMB, Echeverry DF, Abdi AI, Marti M, Gómez-Díaz E, Otto TD. From contigs towards chromosomes: automatic improvement of long read assemblies (ILRA). Brief Bioinform 2023; 24:bbad248. [PMID: 37406192 PMCID: PMC10359078 DOI: 10.1093/bib/bbad248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 05/24/2023] [Accepted: 06/16/2023] [Indexed: 07/07/2023]
Abstract
Recent advances in long read technologies not only enable large consortia to aim to sequence all eukaryotes on Earth, but they also allow individual laboratories to sequence their species of interest with relatively low investment. Long read technologies embody the promise of overcoming scaffolding problems associated with repeats and low complexity sequences, but the number of contigs often far exceeds the number of chromosomes and they may contain many insertion and deletion errors around homopolymer tracts. To overcome these issues, we have implemented the ILRA pipeline to correct long read-based assemblies. Contigs are first reordered, renamed, merged, circularized, or filtered if erroneous or contaminated. Illumina short reads are used subsequently to correct homopolymer errors. We successfully tested our approach by improving the genome sequences of Homo sapiens, Trypanosoma brucei, and Leptosphaeria spp., and by generating four novel Plasmodium falciparum assemblies from field samples. We found that correcting homopolymer tracts reduced the number of genes incorrectly annotated as pseudogenes, but an iterative approach seems to be required to correct more sequencing errors. In summary, we describe and benchmark the performance of our new tool, which improved the quality of novel long read assemblies up to 1 Gbp. The pipeline is available at GitHub: https://github.com/ThomasDOtto/ILRA.
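ILRA's short-read correction step targets indel errors around homopolymer tracts. To make "homopolymer tract" concrete, here is a minimal Python sketch that is not taken from ILRA: the helper name `homopolymer_tracts` and its minimum run length of 5 are hypothetical choices. It simply reports every run of one repeated base (ignoring ambiguous N runs), the kind of region where such indels are usually looked for.

```python
from itertools import groupby
from typing import List, Tuple


def homopolymer_tracts(seq: str, min_len: int = 5) -> List[Tuple[int, str, int]]:
    """Return (0-based start, base, length) for each A/C/G/T run of length >= min_len.

    Hypothetical helper; min_len=5 is an illustrative threshold, not an ILRA setting.
    """
    tracts = []
    pos = 0
    for base, run in groupby(seq.upper()):
        length = sum(1 for _ in run)
        # Skip runs of ambiguous bases such as N; keep only real homopolymers.
        if length >= min_len and base in "ACGT":
            tracts.append((pos, base, length))
        pos += length
    return tracts


if __name__ == "__main__":
    print(homopolymer_tracts("ACGTTTTTTAGGGGGC"))
```

For example, `homopolymer_tracts("ACGTTTTTTAGGGGGC")` returns `[(3, 'T', 6), (10, 'G', 5)]`: a six-base T run starting at position 3 and a five-base G run starting at position 10.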
Affiliation(s)
- José Luis Ruiz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN), Consejo Superior de Investigaciones Científicas, 18016, Granada, Spain
- Susanne Reimering
- Department for Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Nicolas M B Brancucci
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
- Department of Medical Parasitology and Infection Biology, Swiss Tropical and Public Health Institute, 4123 Allschwil, Switzerland
- University of Basel, 4001 Basel, Switzerland
- Diego F Echeverry
- Centro Internacional de Entrenamiento e Investigaciones Médicas (CIDEIM), Cali, Colombia
- Departamento de Microbiología, Facultad de Salud, Universidad del Valle, Cali, Colombia
- Matthias Marti
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
- Elena Gómez-Díaz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN), Consejo Superior de Investigaciones Científicas, 18016, Granada, Spain
- Thomas D Otto
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
3
Sacristán-Horcajada E, González-de la Fuente S, Peiró-Pastor R, Carrasco-Ramiro F, Amils R, Requena JM, Berenguer J, Aguado B. ARAMIS: From systematic errors of NGS long reads to accurate assemblies. Brief Bioinform 2021; 22:bbab170. [PMID: 34013348 PMCID: PMC8574707 DOI: 10.1093/bib/bbab170] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/18/2020] [Revised: 03/31/2021] [Accepted: 04/11/2021] [Indexed: 01/23/2023]
Abstract
NGS long-read sequencing technologies (or third-generation sequencing) such as Pacific Biosciences (PacBio) have revolutionized the sequencing field over the last decade, improving multiple genomic applications such as de novo genome assembly. However, their error rate, mostly involving insertions and deletions (indels), remains an important concern that requires special attention. Multiple algorithms are available to fix these sequencing errors using short reads (such as Illumina), although they require long processing times and some errors may persist. Here, we present Accurate long-Reads Assembly correction Method for Indel errorS (ARAMIS), the first NGS long-read indel correction pipeline that combines several correction tools in a single step using accurate short reads. As a proof of concept, six organisms were selected based on their different GC content, size and genome complexity, and their PacBio-assembled genomes were thoroughly corrected by this pipeline. We found systematic sequencing errors affecting homopolymeric regions in PacBio long-read sequences, and found that the type of indel error introduced during PacBio sequencing is related to the GC content of the organism. Because this is not widely known, numerous published studies contain such errors, which should be resolved since they may carry incorrect biological information. ARAMIS yields better results with fewer computational resources than other correction tools, and makes it possible to detect the nature of the indel errors found and their distribution along the genome. The source code of ARAMIS is available at https://github.com/genomics-ngsCBMSO/ARAMIS.git.
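The abstract ties the type of PacBio indel error to an organism's GC content and mentions mapping errors along the genome. As a hedged illustration only (this is not ARAMIS code), GC content can be computed as the fraction of G and C among unambiguous bases, either for a whole sequence or in fixed windows along it; the function names `gc_content` and `gc_profile` and the 1,000 bp default window are assumptions.

```python
from typing import List, Optional, Tuple


def gc_content(seq: str) -> Optional[float]:
    """Fraction of G+C among unambiguous bases (A/C/G/T); None if there are none."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        return None
    return (s.count("G") + s.count("C")) / acgt


def gc_profile(seq: str, window: int = 1000, step: int = 1000) -> List[Tuple[int, Optional[float]]]:
    """GC content in fixed windows along seq, as (0-based window start, GC) pairs.

    The window and step sizes are illustrative defaults, not ARAMIS parameters.
    A sequence shorter than one window yields a single window covering all of it.
    """
    last_start = max(len(seq) - window, 0)
    return [(i, gc_content(seq[i:i + window])) for i in range(0, last_start + 1, step)]


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))
    print(gc_profile("GGGGAAAA", window=4, step=4))
```

Here `gc_content("GGCCAATT")` gives 0.5, and `gc_profile("GGGGAAAA", window=4, step=4)` gives `[(0, 1.0), (4, 0.0)]`. Excluding N from the denominator keeps gaps in a draft assembly from diluting the estimate.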
Affiliation(s)
- R Peiró-Pastor
- Centro de Biología Molecular Severo Ochoa (CBMSO) (CSIC-UAM), Madrid, Spain
- F Carrasco-Ramiro
- Centro de Biología Molecular Severo Ochoa (CBMSO) (CSIC-UAM), Madrid, Spain
- R Amils
- Centro de Biología Molecular Severo Ochoa (CBMSO) (CSIC-UAM), Madrid, Spain
- J M Requena
- Centro de Biología Molecular Severo Ochoa (CBMSO) (CSIC-UAM), Madrid, Spain
- J Berenguer
- Centro de Biología Molecular Severo Ochoa (CBMSO) (CSIC-UAM), Madrid, Spain
- B Aguado
- Centro de Biología Molecular Severo Ochoa (CBMSO) (CSIC-UAM), Madrid, Spain
4
Wang Y, Li X, Wang C, Gao L, Wu Y, Ni X, Sun J, Jiang J. Unveiling the transcriptomic complexity of Miscanthus sinensis using a combination of PacBio long read- and Illumina short read sequencing platforms. BMC Genomics 2021; 22:690. [PMID: 34551715 PMCID: PMC8459517 DOI: 10.1186/s12864-021-07971-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/15/2020] [Accepted: 09/03/2021] [Indexed: 11/10/2022]
Abstract
Background Miscanthus sinensis Andersson is a perennial grass that exhibits remarkable lignocellulose characteristics suitable for sustainable bioenergy production. However, knowledge of the genetic resources of this species is relatively limited, which considerably hampers further work on its biology and genetic improvement. Results In this study, by analyzing the transcriptome of mixed leaf and stem samples using PacBio Iso-Seq sequencing combined with Illumina HiSeq, we report the first full-length transcriptome dataset of M. sinensis, comprising a total of 58.21 Gb of clean data. An average of 15.75 Gb of clean reads per sample was obtained from the PacBio Iso-Seq system, more than double the data size (6.68 Gb) obtained from the Illumina HiSeq platform. The integrated analyses of PacBio- and Illumina-based transcriptomic data uncovered 408,801 non-redundant transcripts with an average length of 1,685 bp. Of those, 189,406 transcripts were identified by both methods, 169,149 transcripts with an average length of 619 bp were uniquely identified by Illumina HiSeq, and 51,246 transcripts with an average length of 2,535 bp were uniquely identified by PacBio Iso-Seq. Approximately 96% of the final combined transcripts mapped back to the Miscanthus genome, reflecting the high quality and coverage of our sequencing results. When comparing our data with the genomes of four Andropogoneae species, M. sinensis showed the closest relationship with sugarcane, with mapping ratios of up to 93%, followed by sorghum, with up to 80%, indicating high conservation of orthologs among these three genomes. Furthermore, 306,228 transcripts were successfully annotated against public databases, including cell wall-related genes and transcription factor families, thus providing many new insights into gene functions.
The PacBio Iso-Seq data also helped identify 3,898 alternative splicing events and 2,963 annotated AS isoforms within 10 function categories. Conclusions Taken together, the present study provides a rich data set of full-length transcripts that greatly enriches our understanding of M. sinensis transcriptomic resources, thus facilitating further genetic improvement and molecular studies of the Miscanthus species. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07971-x.
Affiliation(s)
- Yongli Wang
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Xia Li
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Congsheng Wang
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Lu Gao
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Yanfang Wu
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Xingnan Ni
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Jianzhong Sun
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Jianxiong Jiang
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
5
Du H, Diao C, Zhao P, Zhou L, Liu JF. Integrated hybrid de novo assembly technologies to obtain high-quality pig genome using short and long reads. Brief Bioinform 2021; 22:6082823. [PMID: 33429431 DOI: 10.1093/bib/bbaa399] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/22/2020] [Revised: 11/20/2020] [Accepted: 12/08/2020] [Indexed: 11/12/2022]
Abstract
With the rapid progress of sequencing technologies, various types of sequencing reads and assembly algorithms have been designed to construct genome assemblies. Although recent studies have attempted to evaluate the appropriate type of sequencing reads and algorithms for assembling high-quality genomes, it is still a challenge to choose the right combination for constructing animal genomes. Here, we present a comparative performance assessment of 14 assembly combinations (9 software programs with different short and long reads) for the Duroc pig. Based on the results of this optimization of genome construction, we designed an integrated hybrid de novo assembly pipeline, HSCG, and constructed a draft genome for the Duroc pig. Comparison between the new genome and Sus scrofa 11.1 revealed important breakpoints in two S. scrofa 11.1 genes. Our findings may provide new insights into pan-genome analysis studies of agricultural animals, and the integrated assembly pipeline may serve as a guide for the assembly of other animal genomes.
Affiliation(s)
- Heng Du
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Chenguang Diao
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Pengju Zhao
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Lei Zhou
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Jian-Feng Liu
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
6
Gonzalez LM, Sevilla E, Fernández-García M, Sanchez-Flores A, Montero E. Integration of Genomic and Transcriptomic Data to Elucidate Molecular Processes in Babesia divergens. Methods Mol Biol 2021; 2369:199-215. [PMID: 34313991 DOI: 10.1007/978-1-0716-1681-9_12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
Emerging pathogens have developed ingenious life cycles to facilitate their growth and survival in the host organism. Detailed knowledge of the life cycle of these pathogens is increasingly necessary if we are to design new strategies to prevent infection and transmission. Multi-omics platforms provide useful data at different biological levels, and integrating these data into current approaches can facilitate a holistic assessment of emerging pathogens. In this chapter, we bring together various methods and apply an integrative approach to the analysis of genomic and transcriptomic data in Babesia divergens, an emerging apicomplexan parasite that invades red blood cells and causes redwater fever in cattle and the most severe form of babesiosis in humans in Europe. The integrative methodology described herein can help identify genes active at specific points during the life cycle of apicomplexan parasites.
Affiliation(s)
- Luis Miguel Gonzalez
- Laboratorio de Referencia e Investigación en Parasitología, Centro Nacional de Microbiología, ISCIII Majadahonda, Madrid, Spain
- Elena Sevilla
- Laboratorio de Referencia e Investigación en Parasitología, Centro Nacional de Microbiología, ISCIII Majadahonda, Madrid, Spain
- Miguel Fernández-García
- Centro de Metabolómica y Bioanálisis (CEMBIO), Facultad de Farmacia, Universidad San Pablo-CEU, CEU Universities, Boadilla del Monte, Spain
- Alejandro Sanchez-Flores
- Unidad Universitaria de Secuenciación Masiva y Bioinformática, Instituto de Biotecnología, Cuernavaca, Mexico
- Estrella Montero
- Laboratorio de Referencia e Investigación en Parasitología, Centro Nacional de Microbiología, ISCIII Majadahonda, Madrid, Spain
7
Li A, Liu A, Du X, Chen JY, Yin M, Hu HY, Shrestha N, Wu SD, Wang HQ, Dou QW, Liu ZP, Liu JQ, Yang YZ, Ren GP. A chromosome-scale genome assembly of a diploid alfalfa, the progenitor of autotetraploid alfalfa. Hortic Res 2020; 7:194. [PMID: 33328470 PMCID: PMC7705661 DOI: 10.1038/s41438-020-00417-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/19/2020] [Revised: 08/28/2020] [Accepted: 09/04/2020] [Indexed: 05/07/2023]
Abstract
Alfalfa (Medicago sativa L.) is one of the most important and widely cultivated forage crops. It is commonly used as a vegetable and medicinal herb because of its excellent nutritional quality and significant economic value. Based on Illumina, Nanopore and Hi-C data, we generated a chromosome-scale assembly of Medicago sativa spp. caerulea (voucher PI464715), the direct diploid progenitor of autotetraploid alfalfa. The assembled genome comprises 793.2 Mb of genomic sequence and 47,202 annotated protein-coding genes. The contig N50 length is 3.86 Mb. This genome is almost twofold larger and contains more annotated protein-coding genes than that of its close relative, Medicago truncatula (420 Mb and 44,623 genes). The greater number of expanded gene families relative to M. truncatula and the expansion of repetitive elements, rather than whole-genome duplication (the two species share the ancestral Papilionoideae whole-genome duplication event), may have contributed to the large genome size of M. sativa spp. caerulea. Comparative and evolutionary analyses revealed that M. sativa spp. caerulea diverged from M. truncatula ~5.2 million years ago, and that the chromosomal fissions and fusions detected between the two genomes occurred during the divergence of the two species. In addition, we identified 489 resistance (R) genes and 82 and 85 candidate genes involved in the lignin and cellulose biosynthesis pathways, respectively. The near-complete and accurate diploid alfalfa reference genome obtained herein serves as an important complement to the recently assembled autotetraploid alfalfa genome and will provide valuable genomic resources for investigating the genomic architecture of autotetraploid alfalfa as well as for improving breeding strategies in alfalfa.
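The contig N50 reported above (3.86 Mb) is a standard assembly statistic: the length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of that definition follows; the function name `n50` and the toy lengths are illustrative assumptions, not data or code from the study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Contig N50: largest length L such that contigs >= L hold at least half the bases."""
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("no contig lengths given")
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        # Integer-safe form of "running >= total / 2".
        if 2 * running >= total:
            return length
    # Unreachable for non-negative lengths; kept so the function always returns an int.
    return sorted_lengths[-1]


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8: the three longest contigs (10 + 9 + 8 = 27) already cover half of the 54 assembled bases.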
Affiliation(s)
- Ao Li
- State Key Laboratory of Grassland Agro-Ecosystems, Institute of Innovation Ecology & School of Life Sciences, Lanzhou University, Lanzhou, China
- Ai Liu
- State Key Laboratory of Grassland Agro-Ecosystems, Institute of Innovation Ecology & School of Life Sciences, Lanzhou University, Lanzhou, China
- Xin Du
- State Key Laboratory of Grassland Agro-Ecosystems, Institute of Innovation Ecology & School of Life Sciences, Lanzhou University, Lanzhou, China
- Jin-Yuan Chen
- State Key Laboratory of Grassland Agro-Ecosystems, Institute of Innovation Ecology & School of Life Sciences, Lanzhou University, Lanzhou, China
- Mou Yin
- State Key Laboratory of Grassland Agro-Ecosystems, Institute of Innovation Ecology & School of Life Sciences, Lanzhou University, Lanzhou, China
- Hong-Yin Hu
- State Key Laboratory of Grassland Agro-Ecosystems, Institute of Innovation Ecology & School of Life Sciences, Lanzhou University, Lanzhou, China
- Nawal Shrestha
- State Key Laboratory of Grassland Agro-Ecosystems, Institute of Innovation Ecology & School of Life Sciences, Lanzhou University, Lanzhou, China
- Sheng-Dan Wu
- State Key Laboratory of Grassland Agro-Ecosystems, Institute of Innovation Ecology & School of Life Sciences, Lanzhou University, Lanzhou, China
- Hai-Qing Wang
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
- Quan-Wen Dou
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
- Zhi-Peng Liu
- State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, China
- Jian-Quan Liu
- State Key Laboratory of Grassland Agro-Ecosystems, Institute of Innovation Ecology & School of Life Sciences, Lanzhou University, Lanzhou, China
- Key Laboratory of Bio-Resources and Eco-Environment of the Ministry of Education & State Key Lab of Hydraulics & Mountain River Engineering, College of Life Sciences, Sichuan University, Chengdu, China
- Yong-Zhi Yang
- State Key Laboratory of Grassland Agro-Ecosystems, Institute of Innovation Ecology & School of Life Sciences, Lanzhou University, Lanzhou, China
- Guang-Peng Ren
- State Key Laboratory of Grassland Agro-Ecosystems, Institute of Innovation Ecology & School of Life Sciences, Lanzhou University, Lanzhou, China
8
Zhao P, Xin G, Yan F, Wang H, Ren X, Woeste K, Liu W. The de novo genome assembly of Tapiscia sinensis and the transcriptomic and developmental bases of androdioecy. Hortic Res 2020; 7:191. [PMID: 33328438 PMCID: PMC7705024 DOI: 10.1038/s41438-020-00414-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/11/2020] [Revised: 07/14/2020] [Accepted: 08/11/2020] [Indexed: 05/05/2023]
Abstract
Tapiscia sinensis (Tapisciaceae) possesses an unusual androdioecious breeding system that has attracted considerable interest from evolutionary biologists. Key aspects of T. sinensis biology, including its biogeography, genomics, and sex-linked genes, are unknown. Here, we report the first de novo assembly of the genome of T. sinensis. The genome size was 410 Mb, with 22,251 predicted genes. Based on whole-genome resequencing of 55 trees from 10 locations, an analysis of population genetic structure indicated that T. sinensis has fragmented into five lineages, with low intrapopulation genetic diversity and little gene flow among populations. By comparing whole-genome scans of male versus hermaphroditic pools, we identified 303 candidate sex-linked genes, 79 of which (25.9%) were located on scaffold 25. A 24-kb region was absent in hermaphroditic individuals, and five genes in that region, TsF-box4, TsF-box10, TsF-box13, TsSUT1, and TsSUT4, showed expression differences between mature male and hermaphroditic flowers. The results of this study shed light on the breeding system evolution and conservation genetics of the Tapisciaceae.
Affiliation(s)
- Peng Zhao
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi'an, Shaanxi, 710069, China
- Guiliang Xin
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi'an, Shaanxi, 710069, China
- Feng Yan
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi'an, Shaanxi, 710069, China
- Huan Wang
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi'an, Shaanxi, 710069, China
- Xiaolong Ren
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi'an, Shaanxi, 710069, China
- Keith Woeste
- USDA Forest Service Hardwood Tree Improvement and Regeneration Center (HTIRC), Department of Forestry and Natural Resources, Purdue University, 715 West State Street, West Lafayette, IN, 47907, USA
- Wenzhe Liu
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi'an, Shaanxi, 710069, China
9
Bharti R, Grimm DG. Current challenges and best-practice protocols for microbiome analysis. Brief Bioinform 2019; 22:178-193. [PMID: 31848574 PMCID: PMC7820839 DOI: 10.1093/bib/bbz155] [Citation(s) in RCA: 209] [Impact Index Per Article: 41.8] [Received: 09/24/2019] [Revised: 10/23/2019] [Accepted: 11/06/2019] [Indexed: 12/15/2022]
Abstract
Analyzing the microbiome of diverse species and environments using next-generation sequencing techniques has significantly enhanced our understanding of the metabolic, physiological and ecological roles of environmental microorganisms. However, microbiome analysis is affected by experimental conditions (e.g. sequencing errors and genomic repeats) and by computationally intensive and cumbersome downstream analysis (e.g. quality control, assembly, binning and statistical analyses). Moreover, the introduction of new sequencing technologies and protocols has led to a flood of new methodologies, which also have an immediate effect on the results of the analyses. The aim of this work is to review the most important workflows for 16S rRNA sequencing and shotgun and long-read metagenomics, and to provide best-practice protocols on experimental design, sample processing, sequencing, assembly, binning, annotation and visualization. To simplify and standardize the computational analysis, we provide a set of best-practice workflows for 16S rRNA and metagenomic sequencing data (available at https://github.com/grimmlab/MicrobiomeBestPracticeReview).
Affiliation(s)
- Richa Bharti
- Weihenstephan-Triesdorf University of Applied Sciences and Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Straubing, Germany
- Dominik G Grimm
- Weihenstephan-Triesdorf University of Applied Sciences and Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Straubing, Germany
10
Multiplexed Non-barcoded Long-Read Sequencing and Assembling Genomes of Bacillus Strains in Error-Free Simulations. Curr Microbiol 2019; 77:79-84. [PMID: 31722044 DOI: 10.1007/s00284-019-01808-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2019] [Accepted: 11/02/2019] [Indexed: 10/25/2022]
Abstract
The generation of genomic data from microorganisms has revolutionized our ability to understand their biology, but it is still challenging to obtain complete microbial genome sequences in an automated, high-throughput and cost-effective manner. While the advent of second-generation sequencing technologies provided significantly higher throughput, their shorter read lengths and more pronounced sequence-context bias led to a shift towards resequencing applications. Recently, single-molecule real-time (SMRT) DNA sequencing has been used to generate reads that are much longer than those of other sequencing platforms, facilitating de novo genome assembly and genome finishing. Here we introduce a novel multiplex strategy to make full use of the capacity and characteristics of SMRT sequencing in microbial genome assembly. We used error-free simulations to evaluate the practicability of assembling SMRT genomic sequencing data from multiple microbes into finished genomes all at once. We then compared the influence of two key factors, sequencing coverage and read length, on multiplex assembly. Our results showed that long-read genomic sequencing inherently makes it possible to assemble pooled data from multiple microbes into finished genomes because of its long read lengths. This approach might be helpful for various microbial genome projects or for metagenomics research.
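Sequencing coverage is one of the two factors this study varies. As a rough, hypothetical illustration (not the authors' method), expected mean depth per genome in a pooled run can be estimated as the yield assigned to each genome divided by its genome size; the Lander-Waterman (Poisson) model then gives e^-c as the expected fraction of bases left uncovered at mean depth c. The even split of yield across strains, the function names, and the example numbers are all assumptions.

```python
import math
from typing import Dict


def pooled_depths(total_bases: float, genome_sizes: Dict[str, float]) -> Dict[str, float]:
    """Expected mean depth per genome in a pooled (multiplexed) run.

    Hypothetically assumes the run's yield is split evenly across the pooled
    strains; real pools rarely behave this neatly.
    """
    if not genome_sizes:
        raise ValueError("need at least one genome")
    per_genome_bases = total_bases / len(genome_sizes)
    return {name: per_genome_bases / size for name, size in genome_sizes.items()}


def fraction_uncovered(depth: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases with zero coverage."""
    return math.exp(-depth)


if __name__ == "__main__":
    # Hypothetical run: 1.2 Gb of reads pooled across three ~4 Mb genomes.
    depths = pooled_depths(1.2e9, {"strain_a": 4.0e6, "strain_b": 4.0e6, "strain_c": 4.0e6})
    print(depths)
    print(fraction_uncovered(3.0))
```

With these made-up numbers each genome receives about 100x, while a genome covered at only 3x would be expected to leave roughly 5% of its bases uncovered. The Poisson model ignores read-length and sequence-context effects, so it is a lower bound on real-world gaps rather than a prediction.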
11
Filosa JN, Berry CT, Ruthel G, Beverley SM, Warren WC, Tomlinson C, Myler PJ, Dudkin EA, Povelones ML, Povelones M. Dramatic changes in gene expression in different forms of Crithidia fasciculata reveal potential mechanisms for insect-specific adhesion in kinetoplastid parasites. PLoS Negl Trop Dis 2019; 13:e0007570. [PMID: 31356610 PMCID: PMC6687205 DOI: 10.1371/journal.pntd.0007570] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/21/2019] [Revised: 08/08/2019] [Accepted: 06/22/2019] [Indexed: 01/08/2023]
Abstract
Kinetoplastids are a group of parasites that includes several medically important species. These human-infective species are transmitted by insect vectors in which the parasites undergo specific developmental transformations. For each species, this includes a stage in which parasites adhere to insect tissue via a hemidesmosome-like structure. Although this structure has been described morphologically, it has never been molecularly characterized. We are using Crithidia fasciculata, an insect parasite that produces large numbers of adherent parasites inside its mosquito host, as a model kinetoplastid to investigate both the mechanism of adherence and the signals required for differentiation to an adherent form. An advantage of C. fasciculata is that adherent parasites can be generated both in vitro, allowing a direct comparison to cultured swimming forms, and in vivo within the mosquito. Using RNAseq, we identify genes associated with adherence in C. fasciculata. As almost all of these genes have orthologs in other kinetoplastid species, our findings may reveal shared mechanisms of adherence, allowing investigation of a crucial step in parasite development and disease transmission. In addition, dual-RNAseq allowed us to explore the interaction between the parasites and the mosquito. Although the infection is well tolerated, anti-microbial peptides and other components of the mosquito innate immune system are upregulated. Our findings indicate that C. fasciculata is a powerful model system for probing kinetoplastid-insect interactions. Kinetoplastids are single-celled parasites that cause devastating human diseases worldwide. Although this group includes many species that infect a variety of hosts, they have a great deal of shared biology. One relatively unexplored aspect of the kinetoplastid life cycle is their ability to adhere to insect tissue. For pathogenic species, adherence is critical for transmission by insect vectors.
We have used an insect parasite called Crithidia fasciculata as a model kinetoplastid to reveal shared mechanisms of insect adherence. We have compared gene expression profiles of motile, non-adherent C. fasciculata to those of C. fasciculata adhered to non-living substrates and those attached to the hindgut of mosquitoes. Through this analysis, we have identified a large number of candidate proteins that may mediate adhesion in these and related parasites. In addition, our findings suggest that the mosquito immune system is responding to the presence of parasites in the gut. These results establish a new, robust system to explore the interaction between kinetoplastids and their insect hosts.
Affiliation(s)
- John N. Filosa
- Department of Pathobiology, University of Pennsylvania School of Veterinary Medicine, Philadelphia, Pennsylvania, United States of America
- Corbett T. Berry
- Department of Pathobiology, University of Pennsylvania School of Veterinary Medicine, Philadelphia, Pennsylvania, United States of America
- Gordon Ruthel
- Department of Pathobiology, University of Pennsylvania School of Veterinary Medicine, Philadelphia, Pennsylvania, United States of America
- Stephen M. Beverley
- Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Wesley C. Warren
- University of Missouri, Bond Life Sciences Center, Columbia, Missouri, United States of America
- Chad Tomlinson
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Peter J. Myler
- Center for Global Infectious Disease Research, Seattle Children’s Research Institute, Seattle, Washington, United States of America
- Department of Global Health, University of Washington, Seattle, Washington, United States of America
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, United States of America
- Elizabeth A. Dudkin
- Department of Biology, Penn State Brandywine, Media, Pennsylvania, United States of America
- Megan L. Povelones
- Department of Biology, Penn State Brandywine, Media, Pennsylvania, United States of America
- * Corresponding author
- Michael Povelones
- Department of Pathobiology, University of Pennsylvania School of Veterinary Medicine, Philadelphia, Pennsylvania, United States of America
- * Corresponding author
12
Boldogkői Z, Moldován N, Balázs Z, Snyder M, Tombácz D. Long-Read Sequencing – A Powerful Tool in Viral Transcriptome Research. Trends Microbiol 2019; 27:578-592. [DOI: 10.1016/j.tim.2019.01.010] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 10/25/2018] [Revised: 01/21/2019] [Accepted: 01/30/2019] [Indexed: 12/16/2022]
13
Efficiency of PacBio long read correction by 2nd generation Illumina sequencing. Genomics 2019; 111:43-49. [DOI: 10.1016/j.ygeno.2017.12.011] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 07/28/2017] [Revised: 12/11/2017] [Accepted: 12/17/2017] [Indexed: 12/17/2022]
14
Grau JH, Hackl T, Koepfli KP, Hofreiter M. Improving draft genome contiguity with reference-derived in silico mate-pair libraries. Gigascience 2018; 7:4980916. [PMID: 29688527 PMCID: PMC5967465 DOI: 10.1093/gigascience/giy029] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 04/26/2017] [Accepted: 03/20/2018] [Indexed: 11/29/2022]
Abstract
Background: Contiguous genome assemblies are a highly valued biological resource because they provide more completely annotated genes and usable genomic elements than fragmented draft genomes. Nonetheless, contiguity is difficult to obtain if only low-coverage data and/or only distantly related reference genome assemblies are available.
Findings: To improve genome contiguity, we have developed Cross-Species Scaffolding, a new pipeline that imports long-range distance information directly into the de novo assembly process by constructing mate-pair libraries in silico.
Conclusions: We show how genome assembly metrics and gene prediction dramatically improve with our pipeline by assembling two primate genomes based solely on ~30× coverage of shotgun sequencing data.
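The core idea of reference-derived mate pairs can be illustrated with a toy sampler. This is a minimal sketch, not the Cross-Species Scaffolding pipeline itself. The function names `revcomp` and `in_silico_mate_pairs` are hypothetical. The sampler draws fragments uniformly from a single reference string and assumes a forward-reverse read orientation. Real mate-pair libraries, and the published pipeline, may use a different orientation, insert-size distribution and read length.

```python
import random

# Translation table for DNA complement; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_mate_pairs(reference: str, insert_size: int, read_len: int,
                         n_pairs: int, seed: int = 0) -> list[tuple[str, str]]:
    """Sample read pairs whose outer ends lie insert_size bases apart.

    Read 1 is the left end of the fragment on the forward strand; read 2 is
    the reverse complement of the right end (assumed forward-reverse layout).
    """
    if insert_size < read_len or insert_size > len(reference):
        raise ValueError("insert size must be >= read length and <= reference length")
    rng = random.Random(seed)  # seeded for reproducible sampling
    pairs = []
    for _ in range(n_pairs):
        start = rng.randrange(0, len(reference) - insert_size + 1)
        fragment = reference[start:start + insert_size]
        read1 = fragment[:read_len]
        read2 = revcomp(fragment[-read_len:])
        pairs.append((read1, read2))
    return pairs
```

In a real workflow such pairs would be written to FASTQ and passed, with their known insert size, to an assembler's scaffolding step. That wiring is left out of this sketch.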
Affiliation(s)
- José Horacio Grau
- Museum für Naturkunde Berlin, Leibniz-Institut für Evolutions- und Biodiversitätsforschung an der Humboldt-Universität zu Berlin, Invalidenstraße 43, 10115 Berlin, Germany
- Thomas Hackl
- Massachusetts Institute of Technology, Department of Civil and Environmental Engineering, 15 Vassar Street, Cambridge, MA 02139, USA
- Klaus-Peter Koepfli
- Smithsonian Conservation Biology Institute, National Zoological Park, 3001 Connecticut Avenue NW, Washington, D.C. 20008, USA
- Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, Sredniy Prospekt 41A, St. Petersburg, 199004, Russia
- Michael Hofreiter
- Faculty of Mathematics and Life Sciences, Institute of Biochemistry and Biology, Unit of General Zoology-Evolutionary Adaptive Genomics, University of Potsdam, Karl-Liebknecht-Straße 24-25, 14476 Potsdam, Germany
15
Sohn JI, Nam JW. The present and future of de novo whole-genome assembly. Brief Bioinform 2018; 19:23-40. [PMID: 27742661 DOI: 10.1093/bib/bbw096] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Received: 06/03/2016] [Indexed: 12/15/2022]
Abstract
Since the advent of next-generation sequencing (NGS) technology, various de novo assembly algorithms based on the de Bruijn graph have been developed to construct chromosome-level sequences. However, numerous technical and computational challenges in de novo assembly remain, although many ideas and heuristics have been proposed to tackle them in both experimental and computational settings. In this review, we categorize de novo assemblers on the basis of the type of de Bruijn graph (Hamiltonian and Eulerian) and discuss the challenges of de novo assembly for short NGS reads with respect to computational complexity and assembly ambiguity. We then discuss how the limitations of short reads can be overcome by single-molecule sequencing platforms that generate long reads of up to several kilobases. In fact, long-read assembly has caused a paradigm shift in whole-genome assembly in terms of algorithms and supporting steps. We also summarize (i) hybrid assemblies using both short and long reads and (ii) overlap-based assemblies for long reads, and discuss their challenges and future prospects. This review provides guidelines for determining the optimal approach for a given input data type, computational budget or genome.
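The Eulerian formulation of de Bruijn graph assembly can be shown on error-free toy reads. In this minimal sketch the function names are hypothetical. Nodes are (k-1)-mers, and each distinct k-mer becomes an edge from its prefix to its suffix. Hierholzer's algorithm then finds a path that uses every edge exactly once. The sketch assumes such a path exists. Real assemblers must also handle sequencing errors, repeats and reverse complements, and they usually output contigs rather than a single path.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer is an edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: dict[str, list[str]] = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the result deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def eulerian_path(graph: dict[str, list[str]]) -> list[str]:
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    in_deg: dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start where out-degree exceeds in-degree by one, else anywhere (cycle case).
    start = next((u for u, vs in graph.items() if len(vs) - in_deg[u] == 1),
                 next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}  # unused edges
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finished
    return path[::-1]


def spell(path: list[str]) -> str:
    """Glue consecutive (k-1)-mers, which overlap by k-2 characters."""
    return path[0] + "".join(node[-1] for node in path[1:])
```

For example, `spell(eulerian_path(de_bruijn_graph(["ACGTTG", "GTTGCA"], 4)))` reconstructs `"ACGTTGCA"` from the two overlapping reads.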
16
Tombácz D, Balázs Z, Csabai Z, Snyder M, Boldogkői Z. Long-Read Sequencing Revealed an Extensive Transcript Complexity in Herpesviruses. Front Genet 2018; 9:259. [PMID: 30065753 PMCID: PMC6056645 DOI: 10.3389/fgene.2018.00259] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 04/23/2018] [Accepted: 06/27/2018] [Indexed: 12/28/2022]
Abstract
Long-read sequencing (LRS) techniques are very recent advancements, but they have already been used for transcriptome research in all three subfamilies of herpesviruses. These techniques have multiplied the number of known transcripts in each of the examined viruses. They have also revealed a previously hidden complexity of the herpesvirus transcriptome through the discovery of a large number of novel RNA molecules, including coding and non-coding RNAs, transcript isoforms, and polycistronic RNAs. Additionally, LRS techniques have uncovered an intricate meshwork of transcriptional overlaps between adjacent and distally located genes. Here, we review the contribution of LRS to herpesvirus transcriptomics and present the complexity revealed by this technology, while also discussing the functional significance of this phenomenon.
Affiliation(s)
- Dóra Tombácz
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Zsolt Balázs
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Zsolt Csabai
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Michael Snyder
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, United States
- Zsolt Boldogkői
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
17
Chen Q, Lan C, Zhao L, Wang J, Chen B, Chen YPP. Recent advances in sequence assembly: principles and applications. Brief Funct Genomics 2018; 16:361-378. [PMID: 28453648 DOI: 10.1093/bfgp/elx006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/20/2022]
Abstract
The application of advanced sequencing technologies and the rapid growth of sequence data have led to increasing interest in DNA sequence assembly. However, repeats and polymorphisms occur frequently in genomes, and each has a different impact on assembly. Further, many new sequencing applications, such as metagenomics involving multiple species, have emerged in recent years. These applications not only increase complexity but also hinder efficient short-read assembly. This article reviews the theoretical foundations that underlie current mapping-based and de novo assembly, and highlights the key issues and feasible solutions that need to be considered. It focuses on how individual processes, such as optimal k-mer determination and error correction in assembly, rely on intelligent strategies or high-performance computation. We also survey the primary algorithms and software and discuss emerging challenges in assembly.
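Optimal k-mer determination and error correction both start from k-mer counts. The sketch below uses hypothetical helper names and error-free toy input. It counts k-mers on the forward strand only and builds the k-mer spectrum, that is, how many distinct k-mers occur once, twice, and so on. It also counts "solid" k-mers, those seen at least `min_count` times. Treating low-count k-mers as likely sequencing errors, and comparing solid-k-mer counts across several values of k, is one common heuristic. It is assumed here for illustration and is not necessarily the strategy of any tool surveyed in the review.

```python
from collections import Counter


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_spectrum(counts: Counter) -> dict[int, int]:
    """Map multiplicity -> number of distinct k-mers with that multiplicity."""
    return dict(sorted(Counter(counts.values()).items()))


def solid_kmers(reads: list[str], k: int, min_count: int = 2) -> int:
    """Number of distinct k-mers seen at least min_count times."""
    return sum(1 for c in kmer_counts(reads, k).values() if c >= min_count)
```

Scanning `solid_kmers(reads, k)` over a range of k values gives a rough picture of how much of the data survives an error filter at each k.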
18
A Whole Genome Assembly of the Horn Fly, Haematobia irritans, and Prediction of Genes with Roles in Metabolism and Sex Determination. G3-GENES GENOMES GENETICS 2018; 8:1675-1686. [PMID: 29602812 PMCID: PMC5940159 DOI: 10.1534/g3.118.200154] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 02/04/2023]
Abstract
Haematobia irritans, commonly known as the horn fly, is a globally distributed blood-feeding pest of cattle that is responsible for significant economic losses to cattle producers. Chemical insecticides are the primary means of controlling this pest, but insecticide resistance has become common in the horn fly. To provide a foundation for identifying genomic loci for insecticide resistance and for discovering new control technology, we report the sequencing, assembly, and annotation of the horn fly genome. The assembled genome is 1.14 Gb, comprising 76,616 scaffolds with an N50 scaffold length of 23 kb. Using RNA-Seq data, we have predicted 34,413 gene models, of which 19,185 have been assigned functional annotations. Comparative genomic analysis with the Dipteran flies Musca domestica L., Drosophila melanogaster, and Lucilia cuprina shows that the horn fly is most closely related to M. domestica, sharing 8,748 orthologous clusters, followed by D. melanogaster and L. cuprina, sharing 7,582 and 7,490 orthologous clusters, respectively. We also identified a gene locus for the sodium channel protein, in which mutations conferring target-site resistance to the most common class of pesticides used in fly control have previously been reported. Additionally, we identified 276 genomic loci encoding members of metabolic enzyme gene families such as cytochrome P450s, esterases and glutathione S-transferases, as well as several genes orthologous to sex determination pathway genes in other Dipteran species.
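The N50 scaffold length quoted above is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. Below is a minimal sketch of that computation. The function name and the example lengths are hypothetical.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted in descending order and accumulated; N50 is the length
    at which the running total first reaches half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    raise AssertionError("unreachable for non-negative lengths")
```

For instance, with lengths 2 through 10 (total 54), the three longest (10 + 9 + 8 = 27) reach half the total, so `n50(list(range(2, 11)))` returns 8.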
19
Utturkar SM, Klingeman DM, Hurt RA, Brown SD. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies. Front Microbiol 2017; 8:1272. [PMID: 28769883 PMCID: PMC5513972 DOI: 10.3389/fmicb.2017.01272] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 05/08/2017] [Accepted: 06/26/2017] [Indexed: 11/20/2022]
Abstract
This study characterized regions of DNA that remained unassembled by either PacBio or Illumina sequencing technologies in seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and regions not assembled by automated software were analyzed. Gaps in Illumina assemblies mostly corresponded to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties, such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not supported. PacBio assemblies had few limitations overall, and gaps were explained as the cumulative effect of lower-than-average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, which included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome-finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
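Gap characterization of this kind can be prototyped on a draft scaffold. The sketch below assumes that unresolved gaps are written as runs of `N`, a common FASTA convention. For each gap it reports the position, the length and the GC content of the flanking sequence. The helper names and the 50-bp default flank are hypothetical choices. The study's other measurements (read coverage, secondary structure, annotation) are not reproduced here.

```python
import re


def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def find_gaps(scaffold: str, min_len: int = 1):
    """Yield (start, end) of N runs, 0-based with end exclusive."""
    for match in re.finditer(r"[Nn]+", scaffold):
        if match.end() - match.start() >= min_len:
            yield match.start(), match.end()


def gap_report(scaffold: str, flank: int = 50, min_len: int = 1) -> list[dict]:
    """Describe each gap by position, length and GC content of its flanks."""
    rows = []
    for start, end in find_gaps(scaffold, min_len):
        left = scaffold[max(0, start - flank):start]
        right = scaffold[end:end + flank]
        rows.append({"start": start, "end": end, "length": end - start,
                     "left_gc": gc_content(left), "right_gc": gc_content(right)})
    return rows
```

Once a gap has been closed, for example by PCR/Sanger sequencing, the same `gc_content` helper can be applied to the recovered gap sequence itself.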
Affiliation(s)
- Sagar M Utturkar
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, United States
- Dawn M Klingeman
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- BioEnergy Science Center, Oak Ridge, TN, United States
- Richard A Hurt
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Steven D Brown
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, United States
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- BioEnergy Science Center, Oak Ridge, TN, United States
20
Margos G, Hepner S, Mang C, Marosevic D, Reynolds SE, Krebs S, Sing A, Derdakova M, Reiter MA, Fingerle V. Lost in plasmids: next generation sequencing and the complex genome of the tick-borne pathogen Borrelia burgdorferi. BMC Genomics 2017; 18:422. [PMID: 28558786 PMCID: PMC5450258 DOI: 10.1186/s12864-017-3804-5] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 01/10/2017] [Accepted: 05/17/2017] [Indexed: 11/21/2022]
Abstract
Background: Borrelia (B.) burgdorferi sensu lato, including the tick-transmitted agents of human Lyme borreliosis, have particularly complex genomes, consisting of a linear main chromosome and numerous linear and circular plasmids. The number and structure of plasmids are variable even among strains within a single genospecies. Genes on these plasmids are known to play essential roles in virulence and pathogenicity as well as in host and vector associations. For this reason, it is essential to explore methods for rapid and reliable characterisation of molecular-level changes on plasmids. In this study we used three strains: a low-passage isolate of B. burgdorferi sensu stricto strain B31(−NRZ) and two closely related strains (PAli and PAbe) that were isolated from human patients. Sequences of these strains were compared to the previously sequenced reference strain B31 (available in GenBank) to obtain proof-of-principle information on the suitability of next-generation sequencing (NGS) library construction and sequencing methods for the assembly of bacterial plasmids. We tested the effectiveness of different short-read assemblers on Illumina sequences, and of long-read generation methods on sequence data from Pacific Biosciences single-molecule real-time (SMRT) and nanopore (Oxford Nanopore Technologies) sequencing technologies.
Results: Inclusion of mate-pair library reads improved the assembly of some plasmids, as did prior enrichment of plasmids. While cp32 plasmids remained refractory to assembly using only short reads, they were effectively assembled by long-read sequencing methods. The long-read SMRT and nanopore sequences came, however, at the cost of indels (insertions or deletions) appearing in an unpredictable manner. Using long- and short-read technologies together allowed us to show that the three B. burgdorferi s.s. strains investigated here, whilst having similar plasmid structures to each other (apart from fusion of cp32 plasmids), differed significantly from the reference strain B31-GB, especially in the case of cp32 plasmids.
Conclusion: Short-read methods are sufficient to assemble the main chromosome and many of the plasmids in B. burgdorferi. However, a combination of short- and long-read sequencing methods is essential for proper assembly of all plasmids, including cp32, and thus for gaining an understanding of host or vector adaptations. An important conclusion from our work is that the evolution of Borrelia plasmids appears to be dynamic. This has important implications for the development of useful research strategies to monitor the risk of Lyme disease occurrence and how to medically manage it.
Affiliation(s)
- G Margos
- German National Reference Centre for Borrelia (NRZ), Bavarian Health and Food Safety Authority (LGL), Veterinärstrasse 2, 85764, Oberschleissheim, Germany
- S Hepner
- German National Reference Centre for Borrelia (NRZ), Bavarian Health and Food Safety Authority (LGL), Veterinärstrasse 2, 85764, Oberschleissheim, Germany
- C Mang
- German National Reference Centre for Borrelia (NRZ), Bavarian Health and Food Safety Authority (LGL), Veterinärstrasse 2, 85764, Oberschleissheim, Germany
- D Marosevic
- Bavarian Health and Food Safety Authority (LGL), Veterinärstrasse 2, 85764, Oberschleissheim, Germany
- European Programme for Public Health Microbiology Training, European Centre for Disease Prevention and Control (ECDC), Stockholm, Sweden
- S E Reynolds
- Department of Biology and Biochemistry, University of Bath, Claverton Down, BA2 7AY, Bath, UK
- S Krebs
- Gene Centre, Laboratory for Functional Genome Analysis, LMU Munich, Feodor-Lynen-Strasse 25, 81377, Munich, Germany
- A Sing
- German National Reference Centre for Borrelia (NRZ), Bavarian Health and Food Safety Authority (LGL), Veterinärstrasse 2, 85764, Oberschleissheim, Germany
- M Derdakova
- Institute of Zoology, Slovak Academy of Sciences, Bratislava, Slovakia
- M A Reiter
- Institut für Hygiene und Angewandte Immunologie, Medizinische Universität Wien, Kinderspitalgasse 15, A-1090, Wien, Austria
- V Fingerle
- German National Reference Centre for Borrelia (NRZ), Bavarian Health and Food Safety Authority (LGL), Veterinärstrasse 2, 85764, Oberschleissheim, Germany
21
22
Morán Losada P, Tümmler B. SNP synteny analysis of Staphylococcus aureus and Pseudomonas aeruginosa population genomics. FEMS Microbiol Lett 2016; 363:fnw229. [PMID: 27702754 DOI: 10.1093/femsle/fnw229] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Revised: 07/31/2016] [Accepted: 09/29/2016] [Indexed: 01/04/2023]
Abstract
Genomic sequence diversity of a bacterial species mainly results from the frequency distribution of single nucleotide polymorphisms (SNPs). Here we report an SNP matrix-based binary algorithm that determines intraclonal or interclonal genomic diversity from the number of shared sequential SNPs, the so-called SNP synteny or haplotype. All SNP positions and the frequency and length distribution of haplotypes are determined from pairwise alignment of completely sequenced genomes. This metric is invariant to the choice of reference genome. The analysis yields information on haplotype size, genomic gradients of recombination frequency, relatedness of strains, and the population composition of a taxon or of clonal populations. The approach is illustrated with whole-genome data sets of Staphylococcus aureus and Pseudomonas aeruginosa strains.
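One way to read the "shared sequential SNPs" idea is sketched below, under explicit assumptions. SNP positions are the rows of a binary matrix, sorted by genome coordinate. Each column holds one strain's allele state (0 or 1). A shared block between two strains is a maximal run of consecutive rows in which their states agree. The function names and this exact run definition are hypothetical and may differ from the published algorithm.

```python
from collections import Counter


def shared_runs(matrix: list[list[int]], a: int, b: int) -> list[int]:
    """Lengths of maximal runs of consecutive SNP rows where strains a and b agree.

    matrix: one row per SNP position (sorted by coordinate); each row holds a
    0/1 allele state per strain (assumed binary encoding).
    """
    runs, current = [], 0
    for row in matrix:
        if row[a] == row[b]:
            current += 1  # extend the current shared block
        else:
            if current:
                runs.append(current)  # a disagreement closes the block
            current = 0
    if current:
        runs.append(current)  # close a block that reaches the last SNP
    return runs


def run_length_distribution(matrix: list[list[int]], a: int, b: int) -> dict[int, int]:
    """Map block length -> number of shared blocks of that length."""
    return dict(sorted(Counter(shared_runs(matrix, a, b)).items()))
```

Under this encoding, flipping the reference allele at a position flips both strains' states together, so agreement between the two strains does not change.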
Affiliation(s)
- Patricia Morán Losada
- Clinical Research Group, 'Molecular Pathology of Cystic Fibrosis and Pseudomonas Genomics', OE 6710, Hannover Medical School, Hannover D-30625, Germany
- Burkhard Tümmler
- Clinical Research Group, 'Molecular Pathology of Cystic Fibrosis and Pseudomonas Genomics', OE 6710, Hannover Medical School, Hannover D-30625, Germany
- Biomedical Research in Endstage and Obstructive Lung Disease, German Center for Lung Research, Hannover D-30625, Germany
23
Whole-Genome Sequence of Multidrug-Resistant Pseudomonas aeruginosa Strain BAMCPA07-48, Isolated from a Combat Injury Wound. GENOME ANNOUNCEMENTS 2016; 4:4/4/e00547-16. [PMID: 27389262 PMCID: PMC4939779 DOI: 10.1128/genomea.00547-16] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
Abstract
We report here the complete genome sequence of Pseudomonas aeruginosa strain BAMCPA07-48, isolated from a combat injury wound. The closed genome sequence of this isolate is a valuable resource for pathogenome characterization of P. aeruginosa associated with wounds and will aid in the development of a higher-resolution phylogenomic framework for molecularly guided pathogen surveillance.
24
Deschamps S, Mudge J, Cameron C, Ramaraj T, Anand A, Fengler K, Hayes K, Llaca V, Jones TJ, May G. Characterization, correction and de novo assembly of an Oxford Nanopore genomic dataset from Agrobacterium tumefaciens. Sci Rep 2016; 6:28625. [PMID: 27350167 PMCID: PMC4923883 DOI: 10.1038/srep28625] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 03/03/2016] [Accepted: 06/06/2016] [Indexed: 01/23/2023]
Abstract
The MinION is a portable single-molecule DNA sequencing instrument, released by Oxford Nanopore Technologies in 2014, that produces long sequencing reads by measuring changes in ionic flow as single-stranded DNA molecules translocate through the pores. Although MinION long reads have a substantially higher error rate than reads from short-read sequencing technologies, they can be used to generate de novo assemblies of microbial genomes after an initial correction step that aligns Illumina sequencing data or detects overlaps between Oxford Nanopore reads to improve accuracy. In this study, MinION reads were generated from the multi-chromosome genome of Agrobacterium tumefaciens strain LBA4404. Errors in the consensus two-directional (sense and antisense) “2D” sequences were first characterized by comparison with an internal reference assembly. Both Illumina-based correction and self-correction were performed, and the resulting corrected reads were assembled into high-quality hybrid and non-hybrid assemblies. Corrected read datasets and assemblies were subsequently compared. The results indicate that both hybrid and non-hybrid methods can be used to assemble Oxford Nanopore reads into informative multi-chromosome assemblies, each with slightly different outcomes in terms of contiguity and accuracy.
Affiliation(s)
- Joann Mudge
- National Center for Genome Resources, Santa Fe, NM, USA