151
Satoh A, Takasu M, Yano K, Terai Y. De novo assembly and annotation of the mangrove cricket genome. BMC Res Notes 2021; 14:387. [PMID: 34627387 PMCID: PMC8502352 DOI: 10.1186/s13104-021-05798-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/19/2021] [Accepted: 09/27/2021] [Indexed: 11/10/2022] Open
Abstract
Objectives: The mangrove cricket, Apteronemobius asahinai, shows endogenous activity rhythms that synchronize with the tidal cycle (i.e., a free-running rhythm with a period of ~12.4 h [the circatidal rhythm]). Little is known about the molecular mechanisms underlying the circatidal rhythm. We present the draft genome of the mangrove cricket to facilitate future studies of the molecular mechanisms behind this rhythm. Data description: The draft genome contains 151,060 scaffolds with a total length of 1.68 Gb (N50: 27 kb) and 92% BUSCO completeness. We obtained 28,831 predicted genes, of which 19,896 (69%) were successfully annotated using at least one of two databases (UniProtKB/Swiss-Prot and Pfam).
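The N50 figure quoted in the data description can be illustrated with a minimal sketch. It assumes the usual definition (the length of the scaffold at which the running total of scaffold lengths, sorted longest first, reaches half of the assembly size); the function name and the toy lengths are illustrative and do not come from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are sorted from longest to shortest; the N50 is the length
    of the scaffold at which the running total first reaches half of the
    summed assembly length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical, not the cricket assembly).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

A low N50 relative to total length, as in a 1.68 Gb assembly with an N50 of 27 kb, indicates that the assembly is spread over many short scaffolds.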
Affiliation(s)
- Aya Satoh
- Department of Evolutionary Studies of Biosystems, SOKENDAI (The Graduate University for Advanced Studies), Shonan Village, Hayama, Kanagawa, 240-0193, Japan
- School of Agriculture, Meiji University, Kawasaki, Kanagawa, 214-8571, Japan
- Miwako Takasu
- Department of Evolutionary Studies of Biosystems, SOKENDAI (The Graduate University for Advanced Studies), Shonan Village, Hayama, Kanagawa, 240-0193, Japan
- Kentaro Yano
- School of Agriculture, Meiji University, Kawasaki, Kanagawa, 214-8571, Japan
- Yohey Terai
- Department of Evolutionary Studies of Biosystems, SOKENDAI (The Graduate University for Advanced Studies), Shonan Village, Hayama, Kanagawa, 240-0193, Japan
152
Zhang M, Zhang Y, Han X, Wang J, Yang Y, Ren B, Xia M, Li G, Fang R, He H, Jia Y. Whole genome sequencing of Enterobacter mori, an emerging pathogen of kiwifruit and the potential genetic adaptation to pathogenic lifestyle. AMB Express 2021; 11:129. [PMID: 34533621 PMCID: PMC8448808 DOI: 10.1186/s13568-021-01290-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/27/2021] [Accepted: 09/03/2021] [Indexed: 11/10/2022] Open
Abstract
Members of the genus Enterobacter are gram-negative bacteria that are used as plant growth-promoting bacteria and are increasingly recovered from economic plants as emerging pathogens. A new Enterobacter mori strain, designated CX01, was isolated as an emerging bacterial pathogen from a recent outbreak of kiwifruit canker-like disease in China. The main symptoms associated with this syndrome are bleeding cankers on the trunk and branches, and brown leaf spots. The genome sequence of E. mori CX01 was determined as a single chromosome of 4,966,908 bp with 4640 predicted open reading frames (ORFs). To better understand the features of the genus and its potential pathogenic mechanisms, five available Enterobacter genomes were compared, revealing a pan-genome of 4870 COGs with 3158 core COGs. An important feature of the E. mori CX01 genome is that it lacks the type III secretion system often found in pathogenic bacteria; instead, it is equipped with type I, II, and VI secretion systems. In addition, the genome includes genes encoding putative virulence effectors, two-component systems, nutrient acquisition systems, and proteins involved in phytohormone synthesis, which may contribute to virulence and adaptation to host plant niches. The genome sequence of E. mori CX01 is highly similar to that of E. mori LMG 25706, although rearrangements occur throughout the two genomes. A further pathogenicity assay showed that both strains can invade either kiwifruit or mulberry, indicating that they may have a similar host range. Comparison with a closely related isolate enabled us to understand its pathogenesis and ecology.
153
Draft Genome Sequence of Vibrio chagasii 18LP, Isolated from Gilthead Seabream (Sparus aurata) Larvae Reared in Aquaculture. Microbiol Resour Announc 2021; 10:e0065821. [PMID: 34528822 PMCID: PMC8444970 DOI: 10.1128/mra.00658-21] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022] Open
Abstract
We report the draft genome sequence of Vibrio chagasii strain 18LP, isolated from gilthead seabream larvae at a fish hatchery research station in Portugal. The genome presents numerous features underlying opportunistic behavior, including genes coding for toxin biosynthesis and tolerance, host cell invasion, and heavy metal resistance.
154
Leinonen M, Salmela L. Extraction of long k-mers using spaced seeds. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; PP:1-1. [PMID: 34529572 DOI: 10.1109/tcbb.2021.3113131] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/13/2023]
Abstract
The extraction of k-mers from reads is an important task in many bioinformatics applications, such as all DNA sequence analysis methods based on de Bruijn graphs. These methods tend to be more accurate when the used k-mers are unique in the analyzed DNA, and thus the use of longer k-mers is preferred. When the read lengths of short read sequencing technologies increase, the error rate will become the determining factor for the largest possible value of k. Here we propose LoMeX which uses spaced seeds to extract long k-mers accurately even in the presence of sequencing errors. Our experiments show that LoMeX can extract long k-mers from current Illumina reads with a similar or higher recall than a standard k-mer counting tool. Furthermore, our experiments on simulated data show that when the read length further increases enabling even longer k-mers, the performance of standard k-mer counters declines, whereas LoMeX still extracts long k-mers successfully.
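The general idea of a spaced seed can be sketched as follows. This is only an illustration of the concept, not the LoMeX algorithm: it assumes a seed written as a string of `1` (position must match) and `0` (position is ignored), groups all read windows that agree on the `1` positions, and takes a per-position majority vote so that an isolated sequencing error at a `0` position is outvoted. The function names and toy reads are hypothetical.

```python
from collections import Counter, defaultdict


def apply_seed(window, seed):
    """Keep characters at '1' positions of the seed; blank the rest with '-'."""
    return "".join(c if s == "1" else "-" for c, s in zip(window, seed))


def spaced_kmers(reads, seed):
    """Group every window of length len(seed) by its spaced-seed key and
    return {key: consensus k-mer}, voting by majority at each position."""
    k = len(seed)
    groups = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            window = read[i:i + k]
            groups[apply_seed(window, seed)].append(window)
    consensus = {}
    for key, windows in groups.items():
        # zip(*windows) iterates over columns, i.e. one position at a time.
        consensus[key] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*windows)
        )
    return consensus


# Three toy reads; the third carries a substitution at an ignored position.
toy_reads = ["ACGTAC", "ACGTAC", "ACTTAC"]
print(spaced_kmers(toy_reads, "110011"))
```

With an exact k-mer counter the erroneous read would contribute a separate, low-count k-mer; here it falls into the same group as the correct reads.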
155
De novo Assembly, Annotation, and Analysis of Transcriptome Data of the Ladakh Ground Skink Provide Genetic Information on High-Altitude Adaptation. Genes (Basel) 2021; 12:genes12091423. [PMID: 34573405 PMCID: PMC8466045 DOI: 10.3390/genes12091423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2021] [Revised: 09/13/2021] [Accepted: 09/13/2021] [Indexed: 11/17/2022] Open
Abstract
The Himalayan Arc is recognized as a global biodiversity hotspot. Among its numerous cryptic and undiscovered organisms, this composite high-mountain ecosystem harbors many taxa with adaptations to life at high elevations. However, evolutionary patterns and genomic features have rarely been studied in Himalayan vertebrates. Here, we provide the first well-annotated transcriptome of a Greater Himalayan reptile species, the Ladakh Ground Skink Asymblepharus ladacensis (Squamata: Scincidae). Based on tissues from the brain, an embryonic disc, and pooled organ material, using paired-end Illumina NextSeq 500 RNA-seq, we assembled ~77,000 transcripts, which were annotated using seven functional databases. We tested ~1600 genes known to be under positive selection in anurans and reptiles adapted to high elevations, and potentially detected positive selection for 114 of these genes in Asymblepharus. Even though the strength of these results is limited due to the single-animal approach, our transcriptome resource may be valuable data for further studies on squamate reptile evolution in the Himalayas as a hotspot of biodiversity.
156
You MP, Akhatar J, Mittal M, Barbetti MJ, Maina S, Banga SS. Comparative analysis of draft genome assemblies developed from whole genome sequences of two Hyaloperonospora brassicae isolate samples differing in field virulence on Brassica napus. BIOTECHNOLOGY REPORTS 2021; 31:e00653. [PMID: 34258242 PMCID: PMC8254085 DOI: 10.1016/j.btre.2021.e00653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2021] [Revised: 05/28/2021] [Accepted: 06/16/2021] [Indexed: 11/25/2022]
Abstract
We report the first draft genome assemblies for two isolates of Hyaloperonospora brassicae differing in virulence. These revealed genome sizes of 72.762 and 76.950 Mb and 6,438 and 6,470 scaffolds, respectively. In silico annotation allowed understanding of the genome architecture of H. brassicae in terms of genes for pathogenicity and virulence. The observed reduction in virulence or loss of pathogenicity in a larger number of genes in the sample with low virulence, in comparison to the sample with high virulence, may reflect differential rates of mutation and selection during host–parasite co-evolution. The genomic resources developed will aid in monitoring the field prevalence of H. brassicae pathotypes and in the early detection of any virulence changes within pathogen populations.
Hyaloperonospora brassicae causes downy mildew, a major disease of Brassicaceae species. We sequenced the genomes of two H. brassicae isolates of high (Sample B) and low (Sample C) virulence. Sequencing reads were first assembled de novo with SOAPdenovo2, ABySS V2.1 and Velvet V1.1, and the assemblies were later combined to create meta-assemblies with genome sizes of 72.762 and 76.950 Mb and predicted gene densities of 1628 and 1644 /Mb, respectively. We could annotate 12,255 and 13,030 genes with high proportions (91-92%) of complete BUSCOs for Samples B and C, respectively. Comparative analysis revealed conserved and varied molecular machinery underlying the physiological specialisation and infection capabilities. BLAST analysis against the PHI gene database suggested a relatively higher loss of genes for virulence and pathogenicity in Sample C compared to Sample B, reflecting pathogen evolution through differential rates of mutation and selection. These studies will enable identification and monitoring of H. brassicae virulence factors prevailing in the field.
157
Chong LC, Lim WL, Ban KHK, Khan AM. An Alignment-Independent Approach for the Study of Viral Sequence Diversity at Any Given Rank of Taxonomy Lineage. BIOLOGY 2021; 10:biology10090853. [PMID: 34571730 PMCID: PMC8466476 DOI: 10.3390/biology10090853] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/11/2021] [Revised: 08/13/2021] [Accepted: 08/19/2021] [Indexed: 11/16/2022]
Abstract
The study of viral diversity is imperative in understanding sequence change and its implications for intervention strategies. The widely used alignment-dependent approaches to study viral diversity are limited in their utility as sequence dissimilarity increases, particularly when expanded to the genus or higher ranks of viral species lineage. Herein, we present an alignment-independent algorithm, implemented as a tool, UNIQmin, to determine the effective viral sequence diversity at any rank of the viral taxonomy lineage. This is done by performing an exhaustive search to generate the minimal set of sequences for a given viral non-redundant sequence dataset. The minimal set is comprised of the smallest possible number of unique sequences required to capture the diversity inherent in the complete set of overlapping k-mers encoded by all the unique sequences in the given dataset. Such dataset compression is possible through the removal of unique sequences, whose entire repertoire of overlapping k-mers can be represented by other sequences, thus rendering them redundant to the collective pool of sequence diversity. A significant reduction, namely ~44%, ~45%, and ~53%, was observed for all reported unique sequences of species Dengue virus, genus Flavivirus, and family Flaviviridae, respectively, while still capturing the entire repertoire of nonamer (9-mer) viral peptidome diversity present in the initial input dataset. The algorithm is scalable for big data as it was applied to ~2.2 million non-redundant sequences of all reported viruses. UNIQmin is open source and publicly available on GitHub. The concept of a minimal set is generic and, thus, potentially applicable to other pathogenic microorganisms of non-viral origin, such as bacteria.
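The notion of a sequence being redundant because its overlapping k-mers are all found in other sequences can be sketched with a simple greedy pass. This is an assumption-laden illustration, not the UNIQmin implementation: the abstract describes an exhaustive search, whereas the sketch below keeps a sequence only if it contributes at least one k-mer not yet covered, which preserves the full k-mer repertoire but does not guarantee the smallest possible set. The function names and toy sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cover(sequences, k):
    """Keep sequences that add at least one new k-mer.

    Sequences are visited from the most to the least k-mer-rich (ties
    broken alphabetically for reproducibility). Sequences shorter than k
    have no k-mers and are therefore never kept.
    """
    covered = set()
    kept = []
    unique = set(sequences)  # duplicates carry no extra diversity
    for seq in sorted(unique, key=lambda s: (-len(kmers(s, k)), s)):
        new = kmers(seq, k) - covered
        if new:
            kept.append(seq)
            covered |= new
    return kept


# Toy dataset: two of the four sequences are fully contained in the first.
toy = ["ABCDEFG", "ABCDE", "CDEFG", "XYZAB"]
print(greedy_cover(toy, 3))
```

In the toy dataset the retained sequences still encode every 3-mer present in the input, which mirrors the property the abstract reports for 9-mers.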
Affiliation(s)
- Li Chuin Chong
- Centre for Bioinformatics, School of Data Sciences, Perdana University, Kuala Lumpur 50490, Malaysia
- Wei Lun Lim
- Faculty of Computing and Informatics, Multimedia University, Cyberjaya 63100, Malaysia
- Kenneth Hon Kim Ban
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117596, Singapore
- Asif M. Khan
- Centre for Bioinformatics, School of Data Sciences, Perdana University, Kuala Lumpur 50490, Malaysia
- Beykoz Institute of Life Sciences and Biotechnology, Bezmialem Vakif University, Beykoz, 34820 Istanbul, Turkey
158
Conservative and Atypical Ferritins of Sponges. Int J Mol Sci 2021; 22:ijms22168635. [PMID: 34445356 PMCID: PMC8395497 DOI: 10.3390/ijms22168635] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/25/2021] [Revised: 08/05/2021] [Accepted: 08/07/2021] [Indexed: 12/26/2022] Open
Abstract
Ferritins comprise a conservative family of proteins found in all species and play an essential role in resistance to redox stress, immune response, and cell differentiation. Sponges (Porifera) are the oldest Metazoa and show unique plasticity and regenerative potential. Here, we characterize the ferritins of two cold-water sponges using proteomics, spectral microscopy, and bioinformatic analysis. The recently duplicated conservative HdF1a/b and atypical HdF2 genes were found in the Halisarca dujardini genome. Multiple related transcripts of HpF1 were identified in the Halichondria panicea transcriptome. Expression of HdF1a/b was much higher than that of HdF2 in all seasons and was regulated differently during sponge dissociation/reaggregation. The presence of the MRE and HRE motifs in the HdF1 and HdF2 promoter regions and the IRE motif in mRNAs of HdF1 and HpF indicates that sponge ferritin expression depends on the cellular iron and oxygen levels. Gel electrophoresis combined with specific staining and mass spectrometry confirmed the presence of ferric ions and ferritins in multi-subunit complexes. The 3D modeling predicts the iron-binding capacity of HdF1 and HpF1 at the ferroxidase center and the absence of iron-binding in atypical HdF2. Interestingly, atypical ferritins lacking iron-binding capacity were found in the genomes of many invertebrate species. Their function deserves further research.
159
Fritz A, Bremges A, Deng ZL, Lesker TR, Götting J, Ganzenmueller T, Sczyrba A, Dilthey A, Klawonn F, McHardy AC. Haploflow: strain-resolved de novo assembly of viral genomes. Genome Biol 2021; 22:212. [PMID: 34281604 PMCID: PMC8287296 DOI: 10.1186/s13059-021-02426-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/03/2020] [Accepted: 06/29/2021] [Indexed: 01/03/2023] Open
Abstract
With viral infections, multiple related viral strains are often present due to coinfection or within-host evolution. We describe Haploflow, a de Bruijn graph-based assembler for de novo genome assembly of viral strains from mixed sequence samples using a novel flow algorithm. We assess Haploflow across multiple benchmark data sets of increasing complexity, showing that Haploflow is faster and more accurate than viral haplotype assemblers and generic metagenome assemblers not aiming to reconstruct strains. We show Haploflow reconstructs viral strain genomes from patient HCMV samples and SARS-CoV-2 wastewater samples identical to clinical isolates.
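As background for the de Bruijn graph that the assembler builds on, a minimal sketch is given here. It shows only the textbook construction, in which every k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix and the k-mer count serves as edge coverage; it does not reproduce Haploflow's flow algorithm. The function name and toy reads are hypothetical.

```python
from collections import Counter


def de_bruijn_edges(reads, k):
    """Count k-mers and return them as weighted de Bruijn graph edges.

    Each key is a (prefix, suffix) pair of (k-1)-mers; each value is the
    number of times the corresponding k-mer was observed in the reads.
    """
    edges = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


# Two toy reads that share a prefix and diverge at the last base,
# as two closely related strains might.
toy_reads = ["ACGTA", "ACGTT"]
for (source, target), coverage in sorted(de_bruijn_edges(toy_reads, 3).items()):
    print(source, "->", target, "coverage", coverage)
```

In the toy graph the shared path has coverage 2 and the node `GT` branches into two edges of coverage 1 each; coverage differences of this kind at branch points are the signal a strain-aware assembler can exploit.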
Affiliation(s)
- Adrian Fritz
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- German Centre for Infection Research (DZIF), Site Hannover-Braunschweig, Braunschweig, Germany
- Andreas Bremges
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- German Centre for Infection Research (DZIF), Site Hannover-Braunschweig, Braunschweig, Germany
- Zhi-Luo Deng
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Till Robin Lesker
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- German Centre for Infection Research (DZIF), Site Hannover-Braunschweig, Braunschweig, Germany
- Jasper Götting
- German Centre for Infection Research (DZIF), Site Hannover-Braunschweig, Braunschweig, Germany
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Tina Ganzenmueller
- German Centre for Infection Research (DZIF), Site Hannover-Braunschweig, Braunschweig, Germany
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Institute for Medical Virology, University Hospital Tuebingen, Tuebingen, Germany
- Alexander Sczyrba
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Alexander Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, 20892, USA
- Frank Klawonn
- Department of Computer Science, Ostfalia University of Applied Sciences, Wolfenbuettel, Germany
- Biostatistics Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Alice Carolyn McHardy
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- German Centre for Infection Research (DZIF), Site Hannover-Braunschweig, Braunschweig, Germany
160
Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo Assembler. Curr Protoc Bioinformatics 2021; 70:e102. [PMID: 32559359 DOI: 10.1002/cpbi.102] [Citation(s) in RCA: 1444] [Impact Index Per Article: 361.0] [Indexed: 11/08/2022]
Abstract
SPAdes (St. Petersburg genome Assembler) was originally developed for de novo assembly of genome sequencing data produced for cultivated microbial isolates and for single-cell genomic DNA sequencing. With time, the functionality of SPAdes was extended to enable assembly of IonTorrent data, as well as hybrid assembly from short and long reads (PacBio and Oxford Nanopore). In this article we present protocols for five different assembly pipelines that comprise the SPAdes package and that are used for assembly of metagenomes and transcriptomes as well as assembly of putative plasmids and biosynthetic gene clusters from whole-genome sequencing and metagenomic datasets. In addition, we present guidelines for understanding results with use cases for each pipeline, and several additional support protocols that help in using SPAdes properly. © 2020 Wiley Periodicals LLC. Basic Protocol 1: Assembling isolate bacterial datasets; Basic Protocol 2: Assembling metagenomic datasets; Basic Protocol 3: Assembling sets of putative plasmids; Basic Protocol 4: Assembling transcriptomes; Basic Protocol 5: Assembling putative biosynthetic gene clusters; Support Protocol 1: Installing SPAdes; Support Protocol 2: Providing input via command line; Support Protocol 3: Providing input data via YAML format; Support Protocol 4: Restarting previous run; Support Protocol 5: Determining strand-specificity of RNA-seq data.
Affiliation(s)
- Andrey Prjibelski
- Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia
- Dmitry Antipov
- Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia
- Dmitry Meleshko
- Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia
- Alla Lapidus
- Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia
- Department of Cytology and Histology, Saint Petersburg State University, Saint Petersburg, Russia
- Anton Korobeynikov
- Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia
- Department of Statistical Modelling, Saint Petersburg State University, Saint Petersburg, Russia
161
Alanko J, Alipanahi B, Settle J, Boucher C, Gagie T. Buffering updates enables efficient dynamic de Bruijn graphs. Comput Struct Biotechnol J 2021; 19:4067-4078. [PMID: 34377371 PMCID: PMC8326735 DOI: 10.1016/j.csbj.2021.06.047] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/15/2021] [Revised: 06/29/2021] [Accepted: 06/29/2021] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: The de Bruijn graph has become a ubiquitous graph model for biological data ever since its initial introduction in the late 1990s. It has been used for a variety of purposes including genome assembly (Zerbino and Birney, 2008; Bankevich et al., 2012; Peng et al., 2012), variant detection (Alipanahi et al., 2020b; Iqbal et al., 2012), and storage of assembled genomes (Chikhi et al., 2016). For this reason, there have been over a dozen methods for building and representing the de Bruijn graph and its variants in a space and time efficient manner. RESULTS: With the exception of a few data structures (Muggli et al., 2019; Holley and Melsted, 2020; Crawford et al., 2018), compressed and compact de Bruijn graphs do not allow for the graph to be efficiently updated, meaning that data can be added or deleted. The most recent compressed dynamic de Bruijn graph (Alipanahi et al., 2020a) relies on dynamic bit vectors, which are slow in theory and practice. To address this shortcoming, we present a compressed dynamic de Bruijn graph that removes the necessity of dynamic bit vectors by buffering data that should be added or removed from the graph. We implement our method, which we refer to as BufBOSS, and compare its performance to Bifrost, DynamicBOSS, and FDBG. Our experiments demonstrate that BufBOSS achieves attractive trade-offs compared to other tools in terms of time, memory and disk, and has the best deletion performance by an order of magnitude.
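The buffering idea described in the results can be sketched with a toy k-mer set. This is a conceptual illustration under strong simplifying assumptions, not the BufBOSS data structure: a Python `frozenset` stands in for the static compressed graph, pending additions and deletions are held in two small buffers, and the static part is rebuilt only when the buffers reach a limit. The class name, the `buffer_limit` parameter and the `flushes` counter are hypothetical.

```python
class BufferedKmerSet:
    """Static k-mer set plus small add/delete buffers (toy illustration)."""

    def __init__(self, kmers=(), buffer_limit=4):
        self._static = frozenset(kmers)  # stands in for the static structure
        self._added = set()              # pending insertions
        self._deleted = set()            # pending deletions
        self._limit = buffer_limit
        self.flushes = 0                 # number of rebuilds performed

    def __contains__(self, kmer):
        # Buffers take priority over the static structure.
        if kmer in self._added:
            return True
        if kmer in self._deleted:
            return False
        return kmer in self._static

    def add(self, kmer):
        self._deleted.discard(kmer)
        if kmer not in self._static:
            self._added.add(kmer)
        self._maybe_flush()

    def delete(self, kmer):
        self._added.discard(kmer)
        if kmer in self._static:
            self._deleted.add(kmer)
        self._maybe_flush()

    def _maybe_flush(self):
        # Rebuild the static part in one batch once the buffers are full.
        if len(self._added) + len(self._deleted) >= self._limit:
            self._static = frozenset((self._static - self._deleted) | self._added)
            self._added.clear()
            self._deleted.clear()
            self.flushes += 1


graph = BufferedKmerSet(["ACG", "CGT"], buffer_limit=2)
graph.add("GTA")     # buffered, no rebuild yet
graph.delete("ACG")  # buffer is now full, triggers one batched rebuild
print("ACG" in graph, "GTA" in graph, graph.flushes)
```

The design choice illustrated here is the trade-off the abstract names: individual updates stay cheap because they touch only the buffers, while the cost of modifying the static structure is paid once per batch.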
Affiliation(s)
- Jarno Alanko
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Faculty of Computer Science, Dalhousie University, Halifax, Canada
- Bahar Alipanahi
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
- Jonathen Settle
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
- Christina Boucher
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
- Travis Gagie
- Department of Computer Science, University of Helsinki, Helsinki, Finland
162
Estimation of Genome Size in the Endemic Species Reseda pentagyna and the Locally Rare Species Reseda lutea Using Comparative Analyses of Flow Cytometry and K-Mer Approaches. PLANTS 2021; 10:plants10071362. [PMID: 34371565 PMCID: PMC8309327 DOI: 10.3390/plants10071362] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/08/2021] [Revised: 07/01/2021] [Accepted: 07/01/2021] [Indexed: 11/17/2022]
Abstract
Genome size is one of the fundamental cytogenetic features of a species; it is critical for the design and initiation of any genome sequencing project and can provide essential insights for taxonomy, cytogenetics, phylogenesis, and evolutionary studies. However, this key cytogenetic information is almost entirely lacking for the endemic species Reseda pentagyna and the locally rare species Reseda lutea in Saudi Arabia. Therefore, genome size was analyzed by propidium iodide (PI) flow cytometry and compared to k-mer analysis methods. Using the standard method for genome size measurement (flow cytometry) with nuclei isolation buffer MB01, the genome sizes of R. lutea and R. pentagyna were estimated at 1.91 ± 0.02 and 2.09 ± 0.03 pg/2C, respectively, corresponding to haploid genome sizes of approximately 934 and 1,022 Mbp. For validation, k-mer analysis was performed on Illumina paired-end sequencing data from both species. Five k-mer analysis approaches were examined for biocomputational estimation of genome size: a general formula and four well-known programs (CovEST, Kmergenie, FindGSE, and GenomeScope). The parameter preferences had a significant impact on GenomeScope and Kmergenie estimates, while the general formula estimations did not differ considerably, with an average genome size of 867.7 and 896. Mbp. The differences between flow cytometry and biocomputational predictions may be due to the high repeat content, particularly long repetitive regions, in both genomes (71% and 57%), which interfered with k-mer analysis. GenomeScope allowed quantification of the high heterozygosity levels (1.04 and 1.37%) of the R. lutea and R. pentagyna genomes, respectively. Based on our observations, R. lutea may have a tetraploid genome or higher. Our results reveal fundamental cytogenetic information for R. lutea and R. pentagyna, which should be used in future taxonomic studies and whole-genome sequencing.
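The conversion from the flow cytometry values to the haploid sizes quoted in the abstract can be checked with a short calculation. The sketch assumes the widely used factor of 978 Mbp per picogram of DNA, which the abstract does not state explicitly; the function and constant names are illustrative.

```python
MBP_PER_PG = 978.0  # assumed conversion factor: 1 pg of DNA ~ 978 Mbp


def haploid_size_mbp(pg_2c):
    """Convert a 2C DNA content in picograms to a haploid (1C) size in Mbp."""
    return pg_2c / 2 * MBP_PER_PG


for species, pg_2c in [("R. lutea", 1.91), ("R. pentagyna", 2.09)]:
    print(species, round(haploid_size_mbp(pg_2c)), "Mbp")
```

Under this assumption, 1.91 and 2.09 pg/2C round to 934 and 1022 Mbp, matching the haploid sizes reported above.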
163
Valdebenito-Maturana B, Riadi G. GSER (a Genome Size Estimator using R): a pipeline for quality assessment of sequenced genome libraries through genome size estimation. Interface Focus 2021; 11:20200077. [PMID: 34123359 DOI: 10.1098/rsfs.2020.0077] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 04/13/2021] [Indexed: 01/07/2023] Open
Abstract
The first step in any genome research after obtaining the read data is to perform a due quality control of the sequenced reads. In a de novo genome assembly project, the second step is to estimate two important features, the genome size and 'best k-mer', to start the assembly tests with different de novo assembly software and its parameters. However, the quality control of the sequenced genome libraries as a whole, instead of focusing on the reads only, is frequently overlooked and realized to be important only when the assembly tests did not render the expected results. We have developed GSER, a Genome Size Estimator using R, a pipeline to evaluate the relationship between k-mers and genome size, as a means for quality assessment of the sequenced genome libraries. GSER generates a set of charts that allow the analyst to evaluate the library datasets before starting the assembly. The script which runs the pipeline can be downloaded from http://www.mobilomics.org/GSER/downloads or http://github.com/mobilomics/GSER.
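The relationship between k-mers and genome size that such a pipeline evaluates can be sketched with the common peak-based estimate. This is not the GSER code (which is written in R): it assumes a k-mer histogram mapping each multiplicity to the number of distinct k-mers seen that many times, discards low-multiplicity k-mers as likely sequencing errors, and divides the total k-mer count by the depth of the main peak. The function name, the `min_depth` cutoff and the toy histogram are hypothetical.

```python
def estimate_genome_size(histogram, min_depth=3):
    """Estimate genome size as (total k-mers) / (depth of the main peak).

    histogram maps multiplicity -> number of distinct k-mers with that
    multiplicity. Entries below min_depth are treated as error k-mers
    and ignored.
    """
    solid = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth


# Toy histogram: an error tail at depths 1-2 and a main peak at depth 20.
toy_histogram = {1: 5000, 2: 300, 18: 100, 19: 400, 20: 1000, 21: 400, 22: 100}
print(estimate_genome_size(toy_histogram))
```

A library whose histogram shows no clear peak, or a peak at an unexpected depth, would yield an implausible estimate, which is the kind of library-level warning sign the abstract argues should be checked before assembly.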
Affiliation(s)
- Gonzalo Riadi
- ANID - Millennium Science Initiative Program, Millennium Nucleus of Ion Channels-Associated Diseases (MiNICAD); Center for Bioinformatics, Simulation and Modeling (CBSM); Department of Bioinformatics, Faculty of Engineering, University of Talca, Campus Talca, Chile
164
Heikema AP, Jansen R, Hiltemann SD, Hays JP, Stubbs AP. WeFaceNano: a user-friendly pipeline for complete ONT sequence assembly and detection of antibiotic resistance in multi-plasmid bacterial isolates. BMC Microbiol 2021; 21:171. [PMID: 34098864 PMCID: PMC8186029 DOI: 10.1186/s12866-021-02225-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2020] [Accepted: 05/13/2021] [Indexed: 11/10/2022] Open
Abstract
Background Bacterial plasmids often carry antibiotic resistance genes and are a significant factor in the spread of antibiotic resistance. The ability to completely assemble plasmid sequences would facilitate the localization of antibiotic resistance genes, the identification of genes that promote plasmid transmission and the accurate tracking of plasmid mobility. However, the complete assembly of plasmid sequences using the currently most widely used sequencing platform (Illumina-based sequencing) is restricted due to the generation of short sequence lengths. The long-read Oxford Nanopore Technologies (ONT) sequencing platform overcomes this limitation. Still, the assembly of plasmid sequence data remains challenging due to software incompatibility with long-reads and the error rate generated using ONT sequencing. Bioinformatics pipelines have been developed for ONT-generated sequencing but require computational skills that frequently are beyond the abilities of scientific researchers. To overcome this challenge, the authors developed ‘WeFaceNano’, a user-friendly Web interFace for rapid assembly and analysis of plasmid DNA sequences generated using the ONT platform. WeFaceNano includes: a read statistics report; two assemblers (Miniasm and Flye); BLAST searching; the detection of antibiotic resistance- and replicon genes and several plasmid visualizations. A user-friendly interface displays the main features of WeFaceNano and gives access to the analysis tools. Results Publicly available ONT sequence data of 21 plasmids were used to validate WeFaceNano, with plasmid assemblages and anti-microbial resistance gene detection being concordant with the published results. Interestingly, the “Flye” assembler with “meta” settings generated the most complete plasmids. 
Conclusions WeFaceNano is a user-friendly open-source software pipeline suitable for accurate plasmid assembly and the detection of anti-microbial resistance genes in (clinical) samples where multiple plasmids can be present.
Affiliation(s)
- Astrid P Heikema
- Department of Medical Microbiology and Infectious Diseases, Erasmus University Medical Center (Erasmus MC), Rotterdam, the Netherlands
- Rick Jansen
- Department of Pathology, Clinical Bioinformatics Unit, Erasmus University Medical Center (Erasmus MC), Rotterdam, the Netherlands
- Saskia D Hiltemann
- Department of Pathology, Clinical Bioinformatics Unit, Erasmus University Medical Center (Erasmus MC), Rotterdam, the Netherlands
- John P Hays
- Department of Medical Microbiology and Infectious Diseases, Erasmus University Medical Center (Erasmus MC), Rotterdam, the Netherlands
- Andrew P Stubbs
- Department of Pathology, Clinical Bioinformatics Unit, Erasmus University Medical Center (Erasmus MC), Rotterdam, the Netherlands
| |
|
165
|
Alickovic L, Johnson KP, Boyd BM. The reduced genome of a heritable symbiont from an ectoparasitic feather feeding louse. BMC Ecol Evol 2021; 21:108. [PMID: 34078265 PMCID: PMC8173840 DOI: 10.1186/s12862-021-01840-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/20/2021] [Accepted: 05/23/2021] [Indexed: 11/10/2022] Open
Abstract
Background Feather feeding lice are abundant and diverse ectoparasites that complete their entire life cycle on an avian host. The principal or sole source of nutrition for these lice is feathers. Feathers appear to lack four amino acids that the lice would require to complete development and reproduce. Several insect groups have acquired heritable and intracellular bacteria that can synthesize metabolites absent in an insect’s diet, allowing insects to feed exclusively on nutrient-poor resources. Multiple species of feather feeding lice have been shown to harbor heritable and intracellular bacteria. We expected that these bacteria augment the louse’s diet with amino acids and facilitated the evolution of these diverse and specialized parasites. Heritable symbionts of insects often have small genomes that contain a minimal set of genes needed to maintain essential cell functions and synthesize metabolites absent in the host insect’s diet. Therefore, we expected the genome of a bacterial endosymbiont in feather lice would be small, but encode pathways for biosynthesis of amino acids. Results We sequenced the genome of a bacterial symbiont from a feather feeding louse (Columbicola wolffhuegeli) that parasitizes the Pied Imperial Pigeon (Ducula bicolor) and used its genome to predict metabolism of amino acids based on the presence or absence of genes. We found that this bacterial symbiont has a small genome, similar to the genomes of heritable symbionts described in other insect groups. However, we failed to identify many of the genes that we expected would support metabolism of amino acids in the symbiont genome. We also evaluated other gene pathways and features of the highly reduced genome of this symbiotic bacterium. Conclusions Based on the data collected in this study, it does not appear that this bacterial symbiont can synthesize amino acids needed to complement the diet of a feather feeding louse. 
Our results raise additional questions about the biology of feather chewing lice and the roles of symbiotic bacteria in evolution of diverse avian parasites.
Affiliation(s)
- Leila Alickovic
- Center for the Study of Biological Complexity, Virginia Commonwealth University, 1000 W. Cary St., Suite 111, Richmond, VA, 23284-2030, USA
| | - Kevin P Johnson
- Illinois Natural History Survey, Prairie Research Institute, University of Illinois, Champaign, IL, USA
| | - Bret M Boyd
- Center for the Study of Biological Complexity, Virginia Commonwealth University, 1000 W. Cary St., Suite 111, Richmond, VA, 23284-2030, USA.
| |
|
166
|
Liu L, Tumi L, Suni ML, Arakaki M, Wang ZF, Ge XJ. Draft genome of Puya raimondii (Bromeliaceae), the Queen of the Andes. Genomics 2021; 113:2537-2546. [PMID: 34089785 DOI: 10.1016/j.ygeno.2021.05.042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/27/2020] [Revised: 05/16/2021] [Accepted: 05/31/2021] [Indexed: 01/20/2023]
Abstract
Puya raimondii, the Queen of the Andes, is an endangered high Andean species in the Bromeliaceae family. Here, we report its first genome to promote its conservation and evolutionary study. Comparative genomics showed that P. raimondii diverged from Ananas comosus about 14.8 million years ago, and that long terminal repeats likely contributed to the diversification of the genus in the last 3.5 million years. The gene families related to plant reproductive development and stress responses significantly expanded in the genome. At the same time, gene families involved in disease defense, photosynthesis and carbohydrate metabolism significantly contracted, which may be an evolutionary strategy to adapt to the harsh conditions in the high Andes. The demographic history analysis revealed that the P. raimondii population size sharply declined in the Pleistocene and then increased in the Holocene. We also designed and tested 46 pairs of universal primers for amplifying orthologous single-copy nuclear genes in Puya species.
Affiliation(s)
- Lu Liu
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization and Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China; University of Chinese Academy of Sciences, Beijing, China
| | - Liscely Tumi
- Facultad de Ciencias Biológicas, Universidad Nacional Mayor de San Marcos, Lima, Peru
| | - Mery L Suni
- Facultad de Ciencias Biológicas, Universidad Nacional Mayor de San Marcos, Lima, Peru
| | - Monica Arakaki
- Facultad de Ciencias Biológicas, Universidad Nacional Mayor de San Marcos, Lima, Peru
| | - Zheng-Feng Wang
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization and Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China; Center of Plant Ecology, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou, China; South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China.
| | - Xue-Jun Ge
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization and Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China; Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou, China; South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China.
| |
|
167
|
Michell C, Wutke S, Aranda M, Nyman T. Genomes of the willow-galling sawflies Euura lappo and Eupontania aestiva (Hymenoptera: Tenthredinidae): a resource for research on ecological speciation, adaptation, and gall induction. G3 (BETHESDA, MD.) 2021; 11:jkab094. [PMID: 33788947 PMCID: PMC8104934 DOI: 10.1093/g3journal/jkab094] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/25/2021] [Accepted: 03/09/2021] [Indexed: 12/14/2022]
Abstract
Hymenoptera is a hyperdiverse insect order represented by over 153,000 different species. As many hymenopteran species perform various crucial roles for our environments, such as pollination, herbivory, and parasitism, they are of high economic and ecological importance. There are 99 hymenopteran genomes in the NCBI database, yet only five are representative of the paraphyletic suborder Symphyta (sawflies, woodwasps, and horntails), while the rest represent the suborder Apocrita (bees, wasps, and ants). Here, using a combination of 10X Genomics linked-read sequencing, Oxford Nanopore long-read technology, and Illumina short-read data, we assembled the genomes of two willow-galling sawflies (Hymenoptera: Tenthredinidae: Nematinae: Euurina): the bud-galling species Euura lappo and the leaf-galling species Eupontania aestiva. The final assembly for E. lappo is 259.85 Mbp in size, with a contig N50 of 209.0 kbp and a BUSCO score of 93.5%. The E. aestiva genome is 222.23 Mbp in size, with a contig N50 of 49.7 kbp and a 90.2% complete BUSCO score. De novo annotation of repetitive elements showed that 27.45% of the genome was composed of repetitive elements in E. lappo and 16.89% in E. aestiva, which is a marked increase compared to previously published hymenopteran genomes. The genomes presented here provide a resource for inferring phylogenetic relationships among basal hymenopterans, comparative studies on host-related genomic adaptation in plant-feeding insects, and research on the mechanisms of plant manipulation by gall-inducing insects.
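The contig N50 values quoted in this abstract follow the usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal Python sketch of that calculation is given here for orientation. The function name `n50` and the toy contig lengths are illustrative assumptions, not material from the paper or from any assembler.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sort lengths from longest to shortest and walk down the list until the
    running total reaches half of the summed length; the length at which
    that happens is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy example with made-up contig lengths (not data from the study).
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembly, which is why the abstract reports BUSCO completeness alongside it.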
Affiliation(s)
- Craig Michell
- Department of Environmental and Biological Sciences, University of Eastern Finland, Joensuu, 80100, Finland
| | - Saskia Wutke
- Department of Environmental and Biological Sciences, University of Eastern Finland, Joensuu, 80100, Finland
| | - Manuel Aranda
- Biological and Environmental Sciences & Engineering Division, Red Sea Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
| | - Tommi Nyman
- Department of Ecosystems in the Barents Region, Norwegian Institute of Bioeconomy Research, Svanvik, 9925, Norway
| |
|
168
|
Duckett DJ, Sullivan J, Pirro S, Carstens BC. Genomic Resources for the North American Water Vole (Microtus richardsoni) and the Montane Vole (Microtus montanus). GIGABYTE 2021; 2021:gigabyte19. [PMID: 36824326 PMCID: PMC9631978 DOI: 10.46471/gigabyte.19] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/22/2021] [Accepted: 05/04/2021] [Indexed: 11/09/2022] Open
Abstract
Voles of the genus Microtus are important research organisms, yet genomic resources are lacking. Such resources would benefit future studies of immunology, phylogeography, cryptic diversity, and more. We sequenced and assembled nuclear genomes from two subspecies of water vole (Microtus richardsoni) and from the montane vole (Microtus montanus). The water vole genomes were sequenced with Illumina and 10× Chromium plus Illumina sequencing, resulting in assemblies with ∼1,600,000 and ∼30,000 scaffolds, respectively. The montane vole was also assembled into ∼13,000 scaffolds using Illumina sequencing. Mitochondrial genome assemblies were also performed for both species. Structural and functional annotation for the best water vole nuclear genome resulted in ∼24,500 annotated genes, with 83% of these having functional annotations. Assembly quality statistics for our nuclear assemblies fall within the range of genomes previously published in the genus Microtus, making the water vole and montane vole genomes useful additions to currently available genomic resources.
Affiliation(s)
- Drew J. Duckett
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, 1315 Kinnear Rd., Columbus, OH 43212, USA
| | - Jack Sullivan
- Department of Biological Sciences, University of Idaho, Box 443051, Moscow, ID 83844-3051, USA
| | - Stacy Pirro
- Iridian Genomes, Inc., 6213 Swords Way, Bethesda, MD 20817, USA
| | - Bryan C. Carstens
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, 1315 Kinnear Rd., Columbus, OH 43212, USA
| |
|
169
|
Compromised Function of the Pancreatic Transcription Factor PDX1 in a Lineage of Desert Rodents. J MAMM EVOL 2021. [DOI: 10.1007/s10914-021-09544-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Abstract
Gerbils are a subfamily of rodents living in arid regions of Asia and Africa. Recent studies have shown that several gerbil species have unusual amino acid changes in the PDX1 protein, a homeodomain transcription factor essential for pancreatic development and β-cell function. These changes were linked to strong GC-bias in the genome that may be caused by GC-biased gene conversion, and it has been hypothesized that this caused accumulation of deleterious changes. Here we use two approaches to examine if the unusual changes are adaptive or deleterious. First, we compare PDX1 protein sequences between 38 rodents to test for association with habitat. We show the PDX1 homeodomain is almost totally conserved in rodents, apart from gerbils, regardless of habitat. Second, we use ectopic gene overexpression and gene editing in cell culture to compare functional properties of PDX1 proteins. We show that the divergent gerbil PDX1 protein inefficiently binds an insulin gene promoter and ineffectively regulates insulin expression in response to high glucose in rat cells. The protein has, however, retained the ability to regulate some other β-cell genes. We suggest that during the evolution of gerbils, the selection-blind process of biased gene conversion pushed fixation of mutations adversely affecting function of a normally conserved homeodomain protein. We argue these changes were not entirely adaptive and may be associated with metabolic disorders in gerbil species on high carbohydrate diets. This unusual pattern of molecular evolution could have had a constraining effect on habitat and diet choice in the gerbil lineage.
|
170
|
Islam R, Raju RS, Tasnim N, Shihab IH, Bhuiyan MA, Araf Y, Islam T. Choice of assemblers has a critical impact on de novo assembly of SARS-CoV-2 genome and characterizing variants. Brief Bioinform 2021; 22:6210065. [PMID: 33822878 PMCID: PMC8083570 DOI: 10.1093/bib/bbab102] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/15/2021] [Revised: 03/06/2021] [Accepted: 03/08/2021] [Indexed: 12/18/2022] Open
Abstract
Background Coronavirus Disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a global pandemic following its initial emergence in China. SARS-CoV-2 has a positive-sense single-stranded RNA genome of around 30 kb. Using next-generation sequencing technologies, a large number of SARS-CoV-2 genomes are being sequenced at an unprecedented rate and deposited in public repositories. For the de novo assembly of SARS-CoV-2 genomes, a myriad of assemblers is being used, although their impact on assembly quality has not been characterized for this virus. In this study, we aim to understand the variability in assembly quality due to the choice of assembler. Results We performed 6648 de novo assemblies of 416 SARS-CoV-2 samples using eight different assemblers with different k-mer lengths. We used Illumina paired-end sequencing reads and compared the assembly quality of those assemblers. We showed that the choice of assembler plays a significant role in reconstructing the SARS-CoV-2 genome. Two metagenomic assemblers, MEGAHIT and metaSPAdes, performed better than the others on most assembly quality metrics, including recovery of a larger fraction of the genome, construction of larger contigs, and higher N50 and NA50 values. We showed that at least 9% (259/2873) of the variants present in the assemblies between MEGAHIT and metaSPAdes are unique to one of the assembly methods. Conclusion Our analyses indicate the critical role of assembly methods for assembling the SARS-CoV-2 genome using short reads and their impact on variant characterization. This study could help guide future studies to determine the best-suited assembler for the de novo assembly of virus genomes.
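The fraction quoted in the Results (259 of 2873 variants unique to one assembler, roughly 9%) can be read as the share of variant calls that appear in only one of two call sets. The Python sketch below illustrates that reading with set operations. It assumes the denominator is the union of both call sets, which the abstract does not state, and the function name and the `(position, ref, alt)` variant keys are hypothetical.

```python
def unique_variant_fraction(variants_a, variants_b):
    """Fraction of variants found in exactly one of two call sets.

    Assumption: the denominator is the union of both sets. The abstract
    does not specify how its 2873 total was defined.
    """
    a, b = set(variants_a), set(variants_b)
    union = a | b
    if not union:
        return 0.0
    return len(a ^ b) / len(union)


# Toy call sets keyed by (position, ref, alt); values are invented.
calls_a = {(241, "C", "T"), (3037, "C", "T"), (14408, "C", "T")}
calls_b = {(3037, "C", "T"), (14408, "C", "T"), (23403, "A", "G")}
print(unique_variant_fraction(calls_a, calls_b))

# The figure reported in the abstract, as plain arithmetic.
print(round(259 / 2873, 3))
```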
Affiliation(s)
| | - Rajan Saha Raju
- Computer Science and Engineering from the Shahjalal University of Science and Technology
| | - Nazia Tasnim
- Computer Science and Engineering at the Shahjalal University of Science and Technology
| | - Istiak Hossain Shihab
- Department of Computer Science and Engineering, Shahjalal University of Science and Technology
| | - Maruf Ahmed Bhuiyan
- Doctor of Medicine (MD) in Virology at Bangabandhu Sheikh Mujib Medical University
| | - Yusha Araf
- Genetic Engineering and Biotechnology at the Shahjalal University of Science and Technology
| | - Tofazzal Islam
- Institute of Biotechnology and Genetic Engineering, Bangabandhu Sheikh Mujibur Rahman Agricultural University, Bangladesh
| |
|
171
|
Longitudinal study of the scalp microbiome suggests coconut oil to enrich healthy scalp commensals. Sci Rep 2021; 11:7220. [PMID: 33790324 PMCID: PMC8012655 DOI: 10.1038/s41598-021-86454-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/07/2020] [Accepted: 02/04/2021] [Indexed: 01/03/2023] Open
Abstract
Dandruff is a recurrent chronic scalp disorder, affecting the majority of the population worldwide. Recently, a metagenomic study of the Indian scalp microbiome described an imperative role of bacterial commensals in providing essential vitamins and amino acids to the scalp. Coconut oil and its formulations are commonly applied on the scalp in several parts of the world to maintain scalp health. Thus, in this study we examined the effect of topical application of coconut oil on the scalp microbiome (bacterial and fungal) at the taxonomic and functional levels and their correlation with scalp physiological parameters. A 16-week time-course study was performed, including 12 weeks of treatment and a 4-week relapse phase, on a cohort of 140 (70 healthy and 70 dandruff) Indian women, resulting in ~900 metagenomic samples. After the treatment phase, an increase in the abundance of Cutibacterium acnes and Malassezia globosa in dandruff scalp was observed, which were negatively correlated with dandruff parameters. At the functional level, an enrichment of healthy scalp-related bacterial pathways, such as biotin metabolism, and a decrease in the fungal pathogenesis pathways were observed. The study provides novel insights on the effect of coconut oil in maintaining a healthy scalp and in modulating the scalp microbiome.
|
172
|
Molecular organization of recombinant human-Arabidopsis chromosomes in hybrid cell lines. Sci Rep 2021; 11:7160. [PMID: 33785802 PMCID: PMC8009911 DOI: 10.1038/s41598-021-86130-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/15/2020] [Accepted: 03/08/2021] [Indexed: 12/14/2022] Open
Abstract
Although plants and animals are evolutionarily distant, the structure and function of their chromosomes are largely conserved. This allowed the establishment of a human-Arabidopsis hybrid cell line in which a neo-chromosome was formed by insertion of segments of Arabidopsis chromosomes into human chromosome 15. We used this unique system to investigate how the introgressed part of a plant genome was maintained in human genetic background. The analysis of the neo-chromosome in 60- and 300-day-old cell cultures by next-generation sequencing and molecular cytogenetics suggested its origin by fusion of DNA fragments of different sizes from Arabidopsis chromosomes 2, 3, 4, and 5, which were randomly intermingled rather than joined end-to-end. The neo-chromosome harbored Arabidopsis centromeric repeats and terminal human telomeres. Arabidopsis centromere wasn’t found to be functional. Most of the introgressed Arabidopsis DNA was eliminated during the culture, and the Arabidopsis genome in 300-day-old culture showed significant variation in copy number as compared with the copy number variation in the 60-day-old culture. Amplified Arabidopsis centromere DNA and satellite repeats were localized at particular loci and some fragments were inserted into various positions of human chromosome. Neo-chromosome reorganization and behavior in somatic cell hybrids between the plant and animal kingdoms are discussed.
|
173
|
Draft genome of the glucose tolerant β-glucosidase producing rare Aspergillus unguis reveals complete cellulolytic machinery with multiple beta-glucosidase genes. Fungal Genet Biol 2021; 151:103551. [PMID: 33737204 DOI: 10.1016/j.fgb.2021.103551] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/29/2020] [Revised: 02/24/2021] [Accepted: 03/07/2021] [Indexed: 11/20/2022]
Abstract
Draft genome sequence of the glucose tolerant beta glucosidase (GT-BGL) producing rare fungus Aspergillus unguis NII 08123 was generated through Next Generation Sequencing (NGS). The genome size of the fungus was estimated to be 37.1 Mb. A total of 3116 contigs were assembled using SPAdes, and 15,161 proteins were predicted using AUGUSTUS 3.1. Among them, 13,850 proteins were annotated using UniProt. The distribution of CAZyme genes, specifically those encoding lignocellulose degrading enzymes, was analyzed and compared with that from the industrial cellulase producer Trichoderma reesei in view of the huge differences in detectable enzyme activities between the fungi, despite the ability of A. unguis to grow on lignocellulose as sole carbon source. The full length gene sequence of the inducible GT-BGL could be identified through tracing back from the peptide mass fingerprint. A total of 403 CAZymes were predicted from the genome, which includes 232 glycoside hydrolases (GHs), 12 carbohydrate esterases (CEs), 109 glycosyl transferases (GTs), 15 polysaccharide lyases (PLs), and 35 genes with auxiliary activities (AAs). The high level of zinc finger motif containing transcription factors could possibly hint at a tight regulation of the cellulolytic machinery, which may also explain the low cellulase activities even when a complete repertoire of cellulase degrading enzyme genes is present in the fungus.
|
174
|
Whole genome survey analysis and microsatellite motif identification of Sebastiscus marmoratus. Biosci Rep 2021; 40:222120. [PMID: 32090250 PMCID: PMC7040462 DOI: 10.1042/bsr20192252] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/18/2019] [Revised: 02/04/2020] [Accepted: 02/13/2020] [Indexed: 01/17/2023] Open
Abstract
The marbled rockfish Sebastiscus marmoratus is an ecologically and economically important marine fish species distributed along the northwestern Pacific coast from Japan to the Philippines. Here, next-generation sequencing was used to generate a whole genome survey dataset to provide fundamental information about its genome and to develop genome-wide microsatellite markers for S. marmoratus. The genome size of S. marmoratus was estimated as approximately 800 Mb by using K-mer analyses, and its heterozygosity ratio and repeat sequence ratio were 0.17% and 39.65%, respectively. The preliminary assembled genome was nearly 609 Mb with a GC content of 41.3%, and the data were used to develop microsatellite markers. A total of 191,592 microsatellite motifs were identified. The most frequent repeat motif was dinucleotide with a frequency of 76.10%, followed by 19.63% trinucleotide, 3.91% tetranucleotide, and 0.36% pentanucleotide motifs. The AC, GAG, and ATAG repeats were the most abundant motifs of dinucleotide, trinucleotide, and tetranucleotide motifs, respectively. In summary, a wide range of candidate microsatellite markers were identified and characterized in the present study using genome survey analysis. A high-quality whole genome sequence based on the “Illumina+PacBio+Hi-C” strategy is warranted for further comparative genomics and evolutionary biology studies in this species.
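Microsatellite motif identification of the kind summarized above amounts to scanning a sequence for short units (here 2 to 5 bases) repeated in tandem. The Python sketch below shows one simple way to do that with regular expressions. It is an illustration only: the abstract does not name the tool used, and the minimum repeat counts in `MIN_REPEATS`, the function names, and the toy sequence are all assumptions. Because matches are non-overlapping, some repeats adjacent to homopolymer runs can be missed.

```python
import re

# Hypothetical minimum repeat counts per motif length (not from the paper).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4}


def is_primitive(unit):
    """True if the unit is not itself a repeat of a shorter motif (e.g. 'AA', 'ACAC')."""
    n = len(unit)
    return not any(n % p == 0 and unit == unit[:p] * (n // p) for p in range(1, n))


def find_microsatellites(seq):
    """Return (start, unit, repeat_count) tuples for simple tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-base unit followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if is_primitive(unit):
                hits.append((match.start(), unit, len(match.group(0)) // k))
    return hits


# Toy sequence with one dinucleotide and one trinucleotide repeat.
toy = "TT" + "AC" * 7 + "GGT" + "GAG" * 5 + "C"
print(find_microsatellites(toy))
```

Tallying the unit lengths of the returned hits would give a motif-class breakdown comparable in form to the dinucleotide, trinucleotide, tetranucleotide, and pentanucleotide percentages reported in the abstract.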
|
175
|
Wang L, Niu D, Wang X, Khan J, Shen Q, Xue Y. A Novel Machine Learning Strategy for the Prediction of Antihypertensive Peptides Derived from Food with High Efficiency. Foods 2021; 10:foods10030550. [PMID: 33800877 PMCID: PMC7999667 DOI: 10.3390/foods10030550] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 12/10/2020] [Revised: 03/01/2021] [Accepted: 03/03/2021] [Indexed: 12/22/2022] Open
Abstract
Strategies to screen antihypertensive peptides with high throughput and rapid speed will doubtlessly contribute to the treatment of hypertension. Food-derived antihypertensive peptides can reduce blood pressure without side effects. In the present study, a novel model based on the eXtreme Gradient Boosting (XGBoost) algorithm was developed and compared with the dominating machine learning models. To further reflect on the reliability of the method in a real situation, the optimized XGBoost model was utilized to predict the antihypertensive degree of the k-mer peptides cutting from six key proteins in bovine milk, and the peptide-protein docking technology was introduced to verify the findings. The results showed that the XGBoost model achieved outstanding performance, with an accuracy of 86.50% and area under the receiver operating characteristic curve of 94.11%, which were better than the other models. Using the XGBoost model, the prediction of antihypertensive peptides derived from milk protein was consistent with the peptide-protein docking results, and was more efficient. Our results indicate that using the XGBoost algorithm as a novel auxiliary tool is feasible to screen for antihypertensive peptides derived from food, with high throughput and high efficiency.
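The abstract mentions scoring "k-mer peptides cutting from six key proteins in bovine milk". One plausible reading is a sliding window that enumerates every contiguous peptide of each length k from a protein sequence, so that each candidate can then be passed to a trained classifier. The Python sketch below shows only that enumeration step. The function name, the default k range, and the toy sequence are assumptions, and no model or real milk protein sequence is included.

```python
def kmer_peptides(protein, k_min=2, k_max=4):
    """Enumerate the distinct contiguous peptides of length k_min..k_max.

    The k range is a hypothetical default; the abstract does not state
    which peptide lengths were screened.
    """
    protein = protein.upper()
    peptides = set()
    for k in range(k_min, k_max + 1):
        for start in range(len(protein) - k + 1):
            peptides.add(protein[start:start + k])
    # Sort by length, then alphabetically, for a stable output order.
    return sorted(peptides, key=lambda p: (len(p), p))


# Toy four-residue sequence (not a real milk protein).
print(kmer_peptides("MKVL", 2, 3))
```

A protein of length n yields at most n - k + 1 peptides for each k, so the candidate list grows roughly linearly with protein length, which is consistent with the high-throughput screening the abstract describes.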
Affiliation(s)
- Liyang Wang
- College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China; (L.W.); (X.W.); (J.K.); (Q.S.)
| | - Dantong Niu
- College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China;
| | - Xiaoya Wang
- College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China; (L.W.); (X.W.); (J.K.); (Q.S.)
| | - Jabir Khan
- College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China; (L.W.); (X.W.); (J.K.); (Q.S.)
| | - Qun Shen
- College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China; (L.W.); (X.W.); (J.K.); (Q.S.)
| | - Yong Xue
- College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China; (L.W.); (X.W.); (J.K.); (Q.S.)
- Correspondence:
| |
|
176
|
Polonio Á, Díaz-Martínez L, Fernández-Ortuño D, de Vicente A, Romero D, López-Ruiz FJ, Pérez-García A. A Hybrid Genome Assembly Resource for Podosphaera xanthii, the Main Causal Agent of Powdery Mildew Disease in Cucurbits. MOLECULAR PLANT-MICROBE INTERACTIONS : MPMI 2021; 34:319-324. [PMID: 33141618 DOI: 10.1094/mpmi-08-20-0237-a] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 05/23/2023]
Abstract
Podosphaera xanthii is the main causal agent of powdery mildew in cucurbits and, arguably, the most important fungal pathogen of cucurbit crops. Here, we present the first reference genome assembly for P. xanthii. We performed a hybrid genome assembly, using reads from Illumina NextSeq550 and PacBio Sequel S3. The short and long reads were assembled into 1,727 scaffolds with an N50 size of 163,173 bp, resulting in a 142-Mb genome size. The combination of homology-based and ab initio predictions allowed the prediction of 14,911 complete genes. Repetitive sequences comprised 76.2% of the genome. Our P. xanthii genome assembly improves considerably the molecular resources for research on P. xanthii-cucurbit interactions and provides new opportunities for further genomics, transcriptomics, and evolutionary studies in powdery mildew fungi. Copyright © 2021 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.
Affiliation(s)
- Álvaro Polonio
- Departamento de Microbiología, Facultad de Ciencias, Universidad de Málaga, 29071 Málaga, Spain
- Instituto de Hortofruticultura Subtropical y Mediterránea 'La Mayora', Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), 29071 Málaga, Spain
| | - Luis Díaz-Martínez
- Departamento de Microbiología, Facultad de Ciencias, Universidad de Málaga, 29071 Málaga, Spain
- Instituto de Hortofruticultura Subtropical y Mediterránea 'La Mayora', Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), 29071 Málaga, Spain
| | - Dolores Fernández-Ortuño
- Departamento de Microbiología, Facultad de Ciencias, Universidad de Málaga, 29071 Málaga, Spain
- Instituto de Hortofruticultura Subtropical y Mediterránea 'La Mayora', Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), 29071 Málaga, Spain
| | - Antonio de Vicente
- Departamento de Microbiología, Facultad de Ciencias, Universidad de Málaga, 29071 Málaga, Spain
- Instituto de Hortofruticultura Subtropical y Mediterránea 'La Mayora', Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), 29071 Málaga, Spain
| | - Diego Romero
- Departamento de Microbiología, Facultad de Ciencias, Universidad de Málaga, 29071 Málaga, Spain
- Instituto de Hortofruticultura Subtropical y Mediterránea 'La Mayora', Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), 29071 Málaga, Spain
| | - Francisco J López-Ruiz
- Centre for Crop and Disease Management, School of Molecular and Life Sciences, Curtin University, Perth, WA 6102, Australia
| | - Alejandro Pérez-García
- Departamento de Microbiología, Facultad de Ciencias, Universidad de Málaga, 29071 Málaga, Spain
- Instituto de Hortofruticultura Subtropical y Mediterránea 'La Mayora', Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), 29071 Málaga, Spain
| |
|
177
|
Romey A, Lamglait B, Blanchard Y, Touzain F, Quenault H, Relmy A, Zientara S, Blaise-Boisseau S, Bakkali-Kassimi L. Molecular characterization of encephalomyocarditis virus strains isolated from an African elephant and rats in a French zoo. J Vet Diagn Invest 2021; 33:313-321. [PMID: 33292091 PMCID: PMC7953090 DOI: 10.1177/1040638720978389] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022] Open
Abstract
In November 2013, a fatal encephalomyocarditis virus (EMCV) case in a captive African elephant (Loxodonta africana) occurred at the Réserve Africaine de Sigean, a zoo in the south of France. Here we report the molecular characterization of the EMCV strains isolated from samples collected from the dead elephant and from 3 rats (Rattus rattus) captured in the zoo at the same time. The EMCV infection was confirmed by reverse-transcription real-time PCR (RT-rtPCR) and genome sequencing. Complete genome sequencing and sequence alignment indicated that the elephant's EMCV strain was 98.1-99.9% identical to the rat EMCV isolates at the nucleotide sequence level. Phylogenetic analysis of the ORF, P1, VP1, and 3D sequences revealed that the elephant and rat strains clustered into lineage A of the EMCV 1 group. To our knowledge, molecular characterization of EMCV in France and Europe has not been reported previously in a captive elephant. The full genome analyses of EMCV isolated from an elephant and rats in the same outbreak emphasizes the role of rodents in EMCV introduction and circulation in zoos.
Affiliation(s)
- Aurore Romey
- Animal Health Laboratory, UMR1161 Virology, INRAE, ANSES, ENVA, Paris-Est University, Maisons-Alfort, France
| | | | - Yannick Blanchard
- Unit of Viral Genetics and Biosafety, Ploufragan Laboratory, ANSES, Ploufragan, France
| | - Fabrice Touzain
- Unit of Viral Genetics and Biosafety, Ploufragan Laboratory, ANSES, Ploufragan, France
| | - Helene Quenault
- Unit of Viral Genetics and Biosafety, Ploufragan Laboratory, ANSES, Ploufragan, France
| | - Anthony Relmy
- Animal Health Laboratory, UMR1161 Virology, INRAE, ANSES, ENVA, Paris-Est University, Maisons-Alfort, France
| | - Stephan Zientara
- Animal Health Laboratory, UMR1161 Virology, INRAE, ANSES, ENVA, Paris-Est University, Maisons-Alfort, France
| | - Sandra Blaise-Boisseau
- Animal Health Laboratory, UMR1161 Virology, INRAE, ANSES, ENVA, Paris-Est University, Maisons-Alfort, France
| | - Labib Bakkali-Kassimi
- Animal Health Laboratory, UMR1161 Virology, INRAE, ANSES, ENVA, Paris-Est University, Maisons-Alfort, France
| |
|
178
|
Silva R, Padovani K, Góes F, Alves R. geneRFinder: gene finding in distinct metagenomic data complexities. BMC Bioinformatics 2021; 22:87. [PMID: 33632132 PMCID: PMC7905635 DOI: 10.1186/s12859-021-03997-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/02/2020] [Accepted: 02/04/2021] [Indexed: 12/01/2022] Open
Abstract
Background Microbes perform a fundamental economic, social, and environmental role in our society. Metagenomics makes it possible to investigate microbes in their natural environments (the complex communities) and their interactions. The way they act is usually estimated by looking at the functions they play in those environments, and their responsibility is measured by their genes. Advances in next-generation sequencing technology have facilitated metagenomics research; however, they also create a heavy computational burden. Large and complex biological datasets are available as never before. Many gene predictors are available to aid the gene annotation process, though they do not appropriately handle the complexities of metagenomic data. There is no standard metagenomic benchmark data for gene prediction. Thus, gene predictors may inflate their results by obfuscating low false discovery rates. Results We introduce geneRFinder, an ML-based gene predictor able to outperform state-of-the-art gene prediction tools across this benchmark by using only one pre-trained Random Forest model. Average prediction rates of geneRFinder differed in percentage terms by 54% and 64%, respectively, against Prodigal and FragGeneScan while handling high complexity metagenomes. The specificity rate of geneRFinder had the largest distance against FragGeneScan, 79 percentage points, and 66 more than Prodigal. According to McNemar's test, all percentage differences between predictor performances are statistically significant for all datasets at a 99% confidence level.
Conclusions We provide geneRFinder, an approach for gene prediction in distinct metagenomic complexities, available at gitlab.com/r.lorenna/generfinder and https://osf.io/w2yd6/. We also provide novel, comprehensive benchmark data for gene prediction, based on the Critical Assessment of Metagenome Interpretation (CAMI) challenge and containing labeled data from gene regions, available at https://sourceforge.net/p/generfinder-benchmark.
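The abstract reports significance via McNemar's test, which compares two predictors on the same labeled items using only the discordant counts. A minimal sketch of that test follows, assuming the continuity-corrected chi-square form; the function name `mcnemar` and the counts `b` and `c` are illustrative and are not taken from the geneRFinder code or data.

```python
import math


def mcnemar(b: int, c: int) -> tuple[float, float]:
    """Continuity-corrected McNemar test on discordant counts.

    b: items predictor A classified correctly and predictor B incorrectly.
    c: items predictor B classified correctly and predictor A incorrectly.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    if b + c == 0:
        # No discordant pairs: the predictors are indistinguishable here.
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, for illustration only.
stat, p = mcnemar(30, 10)
print(f"chi2 = {stat:.3f}, significant at 99%: {p < 0.01}")
```

Items that both predictors classify correctly, or both incorrectly, do not enter the statistic, which is why only `b` and `c` are needed.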
Affiliation(s)
- Raíssa Silva
- Vale Institute of Technology, Boaventura da Silva, 955, Belém, BR, 66055-090, Brazil; PPGCC, Federal University of Pará, Augusto Corrêa, 01, Belém, BR, 66075-110, Brazil
| | - Kleber Padovani
- PPGCC, Federal University of Pará, Augusto Corrêa, 01, Belém, BR, 66075-110, Brazil
| | - Fabiana Góes
- ICMC, University of São Paulo, Trab. São Carlense, 400, São Carlos, BR, 13566-590, Brazil
| | - Ronnie Alves
- Vale Institute of Technology, Boaventura da Silva, 955, Belém, BR, 66055-090, Brazil; PPGCC, Federal University of Pará, Augusto Corrêa, 01, Belém, BR, 66075-110, Brazil
| |
|
179
|
Apriyanto A, Tambunan VB. Draft genome sequence, annotation, and SSR mining data of Elaeidobius kamerunicus Faust., an essential oil palm pollinating weevil. Data Brief 2021; 34:106745. [PMID: 33537371 PMCID: PMC7843393 DOI: 10.1016/j.dib.2021.106745] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/01/2020] [Revised: 12/30/2020] [Accepted: 01/08/2021] [Indexed: 11/19/2022] Open
Abstract
Elaeidobius kamerunicus Faust. (Coleoptera: Curculionidae) is an essential insect pollinator in oil palm plantations. Recently, research has been undertaken to improve pollination efficiency using this species. A fundamental understanding of the genes related to this pollinator's behavior is necessary to achieve this goal. Here, we present the draft genome sequence, annotation, and simple sequence repeat (SSR) marker data for this pollinator. In total, 34.97 Gb of sequence data from one male individual (monoisolate) were obtained using the Illumina NextSeq 500 short-read platform. The draft genome assembly was found to be 269.79 Mb and about 59.9% complete based on Benchmarking Universal Single-Copy Orthologs (BUSCO) assessment. Functional gene annotation predicted about 26,566 genes. Also, a total of 281,668 putative SSR markers were identified. This draft genome sequence is a valuable resource for understanding the population genetics, phylogenetics, dispersal patterns, and behavior of this species.
Affiliation(s)
- Ardha Apriyanto
- Research and Development, PT. Astra Agro Lestari Tbk, Jl. Puloayang Raya Blok OR I, Kawasan Industri Pulogadung, Jakarta Timur, Indonesia
- Biopolymer Analytics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Building 20, Potsdam-Golm, Germany
- Corresponding author.
| | - Van Basten Tambunan
- Research and Development, PT. Astra Agro Lestari Tbk, Jl. Puloayang Raya Blok OR I, Kawasan Industri Pulogadung, Jakarta Timur, Indonesia
| |
|
180
|
Huang H, Liang J, Tan Q, Ou L, Li X, Zhong C, Huang H, Møller IM, Wu X, Song S. Insights into triterpene synthesis and unsaturated fatty-acid accumulation provided by chromosomal-level genome analysis of Akebia trifoliata subsp. australis. Horticulture Research 2021; 8:33. [PMID: 33518712 PMCID: PMC7848005 DOI: 10.1038/s41438-020-00458-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 04/15/2020] [Revised: 11/16/2020] [Accepted: 11/20/2020] [Indexed: 05/10/2023]
Abstract
Akebia trifoliata subsp. australis is a well-known medicinal and potential woody oil plant in China. The limited genetic information available for A. trifoliata subsp. australis has hindered its exploitation. Here, a high-quality chromosome-level genome sequence of A. trifoliata subsp. australis is reported. The de novo genome assembly of 682.14 Mb was generated with a scaffold N50 of 43.11 Mb. The genome includes 25,598 protein-coding genes, and 71.18% (485.55 Mb) of the assembled sequences were identified as repetitive sequences. An ongoing massive burst of long terminal repeat (LTR) insertions, which occurred ~1.0 million years ago, has contributed a large proportion of LTRs in the genome of A. trifoliata subsp. australis. Phylogenetic analysis shows that A. trifoliata subsp. australis is closely related to Aquilegia coerulea and forms a clade with Papaver somniferum and Nelumbo nucifera, which supports the well-established hypothesis of a close relationship between basal eudicot species. The expansion of UDP-glucuronosyl and UDP-glucosyl transferase gene families and β-amyrin synthase-like genes and the exclusive contraction of terpene synthase gene families may be responsible for the abundant oleanane-type triterpenoids in A. trifoliata subsp. australis. Furthermore, the acyl-ACP desaturase gene family, including 12 stearoyl-acyl-carrier protein desaturase (SAD) genes, has expanded exclusively. A combined transcriptome and fatty-acid analysis of seeds at five developmental stages revealed that homologs of SADs, acyl-lipid desaturase omega fatty acid desaturases (FADs), and oleosins were highly expressed, consistent with the rapid increase in the content of fatty acids, especially unsaturated fatty acids. The genomic sequences of A. trifoliata subsp. australis will be a valuable resource for comparative genomic analyses and molecular breeding.
Affiliation(s)
- Hui Huang
- Key Laboratory of Research and Utilization of Ethnomedicinal Plant Resources of Hunan Province, College of Biological and Food Engineering, Huaihua University, Huaihua, 418000, China
- Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
| | - Juan Liang
- Key Laboratory of Research and Utilization of Ethnomedicinal Plant Resources of Hunan Province, College of Biological and Food Engineering, Huaihua University, Huaihua, 418000, China
| | - Qi Tan
- Key Laboratory of Research and Utilization of Ethnomedicinal Plant Resources of Hunan Province, College of Biological and Food Engineering, Huaihua University, Huaihua, 418000, China
| | - Linfeng Ou
- Key Laboratory of Research and Utilization of Ethnomedicinal Plant Resources of Hunan Province, College of Biological and Food Engineering, Huaihua University, Huaihua, 418000, China
| | - Xiaolin Li
- State Key Laboratory Breeding Base of Dao-di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Science, Beijing, 100700, China
| | - Caihong Zhong
- Key Laboratory of Research and Utilization of Ethnomedicinal Plant Resources of Hunan Province, College of Biological and Food Engineering, Huaihua University, Huaihua, 418000, China
| | - Huilin Huang
- Key Laboratory of Research and Utilization of Ethnomedicinal Plant Resources of Hunan Province, College of Biological and Food Engineering, Huaihua University, Huaihua, 418000, China
| | - Ian Max Møller
- Department of Molecular Biology and Genetics, Aarhus University, Flakkebjerg, DK-4200, Slagelse, Denmark
| | - Xianjin Wu
- Key Laboratory of Research and Utilization of Ethnomedicinal Plant Resources of Hunan Province, College of Biological and Food Engineering, Huaihua University, Huaihua, 418000, China
| | - Songquan Song
- Key Laboratory of Research and Utilization of Ethnomedicinal Plant Resources of Hunan Province, College of Biological and Food Engineering, Huaihua University, Huaihua, 418000, China.
- Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China.
| |
|
181
|
Meyer A, Schloissnig S, Franchini P, Du K, Woltering JM, Irisarri I, Wong WY, Nowoshilow S, Kneitz S, Kawaguchi A, Fabrizius A, Xiong P, Dechaud C, Spaink HP, Volff JN, Simakov O, Burmester T, Tanaka EM, Schartl M. Giant lungfish genome elucidates the conquest of land by vertebrates. Nature 2021; 590:284-289. [PMID: 33461212 PMCID: PMC7875771 DOI: 10.1038/s41586-021-03198-8] [Citation(s) in RCA: 116] [Impact Index Per Article: 29.0] [Received: 07/13/2020] [Accepted: 01/06/2021] [Indexed: 01/29/2023]
Abstract
Lungfishes belong to the lobe-finned fish (Sarcopterygii) that, in the Devonian period, 'conquered' the land and ultimately gave rise to all land vertebrates, including humans [1-3]. Here we determine the chromosome-quality genome of the Australian lungfish (Neoceratodus forsteri), which is known to have the largest genome of any animal. The vast size of this genome, which is about 14× larger than that of humans, is attributable mostly to huge intergenic regions and introns with high repeat content (around 90%), the components of which resemble those of tetrapods (comprising mainly long interspersed nuclear elements) more than they do those of ray-finned fish. The lungfish genome continues to expand independently (its transposable elements are still active), through mechanisms different to those of the enormous genomes of salamanders. The 17 fully assembled lungfish macrochromosomes maintain synteny to other vertebrate chromosomes, and all microchromosomes maintain conserved ancient homology with the ancestral vertebrate karyotype. Our phylogenomic analyses confirm previous reports that lungfish occupy a key evolutionary position as the closest living relatives to tetrapods [4,5], underscoring the importance of lungfish for understanding innovations associated with terrestrialization. Lungfish preadaptations to living on land include the gain of limb-like expression in developmental genes such as hoxc13 and sall1 in their lobed fins. Increased rates of evolution and the duplication of genes associated with obligate air-breathing, such as lung surfactants and the expansion of odorant receptor gene families (which encode proteins involved in detecting airborne odours), contribute to the tetrapod-like biology of lungfishes. These findings advance our understanding of this major transition during vertebrate evolution.
Affiliation(s)
- Axel Meyer
- Department of Biology, University of Konstanz, Konstanz, Germany.
| | | | - Paolo Franchini
- Department of Biology, University of Konstanz, Konstanz, Germany
| | - Kang Du
- Developmental Biochemistry, Biocenter, University of Würzburg, Würzburg, Germany
- The Xiphophorus Genetic Stock Center, Texas State University, San Marcos, TX, USA
| | | | - Iker Irisarri
- Department of Biodiversity and Evolutionary Biology, Museo Nacional de Ciencias Naturales (MNCN-CSIC), Madrid, Spain
- Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
| | - Wai Yee Wong
- Department of Neuroscience and Developmental Biology, University of Vienna, Vienna, Austria
| | | | - Susanne Kneitz
- Biochemistry and Cell Biology, Biocenter, University of Würzburg, Würzburg, Germany
| | - Akane Kawaguchi
- Research Institute of Molecular Pathology (IMP), Vienna, Austria
| | | | - Peiwen Xiong
- Department of Biology, University of Konstanz, Konstanz, Germany
| | - Corentin Dechaud
- Institut de Génomique Fonctionnelle, École Normale Superieure, Université Claude Bernard, Lyon, France
| | - Herman P Spaink
- Faculty of Science, Universiteit Leiden, Leiden, The Netherlands
| | - Jean-Nicolas Volff
- Institut de Génomique Fonctionnelle, École Normale Superieure, Université Claude Bernard, Lyon, France
| | - Oleg Simakov
- Department of Neuroscience and Developmental Biology, University of Vienna, Vienna, Austria.
| | | | - Elly M Tanaka
- Research Institute of Molecular Pathology (IMP), Vienna, Austria.
| | - Manfred Schartl
- Developmental Biochemistry, Biocenter, University of Würzburg, Würzburg, Germany.
- The Xiphophorus Genetic Stock Center, Texas State University, San Marcos, TX, USA.
| |
|
182
|
Fritz A, Bremges A, Deng ZL, Lesker TR, Götting J, Ganzenmüller T, Sczyrba A, Dilthey A, Klawonn F, McHardy A. Haploflow: Strain-resolved de novo assembly of viral genomes. bioRxiv 2021:2021.01.25.428049. [PMID: 33532769 PMCID: PMC7852260 DOI: 10.1101/2021.01.25.428049] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/22/2023]
Abstract
In viral infections, multiple related viral strains are often present, due to coinfection or within-host evolution. We describe Haploflow, a de Bruijn graph-based assembler for de novo genome assembly of viral strains from mixed sequence samples using a novel flow algorithm. We assessed Haploflow across multiple benchmark data sets of increasing complexity, showing that Haploflow is faster and more accurate than viral haplotype assemblers and generic metagenome assemblers not aiming to reconstruct strains. Haploflow reconstructed high-quality strain-resolved assemblies from clinical HCMV samples and SARS-CoV-2 genomes from wastewater metagenomes identical to genomes from clinical isolates.
Affiliation(s)
- A. Fritz
- BIFO, Department of Computational Biology, Helmholtz Centre for Infection Research, Braunschweig, Germany
- DZIF, German Centre for Infection Research
| | - A. Bremges
- BIFO, Department of Computational Biology, Helmholtz Centre for Infection Research, Braunschweig, Germany
- DZIF, German Centre for Infection Research
| | - Z.-L. Deng
- BIFO, Department of Computational Biology, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - T.-R. Lesker
- BIFO, Department of Computational Biology, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - J. Götting
- DZIF, German Centre for Infection Research
- Institute of Virology, Hannover Medical School, Hannover, Germany
| | - T. Ganzenmüller
- DZIF, German Centre for Infection Research
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Institute for Medical Virology, University Hospital Tuebingen, Tuebingen, Germany
| | - A. Sczyrba
- BIFO, Department of Computational Biology, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
| | - A. Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, 20892, USA
| | - F. Klawonn
- Department of Computer Science, Ostfalia University of Applied Sciences, Wolfenbuettel, Germany
- Biostatistics Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - A.C. McHardy
- BIFO, Department of Computational Biology, Helmholtz Centre for Infection Research, Braunschweig, Germany
- DZIF, German Centre for Infection Research
| |
|
183
|
Kushwaha B, Pandey M, Das P, Joshi CG, Nagpure NS, Kumar R, Kumar D, Agarwal S, Srivastava S, Singh M, Sahoo L, Jayasankar P, Meher PK, Shah TM, Hinsu AT, Patel N, Koringa PG, Das SP, Patnaik S, Bit A, Iquebal MA, Jaiswal S, Jena J. The genome of walking catfish Clarias magur (Hamilton, 1822) unveils the genetic basis that may have facilitated the development of environmental and terrestrial adaptation systems in air-breathing catfishes. DNA Res 2021; 28:6070145. [PMID: 33416875 PMCID: PMC7934567 DOI: 10.1093/dnares/dsaa031] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/14/2020] [Accepted: 12/21/2020] [Indexed: 11/14/2022] Open
Abstract
The walking catfish Clarias magur (Hamilton, 1822) (magur) is an important catfish species inhabiting the Indian subcontinent. It is considered a highly nutritious food fish and is able to walk some distance and survive a considerable period without water. Assembly, scaffolding and several rounds of iteration resulted in 3,484 scaffolds covering ∼94% of the estimated genome, with a largest scaffold of 9.88 Mb and an N50 of 1.31 Mb. The genome possessed 23,748 predicted protein-encoding genes, with annotation of 19,279 orthologous genes. A total of 166 orthologous groups represented by 222 genes were found to be unique to this species. The Computational Analysis of gene Family Evolution (CAFE) analysis revealed expansion of 207 gene families, and 100 gene families have rapidly evolved. Genes specific to important environmental and terrestrial adaptation, viz. urea cycle, vision, locomotion, olfactory and vomeronasal receptors, immune system, anti-microbial properties, mucus, thermoregulation, osmoregulation, air-breathing, detoxification, etc. were identified and critically analysed. The analysis clearly indicated that the C. magur genome possessed several unique and duplicate genes similar to those of terrestrial or amphibian counterparts in comparison to other teleostean species. The genome information will be useful in conservation genetics, not only for this species but also for similar studies in other catfishes.
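Several entries in this list summarize assembly contiguity with N50, the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name `n50` and the toy lengths are illustrative and do not come from any of the cited assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no sequence lengths given")


# Toy scaffold lengths (hypothetical), not data from the paper.
print(n50([2, 3, 4, 5, 6]))
```

Because N50 depends only on the length distribution, it says nothing about correctness; the abstracts pair it with measures such as BUSCO completeness for that reason.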
Affiliation(s)
- Basdeo Kushwaha
- Molecular Biology and Biotechnology Division, ICAR-National Bureau of Fish Genetic Resources, Lucknow, Uttar Pradesh 226002, India
| | - Manmohan Pandey
- Molecular Biology and Biotechnology Division, ICAR-National Bureau of Fish Genetic Resources, Lucknow, Uttar Pradesh 226002, India
| | - Paramananda Das
- Fish Genetics and Biotechnology Division, ICAR-Central Institute of Freshwater Aquaculture, Bhubaneswar, Odisha 751002, India
| | - Chaitanya G Joshi
- Department of Animal Biotechnology, Anand Agricultural University, Anand, Gujarat 388110, India
| | - Naresh S Nagpure
- Molecular Biology and Biotechnology Division, ICAR-National Bureau of Fish Genetic Resources, Lucknow, Uttar Pradesh 226002, India
| | - Ravindra Kumar
- Molecular Biology and Biotechnology Division, ICAR-National Bureau of Fish Genetic Resources, Lucknow, Uttar Pradesh 226002, India
| | - Dinesh Kumar
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
| | - Suyash Agarwal
- Molecular Biology and Biotechnology Division, ICAR-National Bureau of Fish Genetic Resources, Lucknow, Uttar Pradesh 226002, India
| | - Shreya Srivastava
- Molecular Biology and Biotechnology Division, ICAR-National Bureau of Fish Genetic Resources, Lucknow, Uttar Pradesh 226002, India
| | - Mahender Singh
- Molecular Biology and Biotechnology Division, ICAR-National Bureau of Fish Genetic Resources, Lucknow, Uttar Pradesh 226002, India
| | - Lakshman Sahoo
- Fish Genetics and Biotechnology Division, ICAR-Central Institute of Freshwater Aquaculture, Bhubaneswar, Odisha 751002, India
| | - Pallipuram Jayasankar
- Fish Genetics and Biotechnology Division, ICAR-Central Institute of Freshwater Aquaculture, Bhubaneswar, Odisha 751002, India
| | - Prem K Meher
- Fish Genetics and Biotechnology Division, ICAR-Central Institute of Freshwater Aquaculture, Bhubaneswar, Odisha 751002, India
| | - Tejas M Shah
- Department of Animal Biotechnology, Anand Agricultural University, Anand, Gujarat 388110, India
| | - Ankit T Hinsu
- Department of Animal Biotechnology, Anand Agricultural University, Anand, Gujarat 388110, India
| | - Namrata Patel
- Department of Animal Biotechnology, Anand Agricultural University, Anand, Gujarat 388110, India
| | - Prakash G Koringa
- Department of Animal Biotechnology, Anand Agricultural University, Anand, Gujarat 388110, India
| | - Sofia P Das
- Fish Genetics and Biotechnology Division, ICAR-Central Institute of Freshwater Aquaculture, Bhubaneswar, Odisha 751002, India
| | - Siddhi Patnaik
- Fish Genetics and Biotechnology Division, ICAR-Central Institute of Freshwater Aquaculture, Bhubaneswar, Odisha 751002, India
| | - Amrita Bit
- Fish Genetics and Biotechnology Division, ICAR-Central Institute of Freshwater Aquaculture, Bhubaneswar, Odisha 751002, India
| | - Mir A Iquebal
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
| | - Sarika Jaiswal
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
| | - Joykrushna Jena
- Molecular Biology and Biotechnology Division, ICAR-National Bureau of Fish Genetic Resources, Lucknow, Uttar Pradesh 226002, India
| |
|
184
|
Xue X, Suvorov A, Fujimoto S, Dilman AR, Adams BJ. Genome analysis of Plectus murrayi, a nematode from continental Antarctica. G3: Genes, Genomes, Genetics 2021; 11:6044189. [PMID: 33561244 PMCID: PMC8022722 DOI: 10.1093/g3journal/jkaa045] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/09/2020] [Accepted: 12/08/2020] [Indexed: 01/23/2023]
Abstract
Plectus murrayi is one of the most common and locally abundant invertebrates of continental Antarctic ecosystems. Because it is readily cultured on artificial medium in the laboratory and highly tolerant to an extremely harsh environment, P. murrayi is emerging as a model organism for understanding the evolutionary origin and maintenance of adaptive responses to multiple environmental stressors, including freezing and desiccation. The de novo assembled genome of P. murrayi contains 225.741 million base pairs and a total of 14,689 predicted genes. Compared to Caenorhabditis elegans, the architectural components of P. murrayi are characterized by a lower number of protein-coding genes, fewer transposable elements, but more exons, than closely related taxa from less harsh environments. We compared the transcriptomes of lab-reared P. murrayi with wild-caught P. murrayi and found genes involved in growth and cellular processing were up-regulated in lab-cultured P. murrayi, while a few genes associated with cellular metabolism and freeze tolerance were expressed at relatively lower levels. Preliminary comparative genomic and transcriptomic analyses suggest that the observed constraints on P. murrayi genome architecture and functional gene expression, including genome decay and intron retention, may be an adaptive response to persisting in a biotically simplified, yet consistently physically harsh environment.
Affiliation(s)
- Xia Xue
- Precision Medicine Center, Academy of Medical Sciences, Zhengzhou University, Zhengzhou 450000, China; Department of Biology, Evolutionary Ecology Laboratories, and Monte L. Bean Museum, Brigham Young University, Provo, UT, USA
| | - Anton Suvorov
- Department of Biology, Evolutionary Ecology Laboratories, and Monte L. Bean Museum, Brigham Young University, Provo, UT, USA
| | - Stanley Fujimoto
- Department of Computer Science, Brigham Young University, Provo, UT, USA
| | - Adler R Dilman
- Department of Nematology, University of California, Riverside, CA, USA
| | - Byron J Adams
- Department of Biology, Evolutionary Ecology Laboratories, and Monte L. Bean Museum, Brigham Young University, Provo, UT, USA
| |
|
185
|
Chromosome-level genome assembly of Ophiorrhiza pumila reveals the evolution of camptothecin biosynthesis. Nat Commun 2021; 12:405. [PMID: 33452249 PMCID: PMC7810986 DOI: 10.1038/s41467-020-20508-2] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Received: 07/16/2020] [Accepted: 12/07/2020] [Indexed: 01/29/2023] Open
Abstract
Plant genomes remain highly fragmented and are often characterized by hundreds to thousands of assembly gaps. Here, we report a chromosome-level reference and phased genome assembly of Ophiorrhiza pumila, a camptothecin-producing medicinal plant, through an ordered multi-scaffolding and experimental validation approach. With 21 assembly gaps and a contig N50 of 18.49 Mb, the Ophiorrhiza genome is one of the most complete plant genomes assembled to date. We also report 273 nitrogen-containing metabolites, including diverse monoterpene indole alkaloids (MIAs). A comparative genomics approach identifies strictosidine biogenesis as the origin of MIA evolution. The emergence of strictosidine biosynthesis-catalyzing enzymes precedes downstream enzymes' evolution post γ whole-genome triplication, which occurred approximately 110 Mya in O. pumila, and before the whole-genome duplication in Camptotheca acuminata identified here. Combining comparative genome analysis, multi-omics analysis, and metabolic gene-cluster analysis, we propose a working model for MIA evolution, and a pangenome for MIA biosynthesis, which will help in establishing a sustainable supply of camptothecin.
|
186
|
De-la-Cruz IM, Hallab A, Olivares-Pinto U, Tapia-López R, Velázquez-Márquez S, Piñero D, Oyama K, Usadel B, Núñez-Farfán J. Genomic signatures of the evolution of defence against its natural enemies in the poisonous and medicinal plant Datura stramonium (Solanaceae). Sci Rep 2021; 11:882. [PMID: 33441607 PMCID: PMC7806989 DOI: 10.1038/s41598-020-79194-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/28/2020] [Accepted: 12/03/2020] [Indexed: 01/22/2023] Open
Abstract
Tropane alkaloids and terpenoids are widely used in the medical and pharmaceutical industries and evolved as chemical defenses against herbivores and pathogens in the annual herb Datura stramonium (Solanaceae). Here, we present the first draft genomes of two D. stramonium plants from contrasting environments. Using these de novo assemblies, along with other previously published genomes from 11 Solanaceae species, we carried out comparative genomic analyses to provide insights on the genome evolution of D. stramonium within the Solanaceae family, and to elucidate adaptive genomic signatures to biotic and abiotic stresses in this plant. We also studied, in detail, the evolution of four D. stramonium genes involved in tropane alkaloid biosynthesis: Putrescine N-methyltransferase, Tropinone reductase I, Tropinone reductase II and Hyoscyamine-6S-dioxygenase. Our analyses revealed that the genomes of D. stramonium show signatures of expansion, physicochemical divergence and/or positive selection on proteins related to the production of tropane alkaloids, terpenoids, and glycoalkaloids as well as on R defensive genes and other important proteins related to biotic and abiotic pressures such as defense against natural enemies and drought.
Affiliation(s)
- I M De-la-Cruz
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - A Hallab
- IBG-4 Bioinformatics, CEPLAS, Forschungszentrum Jülich, Julich, Germany
| | - U Olivares-Pinto
- Escuela Nacional de Estudios Superiores, Universidad Nacional Autónoma de México (UNAM), Campus Juriquilla, Querétaro, Mexico
| | - R Tapia-López
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - S Velázquez-Márquez
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - D Piñero
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - K Oyama
- Escuela Nacional de Estudios Superiores and Laboratorio Nacional de Análisis y Síntesis Ecológica (LANASE), Universidad Nacional Autónoma de México (UNAM), Campus Morelia, Morelia, Michoacán, Mexico
| | - B Usadel
- IBG-4 Bioinformatics, CEPLAS, Forschungszentrum Jülich, Julich, Germany
- Institute for Biology I, RWTH Aachen University, Aachen, Germany
| | - J Núñez-Farfán
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico.
| |
|
187
|
Luhmann N, Holley G, Achtman M. BlastFrost: fast querying of 100,000s of bacterial genomes in Bifrost graphs. Genome Biol 2021; 22:30. [PMID: 33430919 PMCID: PMC7798312 DOI: 10.1186/s13059-020-02237-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/10/2020] [Accepted: 12/11/2020] [Indexed: 12/30/2022] Open
Abstract
BlastFrost is a highly efficient method for querying 100,000s of genome assemblies, building on Bifrost, a dynamic data structure for compacted and colored de Bruijn graphs. BlastFrost queries a Bifrost data structure for sequences of interest and extracts local subgraphs, enabling the identification of the presence or absence of individual genes or single nucleotide sequence variants. We show two examples using Salmonella genomes: finding within minutes the presence of genes in the SPI-2 pathogenicity island in a collection of 926 genomes and identifying single nucleotide polymorphisms associated with fluoroquinolone resistance in three genes among 190,209 genomes. BlastFrost is available at https://github.com/nluhmann/BlastFrost/tree/master/data.
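The idea of querying a collection of genomes through shared k-mers can be illustrated with a toy index. The sketch below is not BlastFrost or Bifrost code: it assumes a plain dictionary mapping each k-mer to the set of genomes ("colors") that contain it, ignores reverse complements and graph compaction, and uses hypothetical names (`build_index`, `query`, `min_fraction`) and made-up sequences.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-length substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(genomes, k):
    """Map each k-mer to the set of genome names ("colors") containing it."""
    index = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def query(index, genomes, query_seq, k, min_fraction=1.0):
    """Return sorted genome names that contain at least min_fraction
    of the k-mers of query_seq."""
    q = list(kmers(query_seq, k))
    if not q:
        return []
    hits = {name: 0 for name in genomes}
    for km in q:
        for name in index.get(km, ()):
            hits[name] += 1
    return sorted(n for n, c in hits.items() if c / len(q) >= min_fraction)


# Toy collection: g2 differs from g1 by a single nucleotide.
genomes = {
    "g1": "ACGTACGTGA",
    "g2": "ACGTTCGTGA",
    "g3": "TTTTTTTTTT",
}
index = build_index(genomes, k=4)
present = query(index, genomes, "TACGTG", k=4)
```

With these toy inputs, only `g1` contains every k-mer of the query; lowering `min_fraction` also admits `g2`, whose single-base difference removes one of the three query k-mers.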
Affiliation(s)
- Nina Luhmann
- Warwick Medical School, University of Warwick, Coventry, UK.
| | - Guillaume Holley
- Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland
| | - Mark Achtman
- Warwick Medical School, University of Warwick, Coventry, UK
| |
|
188
|
Carvalho R, Aburjaile F, Canario M, Nascimento AMA, Chartone-Souza E, de Jesus L, Zamyatnin AA, Brenig B, Barh D, Ghosh P, Goes-Neto A, Figueiredo HCP, Soares S, Ramos R, Pinto A, Azevedo V. Genomic Characterization of Multidrug-Resistant Escherichia coli BH100 Sub-strains. Front Microbiol 2021; 11:549254. [PMID: 33584554 PMCID: PMC7874104 DOI: 10.3389/fmicb.2020.549254] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/05/2020] [Accepted: 12/09/2020] [Indexed: 01/17/2023] Open
Abstract
The rapid emergence of multidrug-resistant (MDR) bacteria is a global health problem. Mobile genetic elements like conjugative plasmids, transposons, and integrons are the major players in spreading resistance genes in the uropathogenic Escherichia coli (UPEC) pathotype. The E. coli BH100 strain was isolated from the urinary tract of a Brazilian woman in 1974. This strain presents two plasmids carrying MDR cassettes, pBH100 and pAp, with conjugative and mobilization properties, respectively. However, its transposable elements have not been characterized. In this study, we attempted to unravel the factors involved in the mobilization of virulence and drug-resistance genes by assessing genomic rearrangements in four BH100 sub-strains (BH100 MG2014, BH100 MG2017, BH100L MG2017, and BH100N MG2017). To this end, the complete genomes of the BH100 sub-strains were obtained through Next Generation Sequencing and subjected to comparative genomic analyses. Our data show recombination events between the two plasmids in the sub-strain BH100 MG2017 and between pBH100 and the chromosome in BH100L MG2017. In both cases, IS3 and IS21 elements were detected upstream of Tn21 family transposons associated with MDR genes at the recombined region. These results, integrated with genomic island analysis, suggest pBH100 might be involved in the spreading of drug resistance through the formation of resistance islands. Regarding pathogenicity, our results reveal that the BH100 strain is closely related to UPEC strains and contains many IS3 and IS21-transposase-enriched genomic islands associated with virulence. This study concludes that those IS elements are vital for the evolution and adaptation of the BH100 strain.
Affiliation(s)
- Rodrigo Carvalho
- Institute of Molecular Medicine, Sechenov First Moscow State Medical University, Moscow, Russia
| | - Flavia Aburjaile
- Departamento de Genética, Ecologia e Evolução, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; Departamento de Genética, Universidade Federal de Pernambuco, Recife, Brazil
| | - Marcus Canario
- Departamento de Genética, Ecologia e Evolução, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Andréa M A Nascimento
- Departamento de Genética, Ecologia e Evolução, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Edmar Chartone-Souza
- Departamento de Genética, Ecologia e Evolução, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Luis de Jesus
- Departamento de Genética, Ecologia e Evolução, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Andrey A Zamyatnin
- Institute of Molecular Medicine, Sechenov First Moscow State Medical University, Moscow, Russia; Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
| | - Bertram Brenig
- Institute of Veterinary Medicine, University of Göttingen, Göttingen, Germany
| | - Debmalya Barh
- Departamento de Genética, Ecologia e Evolução, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; Institute of Integrative Omics and Applied Biotechnology, Purba Medinipur, India
| | - Preetam Ghosh
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
| | - Aristoteles Goes-Neto
- Departamento de Genética, Ecologia e Evolução, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Henrique C P Figueiredo
- Departamento de Genética, Ecologia e Evolução, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Siomar Soares
- Departmento de Microbiologia, Imunologia e Parasitologia, Universidade Federal do Triangulo Mineiro, Uberaba, Brazil
| | | | - Anne Pinto
- Departamento de Genética, Ecologia e Evolução, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Vasco Azevedo
- Departamento de Genética, Ecologia e Evolução, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| |
|
189
|
Marchet C, Boucher C, Puglisi SJ, Medvedev P, Salson M, Chikhi R. Data structures based on k-mers for querying large collections of sequencing data sets. Genome Res 2021; 31:1-12. [PMID: 33328168 PMCID: PMC7849385 DOI: 10.1101/gr.260604.119] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 12/29/2019] [Accepted: 09/14/2020] [Indexed: 12/19/2022]
Abstract
High-throughput sequencing data sets are usually deposited in public repositories (e.g., the European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached petabyte scale, repositories do not allow one to perform online sequence searches, yet such a feature would be highly useful to investigators. Toward this goal, in the last few years several computational approaches have been introduced to index and query large collections of data sets. Here, we propose an accessible survey of these approaches, which are generally based on representing data sets as sets of k-mers. We review their properties, introduce a classification, and present their general intuition. We summarize their performance and highlight their current strengths and limitations.
Affiliation(s)
- Camille Marchet
- Université de Lille, CNRS, CRIStAL UMR 9189, F-59000 Lille, France
| | - Christina Boucher
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, Florida 32611, USA
| | - Simon J Puglisi
- Department of Computer Science, University of Helsinki, FI-00014, Helsinki, Finland
| | - Paul Medvedev
- Department of Computer Science, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Mikaël Salson
- Université de Lille, CNRS, CRIStAL UMR 9189, F-59000 Lille, France
| | - Rayan Chikhi
- Institut Pasteur & CNRS, C3BI USR 3756, F-75015 Paris, France
| |
|
190
|
Comparative genomics with a multidrug-resistant Klebsiella pneumoniae isolate reveals the panorama of unexplored diversity in Northeast Brazil. Gene 2020; 772:145386. [PMID: 33373662 DOI: 10.1016/j.gene.2020.145386] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/29/2020] [Revised: 12/14/2020] [Accepted: 12/18/2020] [Indexed: 02/06/2023]
Abstract
The emergence of community-acquired infections increases the public health concern about K. pneumoniae and closely related bacteria among which antimicrobial resistance spreads. We report a multidrug-resistant K. pneumoniae isolate, B31, from a patient infected in the community and admitted to an intensive care unit in Northeast Brazil. Antimicrobial susceptibility and genome information were thoroughly investigated to characterize B31 against 172 sequenced strains from different countries. Assigned to Sequence Type 15, which is globally spread, B31 presented extended-spectrum beta-lactamase, tigecycline and ciprofloxacin resistance. Genome sequencing revealed that most resistance genes are carried by plasmids with high dissemination potential. The absence of main virulence factors, like yersiniabactin and colibactin, apparently suggests a mildly pathogenic strain which, on the contrary, persisted and caused severe infection in a previously healthy patient. The present study helps unveil the unclear genomic scenario of virulent and multidrug-resistant K. pneumoniae in Brazil.
|
191
|
Draft Genome Sequences of Fructobacillus fructosus DPC 7238 and Leuconostoc mesenteroides DPC 7261, Mannitol-Producing Organisms Isolated from Fructose-Rich Honeybee-Resident Flowers on an Irish Farm. Microbiol Resour Announc 2020; 9:9/50/e01297-20. [PMID: 33303674 PMCID: PMC7729423 DOI: 10.1128/mra.01297-20] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/21/2022] Open
Abstract
Certain bacterial species, including some fructophilic lactic acid bacteria, are known to naturally produce the sugar alcohol mannitol. Here, we announce the draft genome sequences of the mannitol-producing organisms Fructobacillus fructosus DPC 7238 and Leuconostoc mesenteroides DPC 7261, which were isolated from fructose-rich honeybee-resident flowers found on an Irish farm.
|
192
|
Genome-Wide Analysis of Nubian Ibex Reveals Candidate Positively Selected Genes That Contribute to Its Adaptation to the Desert Environment. Animals (Basel) 2020; 10:ani10112181. [PMID: 33266380 PMCID: PMC7700370 DOI: 10.3390/ani10112181] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/28/2020] [Revised: 10/31/2020] [Accepted: 11/03/2020] [Indexed: 12/21/2022] Open
Abstract
Simple Summary The Nubian ibex is a wild relative of the domestic goat found in hot deserts of Northern Africa and Arabia. The domestic goat is an important livestock species that is mainly found in arid and semi-arid regions of Africa and Asia. The Nubian ibex is well adapted to challenging environments in hot deserts characterized by high diurnal temperatures, intense solar radiation, and scarce water resources. It is therefore important to understand the genetic basis of its adaptation for scientific and economic importance. To identify genes with adaptive traits, the Nubian ibex genome was sequenced and compared with that of related mammals. We identified twenty-five genes under selection in the Nubian ibex that play diverse biological roles such as immune response, visual development, signal transduction, and reproduction. Three other genes under adaptive evolution involved in protective functions of the skin against damaging solar radiation in the desert were identified in Nubian ibex genome. Our finding provides valuable genomic insights into the adaptation of Nubian ibex to desert environments. The genomic information generated in this study can be used in developing appropriate breeding programs aimed at enhancing adaptation of local goats to less favorable habitats in response to changing climates.
Abstract The domestic goat (Capra hircus) is an important livestock species with a geographic range spanning all continents, including arid and semi-arid regions of Africa and Asia. The Nubian ibex (Capra nubiana), a wild relative of the domestic goat inhabiting the hot deserts of Northern Africa and the Arabian Peninsula, is well-adapted to challenging environments in hot deserts characterized by intense solar radiation, thermal extremes, and scarce water resources. The economic importance of C. hircus breeds, as well as the current trends of global warming, highlights the need to understand the genetic basis of adaptation of C. nubiana to the desert environments. In this study, the genome of a C. nubiana individual was sequenced at an average of 37x coverage. Positively selected genes were identified by comparing protein-coding DNA sequences of C. nubiana and related species using dN/dS statistics. A total of twenty-two positively selected genes involved in diverse biological functions such as immune response, protein ubiquitination, olfactory transduction, and visual development were identified. In total, three of the twenty-two positively selected genes are involved in skin barrier development and function (ATP binding cassette subfamily A member 12, Achaete-scute family bHLH transcription factor 4, and UV stimulated scaffold protein A), suggesting that C. nubiana has evolved skin protection strategies against the damaging solar radiations that prevail in deserts. The positive selection signatures identified here provide new insights into the potential adaptive mechanisms to hot deserts in C. nubiana.
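dN/dS statistics contrast nonsynonymous changes (which alter the encoded amino acid) with synonymous ones (which do not). As a rough illustration of that first classification step only, the sketch below counts both kinds of difference between two aligned coding sequences using the standard genetic code. It is not the method used in the study: it skips codons that differ at more than one position, does not normalize by the number of synonymous and nonsynonymous sites, and the function name and example sequences are hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_substitutions(seq_a, seq_b):
    """Count (synonymous, nonsynonymous) single-nucleotide codon differences
    between two aligned, gap-free coding sequences of equal length."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff != 1:
            # Identical codons, or multi-site changes whose mutational
            # path is ambiguous, are skipped in this simplified sketch.
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# CTT -> CTC keeps leucine (synonymous); GAA -> GAT changes E to D.
counts = count_substitutions("ATGCTTGAA", "ATGCTCGAT")
```

A full dN/dS estimate would additionally divide each count by the number of sites at which that kind of change is possible and correct for multiple substitutions, which dedicated phylogenetic software handles.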
|
193
|
El Jeni R, Ghedira K, El Bour M, Abdelhak S, Benkahla A, Bouhaouala-Zahar B. High-quality genome sequence assembly of R.A73 Enterococcus faecium isolated from freshwater fish mucus. BMC Microbiol 2020; 20:322. [PMID: 33096980 PMCID: PMC7584074 DOI: 10.1186/s12866-020-01980-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/04/2020] [Accepted: 09/18/2020] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Whole-genome sequencing using high-throughput technologies has revolutionized and accelerated the scientific investigation of bacterial genetics, biochemistry, and molecular biology. Lactic acid bacteria (LABs) have been extensively used in fermentation and more recently as probiotics in food products that promote health. Genome sequencing and functional genomics investigations of LAB varieties provide rapid and important information about their diversity and their evolution, revealing a significant molecular basis. This study investigated the whole genome sequence of the Enterococcus faecium strain (HG937697), isolated from the mucus of freshwater fish in Tunisian dams. Genomic DNA was extracted using the Quick-GDNA kit and sequenced using the Illumina HiSeq2500 system. Sequence quality assessment was performed using FastQC software. The complete genome annotation was carried out with the Rapid Annotation using Subsystem Technology (RAST) web server, then NCBI PGAAP. RESULTS The Enterococcus faecium R.A73 genome was assembled into 28 contigs totaling 2,935,283 bp. The genome annotation revealed 2884 genes in total, including 2834 coding sequences and 50 RNAs comprising 3 rRNAs (one 16S, one 23S, and one 5S rRNA) and 47 tRNAs. Twenty-two genes implicated in bacteriocin production were identified within the Enterococcus faecium R.A73 strain. CONCLUSION The data obtained provide insights to further investigate an effective strategy for testing this Enterococcus faecium R.A73 strain in the industrial manufacturing process. Studying its metabolism with bioinformatics tools represents the future challenge and contribution to improving the utilization of multi-purpose bacteria in food.
Affiliation(s)
- Rim El Jeni
- Laboratory of Microbiology and Pathology of Aquatic Organisms, Institut National des Sciences et Technologies de la Mer (INSTM), Tunis, Tunisia
- Laboratory of Venoms and Therapeutic Molecules, Pasteur Institute of Tunis, Tunis, Tunisia
| | - Kais Ghedira
- Bioinformatics and Biostatistics Laboratory (LR16IPT09), Pasteur Institute of Tunis, Tunis, Tunisia
| | - Monia El Bour
- Laboratory of Microbiology and Pathology of Aquatic Organisms, Institut National des Sciences et Technologies de la Mer (INSTM), Tunis, Tunisia
| | - Sonia Abdelhak
- Biomedical Genomics and Oncogenetics Laboratory LR16IPT05, Pasteur Institute of Tunis, Tunis, Tunisia
| | - Alia Benkahla
- Bioinformatics and Biostatistics Laboratory (LR16IPT09), Pasteur Institute of Tunis, Tunis, Tunisia
| | - Balkiss Bouhaouala-Zahar
- Laboratory of Venoms and Therapeutic Molecules, Pasteur Institute of Tunis, Tunis, Tunisia
- Medical School of Tunis, University of Tunis El Manar, 1007 Tunis, Tunisia
| |
|
194
|
Mora-Márquez F, Vázquez-Poletti JL, Chano V, Collada C, Soto Á, de Heredia UL. Hardware Performance Evaluation of De novo Transcriptome Assembly Software in Amazon Elastic Compute Cloud. Curr Bioinform 2020. [DOI: 10.2174/1574893615666191219095817] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]
Abstract
Background:
Bioinformatics software for RNA-seq analysis has a high computational
requirement in terms of the number of CPUs, RAM size, and processor characteristics.
Specifically, de novo transcriptome assembly demands large computational infrastructure due to
the massive data size, and complexity of the algorithms employed. Comparative studies on the
quality of the transcriptome yielded by de novo assemblers have been previously published,
lacking, however, a hardware efficiency-oriented approach to help select the assembly hardware
platform in a cost-efficient way.
Objective:
We tested the performance of two popular de novo transcriptome assemblers, Trinity
and SOAPdenovo-Trans (SDNT), in terms of cost-efficiency and quality to assess limitations, and
provided troubleshooting and guidelines to run transcriptome assemblies efficiently.
Methods:
We built virtual machines with different hardware characteristics (CPU number, RAM
size) in the Amazon Elastic Compute Cloud of the Amazon Web Services. Using simulated and
real data sets, we measured the elapsed time, cost, CPU percentage and output size of small and
large data set assemblies.
Results:
For small data sets, SDNT outperformed Trinity by an order of magnitude, significantly
reducing the duration and cost of the assembly. For large data sets, Trinity performed better
than SDNT. Both assemblers provide good-quality transcriptomes.
Conclusion:
The selection of the optimal transcriptome assembler and provision of computational
resources depend on the combined effect of size and complexity of RNA-seq experiments.
Affiliation(s)
- Fernando Mora-Márquez
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politecnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
| | - José Luis Vázquez-Poletti
- GI Arquitectura de Sistemas Distribuidos, Dpto. Arquitectura de Computadores y Automatica, Facultad de Informatica, Universidad Complutense de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
| | - Víctor Chano
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politecnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
| | - Carmen Collada
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politecnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
| | - Álvaro Soto
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politecnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
| | - Unai López de Heredia
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politecnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
| |
|
195
|
Nielsen ES, Henriques R, Beger M, Toonen RJ, von der Heyden S. Multi-model seascape genomics identifies distinct environmental drivers of selection among sympatric marine species. BMC Evol Biol 2020; 20:121. [PMID: 32938400 PMCID: PMC7493327 DOI: 10.1186/s12862-020-01679-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/10/2020] [Accepted: 08/24/2020] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND As global change and anthropogenic pressures continue to increase, conservation and management increasingly needs to consider species' potential to adapt to novel environmental conditions. Therefore, it is imperative to characterise the main selective forces acting on ecosystems, and how these may influence the evolutionary potential of populations and species. Using a multi-model seascape genomics approach, we compare putative environmental drivers of selection in three sympatric southern African marine invertebrates with contrasting ecology and life histories: Cape urchin (Parechinus angulosus), Common shore crab (Cyclograpsus punctatus), and Granular limpet (Scutellastra granularis). RESULTS Using pooled (Pool-seq), restriction-site associated DNA sequencing (RAD-seq), and seven outlier detection methods, we characterise genomic variation between populations along a strong biogeographical gradient. Of the three species, only S. granularis showed significant isolation-by-distance, and isolation-by-environment driven by sea surface temperatures (SST). In contrast, sea surface salinity (SSS) and range in air temperature correlated more strongly with genomic variation in C. punctatus and P. angulosus. Differences were also found in genomic structuring between the three species, with outlier loci contributing to two clusters in the East and West Coasts for S. granularis and P. angulosus, but not for C. punctatus. CONCLUSION The findings illustrate distinct evolutionary potential across species, suggesting that species-specific habitat requirements and responses to environmental stresses may be better predictors of evolutionary patterns than the strong environmental gradients within the region. We also found large discrepancies between outlier detection methodologies, and thus offer a novel multi-model approach to identifying the principal environmental selection forces acting on species. 
Overall, this work highlights how adding a comparative approach to seascape genomics (both with multiple models and species) can elucidate the intricate evolutionary responses of ecosystems to global change.
Affiliation(s)
- Erica S Nielsen
- Evolutionary Genomics Group, Department of Botany and Zoology, University of Stellenbosch, Private Bag X1, Matieland, 7602, South Africa
| | - Romina Henriques
- Evolutionary Genomics Group, Department of Botany and Zoology, University of Stellenbosch, Private Bag X1, Matieland, 7602, South Africa; Technical University of Denmark, National Institute of Aquatic Resources, Section for Marine Living Resources, Velsøvej 39, 8600, Silkeborg, Denmark
| | - Maria Beger
- School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK
| | - Robert J Toonen
- Hawai'i Institute of Marine Biology, University of Hawai'i at Mānoa, Kāne'ohe, HI, 96744, USA
| | - Sophie von der Heyden
- Evolutionary Genomics Group, Department of Botany and Zoology, University of Stellenbosch, Private Bag X1, Matieland, 7602, South Africa.
| |
|
196
|
Kayama M, Chen JF, Nakada T, Nishimura Y, Shikanai T, Azuma T, Miyashita H, Takaichi S, Kashiyama Y, Kamikawa R. A non-photosynthetic green alga illuminates the reductive evolution of plastid electron transport systems. BMC Biol 2020; 18:126. [PMID: 32938439 PMCID: PMC7495860 DOI: 10.1186/s12915-020-00853-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/14/2020] [Accepted: 08/21/2020] [Indexed: 11/12/2022] Open
Abstract
Background Plastid electron transport systems are essential not only for photosynthesis but also for dissipating excess reducing power and sinking excess electrons generated by various redox reactions. Although numerous organisms with plastids have lost their photoautotrophic lifestyles, there is a spectrum of known functions of remnant plastids in non-photosynthetic algal/plant lineages; some of non-photosynthetic plastids still retain diverse metabolic pathways involving redox reactions while others, such as apicoplasts of apicomplexan parasites, possess highly reduced sets of functions. However, little is known about underlying mechanisms for redox homeostasis in functionally versatile non-photosynthetic plastids and thus about the reductive evolution of plastid electron transport systems. Results Here we demonstrated that the central component for plastid electron transport systems, plastoquinone/plastoquinol pool, is still retained in a novel strain of an obligate heterotrophic green alga lacking the photosynthesis-related thylakoid membrane complexes. Microscopic and genome analyses revealed that the Volvocales green alga, chlamydomonad sp. strain NrCl902, has non-photosynthetic plastids and a plastid DNA that carries no genes for the photosynthetic electron transport system. Transcriptome-based in silico prediction of the metabolic map followed by liquid chromatography analyses demonstrated carotenoid and plastoquinol synthesis, but no trace of chlorophyll pigments in the non-photosynthetic green alga. Transient RNA interference knockdown leads to suppression of plastoquinone/plastoquinol synthesis. The alga appears to possess genes for an electron sink system mediated by plastid terminal oxidase, plastoquinone/plastoquinol, and type II NADH dehydrogenase. Other non-photosynthetic algae/land plants also possess key genes for this system, suggesting a broad distribution of an electron sink system in non-photosynthetic plastids. 
Conclusion The plastoquinone/plastoquinol pool and thus the involved electron transport systems reported herein might be retained for redox homeostasis and might represent an intermediate step towards a more reduced set of the electron transport system in many non-photosynthetic plastids. Our findings illuminate a broadly distributed but previously hidden step of reductive evolution of plastid electron transport systems after the loss of photosynthesis.
Affiliation(s)
- Motoki Kayama
- Graduate School of Human and Environmental Studies, Kyoto University, Yoshida nihonmatsu cho, Sakyo ku, Kyoto, Kyoto, 606-8501, Japan
| | - Jun-Feng Chen
- Graduate School of Human and Environmental Studies, Kyoto University, Yoshida nihonmatsu cho, Sakyo ku, Kyoto, Kyoto, 606-8501, Japan
| | - Takashi Nakada
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
| | | | | | - Tomonori Azuma
- Graduate School of Human and Environmental Studies, Kyoto University, Yoshida nihonmatsu cho, Sakyo ku, Kyoto, Kyoto, 606-8501, Japan
| | - Hideaki Miyashita
- Graduate School of Human and Environmental Studies, Kyoto University, Yoshida nihonmatsu cho, Sakyo ku, Kyoto, Kyoto, 606-8501, Japan
| | - Shinichi Takaichi
- Department of Molecular Microbiology, Tokyo University of Agriculture, Tokyo, Japan
| | - Yuichiro Kashiyama
- Graduate School of Engineering, Fukui University of Technology, Fukui, Japan
| | - Ryoma Kamikawa
- Graduate School of Human and Environmental Studies, Kyoto University, Yoshida nihonmatsu cho, Sakyo ku, Kyoto, Kyoto, 606-8501, Japan; Graduate School of Agriculture, Kyoto University, Kitashirakawa oiwake cho, Sakyo ku, Kyoto, Kyoto, 606-8502, Japan.
| |
|
197
|
Wu Z, Liao R, Yang T, Dong X, Lan D, Qin R, Liu H. Analysis of six chloroplast genomes provides insight into the evolution of Chrysosplenium (Saxifragaceae). BMC Genomics 2020; 21:621. [PMID: 32912155 PMCID: PMC7488271 DOI: 10.1186/s12864-020-07045-4] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 05/27/2020] [Accepted: 09/01/2020] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND Chrysosplenium L. (Saxifragaceae) is a genus of plants widely distributed in the Northern Hemisphere and usually found in moist, shaded valleys and mountain slopes. This genus is ideal for studying plant adaptation to low light conditions. Although some progress has been made in the systematics and biogeography of Chrysosplenium, its chloroplast genome evolution remains to be investigated. RESULTS To fill this gap, we sequenced the chloroplast genomes of six Chrysosplenium species and analyzed their genome structure, GC content, and nucleotide diversity. Moreover, we performed a phylogenetic analysis and calculated non-synonymous (Ka)/synonymous (Ks) substitution ratios using the combined protein-coding genes of 29 species within Saxifragales and two additional species as outgroups, as well as a pair-wise estimation for each gene within Chrysosplenium. Compared with the outgroups in Saxifragaceae, the six Chrysosplenium chloroplast genomes had lower GC contents; they also had conserved boundary regions and gene contents, as only the rpl32 gene was lost in four of the Chrysosplenium chloroplast genomes. Phylogenetic analyses suggested that Chrysosplenium separated into two major clades (the opposite group and the alternate group). The selection pressure estimation (Ka/Ks ratios) of genes in the Chrysosplenium species showed that matK and ycf2 were subjected to positive selection. CONCLUSION This study provides genetic resources for exploring the phylogeny of Chrysosplenium and sheds light on plant adaptation to low light conditions. The lower average GC content and the loss of the rpl32 gene indicated selective pressure in their unique habitats. In contrast to previously reported results, our selective pressure estimation suggested that the genes related to photosynthesis (such as ycf2) were under positive selection at sites in the coding region.
Affiliation(s)
- Zhihua Wu
- Hubei Provincial Key Laboratory for Protection and Application of Special Plant Germplasm in Wuling Area of China, Key Laboratory of State Ethnic Affairs Commission for Biological Technology, College of Life Sciences, South-Central University for Nationalities, Wuhan, 430074, Hubei, China
- Rui Liao
- Hubei Provincial Key Laboratory for Protection and Application of Special Plant Germplasm in Wuling Area of China, Key Laboratory of State Ethnic Affairs Commission for Biological Technology, College of Life Sciences, South-Central University for Nationalities, Wuhan, 430074, Hubei, China
- Tiange Yang
- Hubei Provincial Key Laboratory for Protection and Application of Special Plant Germplasm in Wuling Area of China, Key Laboratory of State Ethnic Affairs Commission for Biological Technology, College of Life Sciences, South-Central University for Nationalities, Wuhan, 430074, Hubei, China
- Xiang Dong
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Deqing Lan
- Hubei Provincial Key Laboratory for Protection and Application of Special Plant Germplasm in Wuling Area of China, Key Laboratory of State Ethnic Affairs Commission for Biological Technology, College of Life Sciences, South-Central University for Nationalities, Wuhan, 430074, Hubei, China
- Rui Qin
- Hubei Provincial Key Laboratory for Protection and Application of Special Plant Germplasm in Wuling Area of China, Key Laboratory of State Ethnic Affairs Commission for Biological Technology, College of Life Sciences, South-Central University for Nationalities, Wuhan, 430074, Hubei, China
- Hong Liu
- Hubei Provincial Key Laboratory for Protection and Application of Special Plant Germplasm in Wuling Area of China, Key Laboratory of State Ethnic Affairs Commission for Biological Technology, College of Life Sciences, South-Central University for Nationalities, Wuhan, 430074, Hubei, China
|
198
|
Ghosh A, Johnson MG, Osmanski AB, Louha S, Bayona-Vásquez NJ, Glenn TC, Gongora J, Green RE, Isberg S, Stevens RD, Ray DA. A High-Quality Reference Genome Assembly of the Saltwater Crocodile, Crocodylus porosus, Reveals Patterns of Selection in Crocodylidae. Genome Biol Evol 2020; 12:3635-3646. [PMID: 31821505 PMCID: PMC6946029 DOI: 10.1093/gbe/evz269] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Accepted: 12/05/2019] [Indexed: 12/14/2022] Open
Abstract
Crocodilians are an economically, culturally, and biologically important group. To improve researchers’ ability to study genome structure, evolution, and gene regulation in the clade, we generated a high-quality de novo genome assembly of the saltwater crocodile, Crocodylus porosus, from Illumina short-read data from genomic libraries and in vitro proximity-ligation libraries. The assembled genome is 2,123.5 Mb, with an N50 scaffold size of 17.7 Mb and an N90 scaffold size of 3.8 Mb. We then annotated this new assembly, increasing the number of annotated genes by 74%. In total, 96% of 23,242 annotated genes were associated with a functional protein domain. Furthermore, multiple noncoding functional regions and mappable genetic markers were identified. By overlapping the results of branch length estimation and site selection tests for detecting potential selection, we found 16 putative genes under positive selection in crocodilians, 10 in C. porosus and 6 in Alligator mississippiensis. The annotated C. porosus genome will serve as an important platform for osmoregulatory, physiological, and sex determination studies, as well as an important reference in investigating the phylogenetic relationships of crocodilians, birds, and other tetrapods.
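The N50 and N90 scaffold sizes quoted in this abstract follow the usual assembly-statistics definition: the scaffold length L such that scaffolds of length L or longer together cover at least 50% (or 90%) of the total assembly. A minimal sketch of that calculation is given below; the function name `nx` and the toy scaffold lengths are illustrative assumptions and are not taken from the crocodile assembly.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx statistic for a list of scaffold lengths.

    Nx is the length L such that scaffolds of length >= L together
    cover at least `fraction` of the total assembly length.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    threshold = fraction * sum(lengths)
    running = 0
    # Walk from the longest scaffold down until the threshold is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Hypothetical toy assembly (arbitrary units), not real data.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
n50 = nx(toy_lengths, 0.5)
n90 = nx(toy_lengths, 0.9)
print("N50:", n50, "N90:", n90)
```

Because N90 requires covering more of the assembly, it can never exceed N50, which is consistent with the 17.7 Mb and 3.8 Mb figures reported above.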
Affiliation(s)
- Arnab Ghosh
- Department of Biological Sciences, Texas Tech University
- Swarnali Louha
- Department of Environmental Health Science and Institute of Bioinformatics, University of Georgia
- Natalia J Bayona-Vásquez
- Department of Environmental Health Science and Institute of Bioinformatics, University of Georgia
- Travis C Glenn
- Department of Environmental Health Science and Institute of Bioinformatics, University of Georgia
- Jaime Gongora
- Sydney School of Veterinary Science, University of Sydney, Australia
- Richard E Green
- Department of Biomolecular Engineering, University of California, Santa Cruz
- Sally Isberg
- Sydney School of Veterinary Science, University of Sydney, Australia; Centre for Crocodile Research, University of Sydney and Charles Darwin University, Australia
- David A Ray
- Department of Biological Sciences, Texas Tech University
|
199
|
Draft Genome Sequence of Enterobacter kobei M4-VN, Isolated from Potatoes with Soft Rot Disease. Microbiol Resour Announc 2020; 9:9/36/e00908-20. [PMID: 32883798 PMCID: PMC7471391 DOI: 10.1128/mra.00908-20] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022] Open
Abstract
Enterobacter kobei M4-VN, isolated from potatoes with soft rot disease in Vietnam, contains a total of 4,754,309 bp with 4,424 predicted coding sequences and a G+C content of 55.1%.
|
200
|
He C, Lin G, Wei H, Tang H, White FF, Valent B, Liu S. Factorial estimating assembly base errors using k-mer abundance difference (KAD) between short reads and genome assembled sequences. NAR Genom Bioinform 2020; 2:lqaa075. [PMID: 33575622 PMCID: PMC7671381 DOI: 10.1093/nargab/lqaa075] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/18/2020] [Revised: 08/02/2020] [Accepted: 09/01/2020] [Indexed: 12/25/2022] Open
Abstract
Genome sequences provide genomic maps with single-base resolution for exploring genetic contents. Sequencing technologies, particularly long reads, have revolutionized genome assembly by producing highly continuous genome sequences. However, current long-read sequencing technologies generate inaccurate reads that contain many errors. Some errors are retained in assembled sequences, which are typically not completely corrected by using either long reads or more accurate short reads. The issue is common, but few tools are dedicated to computing error rates or determining error locations. In this study, we developed a novel approach, referred to as k-mer abundance difference (KAD), to compare the inferred copy number of each k-mer indicated by short reads and the observed copy number in the assembly. Simple KAD metrics make it possible to classify k-mers into categories that reflect the quality of the assembly. Specifically, the KAD method can be used to identify base errors and estimate the overall error rate. In addition, sequence insertions and deletions as well as sequence redundancy can also be detected. Collectively, KAD is valuable for quality evaluation of genome assemblies and, potentially, provides a diagnostic tool to aid in precise error correction. KAD software has been developed to facilitate public use.
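The comparison described in this abstract, between the copy number a k-mer appears to have from short-read counts and its observed copy number in the assembly, can be illustrated with a small sketch. The abstract does not state the formula, so the log-ratio below, log2((c + m) / (m * (n + 1))) with read count c, read-depth mode m, and assembly copy number n, is an assumption chosen for illustration; the function names, toy sequences, and the way the depth mode is estimated are likewise hypothetical, and the KAD software itself should be consulted for the actual definition.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_kmers(seqs, k):
    """Count k-mer occurrences across a collection of sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(kmers(s, k))
    return counts


def kad_like(read_count, depth_mode, asm_copies):
    """Assumed log-ratio score: 0 when read support matches the assembly copy
    number, negative when the assembly has more copies than the reads support,
    positive when the reads support more copies than the assembly contains."""
    return math.log2((read_count + depth_mode) / (depth_mode * (asm_copies + 1)))


def kad_profile(reads, assembly, k):
    """Score every k-mer seen in the reads or in the assembly."""
    read_counts = count_kmers(reads, k)
    asm_counts = count_kmers([assembly], k)
    # Simplification: take the most common read k-mer count as the depth mode.
    # Real data would need low-count error k-mers filtered out first.
    depth_mode = Counter(read_counts.values()).most_common(1)[0][0]
    return {
        kmer: kad_like(read_counts[kmer], depth_mode, asm_counts[kmer])
        for kmer in set(read_counts) | set(asm_counts)
    }


# Hypothetical toy data: error-free reads at 4x depth, assembly with one base error.
true_seq = "ACGTAGGCTA"
reads = [true_seq] * 4
assembly = "ACGTAGGTTA"  # C -> T substitution at position 8
scores = kad_profile(reads, assembly, k=3)
for kmer in sorted(scores):
    print(kmer, scores[kmer])
```

In this toy case, k-mers shared by the reads and the assembly score 0, the three assembly k-mers spanning the substituted base score -1 (present in the assembly, unsupported by reads), and the three true k-mers missing from the assembly score 1, which is the kind of classification the abstract describes.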
Affiliation(s)
- Cheng He
- Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS 66506-5502, USA
- Guifang Lin
- Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS 66506-5502, USA
- Hairong Wei
- College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
- Haibao Tang
- Center for Genomics and Biotechnology and Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fujian 350002, China
- Frank F White
- Department of Plant Pathology, University of Florida, Gainesville, FL 32611-0680, USA
- Barbara Valent
- Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS 66506-5502, USA
- Sanzhen Liu
- Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS 66506-5502, USA
|